A Violin Controller for Real-Time Audio Synthesis
Links revised 2004-09-14, 2014-08-21, 2019-07-02.
Videos transcoded 2020-06-19.
(2004 Aug. 28)
(This document is at http://camille-g.com/eviolin.html,
Study #1, 2'30
Study #2, 2'30
1. System Overview
This "violin controller" is used not as a sound-producing device, but as a
gesture-input device. It is no more a violin than a Yamaha synthesizer
is a piano: both take standard gestures as input, both produce nonstandard
sounds as output. (With this disclaimer out of the way, I'll still continue
to call it a violin in what follows.)
The violin's bridge produces a signal with
varying pitch and loudness (depending on all the usual
violinist's bowing and fingering tricks); a computer
measures this changing pitch and loudness and uses these
changing numbers to affect an (entirely different) sound
being simultaneously computed and played.
Another gesture input is spatial: the position and
orientation of violin and bow are measured on the fly,
producing another stream of changing numbers which can modify the sound.
Particular systems used:
1.1 The Jensen violin
Quick facts about my model (quite a variety is available from just
this manufacturer, never mind others like Zeta and Jordan):
- Five strings (violin + C viola)
- No body, so no acoustic feedback problems
- Kun shoulder rest
- Machine tuners near chinrest, not at scroll
- Look at it from all angles in this quicktime-vr image.
- Costs about US$2000.
General playing observations
It weighs a little more than an acoustic violin: the neck is massive, and
the fingerboard doesn't float above the body of the instrument.
The Barbera pickup bridge is not as round as a regular bridge, by default;
combined with the extra string,
this makes single-string bowing a little harder (open strings ring
if you're not careful) and triple-stopping much easier.
Since there is no body to impede the bow, you may want a rounder bridge:
fortunately its shape is adjustable.
With monophonic pitch tracking, ringing open strings are less of a problem;
but if they're loud enough they will still confuse the pitch tracking.
Weaving a folded dollar bill through the strings above the bridge,
and creatively applying foam rubber and elastic bands to the strings,
reduces open-string ringing significantly.
(The bridge can't accommodate conventional mutes.)
The L.R. Baggs preamp is expensive ($100) but definitely worth it.
The spacing between strings feels slightly wider, which affects the shape
of the hand when double- and triple-stopping.
With the machine tuners, the Dominant strings hold their tune
quite well, sometimes for several days.
When an acoustic slips off the shoulder a bit, the palm of the left hand
can push the instrument body back up. You can't do this with a
bodyless electric, though. Eric Jensen has three solutions for this:
a rest for the left hand,
a custom shoulder rest with about three times the contact area of a Kun rest,
and a method of securing the violin behind your back with a cord.
1.2 Virtual Sound Server (VSS)
VSS is both the sound synthesis engine and the glue that holds the
whole system together.
For this project I run VSS on a generic Intel Linux box:
an SGI is still too expensive to dedicate to this single project,
and MS Windows has unacceptable latency (130 msec) for real-time audio.
Linux (Red Hat 5.2, with ALSA sound drivers) is fast, cheap, and reliable.
What VSS is and does... that's a long story. Try the
1.3 The "fiddle" pitch tracker
Miller Puckette's published code does polyphonic tracking of pitch and
amplitude. I've seen the algorithm guess pitches incorrectly for
enough cases of double- and triple-stopping that for now I've left it
as monophonic. In this project it runs as part of VSS -- I extricated
it from Puckette's original Pure Data implementation.
I also made a few changes to its behavior:
- Floating-point output of frequency and loudness replaces quantized
- The vibrato model is extended to handle glissandi of arbitrary
size and nearly arbitrary speed.
- Note endings are recognized by reduction of amplitude as well as
change of pitch.
Tracking latency is around 1/30 second.
1.4 The SpacePad motion tracker
Full motion capture requires fancy hardware like this.
The SpacePad matches
the demands placed on it by the violin: low weight, low restriction
of mobility, low tracking latency, high accuracy, small size, and
low cost ($1400). For this you get an ISA card, two "receivers" whose
position is measured, manuals and software.
Fancier systems from Ascension, Polhemus and InterSense cost
three to thirty times as much and require much coddling. (Prices in
this industry are not falling much; the SpacePad is 4-year-old technology,
Simpler systems have been designed around mercury switches and
force-sensing resistors (particularly for dancer interfaces) and
radio field sensing (Mathews Radio Drum, MIT projects like the Sensor Chair).
Most recently, micromachined accelerometers show promise as very inexpensive
motion sensors. Dan Trueman has experimented with them in his
Their main disadvantage is that they cannot distinguish between
acceleration and gravity. Dan says, "If you are bent way over and
bowing real fast, who knows what those things are saying...".
If you try to work around it and do the math, you'll find that you need
three separate triaxial accelerometers to track position accurately,
which is actually more expensive than the SpacePad and much more work.
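As a rough illustration of that ambiguity, the sketch below models a single sensing axis whose reading is the sum of a gravity component and the true acceleration. The function name and the simplified sign convention are assumptions made for this example only.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def axis_reading(tilt_deg, linear_accel):
    """Reading along one sensing axis (simplified model, an assumption):
    the component of gravity due to tilt plus the true acceleration."""
    return G * math.sin(math.radians(tilt_deg)) + linear_accel

# A sensor held still but tilted 30 degrees...
tilted_at_rest = axis_reading(30.0, 0.0)
# ...and a level sensor accelerating at half a g along its axis.
level_and_moving = axis_reading(0.0, 0.5 * G)

print(f"tilted at rest:   {tilted_at_rest:.3f} m/s^2")
print(f"level and moving: {level_and_moving:.3f} m/s^2")
```

The two readings are the same to within rounding, so a single axis cannot tell "bent over" from "bowing fast".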
The SpacePad tracks the position and orientation of two sensors
(where they are and where they're pointing). It does so by generating a
time-varying magnetic field from an antenna you're instructed to make
(my prototype portable model is made from 30 feet of speaker wire,
a garbage bag, some dowels and string, and looks like a stunt kite).
The field sensors are little half-inch boxes on the end of 10-foot cables.
One is taped to the bottom of the violin; the other is sewn into a
fingerless glove worn on the violinist's bowing hand.
Ideally, the positions and orientations are measured with 13-bit resolution.
This falls somewhat because the size of the measured performing space is
not adjustable (hey, it's a "budget" system), and because large metallic
objects nearby (furniture, rebar in concrete floors, music stands) distort
the magnetic field.
Accuracy remains well above the 7 bits provided by MIDI, though.
Tracking latency is 1/60 second, half that of video-based systems.
If we track only one sensor, the latency drops to 1/120 second.
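To put the bit counts in perspective, here is a small arithmetic sketch. The 2-meter span is a hypothetical figure chosen only for illustration; the actual size of the measured space is not stated here.

```python
def step_size(span, bits):
    """Smallest distinguishable increment when span is split into 2**bits levels."""
    return span / (2 ** bits)

SPAN_M = 2.0  # hypothetical span of one tracked axis, in meters (an assumption)

spacepad_step = step_size(SPAN_M, 13)  # ideal 13-bit resolution
midi_step = step_size(SPAN_M, 7)       # 7-bit MIDI resolution

print(f"13-bit step: {spacepad_step * 1000:.2f} mm")
print(f" 7-bit step: {midi_step * 1000:.2f} mm")
print(f"13 bits is {midi_step / spacepad_step:.0f} times finer")
```

Whatever the span, six extra bits always mean 64 times as many steps, so even after losing a few bits to field distortion the tracker stays finer than a MIDI controller value.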
The drivers for the ISA card are written
for MS-DOS 3 using an old Borland compiler. Ascension graciously released
the source code to me, and I ported the drivers to Linux using gcc.
Visual feedback is given to the performer with a single-pixel color display,
mounted near the bridge so it is always at the edge of the performer's
field of view. The brightness and full-spectrum hue can be updated with
a latency of 3 to 7 msec, to warn the performer that they are approaching
a boundary in the space (or, in a simpler demonstration, to tell them if
they're playing sharp or flat!). Its location on the instrument itself,
its simplicity (no dials or numbers to read -- even focusing isn't necessary),
and its fast response make it intuitively feel like part of the instrument.
The display is implemented with an array of red, green, and
blue LEDs mounted in a light diffuser and driven by an 8-bit latch from
the Linux PC's parallel port.
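One way the sharp/flat demonstration could be coded is sketched below. The bit layout of the latch byte (three bits red, three green, two blue), the 50-cent range, and the flat-is-blue, sharp-is-red color scheme are all assumptions for this example; the actual wiring is not described here, and writing the byte to the parallel port is left out.

```python
import colorsys

def rgb_to_latch_byte(r, g, b):
    """Pack r, g, b (each 0.0..1.0) into one byte laid out as RRRGGGBB.
    This layout is hypothetical."""
    r3 = round(r * 7)   # 3 bits of red
    g3 = round(g * 7)   # 3 bits of green
    b2 = round(b * 3)   # 2 bits of blue
    return (r3 << 5) | (g3 << 2) | b2

def tuning_color(cents_off, max_cents=50.0):
    """Map pitch error to a hue: flat -> blue, in tune -> green, sharp -> red."""
    x = max(-1.0, min(1.0, cents_off / max_cents))  # clamp to -1..1
    hue = (1.0 - x) / 3.0   # +1 -> 0 (red), 0 -> 1/3 (green), -1 -> 2/3 (blue)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

for cents in (-50, 0, 50):
    byte = rgb_to_latch_byte(*tuning_color(cents))
    print(f"{cents:+4d} cents -> 0b{byte:08b}")
```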
1.5 The CAVE motion tracker / audio-visual display
The SpacePad is portable to the concert stage, but I'm also investigating
what a full virtual environment offers. I've implemented an environment
where the violinist can move from place to place to play "into" several
microphones. The microphones are simulated, of course; each mic's pickup
pattern is actually a long cylinder, 2 feet wide, which very slowly moves
around the CAVE. Each mic feeds a filter/echo combination.
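A cylindrical pickup pattern can be sketched as a gain that depends only on the distance from the violin to the cylinder's axis. The linear falloff, the treatment of the cylinder as infinitely long, and the function names are assumptions for this example; only the 2-foot width comes from the description above.

```python
import math

def distance_to_axis(p, a, d):
    """Distance from point p to the line through point a with direction d."""
    ap = [p[i] - a[i] for i in range(3)]
    d_len = math.sqrt(sum(c * c for c in d))
    u = [c / d_len for c in d]                     # unit vector along the axis
    t = sum(ap[i] * u[i] for i in range(3))        # projection of ap onto the axis
    perp = [ap[i] - t * u[i] for i in range(3)]    # component perpendicular to it
    return math.sqrt(sum(c * c for c in perp))

def mic_gain(p, a, d, radius=1.0):
    """Gain of 1 on the axis, falling linearly to 0 at the cylinder wall.
    radius is in feet (2 feet wide = 1 foot radius); the falloff is assumed."""
    r = distance_to_axis(p, a, d)
    return max(0.0, 1.0 - r / radius)

# A vertical cylinder through the origin; the violin is half a foot off axis.
print(mic_gain((0.5, 0.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```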
As such a dynamic environment is overwhelming at first, it has a
"complexity slider" which gradually introduces features as it is turned up.
When the violinist plays at the tip of the bow, the echoing sounds suddenly
drop out. The complexity slider also controls which sound goes into the
microphones, the natural sound of the violin or a synthesizer controlled
by the violin.
These examples proceed from very simple extensions of standard
violin technique through more advanced demonstrations of what is
possible when you put software between the performer's physical input and
the instrument's acoustic output.
Three technical terms:
- Azimuth is which way something is
pointing in the horizontal plane ("compass heading").
- Elevation is how far up or down something is pointing.
- Roll is how far something has rotated along its axis
(think of an airplane performing a roll).
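For readers who prefer formulas, azimuth and elevation can be computed from a pointing direction as sketched below. The axis convention (x to the right, y forward, z up) is an assumption for this example and need not match what the tracker reports.

```python
import math

def azimuth_elevation(x, y, z):
    """Return (azimuth, elevation) in degrees for a pointing direction.
    Assumed axes: x = right, y = forward, z = up."""
    azimuth = math.degrees(math.atan2(x, y))                    # 0 = forward, +90 = right
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))   # 0 = level, +90 = straight up
    return azimuth, elevation

print(azimuth_elevation(1.0, 1.0, 0.0))   # pointing forward and to the right, level
print(azimuth_elevation(0.0, 1.0, 1.0))   # pointing forward and upward
```

Roll cannot be recovered from a pointing direction alone, since rotating an object about its own axis leaves that direction unchanged.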
- The violin's position controls the position of its sound source.
As the violinist walks left and right, the apparent
position of the sound source (the actual output signal from the
violin's bridge) moves correspondingly between two speakers.
This could of course be elaborated in a multi-speaker system, where the
performer's spatial gestures could "fling" sounds around the hall
by pointing the instrument in various directions instead of by
walking around the stage.
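A minimal sketch of the two-speaker case follows. It uses equal-power panning, a common pan law chosen here for illustration; which pan law the demonstration actually uses is not stated. The stage coordinates are hypothetical.

```python
import math

def equal_power_pan(x, x_left=-1.0, x_right=1.0):
    """Return (left_gain, right_gain) for a performer at stage position x.
    x_left and x_right are the positions of the two speakers (assumed units)."""
    t = (x - x_left) / (x_right - x_left)
    t = max(0.0, min(1.0, t))        # stay between the speakers
    angle = t * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

for x in (-1.0, 0.0, 1.0):
    left, right = equal_power_pan(x)
    print(f"x = {x:+.1f}: left {left:.3f}, right {right:.3f}")
```

The squared gains always sum to one, so the sound keeps roughly constant loudness as the violinist walks across the stage.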
- The relative position of bow and violin acts as a Theremin.
Pitch is determined by how far the bow is from the violin.
Loudness is determined by the elevation of the bow.
Timbre (filtering of a band-limited square wave) is determined by
the azimuth of the violin.
The bow does not need to touch the strings (just to emphasize the
difference between this and an acoustic violin).
The violinist need not even hold the bow, since the position sensor is
attached to the bow-hand, not the bow itself.
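The three mappings just described can be sketched as one function from tracker data to synthesis parameters. Every numeric range below (distances, angles, the 110 Hz base pitch, the filter cutoff limits) is a hypothetical value picked for the example.

```python
def lerp_clamped(v, in_lo, in_hi, out_lo, out_hi):
    """Linearly map v from [in_lo, in_hi] to [out_lo, out_hi], clamping at the ends."""
    t = (v - in_lo) / (in_hi - in_lo)
    t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)

def theremin_params(distance_m, bow_elevation_deg, violin_azimuth_deg):
    """Return (frequency in Hz, amplitude 0..1, filter cutoff in Hz)."""
    # Exponential pitch map: equal steps in distance give equal musical intervals.
    octaves = lerp_clamped(distance_m, 0.1, 1.0, 0.0, 3.0)
    freq_hz = 110.0 * 2.0 ** octaves
    amplitude = lerp_clamped(bow_elevation_deg, -45.0, 45.0, 0.0, 1.0)
    cutoff_hz = lerp_clamped(violin_azimuth_deg, -90.0, 90.0, 200.0, 8000.0)
    return freq_hz, amplitude, cutoff_hz

print(theremin_params(0.55, 0.0, 0.0))
```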
- An FM "lead synth" sound tracking the violin's pitch and amplitude.
Brightness (amount of carrier feedback) is driven by the loudness of the signal.
Upstage/downstage position acts as an "octave switch" to allow playing in
The violin's azimuth controls how much signal goes into several reverbs.
Combining position with azimuth lets you wail out a few low pedal notes
which stick around for half a minute while you noodle around on top of them.
- Rescaled pitch.
The same FM synth sound as before, but instead of an octave switch and
reverb, pitch is remapped. (The octave switch and reverb could have been
left in, at the risk of confusing both performer and listener at this "demo"
The open strings sound as if they are a major ninth apart (two fifths instead of
one). The open D string remains as is; the A above it sounds an E, and the
E sounds the F# in the ledger lines above the treble clef. G and C strings
similarly go down to C in the bass clef and B flat below the bass clef,
Fingering quarter-tones produces sounding semitones.
So in first position, the violinist can reach from low B flat to the G sharp
three and a half octaves above middle C, a range of about six octaves.
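The remapping amounts to doubling every interval measured from the open D string. A short sketch in MIDI note numbers, assuming the usual convention that middle C is C4 = note 60 (so the open D is note 62); the function names are made up for this example.

```python
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]

def rescale_note(note, pivot=62, factor=2.0):
    """Stretch the interval between note and the pivot (open D) by factor."""
    return pivot + factor * (note - pivot)

def note_name(note):
    """Name of the nearest equal-tempered note, e.g. 60 -> 'C4'."""
    n = int(round(note))
    return NAMES[n % 12] + str(n // 12 - 1)

open_strings = {"C": 48, "G": 55, "D": 62, "A": 69, "E": 76}
for string, played in open_strings.items():
    print(f"open {string} ({note_name(played)}) sounds {note_name(rescale_note(played))}")

# Highest first-position note on the E string (B5) and the resulting range:
top = rescale_note(83)
low = rescale_note(48)
print(note_name(low), "to", note_name(top), f"= {(top - low) / 12:.1f} octaves")
```

In frequency terms the same map is f_out = f_D * (f_in / f_D)^2. The open strings come out as Bb1, C3, D4, E5 and F#6, a fingered quarter-tone (half a note number) becomes a full semitone, and the first-position range works out to Bb1 through G#7, a little under six octaves.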
- A physical model of a clarinet.
Pitch and amplitude generally track the violin's signal, but
the clarinet's behavior gradually changes as the violinist walks around.
Stage right, vibrato and portamento ("glissando") are possible.
Stage left, pitch is rounded to the nearest well-tempered semitone.
Downstage, dynamics are limited (p to f).
Upstage, dynamics are exaggerated (ppppp to ff).
So you can use just as much control as you need for different passages
in a piece, even varying control within a single phrase or long note.
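A sketch of how stage position might blend between "easy" and "fine" control follows. The coordinate conventions (x from 0 at stage left to 1 at stage right, y from 0 downstage to 1 upstage) and the power-law dynamics curve are assumptions for this example, not a description of the clarinet model's internals.

```python
def shape_pitch(note, x):
    """Blend between quantized and continuous pitch (in MIDI note numbers).
    x = 0 at stage left (rounded to the nearest semitone),
    x = 1 at stage right (vibrato and portamento pass through)."""
    quantized = round(note)
    return quantized + x * (note - quantized)

def shape_dynamics(level, y):
    """Reshape loudness (0..1). y = 0 downstage narrows the range,
    y = 1 upstage exaggerates it. The exponent curve is hypothetical."""
    exponent = 0.5 + 1.5 * y     # below 1 lifts soft playing, above 1 deepens it
    return level ** exponent

slightly_sharp = 60.3
for x in (0.0, 0.5, 1.0):
    print(f"x = {x}: pitch {shape_pitch(slightly_sharp, x):.2f}")
for y in (0.0, 1.0):
    print(f"y = {y}: soft input 0.25 -> {shape_dynamics(0.25, y):.4f}")
```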
The violin's signal controls several instruments. As the violinist
walks around, these instruments get louder and softer.
In this demonstration, a bass clarinet, clarinet up one octave,
flute, and flute up two octaves play in the four corners of a square.
As the violinist moves to each corner, that instrument becomes more
The math behind this is a technique called high-dimensional interpolation.
It generalizes to controlling more than four instruments, and to
using more than two parameters (here, latitude and longitude) as control data.
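For the four-corner case, the simplest stand-in is bilinear weighting, sketched below. This is not the high-dimensional interpolation technique itself, only its most basic two-parameter relative; which instrument sits in which corner is also an assumption.

```python
def corner_gains(x, y):
    """Gains for four instruments at the corners of a unit square.
    x and y are the performer's position, each scaled to 0..1."""
    x = max(0.0, min(1.0, x))
    y = max(0.0, min(1.0, y))
    return {
        "bass clarinet":      (1 - x) * (1 - y),
        "clarinet +1 octave": x * (1 - y),
        "flute":              (1 - x) * y,
        "flute +2 octaves":   x * y,
    }

for position in ((0.0, 0.0), (0.5, 0.5), (0.9, 0.8)):
    gains = corner_gains(*position)
    print(position, {name: round(g, 2) for name, g in gains.items()})
```

The four gains always sum to one; standing in a corner gives that instrument alone, and the center gives an equal mix.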
- "Hammond organ" additive synthesis.
Instead of controlling four amplitudes, now the violinist's position
controls the amplitudes of the first ten harmonics of an additive-synthesis
sound. As the harmonics fade in and out, it sounds like an old "drawbar"
This "space" of sound is again controlled by high-dimensional interpolation,
in this case using two parameters to control ten. A dozen different timbres
are placed at different points on the floor, and as the violinist moves
around, the resulting timbre is a smooth average of the timbre-points near it.
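One simple way to get a "smooth average of nearby points" is inverse-distance weighting, sketched below. It is used here as an illustrative substitute and is not claimed to be the interpolation method used in the demonstration; the floor coordinates and the three-harmonic timbres are made up to keep the example short.

```python
def blend_timbres(pos, points, power=2.0):
    """Inverse-distance weighted average of timbres.
    points is a list of ((x, y), [harmonic amplitudes]) pairs."""
    weights = []
    for (px, py), amps in points:
        d2 = (pos[0] - px) ** 2 + (pos[1] - py) ** 2
        if d2 == 0.0:
            return list(amps)            # standing exactly on a timbre point
        weights.append(d2 ** (-power / 2.0))
    total = sum(weights)
    n = len(points[0][1])
    return [
        sum(w * amps[k] for w, (_, amps) in zip(weights, points)) / total
        for k in range(n)
    ]

timbre_points = [
    ((0.0, 0.0), [1.0, 0.0, 0.0]),
    ((2.0, 0.0), [0.0, 1.0, 0.0]),
    ((1.0, 2.0), [0.0, 0.0, 1.0]),
]
print(blend_timbres((1.0, 0.0), timbre_points))
```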
The actual timbres were chosen automatically, not manually. I let a "timbre
rover" explore a few thousand timbres of the additive synthesis instrument,
with instructions to choose a dozen timbres that differed enough from each
other to adequately represent the whole space. The definition of "difference"
comes from a psychoacoustic model of the human hearing system (critical
bands, Fletcher-Munson curves, etc.).
(Overly technical description of the timbre rover for the curious:
it takes the Fletcher-Munson-corrected amplitude-versus-frequency plot of a sound, divides it into critical bands, and notes the loudness present in each band. Each band is divided into a general noise floor and zero or more spectral peaks. The noise floor is defined as the median loudness of all frequencies in the band. Spectral peaks are defined as local maxima exceeding the noise floor by more than one average deviation; each peak is described by a frequency, loudness, and width. The distance between two sounds is then defined as the Minkowski metric (p = 5) of the critical band loudnesses.)
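The final step, the Minkowski metric with p = 5, is easy to show in code. The per-band loudness values below are invented for the example; computing real ones requires the psychoacoustic front end described above.

```python
def minkowski_distance(a, b, p=5.0):
    """Minkowski distance between two equal-length lists of band loudnesses."""
    if len(a) != len(b):
        raise ValueError("both sounds need the same number of bands")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# Hypothetical loudness per critical band for two sounds:
sound_a = [0.0, 0.0, 0.0]
sound_b = [1.0, 1.0, 2.0]

print(f"p = 5: {minkowski_distance(sound_a, sound_b):.3f}")
print(f"p = 2: {minkowski_distance(sound_a, sound_b, p=2.0):.3f}")
```

With p = 5 the result sits closer to the largest single-band difference than the Euclidean (p = 2) distance does, so the bands that differ most dominate the comparison.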
The positions of the timbres on the floor were also chosen automatically.
A genetic algorithm finds points in two-space whose distance
approximates (as closely as possible) the estimated difference
between their corresponding sounds.
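The quantity being minimized can be sketched as a "stress" function, paired here with a much cruder search than a genetic algorithm: mutate one point at random and keep the change only if the stress drops. The search loop, its parameters, and the three-sound target matrix are all assumptions for this example.

```python
import math
import random

def layout_stress(points, target):
    """Sum of squared errors between floor distances and target sound distances."""
    err = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            err += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return err

def improve_layout(points, target, steps=2000, sigma=0.1, seed=0):
    """Random-mutation hill climbing (a stand-in for the genetic algorithm)."""
    rng = random.Random(seed)
    best = [tuple(p) for p in points]
    best_err = layout_stress(best, target)
    for _ in range(steps):
        trial = list(best)
        k = rng.randrange(len(trial))
        x, y = trial[k]
        trial[k] = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        err = layout_stress(trial, target)
        if err < best_err:               # keep only improvements
            best, best_err = trial, err
    return best, best_err

# Three sounds that are all equally different from one another:
target = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
start = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
start_err = layout_stress(start, target)
layout, final_err = improve_layout(start, target)
print(f"stress before: {start_err:.3f}, after: {final_err:.6f}")
```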
The motivation behind all this technology is to create a space of timbres
which is meaningful in terms of how we hear, instead of merely what
numbers are going into the synthesis algorithm, and to do so
without tedious manual crafting of the space. (Manual polishing after
the fact is possible, though for demonstration purposes I have not
altered the raw results here.)
- The singing violin ("Real Synthesis").
This is a more involved example of controlling a synthesis algorithm,
the CHANT vocal model from IRCAM.
The timbre rover explored nine
parameters: the amplitude, center frequency and bandwidth of each of
three formants. It was told to find 15 representative timbres.
Otherwise this is identical to the previous example.
The gestures possible with moving a violin or other (small) orchestral
instrument around while playing it have certain constraints.
- The mass of the instrument prevents the very quick "leaps" which the
unencumbered hand can do (as is the case with strings and
- The lack of tactile feedback about position (and lack of fine visual
feedback, as on a keyboard or fingerboard) makes fine adjustments
difficult, as anyone who's tried to play a Theremin will confirm.
- The range of motion is limited. Some instruments cannot be played
while standing and walking. Those which can must still deal with the
tether cable; this becomes impractical with a twenty-foot cable menacing
the performer at every step in the later stages of a performance.
Given these constraints, 4-octave arpeggios are unlikely.
Gestures which take advantage of the full range of motion
will tend to be slow (no shorter than half a second)
and of somewhat coarse resolution. This suggests letting the
instrument "behave" in different ways, corresponding to different
positions. Changes of behavior can be simply acoustic,
as with horns playing "bells up", or functional,
as with the clarinet example above. We can generalize the latter case
to one of trading off fine control for ease of playing along various
dimensions (accuracy of pitch, dynamics, (rhythm?!), timbre).
This allows the performer to "adjust" the instrument
to whatever level of control is required for a particular passage
and for their particular skill level.
Gestures which do not use the full range of motion
are also possible: small taps, bumps, and bobbles in various directions.
These can be quite fast and synchronized precisely with fast gestures
of conventional playing technique.
An elaborate "alphabet" of gestures could be used, by analogy with
computer handwriting recognition, to simulate a large collection of
buttons, sliders, etc.; this would obviously demand more rehearsal
from the performer. But it would be far more practical than an
equivalent array of physical buttons and sliders.
Continuous parameters are more naturally controlled than
discrete parameters, because physical space is itself continuous.
The performer can "aim" at a value by acoustic feedback alone.
If you want to control discrete parameters, it may be worthwhile to
make discrete the performer's perception of physical space
by use of nonacoustic feedback markers
(visual like the stencilled keyboard of an Ondes Martenot,
or tactile via buzzers taped to the skin).
Tracking the motion of an instrument takes advantage of presently unused
"output bandwidth" from the human organism: the muscles of the legs and trunk.
In this sense it is compatible with most existing playing techniques.
This is obviously the case for handheld instruments; for nonhandheld
instruments such as piano or percussion, the position of the
actual performer can be measured, torso or hands as appropriate
(performer as dancer).
4. Further Reading
- R. Bargar, I. Choi, S. Das, C. Goudeseune, "Model-Based Interactive Sound for an Immersive Virtual Environment." Proc. 1994 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 471-474.
- I. Bowler, P. Manning, A. Purvis, N. Bailey. "On Mapping N Articulation Onto M Synthesizer-Control Parameters." Proc. 1990 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 181-184.
- M. Bromwich, "The Metabone: An interactive sensory control mechanism for virtuoso trombone." Proc. 1997 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 473-475.
- W. Buxton, "There's More to Interaction Than Meets the Eye: Some Issues in Manual Input", in User Centered System Design: New Perspectives on Human-Computer Interaction. Norman and Draper, eds. 1986. Hillsdale, NJ: Erlbaum, pp. 319-337.
- B. Cariou, "The aXi0 midi Controller." Proc. 1994 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 163-166.
- I. Choi, R. Bargar, C. Goudeseune, "A Manifold Interface for a High Dimensional Control Space." Proc. 1995 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 385-392.
- P. Cook. "A Meta-Wind-Instrument Physical Model, and a Meta-Controller for Real Time Performance Control." Proc. 1992 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 273-276.
- B. Feiten & S. Gunzel, "Distance Measure for the Organization of Sounds". Acustica 78(3): 181-184, April 1993.
- G. Garnett, C. Goudeseune. "Performance Factors in Control of High-Dimensional Spaces". Proc. 1999 Int'l Computer Music Conf. San Francisco: Computer Music Assn.
- L. Haken, E. Tellman, P. Wolfe. "An Indiscrete Music Keyboard." Computer Music Journal 22(1), pp. 30-48, Spring 1998.
- T. Kohonen. Self-Organizing Maps, 2nd ed., Springer, Berlin, 1997.
- C. Langmead, A Theoretical Model of Timbre Perception Based on Morphological Representations of Time-Varying Spectra. M.A. Thesis, Dartmouth College, Hanover, New Hampshire, 1995.
- R. Moog, "Position and force sensors and their application to keyboards and related control devices." Proc. Audio Eng. Soc. 1987. New York: A.E.S., pp. 173-181.
- J. Paradiso. "Electronic Music Interfaces", http://www.spectrum.ieee.org/select/1297/muse.html#s1. (Elaboration of "Electronic Music Interfaces", IEEE Spectrum 34(12), 1997.)
- M. Puckette, T. Apel, D. Zicarelli. "Real-time audio analysis tools for Pd and MSP." Proc. 1998 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 109-112.
- D. Trueman, The Trueman-Cook R-Bow. http://dtrueman.mycpanel.princeton.edu/rbow/.