A Violin Controller for Real-Time Audio Synthesis

Camille Goudeseune
1999-04-20
Revised 2000-08-23. Links revised 2004-09-14, 2014-08-21, 2019-07-02, 2023-11-02. Videos transcoded 2020-06-19.

(2004 Aug. 28)

(This document is at http://camille-g.com/eviolin.html,
formerly http://zx81.isl.uiuc.edu/camilleg/eviolin.html,
formerly http://zx81.ncsa.uiuc.edu/camilleg/eviolin.html.)

Videos:
Study #1, 2'30
Study #2, 2'30

1. System Overview

This "violin controller" is used not as a sound-producing device, but as a gesture-input device. It is no more a violin than a Yamaha synthesizer is a piano: both take standard gestures as input, both produce nonstandard sounds as output. (With this disclaimer out of the way, I'll still continue to call it a violin in what follows.)

The violin's bridge produces a signal with varying pitch and loudness (depending on all the usual violinist's bowing and fingering tricks); a computer measures this changing pitch and loudness and uses these changing numbers to affect an (entirely different) sound being simultaneously computed and played.

Another gesture input is spatial: the position and orientation of violin and bow are measured on the fly, producing another stream of changing numbers which can modify the sound.

Particular systems used:

Jensen 5-string electric violin (see below)
Off-the-shelf Linux PC
Virtual Sound Server software
Miller Puckette's "fiddle~" real-time pitch tracker (in ICMC '98 Proceedings)
Ascension Technologies SpacePad motion tracker; also, NCSA's CAVE virtual environment.

1.1 The Jensen violin

Quick facts about my model (quite a variety is available from just this manufacturer, never mind others like Zeta and Jordan):

Five strings (violin + C viola)
No body, so no acoustic feedback problems
Kun shoulder rest
Machine tuners near chinrest, not at scroll
Look at it from all angles in this QuickTime VR image.
Costs about US$2000.

General playing observations

It weighs a little more than an acoustic violin: the neck is massive, and the fingerboard doesn't float above the body of the instrument.
The Barbera pickup bridge is not as round as a regular bridge, by default; combined with the extra string, this makes single-string bowing a little harder (open strings ring if you're not careful) and triple-stopping much easier. Since there is no body to impede the bow, you may want a rounder bridge: fortunately its shape is adjustable.
With monophonic pitch tracking, ringing open strings are less of a problem; but if they're loud enough they will still confuse the pitch tracking. Weaving a folded dollar bill through the strings above the bridge, and creatively applying foam rubber and elastic bands to the strings, reduces open-string ringing significantly. (The bridge can't accommodate conventional mutes.)
The L.R. Baggs preamp is expensive ($100) but definitely worth it.
The spacing between strings feels slightly wider, which affects the shape of the hand when double- and triple-stopping.
With the machine tuners, the Dominant strings hold their tune quite well, sometimes for several days.
When an acoustic slips off the shoulder a bit, the palm of the left hand can push the instrument body back up. You can't do this with a bodyless electric, though. Eric Jensen has three solutions for this: a rest for the left hand, a custom shoulder rest with about three times the contact area of a Kun rest, and a method of securing the violin behind your back with a cord.

1.2 Virtual Sound Server (VSS)

VSS is both the sound synthesis engine and the glue that holds the whole system together.

For this project I run VSS on a generic Intel Linux box: an SGI is still too expensive to dedicate to this single project, and MS Windows has unacceptable latency (130 msec) for real-time audio. Linux (Red Hat 5.2, with ALSA sound drivers) is fast, cheap, and reliable.

What VSS is and does... that's a long story. Try the tutorial or the reference manual.

1.3 The "fiddle" pitch tracker

Miller Puckette's published code does polyphonic tracking of pitch and amplitude. I've seen the algorithm guess pitches incorrectly for enough cases of double- and triple-stopping that for now I've left it as monophonic. In this project it runs as part of VSS -- I extricated it from Puckette's original Pure Data implementation.

I also made a few changes to its behavior:

Floating-point output of frequency and loudness replaces quantized MIDI values.
The vibrato model is extended to handle glissandi of arbitrary size and nearly arbitrary speed.
Note endings are recognized by reduction of amplitude as well as changing of pitch.
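
The first of these changes can be sketched in a few lines. A quantized MIDI note number throws away exactly the pitch detail that vibrato and glissandi carry, while a floating-point value keeps it. This is only an illustration: the function names are hypothetical and are not taken from fiddle~ or VSS, and the A4 = 440 Hz = note 69 reference is the usual MIDI convention rather than anything stated above.

```python
import math

A4_HZ = 440.0      # conventional tuning reference (an assumption here)
A4_MIDI = 69.0     # MIDI note number of A4

def hz_to_midi(freq_hz):
    """Return a fractional MIDI note number for a frequency in Hz."""
    if freq_hz <= 0.0:
        raise ValueError("frequency must be positive")
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)

def hz_to_quantized_midi(freq_hz):
    """Return the nearest integer MIDI note, discarding the pitch detail."""
    return int(round(hz_to_midi(freq_hz)))

# An A4 played 30 cents sharp: the float keeps the deviation, the integer loses it.
sharp_a = A4_HZ * 2.0 ** (0.30 / 12.0)
print(hz_to_midi(sharp_a))            # close to 69.3
print(hz_to_quantized_midi(sharp_a))  # 69
```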

Tracking latency is around 1/30 second.

1.4 The SpacePad motion tracker

Full motion capture requires fancy hardware like this. This system matches the demands placed on it by the violin: low weight, low restriction of mobility, low tracking latency, high accuracy, small size, and low cost ($1400). For this you get an ISA card, two "receivers" whose position is measured, manuals and software.

Fancier systems from Ascension, Polhemus and InterSense cost three to thirty times as much and require much coddling. (Prices in this industry are not falling much; the SpacePad is 4-year-old technology, in fact.)

Simpler systems have been designed around mercury switches and force-sensing resistors (particularly for dancer interfaces) and radio field sensing (Mathews Radio Drum, MIT projects like the Sensor Chair). Most recently, micromachined accelerometers show promise as very inexpensive motion sensors. Dan Trueman has experimented with them in his R-Bow. Their main disadvantage is that they cannot distinguish between acceleration and gravity. Dan says, "If you are bent way over and bowing real fast, who knows what those things are saying...". If you try to work around it and do the math, you'll find that you need three separate triaxial accelerometers to track position accurately, which is actually more expensive than the SpacePad and much more work.

The SpacePad tracks the position and orientation of two sensors (where they are and where they're pointing). It does so by generating a time-varying magnetic field from an antenna you're instructed to make (my prototype portable model is made from 30 feet of speaker wire, a garbage bag, some dowels and string, and looks like a stunt kite). The field sensors are little half-inch boxes on the end of 10-foot cables. One is taped to the bottom of the violin; the other is sewn into a fingerless glove worn on the violinist's bowing hand.

Ideally, the positions and orientations are measured with 13-bit resolution. This falls somewhat because the size of the measured performing space is not adjustable (hey, it's a "budget" system), and because large metallic objects nearby (furniture, rebar in concrete floors, music stands) distort the magnetic field. Accuracy remains well above the 7 bits provided by MIDI, though.
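
For a sense of scale, the step size implied by a given bit depth can be computed directly. This is a rough sketch: the 2-metre span is an assumed figure chosen for illustration, not a SpacePad specification, and real accuracy is lower for the reasons just given.

```python
def step_size(span, bits):
    """Smallest distinguishable step when `span` is divided into 2**bits values."""
    return span / float(2 ** bits)

SPAN_M = 2.0  # assumed width of the tracked space, in metres

ideal = step_size(SPAN_M, 13)   # the tracker's ideal 13-bit resolution
midi = step_size(SPAN_M, 7)     # a 7-bit MIDI controller value
print(ideal * 1000.0)           # millimetres per step at 13 bits
print(midi * 1000.0)            # millimetres per step at 7 bits
print(midi / ideal)             # 64.0: six extra bits give 2**6 times finer steps
```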

Tracking latency is 1/60 second, half that of video-based systems. If we track only one sensor, the latency drops to 1/120 second.

The drivers for the ISA card are written for MS-DOS 3 using an old Borland compiler. Ascension graciously released the source code to me, and I ported the drivers to Linux using gcc.

Visual feedback is given to the performer with a single-pixel color display, mounted near the bridge so it is always at the edge of the performer's field of view. The brightness and full-spectrum hue can be updated with a latency of 3 to 7 msec, to warn the performer that they are approaching a boundary in the space (or, in a simpler demonstration, to tell them if they're playing sharp or flat!). Its location on the instrument itself, its simplicity (no dials or numbers to read -- even focusing isn't necessary), and its fast response make it intuitively feel like part of the instrument. The display is implemented with an array of red, green, and blue LEDs mounted in a light diffuser and driven by an 8-bit latch from the Linux PC's parallel port.

1.5 The CAVE motion tracker / audio-visual display

The SpacePad is portable to the concert stage, but I'm also investigating what a full virtual environment offers. I've implemented an environment where the violinist can move from place to place to play "into" several microphones. The microphones are simulated, of course; each mic's pickup pattern is actually a long cylinder, 2 feet wide, which very slowly moves around the CAVE. Each mic feeds a filter/echo combination.

As such a dynamic environment is overwhelming at first, it has a "complexity slider" which gradually introduces features as it is turned up. When the violinist plays at the tip of the bow, the echoing sounds suddenly drop out. The complexity slider also controls which sound goes into the microphones, the natural sound of the violin or a synthesizer controlled by the violin.

2. Demonstrations

Three technical terms:

Azimuth is which way something is pointing in the horizontal plane ("compass heading").
Elevation is how far up or down something is pointing.
Roll is how far something has rotated along its axis (think of an airplane performing a roll).

These examples proceed from very simple extensions of standard violin technique through more advanced demonstrations of what is possible when you put software between the performer's physical input and the instrument's acoustic output.

The violin's position controls the position of its sound source.
As the violinist walks left and right, the apparent position of the sound source (the actual output signal from the violin's bridge) moves correspondingly between two speakers.
This could of course be elaborated in a multi-speaker system, where the performer's spatial gestures could "fling" sounds around the hall by pointing the instrument in various directions instead of by walking around the stage.
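
A minimal sketch of the two-speaker mapping follows. The text above does not say which panning law is used, so the equal-power (sine/cosine) law here is an assumption, as are the function name and the stage coordinates.

```python
import math

def equal_power_pan(x, x_left, x_right):
    """Return (left_gain, right_gain) for a performer at lateral position x.

    x_left and x_right are the positions of the two speakers. Positions
    beyond either speaker are clamped to that speaker.
    """
    t = (x - x_left) / (x_right - x_left)   # 0 at left speaker, 1 at right
    t = min(1.0, max(0.0, t))
    angle = t * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

# Performer standing midway between speakers 4 metres apart (assumed layout).
left, right = equal_power_pan(2.0, 0.0, 4.0)
print(left, right)               # both close to 0.7071
print(left ** 2 + right ** 2)    # total power stays close to 1.0
```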
The relative position of bow and violin acts as a Theremin.
Pitch is determined by how far the bow is from the violin.
Loudness is determined by the elevation of the bow.
Timbre (filtering of a band-limited square wave) is determined by the azimuth of the violin.
The bow does not need to touch the strings (just to emphasize the difference between this and an acoustic violin). The violinist need not even hold the bow, since the position sensor is attached to the bow-hand, not the bow itself.
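
The three Theremin mappings above can be sketched as one function from tracker readings to synthesis parameters. All ranges and names here are assumptions chosen for illustration, as is the choice that a larger bow-to-violin distance gives a higher pitch; the text does not specify them.

```python
import math

MAX_DISTANCE_M = 1.0             # assumed largest useful bow-to-violin distance
MIN_HZ, MAX_HZ = 110.0, 1760.0   # assumed pitch range (four octaves)
MIN_CUTOFF_HZ, MAX_CUTOFF_HZ = 200.0, 8000.0   # assumed filter range

def clamp01(t):
    """Limit t to the range [0, 1]."""
    return min(1.0, max(0.0, t))

def exp_map(t, lo, hi):
    """Map t in [0, 1] to [lo, hi] on an exponential (musically even) scale."""
    return lo * (hi / lo) ** clamp01(t)

def theremin_map(bow_pos, violin_pos, bow_elevation_deg, violin_azimuth_deg):
    """Return (frequency_hz, amplitude, cutoff_hz) from tracker readings.

    Positions are (x, y, z) in metres. Elevation runs from -90 (pointing
    down) to +90 (pointing up); azimuth runs from -180 to +180 degrees.
    """
    distance = math.dist(bow_pos, violin_pos)
    freq = exp_map(distance / MAX_DISTANCE_M, MIN_HZ, MAX_HZ)
    amp = clamp01((bow_elevation_deg + 90.0) / 180.0)
    cutoff = exp_map((violin_azimuth_deg + 180.0) / 360.0,
                     MIN_CUTOFF_HZ, MAX_CUTOFF_HZ)
    return freq, amp, cutoff

# Bow hand half a metre from the violin, bow level, violin facing forward.
print(theremin_map((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), 0.0, 0.0))
```

The exponential scale is used for pitch and cutoff so that equal movements correspond to equal musical intervals rather than equal numbers of hertz.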
An FM "lead synth" sound tracking the violin's pitch and amplitude.
Brightness (amount of carrier feedback) is driven by the loudness of the signal.
Upstage/downstage position acts as an "octave switch" to allow playing in different registers.
The violin's azimuth controls how much signal goes into several reverbs.
Combining position with azimuth lets you wail out a few low pedal notes which stick around for half a minute while you noodle around on top of them.
Rescaled pitch.
The same FM synth sound as before, but instead of an octave switch and reverb, pitch is remapped. (The octave switch and reverb could have been left in, at the risk of confusing both performer and listener at this "demo" stage.)
The open strings sound as if they are a major ninth apart (two fifths instead of one). The open D string remains as is; the A above it sounds an E, and the E sounds the F# in the ledger lines above the treble clef. G and C strings similarly go down to C in the bass clef and B flat below the bass clef, respectively.
Fingering quarter-tones produces sounding semitones.
So in first position, the violinist can reach from low B flat to the G sharp three and a half octaves above middle C, a range of about six octaves.
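
The remapping described above is consistent with doubling every interval measured from the open D string. The sketch below is inferred from that description rather than taken from the actual implementation; it works in (fractional) MIDI note numbers, with D4 = 62 by the usual convention, and the function name is hypothetical.

```python
OPEN_D = 62.0   # MIDI note number of the open D string (D4)

def rescale_pitch(played):
    """Double the interval between the played pitch and the open D string."""
    return OPEN_D + 2.0 * (played - OPEN_D)

# Open strings C3, G3, D4, A4, E5 as MIDI note numbers.
for name, note in [("C", 48), ("G", 55), ("D", 62), ("A", 69), ("E", 76)]:
    print(name, rescale_pitch(note))
# C 34.0 (B flat 1), G 48.0 (C3), D 62.0 (D4), A 76.0 (E5), E 90.0 (F sharp 6)

print(rescale_pitch(69.5) - rescale_pitch(69.0))   # 1.0: a quarter-tone becomes a semitone
print(rescale_pitch(83))   # 104.0: B5, top of first position, sounds G sharp 7
```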
A physical model of a clarinet.
Pitch and amplitude generally track the violin's signal, but the clarinet's behavior gradually changes as the violinist walks around.
Stage right, vibrato and portamento ("glissando") are possible.
Stage left, pitch is rounded to the nearest equal-tempered semitone.
Downstage, dynamics are limited (p to f).
Upstage, dynamics are exaggerated (ppppp to ff).
So you can use just as much control as you need for different passages in a piece, even varying control within a single phrase or long note.
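
One way to picture this gradual trade-off is as two crossfades driven by stage position. The sketch below is an assumption about how such a mapping could work, not the actual clarinet patch: the function names, the normalized stage coordinates, and the decibel ranges standing in for "p to f" and "ppppp to ff" are all hypothetical.

```python
def blend_pitch(midi_pitch, leftness):
    """Crossfade between the tracked pitch and the nearest semitone.

    leftness is 0.0 at stage right (vibrato and portamento pass through)
    and 1.0 at stage left (pitch snaps to the nearest semitone).
    """
    return midi_pitch + leftness * (round(midi_pitch) - midi_pitch)

NARROW_DB = (-30.0, -6.0)   # assumed stand-in for "p to f" (downstage)
WIDE_DB = (-60.0, 0.0)      # assumed stand-in for "ppppp to ff" (upstage)

def scale_dynamics(level, upstageness):
    """Map a tracked loudness in [0, 1] to an output level in decibels.

    upstageness is 0.0 downstage (narrow range) and 1.0 upstage (wide range).
    """
    lo = NARROW_DB[0] + upstageness * (WIDE_DB[0] - NARROW_DB[0])
    hi = NARROW_DB[1] + upstageness * (WIDE_DB[1] - NARROW_DB[1])
    return lo + level * (hi - lo)

print(blend_pitch(69.3, 0.0))    # 69.3: stage right, pitch untouched
print(blend_pitch(69.3, 0.5))    # halfway across, the deviation is halved
print(scale_dynamics(0.0, 0.0))  # -30.0: softest note downstage
print(scale_dynamics(0.0, 1.0))  # -60.0: softest note upstage
```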
Orchestration.
The violin's signal controls several instruments. As the violinist walks around, these instruments get louder and softer.
In this demonstration, a bass clarinet, clarinet up one octave, flute, and flute up two octaves play in the four corners of a square. As the violinist moves to each corner, that instrument becomes more prominent.
The math behind this is a technique called high-dimensional interpolation. It generalizes to controlling more than four instruments, and to using more than two parameters (here, latitude and longitude) as control data.
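
As a rough illustration of position-driven mixing, the sketch below uses inverse-distance weighting. This is a simpler stand-in, not the high-dimensional interpolation technique named above; it is shown only because it also extends to more than four instruments placed at arbitrary points. The function name and floor coordinates are assumptions.

```python
import math

def mix_weights(position, anchors, power=2.0):
    """Return one gain per anchor, summing to 1, favouring nearby anchors."""
    distances = [math.dist(position, a) for a in anchors]
    for i, d in enumerate(distances):
        if d == 0.0:   # standing exactly on an anchor: only that one sounds
            return [1.0 if j == i else 0.0 for j in range(len(anchors))]
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [r / total for r in raw]

# Four instruments in the corners of a unit square (assumed floor layout).
corners = {
    "bass clarinet": (0.0, 0.0),
    "clarinet +1 octave": (1.0, 0.0),
    "flute": (0.0, 1.0),
    "flute +2 octaves": (1.0, 1.0),
}
gains = mix_weights((0.5, 0.5), list(corners.values()))
print(dict(zip(corners, gains)))   # centre of the square: four equal gains
print(mix_weights((0.0, 0.0), list(corners.values())))   # [1.0, 0.0, 0.0, 0.0]
```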
"Hammond organ" additive synthesis.
Instead of controlling four amplitudes, now the violinist's position controls the amplitudes of the first ten harmonics of an additive-synthesis sound. As the harmonics fade in and out, it sounds like an old "drawbar" electric organ.
This "space" of sound is again controlled by high-dimensional interpolation, in this case using two parameters to control ten. A dozen different timbres are placed at different points on the floor, and as the violinist moves around, the resulting timbre is a smooth average of the timbre-points near it.
The actual timbres were chosen automatically, not manually. I let a "timbre rover" explore a few thousand timbres of the additive synthesis instrument, with instructions to choose a dozen timbres that differed enough from each other to adequately represent the whole space. The definition of "difference" comes from a psychoacoustic model of the human hearing system (critical bands, Fletcher-Munson curves, etc.).
(Overly technical description of the timbre rover for the curious: it takes the Fletcher-Munson corrected amplitude-versus-frequency plot of a sound, divides it into critical bands, and notes the loudness present in each band. Each band is divided into a general noise floor and zero or more spectral peaks. The noise floor is defined as the median loudness of all frequencies in the band. Spectral peaks are defined as local maxima exceeding the noise floor by more than one average deviation; each peak is described by a frequency, loudness, and width. The distance between two sounds is then defined as the Minkowski metric (p = 5) of the critical band loudnesses.)
The positions of the timbres on the floor were also chosen automatically. A genetic algorithm finds points in two-space whose distance approximates (as closely as possible) the estimated difference between their corresponding sounds.
The motivation behind all this technology is to create a space of timbres which is meaningful in terms of how we hear, instead of merely what numbers are going into the synthesis algorithm. Also, to do so without tedious manual crafting of the space. (Manual polishing after the fact is possible, though for demonstration purposes I have not altered the raw results here.)
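
The last step of the distance measure, and the kind of quantity a layout search has to minimize, can be sketched as follows. The Minkowski exponent p = 5 comes from the description above; the squared-error stress function is an assumed fitness measure for illustration (the fitness function of the actual genetic algorithm is not given here), and the loudness values are made up.

```python
import math

def minkowski_distance(a, b, p=5.0):
    """Minkowski distance between two lists of critical-band loudnesses."""
    if len(a) != len(b):
        raise ValueError("both sounds need the same number of bands")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def layout_stress(points, sounds, p=5.0):
    """Sum of squared mismatches between floor distances and timbre distances.

    points[i] is the (x, y) floor position assigned to sounds[i]. A search
    (genetic or otherwise) would try to make this value as small as possible.
    """
    stress = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            floor = math.dist(points[i], points[j])
            timbre = minkowski_distance(sounds[i], sounds[j], p)
            stress += (floor - timbre) ** 2
    return stress

# Three made-up sounds, each described by loudness in four bands.
sounds = [[60.0, 55.0, 40.0, 20.0],
          [60.0, 50.0, 45.0, 30.0],
          [30.0, 35.0, 50.0, 60.0]]
print(minkowski_distance(sounds[0], sounds[1]))
print(layout_stress([(0.0, 0.0), (10.0, 0.0), (40.0, 0.0)], sounds))
```

With an exponent as high as 5, the band with the largest loudness difference dominates the result, so two sounds count as far apart when they differ strongly in even one band.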
The singing violin ("Real Synthesis").
This is a more involved example of controlling a synthesis algorithm, the CHANT vocal model from IRCAM.
The timbre rover explored nine parameters: the amplitude, center frequency and bandwidth of each of three formants. It was told to find 15 representative timbres. Otherwise this is identical to the previous example.

3. Discussion
The gestures possible with moving a violin or other (small) orchestral instrument around while playing it have certain constraints.
- The mass of the instrument prevents the very quick "leaps" which the unencumbered hand can do (as is the case with strings and keyboard instruments).
- The lack of tactile feedback about position (and lack of fine visual feedback, as on a keyboard or fingerboard) makes fine adjustments difficult, as anyone who's tried to play a Theremin will confirm.
- The range of motion is limited. Some instruments cannot be played while standing and walking. Those which can, still must deal with the tether cable; this becomes impractical with a twenty-foot cable menacing the performer at every step in the later stages of a performance.
Given these constraints, 4-octave arpeggios are unlikely. Gestures which take advantage of the full range of motion will tend to be slow (no shorter than half a second) and of somewhat coarse resolution. This suggests letting the instrument "behave" in different ways, corresponding to different positions. Changes of behavior can be simply acoustic, as with horns playing "bells up", or functional, as with the clarinet example above. We can generalize the latter case to one of trading off fine control for ease of playing along various dimensions (accuracy of pitch, dynamics, (rhythm?!), timbre). This allows the performer to "adjust" the instrument to whatever level of control is required for a particular passage and for their particular skill level.
Gestures which do not use the full range of motion are also possible: small taps, bumps, and bobbles in various directions. These can be quite fast and synchronized precisely with fast gestures of conventional playing technique. An elaborate "alphabet" of gestures could be used, by analogy with computer handwriting recognition, to simulate a large collection of buttons, sliders, etc.; this would obviously demand more rehearsal from the performer. But it would be far more practical than an equivalent array of physical buttons and sliders.
Continuous parameters are more naturally controlled than discrete parameters, because physical space is itself continuous. The performer can "aim" at a value by acoustic feedback alone. If you want to control discrete parameters, it may be worthwhile to make the performer's perception of physical space discrete by means of nonacoustic feedback markers (visual, like the stencilled keyboard of an Ondes Martenot, or tactile, via buzzers taped to the skin).
Tracking the motion of an instrument takes advantage of presently unused "output bandwidth" from the human organism: the muscles of the legs and trunk. In this sense it is compatible with most existing playing techniques. This is obviously the case for handheld instruments; for nonhandheld instruments such as piano or percussion, the position of the actual performer can be measured, torso or hands as appropriate (performer as dancer).

4. Further Reading
- R. Bargar, I. Choi, S. Das, C. Goudeseune, "Model-Based Interactive Sound for an Immersive Virtual Environment." Proc. 1994 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 471-474.
- I. Bowler, P. Manning, A. Purvis, N. Bailey. "On Mapping N Articulation Onto M Synthesizer-Control Parameters." Proc. 1990 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 181-184.
- M. Bromwich, "The Metabone: An interactive sensory control mechanism for virtuoso trombone." Proc. 1997 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 473-475.
- W. Buxton, "There's More to Interaction Than Meets the Eye: Some Issues in Manual Input", in User Centered System Design: New Perspectives on Human-Computer Interaction. Norman and Draper, eds. 1986. Hillsdale, NJ: Erlbaum, pp. 319-337.
- B. Cariou, "The aXi0 midi Controller." Proc. 1994 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 163-166.
- I. Choi, R. Bargar, C. Goudeseune, "A Manifold Interface for a High Dimensional Control Space." Proc. 1995 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 385-392.
- P. Cook. "A Meta-Wind-Instrument Physical Model, and a Meta-Controller for Real Time Performance Control." Proc. 1992 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 273-276.
- B. Feiten & S. Gunzel, "Distance Measure for the Organization of Sounds". Acustica 78(3): 181-184, April 1993.
- G. Garnett, C. Goudeseune. "Performance Factors in Control of High-Dimensional Spaces". Proc. 1999 Int'l Computer Music Conf. San Francisco: Computer Music Assn.
- L. Haken, E. Tellman, P. Wolfe. "An Indiscrete Music Keyboard." Computer Music Journal 22(1), pp. 30-48, Spring 1998.
- T. Kohonen. Self-Organizing Maps, 2nd ed., Springer, Berlin, 1997.
- C. Langmead, A Theoretical Model of Timbre Perception Based on Morphological Representations of Time-Varying Spectra. M.A. Thesis, Dartmouth College, Hanover, New Hampshire, 1995.
- R. Moog, "Position and force sensors and their application to keyboards and related control devices." Proc. Audio Eng. Soc. 1987. New York: A.E.S., pp. 173-181.
- J. Paradiso. "Electronic Music Interfaces", http://www.spectrum.ieee.org/select/1297/muse.html#s1. (Elaboration of "Electronic Music Interfaces", IEEE Spectrum 34(12), 1997.)
- M. Puckette, T. Apel, D. Zicarelli. "Real-time audio analysis tools for Pd and MSP." Proc. 1998 Int'l Computer Music Conf. San Francisco: Computer Music Assn., pp. 109-112.
- D. Trueman, The Trueman-Cook R-Bow. http://dtrueman.mycpanel.princeton.edu/rbow/.