Electrical/mechanical representation of instruments and space


Help, I'm stuck at the juncture of physics, mechanics, electricity, psycho-acoustics, and the magic of music.

I understand that the distinctive sound of a note played by an instrument consists of a fundamental frequency plus a particular combination of overtones at varying amplitudes, and that the combination can be graphed as a particular, nuanced two-dimensional waveform shape. Then you add a second instrument playing, say, a third above the note of the first instrument, and its unique waveform shape represents that instrument's sound. When I'm in the room with both instruments, I hear two instruments because my ear (or rather two ears, separated by the width of my head) can discern that there are two sound sources.

But let's think about recording those sounds with a single microphone. The microphone's diaphragm moves and converts changes in air pressure to an electrical signal. The microphone is hearing a single set of air pressure changes, consisting of a single, combined wave from both instruments. And the air pressure changes occur in two domains, frequency and amplitude (sure, it's a very complicated interaction, but still capable of being graphed in two dimensions). Now we record the sound, converting it to electrical energy, stored in some analog or digital format. Next, we play it back, converting the stored information to electrical and then mechanical energy, manipulating the air pressure in my listening room (let's play it in mono from a single full-range speaker for simplicity).

How can a single waveform, emanating from a single point source, convey the sound of two instruments, maybe even in a convincing 3D space? The speaker conveys amplitude and frequency only, right? So, what is it about amplitude or frequency that carries spatial information for two instruments/sound sources? And of course, that is the simplest example I can design. How does a single mechanical system, transmitting only variations in amplitude and frequency, convey an entire orchestra and choir as separate sound sources, each with its unique tonal character? And then add to that the waveforms of reflected sounds that create a sense of space and position for each of the many sound sources?

77jovian
Still thinking about this.  Let me give another example. 

As with a synthesizer, you could combine a series of pure, sinusoidal tones in a particular combination of fundamental and overtones with varying amplitudes and fine-tune it until it sounds like a flute. Why do we hear the sound of a flute instead of a bunch of sine waves at varying frequencies and amplitudes? Why do we hear the whole instead of the components?
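
To make the synthesizer idea concrete, here is a rough sketch of additive synthesis in Python: a handful of sine waves at whole-number multiples of a fundamental, each with its own amplitude, summed into one list of samples. The sample rate, the 440 Hz fundamental, and the harmonic amplitudes are assumptions picked for illustration; they are not measured from a real flute.

```python
import math

SAMPLE_RATE = 8000    # samples per second (assumed for illustration)
FUNDAMENTAL = 440.0   # Hz (assumed)
# Hypothetical relative amplitudes for harmonics 1..4, not real flute data.
HARMONIC_AMPLITUDES = [1.0, 0.5, 0.25, 0.1]


def additive_tone(duration_s):
    """Sum sine waves at multiples of FUNDAMENTAL into one sample list."""
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        # Every harmonic contributes to the same single number at time t.
        value = sum(
            amp * math.sin(2 * math.pi * FUNDAMENTAL * (k + 1) * t)
            for k, amp in enumerate(HARMONIC_AMPLITUDES)
        )
        samples.append(value)
    return samples


tone = additive_tone(0.01)   # 10 ms of the combined waveform
peak = max(abs(s) for s in tone)
```

Note that `tone` is just one sequence of amplitude values over time. The individual sine waves are no longer stored separately anywhere; they exist only as the shape of the summed curve.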

To reverse the example, is there a single whole sound that is the sound of the components of an orchestra, if you get my meaning?  Why do we hear the single sound as the sound of many instruments?
Hi 77jovian,

First, kudos on your thoughtful question.

IMO, though, the answer is fairly simple. When we listen to an orchestra, or some other combination of instruments and/or vocalists, what our hearing mechanisms are in fact hearing is a combination of sine waves and broadband sounds ("broadband sounds" being a combination of a vast number of sine waves), both of which of course vary widely from instant to instant in terms of their amplitudes, timings, and phase relationships.

So to the extent that the recording and reproduction chains are accurate, what is reproduced by the speakers corresponds to those combinations of sine waves and broadband sounds, and our hearing mechanisms respond similarly to how they would respond when listening to live music.
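
As a rough illustration of that point, the Python sketch below adds two made-up "instruments" into one mono waveform and then measures how much of each sine-wave frequency is present in the mix. The partial frequencies and amplitudes are hypothetical (instrument B sits a major third above A, using a 5:4 ratio), and the measurement is a plain single-frequency Fourier correlation, not a model of how the ear actually does it.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N = SAMPLE_RATE      # analyse exactly one second of signal

# Hypothetical partials: frequency in Hz -> relative amplitude.
INSTRUMENT_A = {440: 1.0, 880: 0.4, 1320: 0.2}
INSTRUMENT_B = {550: 0.8, 1100: 0.5, 1650: 0.1}


def render(partials):
    """Render one instrument as a list of samples (a sum of sines)."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE)
            for f, a in partials.items())
        for i in range(N)
    ]


def amplitude_at(signal, freq_hz):
    """Estimate the amplitude of the sine component at freq_hz."""
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
        for i, x in enumerate(signal)
    )
    return 2 * abs(acc) / len(signal)


# What a single microphone would pick up: one number per instant.
mix = [a + b for a, b in zip(render(INSTRUMENT_A), render(INSTRUMENT_B))]

# 700 Hz belongs to neither instrument and is included as a control.
recovered = {
    f: round(amplitude_at(mix, f), 3)
    for f in (440, 550, 880, 1100, 1320, 1650, 700)
}
```

Under these assumptions, every partial of both instruments can be read back out of the single mixed waveform at its original amplitude, while the control frequency comes out at zero. Nothing was lost by adding the two signals together.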

Best regards,
-- Al
 
I think it’s an excellent question, and actually one that has a whole lot to do with the questions I’ve been asking on another thread: What is the audio signal in the system prior to the point where the speakers produce the acoustic waveform of the entire orchestra? And how do better speaker cables, better power cords, better fuses, and vibration isolation affect the “audio signal,” whatever it is?
The microphone acts no differently than your eardrum or a speaker cone. How a single cone produces overtones is simple. Say you are listening to a 20 Hz tone. The speaker cone moves back and forth 20 times a second. For a 100 Hz tone, it's a hundred times a second.

What about the two tones played at the same time to produce a different sound? As the cone moves back and forth 20 times per second, it also moves back and forth 100 times per second, with the faster motion riding on top of the slower one. (Wave your hand back and forth with your arm still. Then move your arm while you are waving your hand. Then walk forward while waving your hand and moving your arm. The air displaced is a pressure representation of the combined motion.)

The combination of pressure waves creates one wave: at any point in time there is a single pressure value, and over a certain time period that one wave contains all the other waves (overtones) that create a particular sound. Add them all together as a function of time and there is your orchestra in your living room.
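
Here is a small Python sketch of the 20 Hz plus 100 Hz example, in case it helps to see it as numbers. The sample rate and the two amplitudes are assumptions chosen only for illustration. The cone position at each instant is simply the slow motion plus the fast motion.

```python
import math

SAMPLE_RATE = 1000              # samples per second (assumed)
SLOW_HZ, FAST_HZ = 20, 100      # the two tones from the example above
SLOW_AMP, FAST_AMP = 1.0, 0.3   # hypothetical relative excursions


def cone_position(i):
    """Cone position at sample i: the two motions simply add."""
    t = i / SAMPLE_RATE
    slow = SLOW_AMP * math.sin(2 * math.pi * SLOW_HZ * t)
    fast = FAST_AMP * math.sin(2 * math.pi * FAST_HZ * t)
    return slow + fast


# One second of cone motion: a single number per instant.
positions = [cone_position(i) for i in range(SAMPLE_RATE)]

# Because 100 Hz is a whole multiple of 20 Hz, the combined shape
# repeats once per cycle of the slower tone (every 50 samples here).
period_samples = SAMPLE_RATE // SLOW_HZ
repeats = all(
    abs(positions[i] - positions[i + period_samples]) < 1e-9
    for i in range(SAMPLE_RATE - period_samples)
)
```

The cone is never in two places at once. It traces one curve over time, and that curve's shape is what carries both tones.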