Image depth


Can anyone offer a technical explanation of how a stereo system recreates image depth? Why, for example, are some center images behind the speakers and others in front of them?
Should there be any depth to a mono recording, or should the image be directly in line with the speakers?
cakids
Your ears/brain use several things for placement of a sound:
  • The difference in arrival time of approximately the same sound at your two ears gives you angular position. This works best from about 120 Hz to about 1500 Hz, and predominantly below 800 Hz. This is the highly accurate timing often attributed to the ears/brain, but keep in mind that it is a phase difference, not absolute timing.
  • Spectral notches due to head shape provide front/back cues, with the emphasis on cues: they are not perfectly accurate.
  • Most of depth comes from volume cues.
  • Volume cues can also give angular position, predominantly at high frequencies above 1500 Hz, though the effect starts at about 1000 Hz. Your head shields the far ear, resulting in a level difference that depends on frequency.
  • Some filtering of frequencies by your torso and pinnae provides some height cues, but the ear/brain is not great at height detection.
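
To put rough numbers on the timing cue in the first bullet, here is a minimal sketch using Woodworth's spherical-head approximation. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, not figures from this thread, and a real head is not a sphere. For a source directly to one side it gives a time difference of roughly 0.66 ms, which equals half a period at roughly 760 Hz and a full period at roughly 1500 Hz, in the same range as the limits quoted above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference for a distant source.

    Woodworth's spherical-head approximation, valid for azimuths
    from 0 (straight ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


def half_period_frequency_hz(azimuth_deg):
    """Frequency whose half period equals the ITD at this azimuth.

    Above this frequency the interaural phase difference exceeds
    180 degrees, so phase alone no longer says which ear leads.
    """
    itd = itd_seconds(azimuth_deg)
    if itd == 0.0:
        return math.inf  # source straight ahead: no time difference
    return 1.0 / (2.0 * itd)


if __name__ == "__main__":
    for az in (10, 30, 60, 90):
        print(az, round(itd_seconds(az) * 1e6), "us",
              round(half_period_frequency_hz(az)), "Hz")
```

The point of the sketch is only that the usable range for phase-based timing cues is set by head size relative to wavelength, which is why level differences take over at higher frequencies.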

With most recording techniques, even for live music, most of the angular position information is lost. What you perceive in the recording is artificial, put there by the recording engineer.
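
As one illustration of how a position can be put there artificially, a mono source is often placed between the speakers purely by a level difference between the two channels. The sketch below uses a constant-power (sine/cosine) pan law; this is an assumed, commonly used choice and not something the posts above specify, and the function name is hypothetical.

```python
import math


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a mono source.

    position: -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    The sine/cosine law keeps left^2 + right^2 equal to 1, so the
    total power stays constant as the source is moved across.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = constant_power_pan(pos)
        print(pos, round(left, 3), round(right, 3))
```

Nothing in this placement carries the timing or spectral cues of a real source at that angle; the image comes entirely from the level difference.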

There are microphone techniques, both with torso/head simulators and with stereo microphones, that can capture, or theoretically capture, the differential timing a human would hear. The big caveat, however, is that using that on playback is pretty much limited to headphones, though you may get lucky with speaker placement and the odd recording and extract some of it.

Height information is highly specific to individuals, so capturing it on two channels is pretty much impossible, and in practice it is not done.
So what does this mean? Most of the sound-stage, imaging, etc., even in live recordings, is indicative of the recording and far more influenced by the recording/mixing process. To that end, most of what audiophiles describe is "simulated", especially height.

It also means that when someone says wall-to-wall sound-stage, that is probably either hyperbole or a pleasant but highly inaccurate representation of the music.
What I’m getting from reading about recording techniques is that good recording techniques will fool the ear-brain location-finding function into creating an approximate aural image, but do not actually duplicate the exact sonic signature (amplitude, phase) that would impinge on the ears during a live performance.


Right. It doesn’t have to be the exact same information. It just has to be enough of the essential information.

The (highly recommended) XLO Test CD has a wonderful track where Roger Skoff simply describes the room dimensions and microphone placement and where he is in the room. As he’s talking you realize what you are hearing is exactly as if you are there in the room. Not your room, his. He walks and talks and occasionally hits a clavis (wood block), letting you hear the acoustic signature of the room. At one point he walks to the extreme back of the room and hits the clavis. It sounds as if he is behind you.

How is it technically possible for two speakers in front of you to create the illusion not only of depth front to back but even of sound behind you?

The answer is that, for sound from further away, it’s not just that there’s a time difference between the direct and reflected sounds; there is also a frequency and volume difference. Everything about the sound changes depending on where in the recording space the sound originates. That is the lesson of the XLO track. You’d never hear him behind you if speakers were needed back there. Instead, all that’s needed is to faithfully reproduce in detail the full acoustic signature of the event, especially including the faint room echoes.
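
The way the direct and reflected sound change with source position can be sketched with simple geometry. This is a deliberately crude model under stated assumptions: one microphone, one perfectly reflective wall behind the source, level falling 6 dB per doubling of distance, and no frequency-dependent absorption, so it covers only the time and volume parts of the cue and not the frequency part. The function name and the example distances are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)


def depth_cues(source_dist_m, wall_dist_m, ref_dist_m=1.0):
    """Direct level and first-reflection timing/level at the microphone.

    The source sits between the microphone and a rear wall, all on
    one line. The reflection travels to the wall and back to the
    microphone, a path of 2 * wall_dist_m - source_dist_m.
    """
    if not 0.0 < source_dist_m < wall_dist_m:
        raise ValueError("source must be between microphone and wall")
    reflected_path = 2.0 * wall_dist_m - source_dist_m
    return {
        # direct sound level relative to the same source at ref_dist_m
        "direct_level_db": 20.0 * math.log10(ref_dist_m / source_dist_m),
        # how long after the direct sound the wall reflection arrives
        "reflection_delay_ms":
            (reflected_path - source_dist_m) / SPEED_OF_SOUND * 1000.0,
        # reflection level relative to the direct sound
        "reflection_rel_db": 20.0 * math.log10(source_dist_m / reflected_path),
    }


if __name__ == "__main__":
    print("near source:", depth_cues(2.0, 8.0))
    print("far source: ", depth_cues(6.0, 8.0))
```

In this model, moving the source toward the back of the room makes the direct sound quieter, shortens the gap before the reflection, and brings the reflection closer in level to the direct sound. Those combined changes are what a recording has to preserve for the depth to survive playback.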

The speakers must of course be symmetrical and equidistant. This matters more left to right than front to back. I’m simplifying, obviously. But really, it’s the spatial information captured in the acoustic signature that does the trick more than anything else.
While room echoes can help in locating, much of what you experienced is psycho-acoustic (suggestion, coupled with familiarity) unless they used a proper torso simulator for the capture, and if my memory serves, that was not done for this test recording.


"You’d never hear him behind you if speakers were needed back there. Instead all that’s needed is to faithfully reproduce in detail the full acoustic signature of the event. Especially including the faint room echoes."

"Then one "BIG" step further is to remove the back wall from in between the speakers like I did, leaving a little 1-2 m behind each speaker for bass loading"

Awesome George!
Thank you all for filling in a lot of holes in my understanding. I can remember my first experience of depth and soundstage size. It was in a small hotel room at a NY show years ago. The walls disappeared and there was a live orchestra stretching about 40 or 50 feet behind the speakers. Happened to be Swans speakers and Boulder electronics. I know that images can be behind or in front of the speakers, but didn’t have quite as clear a technical understanding of what made it happen.