"They are here" vs. "You are there"


Sometimes a system sounds like "they are here." That is, it sounds like the performance is taking place IN YOUR LISTENING ROOM.

Sometimes a system sounds like "you are there." That is, it sounds like you have been transported to SOME OTHER ACOUSTICAL SPACE where the performance is taking place.

Two questions for folks:

1. Do you prefer the experience of "they are here" or "you are there"?

2. What characteristics of recordings, equipment, and listening rooms account for the differences in the sound of "they are here" vs. "you are there"?
bryoncunningham
Thanks Bryon. Yes, you interpreted my point no. 3 as I intended it, that the inclusion of what was direct sound in the recording space in an omnidirectional listening space presentation represents a significant inaccuracy, which must be traded off against the benefits of the omnidirectional listening space presentation.

The reason you didn't see no. 3 previously is simple -- it wasn't there when you started composing your previous response :-). As I mentioned in my subsequent post, I added it in sometime after initially submitting the post to which it was added, and by that time you were obviously working on your response (as shown by the fact that you referred to the headphone part of my post as item 3, rather than item 4 which it subsequently became).

I suppose that the bottom line in the tradeoff we are referring to comes down to matters of degree, which in turn are dependent on the speakers, the constraints imposed by the particular listening space, the types of recordings that are listened to, and the preferences of the listener.

The only exception I would take with respect to your last post would be the statement that:
I believe that headphones (in the absence of binaural recordings) illustrate that, in that headphones will give you the most ACCURATE sound of the ambient cues of the recording, but not an OMNIDIRECTIONAL presentation of those cues.
I am doubtful that on non-binaural recordings headphones can be said to give an accurate reproduction of ambient cues, or anything else, because of the fact that they bypass the pinnae, and inject the sound from the sides instead of from the front. Although of course they can be extremely revealing and analytical. And once again a tradeoff is involved, because their accuracy (in the sense that you are using the term here) is aided by the absence of room effects.

Best regards,
-- Al
Hello all, another interesting thread from Bryon. As a sidebar, after years of casual listening to quite decent Sennheiser HE60 electrostatics through the stock head amp, I recently had an opportunity to hear an all-out custom tube head amp driving current top-model Sennheiser dynamic headphones. For the first time I think I "got it" regarding what headphones can achieve in terms of disintermediating electronics and room effects from the music. The key insight was that I had never heard a headphone set-up that approached my regular stereo in quality. Most audiophiles outside of the "head-case" niche are likely in the same boat. IME the gain in detail and separation outweighed the loss of natural acoustic space.

On a lark I set about modifying my Headroom line-level processor to the point where I felt that the advantage of the crossfeed process was not offset by degradations in the electronics that had relegated this unit to storage for some years. Briefly, crossfeed has the effect of shifting forward and tightening images that in normal listening appear furthest to the side and rear. Only images that are way out to the side and rear seem affected. The effect is to make a headphone "sound stage" analogous to the experience of a conventional listening room. So I am inclined to agree with Al, that this more forward sound stage is natural, with the caveat that a process like crossfeed can spook the ear into hearing a natural sense of the room acoustic, while preserving the advantage that headphones have in being unfettered by reflected sound. As with so much in hi-fi, it's all about the implementation.
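
As a rough illustration of what crossfeed does, here is a minimal sketch of the basic idea: mix an attenuated, slightly delayed copy of each channel into the other, loosely imitating the way each ear hears both speakers in a room. This is not the Headroom circuit. The function name, the 0.3 gain, and the 13-sample delay (about 0.3 ms at 44.1 kHz) are illustrative assumptions, and real crossfeed designs typically also filter the signal that is fed across, which is left out here.

```python
def crossfeed(left, right, gain=0.3, delay=13):
    """Mix an attenuated, delayed copy of each channel into the other.

    left, right: equal-length lists of samples (assumed).
    gain:  level of the fed-across signal (hypothetical default).
    delay: delay of the fed-across signal, in samples (hypothetical default).
    """
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay
        # Before the delay has elapsed, nothing has crossed over yet.
        fed_from_right = right[j] if j >= 0 else 0.0
        fed_from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + gain * fed_from_right)
        out_right.append(right[i] + gain * fed_from_left)
    return out_left, out_right


# A click that exists only in the left channel (hard-panned image):
l_out, r_out = crossfeed([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
                         gain=0.5, delay=2)
print(l_out, r_out)
```

A hard-panned sound ends up in both ears, quieter and later on the far side, which is consistent with the far-side images being the ones that move most.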
Bryon, regarding your recent post on ambience cues, directionality and listening rooms, I think you may be overlooking some aspects of what is going on with respect to the cues in the recording versus the cues from the listening room.

Consider doing the playback in exactly the same space as the recording. You set up the speakers and the equipment to optimally reproduce the soundstage, and put the listener in the position of the microphone that recorded the performance. Thus, your listening space exactly reproduces the recording space. Is this the optimal space for creating the “you are there” experience? I don’t think so, but it illustrates some issues:

1) Consider a single drum hit. From the optimal listening position, the stereo effect tells you that there is a drum set on the stage, left of center. What does the wall directly to the right of the speakers see? It sees two sources (the left and right speakers), offset in time by an amount set by the distance between the speakers. Each point along the wall sees a delay between the two sources that varies roughly as the sine of the takeoff angle. The same goes for the left wall, other objects in the room, etc. This effect does not exist in the original performance. These echoes come to your ears as something other than what the single source on the recording produced. Let’s call it “source distortion.”
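
To put rough numbers on the "sine of the takeoff angle" remark, here is a small sketch using the usual far-field approximation: for two sources a distance d apart, a distant point at angle theta from the forward axis sees a path difference of about d * sin(theta). The 2.5 m speaker spacing, the 343 m/s speed of sound, and the function name are assumptions for illustration only, and the approximation is loose for wall points close to the speakers.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def inter_speaker_delay_ms(spacing_m, angle_deg):
    """Far-field arrival-time difference between two speakers, in ms.

    angle_deg is measured from the forward axis (perpendicular to the
    line joining the speakers): 0 = straight ahead, 90 = directly sideways.
    """
    path_difference_m = spacing_m * math.sin(math.radians(angle_deg))
    return path_difference_m / SPEED_OF_SOUND * 1000.0


for angle in (0, 30, 60, 90):
    print(angle, "deg:", round(inter_speaker_delay_ms(2.5, angle), 2), "ms")
```

Straight ahead the two arrivals coincide. Toward the side walls they are separated by several milliseconds under these assumptions.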

2) Now let’s replace the pair of speakers with a single speaker in the position of the drum set. The drum hit now behaves as a single source: the direct wave travels from the speaker to the listener as it should, and then hits (say) the back wall and comes back to the listener at exactly the same time as the echo in the recording gets to the listener as a direct wave. Thus, you have achieved your goal of reinforcing the primary cue. But the recorded echo itself then travels to the rear wall and comes back to the listener as a secondary echo that did not exist in the original performance. Let’s call this “echo distortion.”

3) Of course, your room is not exactly the configuration of the recording room, so on top of #1 and #2, you hear your primary room echo and the echo on the recording at different times. Let’s call this “temporal distortion.”

In general, to get ambience cues on the recording to be omnidirectional in your listening space, you would have a) primary echoes from your listening room that were stronger than the secondary recorded echoes, and thus dominant, b) recorded ambience cues reflected by your room that arrived at your ears too late (i.e., the reflected ambience cues will be out of sync with the directly radiated (from the speakers) ambience cues), and c) many of the reflections suffering from source distortion.

I see this as a continuum. If you succeed in recreating a recording space perfectly, you get source and echo distortion with it. If your space is some average of the spaces you prefer (say, a generic jazz club), or you listen to recordings recorded in more than one place, you’ll also get temporal distortion. If you manage to suppress echo and temporal distortion (or the recording has weak ambience cues), then the direct echoes from your room will dominate, and you’ll actually get a “they are here” effect, rather than the desired “you are there” effect. If you suppress your room so that the recorded cues dominate, you get “you are there” cues but they’ll be bidirectional (but only if the recording has sufficient cues -- if it doesn’t you may get a somewhat dead or recording studio sound).

So you have a range of recordings (from heavy cues to none), and a range of rooms (from live to dead), but it doesn’t seem possible to have an optimal room for both ends of the spectrum (which I think you’ve said), and it doesn’t seem possible to get time/phase correct omnidirectional ambience cues that aren’t dominated by your room, rather than the recording (short of electronic intervention, which you and Learsfool have said is not desirable).

To sum up, I think to the extent that you succeed in making the ambience cues from the recording omnidirectional, they’ll be mis-timed, out of phase, and probably polarity flipped. And that is on top of all of the very strong room cues that you will necessarily generate to get the recorded cues to be omnidirectional. Or, to put it another way, I don’t think it is possible to get the recorded cues to be omnidirectional without seriously compromising the “you are there” effect.

So, my theory:
1) Strong recorded cues + live room = a mess tending toward “they are here”
2) Strong recorded cues + dead room = “you are there” but bidirectional cues
3) Weak recorded cues + live room = “they are here” but if the room is sufficiently like the recording space, you approximate “you are there” for that space
4) Weak recorded cues + dead room = “they are here” (or in a studio)

All of this comes with the caveat that what I say may be true for certain kinds of cues and not others.
...I composed my response, above, before seeing Bryon's most recent post. But I think everything still stands.
I am doubtful that on non-binaural recordings headphones can be said to give an accurate reproduction of ambient cues, or anything else, because of the fact that they bypass the pinnae, and inject the sound from the sides instead of from the front.

This is a good point, Al. I should have chosen an anechoic chamber rather than headphones to illustrate my view that omnidirectional ambient cues are more valuable than strictly accurate ambient cues for creating the illusion that “you are there.”

On a side note, how do you edit a post after it’s been posted? Can you give me a link to instructions here on A’gon?

Dgarretson – Glad that you joined in. I had never heard of the crossfeeding process you describe. I would love to hear it some day. Connecting it to this discussion, I would say that, like binaural recordings, it once again illustrates the importance of the DIRECTIONALITY of ambient cues for creating the illusion that “you are there.”

Cbw – Wow! A lot of great thoughts and insights.

Consider doing the playback in exactly the same space as the recording. You set up the speakers and the equipment to optimally reproduce the soundstage, and put the listener in the position of the microphone that recorded the performance. Thus, your listening space exactly reproduces the recording space. Is this the optimal space for creating the “you are there” experience? I don’t think so…

The real goal in this approach is not to PHYSICALLY replicate the recording space, but rather to approximate it in some important ACOUSTICAL parameters, including: relative balance of direct and indirect sound, relative balance of reflected/diffused/absorbed sound, time delay of first indirect sound, reverberation time, and so on. Optimizing these acoustical parameters of the listening space so that they have values that approximate those of the recording space is the kind of “resemblance” I have in mind. I should probably drop the word “resemblance” altogether from this discussion, because it does conjure up images of physical likeness. I should stick to words like “emulate,” to avoid the idea that this approach is about PHYSICAL resemblance. It’s not. It’s about ACOUSTICAL resemblance.

You are quite right to point out that acoustical resemblance during playback cannot be achieved simply by creating a facsimile of the recording space. Creating an acoustical resemblance between the listening space and the recording space takes into account things like the number of sound sources in the listening room (two, assuming you are listening in stereo) and the position of the listener relative to those sources and to room boundaries. It also takes into account a host of other variables, the manipulation of which, with any luck, results in a listening space that acoustically emulates the recording space, at the listening position. With that in mind...

The various kinds of room colorations you mention, what you are calling “source distortion,” “echo distortion,” and “temporal distortion,” are definitely things to be addressed. But it seems to me that these are precisely the kinds of things that an acoustically treated room DOES address. “Source distortion” is typically addressed by absorption or diffusion at the first order reflection points on the side walls and the ceiling. “Echo distortion” is typically addressed with diffusion behind the speakers. “Temporal distortion” is typically addressed by balancing the ratio of absorption to diffusion to achieve a specific reverberation time.
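
As a rough illustration of how the absorption balance relates to reverberation time, here is a sketch using Sabine's classic estimate, RT60 ≈ 0.161 × V / A in metric units, where V is the room volume and A is the total absorption (each surface area times its absorption coefficient). The room dimensions and coefficients below are made up for the example. Sabine's formula assumes a reasonably diffuse sound field, so it is only a first approximation for small or heavily treated rooms, and it says nothing about diffusion.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate RT60 in seconds from Sabine's formula (metric units).

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 5 m x 4 m x 2.5 m room (50 m^3); coefficients are illustrative.
bare_room = [(20.0, 0.05), (20.0, 0.05), (45.0, 0.05)]     # floor, ceiling, walls
treated_room = [(20.0, 0.30), (20.0, 0.05), (45.0, 0.25)]  # carpet, ceiling, wall panels

print("bare:   ", round(sabine_rt60(50.0, bare_room), 2), "s")
print("treated:", round(sabine_rt60(50.0, treated_room), 2), "s")
```

Adding absorption shortens the estimated reverberation time. Swapping some of it for diffusion keeps more energy in the room, which is the balancing act described above.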

In light of this, I do not believe that the various kinds of distortion you mention are, in themselves, reason to believe that this approach is doomed to failure. IF this approach were tantamount to constructing a listening space that was a PHYSICAL replica of the recording space, then I would agree with you that it would be doomed. But the approach is to construct a listening space that, in important ACOUSTICAL respects, emulates the recording space, AS HEARD FROM THE LISTENING POSITION. It seems to me that that approach is not doomed to failure, though it is certainly bounded by constraints, both practical and theoretical.

So you have a range of recordings (from heavy cues to none), and a range of rooms (from live to dead), but it doesn’t seem possible to have an optimal room for both ends of the spectrum (which I think you’ve said)…

I agree that it is not possible to have an optimal room for all recordings. A person must choose on the basis of the recordings they tend to listen to, or the ones they are the most interested in optimizing, for whatever reason.

To sum up, I think to the extent that you succeed in making the ambience cues from the recording omnidirectional, they’ll be mis-timed, out of phase, and probably polarity flipped. And that is on top of all of the very strong room cues that you will necessarily generate to get the recorded cues to be omnidirectional. Or, to put it another way, I don’t think it is possible to get the recorded cues to be omnidirectional without seriously compromising the “you are there” effect.

This is an interesting argument. As I understand it, you are saying that the measures required to create omnidirectional ambient cues in the listening space would, in effect, destroy the accuracy of the ambient cues of the recording, as heard at the listening position. In a way, you are saying what Al said in point (3) of his post from 9/13 - what he described as a “tradeoff.” So my response to your argument is the same as my response to his observation: My view is that omnidirectional ambient cues are more valuable than strictly accurate ambient cues for creating the illusion that "you are there." Having said that, I guess I’m not as skeptical as you, Cbw, about the possibility of constructing a listening space whose acoustics allow for omnidirectional ambient cues that are REASONABLY ACCURATE to the recording. I wish I had the resources to build some rooms and put these theories to the test!

So, my theory:
1) Strong recorded cues + live room = a mess tending toward “they are here”
2) Strong recorded cues + dead room = “you are there” but bidirectional cues
3) Weak recorded cues + live room = “they are here” but if the room is sufficiently like the recording space, you approximate “you are there” for that space
4) Weak recorded cues + dead room = “they are here” (or in a studio)

Now this is a nice way of organizing things! But I don’t agree with it all. I think you are absolutely correct about scenarios (3) and (4). But, as I've indicated above, I don’t think category (1) would necessarily result in the “mess” you anticipate, provided that careful attention were paid to acoustical design. I am also doubtful that scenario (2) would result in the illusion that “you are there,” for the reason I have stated many times in this thread, namely that I don’t believe the bidirectional presentation of ambient cues can create the illusion that "you are there." In effect, scenario (2) is an approximation of an anechoic chamber, and I don’t believe you can create the illusion that “you are there” under those conditions.

Having said all this, I should reiterate something I mentioned earlier in this thread, but that may have been lost in the discussion by now: I don't believe that constructing a listening space that emulates a particular recording space is the BEST approach to building a listening room, for many of the reasons that have been pointed out, and some that have not. I do believe that it is a VALID approach, especially for audiophiles who tend to listen to one type of music. For folks who listen to a wide range of music with vastly different recording spaces, constructing a listening space that emulates a particular recording space is probably NOT the best approach. In the latter case, the best approach is probably a balance of:

(1) Emulation of some set of recording spaces.
(2) Creation of a listening space that provides a balance of attributes important for hearing exactly what is on the recording.

To the extent that an audiophile chooses (1), he is favoring colorations over accuracy. To the extent that he chooses (2), he is favoring accuracy over colorations. (1) is the approach of some audiophiles who are primarily interested in creating a playback space that they themselves find interesting; (2) is the approach of recording studios, where accuracy is the Order of the Day.

The use of the word "coloration" above is not pejorative. Although I am an outspoken (read: notorious) advocate of the absence of colorations in equipment, I have a much more mixed view of colorations in the listening room. Although many listening room colorations are destructive (think: room modes, flutter echo, comb filtering, etc.), some room colorations, I believe, are beneficial. Among other things, they can enhance the illusion that "you are there."