speakers for 24/96 audio


Is it correct to assume that 24/96 audio would be indistinguishable from CD quality when listened to through speakers with a -3 dB point at 20 kHz and a rapid high-frequency roll-off?

Or, more precisely, that the only benefit comes from the shift from 16 to 24 bit, not the increased sample rate, as the higher-frequency content is filtered out anyhow?

Related to this, what advice would you have for a sub-$5k speaker set with good high-frequency capabilities for 24/96 audio?

thanks!
mizuno
This has been a very interesting thread, and I've learned a lot. I have a question that bears on the value of high resolution audio formats, particularly the value of sampling rates higher than 44.1. Here is the question:

Is the preference for high resolution audio formats (24/96, 24/192, etc.) partly attributable to the fact that those formats have better temporal resolution?

I don't know the answer to this question, but it's been on my mind since reading a number of papers with passages like this:

It has also been noted that listeners prefer higher sampling rates (e.g., 96 kHz) than the 44.1 kHz of the digital compact disk, even though the 22 kHz Nyquist frequency of the latter already exceeds the nominal single-tone high-frequency hearing limit fmax∼18 kHz. These qualitative and anecdotal observations point to the possibility that human hearing may be sensitive to temporal errors, τ, that are shorter than the reciprocal of the limiting angular frequency [2πfmax]−1 ≈ 9 μs, thus necessitating bandwidths in audio equipment that are much higher than fmax in order to preserve fidelity.

That quote is from a paper by Milind Kunchur, a researcher on auditory temporal resolution. More can be read in this article from HIFI Critic. Kunchur's research is somewhat controversial, but I have found a number of other peer-reviewed papers that seem to confirm that the limit of human temporal resolution is quite low, on the order of MICROseconds.

If that is true, then part of the advantage of high resolution audio formats might be the fact that they have superior temporal resolution, thereby providing more information about very short alterations in the music, i.e., transients. Or so the argument goes.
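For reference, here is a minimal sketch of the arithmetic behind those figures. It computes the reciprocal of the limiting angular frequency from the quoted passage and the sample periods of the formats mentioned. The function names are made up for illustration, and the sample period is shown only as a raw number: it is not by itself a measure of how finely a band-limited signal's timing can be represented.

```python
import math

def tau_from_bandwidth(f_max_hz: float) -> float:
    """Reciprocal of the limiting angular frequency, 1 / (2*pi*f_max), in seconds."""
    return 1.0 / (2.0 * math.pi * f_max_hz)

def sample_period(rate_hz: float) -> float:
    """Time between successive samples, in seconds."""
    return 1.0 / rate_hz

# f_max of 18 kHz is the nominal hearing limit used in the quoted passage.
print(f"tau at 18 kHz: {tau_from_bandwidth(18_000) * 1e6:.2f} us")

for rate in (44_100, 96_000, 192_000):
    print(f"sample period at {rate} Hz: {sample_period(rate) * 1e6:.2f} us")
```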

Anyone have an opinion about this?

Bryon
Bryon,

There is no solid evidence for this, so it is indeed controversial. If a mere few microseconds were important, then speaker and listener position would matter down to a millimeter or so (less than a tenth of an inch). It is generally accepted that 1 msec is the point at which time differences become audible (roughly 1 foot of path length). Our ears are roughly 6 to 8 inches apart. Since temporal differences are detected by the difference in arrival time at each ear, this all suggests that our "resolution" is close to that distance, which is about 0.5 msec in time (at the speed of sound in air).
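The distance and time figures above can be checked with a short sketch. It assumes a speed of sound of 343 m/s (dry air at roughly 20 °C); the constant and the helper names are illustrative assumptions, not taken from any reference.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C
INCH_M = 0.0254
FOOT_M = 0.3048

def distance_for_delay(delay_s: float) -> float:
    """Path length (metres) that sound covers in the given time."""
    return SPEED_OF_SOUND_M_S * delay_s

def delay_for_distance(distance_m: float) -> float:
    """Time (seconds) sound needs to cover the given path length."""
    return distance_m / SPEED_OF_SOUND_M_S

# A few microseconds corresponds to a millimetre or so of path length.
for us in (1, 5, 9):
    print(f"{us} us -> {distance_for_delay(us * 1e-6) * 1000:.2f} mm")

# 1 msec corresponds to roughly a foot.
print(f"1 ms -> {distance_for_delay(1e-3) / FOOT_M:.2f} ft")

# Ear spacing of 6 to 8 inches corresponds to roughly half a millisecond.
for inches in (6, 7, 8):
    print(f"{inches} in -> {delay_for_distance(inches * INCH_M) * 1000:.2f} ms")
```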

What these findings may be related to is "jitter". It has been shown mathematically that non-random time errors can produce audible "sidebands" around musical signals, and that jitter of 1 microsecond can be quite audible due to our ability to hear these non-musical sounds, tones, or sidebands. If you increase the sample rate, you change the way jitter affects the sound; a significantly higher sample rate would likely reduce the deleterious effects of jitter. Some sample rates are noted for being better than others at reducing audible jitter. Benchmark found that 110 kHz worked better than other rates with the DAC chip they use.
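As a rough illustration of the sideband point, here is a sketch under stated assumptions: the jitter is purely sinusoidal, it is specified as a peak timing error, and the resulting phase modulation is small enough for the usual narrowband approximation (each sideband is about half the modulation index relative to the tone). The function name is hypothetical, and this models an idealized case, not any particular DAC.

```python
import math

def jitter_sideband_db(tone_hz: float, jitter_peak_s: float) -> float:
    """Approximate level of each sideband relative to the tone, in dB.

    Sinusoidal jitter of peak amplitude J on a tone of frequency f0 acts as
    phase modulation with index beta = 2*pi*f0*J. For small beta, each
    sideband has relative amplitude beta / 2 = pi*f0*J.
    """
    beta = 2.0 * math.pi * tone_hz * jitter_peak_s
    return 20.0 * math.log10(beta / 2.0)

# 10 kHz tone, comparing 1 microsecond and 1 nanosecond of peak jitter.
for label, jitter in (("1 us", 1e-6), ("1 ns", 1e-9)):
    print(f"{label} jitter on 10 kHz: {jitter_sideband_db(10_000, jitter):.1f} dB")
```

Under these assumptions the sideband level scales with both the tone frequency and the jitter amplitude, so the same timing error matters more for high-frequency content.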
Hi Bryon,

Interesting question, and an interesting paper, which I read through. It strikes me as very intelligently and knowledgeably written, and I see no obvious flaws in the details he presents. And intuitively it does strike me as plausible that our ability to resolve timing-related parameters might be somewhat better than what would be suggested by the bandwidth limitations of our hearing mechanisms.

However, looking at his paper from a broader perspective I have several problems with it:

1) He has apparently established that listeners can reliably detect the difference between a single arrival of a specific waveform, and two arrivals of that waveform that are separated by a very small number of microseconds. I have difficulty envisioning a logical connection between that finding, though, and the need for hi rez sample rates. There may very well be one, but I don’t see it.

2) By his logic a large electrostatic or other planar speaker should hardly be able to work in a reasonable manner, much less be able to provide good reproduction of high speed transients, due to the widely differing path lengths from different parts of the panel to the listener’s ears. Yet clean, accurate, subjectively "fast" transient response, as well as overall coherence, are major strengths of electrostatic speakers. The reasons are fairly obvious: very light moving mass, that can start and stop quickly and follow the input waveform accurately; no crossover, or at most a crossover at low frequencies in the case of electrostatic/dynamic hybrids; freedom from cone breakup, resonances, cabinet effects, etc. So it would seem that the multiple arrival time issue he appears to have established as being detectable under certain idealized conditions can’t be said on the basis of his paper to have much if any audible significance in typical listening situations.

3) More generally, it seems to me that there are so many theoretical, practical, recording-dependent, and equipment-dependent variables that would have to be reckoned with and controlled in any attempt to make a meaningful comparison involving hi rez vs. redbook sample rates, that reaching a definitive conclusion about the degree to which this particular factor may be audibly significant under real-world listening conditions is probably not possible.

All best regards,

--Al
I agree with Al.

This shows how good we are at hearing sounds and has nothing to do with temporal resolution.

The wavelength at 7 kHz is about 5 cm. Therefore, in order to get the direct sound completely out of phase at the listener, one need only move one speaker back by 2.5 cm (half a wavelength). This will result in the direct sound being zero and will probably reduce the SPL enough to be clearly audible. The fact that only a 2.9 mm movement was audible suggests that reflections may also have played a role here.
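To put rough numbers on this, here is a sketch that assumes two speakers playing the same pure tone at equal level, direct sound only (no reflections), and a speed of sound of 343 m/s. The function names are made up for illustration. The summed level, relative to the perfectly in-phase case, follows |cos(pi * d / wavelength)| for a path offset d.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def wavelength_m(freq_hz: float) -> float:
    """Wavelength in metres of a tone at the given frequency."""
    return SPEED_OF_SOUND_M_S / freq_hz

def level_change_db(offset_m: float, freq_hz: float) -> float:
    """Change in summed direct-sound level (dB) when one of two equal
    sources is moved back by offset_m, relative to the in-phase sum."""
    ratio = abs(math.cos(math.pi * offset_m / wavelength_m(freq_hz)))
    if ratio < 1e-12:  # treat as complete cancellation
        return float("-inf")
    return 20.0 * math.log10(ratio)

lam = wavelength_m(7_000)
print(f"wavelength at 7 kHz: {lam * 100:.1f} cm")
print(f"2.9 mm offset: {level_change_db(0.0029, 7_000):.2f} dB")
print(f"half-wavelength offset: {level_change_db(lam / 2, 7_000):.1f} dB")
```

Under these idealized assumptions a 2.9 mm offset changes the direct-sound level by only a fraction of a dB, which fits the suggestion that reflections may also have contributed.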

The use of a pure signal of a single tone with no (audible) harmonics can often give surprising results! This is not reflective of musical instruments, which have many harmonics, so it is hard to draw any conclusion other than that a test tone produces an audible result. Anyway, my money is that there is enough of an amplitude difference here to make it audible in the case of a pure test tone. A pure test tone will fluctuate as you move around the room (you get peaks and suckouts depending on how the reflections and direct sound all add up).
"Irv, keep in mind that it is generally accepted that signal can be perceived at levels that are significantly below the level of random broadband noise that may accompany the signal. 15db or more below, iirc. So amplifier noise floor is not really a "floor" below which everything is insignificant."

Maybe, but it is very difficult to believe this is the case when listening to music or other complex sounds, like movie dialog or foley. I've always been leery of effects 70 dB or more below the music level, regardless of the component in question.