How Science Got Sound Wrong


I don't believe this has been posted before, but I found it quite interesting despite its technical aspect. I didn't post this for a digital vs analog discussion. We've beat that horse to death several times. I play 90% vinyl, but I can still enjoy my CDs.

https://www.fairobserver.com/more/science/neil-young-vinyl-lp-records-digital-audio-science-news-wil...
artemus_5
I think we should be asking the question, "How many samples per waveform are required to reduce the RMS error to below, say, 5%, which is the sort of error achieved during the heyday of the vinyl years?"

Some types of error may be more or less objectionable, but let's start simple. Let's just find out how much RMS error there is for a given sampling scheme.

Surprisingly enough, it's not that hard to calculate. But shockingly, nobody seems to bother.

To calculate, begin by observing that the Fourier theorem shows that all periodic functions are built up as a sum of sine waves, so to consider music, all we have to consider are sine waves (aka pure tones). Further, it is not hard to compute the difference between a sine wave and its sampled value at any point, for any fixed number N of samples per waveform. You can approximate by slicing the waveform into N intervals and then calculating the difference at the midpoint of each interval.

It is also easy to square these differences and add them up. You could use calculus, but the above is an adequate approximation.

That is the essence of a computation yielding the RMS error of the sampling scheme per waveform.

Returning to our question, the answer I get is 250 samples per waveform for step-function decoding. At 20 kHz, that means sampling at 5 MHz - with infinite precision, of course.
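The computation described above is easy to sketch numerically. The following is my own illustration, not the poster's code: it approximates one sine period with a step function (a zero-order hold at the start of each of N intervals) and integrates the squared error on a fine grid. Note that the exact numbers depend on the error criterion chosen (where the sample is taken within each interval, and whether error is measured relative to the signal's RMS or its peak), so this doesn't attempt to reproduce the 250-sample figure exactly - it just shows how the RMS error shrinks as N grows.

```python
import math

def rms_error(n_samples, n_fine=10000):
    """RMS error, over one full sine period, of a step-function
    (zero-order-hold) approximation using n_samples samples,
    evaluated on a fine grid of n_fine points."""
    step = 2 * math.pi / n_samples
    total = 0.0
    for i in range(n_fine):
        t = 2 * math.pi * (i + 0.5) / n_fine   # fine-grid midpoint
        t_k = int(t / step) * step              # most recent sample time
        total += (math.sin(t) - math.sin(t_k)) ** 2
    return math.sqrt(total / n_fine)

signal_rms = 1 / math.sqrt(2)   # RMS of a unit-amplitude sine
for n in (50, 100, 250):
    print(n, "samples/waveform -> relative RMS error",
          rms_error(n) / signal_rms)
```

Swapping the zero-order hold for a smarter reconstruction (linear interpolation, sinc) is a one-line change to the `t_k` lookup, which is one way to explore the "exotic decoding" question raised below.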

Exotic decoding algorithms can improve on this for pure tones, but how well do they work for actual music? I doubt anyone knows - certainly I've never seen it discussed, not even the first question about samples per waveform. I think we should ask it.
@erik_squires "Microtime, as the article envisions it, is not a thing."

Don't agree. It seems to me that he defines it quite clearly in terms of microsecond (neural) phenomena. And also, it seems to me that someone with a Ph.D. in this area is likely to know something about this area.

Where he could be clearer is about the relationship between math and science - like how to not screw it up when applying math to the physical world. But that's a highly technical subject all on its own (for access to the literature see Theory of Measurement by Krantz et al, Academic Press, in 3 volumes), and surprise, many scientists get it quite wrong. Let alone engineers.
It is an interesting article, and I certainly will not fault his credentials w.r.t. neurobiology, though it sounds like his knowledge of the auditory processing system is two skin layers deep - but no doubt still deeper than mine. Even that I will not fault.

What I will fault is his knowledge of signal processing and how it relates to analog/digital conversion and analog signal reconstruction. He seems to possess the same limitations in his knowledge that Teo_Audio illustrates above with his record example, that Millercarbon alludes to, and that whoever did the calculation w.r.t. bandwidth showed.

I will start off with the usual example. Almost all records made in the last two decades (and longer) were recorded, mixed, and mastered on digital recording and processing systems. Therefore, whatever disadvantages you think apply to digital systems w.r.t. this timing "thing" absolutely and unequivocally apply to records recorded digitally.

So back to the paper, Teo's error in logic / knowledge, and Miller's interpretation. The most recent research shows that we lowly humans can time the difference in arrival of a signal at each ear to about 5-10 microseconds. Using mainly that, plus other information, we can place the angle of a source in front of us to about 1 degree. 5 µs ≈ 1.5 mm of travel. Divide the circumference of the head by 1.5 mm and you get about 360, i.e. 1 degree of resolution. Following?
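The arithmetic above is easy to check. This is a back-of-envelope sketch with my own assumed values (speed of sound 343 m/s, adult head circumference roughly 0.56 m - neither figure is from the post), and it lands in the same ballpark as the "about 360 steps / 1 degree" claim:

```python
# Back-of-envelope check of the interaural timing arithmetic.
# Assumed values (mine, not from the post above):
speed_of_sound = 343.0        # m/s, dry air at ~20 C
head_circumference = 0.56     # m, rough adult average
itd = 5e-6                    # 5 microsecond interaural time difference

# Extra path length a sound travels in 5 us:
path_difference = speed_of_sound * itd            # metres

# How many such steps fit around the head:
resolution_steps = head_circumference / path_difference

print(f"path difference: {path_difference * 1000:.2f} mm")
print(f"distinguishable steps around the head: {resolution_steps:.0f}")
```

With these inputs the path difference comes out nearer 1.7 mm than 1.5 mm, and the step count a bit above 300 rather than 360 - same order of magnitude, so the poster's point stands either way.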

So how does the brain measure this timing? By the latest research, it appears to have two mechanisms. One works on higher frequencies - frequencies whose wavelength is shorter than the size of the head - and is based on group delay / correlation, i.e. the brain can match the same signal arriving at both ears and time the difference. The other, for lower frequencies, can detect phase, likely by a simple comparator and timing mechanism. The two overlap. Still following? You will note this happens at relatively low frequencies, i.e. still frequencies within the range identified for human hearing. I know, I know ... but the timing, what about the timing? So let's talk about that.

First a statement: in a bandwidth-limited system (as digital audio systems are), any signal on those two (or more) channels will be time-accurate to the jitter and SNR limit of the system, and NOT the sampling rate. Let me state that another way. Any difference in timing captured by a digital audio system, assuming the signal is within the frequency limits of that system, will be captured. Let me state it a third way, with an example. We have a 96 kHz ADC with 10 picoseconds of jitter. We have two identical signals, bandwidth-limited to, say, 10 kHz. One signal arrives at the first ADC 1 microsecond before it arrives at the other ADC. We then store it and play it back. What will we get? ... We will get two signals, essentially exactly the same, with one signal delayed by 1 microsecond.
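This sub-sample timing claim can be demonstrated directly. The sketch below is my own toy version of the scenario described above, simplified to a single 1 kHz tone sampled at 96 kHz: one channel is delayed by 1 µs (about a tenth of the 10.4 µs sample period), and the delay is recovered from the phase difference between the channels. The exact recovery method (a single DFT bin) is my choice for brevity; the point is only that the sub-sample delay survives sampling intact.

```python
import cmath
import math

fs = 96_000   # sample rate, as in the ADC example above (Hz)
f = 1_000     # band-limited test tone, far below Nyquist (Hz)
tau = 1e-6    # 1 us inter-channel delay, ~1/10 of a sample period
n = fs        # analyze one second: an integer number of tone cycles

# The two "recorded" channels: identical tones, one delayed by tau.
left = [math.sin(2 * math.pi * f * k / fs) for k in range(n)]
right = [math.sin(2 * math.pi * f * (k / fs - tau)) for k in range(n)]

def phase_at(signal, freq):
    """Phase of the component at `freq`, via a single DFT bin."""
    bin_sum = sum(s * cmath.exp(-2j * math.pi * freq * k / fs)
                  for k, s in enumerate(signal))
    return cmath.phase(bin_sum)

# The inter-channel phase difference recovers the delay, even though
# tau is much smaller than the sampling interval.
dphase = phase_at(right, f) - phase_at(left, f)
recovered_tau = -dphase / (2 * math.pi * f)
print(f"recovered delay: {recovered_tau * 1e6:.3f} microseconds")
```

In other words, the timing information lives in the sampled waveform's phase, not in which sample slot an event falls into - which is the poster's point about jitter, not sample rate, setting the inter-channel timing floor.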

So, all those arguments the neurobiologist made in that extensive article, all his knowledge, are for naught, because he does not understand digital signal processing, ADC systems, and analog reconstruction. If he did, he would have known that digital audio systems, within the limits of bandwidth, are not limited in inter-channel timing accuracy by the sample rate, but by the jitter. Whether the signals leave both channels at time A or time B does not matter, as long as the relationship in timing between the two channels is accurate ... which it is in digital audio systems.

.... and if you are reading this GK, not once did I need to consult wikipedia :-) ...
David, I don't quite follow your third last paragraph.

Your "first statement" is indeed a statement, capable of being true or false, but is it true? It needs justification, it seems to me. It is not the same at all as the sentence following, "Let me state that another way." And the sentences following "Let me state that a 3rd way ..." do not convince me that the phenomenon is independent of sampling rate.

The nature of the signals is irrelevant. It is the relative timing of the encoding that matters. If the sampling rate is not high enough, or the jitter rate not low enough, then two signals differing by 1 microsecond will be encoded as identical.

Perhaps an example will help you to understand my confusion. It seems to me that if sampling is done at a frequency of 1Hz, and two signals differing by 1 us are detected, they will be encoded in the same pulse about 999,999 times out of 1,000,000. Which logically implies that sampling rate is intrinsic to the issue. 

Perhaps you could point out the source of my confusion.
New here, but I found his points, yes his science, very intriguing. To the point I thought, heck, he's got it right. But I can't help but wonder: even a fully digital stream/source/path ultimately has to be reproduced through a vibrating speaker. It seems that this is a massive integration or smoothing, each connected (albeit complex) peak and trough lasting way longer than the neural timing. Accepting his points, maybe this is digital's way of getting by as well as it does. BTW, I'm not picking sides, just the way I stated it.