Just read this lighter version towards the maths and physics of these two formats:
Scientific American 2017 article
Has anyone been able to define well or measure differences between vinyl and digital?
It’s obvious, right? They sound different, and I’m sure they measure differently. We know, for instance, that the dynamic range of CDs is larger than that of vinyl.
But do we have an agreed description or agreed measurements of the differences between vinyl and digital?
I know this is a hot topic so I am asking not for trouble but for well reasoned and detailed replies, if possible. And courtesy among us. Please.
I’ve always wondered why vinyl sounds more open, airy and transparent in the midrange, and why CDs and most digital sources sound quieter yet lifeless compared with vinyl. YMMV of course; I am looking for the reasons, and for appreciation of one another’s experience.
|
There seem to be competing explanations of the characteristics of the recording formats. This is a maths and physics question, so it should be resolvable. Then there are the subjective experiences, and these are equally conflicting. There are areas of agreement, such as the higher or airier mid-range of vinyl, the lower noise of the CD format, and some agreement that SACD is better than CD. Ideas such as certain music styles suiting certain formats better (classical on vinyl, for example) depend on a technical justification that isn’t yet fully accepted. On the other hand, the civility in this conversation despite our differences has been glorious, and I hope that we can continue this way.
|
I agree with these two statements of yours. What I was referring to are approaches where the limitations of the classic theory were disregarded. Let me try to explain what I meant from a different angle, using a concrete example.
Imagine an audio signal which is a sinusoid of frequency 12 kHz, with its amplitude described as a piecewise function of two segments, each linear on a dB scale. The first segment goes from 0 to 100 dB SPL during the first half cycle of the sinusoid. The second segment goes from 100 dB SPL to below the quantization noise during the next four cycles of the sinusoid. Try to sample it with 16/44.1. Then try to reconstruct the signal from the samples. Then shift the capture time of the first sample by 1/8 of the sinusoid period. Repeat the exercise. What you’ll find is that, first, the reconstruction will be pretty rough, and second, that it will change wildly with even small shifts of the first sample capture time.
From the Fourier analysis viewpoint, this is an example of a signal with a spectrum extending significantly beyond 20 kHz, which makes sampling at 44.1 kHz untenable, and the result of the reverse transform unpredictable. Yet from the standpoint of the human hearing system, such a signal is perfectly valid, and will result in physiological reactions inside several inner hair cells. Most likely, if it manages to evoke a sensation of pitch in a particular individual, the perceived pitch frequency will be close to the intended 12 kHz. An analog system doesn’t care about the sampling frequency, or at what precise moment the first sample happens to be taken. It would capture this signal fully, with some distortions of course, yet it would capture the shape definitively. And the signal will be reconstructed definitively as well.
Imagine further that, a short time later, another signal comes in which is the exact reversal of the first one. Depending on the time difference between the starts of the two signals, the sampled values of the second signal may range from exact opposites of the first set of sampled values to something seemingly unrelated. Once again, the human hearing system, with its half-wave rectification capability, will react to the second signal in a similar way to how it reacted to the first. And once again the analog system, not restrained by sampling and time shift considerations, will capture the second signal fully.
If, on the other hand, we significantly increase the duration of the piecewise linear segments, let’s say the first segment goes up for 100 cycles and the second one goes down for 1,000 cycles, then 16/44.1 sampling with subsequent reconstruction will produce a much more agreeable result.
So, I gave an example of a signal which is meaningful and definitive from both the hearing system and analog recording standpoints, yet non-definitive from the digital sampling standpoint. I also gave an example of a signal with the same general shape, yet with a different duration of its characteristic segments, which happens to be both meaningful and definitive from all three standpoints. This illustrates the limitations of digital sampling and classic Fourier-analysis-based DSP: they work well enough in most practically encountered cases, yet not always. In contrast, analog may be worse in most cases in terms of distortions and noise, yet it works consistently in all practically encountered cases, which may be important for recording and reproduction of certain genres of music.
Increasing the sampling rate effectively rescales the problem: certain signal fragments and components which couldn’t be captured with perceptual transparency at a lower sampling rate are now captured well enough at the increased sampling rate. In the limit, sampling at an ever-increasing rate becomes perceptually equivalent to analog recording, sans the distortions and noise. At which point does that happen? It depends greatly on the characteristics of the music, and on the critical listening abilities of the person who tries to enjoy that music.
Let me clarify. I wrote "encoded" meaning that we could use the remaining, still available stream of one-bit values to encode in the same way that DSD does. Of course bits are used differently by PCM and DSD (pulse vs delta, etc.). That was to illustrate the point that the amount of information per second remaining available, if we decided to use 15 bits for encoding dynamic range, is indeed equivalent to a very lo-fi format.
To understand what I meant, look at the physical bits of the quietest PCM-encoded signal in this context. All the upper bits, which I called "used for encoding dynamic range", will be zero. It is not that these specific bits of the PCM stream would always be used for encoding dynamic range. What counts is the number of bits that we have to keep unused while encoding the quietest segment of music. Secondly, please take into account that the human hearing system is capable of adjusting its sensitivity, and symphony composers tend to use this factor fully. Symphonies typically have quiet segments, when a neighboring spectator shuffling her purse may be pretty distracting, and they also have short bursts of apotheosis, with SPL falling just short of the hearing system’s pain threshold. In the context of a quiet segment, the perceived distortion level threshold is scaled down. That’s why I do indeed consider it as if it were a full-scale signal. There are other factors of course, e.g. the equal-loudness curve shifts. Yet if we only consider the most stable part of the curve, at mid-frequencies, the rule-of-thumb calculations generally work, give or take a bit.
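As a rough back-of-the-envelope version of the bit-counting argument above, the sketch below assumes the usual rule of thumb of about 6.02 dB per bit and a hypothetical quiet passage peaking at -60 dBFS (that level is an assumption for illustration, not a figure from this thread). It ignores dither and noise shaping, so it only illustrates the reasoning and is not a measurement.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of level per bit

def bits_left(bit_depth, peak_dbfs):
    """Bits still in use for a passage whose peak sits at peak_dbfs (<= 0)."""
    unused_top_bits = -peak_dbfs / DB_PER_BIT
    return bit_depth - unused_top_bits

# Hypothetical quiet passage peaking 60 dB below full scale.
for depth in (16, 24):
    print(depth, round(bits_left(depth, -60.0), 2))
```

Under these assumptions roughly 6 bits remain at 16-bit depth and roughly 14 bits at 24-bit depth, which is the kind of gap the argument above is pointing at.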
That would depend on the nature of the music fragment, right? And on my hearing ability. In general, I didn’t claim anything of the sort. Only that, as an order-of-magnitude estimate, an amp with 0.3% THD is usually considered low quality, and an amp with 0.003% THD very high quality. The middle on a logarithmic scale, 0.03%, was cited as a threshold of quality in enough accounts that I found credible.
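For reference, the "middle on a logarithmic scale" is the geometric mean of the two THD figures. A minimal sketch of that arithmetic, using only the percentages quoted above and the standard percent-to-dB conversion:

```python
import math

def thd_percent_to_db(thd_percent):
    """Convert a THD ratio given in percent to dB relative to the fundamental."""
    return 20 * math.log10(thd_percent / 100.0)

low_quality, high_quality = 0.3, 0.003            # figures quoted above, in percent
midpoint = math.sqrt(low_quality * high_quality)  # geometric mean = middle on a log scale

for label, thd in (("low", low_quality), ("mid", midpoint), ("high", high_quality)):
    print(f"{label}: {thd:.3f}% THD = {thd_percent_to_db(thd):.1f} dB")
```

The three figures sit 20 dB apart, at roughly -50, -70 and -90 dB.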
Dithering is helpful in most practical cases. Yet, if you look at the mathematical derivations of the common dithering schemes, you’ll see that the characteristic duration of signal stability is a factor in the calculations, similarly to the examples I gave earlier in this reply. If a signal is composed of slowly changing sinusoids, dithering helps a lot. If a signal consists mostly of harmonic components quickly changing their amplitudes, non-harmonic transients, and frequently appearing/disappearing components, dithering is not as effective.
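The steady-sinusoid case is easy to sketch. The example below quantizes a 1 kHz tone with a peak of 0.4 LSB, once without dither and once with triangular (TPDF) dither, then estimates how much of the tone survives. The tone level, frequency, duration and random seed are all assumptions chosen for illustration, and the sketch says nothing about the transient case discussed above.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return round(x)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB: sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def run(n=44_100, fs=44_100, freq=1_000, amp_lsb=0.4, seed=0):
    rng = random.Random(seed)
    tone = [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]
    plain = [quantize(amp_lsb * s) for s in tone]
    dithered = [quantize(amp_lsb * s + tpdf_dither(rng)) for s in tone]
    energy = sum(s * s for s in tone)
    # Least-squares estimate of the tone amplitude surviving in each output.
    est_plain = sum(q * s for q, s in zip(plain, tone)) / energy
    est_dithered = sum(q * s for q, s in zip(dithered, tone)) / energy
    return est_plain, est_dithered

est_plain, est_dithered = run()
print(est_plain, est_dithered)
```

Without dither every sample rounds to zero and the tone is lost entirely. With dither the estimated amplitude comes out close to the original 0.4 LSB, at the cost of added broadband noise.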
I believe at this point I have provided enough explanations. Your reactions are quite typical of engineers who consider classic DSP based on Fourier analysis the only true paradigm. From my perspective, it is only absolutely true for abstract mathematical constructs. It is nothing but a useful approximation of the real world. One ought to be very careful with the corner cases, where the abstractions stray too far away from the phenomena they are supposed to model.
As I highlighted, the approach you are advocating doesn’t address the need to have some bits left available for encoding the shape of the signal faithfully enough to be perceived as distortion-free. The theory I use explains well enough why the so-called Loudness Wars can be considered a rational, professionally responsible reaction to deficiencies of the most widely used audio recording format of the time, the CD. This theory explains why some listeners still prefer listening to LP for some genres of music, despite the fact that, according to the classic theory, CD is vastly superior. Once again, this is a rational and responsible reaction. The theory explains, with precision good enough for me personally, why most professional sound mixing and mastering studios didn’t advance beyond the 24/192 PCM format. It also explains why some modern symphony recording engineers moved to 24/384 and DSD256 formats, along with other phenomena I would otherwise be unable to explain.
DSD is a delta format. Formally, general DSD has unlimited bit depth, and thus dynamic range. It is only constrained in specific versions of the format to correspond to a set bit depth at a PCM-equivalent sampling rate. The noise considerations started to amuse me lately. Practical examples were a trio of class-D power amplifiers, highly regarded by ASR. I bought them over the years, evaluated them, and quickly got rid of them, due to distortions that were intolerable for me. Yet the SINAD of these amplifiers was excellent, which made me look closely at SINAD measurement procedures. Long story short, SINAD is predicated on taking a Fourier transform, over a very long window, of a signal comprising a set of sinusoids with equal and unchanging amplitudes. Where all three failed miserably for me was the reproduction of low-signal-level transients, something SINAD doesn’t capture all that well. Yet the theory I use explained their behavior rather precisely. It also predicted which power amplifiers would be more acceptable to me.
It depends on the nature of noise and nature of signal, doesn’t it? For white noise and a short sinusoidal burst, I’d agree with you. I’m more interested in a typical music signal, with spectrum close to pink noise, masked by pink noise. In that case, having it 6 dB over the noise floor results in more reliable perception.
Not on assumptions. On theories. Fitting experimental facts. The theory I use is more sophisticated than the classic one, taking into account the analog characteristics of the human hearing system. At its simplest level, instead of considering just dynamic range, it also considers the shape of what the dynamic range is applied to. Once this is done, the preference for LP in certain situations ceases to be a mystery. The cochlea is not a Fourier transforming machine. In some regards it is more crude, yet in others it is far more advanced. As an example, it starts noticeably reacting only after observing two cycles of a pure sinusoid, virtually irrespective of frequency. For higher frequencies, at a 44.1 kHz sampling rate, this may correspond to only a few samples. The shape of a quickly changing signal can’t be faithfully captured by such a small number of samples. Once we get into signals composed of quickly appearing and disappearing components, the simple intuition good enough for the previous example no longer works, and the math becomes much heavier, yet the fundamentals remain: the higher the sampling rate (assuming equal quantization accuracy) and the greater the bit depth (assuming equal timing accuracy), the better it gets. And yes, I’m aware of the oversampling nature of practical ADCs and DACs: of the fact that internally they sample and reconstruct the signal at significantly higher rates, and then encode adjustments not only into the slower-sampled values within the signal time range, but also outside it. Still, Information Theory is a bitch. If there aren’t enough bits to encode the changes in the signal that would be noticed by the cochlea, some meaningful information will be lost. I did some experiments on fragments of music that I recorded and mixed myself. The distortions of 16/44 compared to 24/192, albeit subtle, mostly manifested themselves as uneven rhythm of smaller-volume transients.
|
@fair,
From where I am sitting you have not provided one explanation, because every single explanation or example you have used is wrong, stacking misunderstanding on top of misunderstanding. Fourier analysis is not a paradigm; it is a mathematical translation from time to frequency. It just is. The accuracy, as I previously wrote, is based on suitable bandwidth limitations and appropriate windowing functions, much of which occurs naturally in audio but is still supplemented by the appropriate analog filters, oversampling, and digital processing. People are not just guessing at the implementation without considering what the underlying waveforms can and do look like. Let me break just one section down to illustrate your logic flaws and misunderstandings. They carry through to the rest of what you have written:
You start with a flawed premise, proceed to a flawed understanding of digitization, and finish with an incorrect understanding of reconstruction.
Flawed premise: a 12 kHz sine wave does not suddenly appear, starting at 0. As I previously wrote, we are dealing with a bandwidth-limited and defined system. You cannot go from 0, silence, directly into what looks exactly like a sine wave. That transition contains content beyond 20 kHz (or whatever limit we are using). Also, the digitizer, filters, etc. will have been running and settled to the required accuracy by the time this tone burst arrives. Whatever you send it will have been limited in frequency, by design, by the analog filters preceding the digitizer.
Flawed understanding of digitization: as written above, the digitizer was already running when the tone burst arrives. Whether or not the sample clock is shifted globally by the equivalent of 1/8 of the period of a 12 kHz tone will have no impact on the digitization of the information in the band-limited analog signal.
Flawed understanding of reconstruction: when I reconstruct the analog signal from the captured data, whether I use the original clock or the shifted one, the resulting waveform will be exactly the same. Relative to the data file, all the analog information will be shifted by about 10 µs. That will happen equally on all channels. The waveforms will look exactly the same in either case. One set of data files will have an extra 10 µs of silence at the front (or at the end).
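The disagreement about the 1/8-period clock shift can be tried numerically. The sketch below is only an idealized illustration: it assumes perfect sinc (Whittaker-Shannon) reconstruction, no quantization, and two hypothetical test signals that are not taken from either post. One is a tone gated on and off abruptly and sampled with no anti-alias filter. The other is a Gaussian-windowed tone whose width (100 µs, an assumed value) keeps its spectrum well below half the sampling rate.

```python
import math

FS = 44_100.0            # sampling rate, Hz
F0 = 12_000.0            # tone frequency, Hz
SHIFT = 1 / (8 * F0)     # 1/8 of the tone period, about 10.4 microseconds

def gated(t):
    """Tone switched on abruptly at t = 0 and off after 4.5 cycles (not band-limited)."""
    return math.sin(2 * math.pi * F0 * t) if 0.0 <= t < 4.5 / F0 else 0.0

def smooth(t, sigma=100e-6):
    """Gaussian-windowed tone; its spectrum is negligible above FS / 2."""
    return math.exp(-0.5 * (t / sigma) ** 2) * math.sin(2 * math.pi * F0 * t)

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(signal, offset, t, n_half=100):
    """Sample `signal` at n / FS + offset, then sinc-interpolate back to time t."""
    total = 0.0
    for n in range(-n_half, n_half + 1):
        ts = n / FS + offset
        total += signal(ts) * sinc((t - ts) * FS)
    return total

def max_change(signal):
    """Largest difference between the two reconstructions on a 5 microsecond grid."""
    grid = [k * 5e-6 for k in range(-60, 141)]
    return max(abs(reconstruct(signal, 0.0, t) - reconstruct(signal, SHIFT, t))
               for t in grid)

gated_change = max_change(gated)
smooth_change = max_change(smooth)
print(f"abrupt burst: {gated_change:.3g}, band-limited burst: {smooth_change:.3g}")
```

Under these assumptions the band-limited burst reconstructs to the same waveform for both clock phases, to within numerical error, while the unfiltered abrupt burst does not. Which of the two cases describes a real recording chain is exactly the point in dispute: the second case corresponds to sampling with no anti-alias filter ahead of the converter.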
I am sure you believe this, but you used flawed logic, a flawed understanding of the waveform, and a flawed understanding of digitization, reconstruction, and the associated math. I went back and looked at the research. In lab-controlled situations, humans can detect a very specific signal up to 25 dB below the noise floor, A-weighted. That is not listening to music; that is an experiment designed to give a human the best possible chance. For vinyl, that means that in a controlled experiment maybe you could hear a tone at -95 dB relative to a 0 dB maximum. With CD, the same would be true at -110 dB (or lower) due to the 100% use of dithering.
To be sure we are on the same page. Class-D amplifiers are analog amplifiers. They are not digital. I will correct you. Perception of distortion. You are making an assumption of something that is there, without proof it is there.
Which theory is it that you are using? I noted many flaws in your understanding of critical elements of digital audio, and assertions that are also incorrect. I have already falsified your theory.
Perhaps not important to this discussion, but 16/44.1 is a delivery format. From what my colleagues tell me, it has not been used as a digitization format in decades, and depending on your point of demarcation, it has not been used as a digitization format since the 1980s, as all the hardware internally samples at a higher rate and bit depth.
|