I think we should be asking the question, "How many samples per waveform are required to reduce the RMS error to below, say, 5%, which is the sort of error achieved during the heyday of the vinyl years?"
Some types of error may be more or less objectionable, but let's start simple. Let's just find out how much RMS error there is for a given sampling scheme.
Surprisingly enough, it's not that hard to calculate. But shockingly, nobody seems to bother.
To calculate it, begin by observing that, by Fourier's theorem, any reasonably well-behaved periodic function can be built up as a sum of sine waves, so that to consider music, all we have to consider are sine waves (aka pure tones). Further, it is not hard to compute the difference between a sine wave and its sampled value at any point, for any fixed number N of samples per waveform. You can approximate by just slicing the waveform into N intervals and then calculating the difference at the midpoint of each interval.
It is also easy to square these differences, average them, and take the square root. You could use calculus, but the above is an adequate approximation.
That is the essence of a computation yielding the RMS error of the sampling scheme per waveform.
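For anyone who wants to try it, here is a minimal sketch of that midpoint computation in Python. It rests on a couple of assumptions of mine that are not spelled out above: each sample is taken at the start of its interval and held flat (step-function decoding), and the error is expressed relative to the RMS of the sine wave itself. The function name `relative_rms_error` is just for illustration. The exact figure depends on those assumptions, so treat the output as illustrative rather than definitive.

```python
import math


def relative_rms_error(n: int) -> float:
    """Midpoint estimate of the RMS error for n samples per waveform.

    Assumptions (mine, for illustration): unit-amplitude sine, each sample
    taken at the start of its interval and held flat across it, and the
    result normalized by the RMS of the sine itself (1/sqrt(2)).
    """
    sum_sq = 0.0
    for k in range(n):
        # Value the step-function decoder holds over interval k.
        held = math.sin(2 * math.pi * k / n)
        # True value of the sine at the midpoint of interval k.
        actual = math.sin(2 * math.pi * (k + 0.5) / n)
        sum_sq += (actual - held) ** 2
    # Root of the mean of the squared differences.
    rms_error = math.sqrt(sum_sq / n)
    return rms_error / math.sqrt(0.5)


if __name__ == "__main__":
    for n in (10, 50, 100, 250, 500):
        print(f"N = {n:4d}: relative RMS error = {relative_rms_error(n):.2%}")
```

Changing where the sample sits in the interval, or what you divide by (peak amplitude instead of RMS, say), changes the numbers, which is part of why the question deserves to be pinned down.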
Returning to our question, the answer I get is 250 samples per waveform for step-function decoding. At 20 kHz, that means sampling at 5 MHz - with infinite precision, of course.
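The conversion from samples per waveform to a sampling rate is just a multiplication by the tone frequency. A tiny sketch, with a hypothetical helper name of my own choosing:

```python
def required_sample_rate_hz(samples_per_waveform: int, tone_hz: float) -> float:
    """Sampling rate needed to get the given number of samples per cycle of a tone."""
    return samples_per_waveform * tone_hz


if __name__ == "__main__":
    # 250 samples per waveform at a 20 kHz tone: 250 * 20,000 Hz = 5,000,000 Hz.
    rate = required_sample_rate_hz(250, 20_000)
    print(f"{rate / 1e6:g} MHz")
```

The same helper makes it easy to see what other samples-per-waveform figures would cost at the top of the audio band.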
Exotic decoding algorithms can improve on this for pure tones, but how well do they work for actual music? I doubt anyone knows - certainly I've never seen even the first question, about samples per waveform, discussed. I think we should.