As long as human beings are analog, the initial & final music will always be analog. What's in between can be digital. Digital is a compromise for an analog signal, no doubt. How good or bad it is depends on how well the digital system is engineered for the 20Hz-20kHz audio band. Digital is chosen mostly for its cost-effectiveness (DSP engines scale with shrinking CMOS technology) and for what Carver Mead once pointed out: its tremendous noise immunity. Corrupting a stream of digital data to the point of making it useless is very difficult, because it takes a lot of energy to flip a bit. Some bits do get flipped, but the overall message is very much retrievable with error-correction algorithms. This is hardly the case with a purely analog music signal.
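To make the error-correction point concrete, here is a minimal sketch of a Hamming(7,4) code in Python. It adds 3 parity bits to every 4 data bits and can locate and undo any single flipped bit. This is a simpler stand-in chosen for illustration. Red Book CDs actually use Cross-Interleaved Reed-Solomon Coding (CIRC), which is far stronger. The function names are hypothetical.

```python
# Minimal Hamming(7,4) sketch: 4 data bits -> 7-bit codeword.
# Illustrative stand-in only; Red Book CDs use CIRC (Reed-Solomon based).

def hamming74_encode(data):
    """Encode 4 bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Fix at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
received = hamming74_encode(data)
received[5] ^= 1  # channel noise flips one bit
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

The decoder recomputes the three parity checks. Read as a binary number, the checks that fail point straight at the flipped position, so the original data comes back intact.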
Having more sampling points, together with an estimation (interpolation) filter, lets the digital system track the analog waveform more closely. But whatever benefits one accrues from over/upsampling can be lost to distortion in the analog reconstruction filter. Hence my point above about implementation. Having somebody engineer a CD player (CDP) that reproduces sound well is priceless (for everything else there's MasterCard!).
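A minimal sketch of what the digital half of oversampling does, assuming an integer upsampling factor `L` and a Hann-windowed sinc as the estimation (interpolation) filter. Real CD players use a variety of filter designs, and the function name and parameters here are hypothetical. Zeros are inserted between the original samples, and the low-pass filter then fills them in.

```python
import math


def upsample(x, L, taps_per_side=16):
    """Oversample x by an integer factor L using zero-stuffing plus a
    Hann-windowed sinc low-pass (the interpolation filter)."""
    # Zero-stuffing: original samples land on every L-th output slot.
    stuffed = [0.0] * (len(x) * L)
    for i, v in enumerate(x):
        stuffed[i * L] = v

    # Filter taps h[n] for n = -half..half; cutoff at the original Nyquist
    # frequency, with a passband gain of L to make up for the inserted zeros.
    half = taps_per_side * L
    h = []
    for n in range(-half, half + 1):
        t = math.pi * n / L
        sinc = 1.0 if n == 0 else math.sin(t) / t
        hann = 0.5 * (1.0 + math.cos(math.pi * n / (half + 1)))
        h.append(sinc * hann)

    # Direct zero-phase convolution: slow, but easy to follow.
    y = []
    for m in range(len(stuffed)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = m - (k - half)  # input index paired with tap n = k - half
            if 0 <= j < len(stuffed):
                acc += hk * stuffed[j]
        y.append(acc)
    return y


# Example: a tone at 1/20 of the original sample rate, oversampled 4x.
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
y = upsample(x, 4)
```

Because the sinc is zero at every nonzero multiple of `L`, the original samples pass through unchanged and only the in-between points are estimated. Doing this filtering digitally pushes the remaining spectral images up to around `L` times 44.1 kHz, which is what lets the analog reconstruction filter that follows be gentler. It does not make that analog filter irrelevant.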
Yes, if one converts from analog->digital->analog, one does degrade the original sound. That is to be expected, as we take only a finite number of samples per second (sampling) and store each one with a finite number of bits (quantization). BTW, if we had an infinite number of samples, it would be analog! In the Red Book CD format, the powers-that-were decided in all their infinite wisdom to sample the data onto the CD at 44.1 kHz, barely above the Nyquist rate for a 20 kHz band (2 × 20 kHz = 40 kHz). Thus, no matter how much one oversamples, one can never undo this. Hence the rise of "hi-res" music formats. In fact, if the over/upsampling were A1 perfect, you'd get *exactly* what was on the CD, which is essentially Nyquist sampled!! How good is taking just over 2 samples per cycle of a dynamic music waveform at the top of the band? Not very good, I'm afraid!
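The two separate losses, sampling and quantization, can be put into numbers. A rough sketch, assuming a signal normalized to the range -1.0 to +1.0 and using the Red Book values of 44.1 kHz and 16 bits. The helper names are hypothetical, and the SQNR figure is the textbook 6.02N + 1.76 dB approximation for a full-scale sine.

```python
FS_CD = 44_100  # Red Book sample rate, Hz (sampling: how often)
BITS_CD = 16    # Red Book word length (quantization: how finely)


def samples_per_cycle(freq_hz, fs_hz=FS_CD):
    """Number of samples that fall within one period of a tone."""
    return fs_hz / freq_hz


def quantization_step(bits):
    """Size of one LSB for a signal normalized to -1.0 .. +1.0."""
    return 2.0 / (2 ** bits)


def quantize(x, bits=BITS_CD):
    """Round x to the nearest of 2**bits levels (two's-complement range)."""
    step = quantization_step(bits)
    code = round(x / step)
    code = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, code))  # clip
    return code * step


def ideal_sqnr_db(bits):
    """Textbook SQNR of an ideal quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76


print(samples_per_cycle(20_000))   # about 2.2 samples per cycle at the top of the band
print(samples_per_cycle(1_000))    # about 44 samples per cycle for a 1 kHz tone
print(quantization_step(BITS_CD))  # one 16-bit step
print(ideal_sqnr_db(BITS_CD))      # roughly 98 dB
```

Note that the time-domain coarseness only bites at the top of the band: a 1 kHz tone gets around 44 samples per cycle, while a 20 kHz tone gets about 2.2. The amplitude rounding error, by contrast, never exceeds half a step at any frequency.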
Eldartford cited his experience: 4 samples per cycle was worth every effort. I've found 5-8 samples per cycle is worth the effort. The difference is that my work is voice-related. It's not hi-res by any standard, but when people hear another voice at the other end, they do want to recognize it. That takes more samples.
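The arithmetic behind "more samples" is simply sample rate = samples per cycle × highest frequency of interest. The sketch below reads "4 samples" and "5-8 samples" as samples per cycle. It assumes a 20 kHz top frequency for music and about 3.4 kHz for narrowband (telephone-style) voice. Those band edges and the function name are assumptions for illustration.

```python
def required_rate_hz(top_freq_hz, samples_per_cycle):
    """Sample rate that puts `samples_per_cycle` samples in one period
    of the highest frequency of interest."""
    return top_freq_hz * samples_per_cycle


# Assumed band edges, for illustration only: 20 kHz for full-range music,
# 3.4 kHz for narrowband (telephone-style) voice.
for label, top in [("music", 20_000), ("voice", 3_400)]:
    for k in (2, 4, 5, 8):
        rate_khz = required_rate_hz(top, k) / 1000
        print(f"{label:5s} {k} samples/cycle -> {rate_khz:.1f} kHz")
# e.g. "music 8 samples/cycle -> 160.0 kHz", "voice 8 samples/cycle -> 27.2 kHz"
```

Under these assumptions, 5-8 samples per cycle for full-range music would mean 100-160 kHz, whereas the same ratio for narrowband voice needs only about 17-27 kHz.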
FWIW. IMHO.