Why does a music server require high CPU processing power?


I noticed that some music servers use, for example, dual multicore CPUs running under a custom-assembled operating system.  In addition, the server is powered by a linear power supply with choke regulation and a large capacitor bank using the highest audiophile-grade capacitors.  Various other music servers have similar high CPU processing capabilities.

I know that music is played in real time, so there is not much time to do any large amount of processing.  I also know that the data stream needs to be free of jitter and all other forms of extra noise and distortion.   I believe that inputs and outputs are happening at the same time (I think).

I also know that music servers need to support file formats of FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, and DFF; native sampling rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 352.8kHz, 384kHz, 705.6kHz, and 768kHz; and DSD formats of DSD64, DSD128, DSD256, and DSD512, including bit depths of 16 and 24.

Why does a music server require high processing power?   Does supporting the list of formats above require high processing power?  Assuming the music server is not a DAC or a pre-amp, what is going on that requires this much processing power?

What processing is going on in a music server?  How much processing power does a music server require?  

Am I missing something?   Thanks.   


hgeifman
Nearly all the CPU-intensive activities you cite (except for equalization) are the responsibility of the DAC.


Um, what?? Depends. In my world, a DAC takes S/PDIF in, or USB in, and produces an analog signal out.

While upsampling may be done by the DAC, file decoding and streaming are not. Many manufacturers choose to implement their own upsampling algorithms before the DAC.

I’d also like to point out that the latency issue you raised does not affect sound quality, only the time it takes after pressing Play until the music starts.

You misunderstand how I meant latency. I did not mean latency to user input; I meant the time between a CPU starting to process a sample and the time it is done with it, also known as CPU time. Any processing a CPU does takes time, and we can estimate the worst-case boundaries. For a 44.1kHz signal, the CPU must complete all of its work in 1/44,100 of a second and have that sample of data ready for the DAC. If the signal has a higher sample rate, or is being upsampled, the deadline shrinks: the CPU must complete its work in 1/88,200 of a second for an 88.2kHz sample or upsample.

That’s about 11 microseconds. That’s the absolute maximum amount of time the CPU is allowed to take, or else the output will not keep up with the input. In that window it must do at least the format conversion, EQ, and upsampling. This is in addition to any network housekeeping and UI interactions, and in addition to responding to the DAC clock saying "Gimme the next one!"

So, if a CPU core takes EXACTLY 11 microseconds to process this sample, it has ZERO time to process anything else. It has no time for network housekeeping, library management, or UI responsiveness. Any additional work would get queued and the audio output would stall.
If the CPU takes half that, about 5.5 microseconds, then the CPU core is only 50% utilized and has processing power left to schedule other work.
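The arithmetic above can be written down in a few lines. This is a minimal sketch (Python chosen purely as illustration; the 88.2kHz rate and the timings are the examples from this post, not measurements of any real server):

```python
# Sketch of the per-sample budget and utilization arithmetic described above.
# Illustrative numbers only.

def sample_period_us(sample_rate_hz: float) -> float:
    """Time between successive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz

def cpu_utilization(work_us: float, sample_rate_hz: float) -> float:
    """Fraction of each sample period spent processing one sample."""
    return work_us / sample_period_us(sample_rate_hz)

budget = sample_period_us(88_200)           # ~11.3 us per sample at 88.2 kHz
print(round(budget, 1))                     # 11.3
print(cpu_utilization(budget / 2, 88_200))  # 0.5 -> 50% utilized, headroom left
```

If `cpu_utilization` ever reaches 1.0, the core has no headroom at all and any extra work stalls the output, exactly as described above.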

It’s important to stress that you must have input/output flow balance. That is, the rate of processing must be equal to or faster than the rate of output. You can’t make up for a sustained shortfall with a longer delay before a track starts, or with larger buffers, unless the buffer holds a large fraction of the whole track. Imagine a CPU that takes 22 microseconds to process an 88.2 kHz sample, and that the track is 5 minutes long. That means the CPU won’t finish with the last sample for 10 minutes. You’d need a five-minute buffer.  You press Play, wait five minutes until the buffer fills, and then the music starts.  Ten minutes after you hit Play, the 5-minute track finally finishes playing.
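To make the flow-balance point concrete, here is a minimal sketch of the prebuffer arithmetic, using this post's hypothetical half-real-time CPU (Python purely as illustration):

```python
# Sketch: minimum wait before playback can start without ever stalling,
# given how long the CPU takes per sample. Numbers mirror the hypothetical
# above: a 5-minute track processed at half real-time speed.

def prebuffer_seconds(track_s: float, period_us: float, work_us: float) -> float:
    """Wall-clock wait needed so playback never outruns processing."""
    speed = period_us / work_us        # audio seconds produced per wall second
    if speed >= 1:
        return 0.0                     # real-time or faster: no prebuffer needed
    total_wall = track_s / speed       # wall time to process the whole track
    return total_wall - track_s        # start playback this long after Play

period = 1_000_000 / 88_200            # ~11.3 us between samples at 88.2 kHz
wait = prebuffer_seconds(5 * 60, period, 2 * period)  # CPU takes twice the period
print(wait / 60)                       # 5.0 -> wait 5 min; track ends 10 min in
```

Note that the wait grows with track length: a sustained processing deficit can never be hidden by a fixed-size buffer.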

Lastly, we should of course note that we are talking about stereo samples. That is, at 88.2 kHz we must complete the decompression, EQ, and upsampling of two samples every 11 microseconds.

So, if you ask me, how much CPU power do you need?

Well, you need enough CPU power to completely process every sample in real time, plus handle all the additional work.

We can calculate the maximum allowed compute time this way:

seconds = 1 / sample rate

So, you must use a CPU capable of meeting this, AND still have enough power to handle the other events that are happening in semi-real time. This calculation provides the minimum boundary.
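For reference, here is the formula above worked out for the PCM sampling rates the original question lists (a throwaway sketch in Python):

```python
# Per-sample time budget (1 / sample rate) for each PCM rate from the
# original question, printed in microseconds.

RATES_HZ = [44_100, 48_000, 88_200, 96_000, 176_400, 192_000,
            352_800, 384_000, 705_600, 768_000]

for rate in RATES_HZ:
    budget_us = 1_000_000 / rate
    print(f"{rate / 1000:7.1f} kHz -> {budget_us:6.2f} us per sample")
```

At 768 kHz the budget is only about 1.3 microseconds per sample, which is why the highest supported rate, not the average one, drives the CPU sizing.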

Of course, for many reasons, having the CPU at 100% is not a good thing in an interactive system or an RTOS, so a target of 50% CPU utilization in an embedded CPU is reasonable. The amount of CPU power, if we think of it in terms of MIPS or SPECints or FLOPS, is therefore dependent on the work that must be done in real time. Do less work, and you can get a cheaper, slower CPU.

I don’t really think it’s "a lot" either. I mean, we live at a time when an 8 GByte, 4-core, 64-bit ARM-based Raspberry Pi sells for $120 and is fully capable of working as a desktop PC. It is important, however, to note that unless you know the architecture, clock speed, cores, and threads, you don’t really know much about "how much" compute power you have. You can’t say "well, I have 4 cores, so I can run Grand Theft Auto." They’re all small, rectangular, and have a lot of pins on them. That doesn’t make them interchangeable in terms of the work they can do per unit time.

For instance, your average router has a pretty beefy, multicore Broadcom system-on-a-chip (including CPU) in it too, and all it’s doing is moving packets around. Is it "too much"? I doubt it.


Best, E

@lalitk, Sorry, I do not understand and need more clarification.
  
Let us assume a normal single-purpose music streamer (no Roon, DAC, or preamp) that needs to support an input stream of albums with file formats of FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, and DFF; native sampling rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 352.8kHz, 384kHz, 705.6kHz, and 768kHz; and DSD formats of DSD64, DSD128, DSD256, and DSD512, including bit depths of 16 and 24.   I agree this is a lot of information.

What processing is the streamer doing on these files, for example, an AIFF coded album stream?    

FIRST, I know the streamer needs to prepare the input for output over USB, AES/EBU, etc. (I think).   Does this process require large amounts of processing power (probably yes)?

SECOND, in addition, what does the streamer do to the streaming input to prepare it to be accepted by the DAC?   Let us assume an AIFF-coded album.   What exactly is involved in preparing the AIFF album file (or any of the other formats) for input to the DAC, and does this require large amounts of processing power (probably yes)?

Your post above (and someone else's) said "It takes a tremendous amount of processing power to compute all the different ways digital has tried to keep up with analog".  I think you are saying that in order to support the large number of input formats, high amounts of processing power are required to prepare the AIFF-coded album for output to the DAC.  Is this correct?

Assuming the above is true, it means the server's processing power is used to prepare the input data stream with the correct formatting for USB, AES/EBU, etc., AND then format it, as needed, for input to the DAC.   Based on my understanding, I think this sentence answers my original question.   Am I correct?

Or, are there other processes going on in the streamer that requires high processing power?   If yes, what are they?   Thank you very much.


To get back to the OP's question: there is nothing going on in a dedicated home-based music server that necessitates dual 20-core Xeon processors and 48 GB of RAM, unless you're starting your own streaming service. A NAS can run a music server; perhaps not Roon, but Plex or something similar.
Roon may overestimate its CPU requirements, mostly because they want to specify a CPU that is capable of doing all the upsampling you might ask it to do.

I’m running an 8-year-old AMD A10 with no issues, but if I attempt high-rate DSD upsampling it won’t keep up. That also sounds bad IMHO, so I lose nothing.

Also, Roon does all the EQ and upsampling in the server, so you need to account for that for every DAC endpoint.
But overall, I think it’s pretty light. I’ve also seen Logitech Media Server run on routers. That’s extremely lightweight.
Let us assume an AIFF-coded album.   What exactly is involved in preparing the AIFF album file (or any of the other formats) for input to the DAC, and does this require large amounts of processing power (probably yes)?

Think of audio compression like a zip file: the music has already been compressed, the codec uncompresses it, and then software and firmware in the device send it on to the DAC. Unless some sort of manipulation of the file is being done, like upsampling or EQ, it's not a very CPU-intensive operation.
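The zip-file analogy can be shown directly. This sketch uses Python's `zlib` as a stand-in for a real audio codec; FLAC and ALAC use different algorithms, but the lossless round-trip property is the same:

```python
# Analogy only: zlib stands in for FLAC/ALAC. A lossless codec, like zip,
# reproduces the original bytes exactly on decompression, and decompression
# is cheap compared with DSP work such as upsampling or EQ.
import zlib

pcm = bytes(range(256)) * 1_000           # stand-in for raw PCM audio bytes
compressed = zlib.compress(pcm, level=9)  # "encode" step (done once, offline)
restored = zlib.decompress(compressed)    # "decode" step (done at play time)

print(restored == pcm)                    # True -> bit-perfect round trip
print(len(compressed) < len(pcm))         # True -> smaller on disk
```

That bit-perfect round trip is why a lossless file sounds identical to the original WAV: the decode step recreates exactly the same samples, and recreating them takes far less CPU than transforming them.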