Okay, I'll take another stab at helping some of the folks around here comprehend what is going on under the hood with digital audio.
First off, the guys who are in the camp of "no perceivable difference" when it comes to streaming seem to have huge gaps in their thought process.
And I've run into many people over the years who work for various manufacturers and think along the same lines. They are engineers. I've even personally witnessed the CEO of a very well known high-end audio music server company tell an individual who had no interest in audio that he didn't understand why people spent so much money on his reference product.
As a side note, when it comes to the software side of digital audio, I know many software engineers at these audio streaming companies who have very little experience with high-performance audio systems in real-world situations. Depending on the brand, they may not even be involved in listening to anything as part of their process, and leave that to the guys further down (or is it up?) the chain. I've sat in a room full of DTS engineers, for example, and none of them had any experience with reference-level systems in domestic, real-world environments. I've had audio recording engineers who produce award-winning music tell me point blank that they've never run a live event and wouldn't feel comfortable doing so. My point is, there are many people working within the industry (let alone consuming within it) who are simply parroting information gathered from colleagues, without ever having spent the real time with audio systems needed to back up the claims they make.
Anyway, back to the gaps, and the topic at hand -
A network streamer is a computer. This computer can be a typical PC/Mac or what have you, an integrated chipset as a part of a DAC board, a Raspberry Pi, etc.
The computer requires an operating system to run. This can be a desktop OS like macOS or Windows, or a headless Linux system. It could be a Linux-based OS customized specifically for music playback (such as Roon's), or an off-the-shelf distribution like Ubuntu or Arch Linux.
Next you have the audio playback software. If it's a PC or Mac, it's any number of programs one can download and install. If it's Roon, it's Roon. If it's MPD on a headless Linux box, it's MPD. If it's Sonos, it's whatever Sonos is using for this process (likely MPD or something similar). If it's Pro Tools, it's Pro Tools.
At this stage any number of processes can be applied, such as bypassing the kernel audio mixer in the case of a traditional OS. This is also where the application parses metadata into usable information.
The player software can also treat the digital audio in a number of different ways before it is sent out as packet data over whatever interface one is using: up-sampling/oversampling, DEQ, compression, etc.
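To make the "treatment" step concrete, here's a minimal sketch of one such transformation, a naive 2x upsampler using linear interpolation. This is a toy for illustration only; real player software uses proper band-limited (polyphase/sinc) resamplers, and nothing here is taken from any actual player.

```python
# Naive 2x upsampler by linear interpolation -- a toy illustration of
# one transformation a player might apply before sending data downstream.
# Real players use band-limited (polyphase/sinc) resamplers instead.

def upsample_2x(samples):
    """Return the signal at twice the sample rate via linear interpolation."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)            # original sample
        out.append((a + b) / 2)  # interpolated midpoint
    out.append(samples[-1])      # keep the final sample
    return out

print(upsample_2x([0.0, 1.0, 0.0]))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

The point isn't the math; it's that the bits leaving the streamer can already differ from the bits in the file before any interface is involved.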
This is where things get tricky. At this point the player software needs to send the audio data downstream to the DAC. It needs to do this over an interface. The interface can be serial packet data sent over USB, Firewire, Ethernet, or other proprietary serial links. The interface can also be SPDIF stream data, or I2S linked data.
How the player software delivers the data to the interface, and what interface is being used by the streamer to the DAC, can dramatically affect performance. Please consider the following.
In order to link the data stream to the DAC, the DAC needs to properly recover the clock (and determine the word length and sampling rate) embedded in the data. The DAC can clock the data to itself using an asynchronous transfer method in which the DAC's clock is the master clock (this is how most USB and Firewire interfaces function). If using SPDIF, the DAC's clock needs a phase-locked loop to synchronize its frequency with the sending clock. If the DAC is well designed for use with SPDIF, it will typically have independent PLL circuits for each sampling rate. If using I2S, a master clock is shared between the DAC and the sender.
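The SPDIF case can be sketched as a simple frequency-tracking loop: the receiver repeatedly nudges its local clock toward the incoming reference, which is the basic idea behind a PLL lock (real PLLs track phase with tuned loop filters; the gain and step count here are made up purely for illustration).

```python
# Toy frequency-tracking loop: the receiver nudges its own clock toward
# the sender's, the way an SPDIF receiver's PLL locks to the embedded
# clock. Gains are illustrative, not taken from any real receiver chip.

def pll_lock(ref_hz, local_hz, gain=0.2, steps=50):
    """Iteratively pull local_hz toward ref_hz; return the locked value."""
    for _ in range(steps):
        error = ref_hz - local_hz  # frequency error vs. the sender
        local_hz += gain * error   # proportional correction each step
    return local_hz

locked = pll_lock(ref_hz=44100.0, local_hz=44080.0)
print(round(locked, 3))  # converges to ~44100.0
```

How cleanly this lock happens (and how much jitter the loop passes through) is exactly where SPDIF implementations differ.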
In the case of USB asynchronous transfer, there is potential for data loss in transmission, which can result in intersample-overs or just plain dropouts/scratchiness in the signal. I've experienced this with so many different varieties of USB DACs over the years that I'm surprised it is still such a popular interface for audio enthusiasts.
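For anyone unfamiliar with the term, an intersample-over is when every stored sample sits within full scale, yet the reconstructed waveform between samples exceeds 0 dBFS. Here's a self-contained sketch of the classic textbook case (a quarter-rate sine sampled 45 degrees off its peaks), using a truncated sinc reconstruction; this illustrates the phenomenon itself, not any particular USB failure mode.

```python
import math

# Intersample overs illustrated: every stored sample is within full
# scale (|x| <= 1.0), yet the band-limited waveform between the samples
# exceeds 0 dBFS. Classic case: an fs/4 sine sampled 45 deg off-peak.

def sinc_value(samples, t):
    """Truncated-sinc reconstruction of the waveform at fractional time t."""
    total = 0.0
    for n, x in enumerate(samples):
        u = t - n
        total += x if u == 0 else x * math.sin(math.pi * u) / (math.pi * u)
    return total

# fs/4 sine, 45 deg phase: the samples repeat 1, 1, -1, -1 (full scale)
samples = [math.sqrt(2) * math.sin(math.pi * n / 2 + math.pi / 4)
           for n in range(400)]
peak = sinc_value(samples, 200.5)  # a midpoint far from the edges
print(round(peak, 3))              # well above 1.0 (about +3 dBFS)
```

A DAC or downstream DSP that reconstructs this signal has to handle roughly 3 dB of headroom that the sample values alone don't show.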
Firewire might still be used in studios, and at the time it was a better link for professionals in most applications, but it's mostly irrelevant now, so I'm not going to spend much time on it.
In the case of Ethernet transmission on a typical local area network, the streamer can send the real-time audio data out over the network using a number of different methods. As far as I am aware, nearly all of these methods are UDP-based protocols - AES67, Dante, Ravenna, AirPlay, Roon, Songcast, etc.
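The UDP approach can be sketched in a few lines, assuming nothing about any particular protocol's actual wire format: datagrams carrying a sequence number are fired off with no acknowledgment, so the receiver can detect a loss from a gap in the numbers but nothing is ever retransmitted. The 4-byte header here is invented purely for illustration.

```python
import socket, struct

# Fire-and-forget audio over UDP on loopback: each datagram carries a
# sequence number plus raw sample bytes. Lost datagrams are never
# retransmitted -- the receiver just sees a gap in the sequence numbers.
# The 4-byte header layout is made up for illustration.

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))           # let the OS pick a free port
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

for seq in range(3):
    payload = bytes([seq]) * 8      # stand-in for PCM sample data
    tx.sendto(struct.pack("!I", seq) + payload, rx.getsockname())

received = []
for _ in range(3):
    datagram, _ = rx.recvfrom(2048)
    received.append(struct.unpack("!I", datagram[:4])[0])

tx.close()
rx.close()
print("packets received:", received)
```

Real-time protocols accept this trade because a retransmitted packet would arrive too late to play anyway; the price is that a congested network shows up directly as audible gaps.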
Alternatively, if the streamer (computer and player software) is embedded in the device containing the receiving DAC, then the streamer can use TCP/IP to download the file data into a memory buffer where it can be managed better. Essentially, this is what Sonos, Bluesound, Marantz, Yamaha, Sony, Linn, Naim, etc. are doing when you use their native applications to "stream" audio from any number of services, or from your local content library. In the case of real-time internet radio streams, or services like Pandora, the streams are "captured" using UDP from the sending player. This is why users of any of the above products may experience dropouts and/or interruptions in those streams.
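The TCP-into-a-buffer approach looks roughly like this minimal loopback sketch (all names and sizes invented for illustration): TCP retransmits lost segments, so the bytes that land in the buffer are bit-perfect; the cost is variable arrival timing, which the buffer exists to absorb.

```python
import socket, threading

# TCP download into a memory buffer, the way an embedded streamer pulls
# a track (or large chunks of it) ahead of playback. TCP retransmits
# lost segments, so the buffered bytes are bit-perfect; the cost is
# variable latency, which the buffer absorbs. Names/sizes are made up.

TRACK = bytes(range(256)) * 64            # 16 KiB stand-in for a music file

def serve(listener):
    conn, _ = listener.accept()
    conn.sendall(TRACK)                   # server streams the "track"
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
buffer = bytearray()                      # the playback buffer
while True:
    chunk = client.recv(4096)             # arrives whenever the network allows
    if not chunk:
        break
    buffer.extend(chunk)                  # playback would drain from here

client.close()
listener.close()
print("buffered", len(buffer), "bytes, bit-perfect:", bytes(buffer) == TRACK)
```

As long as the buffer stays ahead of playback, network hiccups are invisible; this is the fundamental advantage the buffered-download devices have over real-time capture.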
This brings us to the question of where the streamer (computer/player software) obtains the file. The files can come from a network share on the local or wide area network, from local storage (USB, SATA, SD card, etc.), or from a "captured" stream of a file being played on a remote server somewhere. All of these solutions have potential for degradation of the audio depending on how the data is sent out by the server and obtained by the client. For example, Spotify's entire library is stored at the very least lossless (I happened to have a conversation with someone who manages their storage arrays), but the audio is transcoded to 320 kbps OGG before it is received by their API on a client.
This all becomes far more complicated when you introduce other devices on the network, all competing for the same bandwidth and protocol traffic.
Anyone who wishes to still believe, after this thorough analysis, that "all streamers are the same", simply hasn't done their homework.