@dougthebiker, thanks.
Sound is not represented digitally by square waves; there are no square waves in digital audio. Square waves do appear (before filtering) in inverters that generate AC from DC, but that’s a totally different story.
The amplitude of the signal at each sampled point in time is represented by a number in binary form. Silence is 0, and the loudest peaks are the largest positive and negative values the chosen number of bits allows. The more bits (0s and 1s) used per value (e.g. 24 vs 16), the finer the resolution, i.e. the smaller the step between the closest levels. With 16 bits there are 2^16 = 65,536 distinct levels; in CD audio these are signed, running from -32768 to +32767.
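A quick way to see what bit depth buys you: the sketch below counts the levels for 16 and 24 bits and rounds an amplitude between -1.0 and 1.0 to the nearest signed integer level. The helper names and the -1.0..1.0 scaling are illustrative assumptions, not the definition used by any particular format or DAC.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample of `bits` bits can take."""
    return 2 ** bits


def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a signed (two's complement) sample."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def quantize(amplitude: float, bits: int) -> int:
    """Round an amplitude in [-1.0, 1.0] to the nearest signed integer level.

    Hypothetical helper: scales by the largest positive value and clamps.
    """
    lo, hi = signed_range(bits)
    value = round(amplitude * hi)
    return max(lo, min(hi, value))


for bits in (16, 24):
    lo, hi = signed_range(bits)
    print(f"{bits}-bit: {levels(bits):,} levels, {lo} to {hi}")
# 16-bit: 65,536 levels, -32768 to 32767
# 24-bit: 16,777,216 levels, -8388608 to 8388607

print(quantize(0.25, 16))  # 8192
```

Going from 16 to 24 bits multiplies the number of levels by 256, so each step between adjacent levels becomes 256 times smaller.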
The sampling rate, i.e. how often the signal is sampled, determines the frequency response, or more precisely, the highest frequency that can be transmitted: at most half the sampling rate. For instance, CD audio is sampled 44,100 times per second, so the theoretical limit is 22.05 kHz; in practice, filtering leaves a usable range up to about 20 kHz.
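The half-the-sampling-rate limit (the Nyquist frequency) can be checked numerically. This minimal sketch computes it for the CD rate and shows what goes wrong above it: a 30 kHz tone sampled 44,100 times per second yields the same sample values as a 14.1 kHz tone (aliasing), which is why content above the limit has to be filtered out before sampling. The function names are illustrative only.

```python
import math

CD_RATE = 44_100  # samples per second


def nyquist(sample_rate: float) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate / 2


def sample_cosine(freq: float, sample_rate: float, count: int) -> list[float]:
    """Take `count` samples of a cosine tone at `freq` Hz."""
    return [math.cos(2 * math.pi * freq * n / sample_rate) for n in range(count)]


print(nyquist(CD_RATE))  # 22050.0

# A 30 kHz tone lies above the Nyquist frequency, so its samples are
# indistinguishable from those of a 44.1 - 30 = 14.1 kHz tone (aliasing).
high = sample_cosine(30_000, CD_RATE, 100)
alias = sample_cosine(CD_RATE - 30_000, CD_RATE, 100)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, alias)))  # True
```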
What you get in digital sound is a stream of numbers in binary format, each representing the signal’s amplitude at one sampled point in time during the music. The frequency, or pitch, of the sound is determined by how those values rise and fall from one sample to the next. The DAC converts the numbers to analog voltages and "connects" or "smooths" them into a continuous waveform. Digital transmission devices do not modify the number stream; they only ensure it is received exactly as it was sent, or report errors.
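To make the "stream of numbers" concrete, here is a minimal sketch assuming plain 16-bit PCM at the CD rate: it generates a 440 Hz tone, packs it into bytes the way a WAV file stores samples (little-endian signed 16-bit), unpacks them again, and estimates the pitch purely from how consecutive values cross zero. The helper names are hypothetical, the zero-crossing estimate is a crude illustration rather than how any player works, and the DAC’s job of turning the numbers back into a voltage is not modeled.

```python
import math
import struct

RATE = 44_100   # CD sampling rate
PEAK = 32_767   # largest signed 16-bit value


def make_tone(freq: float, seconds: float, amplitude: float = 0.5) -> list[int]:
    """Signed 16-bit samples of a sine tone (illustrative, no dither)."""
    count = int(RATE * seconds)
    return [round(amplitude * PEAK * math.sin(2 * math.pi * freq * n / RATE))
            for n in range(count)]


def to_bytes(samples: list[int]) -> bytes:
    """Pack samples as little-endian signed 16-bit integers, as in WAV files."""
    return struct.pack(f"<{len(samples)}h", *samples)


def from_bytes(data: bytes) -> list[int]:
    """Unpack little-endian signed 16-bit integers back into samples."""
    return list(struct.unpack(f"<{len(data) // 2}h", data))


def estimate_pitch(samples: list[int]) -> float:
    """Rough pitch estimate: count upward zero crossings per second."""
    rising = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return rising * RATE / len(samples)


tone = make_tone(440, 1.0)
stream = to_bytes(tone)          # the bytes a network or USB link would carry
received = from_bytes(stream)    # an error-free link delivers identical bytes
print(received == tone)          # True
print(estimate_pitch(received))  # roughly 440
```

Nothing in the byte stream says "440 Hz"; the pitch only emerges from the pattern of values over time.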
For simplicity, take 4 bits: 0001 = 1 and 1111 = 15. Changing just one bit of 0000 (= 0) can give 1000 (= 8), which is 8 higher, not 1. A single wrong bit can therefore make a big difference, and the packet transfer protocols in both Wi-Fi and Ethernet check for errors. If your Wi-Fi signal is strong and stable and you have sufficient bandwidth, you get the same stream as over Ethernet; otherwise you get dropouts or buffering.
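Both points can be illustrated in a few lines: flipping one bit can change a value by far more than 1, and a checksum lets the receiver notice. The sketch uses `zlib.crc32`, the standard CRC-32 (the same polynomial Ethernet uses for its frame check sequence); the 4-byte "payload" is a made-up stand-in for audio data, and real network hardware computes the check itself and discards damaged frames rather than passing them on.

```python
import zlib


def flip_bit(value: int, bit: int) -> int:
    """Return `value` with one bit inverted (bit 0 = least significant)."""
    return value ^ (1 << bit)


# The 4-bit example: flipping the most significant bit of 0000 gives 1000.
print(format(flip_bit(0b0000, 3), "04b"), flip_bit(0b0000, 3))  # 1000 8
print(format(flip_bit(0b0000, 0), "04b"), flip_bit(0b0000, 0))  # 0001 1

# Link-layer style check: the sender attaches a CRC-32 to the payload,
# and the receiver recomputes it. Any single flipped bit changes the CRC.
payload = bytes([0x12, 0x34, 0x56, 0x78])   # hypothetical audio sample bytes
sent_crc = zlib.crc32(payload)

corrupted = bytearray(payload)
corrupted[1] = flip_bit(corrupted[1], 7)    # simulate one bit error in transit

print(zlib.crc32(payload) == sent_crc)           # True: intact data passes
print(zlib.crc32(bytes(corrupted)) == sent_crc)  # False: the error is detected
```

A CRC-32 is guaranteed to catch every single-bit error, so a corrupted packet is flagged instead of quietly becoming slightly different sound.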
For whatever it’s worth, there is a technical review of the UpTone Audio EtherRegen on audiosciencereview.com. You can look it up.