@mihalis yes, asynchronous USB receives just the data and uses its own independent clock for D/A conversion. Still, any electrical connection can inject noise and alter that clock's timing. As for an optical connection, it helps, but it is not a panacea. Bits are still bits, but their arrival timing can be altered. Any system noise can make a transition (including a change in light intensity) "jagged", shifting the exact moment at which the level change is recognized (the threshold crossing). I don't know how to explain it better, but let's try this: imagine filling your tub with water in slow motion. It is supposed to stop when the water crosses a certain fill level, but the water has waves (noise), so every time you repeat it, it stops a little sooner or later. That is time jitter.
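To put rough numbers on the tub analogy, here is a minimal sketch, not a model of any real receiver. It assumes the edge is a straight ramp `v(t) = slew_rate * t` and that the noise is a random offset that stays constant during the short edge. The function name and the 20 mV / 1 V/ns values are hypothetical, chosen only for illustration. Under these assumptions the spread of the crossing time comes out as roughly noise divided by slew rate, so a slower edge turns the same noise into more timing jitter.

```python
import random
import statistics

def crossing_time_jitter(noise_rms, slew_rate, threshold=0.5, n_trials=100_000, seed=0):
    """Return the RMS spread (seconds) of the moment a noisy linear edge crosses `threshold`.

    Hypothetical sketch: the edge is v(t) = slew_rate * t (volts, seconds) and the
    noise is a random offset assumed constant for the duration of one edge.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_trials):
        n = rng.gauss(0.0, noise_rms)               # noise present on this particular edge
        times.append((threshold - n) / slew_rate)   # solve slew_rate * t + n = threshold
    return statistics.pstdev(times)

# Assumed example: 20 mV RMS noise on an edge rising at 1 V/ns
jitter = crossing_time_jitter(noise_rms=0.020, slew_rate=1e9)
print(f"{jitter * 1e12:.1f} ps")  # expected to be close to noise_rms / slew_rate = 20 ps
```

Halving `slew_rate` doubles the result, which is the "waves in the tub" effect: the more slowly the level rises past the mark, the more a given wave moves the stopping moment.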
Perhaps you know all this, but imagine that you receive a pure 1 kHz sine wave. Large electrical noise added to the signal shifts the moment the level is recognized, causing the stream to "vibrate" in time. When the words fed to the D/A converter vary in time, the result is additional signals: sideband frequencies. If the timing of the 1 kHz stream wobbles at a rate of 50 Hz (a 20 ms period), the output of the D/A converter will contain 1000 Hz, 950 Hz and 1050 Hz, three frequencies instead of one (and many more, spaced 50 Hz apart, at lower levels). The amplitude of these additional signals depends on how far the timing swings (the jitter amplitude), and it is usually very low. It can still be audible, because the sidebands are not harmonically related to the fundamental. With many noise frequencies and many affected frequencies (music), the result is added noise, which is less audible for random (uncorrelated) jitter than for jitter induced by a particular frequency (correlated).
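The 1 kHz / 50 Hz example can be checked numerically. This is a sketch under stated assumptions: the jitter is sinusoidal at 50 Hz with a 100 ns peak (a value picked for illustration, not a measured figure), the sample rate is 48 kHz, and the helper names `jittered_sine` and `tone_level` are hypothetical. A one-second capture puts every component on an exact DFT bin, so a single-bin DFT reads each level directly.

```python
import cmath
import math

def jittered_sine(f0, fm, peak_jitter, fs, duration):
    """Sample sin(2*pi*f0*t) at instants shifted by peak_jitter * sin(2*pi*fm*t)."""
    samples = []
    for n in range(int(fs * duration)):
        t = n / fs
        t_actual = t + peak_jitter * math.sin(2 * math.pi * fm * t)  # sinusoidal (correlated) jitter
        samples.append(math.sin(2 * math.pi * f0 * t_actual))
    return samples

def tone_level(x, freq, fs):
    """Amplitude at `freq` via a single-bin DFT (freq must fit a whole number of cycles in x)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

fs = 48_000
x = jittered_sine(f0=1000, fm=50, peak_jitter=100e-9, fs=fs, duration=1.0)
for f in (900, 950, 1000, 1050, 1100):
    level_db = 20 * math.log10(tone_level(x, f, fs))
    print(f"{f:5d} Hz: {level_db:7.1f} dB")
```

Timing jitter of this kind is phase modulation with peak deviation 2π·f0·Δt, so for small jitter each first sideband should sit near π·f0·Δt relative to the carrier (about -70 dB with these assumed values), and the 900/1100 Hz pair much lower still.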