The developer of this player did answer another person who asked similar questions. I guess he keeps getting the same questions and hence isn't finding time to respond to each one. The player loads the file into RAM, performs an "optimization" (the specifics aren't described, but it looks like it's there in the code), waits for a couple of minutes, and then stores the file back to the drive.
Regarding user preference, I'm actually not fond of the result of the first optimization: it sounds a bit distant and veiled, even though it is clearer than the original file. But run the same file through the optimization process 3-4x and the veil is gone, while the clarity remains (and actually gets better), and now it is definitely much better than the stock file in all aspects. If the users who preferred the original files were comparing them against a single optimization pass, I recommend giving 3-4x optimization a try. At present it doesn't seem possible to optimize multiple files at once, so that's a cumbersome task. I can hear the differences for sure, and I am working on getting a true double-blind test done (it's not easy to set one up that doesn't have loopholes).
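Since batch and repeated passes came up, here is a minimal sketch of how the visible steps (load into RAM, wait, store back) could be looped over several files. The player's actual "optimization" step is not documented, so it is deliberately left out; the function names, the 120-second default wait, and the SHA-256 integrity check are my own assumptions, not the developer's code. Because it rewrites files in place, run it on copies.

```python
import hashlib
import os
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash of the file as read back (may be served from the OS page cache)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def rewrite_cycle(path: Path, wait_seconds: float) -> None:
    """One pass of the described steps: load into RAM, wait, store back."""
    data = path.read_bytes()            # load the whole file into RAM
    time.sleep(wait_seconds)            # the player reportedly waits a couple of minutes
    with open(path, "wb") as f:         # store the same bytes back in place
        f.write(data)
        f.flush()
        os.fsync(f.fileno())            # ask the OS to push the write out to the drive


def batch_rewrite(paths: list[Path], passes: int = 3,
                  wait_seconds: float = 120.0) -> dict[Path, str]:
    """Repeat the cycle `passes` times for each file, one file after another."""
    digests: dict[Path, str] = {}
    for path in paths:
        original = sha256_of(path)
        for _ in range(passes):
            rewrite_cycle(path, wait_seconds)
            # The stored bits must come back identical after every pass.
            if sha256_of(path) != original:
                raise RuntimeError(f"{path}: contents changed during rewrite")
        digests[path] = original
    return digests
```

For example, `batch_rewrite(sorted(Path("Music").glob("*.flac")), passes=4)` would run four passes over every FLAC file in a (hypothetical) `Music` folder.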
Regarding why it works, I think it is well within conventional physics; we just need to analyze it more deeply than our current FFT analysis methods allow (at present we mostly analyze only a very small subset of test tones, and I don't think conclusive results can be obtained from that). In a typical SSD, every bit is stored as charge in a cell (usually a floating-gate or charge-trap NAND flash cell), while a hard disk stores bits as magnetized regions on a platter. The conditions under which the write happens can likely manifest as differences in the stored charge (or magnetization), such that the next access after optimization has either less noise or less correlated noise. Also note that RAM and normal storage work in different ways: PC RAM is DRAM (dynamic random-access memory), which is volatile and needs constant refreshing, whereas normal storage is non-volatile and retains data once stored.
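To make the "small subset of test tones" point concrete, here is a minimal sketch of what a single-tone distortion measurement actually looks at. It uses the Goertzel algorithm to evaluate only the DFT bins of one tone and its harmonics instead of a full FFT, and it assumes coherent sampling (a whole number of cycles in the buffer). The 1 kHz tone, 48 kHz rate, and five-harmonic limit are illustrative choices.

```python
import math


def goertzel_power(x: list[float], k: int) -> float:
    """Squared magnitude |X[k]|^2 of DFT bin k via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(x))
    s1 = s2 = 0.0                        # s[n-1] and s[n-2]
    for sample in x:
        s1, s2 = sample + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def thd_single_tone(x: list[float], fs: float, f0: float, harmonics: int = 5) -> float:
    """THD ratio of one test tone: sqrt(sum of harmonic powers / fundamental power).

    Assumes coherent sampling, i.e. f0 * len(x) / fs is a whole number of cycles,
    so every harmonic falls exactly on a DFT bin and no window is needed.
    """
    n = len(x)
    k0 = round(f0 * n / fs)
    fundamental = goertzel_power(x, k0)
    harmonic = sum(goertzel_power(x, h * k0)
                   for h in range(2, harmonics + 1) if h * k0 < n // 2)
    return math.sqrt(harmonic / fundamental)
```

Feeding it 4800 samples of a 1 kHz sine at 48 kHz with a second harmonic at 0.1% amplitude returns a THD of about 0.001. Everything outside those few bins, and any behaviour that only shows up with other signals, never enters that number.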
Digital circuits work purely on thresholds. Above a certain threshold a signal is 1, below it, 0 (or vice versa in some implementations), and there are boundary conditions where designers have to work hard to maintain data integrity. This is one reason you don't magically get infinite clock speeds. There's more to it in modern devices (multi-level and triple-level cells, for example), and a lot of algorithmic work, such as error correction, goes into them.
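A minimal sketch of threshold decoding and noise margins. The voltage levels are typical 3.3 V LVTTL figures, assumed here purely for illustration; real parts specify their own in the datasheet. Noise smaller than the margin leaves the decoded bit unchanged, and the gap between the two input thresholds is the boundary region designers must keep signals out of.

```python
from typing import Optional

# Assumed example levels (typical 3.3 V LVTTL), in volts.
V_OL_MAX = 0.4   # highest voltage a driver outputs for a 0
V_OH_MIN = 2.4   # lowest voltage a driver outputs for a 1
V_IL_MAX = 0.8   # highest input voltage a receiver still reads as 0
V_IH_MIN = 2.0   # lowest input voltage a receiver still reads as 1

NOISE_MARGIN_LOW = V_IL_MAX - V_OL_MAX    # about 0.4 V
NOISE_MARGIN_HIGH = V_OH_MIN - V_IH_MIN   # about 0.4 V


def read_bit(voltage: float) -> Optional[int]:
    """Decode an input voltage; None means it sits in the undefined region."""
    if voltage <= V_IL_MAX:
        return 0
    if voltage >= V_IH_MIN:
        return 1
    return None
```

A driven 0 at 0.4 V that picks up 0.35 V of noise still reads as 0, while 1.4 V reads as neither.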
There's a lot of hard work in making a reliable digital system, but it's even harder when you get into analog systems. The problem with analog/mixed-signal systems is that they don't merely work on thresholds. A fair amount of noise may be mostly harmless in a digital system but will cause significant issues in an analog/mixed-signal system, because every single flaw or deviation causes deviations in the analog circuit (the DACs), which then get amplified in the buffer and amplification stages. So any activity you run has a potential manifestation in the analog circuit, and any task that reduces noise at the source can be beneficial. Grounds act as common paths that carry noise from one place to another. You can point to optical isolation, but it is more fairytale than reality: optical isolators have their own jitter and noise footprints, and any attempt to correct those adds jitter and noise footprints of its own. If you're thinking of transformer-coupled isolation, transformers have non-linearities (real-world magnetic cores don't follow ideal abstractions) and other leakage phenomena (AC noise leakage over copper Ethernet has been measured and demonstrated). I would also add that the improvements in SQ from this player are audible even through the iFi micro iDSD BL, which does have some form of galvanic isolation, AFAIK.
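A toy simulation of the digital versus analog contrast: the same 0/1 levels pass through several noisy hops, once with a digital receiver that re-thresholds after every hop and once through an analog path where nothing removes the added noise. The uniform noise model, the 0.5 threshold, and the hop counts are assumptions chosen for illustration, not a model of any real DAC or amplifier.

```python
import math
import random


def run_chain(bits: list[int], stages: int, noise: float, seed: int = 0) -> tuple[int, float]:
    """Return (bit errors on the digital path, RMS error on the analog path)."""
    rng = random.Random(seed)
    digital = [float(b) for b in bits]   # 0.0 / 1.0 signal levels
    analog = digital[:]
    for _ in range(stages):
        # Digital hop: noise is added, then the receiver re-thresholds at 0.5.
        digital = [1.0 if v + rng.uniform(-noise, noise) > 0.5 else 0.0 for v in digital]
        # Analog hop: noise is added and nothing removes it.
        analog = [v + rng.uniform(-noise, noise) for v in analog]
    errors = sum(int(d) != b for d, b in zip(digital, bits))
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(analog, bits)) / len(bits))
    return errors, rms
```

With noise well inside the threshold, the digital path comes out error-free no matter how many hops it passes through, while the analog path's error grows with every hop.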
Any circuit can be tweaked to produce good numbers in specific test scenarios while not being truly capable in others, which makes measurement charts unreliable. It is impossible to get full test coverage for any analog design at present. I think of the audio measurements generally shown as similar to a vague synthetic CPU benchmark tweaked to make it look as if a cell phone CPU beats a supercomputer (maybe it does at that specific calculation in that specific optimized software, but not likely in a real-world task the cell phone CPU cannot handle, and an emulation layer on the supercomputer running the same code might even be faster!).
Yes, there are a massive number of layers, buffers, and PHYs throughout the chain, plus software abstractions. Each abstraction layer generally means longer, less optimal code, which means more processor and component activity and therefore more switching noise, and there is more on top of that, such as speculative execution; many of these audio software packages account for this. Many of them try to work in lower-level languages with fewer abstractions (some are even written in assembly) and hence generate less noise (one common example is kernel streaming output). So the whole thing actually reinforces the benefits of a customized software system.
It is indeed remarkable that noise from data storage access seems to pass through all these layers, but if you consider the path, none of them has anything to compensate for the fluctuations: as long as they stay within the thresholds of digital circuit operation, they are passed through (whereas analog and mixed-signal systems are picky). It is indeed profound that this distinct improvement is not buried within the noise generated by the rest of the chain.
Now, if you are considering issues from other CPU activity during idle tasks, such as displaying a wallpaper, it would be a gross approximation to think the CPU generates every pixel at every instant and loads it into GPU memory for display; if that were the case, there would be no purpose for a GPU. The GPU has a parallel pipeline to generate these, with its own architecture that may have its own noise patterns (not necessarily as high as the CPU's for the same task), and it sends the output via the HDMI port, but it could very well be almost completely decoupled from the CPU data lines going to USB! Do they influence each other? Very likely. Can one completely mask the differences of the other? Maybe, maybe not! It's about reducing issues in any area where that is feasible. There's also something known as correlation. Certain types of noise correlate more with audio issues than others (an 8 kHz tizz from 125 µs polling if the system priority is too high, or other issues that cause sudden spikes during this polling). So it's not quite as direct as things may seem, and this area is so deep that we don't have any well-established, conclusive correlation metrics yet (and are unlikely to anytime soon; we haven't even figured out human hearing beyond a certain basic abstraction). Also, many of the computer tweaks have modes that remove the image-display load from the CPU, or even go fully headless/command-line.
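The 8 kHz figure follows directly from f = 1/T: 125 µs is the USB high-speed microframe interval, so anything that repeats on it has an 8 kHz fundamental. This small sketch lists which multiples of a polling rate land inside an assumed 20 kHz audio band; it only says where such tones would fall, not whether they are audible. The function name is hypothetical.

```python
def polling_tones(period_us: int, audio_band_hz: float = 20_000.0) -> list[float]:
    """Fundamental and harmonics of a periodic event that fall within the audio band."""
    fundamental = 1_000_000 / period_us      # f = 1 / T, with T in microseconds
    tones = []
    k = 1
    while k * fundamental <= audio_band_hz:
        tones.append(k * fundamental)
        k += 1
    return tones
```

`polling_tones(125)` gives `[8000.0, 16000.0]`, while a 1 ms full-speed frame (`polling_tones(1000)`) puts twenty tones in the band, from 1 kHz up to 20 kHz.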
What about the abundance of switching components throughout the motherboard? PC PCB design is generally very high-end work (very large multi-layer PCBs), and the power supply design (regulators etc.) is extremely sophisticated, especially the parts feeding the CPU. A 12 V supply is regulated in multiple stages so there is enough buffering to absorb any disruption that changes in power consumption would bring, and the result is generally very low noise because it has to run through multiple layers in the CPU. Can they be improved by a better power supply input? Surely yes, and a better power supply input can also help the rest of the PCB, but I have to say they are generally extremely well designed. There have been massive developments on this front in the low-power area, and they have also been successfully carried into certain areas of audio: the new Burson Audio amps use an SMPS design that sounds very good. You can afford this much buffering and filtering because it is power (a specific fixed voltage and current with some transient deviation). But you can't apply multiple stages of this to data, which is a switching sequence of pulses, without losing speed. There aren't many ways to fully control the noise on the data line other than controlling your software.
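A rough sketch of why stacking filter stages works for power but not for data. The numbers are made up for illustration (50 mV of ripple, a 40 dB stage followed by a 60 dB stage, a 100 MHz data-line filter), and the model assumes each stage's rejection at the ripple frequency is known and that stages don't interact; real regulators reject less as frequency rises. The data side uses the first-order rule of thumb t_r ≈ 0.35 / BW against the USB 2.0 high-speed bit period at 480 Mbit/s.

```python
def ripple_after_stages(ripple_in_v: float, rejection_db: list[float]) -> float:
    """Residual ripple after cascaded stages; attenuations in dB simply add."""
    return ripple_in_v * 10 ** (-sum(rejection_db) / 20)


def rise_time_s(bandwidth_hz: float) -> float:
    """10-90 % rise time of a first-order low-pass, rule of thumb t_r ≈ 0.35 / BW."""
    return 0.35 / bandwidth_hz


# Hypothetical power path: 50 mV of ripple, then a 40 dB and a 60 dB stage.
residual = ripple_after_stages(50e-3, [40.0, 60.0])   # about 0.5 µV

# Hypothetical data path: a 100 MHz low-pass on a 480 Mbit/s line.
edge = rise_time_s(100e6)          # 3.5 ns
bit_period = 1 / 480e6             # about 2.08 ns
```

The power rail comes out five orders of magnitude quieter, but the filtered data edge takes longer than a whole bit period, so a filter that aggressive would smear each bit into its neighbours.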
OK, why not a Raspberry Pi instead? Well, just because something is lower power doesn't necessarily mean it is lower noise. The priorities for most budget SBCs are mass production at a very affordable price, and their components are unlikely to be of a quality comparable to, say, a high-end motherboard, let alone a server motherboard. In fact, you'll likely get worse aberrations even in data integrity (unlikely to be an issue at audio data rates, though) and will need just as many software changes and usability compromises anyway. As mentioned above, the research behind desktop motherboard components is extremely advanced. One can try to customize everything from the ground up, as many companies making digital transports do, but it gets crazy expensive pretty quickly. Or one can leverage all the development in desktop PCs and just try to control the few aspects they didn't optimize for with respect to audio and noise (you give up speed and ease of use in that setup, but a reboot into another OS gets you back a fully functional PC for any other task).