The type of test I'm referring to would quantify subjective responses. And it would have to consist of more than simply saying "this is Tidal on my system" or "this is a ripped CD on my system." I've only been at this a few months and can do that sort of thing myself.
The type of test I would find useful would compare several variables: several sets of cables, several streaming services, a few DACs, etc. Common criteria would be set and presented on paper, each with a scale of 1-10 or 1-5, so the subjects could rate every presentation of a variable. One piece of music would be played to the test subjects multiple times for each variable, in randomized order; in other words, you could hear the same variable several times in a row. The test would have to be double-blind, meaning that neither the subjects nor the person running the test would know which variable was playing at the time of the test. That would be the only significantly tricky part.
For grins I would have each variable reviewed by each of the test subjects in open fashion (they would know what they are reviewing and would not be comparing it to anything else at the time they listened) before the double blinded test.
All of this could be completed in an afternoon with one system in one room with maybe one or two "blinded" assistants and a computer to randomize the order of variable presentation.
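As a rough sketch of the "computer to randomize the order" part, something like the following would do. The function name, the variable labels and the repeat count are all made up for illustration, and this is only one simple way to do it (a plain shuffle, which does allow the same variable several times in a row).

```python
import random


def build_schedule(variables, repeats, seed=None):
    """Return a shuffled list in which each variable appears `repeats` times.

    `variables` and `repeats` are hypothetical inputs chosen by whoever
    sets up the test; `seed` is optional and only makes the order repeatable.
    """
    rng = random.Random(seed)
    # One entry per presentation: every variable is repeated `repeats` times.
    trials = [v for v in variables for _ in range(repeats)]
    # A plain shuffle, so back-to-back repeats of one variable can happen.
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    # Placeholder labels, not real products or services.
    schedule = build_schedule(["Cable A", "Cable B", "Cable C"], repeats=4)
    for number, variable in enumerate(schedule, start=1):
        # This printout is the key: only an assistant who stays out of the
        # listening room should see it. Subjects see trial numbers only.
        print(f"Trial {number}: {variable}")
```

The printed key would go to one assistant who does the switching, while the subjects and the person collecting the score sheets only ever see the trial numbers.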
Assemble the data and report it. Application of statistical analysis would probably not even be needed if the number of subjects were small enough, say 8 or 10.
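Assembling the data could be as simple as the sketch below, which groups the score sheets by variable and reports the average, the lowest and highest score, and the number of ratings. The record layout (subject, variable, score) and the function name are my own assumptions, and the numbers in the example are placeholders, not results from any actual test.

```python
from collections import defaultdict
from statistics import mean


def summarize(ratings):
    """Group (subject, variable, score) records by variable.

    Returns {variable: (mean score, lowest, highest, number of ratings)}.
    """
    by_variable = defaultdict(list)
    for _subject, variable, score in ratings:
        by_variable[variable].append(score)
    return {
        variable: (mean(scores), min(scores), max(scores), len(scores))
        for variable, scores in by_variable.items()
    }


if __name__ == "__main__":
    # Placeholder scores on a 1-10 scale, invented purely to show the format.
    example = [
        ("subject 1", "DAC A", 7),
        ("subject 2", "DAC A", 8),
        ("subject 1", "DAC B", 5),
        ("subject 2", "DAC B", 6),
    ]
    for variable, (avg, low, high, count) in summarize(example).items():
        print(f"{variable}: mean {avg:.1f}, range {low}-{high}, n={count}")
```

Running the same summary once on the open (sighted) ratings and once on the blinded ratings would put the two side by side for comparison.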
Such a study would have some inherent weaknesses, and its scope would be very narrow. There is a chance that the data would not give a clear statement on the variables... but as stated above, THAT in itself would be valuable to those looking for something other than someone's opinion about potentially expensive gear and/or services.