One of the topics that always comes up when objectivists from ASR and elsewhere are challenged is to present the alternative to machine measurement, namely blind or ABX listening tests. The conversation usually devolves into a recitation of 1) the general failure to back up claims of “sounds better” with procedurally and statistically valid proof, and 2) the fact that when such studies have been attempted and published, they are generally inconclusive or fail to reject the null hypothesis that there is no difference between pieces of equipment A and B. Over time, these tests have been done with human subjects on electronics and cables, with an often-cited study conducted in the 1980s by a marketing professional testing perceived differences in the sound of various amplifiers across a range of prices.
I have been perplexed by “findings” that routinely show no perceived difference between pieces of gear or wires when limited blind tests I have done with friends show clear differences (not always tracking linearly with price). When I report the findings of my admittedly amateur analysis as a counterpoint to the objectivists, my methods are invariably questioned. The consensus feedback is “you must be wrong, you made a mistake, or you are lying.”
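Part of what is going on is simple arithmetic. In a forced-choice ABX trial, the null hypothesis of no audible difference makes every trial a coin flip, so a session is scored with an exact binomial tail probability, and a short session carries less evidential weight than it feels like it does. Below is a minimal sketch of that scoring in Python; the trial counts are hypothetical examples for illustration, not results from any actual test:

```python
# Exact one-sided binomial scoring of a forced-choice ABX session.
# Under the null hypothesis (no audible difference) each trial is a
# 50/50 guess, so the p-value is the probability of getting at least
# this many trials correct by chance alone.
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """P(at least `correct` hits in `trials` fair coin flips)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session results, chosen only to illustrate the scaling:
print(f"7/10 correct:  p = {abx_p_value(7, 10):.3f}")   # ~0.172, not significant
print(f"14/20 correct: p = {abx_p_value(14, 20):.3f}")  # ~0.058, borderline
print(f"21/30 correct: p = {abx_p_value(21, 30):.3f}")  # ~0.021, significant at 0.05
```

Seven out of ten feels like clear evidence at the listening chair, but pure guessing produces it more than 17% of the time, which is precisely the grounds on which small informal tests get dismissed.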
So what is going on here? Why do so many in this hobby insist they hear subtle and not-so-subtle differences when they swap a cable or a DAC? Especially when objectivists like Amir can’t measure a difference THAT SHOULD MATTER, and the blind testing that has been done is either inconclusive or fails to support a real performance difference. I have some ideas, and they generally fall into these categories: 1) some people have better hearing or listening skills (this is where the original wine tasting analogy in this thread makes a comeback), 2) the gear, room and/or listening position are not optimized to allow a difference to be heard, and 3) the experimental or sampling design is flawed and has not been optimized to detect subtle differences.
I will explain. The portion of the world’s population made up of “audiophiles” is small. And just because a person likes music and can afford to lavish time and money on the hobby does not guarantee they have strong listening skills; and if they love going to rock concerts, or spent a lifetime on a classical concert stage or in practice rooms, their hearing may be compromised. Across the general public there is a gradient of hearing acuity and, for most people, little training in critical listening to reproduced music, so who is included in such a test matters.
Choice of gear matters. What is the synergy between the components selected? Are they selected in such a way as to accentuate the differences in the piece of equipment being assessed? Is the room well designed and acoustically neutral? Is the power supply from the wall clean and adequate? Does the system have power conditioning? Is the seating position optimized to allow maximum resolution by the test subject? Testing a group of people in a room at the same time, where only one person sits in the sweet spot of the speakers, would not be the ideal way to test the soundstage reproduction characteristics of a DAC or cable. Using headphones could reduce this variable, but headphones are generally a poor substitute for well-set-up, well-sourced speakers in a good room when testing the reproduction of soundstage.
Finally, what is the test regime for the subjects? Are they allowed adequate time to acclimate to the sound characteristics of the system and room before a change is made for the test? Most audiophiles spend multiple days analyzing the sound of a new component or cable, swapping it in and out, before deciding if there is a difference or an improvement.
If I were tasked with developing listening tests for high-end audio gear, I would screen the listeners to determine both their hearing acuity and their listening skills. The target audience for these products is not teenagers listening to poorly produced streamed music with their smartphones and earbuds on the school bus. I would select the system, the room and the seating position to accentuate any inherent differences in the items to be tested. I would probably throw in some headphone listening as well to remove many of those variables, as there are elements of reproduction that headphones excel at. And I would partner with an expert in human-subject testing and design the regime, with adequate controls, to maximize the likelihood of a statistically valid test of the null hypothesis that there is no difference between item A and item B.
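To give a rough sense of the scale such a regime implies, here is a sketch, under assumed numbers of my own choosing, of how many trials it would take. If a listener genuinely picks correctly 70% of the time (a hypothetical “subtle but real” effect), the exact binomial test needs dozens of trials to reject the 50% null with 80% power at alpha = 0.05, and a 60% listener needs well over a hundred:

```python
# Sample-size sketch for an ABX regime: the smallest number of trials at
# which the exact one-sided binomial test (alpha = 0.05) would detect a
# listener with a given true hit rate, at the requested power.
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def trials_needed(p_true: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest trial count n whose exact binomial test has the requested
    power against a true hit rate p_true, assuming a 50% null."""
    for n in range(5, 500):
        # Critical value: fewest hits keeping the false-positive rate under alpha.
        k_crit = next(k for k in range(n + 1) if binom_tail(n, k, 0.5) <= alpha)
        if binom_tail(n, k_crit, p_true) >= power:
            return n
    raise ValueError("no n under 500 reaches the requested power")

print(trials_needed(0.70))  # a 70% listener: dozens of trials, not a handful
print(trials_needed(0.60))  # a 60% listener: far more still
```

That trial count is also why the acclimation question above matters: a session long enough to be statistically meaningful is long enough for listener fatigue to become its own confound.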
That is a lot of work to determine with some statistical rigor whether a Mola Mola DAC sounds better than, the same as, different from, or worse than a Topping DAC. At one time in my life I did blind wine tastings as part of a wine club. I found that I was not the most adept at detecting differences between the wines, and when I could, I did not have the vocabulary to describe what I was tasting. It was enough for me to recognize that I liked the stuff in bag #3 and compete with others to drain it first. The most pompous member of our group had gone to Harvard from kindergarten through PhD and thought he had an excellent palate. He did not. His very humble wife, however, did have an excellent palate, and routinely identified differences in the wines and the associated characteristics on a tasting wheel.
In wine tasting, as in audio listening, some people “have it” and some people do not, but many people can enjoy drinking or listening in their own way. As the OP noted, engineering and food science can be applied to assure a certain level of quality, but many other variables go into how we experience and enjoy wine and hifi. I find that, in moderation, they are best enjoyed together.
kn