In defense of ABX testing


We audiophiles need to get ourselves out of the Stone Age, reject mythology, and say goodbye to superstition. Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system. Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!

Here's an interesting thread on the hydrogenaudio website:

http://www.hydrogenaud.io/forums/index.php?showtopic=108062

This caught my eye in particular:

"The problem with sighted evaluations is very visible in consumer high end audio, where all sorts of very poorly trained listeners claim that they have heard differences that, in technical terms are impossibly small or non existent.

The corresponding problem is that blind tests deal with this problem of false positives very effectively, but can easily produce false negatives."
psag
Psag,

Sorry, but I don't buy your last post. It makes absolutely no sense. You mean to tell me that the concept of probability, a simple coin toss, can't be addressed by you because you feel it's pseudoscience? You don't have to be an actuary to understand the concept, they teach it in grade school. And then to go on with all the subjective issues when I clearly stated that the whole purpose of the test was to not go there.

Given the above, the only thing that makes your response understandable, is if there is some type of deceit involved. I guess that maybe you don't like me from another thread, or you have a technical background that your ego needs to support, or something. Whatever it is, I don't believe you don't understand something so simple. I believe that you won't understand something so simple.

Sorry, but I just don't see it any other way.
Zd542, I think statistics is somewhat of a pseudoscience, so I can't completely address what you are saying, other than to offer my own experience. When there is a significant difference between two listening test scenarios, that difference is quickly evident, and it will be evident each and every time. Were that not the case, the difference would not be significant. Obviously I'm not talking about the textbook definition of 'significant difference'. There's not much point in making changes to an audio system if the differences approach the statistical limit of significance.
"01-29-15: Psag
I agree that only a limited number of switches are needed, if the test conditions are good. What are good test conditions?: A treated room with good acoustics, high quality electronics, well-recorded music, the ability to do rapid switching (having a second person to manipulate the hardware helps), and familiarity with the musical selections. That's all you need to eliminate subjectivity and get to the truth."

What I was referring to was a very simple test. You have 2 cables in the system, 1 is copper, the other silver. The goal was to see if you could pick out the silver or copper, and that's it. Nothing subjective like what cable sounds better. That's just personal preference. So after you hear a 10 second clip of music, you say copper or silver. With such a small sample, you can't really weed out things that may produce bad results. For example, let's say that there really was no difference that a test subject could hear between the 2 cables. That would mean both cables would sound identical. But we won't know that until after the test. That would also mean every answer given would only be right by pure chance. So for this test, since there are only 2 answers, and going by the assumption that there is no difference, over time, the answers would have to conform to a 50/50 split. If we only got 10 samples under this scenario, there's a really good chance you wouldn't get a 50/50 split with just 10 tries. With 100 tries, you get much closer. An easy way to visualize, or even try this concept to see for yourself, would be to flip a coin. Flip it 10x, and even though you should get 5 heads and 5 tails, with so few tries, you can easily get different results. The only way to reduce this type of error is to take a larger sample. Flip a coin 100 times, and you'll get much closer to the 50/50 split that you would expect to get from just pure chance.
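For anyone who wants to check the coin-flip claim rather than take it on faith, here is a minimal Python sketch. It assumes every answer is an independent 50/50 guess (the case where no difference is audible). The 10-point band around 50% is an arbitrary choice for illustration, and the function name is made up.

```python
from math import comb

def prob_within(n, tol_percent=10):
    """Chance that the percentage of heads in n fair flips lands
    within tol_percent points of 50% (exact binomial sum)."""
    hits = sum(
        comb(n, k)
        for k in range(n + 1)
        # integer form of |100*k/n - 50| <= tol_percent
        if abs(100 * k - 50 * n) <= tol_percent * n
    )
    return hits / 2 ** n

for n in (10, 100):
    print(n, "flips:", round(prob_within(n), 4))
```

With 10 flips only about two runs in three land between 40% and 60% heads. With 100 flips well over nine in ten do, which is the "much closer" effect described above.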
I agree that only a limited number of switches are needed, if the test conditions are good. What are good test conditions?: A treated room with good acoustics, high quality electronics, well-recorded music, the ability to do rapid switching (having a second person to manipulate the hardware helps), and familiarity with the musical selections. That's all you need to eliminate subjectivity and get to the truth.
Zd542; I get what you are saying in response to my post. The closest I have seen is as I mentioned. Reviewers listening to one piece and some music, then swapping it out for another piece, without changing anything else and listening again. My point earlier and using the wine example was that blind A/B testing would show that most reviewers (not all) have no clothes and they can't have that. So the best I can hope for in this day is what I mentioned earlier.

However, companies respond to letters and not so much to phone calls and posts on chatboards. So maybe more letters to the magazines requesting A/B blind testing may help.

enjoy
"Your last post leaves me with the impression you do not think there are differences in cables and therefore they cannot be heard, especially when you get defensive when I asked the results of your controlled listening tests. Beats me why you wouldn't want to disclose the results."

You're allowed to have any impression you like. It has nothing to do with me. It's your choice, not mine. As for the reason why I don't want to disclose the results, once again, I already gave it. It was clearly stated in my last post. Here it is again.

"And as to the results of the tests, it's not relevant to this discussion. You only want me to list the results so you can comb through them to find the slightest detail just so you can claim the whole thing is null and void, so you get to be right."
After my last post something tells me I won't get anywhere, but here goes.

"If tests are not done scientifically and are not based on "opinions" they really aren't real to me. How does one measure whether the equipment accurately demonstrated the sound stage depth? dimensionality? etc. I hear many opinions of the reviewers, but based on what? What criteria? Are you going by memory in your opinions and comparisons? Or did you listen intently and then switch out that amp with another (without changing anything else) and listen again?

I have read of some reviews that do exactly that. And the equipment they are reviewing is compared to similar equipment within the price point. That is alright for me. But, I still prefer an A/B comparison test that is blind to really identify the sonic differences in an unbiased way."

In the first paragraph, you're talking about subjective qualities that the reviewers are discussing. We all know that those qualities mentioned can't be measured, so what would you have the reviewer do? We're supposed to be adults here. When I read a review it's not too difficult to pick out the things that are purely subjective in nature. Yes, they are listening to the component and writing their subjective opinion as to what they heard. Here's the 1 detail that many people miss. Most of the people that read the reviews, the magazine's customers, know this is how they do it, and it's not a perfect process, but they still want the review anyway. And why not? Why do you think they bought the magazine to begin with?

This caught my eye in particular.

"But, I still prefer an A/B comparison test that is blind to really identify the sonic differences in an unbiased way."

You say that you prefer this type of blind testing like there are some reviewers that are doing it. I've never seen any reviewers do this. Where are you finding them? I'm more than willing to give them a chance. If they can show me some testing that helps make a better decision, I'm all for it.
"And the findings, results, of the listening test?

Just curious what is behind your thinking of needing so many samples for your listening test?"

The real issue here is that you don't care about any tests that were done. You've got your emotions tied up in all this and just want to win the argument, and be right.

You ask me why I needed so many samples for my tests. Not only did I already give the answer in a prior thread, you quoted it in your last post! Here it is. Maybe you'll remember it this time.

"So, for example, if you were trying to test to see if a difference can be heard between a silver cable and a copper cable, all other things being equal, maybe have them listen to 50 or 100 samples. Maybe they can get lucky and guess correctly for 5 or 10, but 100 is highly unlikely. "

I thought I was pretty clear, but I'll try to explain it again. With 5 or 10 samples in a simple yes or no test, I thought it wouldn't be out of the question to get an inaccurate score due to errors or guessing. A small sampling really leaves no margin for error. I mean if I flip a coin 10 times, what are the chances of you getting 5 heads and 5 tails and averaging 50% like the test should? There's a very high probability you won't average 50% with such a small sample. Flip the coin 100 times and you will get much closer to the statistically accurate 50% that you should be getting. Now, just 1 last time to be extremely clear, if I flipped a coin 10x, you have a much greater chance of getting something far from 50% than if I was to flip it 100x. That's why I needed so many samples. Didn't you have to take statistics in college?
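To put rough numbers on "what are the chances", here is a small Python sketch. It assumes a fair coin and independent flips, and the function names are my own. It separates two things that are easy to mix up: the chance of an exact tie, and how far the heads percentage typically strays from 50%.

```python
from math import comb, sqrt

def prob_exactly_half(n):
    """Chance of exactly n/2 heads in n fair flips (n must be even)."""
    return comb(n, n // 2) / 2 ** n

def spread_of_percentage(n):
    """Standard deviation of the heads percentage over n fair flips."""
    return 100 * sqrt(0.25 / n)

for n in (10, 100):
    print(n, round(prob_exactly_half(n), 4), round(spread_of_percentage(n), 1))
```

An exact 5/5 split comes up only about a quarter of the time, and an exact 50/50 split in 100 flips is rarer still, at roughly 8%. What improves with 100 flips is the spread: the typical deviation of the percentage drops from about 16 points to 5. So "closer to 50%" is a statement about the percentage, not about hitting an exact tie.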

And as to the results of the tests, it's not relevant to this discussion. You only want me to list the results so you can comb through them to find the slightest detail just so you can claim the whole thing is null and void, so you get to be right. I won't play that game. You're just going to have to continue playing with yourself like you've been doing.
This to me is very indicative of the people in power or the ones that are the "experts" wanting to stay that way. Remember the attitude in the sixties and early seventies regarding wines and how the "experts" continuously stated that French wines were the best and everyone else's was not very good? It wasn't until the "Judgment of Paris" happened that the world realized that opinions changed dramatically when blind testing occurred. There is absolutely no scientifically logical explanation why blind testing isn't the best comparison method.

Of course it has to be an apples to apples comparison. This to me means price point testing. Just like cars. Pick a price point, get the equipment that falls within that price range and go at it. But, tube lovers will pick tube equipment most of the time based on knowing what they are hearing ahead of time. Same is true for solid state lovers. But, blind testing? Within price points? Let's see what the experts say then. But, the "experts" don't want to do that because it would show people that many of them (absolutely not all of them) are frauds.

If tests are not done scientifically and are not based on "opinions" they really aren't real to me. How does one measure whether the equipment accurately demonstrated the sound stage depth? dimensionality? etc. I hear many opinions of the reviewers, but based on what? What criteria? Are you going by memory in your opinions and comparisons? Or did you listen intently and then switch out that amp with another (without changing anything else) and listen again?

I have read of some reviews that do exactly that. And the equipment they are reviewing is compared to similar equipment within the price point. That is alright for me. But, I still prefer an A/B comparison test that is blind to really identify the sonic differences in an unbiased way.

enjoy
"50 to 100 samples you say? Why not make it 100 to 200? Over how many years do you expect your ABX listening test experiment to take?

Just curious have you ever A/B compared 2 or 3 cables to one another? More than 3 or 4 at a time? Could you hear audible differences between the cables?"

Yes, I have. I did an experiment a few years ago and compared AQ Cheetah ICs to a pair of AQ Panther ICs. Both cables are identical except for the conductors themselves. One silver, one copper. The goal was to see if a difference could be heard between the 2 metals, and nothing else. It wasn't about which one sounded better, just if there was a difference. Four of us took the test, and we each listened to 100 samples of a 10 second audio clip, which took around 30-40 minutes per person.

"50 to 100 samples.... Do you believe there are people in the world that can tell which key of a piano is struck on a tuned grand piano in a blind test? Do you think their brain learned the sound of each key in the span of a week or so, or even a few months or so? How about in a year? "

Actually yes, and I can prove it. My brother has something called perfect pitch. He can tell with 100% accuracy what any note or chord played on any instrument is, and if it's in tune or not. I don't have it myself, but if you have ever played an instrument, you can develop something called relative pitch. It's not as good as perfect pitch, but it's a skill that can be learned. For me, I needed to develop the skill somewhat when I played drums in school. If you have ever seen kettle drums or tympani, they have to be tuned to a certain note when you play them. That is what the foot pedal is for, it sets tension on the drum head. Anyway, you have to be able to set the drums to different notes while the band is playing. To do this, you tap it very lightly (because the band is playing), and hopefully tune it to the right note before you need to play it. It's not an easy thing to do, but it's a skill that can be learned.

This bleeding thing is a bit off topic, but since it keeps coming up, I thought I'd clarify that issue. (because I have no good opinion on the ABX thing)

During ancient and medieval times doctors believed in "the humor theory". It's pretty complicated and a bit funny by modern standards, but the short explanation is that the blood carries liquids called "humors". A sick person has bad humors in their blood and you have to let it out. Thus "blood letting". A person in good humors is healthy.

This theory died out in the 1700's and early 1800's, when the new and wonderful "germ theory" of disease became more popular.

For medical testing, doctors would normally draw a sample of the blood and check the color and taste. Good humors taste good, I imagine. =-}

http://en.wikipedia.org/wiki/Humorism

I read a lot of ancient writings...
"LOL, but doctors in that time period of history didn't know any better."

That was my point. The only thing they knew for sure about blood was that if you lose too much of it, you die. Armed with only that info, I stand by my statement. lol. Maybe there was some logic to it, but I don't see it. I guess it's possible that every time a doctor saw a bleeding, injured person, he thought that was the body getting rid of some excess blood. Why blame it on the arrow stuck in the arm?

About the rest of your post, my comments on all this, in context, are in reference to the thread the OP mentioned on Hydrogen Audio. The complaint was that reviewers were listening to audio components and then basing their review on what they heard, and not doing any type of scientific listening tests. They've been complaining about this very thing for years. My comment was that if these types of tests are so damn important, then just do them already. Even if it's just a few tests just to show us all how to do it. Instead, it's just year after year complaining, and they never do anything. That said, if they can come up with some kind of useful tool to help better evaluate audio equipment, I know I would be interested in seeing it. Why not? My personal opinion is that they don't have the guts to do anything they talk about. There's always the chance they would be wrong. They have too much invested in the argument.

"In ABX testing, the definition of the cumulative response is trivial: can the listener reliably hear a difference or not. But because the stimulus is likely to be imprecisely defined, absence of reliably making the distinction does not mean there is no difference."

I agree with that. My view is that you would have to tell the test subjects what they're listening for. There's really no way around it, they have to know. To offset results that are not accurate, you can increase the number of tests, or tries, each subject takes. So, for example, if you were trying to test to see if a difference can be heard between a silver cable and a copper cable, all other things being equal, maybe have them listen to 50 or 100 samples. Maybe they can get lucky and guess correctly for 5 or 10, but 100 is highly unlikely. Not only that, under the same scenario, you can tell them exactly what they should be listening for. If there is really no difference to be heard, over time/individual tries, the test subjects will have to trend towards a 50/50 split. It won't matter what they think, or know they can hear.
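The "get lucky for 5 or 10, but not 100" idea can be made concrete with a short Python sketch. It assumes each trial is an independent two-way choice and that a pure guesser is right half the time. The 0.05 cutoff is only the usual convention, not something from the test described here, and the function names are invented for the example.

```python
from math import comb

def chance_by_guessing(n, k):
    """Probability of k or more correct out of n when every answer is a coin toss."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def min_correct(n, alpha=0.05):
    """Smallest score whose guessing probability is at or below alpha."""
    for k in range(n + 1):
        if chance_by_guessing(n, k) <= alpha:
            return k
    return None  # no score can clear the bar (only happens for very small n)

for n in (10, 50, 100):
    k = min_correct(n)
    print(n, "trials: need", k, "correct, p =", round(chance_by_guessing(n, k), 4))
```

By this arithmetic a listener needs 9 of 10 to rule out luck at that cutoff, but only 59 of 100. More trials let a smaller but real edge over 50/50 show up, which is the reason to ask for 50 or 100 samples.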
One of the difficulties in behavioral research is precise definition of stimulus-response. The more complex the stimulus, the less precise the definition. In ABX testing, the definition of the cumulative response is trivial: can the listener reliably hear a difference or not. But because the stimulus is likely to be imprecisely defined, absence of reliably making the distinction does not mean there is no difference. For music, Gestalt seems too relevant. That's one of the reasons we know so much about what a rat is likely to do in a maze and so little about what a kid is likely to do in a classroom.

db
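On the point that failing to make the distinction does not prove there is no difference, here is a hedged Python sketch of how often a short test misses a listener who really can hear something. The 60% hit rate is a purely hypothetical figure chosen for illustration, the 0.05 cutoff is just the common convention, and the function names are my own.

```python
from math import comb

def tail(n, k, p):
    """Probability of k or more correct in n trials with per-trial hit rate p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def detection_rate(n, p_true, alpha=0.05):
    """Chance that a listener with hit rate p_true passes an n-trial test
    whose pass mark lets pure guessing through at most alpha of the time."""
    # k = n + 1 always qualifies (empty sum), so next() cannot fail
    pass_mark = next(k for k in range(n + 2) if tail(n, k, 0.5) <= alpha)
    return tail(n, pass_mark, p_true)

for n in (10, 100):
    print(n, "trials:", round(detection_rate(n, 0.6), 3))
```

Under that assumed 60% hit rate, the 10-trial test passes the listener only about one time in twenty, while the 100-trial test passes them more often than not. A null result from a short test says very little either way.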
Jea48, I think if they did it the way you recommend, then we'd have nothing to argue about. :-)
It's a tool. No tool is perfect. If it fits the task at hand, use it. Just don't expect anyone else will draw the same conclusion. They may or may not and it should not matter other than as a point of interest. Only you can hear what you hear. No one else. Being of the same species, we all hear similarly perhaps but not exactly the same. Some differences may be major, others subtle. The subtle ones will probably never be measured or quantified so just forget about it.
"01-20-15: Jea48
Back in the medieval days was it science or logic for the times when doctors used to bleed a patient saying the patient had too much blood?"

Neither. It was stupidity. They had no way of knowing how much blood was too much blood. The only thing they knew for sure was that if you lost enough blood, you died.

"01-20-15: Onhwy61
Zd542, I think I get your point, but the earth is flat is not good example. Ancient Egyptians and Greeks figured out that the earth was round via observation and logic."

Yes, but not everyone knew that. And without knowing the earth is round, it is logical to assume that it's flat. From their perspective, that's how the world appeared. Also, and more important, it's true that Ancient Egyptians and Greeks figured out the earth was round. But that only became a logical assumption after study/observation was done. They got direct results that proved otherwise.

"I thought the point of A/B testing was to determine if there was a difference, not a preference?"

Absolutely. If I said otherwise, point it out because it's a mistake. Even though my view is that listening is the most important part of evaluating an audio component, I don't see why A/B testing, to root out differences, if any, is not worthwhile. Especially when the differences are small. For example, a test would come in handy if a reviewer has a difficult time hearing a difference between 2 products. We've all been there. Sometimes, it's hard to tell. Having some type of conclusive data concerning these areas can help. But still, if a reviewer included this type of data in a review, it would be important to list exactly how the test was done and with what type of equipment. The reason, of course, is that not everyone has the same equipment, hearing ability and listening skills. And while not perfect, test results can be used as an aid, just like measurements. Something to help with a selection but not to be taken absolutely.
Zd542, I think I get your point, but the earth is flat is not good example. Ancient Egyptians and Greeks figured out that the earth was round via observation and logic.

I thought the point of A/B testing was to determine if there was a difference, not a preference?
"Probably I shouldn't have used the word 'science', which seems to get people in an uproar. Perhaps a better word would have been 'logic'. It is logical to assume that by using standard ABX testing, one can determine with certainty which of two testing scenarios sounds better. And in fact, that assumption turns out to be true."

No. It's not logical to assume that, and the assumption is not true. Science and logic are not the same thing. Science proves the earth is round while logic says it's flat. Before science proved this, it was logical to assume you could fall off the edge of the earth if you went far enough in one direction.
All things considered, preferences play a major but underappreciated role. We can all agree on specs but in the final analysis we'll pick what we like.

All the best,
Nonoise
Granted, sighted tests produce false positives, but I would posit that stereo equipment exists to create pleasure in the brain, much like the role of cigarettes. This pleasure can be difficult to capture in the conscious mind, and does not lend itself to discernment, comparing, and "telling." When I smoked, there was no conscious altering of state. I could not say, "I am under the influence of nicotine." I could not say, "This cigarette is better nicotine." The brain just liked nicotine, and cigarettes were a delivery mechanism.

If you removed nicotine from one pack of cigarettes and not the other, and asked me to discern between the two packs, I don't believe I could, but then is the discernment between which cigarette is which really the measure of the related pleasure? If I kept the two packs for a couple of days, I would most likely want more of the pack with nicotine, but I would have a heck of a time telling you why. My failure to discern could lead an observer to conclude that my love of one pack was an illusion.

Why couldn't there be a similar behavior in the pleasure of music delivered through various devices? Why couldn't the brain-level pleasure of music be as difficult to discern as nicotine? "I smoked AmpX and AmpY, and really couldn't 'tell' the difference, but I listen to AmpX three hours a day."
Swampwalker, totally in the playful spirit intended.

In fact, after posting, I looked back at it and it seemed a bit succinct and could be interpreted in lots of ways. I should have followed it with a :-)

All the best,
Nonoise
"Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system."

"Can you show us some scientifically valid listening tests that were done comparing individual components as part of a review?"

Probably I shouldn't have used the word 'science', which seems to get people in an uproar. Perhaps a better word would have been 'logic'. It is logical to assume that by using standard ABX testing, one can determine with certainty which of two testing scenarios sounds better. And in fact, that assumption turns out to be true.

If reviewers did this, they could start to build trust with their readers, and the 'snakeoil' aspect of high end audio might start to diminish.
Nonoise- I hope you took my post in the (playful) spirit that was intended. Not sure I'm ready to go the vegetarian route, but you are correct; this horse surely has been sufficiently beaten.
Nonoise,
"One can't have a universal truth"

If you are correct, Nonoise, apparently that would mean that you are also incorrect;-), which means that there can be a universal truth????? Perhaps one of our resident philosophers can chime in. Nandric, are you out there?
Since the flatter and more accurate frequency response was the most desirable I can see why speaker manufacturers should heed the results and simply strive to make speakers that behave as such. It just makes sense.

I also like your note that this is all subjective.
One can't have a universal truth (if I understand that correctly).

All the best,
Nonoise
I'm not sure I agree about your frequency response predictability since a given speaker can sound dramatically different from one room to the next. This is the objection to anechoic chamber frequency response measurements, i.e., they have no relation to real world performance. The variability of frequency response room to room was what created an instant demand for equalizers, not to mention room treatments in general, you know, tube traps, Mpingo discs, Shakti Hallographs, Corner Tunes, Skyline diffusers, crystals, tiny bowl resonators, SteinMusic Harmonizers, things of that nature.
To properly assess something you have to immerse yourself in it. Moods and attitude can change very often but we still remain who we are. We assess things quickly but it's honed on a casual basis (learning curve) until it becomes second nature to us. By the time we are adults, our senses are mostly perfected on a level necessary to keep us alive, into old age.

Now comes along a hatred of things audio that uses the "scientific" method to deconstruct what we know to be true. The genesis of that hatred can be attributed to many things (envy, the wherewithal to buy, the refusal to relate, my mother ran off with an audio salesman, etc.).

We are constantly debating the manifestation of ABXing and not examining the latency behind it. It's been debunked time and again and yet it keeps rearing its head, turning these forums into another game of whack-a-mole as new angles are tried.

All the best,
Nonoise
To assume that the system used for the test is operating perfectly, that it is sufficiently revealing for the specific test, to assume that listeners have sufficiently good hearing and know what they are listening to or for, these are all unknowns. Going on the basic assumption that most audiophile systems are pretty standard sounding, i.e., generic sounding, it wouldn't surprise me one bit that results of blind controlled tests would tend towards obtaining negative or inconclusive results. Which is actually pretty much what Olive's speaker evaluation showed.
"01-17-15: Geoffkait
Judging from the AES paper by Olive the listening tests are, in fact, excessively complicated, a criticism he dismisses. Furthermore the listening tests apparently involved only frequency response. What happened to other audiophile parameters such as musicality, transparency, soundstaging ability, dynamics, sweetness, warmth, micro dynamics, pace, rhythm, coherence, to name a few? One supposes testing for those parameters would make the tests way too complicated. Maybe Olive thinks those parameters are too subjective, who knows?"

You couldn't have said it any better. If you read through the Hydrogen posts, those guys get mad because the reviewer listens to a component and puts what he hears into a review. What else would you have them do? I mean the intended purpose of a piece of audio equipment is to use it to listen to music. The nerve!

Bob_reynolds,

You've stated in the past, in no uncertain terms, that you can look at the specs of a component and tell how it sounds, without listening to it. Do you really expect anyone to believe that you can list all the qualities that Geoffkait states in his post without listening to whatever the component is? It's hard enough to do that when you have the piece in your own listening room.
Scientifically valid is the key phrase. There is no agreement in the scientific community with respect to audio tests. In fact, the scientific community could give a rat's behind about audio or audiophiles or testing audiophile devices, any of that. Hel-looo! If someone says he represents the scientific community in any of this controlled blind testing business or any type of testing for that matter, he's just pulling your leg.
"01-17-15: Bob_reynolds
Drs. Floyd Toole and Sean Olive have been doing blind listening tests of loudspeakers for over a decade."

I've seen all these before. I'm assuming everyone else here has too because they are fairly popular. They don't address the issue here, and that's comparing specific pieces of components. Here's a piece of the OP.

"Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system."

Can you show us some scientifically valid listening tests that were done comparing individual components as part of a review?
Judging from the AES paper by Olive the listening tests are, in fact, excessively complicated, a criticism he dismisses. Furthermore the listening tests apparently involved only frequency response. What happened to other audiophile parameters such as musicality, transparency, soundstaging ability, dynamics, sweetness, warmth, micro dynamics, pace, rhythm, coherence, to name a few? One supposes testing for those parameters would make the tests way too complicated. Maybe Olive thinks those parameters are too subjective, who knows?
This dead horse has been flogged for so long that it's turning me into a vegetarian.
"Nobody is denying the existence of placebo effect or expectation bias or its ugly sibling the reverse expectation bias or any other such psychological effects. But to declare that there are no proper tests is a little bit inaccurate."

No, it's not. If you take my quote that you reference, "For years I've been asking them to show me the tests.", and put it in context with the rest of my statement:

"If you're right, and you really know what you are talking about, pick 2 products, conduct the test, and report your findings in a way so the rest of us can try it for ourselves. That's how a real scientist would do it."

we get a different picture. To me, if you wanted to conduct a scientifically valid listening test to compare 2 audio products, I can't see you going wrong doing it the above way. Not only that, if you read through the thread that the OP referenced, you'll see that at least some of them do agree with me. So again, it all boils down to the same exact thing. Show me the tests. If me declaring that there are no proper tests done is a little bit inaccurate, by all means, show me. I'll settle for just 1. And just to be clear, I really have been asking for years. I'm 100% serious. If you really want to, you can check my old AG threads.

Sorry, but before I forget; "I suspect maybe you've been asking the wrong people.". Maybe you're right about that. Who do I ask? I mean if I don't get results at a place like Hydrogen, where it's their mission statement to go by science, and they will actually censor threads if they don't contain content approved by the moderators, then who do I ask? If you really take a step back and look at this whole issue, the people who talk about doing these listening tests the most avoid it like the plague. It doesn't even make sense.

I agree with the notion of Science! being applied as a tool for understanding how different equipment influences my perceptions, but I'm not convinced ABX testing is the right way to go about it.

Even if a test population of reasonable size is acquired, large enough to smooth out the differences in loudness perception in the subjects, you'd still only have a result for that particular set of equipment. Perhaps a statistically significant percentage of the subject group could identify MP3 versus APE, but once you swap out a cable or change the room temperature, confidence in the results degrades.

If one wants to apply science here, there first has to be a hypothesis that can be falsified through measured experiment. For example, "all conducting wire measures the exact same frequency response curve with white noise at 90 dB, regardless of the transducer". This is probably trivial to falsify, but is on the same continuum as the notion that high priced cables are "better" than lamp cord.

If you want to apply science to that or a similar or even more refined question, super. But an ABX test isn't going to get you a definitive result.
Zd542 wrote,

"For years I've been asking them to show me the tests."

I suspect maybe you've been asking the wrong people. Nobody is denying the difficulty in devising tests that work like they're supposed to, i.e. demonstrate whether A is better than B or whatever. Or whether some newfangled device is a fraud. Nobody is denying the existence of placebo effect or expectation bias or its ugly sibling the reverse expectation bias or any other such psychological effects. But to declare that there are no proper tests is a little bit inaccurate.
"01-15-15: Geoffkait
There is no such thing as a test that can be generalized. Someone else may get entirely different results. Then which test is correct? And who decides?"

The whole issue is that there are no tests. There never have been and there probably never will be. The title of this thread is: "In defense of ABX testing". What testing? For years I've been asking these people to show me some of the tests they've done to back up what they say. They can't. The best they can ever do is bring up concepts from psychology like expectation bias and just hang on to that like it's the answer. lol. I have a degree in psychology. I know full well what these terms mean and how they are applied. And if you think that you're dealing with a case of expectation bias, you still have to test for it. Otherwise you're just guessing.

The reason these guys won't do any tests is because they know there's a really good chance they'll be wrong and they don't want to look bad. What's the first thing the OP says when I challenge him and his science?

"We Audiophiles need to get ourselves out of the stoneage, reject mythology, and say goodbye to superstition. Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system. Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!"

"Actually they are not my words."

Well, he's probably right since you can't actually own words.