Tbg: For someone who "teaches statistics," you express a rather narrow perspective on the field. Think about how you would use statistics to determine whether a coin is fair. (You do agree that you can use statistics to do this, don't you?) The problem of determining whether a certain subject can hear a difference between two components is precisely the same. Do his results suggest that he was just guessing which was which (the equivalent of flipping a fair coin), or that he could indeed hear a difference (the equivalent of flipping an unbalanced coin)? At any rate, it really doesn't matter whether you think statistics is applicable here. People who actually study hearing and do listening tests use statistics for this purpose every day of the week.
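To make the coin analogy concrete, here is a minimal Python sketch of the kind of calculation involved, assuming a forced-choice listening test in which a pure guesser is right half the time on each trial. The function name abx_p_value and the 12-of-16 score are hypothetical, chosen only for illustration; they are not taken from any real test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value.

    Returns the probability of getting at least `correct` answers right
    in `trials` independent trials if the listener is purely guessing
    (assumed chance of a correct answer per trial: 0.5, like a fair coin).
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials
    # equally likely outcomes under guessing.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical score: 12 correct out of 16 trials.
    print(f"p = {abx_p_value(12, 16):.4f}")
```

With that made-up score, the chance of doing at least as well by guessing alone comes out to roughly 0.038, about 1 in 26. A small value like that is evidence against "just guessing"; a score near half right (say 8 of 16) gives a large value and no such evidence. Where to draw the line is a choice the tester has to make beforehand.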
I would define undeniable differences as those that measurements would lead us to predict. If two components differ in a measured characteristic by more than the known threshold of human detection, then there's no real need to do a DBT to determine whether they sound different. For example, if one amp has a THD of 0.1%, and the other is at 3%, we can safely assume that they are audibly different. Transducers typically measure differently enough that we can assume they sound different. Ditto many (but not all) tube amps. Solid state amps, unless they are underpowered for the speakers they are driving or have a non-flat frequency response (perhaps due to an impedance mismatch), generally do not measure differently enough for us to assume that.
Before I get tagged with the "measurements are everything" slur, let me say that these measurements can only predict WHETHER two components will sound different. If they do sound different, the measurements cannot tell us (at least not very well) which you will prefer, or even in what ways they will sound different to you.
For more info on DBTs, see the ABX home page, mirrored here:
http://www.pcavtech.com/abx/