Why HiFi Gear Measurements Are Misleading (yes ASR talking to you…)


About 25 years ago I was inside a large room with an A-frame ceiling and large skylights, during the Perseid Meteor Shower that happens every August. This one time was like no other, for two reasons: 1) There were large, red, fragmenting streaks multiple times a minute with illuminated smoke trails, and 2) I could hear them.

Yes, each meteor produced a sizzling sound, like the sound of a frying pan.

Amazed, I Googled this phenomenon and found that many people reported hearing this same sizzling sound associated with meteors streaking across the sky. In response, scientists and astrophysicists said it was all in our heads, that it was totally impossible. Why? Because of the distance between the meteor and the observer. Sound does not travel fast enough to be heard at the same time that the meteor streaks across the sky. Case closed.

ASR would have agreed with this sound reasoning based in elementary science.

Fast forward a few decades. The scientists were wrong. Turns out, the sound was caused by radiation emitted by the meteors, traveling at the speed of light and interacting with metallic objects near the observer, even if the observer is indoors, producing a sizzling sound. Researchers actually recorded this sound along with the radiation. You can look this up easily and listen to the recordings.

Takeaway - trust your senses! Science doesn’t always measure the right things, in the right ways, to fully explain what we are sensing. Therefore your sensory input comes first. You can try to figure out the science later.

I’m not trying to start an argument or make people upset. Just sharing an experience that reinforces my personal way of thinking. Others of course are free to trust the science over their senses. I know this bothers some but I really couldn’t be bothered by that. The folks at ASR are smart people too.

nyev

@mastering92 

@amir_asr

MP3s are not great. Sure, you could fool someone into thinking that two files are the same on a smartphone over Bluetooth, but upon further inspection on a more resolving system, you could tell the original .wav file and the .mp3 file apart easily, no matter what the bitrate was, even 320 kbps.

Self-bragging is common when it comes to lossy compression.  Problem is, when most of you are put to blind tests, you flunk being able to tell the source from the compressed one.  And no, a resolving system has nothing to do with it.  The fact that you say that tells me you don't know what it takes to hear such differences.  As a trained listener in this domain, I can tell differences with just about any headphone on any system.

While it is true that MP3 was not designed to be transparent, at high bitrates, especially at 320 kbps, it easily fools even the most ardent audiophiles.  I know because we have tested them.  While at Microsoft, I told my signal processing manager to recruit the large body of audiophiles we had there for testing our lossy codec.  We ran a large scale test among our self-selected audiophile group.  Results were embarrassing for me as an audiophile.  None could remotely match our trained but non-audiophile listeners.  

To hear those impairments, you need to learn to hear them.  It does not come naturally to audiophiles.  This learning also involves understanding of the algorithms and where the weak points may be.

I have lost count of how many times a presenter at an audio show has whispered to me that they were playing lossy audio to audiophiles who had no idea and thought it was uncompressed content!

You may be the exception -- there is a small percentage of audiophiles who are good at this.  To prove that, you need to provide results of a double blind test to show that and not just claim it.  Here is an example of me passing such a test:

foo_abx 2.0 beta 4 report
foobar2000 v1.3.5
2015-01-05 20:26:27

File A: On_The_Street_Where_You_Live_A2.mp3
SHA1: 21f894d14e89d7176732d1bd4170e4aa39d289a3
File B: On_The_Street_Where_You_Live_A2.wav
SHA1: 3f060f9eb94eb20fc673987c631e6c57c8e7892f

Output:
DS : Primary Sound Driver

20:26:27 : Test started.
20:27:01 : 01/01
20:27:09 : 02/02
20:27:16 : 03/03
20:27:22 : 04/04
20:27:28 : 05/05
20:27:34 : 06/06
20:27:40 : 06/07
20:27:51 : 07/08
20:28:01 : 08/09
20:28:09 : 09/10
20:28:09 : Test finished.

----------
Total: 9/10
Probability that you were guessing: 1.1%

-- signature --
7a3d0c1aaaf8321306ff6cfdd1f91ff68f828a54

So please don't make such assertions unless you have evidence to back them up.  Fish stories in audio are quite common.  Reliable facts, not so much....
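For anyone wondering where the "Probability that you were guessing: 1.1%" line in that log comes from, here is a minimal sketch. It assumes the figure is a one-sided binomial tail with a 50% chance per trial, which matches the number in the report. The function name `guess_probability` is made up for illustration and is not part of foobar2000 or foo_abx.

```python
from math import comb


def guess_probability(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure coin-flipping.

    Assumes every trial is an independent 50/50 guess (one-sided binomial tail).
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Score from the log above: 9 correct out of 10 trials.
print(f"9/10: {guess_probability(9, 10):.1%}")
```

Under the same assumption, 8 correct out of 10 works out to about 5.5%, and a perfect 10 out of 10 to about 0.1%.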

@jayctoy

If I remember correctly, Robert Harley was warned by either Mike Moffat or Jason from Schiit Audio that the Yggdrasil DAC doesn't have good measurements? It turned out the Yggdrasil got a stellar review from Robert Harley.

An outcome which he hands to any product that costs money. Why would that in any form or fashion mean anything in the context of measurements being reliable or not? I only recommend 1/3 of all products I review. What is that percentage for RH? 99%? 100%?

BTW, Schiit publishes measurements of everything they build now. As noted, this came from work that we did at ASR, highlighting the value of such things. So I would not pick this as an example of anything.

 

@amir_asr ,

 

I think the test you gave yourself is too easy 😀  Your point is well taken in regards to trained listeners and MP3. I think a better test is to serve up 10 different tracks which may be MP3 or may be wave, and then test how well listeners do at accurately assessing if the track is compressed or not. I wonder if even the trained listeners will be challenged in that case without a reference.

@nyev 

All this to say that sometimes a lesser performing component may sound better given lesser surrounding components in one’s system. Again, judging a piece by measurements alone would never provide any useful context in these situations.

Thank you for another measured response.  On this point, we never foreclose such a possibility.  All we ask is that evidence of sonic superiority come from the auditory senses only, and that it have the statistical rigor to be reliable.  Should this evidence come about, we certainly will throw out the current measurements and investigate further what is going on.

What we face unfortunately is so and so says it sounds good.  Well, I can get you people who say the opposite.  And neither can be shown to be reliable.  Someone mentioned Robert Harley.  He raved about my Mark Levinson amplifiers.  Stereophile editors hated them.  To their credit, they showed some measurement issues, although that was not normative with respect to comments from the subjective reviewer (who no doubt was biased against class D amps).

So what we ask is simple: please conduct a test where all factors have been removed other than sonic fidelity of the two devices.  Match levels.  Play the same content on the same system.  And repeat at least 10 times and see if you can get 8 out of 10 right.  It is not much to ask for, as I perform many blind tests to back up my arguments.  If that is too much, and it can be, then we better run with a) measurements of the device and b) science and engineering behind how the device works and why something would or would not be audible.
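On the "match levels" step, a rough sketch of one way to check it in software. The post does not say how levels should be matched, so this assumes a simple RMS comparison of two captures of the same content, held as plain lists of float samples. The function names and the test tone are made up for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(a, b):
    """How much louder `a` is than `b`, in dB (negative if quieter)."""
    return 20 * math.log10(rms(a) / rms(b))


def match_level(samples, reference):
    """Scale `samples` so its RMS level equals that of `reference`."""
    gain = rms(reference) / rms(samples)
    return [s * gain for s in samples]


# Hypothetical example: the same 1 kHz tone captured at two different gains.
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
device_a = tone
device_b = [0.5 * s for s in tone]

print(f"before: {level_difference_db(device_a, device_b):.2f} dB")
matched_b = match_level(device_b, device_a)
print(f"after:  {level_difference_db(device_a, matched_b):.2f} dB")
```

Halving the amplitude is a difference of about 6 dB, which is easy to hear as "better" even when nothing else has changed, so this step comes before any listening.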

I should note that when it comes to transducers, measurements are less predictive and I have on a number of occasions liked something that didn't measure well.  The Wilson TuneTot was one such speaker.  It was an expensive bookshelf speaker ($12K), so it would have been easy to go with the flow of bad measurements and expensive so let's damn it.  But I could not in my listening tests and reported that.  Got heat for it from my own crowd but so be it.

 

@thespeakerdude 

I think the test you gave yourself is too easy 😀  Your point is well taken in regards to trained listeners and MP3. I think a better test is to serve up 10 different tracks which may be MP3 or may be wave, and then test how well listeners do at accurately assessing if the track is compressed or not. I wonder if even the trained listeners will be challenged in that case without a reference.

I didn't give that test to myself.  I was challenged on a major forum by an objectivist to be able to tell MP3 from original with him claiming that no one could.  At the same time, there had been a challenge on that forum to tell 16 bit content from 24 bit.  Content for that was produced by AIX Records, which is well known for the quality of its productions.  So to remove any appearance of bias in selection of material, I grabbed the clips from that test and compressed them to MP3.  And posted those results.  The clip was not at all "a codec killer" where such differences are easier to hear.

On the type of test you mention, I am not a fan of them for the reason you mention.  It is harder to identify the original vs compressed that way because you now have to know what the algorithm does to create or hide sounds.  In other words, is an artifact part of the original content or was it removed?

Our goal with listening tests should always be to try and find differences, not make it hard for people to find what is there.  Because once we know an artifact exists, we can fix it.  Making the test harder to pass goes counter to that.

That said, I and many others were challenged to such a test on the same major site above.  We were given a handful of clips and asked to find which is which.  Results were privately shared with the test conductor.  When I shared my outcome, he told me I did not do all that well!  I was surprised as I was sure two of the clips were identical and thought that was put in there as a control.

Fast forward to when the results are published and wouldn't you know it, I was "wrong."  We had a regular member with huge reputation for mixing soundtracks for major films and he got it "right."  Puzzled, I performed a binary comparison and showed that the two files were identical!  Test conductor was shocked.  He went and checked and found out that he had uploaded the same file twice!  He declared the test faulty and that was that.
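For anyone who wants to run the same kind of check, a minimal sketch of a binary comparison in Python. The post does not say which tool was used, so this is just one way to do it: `filecmp.cmp` with `shallow=False` compares the actual bytes, and a SHA-1 digest (the same kind of hash the ABX report above prints for each file) gives a fingerprint you can post. The file names are made up and the demo writes throwaway files.

```python
import filecmp
import hashlib
import os
import tempfile


def sha1_of(path, chunk_size=65536):
    """SHA-1 digest of a file, read in chunks so large clips fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True only if both files contain exactly the same bytes."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with throwaway files standing in for two uploaded clips.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "clip_1.wav")
    second = os.path.join(tmp, "clip_2.wav")
    for path in (first, second):
        with open(path, "wb") as f:
            f.write(b"same bytes uploaded twice")
    print(files_identical(first, second))
    print(sha1_of(first) == sha1_of(second))
```

If the two clips in a test come back identical, nobody can legitimately pick one over the other, whatever their score says.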

Despite that, and to repeat, I don't want to make blind tests too hard on purpose.  We need to be interested as much in positive outcomes as negative.