After several tests, the designer made a conscious decision to let it be, since it sounded much better in its original, untamed state after extensive listening tests. This is what many of us mean by "listen first and then measure": putting more emphasis on listening and on what sounds best, rather than on making graph lines flat.
Oh, I know perfectly well what you mean. Before starting Audio Science Review, I co-founded a forum specifically focused on high-end audio. Folks there spend more on audio tweaks than most of you spend on your entire system! That is where @daveyf and I met. So there is nothing you need to tell me about audiophile behavior; I know it well.
Here is the problem: there is no proof that the designer's assertion is true. You say he did "extensive listening tests." I guarantee that you have no idea what that testing was, let alone that it was extensive. What music was used? At what power level? With what speakers? How many listeners? What are the qualifications of the designer when it comes to hearing impairments?
A story is told and believed. Maybe it is true. Maybe it is not. After all, if he saw a significant measurement error, logic says the odds of it sounding good are low. Why else would you tell that story? And if the odds are low, then we had better have a documented, controlled test that shows it really did sound better, not just a story.
BTW, the worst person to trust in these matters is the one with a vested interest. I don't mean this in a derogatory way. Designers just want to defend their designs and be right. So we had best not put our eggs in that basket, and instead ask for proof.
I have posted this story from Dr. Sean Olive before, but it seems I have to repeat it. When he moved from the National Research Council to Harman (Revel, JBL, etc.), he was surprised by the strong resistance from both the engineering and marketing people at the company:
To my surprise, this mandate met rather strong opposition from some of the more entrenched marketing, sales and engineering staff who felt that, as trained audio professionals, they were immune from the influence of sighted biases.
[...]
The mean loudspeaker ratings and 95% confidence intervals are plotted in Figure 1 for both sighted and blind tests. The sighted tests produced a significant increase in preference ratings for the larger, more expensive loudspeakers G and D. (note: G and D were identical loudspeakers except with different cross-overs, voiced ostensibly for differences in German and Northern European tastes, respectively. The negligible perceptual differences between loudspeakers G and D found in this test resulted in the creation of a single loudspeaker SKU for all of Europe, and the demise of an engineer who specialized in the lost art of German speaker voicing).
Do you see the problem with improper listening tests and engineers' opinions of such products?
These people shun science so much that they never test their hypothesis of what sounds good. Not once do they put themselves through a proper listening test. Because if they did, they would sober up, and quickly! Such was the case with me...
When I was at the height of my listening acuity at Microsoft and could tell that you had flushed your toilet two states away :), my signal processing manager asked me if I would evaluate their latest encoder with its latest tuning. I told him it would be faster if he gave me the tuning parameters; I would optimize them by listening and give him the numbers.
I did that after a couple of weeks of testing. The numbers were floating point (they had fractions) and I found it necessary to go way deep, optimizing them to half a dozen decimal places. I gave him the numbers and he expressed surprise, telling me they didn't use the fractions in the algorithm! That made me angry, as I could hear the difference even when changing a value by 0.001. I told him the difference was quite audible and I could not believe he couldn't hear it.
This was all over email, and the next thing I know, he sent me a link to two sets of encoded music files and asked me which sounded better. I quickly detected that one was clearly better and matched my observations above. I told him in no uncertain terms that one set was better. Here is the problem: he told me the files were identical!
I could not believe it. So I listened again and the audible difference was there, clear as day. So I performed a binary comparison, only to find that the files were indeed identical. Sigh. I resigned my unofficial position as the encoder tuner. :)
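If you ever want to run that same sanity check yourself, a binary comparison is trivial. Here is a minimal sketch (the file names are hypothetical): hash both files and compare the digests. If they match, the files are bit-identical, so any "difference" you hear cannot be in the data.

```python
import hashlib

def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file names: two "different" encoder outputs to compare.
a = file_digest("encoded_set_A.wav")
b = file_digest("encoded_set_B.wav")
print("identical" if a == b else "different")
```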
This is why I plead with you all to test your listening experiences in a proper test. Your designer could easily have done that. He could have built two versions of that amp, matched their levels, and performed blind AB tests with a number of audiophiles. Then, if the outcome was that the worse-measuring amp was superior, I would join him in defending it!
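And by "proper" I mean randomized presentation, matched levels, and enough trials that guessing can be ruled out statistically. As a rough illustration (the session numbers below are hypothetical, not from any real test), this is how you would score such a session:

```python
import math

def binomial_p_value(correct: int, trials: int) -> float:
    """One-sided probability of getting at least `correct` answers right out of
    `trials` forced-choice (A/B) trials by pure guessing (50% per trial)."""
    return sum(math.comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session: 20 level-matched, randomized blind A/B trials,
# and the listener identified the amplifier correctly 16 times.
p = binomial_p_value(16, 20)
print(f"p = {p:.4f}")  # ~0.006: very unlikely to be guessing
```

At 16 out of 20 correct, the odds of doing that well by chance are well under 1%. That is the kind of evidence that would make the claim worth defending.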