@realworldaudio, I am not an EE, but I have an advanced physics degree and have worked for a long time in semiconductors and batteries, developing and measuring them. I have gotten pretty good with metrology out of necessity. So I don't claim to be an expert, but I think I have a good grasp of what the measurements are communicating:
*all parameters tested on non-inductive perfectly passive extremely simplistic loads, while the loudspeakers are highly complex live loads affected by the room
Stereophile specifically mentions that for some of its tests it uses a synthetic speaker load that models a real-world speaker, which appears to contradict part of the statement above. Testing into 4 ohms is standard, and testing into 2 ohms seems common; together with the synthetic load, these show how an amplifier copes with lower and more reactive loads. Testing with worst-case synthetic loads is common, and those loads are harsher than real-world conditions. When characterizing semiconductor devices, we test with synthetic loads to find the "corners" of stability.
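To give a feel for why a speaker-like load is harsher than a plain resistor, here is a toy sketch of the impedance of a simple single-driver electrical model: voice-coil resistance and inductance in series with a parallel R-L-C "motional" branch. All component values are hypothetical and chosen only for illustration; this is not Stereophile's actual load circuit. It sweeps 20 Hz to 20 kHz and reports the impedance minimum and the largest phase angle.

```python
import cmath
import math

# Hypothetical values for a simple single-woofer electrical model (NOT Stereophile's load).
RE = 3.2        # voice-coil DC resistance, ohms
LE = 0.5e-3     # voice-coil inductance, henries
RES = 30.0      # motional resistance at resonance, ohms
LCES = 20e-3    # motional inductance, henries
CMES = 400e-6   # motional capacitance, farads

def impedance(f):
    """Complex impedance: Re + jwLe in series with a parallel R-L-C motional branch."""
    w = 2 * math.pi * f
    y_motional = 1 / RES + 1 / (1j * w * LCES) + 1j * w * CMES
    return RE + 1j * w * LE + 1 / y_motional

def sweep(f_start=20.0, f_stop=20000.0, points_per_decade=50):
    """Log-spaced sweep returning (frequency in Hz, |Z| in ohms, phase in degrees)."""
    n = round(math.log10(f_stop / f_start) * points_per_decade) + 1
    points = []
    for k in range(n):
        f = f_start * 10 ** (k / points_per_decade)
        z = impedance(f)
        points.append((f, abs(z), math.degrees(cmath.phase(z))))
    return points

points = sweep()
f_min, z_min, _ = min(points, key=lambda p: p[1])
f_ph, _, ph = max(points, key=lambda p: abs(p[2]))
print(f"min |Z| {z_min:.2f} ohm at {f_min:.0f} Hz")
print(f"largest phase angle {ph:+.0f} degrees at {f_ph:.0f} Hz")
```

Even this crude model swings from roughly 3 ohms to well over 30 ohms with large phase angles, which is the kind of behavior a synthetic load is meant to reproduce and a resistor is not.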
*Only additive distortion is measured, subtractive distortion is not.
I am not sure exactly what you mean by subtractive distortion. Are you saying that the interaction of a particular amplifier with a particular speaker may result in lower overall distortion? That seems possible. I will note that @atmasphere, who seems to know his stuff, stated that most (not all) speakers are designed to be driven by a voltage source, which may negate, for most listeners, the advantage you perceive. Any such effect would depend entirely on the specific load (speaker), so I don't see how it could be tested in general.
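One way to see why the voltage-source point matters: the amplifier's output impedance and the speaker form a voltage divider, so a speaker whose impedance swings with frequency gets a frequency-response error unless the output impedance is very low. The sketch below is my own simplification, treating the speaker as a pure resistance that moves between two hypothetical values (3.3 and 33 ohms), with two hypothetical amplifier output impedances; real speaker impedances are complex, so this only shows the trend.

```python
import math

def level_at_speaker_db(z_speaker, z_out):
    """Speaker voltage relative to the amp's unloaded output, in dB (resistive divider)."""
    return 20 * math.log10(z_speaker / (z_speaker + z_out))

def response_deviation_db(z_dip, z_peak, z_out):
    """How much louder the impedance peak plays than the impedance dip, purely from the divider."""
    return level_at_speaker_db(z_peak, z_out) - level_at_speaker_db(z_dip, z_out)

# Hypothetical speaker swinging between 3.3 and 33 ohms; hypothetical amp output impedances.
for z_out in (0.05, 2.0):
    print(f"Zout {z_out:4.2f} ohm -> {response_deviation_db(3.3, 33.0, z_out):.2f} dB deviation")
```

With 0.05 ohm output impedance the deviation is about 0.1 dB; with 2 ohms it is about 3.6 dB, which is why a speaker voiced for a voltage source can sound different on a high-output-impedance amplifier.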
*Changes in THD as a function of output level and frequency are not paid attention to, while these strongly determine whether the sound is perceived as natural vs. manufactured.
This is simply false. Read any of the more recent reviews on ASR: THD and SINAD are tested from very low to very high power, and across a range of frequencies, from 20 Hz to 15 kHz if I remember correctly. I am too lazy to go verify the exact frequencies used.
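For anyone curious what a THD-vs-level sweep actually computes, here is a toy sketch. The amplifier is a made-up model (`toy_amp`, unity gain plus a small cubic nonlinearity with hypothetical coefficient `a`), not any real product, and the 1 kHz tone and 48 kHz sample rate are my assumptions. THD is taken as the RMS sum of harmonics 2 through 6 relative to the fundamental, each read with a single-bin DFT; the record holds exactly 100 cycles so the bins line up (coherent sampling).

```python
import math

def tone_amplitude(samples, freq, fs):
    """Peak amplitude of one frequency component via a single-bin DFT.

    Assumes coherent sampling: the record holds a whole number of cycles of freq.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def thd_db(samples, f0, fs, n_harmonics=5):
    """THD in dB: RMS sum of harmonics 2..n_harmonics+1 relative to the fundamental."""
    fundamental = tone_amplitude(samples, f0, fs)
    harmonics = [tone_amplitude(samples, k * f0, fs) for k in range(2, n_harmonics + 2)]
    return 20 * math.log10(math.sqrt(sum(h * h for h in harmonics)) / fundamental)

def toy_amp(x, a=0.01):
    """Hypothetical amplifier: unity gain plus a small cubic (odd-order) nonlinearity."""
    return [s - a * s ** 3 for s in x]

def sweep_thd(levels, f0=1000.0, fs=48000.0, n=4800):
    """THD at each drive level; 4800 samples at 48 kHz hold exactly 100 cycles of 1 kHz."""
    results = {}
    for level in levels:
        x = [level * math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
        results[level] = thd_db(toy_amp(x), f0, fs)
    return results

for level, thd in sweep_thd([0.01, 0.1, 1.0]).items():
    print(f"level {level:5.2f}: THD {thd:7.1f} dB")
```

For this cubic toy, THD rises by about 40 dB for every tenfold increase in level, because the third harmonic grows with the cube of the drive while the fundamental grows linearly. A real amplifier's curve usually looks different (noise dominates at very low power, clipping at the top), and repeating the sweep at other tone frequencies, keeping whole cycles in the record and harmonics below Nyquist, adds the frequency axis.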
*Amplifier behavior is tested with constantly repetitive primitive signals, while the music output is a highly variable extremely complex waveform.
This is also false. ASR tests with a 32-tone IM signal: 32 tones spread from low to high frequency. As those tones add together, the result is a complex signal whose amplitude varies continuously.
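A rough sketch of why a multitone stimulus is nothing like a "constantly repetitive primitive signal". The tone frequencies (log-spaced from 20 Hz to 20 kHz, rounded to whole Hz), equal amplitudes, and random phases are my assumptions; ASR's actual 32-tone signal may be constructed differently. The code reports the crest factor (peak relative to RMS) and how far the level of short 10 ms blocks wanders over one second.

```python
import math
import random

def multitone(n_tones=32, f_lo=20.0, f_hi=20000.0, fs=48000, seconds=1, seed=1):
    """Equal-amplitude tones on log-spaced whole-Hz frequencies with random phases,
    normalised so the largest sample magnitude is 1.0."""
    rng = random.Random(seed)
    ratio = (f_hi / f_lo) ** (1 / (n_tones - 1))
    freqs = [round(f_lo * ratio ** k) for k in range(n_tones)]
    phases = [rng.uniform(0, 2 * math.pi) for _ in freqs]
    n = fs * seconds
    x = [sum(math.sin(2 * math.pi * f * i / fs + p) for f, p in zip(freqs, phases))
         for i in range(n)]
    peak = max(abs(s) for s in x)
    return [s / peak for s in x]

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

def crest_factor_db(x):
    """Peak-to-RMS ratio in dB."""
    return 20 * math.log10(max(abs(s) for s in x) / rms(x))

def block_rms_spread_db(x, block=480):
    """Difference in dB between the loudest and quietest block (480 samples = 10 ms at 48 kHz)."""
    levels = [rms(x[i:i + block]) for i in range(0, len(x) - block + 1, block)]
    return 20 * math.log10(max(levels) / min(levels))

sig = multitone()
single_sine_db = 20 * math.log10(math.sqrt(2))
print(f"crest factor: {crest_factor_db(sig):.1f} dB (a single sine is {single_sine_db:.1f} dB)")
print(f"spread of 10 ms block levels: {block_rms_spread_db(sig):.1f} dB")
```

A single sine has a crest factor of about 3 dB; the random-phase multitone sits well above that, and the block spread shows the short-term level moving between larger and smaller excursions rather than repeating a fixed pattern.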
*It is not examined how an amplifier deals with small signals following a large pulse at the frequency extremes.
I will not call this false, but I think you are not accounting for what the other measurements already cover. The 32-tone IM signal varies from large to small, and the THD sweep stimulus also moves from very small to very large levels. If distortion measures -100 dB in both cases, then that would also be the case for the special condition you are theorizing.
What may be missing is a test of whether distortion rises when sustained heavy load heats the output devices. I do not know whether that is a realistic real-world condition.
I have listened to Nelson Pass, and he strikes me as very much a measurements guy. He may deliberately tune certain artifacts into his designs, but what I have read and what I have seen on YouTube indicate he is very measurement-oriented.