andy - thank you for your contributions. Indeed you are on it with "first order filter. It communicates the heart and emotion of the music better."
This communication is not some magic of technical accomplishment, and in fact it is one side of the coin of accuracy. This subtlety of communication is primarily a psycho-acoustic effect. That is not saying it is somehow fakery. On the contrary, we hear by synthesizing auditory experiences via very minimal sonic inputs which elicit associations, memories, conjectures and so forth. Then the auditory brain forwards those synthesized packets for storage and assembly into longer, more complex composites such as a musical or verbal phrase, etc. Our research at Thiel led us to commit to first-order slopes because they are the only solution to preserve the phase-time information that the ear-brain uses to believe the input is real. We really believe what we are hearing when the phase information is intact, rather than cognitively conjecturing what we hear when the phase information is scrambled, as it is with higher order filters. Much more can be said about this process, but put it on the shelf for now.
Side 2 of the coin is the technical execution. No doubt, hands down, higher order slopes are FAR more executable for exact frequency domain accuracy. In fact, first order slopes are generally considered non-executable because the drivers must have such a wide range of linear response. Higher order slopes attenuate the out-of-bandpass signal at double, triple or quadruple rates compared to first order. The ubiquitous 4th order slopes attenuate at 24 dB/octave rather than our 6 dB. So all the grief that the driver goes through at its frequency extremes just goes away with higher order filters, making much cleaner, more controllable frequency domain smoothness.
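To put rough numbers on those attenuation rates, here is a small sketch using an idealized Butterworth low-pass magnitude response (an assumption for illustration; real crossovers interact with the drivers' own rolloffs, so actual Thiel filters are not this simple). It shows how much quieter the out-of-band signal is two octaves past the crossover point for 1st-order versus 4th-order slopes:

```python
import math

def butterworth_attenuation_db(f_ratio: float, order: int) -> float:
    """Attenuation in dB of an idealized Butterworth low-pass of the
    given order, at f_ratio = f / f_crossover."""
    return 10 * math.log10(1 + f_ratio ** (2 * order))

# Two octaves above the crossover frequency (f = 4 * fc):
first = butterworth_attenuation_db(4, 1)
fourth = butterworth_attenuation_db(4, 4)
print(round(first, 1))   # ~12.3 dB, near the 6 dB/octave asymptote
print(round(fourth, 1))  # ~48.2 dB, near the 24 dB/octave asymptote
```

So a tweeter behind a 4th-order filter sees roughly 36 dB less out-of-band energy two octaves down than one behind a 1st-order filter, which is why the driver's "grief at the frequency extremes" matters so much more in a first-order design.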
So, the double-whammy is that the frequency-extreme grief of the first order slope is also more objectionable because the ear-brain is trying to process it as real music rather than as a music-like artifact, as it does with higher order slopes. Both sides of the difficult coin gang up against first order slopes. Thiel decided the realism of the result was worth the huge grief of execution.
As you alluded to earlier, phase coherence without time coherence is not meaningful. The auditory brain can only buy in if all the elements of the signal are correct. I believe, and Thiel's position is the distinct minority, that the phase-time aspect of the signal is more critical than the frequency domain aspect. In other words, it is not upsetting if a reproduced trumpet sounds slightly like a different trumpet (frequency-spectral differences), but it is upsetting if a trumpet's harmonics reach the ear at different times and in different phase relationships than real, non-reproduced music (time-phase differences).
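The math behind this claim can be sketched with textbook transfer functions (an idealized model, ignoring the drivers themselves). A first-order low-pass and high-pass sum to exactly 1: flat magnitude and zero phase shift, so the recombined waveform is the input waveform. A 4th-order Linkwitz-Riley pair also sums to flat magnitude, but the sum is an allpass whose phase rotates through the crossover region, so the harmonics recombine with shifted timing:

```python
import cmath
import math

def first_order_sum(w: float) -> complex:
    """Sum of 1st-order low-pass and high-pass at normalized frequency w
    (w = 1 is the crossover frequency)."""
    s = 1j * w
    return 1 / (1 + s) + s / (1 + s)  # algebraically equal to 1

def lr4_sum(w: float) -> complex:
    """Sum of 4th-order Linkwitz-Riley sections (each a squared
    2nd-order Butterworth) at normalized frequency w."""
    s = 1j * w
    b2 = s * s + math.sqrt(2) * s + 1
    return (1 / b2) ** 2 + (s * s / b2) ** 2

w = 1.0  # right at the crossover point
fo = first_order_sum(w)
lr = lr4_sum(w)
print(abs(fo), math.degrees(cmath.phase(fo)))  # 1.0, 0.0 — transient perfect
print(abs(lr), abs(math.degrees(cmath.phase(lr))))  # 1.0, 180.0 — flat but phase-rotated
```

Both sums measure flat on a steady-tone frequency sweep; only the first-order sum also preserves the phase-time relationships, which is exactly the distinction being drawn between the two sides of the coin.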
An interesting phenomenon is that once a listener (or recording pro) has identified the importance of time-phase, there is no real going back. The artificiality of non-coherent waveforms is unsettling, even if the frontal lobes convince us that that sound must be a trumpet. As I said before, this discussion encompasses a serious body of study; I hope this response covers the high spots.
Regarding more than 3 drivers in a phase-coherent system, I'll comment on that later. Think CS5, 6 and 7.