Wow, just wow. I never would have dreamed that this topic could be so 'diverse'. I read one post after another, each saying one thing only to be roughly corrected by the next couple of posts.
I suppose my take here is that you want the drivers to launch the audio signal 'in step' with each other. You also have to take into account how the ear/brain accepts the audio it receives. Let's say that 8 milliseconds is insignificant to the ear/brain (meaning that it does not draw a meaningful amount of attention). OTOH, 30 milliseconds is another matter and will be noticed. I am not using actual millisecond figures, just rough ideas of time.
If I were to design a loudspeaker, I would certainly want some degree of alignment between drivers, whatever that number in milliseconds needs to be. If the way the drivers are mounted makes no discernible difference to the ear/brain, we're golden.