Not just the crossovers. The voice coils and construction of the drivers have inductance that affects phase. I was building Bessel filters 30 years ago because of their group delay characteristics. Then, there's impulse response ...
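For anyone wondering why Bessel, here is a rough sketch of the group delay argument, assuming a plain 2nd-order low-pass section and the textbook Q values (about 0.577 for Bessel, about 0.707 for Butterworth). The function names and the 2 kHz pole frequency are made up for illustration, and `w0` is the pole frequency, not the -3 dB point. It ignores the drivers entirely, so treat it as a picture of the filter math only, not of a real speaker.

```python
import math


def group_delay(w, w0, q):
    """Group delay in seconds of H(s) = w0^2 / (s^2 + (w0/q)*s + w0^2).

    Closed form of -d(phase)/dw for a 2nd-order low-pass section.
    w and w0 are angular frequencies in rad/s.
    """
    a = w0 * w0
    b = w0 / q
    return b * (a + w * w) / ((a - w * w) ** 2 + (b * w) ** 2)


def peak_to_dc_ratio(w0, q, points=2000):
    """Largest group delay between 0 and 2*w0, relative to the delay at DC.

    A ratio of 1.0 means the delay never rises above its low-frequency value.
    """
    dc = group_delay(0.0, w0, q)
    peak = max(group_delay(2.0 * w0 * i / points, w0, q)
               for i in range(points + 1))
    return peak / dc


# Hypothetical example values, chosen only for illustration.
W0 = 2 * math.pi * 2000.0          # 2 kHz pole frequency
Q_BESSEL = 1 / math.sqrt(3)        # 2nd-order Bessel
Q_BUTTERWORTH = 1 / math.sqrt(2)   # 2nd-order Butterworth

if __name__ == "__main__":
    for name, q in (("Bessel", Q_BESSEL), ("Butterworth", Q_BUTTERWORTH)):
        print(f"{name}: peak/DC group delay = {peak_to_dc_ratio(W0, q):.3f}")
```

With these values the Bessel section's delay never rises above its low-frequency value, while the Butterworth section's delay peaks roughly 21% higher below the pole frequency. That bump is the kind of thing the Bessel alignment trades a softer rolloff to avoid.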
Huge, complex topic. Sometimes fairly bleeding-edge stuff, especially in digital. Lots of opinion about what we can hear and how we hear it. Even the instruments for detecting time-domain differences at smaller scales are still being developed.
I get a little upset when I read "time and phase aligned" used as promotional copy and restated as dogma. Not that I disagree with the principle, the effort and, sometimes, the results. It's just a narrow viewpoint.