@blisshifi, my impression is that soundstage depth can be dominated either by the room or by the recording. The latter is preferable, but it is generally more difficult to accomplish.
First, looking at the room as the constraining factor: as a ballpark first approximation, it seems to me that soundstage depth is related to the distance from the speakers to whatever reflective surface is behind and/or between them (the wall, an equipment rack, a TV, or whatever). My untested observation is that soundstage depth often seems limited to about twice that distance. So if the speakers are three feet out from the wall, soundstage depth tends to extend about six feet back from the speakers. That's cool, because it is perceived as extending beyond the physical room boundary, but imo there is significant room for improvement.
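The 2x figure is an untested rule of thumb, not an established formula. One possible geometric reading, offered here as an assumption rather than as my explanation of it, is the image-source model: a flat wall behind the speaker acts like a mirror, placing a virtual copy of the speaker about 2d behind the real one. The Python sketch below assumes a single flat wall, simple 2D geometry, and a speed of sound of roughly 1125 ft/s; the function names are hypothetical. It computes the rule-of-thumb depth next to the image position and the delay of the rear-wall reflection.

```python
import math

# Approximate speed of sound in air at about 20 °C (343 m/s), in feet per second.
SPEED_OF_SOUND_FT_PER_S = 1125.0


def rule_of_thumb_depth_ft(wall_distance_ft: float) -> float:
    """Untested rule of thumb: depth is about 2x the distance to the rear reflector."""
    return 2.0 * wall_distance_ft


def rear_wall_reflection(wall_distance_ft: float,
                         listener_distance_ft: float,
                         lateral_offset_ft: float = 0.0):
    """Image-source sketch (assumed model) with the rear wall along y = 0.

    Speaker at (0, d); the wall mirrors it to an image source at (0, -d).
    Listener at (lateral_offset, d + listener_distance).
    Returns (image depth behind speaker in ft, extra path in ft, delay in ms).
    """
    d = wall_distance_ft
    speaker = (0.0, d)
    image = (0.0, -d)  # mirror image of the speaker in the rear wall
    listener = (lateral_offset_ft, d + listener_distance_ft)

    direct = math.dist(speaker, listener)     # direct-sound path length
    reflected = math.dist(image, listener)    # rear-wall reflection path length
    extra = reflected - direct                # how much farther the reflection travels
    delay_ms = 1000.0 * extra / SPEED_OF_SOUND_FT_PER_S
    image_depth = speaker[1] - image[1]       # always 2d in this geometry
    return image_depth, extra, delay_ms
```

With the speakers 3 ft from the wall and the listener 9 ft away on-axis, the image sits 6 ft behind the speaker and the reflection arrives about 5.3 ms after the direct sound. Moving the listener off-axis shortens the extra path slightly.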
Diffusing, redirecting, or broadband-absorbing the first reflections off the wall behind the speakers (and not putting the equipment rack or TV there) can help "unmask" the soundstage depth cues that are already on the recording.
Also, bi-directional speakers like the Borresens/Raidhos, Quads, and Maggies you mentioned, when well set up (i.e., far enough out into the room), have reflection characteristics that can be exploited to shift the perceived acoustic space from "playback room" to "recording venue," whether the recording-venue cues are real, engineered, or both.
Duke