First, two speakers cannot image "height". You have two variables: when the sound arrives at each ear, and how loud the sound is at each ear. There is no way to derive "height" from that, so don’t bother trying, though you may convince yourself it is there, and/or your room acoustics may give that impression because the frequency response is inconsistent w.r.t. direction (which is not a good thing).
I get a good laugh when I read people saying they made some minor change and the "soundstage doubled", or some other superlative. As pointed out, most of the soundstage is in the recording, so it comes down to how it is recorded and mixed. Remember, there are only two variables w.r.t. imaging: when the music gets to each ear, and the relative volume.
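To put rough numbers on those two variables, here is a minimal Python sketch. It is only an illustration under simple assumptions: a point source, a listener at the origin, an assumed ear spacing of 0.18 m, an assumed speed of sound of 343 m/s, and level falling off with distance only (head shadowing is ignored). The function name `ear_cues` and the constants are made up for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
EAR_SPACING = 0.18      # m, assumed ear-to-ear distance (illustrative)


def ear_cues(source_x, source_y):
    """Return (time_diff_ms, level_diff_db) for a point source.

    The listener sits at the origin facing +y, with the ears on the x axis.
    Positive values mean the right ear gets the sound earlier / louder.
    Head shadowing is ignored; level follows the inverse-distance law only.
    """
    half = EAR_SPACING / 2.0
    d_left = math.hypot(source_x + half, source_y)
    d_right = math.hypot(source_x - half, source_y)
    time_diff_ms = (d_left - d_right) / SPEED_OF_SOUND * 1000.0
    level_diff_db = 20.0 * math.log10(d_left / d_right)
    return time_diff_ms, level_diff_db


if __name__ == "__main__":
    # Sources 2 m in front of the listener: left, centre, right.
    for x in (-1.0, 0.0, 1.0):
        t, lvl = ear_cues(x, 2.0)
        print(f"source x={x:+.1f} m: time diff {t:+.3f} ms, level diff {lvl:+.2f} dB")
```

A centred source gives zero for both values, and moving it left or right just flips the sign. Whatever you feed in, all that comes out is a time difference and a level difference.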
After that, it is a complex interaction of your speakers and room. If either one is bad, you lose your imaging. On the speaker front, it comes down to the same things always discussed: smooth on-axis frequency response, and smoothly decaying off-axis response. If you don’t have both, then the volume balance between instruments gets messed up due to frequency response variations. So when you do research, look for products that publish good on/off-axis frequency response, or where you can find tests on the web. "Trust your ears" is not always the best advice. Speakers will sound much different at home versus in a demo room. You are listening to the room as much as the speakers, and a purpose-built demo room can hide problems that you may not be able to avoid at home.
If you aren’t willing to work on room treatments, then forget about good imaging. This comes back to timing. If you have strong reflections, your brain doesn’t know what arrived first, and it needs to be able to clearly identify the same signal reaching both ears. This also plays into the importance of smoothly decaying off-axis frequency response. First reflections and strong reflections are all bad. Think of the side walls in front of the speakers, the area behind the speakers, the often forgotten wall behind the listener, and also the floor and the ceiling. That said, you don’t want to eliminate all reflections, as then you lose that nice artificial sense of "space".
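To get a feel for the timing involved, a first side-wall reflection can be estimated by mirroring the speaker across the wall and measuring the longer path. The Python sketch below is a rough illustration only: it assumes a flat, perfectly reflective wall, a 2D floor plan in metres, and a speed of sound of 343 m/s. The function name `side_wall_reflection` and its parameters are made up for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def side_wall_reflection(speaker, listener, wall_x=0.0):
    """Return (delay_ms, level_db) of a side-wall reflection vs. the direct sound.

    speaker and listener are (x, y) positions in metres; the wall is the
    line x = wall_x. The wall is treated as a perfect mirror, so level_db
    is a worst case that accounts only for the extra path length.
    """
    direct = math.dist(speaker, listener)
    # Mirror the speaker across the wall; the reflected path length equals
    # the straight-line distance from this mirror image to the listener.
    image = (2.0 * wall_x - speaker[0], speaker[1])
    reflected = math.dist(image, listener)
    delay_ms = (reflected - direct) / SPEED_OF_SOUND * 1000.0
    level_db = 20.0 * math.log10(direct / reflected)
    return delay_ms, level_db


if __name__ == "__main__":
    # Speaker 1.5 m from the wall, listener 4 m straight in front of it.
    delay, level = side_wall_reflection((1.5, 0.0), (1.5, 4.0))
    print(f"reflection arrives {delay:.2f} ms late at {level:.1f} dB re direct")
```

In that example the reflected path is only 1 m longer than the direct one, so an untreated wall sends back a copy just a few milliseconds late and only a couple of dB down. Absorption or diffusion at the reflection point lowers that level, which the sketch does not model.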