@dogearedaudio There are several aspects of stereo imaging that help define a system's performance.

Image specificity is essentially the "focus" of the image. With the best systems and recordings, each instrument and voice is sharply defined in three-dimensional space, with "blackness" (no sound) in between. In most systems the individual sound sources blend into each other. With a large number of instruments, like a symphony orchestra, there is a solid wall of sound: the instruments still occupy a location, but they blend into each other whether one likes it or not.

Imaging the third dimension is not about where an instrument is located in three-dimensional space. It is the sensation that an instrument or voice is not a flat object in a painting but a three-dimensional object in space. This is the sensation that the singer is in the room with you. It is the hardest aspect of stereo performance to achieve. Very few systems will do this, and only with a limited number of recordings.

A 3D soundstage is imaging in three-dimensional space: some instruments are up front and others behind. In many instances this is artificial, created by the recording engineer with echo. It is best evaluated with a live recording, since fewer tricks are used in its production. A good recording of a symphony orchestra should easily demonstrate that the timpani are at the back of the stage.
Sometimes audiophiles will describe a soundstage as wide and report that their system images out beyond the speakers. You can throw an image beyond the speakers with phasing tricks, but in the absence of these, the soundstage should be defined by the distance between the speakers and the listener's distance from them. Imaging beyond the speakers is due to reflections off the side walls and always represents a problem that diminishes image specificity. For a system to have the best image specificity, the frequency response curves of the two channels have to be identical. This is very difficult to achieve, usually because of room issues. This is not to say the frequency response of a system should be flat. On the contrary, systems tuned to be flat sound bright and lacking in bass.