A well-recorded performance captured with more than one microphone contains spatial cues in the form of phase delay between the channels. Played back through stereo speakers, those cues are reproduced to some degree, allowing your mind to assign direction, distance, even height. Well-made speakers with very inert cabinets tend to throw these sonic "images" better, much as a stereoscopic picture builds depth from two flat views.
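To put rough numbers on that inter-channel delay, here is a minimal sketch. It assumes a simple spaced pair of microphones, a distant source (far-field approximation), and a speed of sound of about 343 m/s. The function names and the 0.5 m spacing are illustrative choices only, not anything taken from a particular recording setup.

```python
import math

# Assumed speed of sound in dry air at roughly 20 C (approximate).
SPEED_OF_SOUND_M_S = 343.0


def interchannel_delay_s(mic_spacing_m, source_angle_deg):
    """Arrival-time difference between two spaced microphones.

    Far-field approximation: the source is assumed distant enough that
    its wavefront is flat when it reaches the pair. The angle is measured
    from straight ahead, so 0 degrees means a centered source.
    """
    angle_rad = math.radians(source_angle_deg)
    return mic_spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND_M_S


def phase_difference_deg(delay_s, frequency_hz):
    """Phase offset between the channels that a given delay produces."""
    return 360.0 * frequency_hz * delay_s


if __name__ == "__main__":
    # Hypothetical setup: microphones 0.5 m apart, source 30 degrees off center.
    delay = interchannel_delay_s(0.5, 30.0)
    print(f"delay: {delay * 1000:.3f} ms")
    for freq in (100.0, 1000.0):
        phase = phase_difference_deg(delay, freq)
        print(f"{freq:.0f} Hz: {phase:.1f} degrees of phase difference")
```

The same delay works out to a different phase offset at every frequency, which is one reason the speakers have to reproduce timing consistently across the whole range for the image to hold together.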
Part of the ability to throw a stable image comes from closely matched frequency responses between the left and right speakers, along with a smooth response that has no peaks. The rest is how accurately the speakers render those spatial cues in a way that brings realism to the experience.
Very good speakers, with the right recordings, can throw an image that extends well beyond the location of the speakers themselves, producing an enveloping sensation, a very 3D experience.