3D imaging

I started thinking about this yesterday. What makes speakers produce a 3D image? I figured the first thing is the recording itself. I'm guessing mic placement has a lot to do with this. Next, I would imagine, is room and speaker placement. Downstream gear certainly has to have some effect on this. Does the crossover have something to do with providing this "illusion," for lack of a better term?
     Now please understand, I don't have anywhere near the technical knowledge a lot of you folks have, so as you explain this phenomenon, please dumb it down for me!
    Thanks in advance,
I believe this is the first time I've heard anyone mention diffusion vs. absorption at early reflection points. However, thinking about what you said about removing energy, it makes sense. Thanks for your post.
Diffusion + absorption + randomness = good. Lots of variation in surfaces and textures helps: a lived-in room is often FAR superior to the dedicated audiophile room with a rack between the speakers and a forest of amplifiers on the floor...
Two huge advantages to headphones 🎧 are that you don’t need room treatments and you don’t have to drive yourself crazy with speaker placement. What a relief! 😂
I'm not a fan of headphones. I haven't used mine in a couple of years. For me, speakers are my drink of choice. I actually find it fun, in a way, to mark where the speakers are placed, listen for a couple of days, then move them to see where I'm at.