Oh Mapman, you are goading me now with that type of crazy subject!
It really is harder to get good sound than good still images. Still images are static: no transients! Easy! And judging them is easy... they're static!
Movies are just a series of stills... the static image gets replaced every so often.
Not so simple with the ear, which has a wide bandwidth, is extremely time-sensitive (so phase distortion shows up readily), and cannot be effectively satisfied by any single transducer.
Consider this: the ear/brain system is 10 octaves wide!!!!
The eye is less than one octave wide in its frequency spectrum! Ouch, that ear is hard to satisfy.
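For anyone who wants to check the octave arithmetic, here is a quick sketch. The limits are my assumptions, not hard numbers: the usual nominal 20 Hz to 20 kHz for hearing and roughly 380 nm to 750 nm for visible light. An octave is one doubling, so the count is just log base 2 of the ratio between the top and bottom of the range.

```python
import math

def octaves(low, high):
    """Number of octaves (doublings) spanned between low and high."""
    return math.log2(high / low)

# Assumed nominal limits, for illustration only.
ear_octaves = octaves(20.0, 20_000.0)  # hearing range in Hz
eye_octaves = octaves(380.0, 750.0)    # visible light in nm; the ratio is the same in frequency

print(f"ear: {ear_octaves:.2f} octaves")
print(f"eye: {eye_octaves:.2f} octaves")
```

With those assumed limits the ear comes out just under 10 octaves and the eye just under one. Pick slightly different limits and the exact figures shift, but the gap stays about tenfold.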
As for "beef" in high-end audio, there is little beef. Lots of guys claim to be from Bell Labs (a tall tale from that guy), from NASA (all but one of those I know of are false), or to be an NSA scientist (bogus)... so you are right: lots of experimenting, but only a few with real scientific / physics / engineering chops. The job is harder, the market smaller.
A lot of substance and good points from several folks in this thread, far more than in most.