He used a tape recorder and a digital recorder. Two or three women were talking very quietly before the concert started.
After the concert he checked both recordings. The women's whispering could just barely be heard on the tape, since the tape hiss was almost as loud as the voices.
Several things are at play here, and it is also worth looking at how this would play out differently today.
The tape recorder was probably set up (as they often are) so that the loudest peaks were somewhat compressed. That effectively extends the dynamic range beyond the raw SNR, at the cost of a technical loss of fidelity.
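To put rough numbers on the compression point, here is a minimal sketch of a static hard-knee compressor curve. The threshold (-20 dB), ratio (4:1), whisper level and tape hiss level are all made-up illustrative values, not measurements of that recorder, and real tape saturation is a soft, frequency-dependent curve rather than a hard knee.

```python
def compressed_level_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee compressor curve: levels above the threshold are
    reduced by the ratio, levels below it pass through unchanged."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# Hypothetical levels in dB relative to the recorder's maximum level.
peak_db = 0.0           # loudest part of the concert
whisper_db = -60.0      # the whispering
noise_floor_db = -55.0  # assumed tape hiss level

# Compression lowers the peak, so the whole signal can be recorded hotter
# by that same amount (makeup gain) without exceeding the maximum.
makeup_db = peak_db - compressed_level_db(peak_db)
whisper_on_tape_db = compressed_level_db(whisper_db) + makeup_db

print(f"makeup gain: {makeup_db:.1f} dB")
print(f"whisper vs. hiss without compression: {whisper_db - noise_floor_db:+.1f} dB")
print(f"whisper vs. hiss with compression:    {whisper_on_tape_db - noise_floor_db:+.1f} dB")
```

With these assumed numbers, squashing the peaks by 15 dB lets the whole recording go 15 dB hotter, which lifts the whisper from below the hiss to above it.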
The digital recorder was likely set up so that no peak reached full scale, which effectively reduced its usable dynamic range.
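The cost of leaving headroom is simple dB subtraction. This sketch uses the usual 6.02 * bits + 1.76 dB figure for the theoretical dynamic range of an ideal converter with a full-scale sine; the 12 dB of headroom is an assumed value for illustration.

```python
import math

def ideal_dynamic_range_db(bits):
    """Theoretical SNR of a full-scale sine against undithered
    quantization noise: about 6.02 * bits + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 1.76

headroom_db = 12.0  # hypothetical: loudest peak kept 12 dB below full scale
full = ideal_dynamic_range_db(16)
usable = full - headroom_db
print(f"16-bit ideal: {full:.1f} dB, usable with {headroom_db:.0f} dB headroom: {usable:.1f} dB")
```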
You can still detect sounds with an SNR below 1 (that is, quieter than the noise), which is what was happening, or close to it, with the tape recorder.
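One way to see how a signal below the noise can still be picked out is narrowband analysis: a steady tone concentrates its energy in one frequency bin, while broadband noise spreads across all of them. The ear's frequency selectivity is loosely analogous, but this sketch is not a hearing model, and speech is not a steady tone, so it only illustrates the principle. It buries a tone at about -9 dB SNR in Gaussian noise and compares the DFT magnitude at the tone's bin with noise-only bins; the tone frequency, levels and sample count are arbitrary choices.

```python
import math
import random

def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly (no FFT library needed)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return math.hypot(re, im)

random.seed(1)
n = 4096
tone_bin = 100
amplitude = 0.5   # tone RMS is about 0.354
noise_sd = 1.0    # noise RMS is 1.0, so power SNR = 0.125, about -9 dB

samples = [amplitude * math.sin(2 * math.pi * tone_bin * i / n) + random.gauss(0.0, noise_sd)
           for i in range(n)]

tone = bin_magnitude(samples, tone_bin)
others = [bin_magnitude(samples, k) for k in range(200, 220)]
print(f"tone bin: {tone:.0f}, average noise-only bin: {sum(others) / len(others):.0f}")
```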
The digital recorder was likely a lower-quality or older unit (if this was in the '90s) and was limited to plain 16-bit recording.
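Here is a sketch of what plain 16-bit rounding does to a very quiet signal, compared with 24 bit. The -100 dBFS whisper level is hypothetical, and the model ignores the analog noise from the microphone and preamp, which in a real recorder often acts as natural dither.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1) to the nearest step of a signed integer
    format with the given bit depth (no dither), then scale back."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

n = 1000
level_dbfs = -100.0                  # hypothetical whisper level
amp = 10 ** (level_dbfs / 20)        # 1e-5 of full scale
whisper = [amp * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

q16 = [quantize(s, 16) for s in whisper]
q24 = [quantize(s, 24) for s in whisper]
print("distinct 16-bit values:", len(set(q16)))  # 1: every sample rounds to zero
print("distinct 24-bit values:", len(set(q24)))
```

At this assumed level the whole waveform is smaller than half a 16-bit step, so it rounds to digital silence, while at 24 bit it still swings across roughly ±84 steps.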
If you did this test today, you would record at 24 bit, and the voices would be more audible than on the tape. If you then reduced the recording to 16 bit, you would add noise-shaped dither to push the perceived dynamic range up to 115 dB or more, and again the voices would be audible above a lower noise floor than the tape's.
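The dither point can be illustrated with a small experiment. This sketch requantizes a hypothetical -100 dBFS whisper to 16 bit with TPDF dither and simple first-order error-feedback noise shaping, then estimates the tone's amplitude by correlation. It is a textbook scheme chosen for brevity, not what any particular recorder or mastering tool does, and it does not try to reproduce the 115+ dB perceptual figure, which depends on higher-order filters shaped to the ear's sensitivity.

```python
import math
import random

def dither_to_16bit(samples, seed=0):
    """Requantize float samples to 16 bits using TPDF dither plus first-order
    error-feedback noise shaping (simple textbook scheme)."""
    rng = random.Random(seed)
    scale = 2 ** 15
    out = []
    err = 0.0                              # previous total error, in LSBs
    for s in samples:
        v = s * scale - err                # feed back last error: noise is shaped by (1 - z^-1), toward high frequencies
        d = rng.random() - rng.random()    # triangular (TPDF) dither spanning +/-1 LSB
        q = round(v + d)
        err = q - v                        # error made at this step, including the dither
        out.append(q / scale)
    return out

def amplitude_at(samples, cycles):
    """Estimate the amplitude of a sine with an integer number of cycles
    by correlating the samples against it."""
    n = len(samples)
    return 2 / n * sum(s * math.sin(2 * math.pi * cycles * i / n) for i, s in enumerate(samples))

n, cycles = 1000, 5
amp = 10 ** (-100 / 20)                    # hypothetical whisper at -100 dBFS
whisper = [amp * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

plain = [round(s * 2 ** 15) / 2 ** 15 for s in whisper]   # undithered 16-bit
shaped = dither_to_16bit(whisper)

print(f"true amplitude:      {amp:.2e}")
print(f"undithered 16-bit:   {amplitude_at(plain, cycles):.2e}")
print(f"noise-shaped dither: {amplitude_at(shaped, cycles):.2e}")
```

Without dither the sub-LSB signal rounds to zero; with dither it survives as a slightly noisier signal, and the feedback loop moves most of the added noise toward high frequencies, away from the tone.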