@mijostyn
Listen to a choir, pick out one voice then pick out another voice. Try and listen to them together at the exact same time. Your mind can bounce back and forth quickly between the two but you can not listen to both at the same time unless you ignore the individuality of the voices.
Kudos on your description of this difficulty. It captures something very real about the challenge of evaluating audio.
My initial approach to such events is to take them as a single experience, which later turns out (on inspection) to have multiple parts. Scenic views come across this way, as well. Looking at a landscape, I don't go jumping around from one particular to another, but "take in the whole." Indeed, most of our experience of eating is exactly about the combination of flavors and not the individual flavors.
I guess my point would be that the experience of the combination can be as immediate as the experience of the particular; indeed, the experience of a particular which is embedded in a larger whole involves a mental act in which we have to "prescind" or "abstract out" something which only then gets our selective attention. But in the initial moment, we experience (what we'll later call) the complex, and we experience it as a simple.
This point -- about the complex whole -- doesn't really defuse the difficulty you pose, because there again, we can *take* that whole complex in various ways, each time. (Is the landscape cheery? Is it plaintive? Is it intimidating? Etc.) So, how could we ever compare? -- that would be the challenging question.
I'd start the answer with the word "habit." I cannot hear a choir in a million different ways for the same reason I cannot see a staircase in a million different ways. I have habits of listening, habits of staircase maneuvering, habits of tasting. These habits become my bases of comparison; they allow me to compare one listening session to the next, and because I'm a self-in-society (and not a random self), I can gain insight from what you hear and perhaps hear it that way, myself.