If all sound, including music, consists of two physical properties, namely amplitude and frequency,
All sound consists of phase & amplitude. Further, time & phase are directly related: it takes a certain time for the sound originating from a source (any source) to reach your ear, and how much time it takes dictates what the phase of the sound is (w.r.t. its originating point) when it reaches your ear. Phase runs from zero to 360 degrees & then starts over from zero, again & again until the sound dies out. Amplitude of sound falls off 6dB per doubling of the distance (for a point source in open space). So, the further away you are, the lower the amplitude.
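To put rough numbers on the two points above, here is a minimal sketch. The speed of sound (343 m/s, roughly dry air at 20 °C), the point-source assumption, and the function names are my own choices for illustration, not anything from a particular tool.

```python
import math

# Assumed value: speed of sound in dry air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def arrival_phase_deg(freq_hz, distance_m, c=SPEED_OF_SOUND_M_S):
    """Phase of a steady tone at the listener, relative to the source, in degrees."""
    delay_s = distance_m / c  # travel time from source to ear
    return (360.0 * freq_hz * delay_s) % 360.0  # wraps back to zero every cycle


def level_change_db(d_near_m, d_far_m):
    """Level change when moving from d_near_m to d_far_m (point source, open space)."""
    return -20.0 * math.log10(d_far_m / d_near_m)


if __name__ == "__main__":
    print(arrival_phase_deg(343.0, 0.5))  # half a wavelength away
    print(level_change_db(1.0, 2.0))      # one doubling of distance
```

With these assumptions, a 343 Hz tone heard 0.5 m away arrives about 180 degrees out of step with the source, and going from 1 m to 2 m costs about 6 dB.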
In nature, phase is the independent variable & frequency is the derivative of phase, i.e. the rate at which phase changes over time. So, if the phase changes, the frequency will change. And, when the frequency changes, the timbral aspect of the sound changes.
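The "frequency is a derivative of phase" idea can be sketched numerically. This is only an illustration: the two phase functions (a steady 1000 Hz tone, and the same tone with a small slow wobble added to its phase) and the helper name are made up for the example.

```python
import math


def instantaneous_freq_hz(phase_fn, t, dt=1e-6):
    """Central-difference estimate of (1 / (2*pi)) * d(phase)/dt at time t."""
    slope = (phase_fn(t + dt) - phase_fn(t - dt)) / (2.0 * dt)  # radians per second
    return slope / (2.0 * math.pi)


def steady(t):
    """Phase (radians) of a steady 1000 Hz tone."""
    return 2.0 * math.pi * 1000.0 * t


def wobbling(t):
    """Same tone with a hypothetical 5 Hz, 0.5 rad phase wobble added."""
    return steady(t) + 0.5 * math.sin(2.0 * math.pi * 5.0 * t)


if __name__ == "__main__":
    print(instantaneous_freq_hz(steady, 0.0))
    print(instantaneous_freq_hz(wobbling, 0.0))
```

At t = 0 the steady tone comes out at about 1000 Hz and the wobbling one at about 1002.5 Hz: change how the phase moves over time and the frequency moves with it.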
then one could argue that much of the audio language we use is vague, and sometimes extremely difficult to understand.
Continuing what was written above, the audio language is basically describing what the distortion sounds like upon playback.
As you know, all electronics & all speakers create distortion. The name of the game is to identify components (electronics + speakers), at any given price-point, that play back music with the least distortion. That will always enhance one's music playback session(s).
So when people write
For example, what are we supposed to understand by words like 'analytical' or 'warm'?
they are really describing the sound of the distortion - specifically the phase distortion - through their respective system or some system they auditioned (a friend's place, a dealer's place, etc.).
It's important to remember that when the phase of the original signal (coming off an LP or a CD or a music stream) is altered (by the electronics or the speaker), the frequency content of that music playback session is altered. This in turn changes the timbral accuracy & completely alters the music, in general for the worse (though there are many people who like phase distortion because it gives them the coloured sound they are specifically looking for).
Once you understand this, it is very easy to understand the audio language.
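For anyone who wants to see the effect of altered phase on a signal, here is a rough sketch. It builds a toy waveform from a fundamental plus one harmonic at half the level (both made up for illustration) and shifts only the harmonic's phase. It shows the waveform changing shape and nothing more; how audible that is on a real system is a separate matter.

```python
import math


def waveform(harmonic_phase_deg, n=1000):
    """One cycle of a fundamental plus a 2nd harmonic whose phase is shifted."""
    phi = math.radians(harmonic_phase_deg)
    samples = []
    for k in range(n):
        x = 2.0 * math.pi * k / n  # position within the cycle, in radians
        samples.append(math.sin(x) + 0.5 * math.sin(2.0 * x + phi))
    return samples


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


if __name__ == "__main__":
    print(peak(waveform(0.0)))   # harmonic in its original phase
    print(peak(waveform(90.0)))  # harmonic shifted by 90 degrees
```

Same two components at the same levels, yet the peak goes from roughly 1.3 to 1.5 once the harmonic is shifted by 90 degrees, so what comes out is no longer the waveform that went in.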