In my opinion:
"There is a roll-off before 20kHz..." This is due to the shortest wavelengths being absorbed by surfaces in the room.
"A dip is present around 12kHz..." With a round horn or round waveguide there will be an on-axis cancellation dip in the response, centered on the frequency where the mouth reflection arrives 180 degrees out-of-phase at the listening position. The center frequency of this dip changes with listening distance or microphone distance. Off-axis, the arrival time of the horn mouth reflection is smeared, and the cancellation dip is correspondingly shallower.
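To make the geometry a bit more concrete, here is a rough Python sketch. It is a simplified model under my own assumptions, not a measurement method: the mouth return is treated as a ring of equal-strength point sources on the rim, it is assumed to have the same polarity as the direct sound, paths are straight lines from a point source at the throat, and 1/r level differences and any frequency dependence of the return are ignored. The names, dimensions, and the `rim_gain` value are all hypothetical. On-axis every rim point is the same distance from the listener, so the returns add up coherently and the dip is as deep as the model allows; off-axis the rim-to-listener paths differ, the phases spread out, and the dip gets shallower.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate, room temperature


def dip_frequencies_hz(extra_path_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Frequencies where a same-polarity delayed copy arrives 180 degrees
    out of phase, i.e. the extra path is an odd number of half-wavelengths."""
    delay_s = extra_path_m / c
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]


def rim_extra_path_m(depth_m, mouth_radius_m, distance_m):
    """Hypothetical straight-line geometry: source at the throat, depth_m behind
    the mouth plane; listener on axis, distance_m in front of the mouth plane."""
    via_rim = math.hypot(depth_m, mouth_radius_m) + math.hypot(distance_m, mouth_radius_m)
    direct = depth_m + distance_m
    return via_rim - direct


def response_db(freq_hz, depth_m, mouth_radius_m, distance_m, angle_deg,
                rim_gain=0.3, points=72, c=SPEED_OF_SOUND_M_S):
    """Level of direct sound plus a ring of rim returns, relative to the direct
    sound alone. rim_gain is a made-up strength for the mouth return."""
    src = (0.0, 0.0, -depth_m)  # throat, behind the mouth plane (z = 0)
    theta = math.radians(angle_deg)
    listener = (distance_m * math.sin(theta), 0.0, distance_m * math.cos(theta))
    direct = math.dist(src, listener)
    k = 2 * math.pi * freq_hz / c  # wavenumber
    rim_sum = 0j
    for i in range(points):
        phi = 2 * math.pi * i / points
        rim_pt = (mouth_radius_m * math.cos(phi), mouth_radius_m * math.sin(phi), 0.0)
        path = math.dist(src, rim_pt) + math.dist(rim_pt, listener)
        rim_sum += cmath.exp(-1j * k * (path - direct))  # phase of each rim return
    total = 1 + rim_gain * rim_sum / points
    return 20 * math.log10(abs(total))


# Hypothetical 10 cm deep, 10 cm radius waveguide heard from 2 m.
extra = rim_extra_path_m(0.1, 0.1, 2.0)
f_dip = dip_frequencies_hz(extra)[0]
print(f"first dip near {f_dip:.0f} Hz")
print(f"on-axis:    {response_db(f_dip, 0.1, 0.1, 2.0, 0):.1f} dB")
print(f"30 deg off: {response_db(f_dip, 0.1, 0.1, 2.0, 30):.1f} dB")
```

With these made-up dimensions the first dip lands well below 12kHz; the point is only the trend: moving the listener (or microphone) changes the extra path and therefore the dip frequency, and going off-axis fills the dip in.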
"There is a noticeable boost between 50Hz and 1.5kHz." Below the crossover frequency, the midwoofer's radiation pattern is wider than the waveguide's pattern, which puts more energy into the room below the crossover frequency.
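One way to see the size of this effect is through the directivity index: DI is the on-axis level minus the power-averaged (all-directions) level, so for a given on-axis response, the total radiated power goes up wherever DI goes down. The sketch below assumes a perfectly flat on-axis response and uses invented DI values with a step near an assumed 1.5kHz crossover; it shows only the directivity contribution, not room gain, boundary effects, or absorption.

```python
def power_response_db(on_axis_db, di_db):
    """Power-averaged level implied by an on-axis level and a directivity
    index, using DI = on-axis SPL - power-averaged SPL."""
    return [spl - di for spl, di in zip(on_axis_db, di_db)]


# Hypothetical values for illustration only.
freqs_hz = [500, 1000, 1500, 2000, 4000, 8000]
on_axis_db = [85.0] * len(freqs_hz)        # assume a flat on-axis response
di_db = [4.0, 5.0, 6.0, 9.0, 9.5, 10.0]    # wide midwoofer below ~1.5 kHz, narrower waveguide above

for f, p in zip(freqs_hz, power_response_db(on_axis_db, di_db)):
    print(f"{f:>5} Hz: {p:.1f} dB")
```

In a room where the reflected sound makes up much of what reaches the listening position, that extra radiated power below the crossover shows up as the kind of broad lift described above.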
Duke