Very true. In a 100-person AMT, we'd generally see one true outlier, someone who gave dramatically different scores even to some of the biggest, most broadly liked songs. On rare occasions we might find two, which we felt would not distort the total results but might affect the age and gender breaks.
I'm a total statistics nerd. When I was involved in music research decades ago, we calculated the annual percentage of 'outliers' that had been jettisoned from the control group the prior year. Surprisingly, that percentage didn't vary much from year to year.
Two people is 2% of the sample. The AMT process has a greater overall margin of error than that, so removing a couple of outliers has no negative impact on the totals, though it does have some effect on the narrower breakouts.
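To put rough numbers on that, here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level, and the worst-case 50/50 split, none of which an AMT panel strictly satisfies. The 25-person breakout size is a hypothetical cell, not a figure from any actual test, and `margin_of_error` is just an illustrative helper name.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n (z=1.96 corresponds to ~95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical panel sizes: the full 100-person AMT and one
# age/gender breakout of 25 people, each with and without 2 outliers.
scenarios = [
    ("full panel", 100),
    ("full panel minus 2", 98),
    ("breakout", 25),
    ("breakout minus 2", 23),
]

for label, n in scenarios:
    moe_points = margin_of_error(n) * 100
    print(f"{label:>20}: n={n:3d}  +/- {moe_points:.1f} points")
```

Under those assumptions the full panel sits near plus or minus 10 points, well above the 2% that two people represent, and dropping two respondents barely moves it. In the smaller breakout the same two people are a much larger share of the cell, which is why the narrower breaks are where the effect shows up.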
We knew interesting things, like how the weather outside would affect average scores: a storm or rain will bring down the average, for example, and so will bad traffic on the way to the test site. Once you have done a few hundred AMTs, you know how to read the room, even its overall mood. That means the cut-off point on playability will change according to that assessment.
Now we test online. We can't read the room; we can't even be sure that the participant is the actual person we recruited. Technology brought down the price, but it did not improve the accuracy.