Re: Definition of Insanity?
SirRoxalot said:
As far as research is concerned, anybody who's taken elementary statistics on a college level, especially as part of a Psychology major, can see flaws in music testing.
Statistics tells you things like the margin of error, and it helps determine sample size. It does not tell you whether there are any flaws in music testing.
The critical areas in music testing don't even include sample size... through replication studies, we established the necessary sample size long ago. The important areas include recruit specifications and the actual recruiting, questionnaire design, the location of the physical test facility, driving conditions, availability of parking, etc., etc. The big issues are logistical, not statistical.
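For anyone wondering what "statistics helps determine sample size" looks like in practice, here is a minimal sketch using the textbook normal-approximation margin of error for a proportion. It assumes simple random sampling and a yes/no style measure, which a recruited music test panel scoring on a scale is not, so treat it as an illustration of the arithmetic only, not as how any research company sizes its tests.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion.

    p: observed proportion (e.g. share of respondents who like a song)
    n: number of respondents
    z: critical value; 1.96 corresponds to roughly 95% confidence
    Assumes simple random sampling, which recruited panels are not.
    """
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose margin of error does not exceed moe.

    p = 0.5 is the worst case (largest variance), so it gives a safe upper bound.
    """
    return math.ceil((z ** 2) * p * (1 - p) / moe ** 2)


print(round(margin_of_error(0.5, 100), 3))  # 0.098, i.e. about +/- 10 points
print(required_sample_size(0.05))  # 385 respondents for +/- 5 points
```

Note how the math only answers "how many people"; it says nothing about whether they were the right people, which is the recruiting problem described above.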
"Hook" testing tells you how an audience responds to hooks - but not how they respond to an entire song.
When respondents are given the scoring instructions, including a "test pod" practice, they are told that they will hear "snippets" or "slices" of each song "or we would be here until tomorrow." And they are told, generally, to score based on how much they would like to hear that song on the radio today. Nearly every person who can write their name understands this, and knows that they are to think of the song the hook represents and score accordingly. Post-test debriefings in a one-on-one setting confirm that participants knew what they were supposed to do and that, in fact, they did it.
Case in point - "Let's Get It Started" by the Black Eyed Peas. Test that with a 45+ audience, and it's likely to get a good response because the hook has been used on TV so much. Play the whole song, and a 45+ audience will hit the button EVERY TIME the "rap" part comes on.
First, the test participants are not scoring on familiarity, they are scoring on whether they want to hear the song on the radio today. That's just the first step, though.
And this is where the PD comes in. Research is a tool, not a final decision.
Every list of songs tested includes songs the station does not play... maybe ones it used to play, or ones played by competitors. The PD evaluates songs, can decide that a title "does not fit my vision of the station," and can choose not to play certain high-testing titles. Further, many testing companies provide cluster/factor analysis to identify audience subsets and their relative scores; songs are often eliminated due to fit.
We need to remember that the "average" listener uses 5 to 6 stations in a week. So they may have other "favorite" songs, but those songs may not be why they come to our station... a different mood or situational need, etc. Part of the PD's analysis, both emotive and statistical, is deciding on fit. This is like the situation where we test a bunch of good songs in one combination and the pod scores low, but when a good PD puts them in the right order for good flow, the intent score rises amazingly. It's about playing good songs the right way on the right station. To do all that, you need a good PD.
The idea that you can get accurate data from a :06 sample - but that an :08 sample creates "fatigue" - is only valid if you play :06 song snippets, not entire songs, on the radio.
Again... and this is getting tedious... you totally misread and are now misstating what I said. I said most people have finished evaluating a song and scored it before 6 seconds go by, and that most folks use 8" hooks with a fade to make sure the respondent is ready to move on to the next song. The 6"/8" rule is so well established that I can tell you, after looking at nearly 100,000 respondents from just the last 5 or 6 years, that there is seldom a variance... and the variances that do occur come from people who should not have been in the test for a variety of reasons and who are deleted in the data cleansing phase anyway.
I can also tell you from witnessing tests done with too-long hooks that the audience becomes fidgety and inattentive immediately... ruining the validity of the test data.
Other data, including downloads, sales, and audience response is at least as valid as music testing.
I don't know what "audience response" means, but if you mean "phone calls" you are way out in left field at a football game. Incoming research of any kind is very biased and has no way to control the sample.
Downloads don't tell you age or station preference or radio usage. Sales tell you CDs were sold... but to whom? For a gift? Which cut? Age? Sex? P1 radio station? None of that can be answered with today's technology, so those measures are invalid for our purposes. Of course, radio is not in the record-selling business anyway, so whether a song sells or not is no indication of its radio usage qualities.
Testing longer segments of fewer songs over a two hour period would likely give you different - and possibly more valid - results.
Nah. It's been done, both as a control group exercise and by accident. And it yields vastly poorer results: after the first 50 or so hooks, people's average scores decline in proportion to the added length, so by the end of the test they are scoring everything as neutral or below.
A music test is like a wine tasting. You don't drink the whole bottle or glass... you, correctly, savor and spit. Anything else just yields drunks by the 4th or 5th variety, and achieves nothing.
That would mean spending a lot more time and money to test 600 songs.
And that is money radio does not have (besides the fact that it does not work, ever, at all, anywhere).
A hook test is most valid in telling you what's burnt to a crisp, because people will go negative on it almost immediately.
No, a test is intended to tell what to play and what not to play, for any reason, and how much to play the good ones. It does not matter if a song is burnt, just that it is currently unplayable. And tests mostly identify songs that just should not be played for whatever reason.
Do I have all the answers? No. But, the "powers that be" don't have all the answers either. It's pure hubris to assume that the current methodology is flawless. Continued declines in listening have to tell you that SOMETHING'S WRONG. What's that they say about "Doing the same thing over and over and expecting different results"?
Again, radio is not gradually seeing lower TSL (time spent listening) due to bad music. The issue is that people have so many more options to choose from, and each one gets a thinner slice.
The issues with music research have to do with logistics... getting the right people to show up being one of them. But there are other methods we also use, like by-appointment personal testing with one person scoring on a computer... and web testing by invitation (with an incentive, too) at their own pace (they score in about 6 seconds, too).
Since it's obvious you have not shown much familiarity with testing, and none with the recruiting and implementation end, you should really study the process first before making wild, totally wrong statements.
BTW, David Eduardo is the one who's saying that everything prior to the current PPM testing is invalid. Maybe you'd better take it up with him.
No, I am saying that comparing data from today's methodology with yesterday's less perfect methodologies is fallacious; why buy a buggy when there are now cars?