Monday, October 12, 2009

Netflix Prize

Fortune magazine recently ran an article on the Netflix Prize, a very interesting competition to improve the accuracy of Netflix's rating predictions by 10%. I was struck by a comment that a reader left at the bottom of the article:

If it is true what the article says, that the BellKor team’s algorithms used information such as "genre" and "actors", then they plainly cheated!
This information was not available in the contest. The contest data had only 4 pieces of information: people ID, movie, date, and the rating values.
If BellKor managed to get extra information about the people and movies, it is hard to understand how NetFlix overlooked it. Therefore I'm quite sure that the article got the story wrong – it is too extreme to be believable. But if the article is right, then the winning team used crooked means to get the prize and its 'solution' does not worth much. Hope it is the first option…

I remember looking at the data and, if I recall correctly, there was a script for adding information from IMDb to the database. I assume this was allowed, because otherwise there would have been no way to get to 10%.

