Wednesday, December 10, 2008

Progress Prize 2008

The long awaited Progress Prize 2008 as finally been awarded. Of course, I immediately rushed to download the supporting papers and learn what is in the new super-accurate model discussed on BellKor’s web, and what other goodies are in BellKor’s and BigChaos’ solution. If you haven’t checked the papers yet, do it before continuing to read this (references can be found in the NetflixPrize forum).

It appears that what’s new in 2008 is mostly about exploiting the dates. BellKor’s 2007 solution used the date in the global effects, but that was about it. It seems logical that 2008 brings more ways of making use of it. That’s not a surprise: much of PragmaticTheory’s recent improvement has been about using dates too. Still, there are differences, so we’ll see where that leads us. I know what I’ll be doing over the Christmas Holidays.

What worries me is the approach using billions of parameters. My poor home PC can’t do that. Running Windows XP Home Edition, a process is limited to about 1.6 GB of memory (2GB application address space minus the address space lost due to DLL mapping). With about 400MB used for the training data (typically), that leaves about 150M double precision parameters, far from the required number. Going single precision raises the number to 300M, still far short. Running from disk is my only option, but the poor disk is almost full!

What’s funny is that 10 billion parameters is not only much larger than the training data size (roughly 100 million ratings), but it is even larger than the 17770 movies X 480189 users (approximately 8.5 billion) problem space. Still, the model introduces a third dimension (time) with 2243 different dates, resulting in a problem space of 2243 X 17770 X 480189 = 19139 billion (almost the cost of a bank bailout). Fair enough, but I still have to ask Santa for a new PC.

Enough rambling, I need to write some code. We’ve been falling behind…

Martin for PragmaticTheory