Wednesday, December 10, 2008

Progress Prize 2008

The long-awaited Progress Prize 2008 has finally been awarded. Of course, I immediately rushed to download the supporting papers and learn what is in the new super-accurate model discussed on BellKor’s website, and what other goodies are in BellKor’s and BigChaos’ solution. If you haven’t checked the papers yet, do it before continuing to read this (references can be found in the NetflixPrize forum).


It appears that what’s new in 2008 is mostly about exploiting the dates. BellKor’s 2007 solution used the date in the global effects, but that was about it. It seems logical that 2008 brings more ways of making use of it. That’s not a surprise: much of PragmaticTheory’s recent improvement has been about using dates too. Still, there are differences, so we’ll see where that leads us. I know what I’ll be doing over the Christmas Holidays.


What worries me is the approach using billions of parameters. My poor home PC can’t do that. Under Windows XP Home Edition, a process is limited to about 1.6 GB of memory (2 GB of application address space minus the address space lost to DLL mapping). With about 400 MB typically used for the training data, that leaves about 1.2 GB, or room for about 150M double precision parameters, far from the required number. Going single precision raises the number to 300M, still far short. Running from disk is my only option, but the poor disk is almost full!
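For anyone who wants to redo that memory budget with their own numbers, here is a minimal Python sketch. It only uses the approximate figures quoted above (1.6 GB usable, 400 MB of training data); the function and variable names are my own, and decimal megabytes are assumed for simplicity.

```python
# Back-of-the-envelope memory budget, using the rough figures from the post.
# Assumption: decimal units (1 MB = 10**6 bytes); names are illustrative only.
MB = 10**6

def max_parameters(usable_bytes, training_bytes, bytes_per_param):
    """Number of parameters that fit in what is left after the training data."""
    return (usable_bytes - training_bytes) // bytes_per_param

usable = 1600 * MB    # about 1.6 GB available to a single process
training = 400 * MB   # about 400 MB taken by the training data

doubles = max_parameters(usable, training, 8)  # 8 bytes per double
floats = max_parameters(usable, training, 4)   # 4 bytes per single-precision float

print(f"double precision: {doubles:,} parameters")
print(f"single precision: {floats:,} parameters")
```

Changing `usable` to what a 3 GB or 64-bit address space would give shows quickly how far each option gets you.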


What’s funny is that 10 billion parameters is not only much larger than the training data size (roughly 100 million ratings), but it is even larger than the 17,770 movies × 480,189 users (approximately 8.5 billion) problem space. Still, the model introduces a third dimension (time) with 2,243 different dates, resulting in a problem space of 2,243 × 17,770 × 480,189 = 19,139 billion (almost the cost of a bank bailout). Fair enough, but I still have to ask Santa for a new PC.
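The sizes above are easy to check. This is just a quick Python sketch of the arithmetic; the constant names are mine, and the ratings and parameter counts are the rounded figures from the text rather than exact values.

```python
# Problem-space sizes quoted in the post; constant names are illustrative.
MOVIES = 17770
USERS = 480189
DATES = 2243
RATINGS = 100_000_000          # roughly, as stated in the post
PARAMETERS = 10_000_000_000    # the "10 billion" figure from the post

movie_user = MOVIES * USERS        # two-dimensional problem space
with_time = movie_user * DATES     # adding the time dimension

print(f"movies x users:         {movie_user:,}")
print(f"movies x users x dates: {with_time:,}")
print(f"parameters per rating:  {PARAMETERS / RATINGS:.0f}")
```

So the parameter count sits between the two-dimensional and the three-dimensional problem space, at about 100 parameters per training rating.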


Enough rambling, I need to write some code. We’ve been falling behind…


Martin for PragmaticTheory

8 comments:

Colin Green said...

Hi, y'know if you use a 64bit OS you can allocate more than 4GB of RAM. I've been using Windows XP 64bit with 4GB - on the basis that I can reliably allocate 2GB or more in a single process.

Anonymous said...

I am backing you guys. I like the idea of some people in their free time surpassing AT&T and company. I am lost when it comes to coding, but I enjoy the theories behind the algorithms. So I will keep an eye on your blog. Good luck.

Anonymous said...

You should Google for "boot.ini 3gb" to enable a 3GiB user address space.

Anonymous said...

Kudos for the recent 0.8620 RMSE
I guess you implemented the Biggies ideas on your own, or do you have any "personal sauce" added to the mix?
I mean, yes/no, I am not asking for the magic spell :-)

PragmaticTheory said...

LMV: The last submission (0.8620) was mostly our implementation of some ideas described in the 2008 progress prize and KDD 2008 papers. About 1/4 of the gain was "personal sauce".

Anonymous said...

About 1/4 of the gain was "personal sauce"

Thanks, so it means roughly that your personal sauce is worth about .0004 and that the still secret sauce of the Biggies (with respect to the published papers) is worth about .0013, interesting...

Nitrous Gold said...

You can also run a 64-bit Linux with less headache than trying to switch to 64-bit XP/Vista. I installed Wubi (an Ubuntu installer that does not require repartitioning) in a 5 GB hard drive file and had more than enough room - you can probably go much smaller. The NTFS partition is still available when in the Linux OS, and you can then access all your RAM.

Anonymous said...

What about Distributed Computing like Folding@home?

I will be willing to donate my computing resources if it goes to free movie recommending website.

If I get recommendations of *good* movies :).


Not sure if Netflix is still at 9.63% improvement.