We have been working recently on variants of BellKor's integrated model as described in their 2008 Progress Prize paper. We obtained results very similar to the published numbers: our implementation achieved an RMSE of 0.8790 on the Quiz set (f=200), compared to the reported 0.8789.
This model proved superior to our own flavor of integrated model. Interestingly, though, we were able to combine the best of both models. The combined model achieved a Quiz set RMSE of 0.8756 (f=200). This is, to our knowledge, the best reported number for a model without blending; on today's leaderboard it would rank 47th on its own.
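For readers who haven't seen it, here is a minimal sketch of the integrated model's prediction rule from Koren's KDD 2008 paper. The data layout, the helper names (rated, rated_k, implicit_k), and the use of mu + b_u + b_j as the inner baseline estimate are our own simplifications for illustration, not BellKor's code (the paper uses separately precomputed baselines).

```python
import numpy as np

def predict(u, i, mu, b_user, b_item, q, p, y, w, c,
            rated, rated_k, implicit_k):
    """Sketch of the integrated model prediction (Koren, KDD 2008).

    Illustrative layout: q, y are (n_items, f) arrays, p is (n_users, f),
    w and c are (n_items, n_items) item-item weights; rated(u) returns
    {item: rating} for user u, and rated_k(u, i) / implicit_k(u, i) return
    the k items most similar to i among u's explicit / implicit feedback.
    """
    R_u = rated(u)                                   # items rated by u
    # SVD++-style user representation: p_u plus normalized implicit feedback
    p_eff = p[u] + len(R_u) ** -0.5 * sum(y[j] for j in R_u)
    pred = mu + b_user[u] + b_item[i] + q[i] @ p_eff

    def baseline(j):                                 # simplified baseline b_uj
        return mu + b_user[u] + b_item[j]

    R_k = rated_k(u, i)                              # neighborhood, explicit part
    if R_k:
        pred += len(R_k) ** -0.5 * sum((R_u[j] - baseline(j)) * w[i, j]
                                       for j in R_k)
    N_k = implicit_k(u, i)                           # neighborhood, implicit part
    if N_k:
        pred += len(N_k) ** -0.5 * sum(c[i, j] for j in N_k)
    return pred
```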
Thursday, February 12, 2009
9 comments:
What a teaser!
While trying to tie time biases (with 30 bins) to the classic SVD++, I get worse RMSEs. I suspect it has something to do with the learning rates. When I lower them (for the biases, that is), training starts with higher RMSEs that drop VERY quickly in the first epochs, only to end up at a worse RMSE in the end.
Perhaps you can share a few tips on this matter. How much do the learning rates and regularization values change in the temporal-effects version of the same SVD++?
Any help is greatly appreciated. Thanks anyway.
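For context, here is a minimal sketch of what "time biases with 30 bins" typically looks like on top of the SVD++ baseline. The day span and the array names are illustrative, not taken from anyone's actual implementation.

```python
N_BINS = 30
N_DAYS = 2243                    # approximate day span of the Netflix ratings
BIN_WIDTH = N_DAYS / N_BINS

def time_bin(t):
    """Map a rating's day index t to one of N_BINS equal-width bins."""
    return min(int(t / BIN_WIDTH), N_BINS - 1)

def baseline(u, i, t, mu, b_user, b_item, b_item_bin):
    """Baseline with a binned item time bias: mu + b_u + b_i + b_{i,Bin(t)}."""
    return mu + b_user[u] + b_item[i] + b_item_bin[i, time_bin(t)]
```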
I started messing with this stuff way back near the beginning of the contest but ran out of time...
I too started with the movie-data-is-better theory but ended up with the same conclusions as you.
It's great to see two regular guys neck and neck with the academics.
If you still need additional computing power, I have a quad-core Dell workstation running Vista x64 with 16 GB of memory and tons of disk space which I can dedicate to your task. No strings, no conditions.
An update: we are now at 0.8732 for a single prediction set, which would place 36th on the leaderboard. This is much better than anything we have produced before. Unfortunately, it contributes almost nothing to the blended results.
sogrwig: We used gamma1 = 0.007 and lambda6 = 0.005 for the biases, as described in "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model".
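For completeness, those constants enter the bias updates of that paper roughly as follows (err is the prediction error r_ui minus the prediction). Reusing the same constants for a binned time bias, as sketched below, is purely our illustration; the best values for the extra term may well differ.

```python
GAMMA1, LAMBDA6 = 0.007, 0.005   # learning rate / regularization for the biases

def update_biases(err, u, i, t_bin, b_user, b_item, b_item_bin):
    """One SGD step on the bias terms; the binned item bias reuses the
    same constants here only for illustration."""
    b_user[u]            += GAMMA1 * (err - LAMBDA6 * b_user[u])
    b_item[i]            += GAMMA1 * (err - LAMBDA6 * b_item[i])
    b_item_bin[i, t_bin] += GAMMA1 * (err - LAMBDA6 * b_item_bin[i, t_bin])
```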
Darren: Thanks for the offer. As we said in a previous blog entry, our computing resources were starting to be a limiting factor. However, we are in better shape since we acquired a Core i7-920 with 12 GB of RAM last month. For now, it does the job nicely. If we reach a point where it is no longer sufficient, we might just take you up on your offer.
No problem - offer is open if you need it. (And I was serious about no strings, no conditions).
One of the things I was tinkering with (before running out of steam) was identifying Netflix accounts where more than one person was rating movies, then splitting the account in two and using the most highly correlated sub-account when making rating predictions. It looked promising in some limited testing. I'm sure you don't need more ideas...
Care to share your factor initialization values (Pu/Qi/Yj) when using BellKor's integrated model?
I started to calculate how much RAM would be consumed by an SVD++(3) model, and I came to the conclusion that my 4 GB of memory would only be enough for the 30-factor version... Then every 5 additional factors would require approximately 400 MB more... perhaps a bit less...
So in order to run a 200-factor version (like you have) I would need around 16 GB...
Am I way off in my calculations here? How much memory does the 50- (or 200-) factor version consume in your implementation?
I'm about to go buy some new memory for my machine, and I was wondering if I should also buy a new PC with a motherboard that can hold 16 GB or even more.
I wonder what kind of machine they used to run 2000 factors... According to my (probably wrong) calculations it would need well over 100 GB... (I hope I'm wrong :)
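For what it's worth, here is the kind of back-of-envelope arithmetic involved, assuming the memory is dominated by the per-day user factors of SVD++(3), that a user rates on roughly 40 distinct days on average (an approximation of the Netflix data), and that parameters are stored as float32. These assumptions are ours, not the blog authors'.

```python
N_USERS       = 480_189
DAYS_PER_USER = 40        # approximate average distinct rating days per user
BYTES         = 4         # float32 per parameter

def user_side_bytes(f):
    per_day = N_USERS * DAYS_PER_USER * f * BYTES   # per-day user factors
    static  = N_USERS * 2 * f * BYTES               # static factors + drift terms
    return per_day + static

for f in (30, 50, 200, 2000):
    print(f"f={f:4d}: ~{user_side_bytes(f) / 2**30:6.1f} GB for the user-side factors")
```

Under these assumptions the totals come out near 2.3 GB at f=30, about 15 GB at f=200, and roughly 150 GB at f=2000, which is in line with the estimates above.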
sogrwig:
We never attempted to implement the SVD++(3) model. The integrated model we were referring to used b(3) but only p(1) as described in the same paper, and thus required just a small fraction of the memory.
You are about right about the amount of memory required for SVD++(3). The logical solution would be to keep the user coefficients on disk: each pass through the training data then requires reading and writing all the user coefficients once. Doing that about 30 times doesn't seem impossible, just slow. This is pure speculation, as we have never tried it (see the sketch after this reply).
On the other hand, there are other ways to get similar or even better results with a more compact parameterization, as suggested in the paper. Also, our experience has shown that although mega-models (our largest takes about 8 GB of RAM) may achieve marginally better individual accuracy, they provide very little improvement to the blended results. The bottom line is that it is not worthwhile to invest in a new computer just for this. Brains, not gigaflops, are what will win this competition.
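A rough illustration of the disk-based idea mentioned in the reply above, using a memory-mapped file. The toy sizes, the fixed-size per-user block, and the file name are simplifications we made up; a real implementation would store per-user offsets.

```python
import numpy as np

# Out-of-core layout: user-side coefficients live in a memory-mapped file
# while the much smaller item-side parameters stay in RAM. Toy sizes only.
N_USERS, F, ROWS_PER_USER = 1_000, 50, 64

user_coef = np.memmap("user_coefficients.f32", dtype=np.float32,
                      mode="w+", shape=(N_USERS * ROWS_PER_USER, F))

def train_epoch(ratings_by_user, sgd_step):
    """One pass over the ratings sorted by user: each user's block is
    read, updated in place, and written back once per epoch."""
    for u, user_ratings in ratings_by_user:
        block = user_coef[u * ROWS_PER_USER:(u + 1) * ROWS_PER_USER]
        for item, rating, day in user_ratings:
            sgd_step(u, item, rating, day, block)  # updates block + item params
    user_coef.flush()   # push the updated user blocks back to disk
```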
Are you guys factoring the user time-factors by any chance? That is, instead of using a UxD set of factors, you have UxK and KxD? Or perhaps you use time-factors on the implicit (NSVD1/2) side?
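In case it helps frame the question, one possible reading of "UxK and KxD" (our interpretation only, not necessarily what the commenter or the blog authors have in mind) is a low-rank factorization of the per-day user deviations: each user keeps a small mixing vector over K shared day profiles instead of a full per-day table. A toy-scale sketch with made-up names and sizes:

```python
import numpy as np

n_users, n_days, f, K = 1000, 2243, 50, 5
rng = np.random.default_rng(0)

p = rng.normal(0.0, 0.01, (n_users, f))     # static user factors
x = rng.normal(0.0, 0.01, (n_users, K))     # user side of the low-rank term (U x K)
z = rng.normal(0.0, 0.01, (K, n_days, f))   # day profiles shared across users (K x D)

def user_factor(u, t):
    """Time-dependent user factor: p_u plus a low-rank per-day deviation."""
    return p[u] + x[u] @ z[:, t, :]          # (K,) @ (K, f) -> (f,)
```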