(a.k.a. why one submission a day is enough)
Serious attempts at the Netflix challenge require blending results from many algorithms. Blending is an operation that transforms multiple estimates into a single, higher-accuracy estimate. This is a brief tutorial covering the steps involved. Experienced Netflix participants should not bother to read further.
Step 1: construct a reduced training set
To blend the models, you need to construct a reduced training set by excluding from the Netflix-provided training set all ratings present in the probe set.
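For illustration, a minimal sketch of the exclusion, assuming the ratings have already been loaded into memory as (user, movie, rating) triples and the probe set as (user, movie) pairs (the real Netflix data ships as per-movie text files, so the loading code is omitted):

```python
def reduce_training_set(training, probe_pairs):
    """Drop every rating whose (user, movie) pair appears in the probe set."""
    probe = set(probe_pairs)  # set lookup keeps the filter O(1) per rating
    return [(u, m, r) for (u, m, r) in training if (u, m) not in probe]
```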
Step 2: train your different models on the reduced training set
For now we train each individual model on the reduced training set. Later we will re-train all the models on the full training set. To re-train in a consistent way, it is critical to record carefully at this step all the parameters used, the number of training epochs, etc.
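One simple way to make the later re-training reproducible is to write each model's configuration to disk as you train. A hedged sketch; every field name here is illustrative, not something prescribed by the post:

```python
import json

# Record everything needed to re-train this model identically in step 5.
run_log = {
    "model": "svd_k64",         # hypothetical model identifier
    "learning_rate": 0.001,
    "regularization": 0.02,
    "epochs": 120,
    "training_set": "reduced",  # flipped to "full" in step 5
}
with open("svd_k64.json", "w") as f:
    json.dump(run_log, f, indent=2)
```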
Step 3: predict the probe set
For each model trained on the reduced training set, predict the probe set.
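As a sketch, assuming each trained model exposes a predict(user, movie) method (an interface invented here for illustration), the probe predictions can be stacked into one column per model, which is exactly the input step 4 expects:

```python
import numpy as np

def predict_probe(models, probe_pairs):
    # One row per probe rating, one column per model.
    return np.column_stack(
        [[m.predict(u, mv) for (u, mv) in probe_pairs] for m in models]
    )
```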
Step 4: select your favorite blending recipe
This step receives as input the predicted probe set results from each model, and the real probe set ratings. The output is a function that mixes the individual model predictions into a blended prediction, hopefully better than any individual result. A simple linear regression will get you a long way, but feel free to experiment with your favorite machine learning algorithm. What is key here is that any unknown coefficients (for example, the linear regression coefficients) are selected to minimize the error between the blended prediction and the real probe set ratings.
N.B. If overfitting the blending function is an issue, partition the probe set in two. Use one part for training the function and the other for cross-validation.
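Here is a minimal sketch of the linear regression recipe with the N.B. applied: blending weights fit by least squares on one half of the probe set and checked on the other. The data is synthetic, purely to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100_000, 3                                 # synthetic sizes
y = rng.integers(1, 6, size=n).astype(float)      # "true" probe ratings
X = y[:, None] + rng.normal(0, 1.0, size=(n, k))  # per-model predictions

def with_intercept(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Fit the blending weights on one half of the probe set...
idx = rng.permutation(n)
fit, val = idx[: n // 2], idx[n // 2 :]
w, *_ = np.linalg.lstsq(with_intercept(X[fit]), y[fit], rcond=None)

# ...and check for overfitting on the other half.
resid = with_intercept(X[val]) @ w - y[val]
print(f"weights {np.round(w, 3)}, validation RMSE {np.sqrt(np.mean(resid ** 2)):.4f}")
```

Keep the fitted weights around: step 6 applies these same coefficients to the re-trained models.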
Step 5: re-train all models using the full training set
At this point, we are preparing for our final predictions. To get the best possible accuracy, we re-train all models using the full training set. The addition of the probe set data to the training data can result in an accuracy boost of 0.0060 or more.
Step 6: blend the re-trained models
Here's the leap of faith. We assume that the blending function we computed at step 4 is still a valid function to blend the re-trained models. For this to work, the two sets of models must be computed with a rigorously equivalent methodology. Also, the selected function from step 4 must generalize well and avoid overfitting. This is not an issue with a simple linear regression, but may become problematic for complex machine learning methods with many degrees of freedom.
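Continuing the sketch from step 4, blending the re-trained models then amounts to applying the saved weights to their qualifying-set predictions (built one column per model, exactly as in step 3):

```python
def blend(X, w):
    # X: per-model predictions on the qualifying set; w: step 4 weights.
    return with_intercept(X) @ w
```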
Step 7: clamp the final predictions
Here's a hint: clamping values between 1 and 5 is not optimal.
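A hedged sketch: plain clipping to [1, 5] is the baseline, and the hint above suggests tuning the bounds against the probe set rather than hard-coding them:

```python
import numpy as np

def clamp(predictions, lo=1.0, hi=5.0):
    # lo/hi left as parameters so they can be tuned on the probe set.
    return np.clip(predictions, lo, hi)

print(clamp(np.array([0.3, 2.7, 5.6])))  # -> [1.  2.7 5. ]
```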
If this is done well, then improvements in the models can be measured after step 4 by comparing the accuracy on the probe set. Values on the qualifying set will be better by 0.0060 or more, but this offset should be very consistent from one submission to another. Lately I have been getting 0.0068 +/- 0.0001.
Saturday, July 26, 2008
9 comments:
Thanks for taking the time to write down all this information. Would you care to expand a bit more on your blending method?
I mean the actual blending recipe. Do you use some kind of neural network or just linear regression? Also, do you really use cross-validation (like splitting the probe set in half or more parts and trying combinations on those)?
My small contribution to the points made in this post: my gap between probe and test rmse is about 0.0065 +/- 0.0001 (so I can confirm your numbers)
We have tried several blending methods, but I won't go into details here. We used cross-validation by splitting the probe set to investigate different approaches and select parameters. For official submissions, we don't split the probe set.
Do you get results after you submit a result set? Or, if it is lower than your last RMSE on the leaderboard, do you not get the results?
The submission process works as follows:
1) When a result file is submitted, the last submit time is immediately updated on the leaderboard and a confirmation email is sent to the team leader.
2) The server computes the score. It takes a few minutes.
3) If the new result is the best so far, the best score column of the leaderboard is updated with the new value. An email with the new score is sent to the team leader, whether or not it improves on the previous scores.
So the team leader receives an email with the submission score, good or bad. The leaderboard always displays the best submission score ever, together with the latest submission date/time.
Also note that you appear on the leaderboard only after you have beaten Cinematch at least once.
I'm sure that you can find more details on this on the www.netflixprize.com website. Look at the forum discussions if nothing else.
Blending 101... What does the 101 mean?
http://en.wiktionary.org/wiki/101
LOL
Would it be feasible to have different per-user blending functions? Maybe each algorithm that goes in the "blender" has a group of users on which it works best...
Training a different blending function for each user is unlikely to succeed for lack of sufficient training data. However, training different blending functions for subsets of users is discussed in The BellKor Solution to the Netflix Prize (section "Combining multiple results").
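As a purely illustrative sketch of that idea (the bucketing scheme here is invented, not taken from the BellKor paper): fit separate least-squares blending weights per user bucket, for example grouping users by how many ratings they have:

```python
import numpy as np

def fit_bucketed_blends(X, y, bucket_ids):
    # X: per-model probe predictions (one column per model); y: true
    # probe ratings; bucket_ids: one bucket label per probe rating,
    # e.g. derived from each user's rating count.
    weights = {}
    for b in np.unique(bucket_ids):
        rows = bucket_ids == b
        A = np.hstack([X[rows], np.ones((rows.sum(), 1))])
        weights[b], *_ = np.linalg.lstsq(A, y[rows], rcond=None)
    return weights
```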