Saturday, July 26, 2008

Blending 101

(a.k.a. why one submission a day is enough)

Serious attempts at the Netflix challenge require blending results from many algorithms. Blending is an operation that transforms multiple estimates into a single, higher-accuracy estimate. This is a brief tutorial on the steps involved. Experienced Netflix participants should not bother to read further.

Step 1: construct a reduced training set

To blend the models, you need to construct a reduced training set by excluding from the Netflix-provided training set all ratings present in the probe set.
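Here is a minimal sketch of this step. It assumes the training set and the probe set have already been parsed into DataFrames with user, movie and rating columns (the raw Netflix data actually comes as per-movie text files, so some preprocessing is implied, and the file names below are placeholders).

    import pandas as pd

    # Assumed pre-parsed files with 'user', 'movie' and 'rating' columns.
    train = pd.read_csv("training.csv")
    probe = pd.read_csv("probe.csv")

    # Drop every (user, movie) pair that appears in the probe set.
    probe_pairs = set(zip(probe["user"], probe["movie"]))
    keep = [(u, m) not in probe_pairs
            for u, m in zip(train["user"], train["movie"])]
    reduced_train = train[keep]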

Step 2: train your different models on the reduced training set

For now, we train each individual model on the reduced training set. Later, we will re-train all the models on the full training set. To re-train in a consistent way, it is critical to carefully record at this step all the parameters used, the number of training epochs, etc.
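One simple way to do this is to dump every setting into a small file that the re-training run will read back. The model names and parameters below are placeholders, not a recommendation.

    import json

    # Record every setting needed to reproduce the training runs.
    model_params = {
        "svd_50_factors": {"factors": 50, "learning_rate": 0.001,
                           "regularization": 0.015, "epochs": 120},
        "knn_movie": {"neighbors": 30, "shrinkage": 100},
    }
    with open("model_params.json", "w") as f:
        json.dump(model_params, f, indent=2)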

Step 3: predict the probe set

For each model trained on the reduced training set, predict the probe set.

Step 4: select your favorite blending recipe

This step receives as input the predicted probe set results from each model, and the real probe set ratings. The output is a function that mixes the individual model predictions into a blended prediction, hopefully better than any individual result. A simple linear regression will get you a long way, but feel free to experiment with your favorite machine learning algorithm. What is key here is that any unknown coefficients (for example, the linear regression coefficients) can be selected to minimize the error between the blended prediction and the real probe set scores.
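As an illustration of the linear regression recipe, here is a small sketch. The array layout is an assumption: probe_preds stacks the probe set predictions of each model as columns, and probe_ratings holds the real probe ratings.

    import numpy as np

    def fit_blend(probe_preds, probe_ratings):
        # probe_preds: (n_ratings, n_models) array of per-model predictions.
        # Append a column of ones so the fit includes an intercept.
        X = np.column_stack([probe_preds, np.ones(len(probe_ratings))])
        # Least-squares choice of the blending coefficients.
        weights, _, _, _ = np.linalg.lstsq(X, probe_ratings, rcond=None)
        return weights

    def apply_blend(preds, weights):
        X = np.column_stack([preds, np.ones(len(preds))])
        return X @ weights

    # weights = fit_blend(probe_preds, probe_ratings)
    # rmse = np.sqrt(np.mean((apply_blend(probe_preds, weights) - probe_ratings) ** 2))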

N.B. If overfitting the blending function is an issue, partition the probe set in two. Use one part for training the function, and the other for cross-validation.
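A quick sketch of such a split, reusing fit_blend and apply_blend from above (probe_preds and probe_ratings are the same assumed arrays):

    import numpy as np

    rng = np.random.default_rng(0)
    idx = rng.permutation(len(probe_ratings))
    half = len(idx) // 2
    fit_idx, val_idx = idx[:half], idx[half:]

    # Fit the blend on one half, measure it on the held-out half.
    weights = fit_blend(probe_preds[fit_idx], probe_ratings[fit_idx])
    val_rmse = np.sqrt(np.mean(
        (apply_blend(probe_preds[val_idx], weights) - probe_ratings[val_idx]) ** 2))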

Step 5: re-train all models using the full training set

At this point, we are preparing for our final predictions. To get the best possible accuracy, we re-train all models using the full training set. The addition of the probe set data to the training data can result in an accuracy boost of 0.0060 or more.

Step 6: blend the re-trained models

Here's the leap of faith. We assume that the blending function we computed at step 4 is still a valid function to blend the re-trained models. For this to work, the two sets of models must be computed with a rigorously equivalent methodology. Also, the selected function from step 4 must generalize well and avoid overfitting. This is not an issue with a simple linear regression, but may become problematic for complex machine learning methods with many degrees of freedom.
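In code, this amounts to reusing the weights fitted in step 4 on the predictions of the re-trained models. Here qualifying_preds is an assumed name for the qualifying set predictions, stacked in the same model order as in step 4.

    # Re-use the step 4 weights on the re-trained models' predictions.
    blended_qualifying = apply_blend(qualifying_preds, weights)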

Step 7: clamp the final predictions

Here's a hint: clamping values between 1 and 5 is not optimal.
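The clamping itself is trivial; keeping the bounds as parameters makes it easy to tune them on the probe set instead of hard-coding 1 and 5.

    import numpy as np

    def clamp(predictions, low=1.0, high=5.0):
        # The defaults are the naive bounds; tune low/high on the probe set.
        return np.clip(predictions, low, high)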

If this is well done, then improvements in the models can be measured after step 4 by comparing the accuracy on the probe set. Values on the qualifying set will be better by 0.0060 or more, but this offset should be very consistent from one submission to another. Lately I have been getting 0.0068 +/- 0.0001.

9 comments:

teCh poVerA said...

Thanks for taking the time to write down all this information. Would you care to expand a bit more on your blending method?

I mean the actual blending recipe. Do you use some kind of neural network or just linear regression? Also, do you really use cross-validation (like splitting the probe in half or more parts and trying combinations on those)?

My small contribution to the points made in this post: my gap between probe and test rmse is about 0.0065 +/- 0.0001 (so I can confirm your numbers)

PragmaticTheory said...

We have tried several blending methods, but I won't go into details here. We used cross-validation by splitting the probe set to investigate different approaches and select parameters. For official submissions, we don't split the probe set.

Anonymous said...

Do you get results after you submit a resulting dataset?

Or, if it is lower than your last RMSE on the leaderboard, do you not get the results?

PragmaticTheory said...

The submission process works as follows:

1) When a result file is submitted, the last submit time is immediately updated on the leaderboard and a confirmation email is sent to the team leader.

2) The server computes the score. It takes a few minutes.

3) If the new result is the best so far, the best score column of the leaderboard is updated with the new value. An email with the new score is sent to the team leader, whether or not it improves on the previous scores.

So the team leader receives an email with the submission score, good or bad. The leaderboard always displays the best submission score ever, together with the latest submission date/time.

Also note that you appear on the leaderboard only after you have beaten Cinematch at least once.

I'm sure that you can find more details on this on the www.netflixprize.com website. Look at the forum discussions if nothing else.

teCh poVerA said...

Blending 101... What does the 101 mean ?

PragmaticTheory said...

http://en.wiktionary.org/wiki/101

teCh poVerA said...

LOL

Anonymous said...

Would it be feasible to have different per-user blending functions? Maybe each algorithm that goes in the "blender" has a group of users on which it works best...

PragmaticTheory said...

Training a different blending function for each user is unlikely to succeed for lack of sufficient training data. However, training different blending functions for subsets of users is discussed in The BellKor Solution to the Netflix Prize (section "Combining multiple results").