Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes

Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California (2010)
arXiv:1004.4944 [stat.ML] | PDF | Google Doc | Code | Data | Google Scholar | BibTex | EndNote

Abstract:

Probabilistic matrix factorization (PMF) is a powerful method for modeling data associated with pairwise relationships, finding use in collaborative filtering, computational biology, and document analysis, among other areas. In many domains, there is additional information that can assist in prediction. For example, when modeling movie ratings, we might know when the rating occurred, where the user lives, or what actors appear in the movie. It is difficult, however, to incorporate this side information into the PMF model. We propose a framework for incorporating side information by coupling together multiple PMF problems via Gaussian process (GP) priors. We replace scalar latent features with functions that vary over the space of side information. The GP priors on these functions require them to vary smoothly and share information. We successfully use this new method to predict the scores of professional basketball games, where side information about the venue and date of the game is relevant to the outcome.
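To make the core idea concrete, here is a minimal Python sketch of scalar latent features becoming functions of side information. It is an illustration under stated assumptions, not the paper's model or inference code: it only samples latent feature functions from a zero-mean GP prior and forms the usual PMF inner product at one side-information value, with no fitting to data. The one-dimensional side information (a game date in days), the squared-exponential kernel and its lengthscale, and all function names are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical illustration: latent features as GP-distributed functions of
# a 1-D side-information variable (assumed here to be a game date in days).


def se_kernel(t1, t2, lengthscale=30.0, variance=1.0):
    # Squared-exponential covariance: nearby dates get strongly correlated
    # feature values, so sampled functions vary smoothly over time.
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def cholesky(a):
    # Plain Cholesky factorization A = L L^T of a symmetric positive-definite matrix.
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(times, rng, jitter=1e-6):
    # Draw one function from a zero-mean GP prior, evaluated at `times`:
    # f = L z with z ~ N(0, I) gives f ~ N(0, K).
    n = len(times)
    k = [[se_kernel(times[i], times[j]) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(n)]


def sample_latent_functions(num_entities, num_features, times, rng):
    # Each entity (e.g., a team) gets `num_features` latent functions
    # instead of `num_features` scalars.
    return [[sample_gp(times, rng) for _ in range(num_features)]
            for _ in range(num_entities)]


def predict(u, v, i, j, t_index):
    # PMF-style prediction at side-information value times[t_index]:
    # the inner product of u_i(t) and v_j(t) over latent dimensions.
    return sum(u[i][d][t_index] * v[j][d][t_index] for d in range(len(u[i])))
```

For example, with `times = [0.0, 30.0, 60.0, 90.0]`, `sample_latent_functions(3, 2, times, random.Random(0))` gives three entities with two latent functions each, and `predict(u, v, 0, 1, 2)` returns the inner product of entity 0's `u` features and entity 1's `v` features at day 60. Because the kernel correlates nearby dates, a pair's prediction changes smoothly as the date moves, which is the kind of behaviour a GP prior is meant to encode.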

Keywords:

gaussian process, probabilistic matrix factorization, sports, time series

Raw NBA Data

The tarball of code above provides the data as a Python pickle. If you just want the raw data (collected by George Dahl), it is here as a gzipped CSV file. It has a header row, and each subsequent row corresponds to one game, giving the date (year, month, day), the two teams, their scores, and betting line information. The data span the start of the 2002-2003 season to midway through the 2009-2010 season. Note that the number of teams changes over the course of the data. A snapshot is below:

```
2002,11,2,Washington,New Jersey,79,87,-1.5,184.0
2002,11,2,New York,Boston,107,117,6.5,180.0
2002,11,2,Milwaukee,Orlando,90,100,-3.5,199.5
2002,11,3,LA Lakers,Portland,98,95,-4.0,185.5
2002,11,3,Miami,Sacramento,88,78,7.0,181.0
2002,11,3,LA Clippers,Detroit,74,72,-1.5,185.0
2002,11,3,Oklahoma City,Utah,91,77,-7.5,183.0
2002,11,4,Dallas,Golden State,107,100,-11.5,207.0
```