Datasets

DataRec includes several commonly used recommendation datasets to support reproducibility and standardization. These datasets have been carefully curated, and traceable sources and versioning information are maintained whenever possible. For each dataset, DataRec provides metadata such as the number of users, items, and interactions, along with data characteristics known to affect recommendation performance (e.g., sparsity and the distribution of interactions across users and items). The collection is continuously updated to add more recent and widely used datasets from the recommender systems literature. When the original data source is no longer available, DataRec includes the most recent and most widely used version of the dataset to ensure backward compatibility.
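As an illustration of the kind of metadata described above, the following Python sketch computes user, item, and interaction counts, sparsity, and a Gini coefficient of item popularity from a list of (user, item) pairs. The function name `dataset_stats`, the deduplication of repeated pairs, and the use of the Gini coefficient as the distribution measure are assumptions made for this example. It is not DataRec's API, and DataRec may compute or report these statistics differently.

```python
from collections import Counter


def dataset_stats(interactions):
    """Summarize a list of (user, item) pairs.

    Hypothetical helper for illustration only; not part of the DataRec API.
    Repeated (user, item) pairs are collapsed into a single interaction.
    """
    pairs = set(interactions)
    if not pairs:
        raise ValueError("interactions must not be empty")

    users = {u for u, _ in pairs}
    items = {i for _, i in pairs}
    n_users, n_items, n_inter = len(users), len(items), len(pairs)

    # Density: fraction of the user-item matrix that is observed.
    density = n_inter / (n_users * n_items)

    # Item popularity (interactions per item), sorted ascending for the Gini formula.
    counts = sorted(Counter(i for _, i in pairs).values())
    n = len(counts)
    weighted = sum((rank + 1) * c for rank, c in enumerate(counts))
    # Gini coefficient: 0 means interactions are spread evenly over items,
    # values near 1 mean they are concentrated on a few popular items.
    gini = (2 * weighted) / (n * sum(counts)) - (n + 1) / n

    return {
        "n_users": n_users,
        "n_items": n_items,
        "n_interactions": n_inter,
        "sparsity": 1 - density,
        "item_gini": gini,
    }


if __name__ == "__main__":
    toy = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i1")]
    print(dataset_stats(toy))
```

On this toy input there are 3 users, 2 items, and 4 interactions, so the density is 4/6 and the sparsity is about 0.33; item `i1` receives 3 of the 4 interactions, which gives an item Gini coefficient of 0.25.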

The following datasets are currently included in DataRec:

| Dataset Name | Source |
|---|---|
| Alibaba iFashion | https://drive.google.com/drive/folders/1xFdx5xuNXHGsUVG2VIohFTXf9S7G5veq |
| Amazon Beauty | https://amazon-reviews-2023.github.io |
| Amazon Books | https://amazon-reviews-2023.github.io/ |
| Amazon Clothing | https://amazon-reviews-2023.github.io/ |
| Amazon Sports and Outdoors | https://amazon-reviews-2023.github.io/ |
| Amazon Toys and Games | https://amazon-reviews-2023.github.io/ |
| Amazon Video Games | https://amazon-reviews-2023.github.io/ |
| Ciao | https://guoguibing.github.io/librec/datasets.html |
| Epinions | https://snap.stanford.edu/data/soc-Epinions1.html |
| Gowalla | https://snap.stanford.edu/data/loc-gowalla.html |
| LastFM | https://grouplens.org/datasets/hetrec-2011/ |
| MovieLens | https://grouplens.org/datasets/movielens/ |
| Tmall | https://tianchi.aliyun.com/dataset/53?t=1716541860503 |
| Yelp | https://www.yelp.com/dataset |