spark-ranking-metrics

Public repository · 28 stars · 7 forks · 1 issue

Commits

List of commits on branch master.
d3710aa060ae7b4f34b4c682398844d86ada5a6a

MAP now according to Kaggle's definition

jjongwook committed 7 years ago

b90881d87817665d02d611c1c60263729409c2ab

fixes #2; perfect prediction achieves 1.0 metrics (NDCG@k, MAP@k, overall precision, overall recall)

jjongwook committed 7 years ago

d1517a538311975af3659645ffd104c74a7be051

added test checking for the perfect scores (failing)

jjongwook committed 7 years ago

d9d0727076ec6a5e21e0f88b3d1a2d7627362c03

fixed some edge cases

jjongwook committed 7 years ago

6e600aadf51fa4adf9ab92012fb507a541361490

added maven central info

jjongwook committed 7 years ago

3d89d33ed1a94e640644f67e035405df04c08c17

Setting version to 0.0.2-SNAPSHOT

jjongwook committed 8 years ago

README

The README file for this repository.

spark-ranking-metrics

// hosted on Maven Central, for Scala 2.11 only at the moment
libraryDependencies += "com.github.jongwook" %% "spark-ranking-metrics" % "0.0.1"

Ranking is an important component for building recommender systems, and there are various methods to evaluate the offline performance of ranking algorithms [1].

Users are usually shown only a fixed number of recommended items, so it is common to measure the performance of just the top-k recommendations; this is why the metrics are often suffixed with "@k".
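
As a purely illustrative sketch (not part of this library), here is how Precision@k and Recall@k could be computed for a single user's toy recommendation list; the item ids and the relevant set below are made up:

// Toy example: top-5 recommendations for one user, and the items the user actually liked
val k = 5
val topK = Seq(10, 42, 7, 99, 3)
val relevant = Set(42, 3, 55)

val hits = topK.take(k).count(relevant.contains)
val precisionAtK = hits.toDouble / k            // 2 / 5 = 0.4
val recallAtK = hits.toDouble / relevant.size   // 2 / 3 ≈ 0.67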

Currently, Spark's built-in ranking metrics are quite limited, and its NDCG implementation assumes that relevance is always binary, so it produces numbers that differ from the graded-relevance definition. Meanwhile, RiVal aims to be a toolkit for reproducible recommender-system evaluation and provides robust implementations of a few ranking metrics, but it is written as a single-threaded application and therefore does not scale to the data sizes that Spark users typically encounter.
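
For reference, a common graded-relevance formulation of DCG@k and NDCG@k is sketched below. This is illustrative only; whether SparkRankingMetrics uses this exact gain function (2^rel - 1 with a log2 discount) is not stated here. Under binary relevance, rel collapses to 0 or 1, which is the assumption Spark's built-in implementation makes:

// Illustrative graded-relevance DCG@k and NDCG@k (not the library's code)
def toyDcgAt(relevances: Seq[Double], k: Int): Double =
  relevances.take(k).zipWithIndex.map { case (rel, i) =>
    (math.pow(2, rel) - 1) / (math.log(i + 2) / math.log(2))   // log2(i + 2) position discount
  }.sum

def toyNdcgAt(relevances: Seq[Double], k: Int): Double =
  toyDcgAt(relevances, k) / toyDcgAt(relevances.sorted.reverse, k)   // normalize by the ideal ordering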

To fill this gap, SparkRankingMetrics contains scalable implementations of NDCG, MAP, Precision, and Recall, using Spark's DataFrame/Dataset idioms.

[1] Gunawardana, A., & Shani, G. (2015). Evaluating recommendation systems. In Recommender systems handbook (pp. 265-308). Springer US.

Usage

import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD
import com.github.jongwook.SparkRankingMetrics

// A recommendation model obtained by Spark's ALS
val model = MatrixFactorizationModel.load(sc, "path/to/model")
val result: RDD[(Int, Array[Rating])] = model.recommendProductsForUsers(20)

// A flattened RDD that contains all Ratings in the result
val prediction: RDD[Rating] = result.values.flatMap(ratings => ratings)
val groundTruth: RDD[Rating] = /* the ground-truth dataset */

// create DataFrames from the RDDs above (assumes an existing SparkSession named spark)
val predictionDF = spark.createDataFrame(prediction)
val groundTruthDF = spark.createDataFrame(groundTruth)

// instantiate using either DataFrames or Datasets
val metrics = SparkRankingMetrics(predictionDF, groundTruthDF)

// override the default column names to match the fields of Spark's Rating class
metrics.setItemCol("product")
metrics.setPredictionCol("rating")

// print the metrics
println(metrics.ndcgAt(10))
println(metrics.mapAt(15))
println(metrics.precisionAt(5))
println(metrics.recallAt(20))
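
The ground-truth RDD in the snippet above is left abstract. As a hedged sketch, one way to construct it from a MovieLens-style ratings file might look like the following; the file path, delimiter, and column order are assumptions, not part of the library:

// Hypothetical: parse "userId<TAB>itemId<TAB>rating" lines into Ratings
val groundTruth: RDD[Rating] = sc.textFile("path/to/ratings.tsv")
  .map(_.split("\t"))
  .map(fields => Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))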

Validation

This repository contains a test case that checks that the numbers produced by SparkRankingMetrics are identical to those produced by RiVal's corresponding implementations, using the MovieLens 100K dataset.