July 10, 2015

Better sort by rating with Spree and spree_reviews

The blog post How Not To Sort By Average Rating continually pops up, and it got me thinking about how we currently implement sort by rating. We use spree_reviews to capture ratings, and it takes a very simplistic approach to storing a product's average rating:

self[:avg_rating] = reviews.approved.sum(:rating).to_f / reviews_count

This exact scenario is mentioned in the blog post above:

Why it is wrong: Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.
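To make the quoted scenario concrete, here is a minimal plain-Ruby sketch. The star values are invented for illustration (5 stands in for a positive rating, 1 for a negative one), and `plain_average` is a hypothetical helper that mirrors the sum-divided-by-count approach, not code from spree_reviews.

```ruby
# Hypothetical star ratings for two items (values invented for illustration).
item_one = [5, 5]                    # 2 positive ratings, 0 negative
item_two = Array.new(100, 5) + [1]   # 100 positive ratings, 1 negative

# Plain arithmetic mean, the same idea as sum(:rating).to_f / reviews_count.
def plain_average(ratings)
  return 0.0 if ratings.empty?
  ratings.sum.to_f / ratings.size
end

avg_one = plain_average(item_one)
avg_two = plain_average(item_two)

puts format("item one: %.2f", avg_one)
puts format("item two: %.2f", avg_two)
```

Sorting by this value puts item one first, even though its score rests on only two ratings.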

A better solution is a Bayesian estimate, which actually takes the number of reviews into consideration. This is how IMDB currently creates its Top 250 movie list:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
* R = average for the movie (mean) = (Rating)
* v = number of votes for the movie = (votes)
* m = minimum votes required to be listed in the Top 250 (currently 1300)
* C = the mean vote across the whole report (currently 6.8)

For the Top 250, only votes from regular voters are considered.
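The formula above can be sketched as a small Ruby method. The name `weighted_rating` and the sample inputs are assumptions for illustration; only the values 1300 and 6.8 come from the IMDB description.

```ruby
# r: mean rating for the item        v: number of votes for the item
# m: minimum-votes threshold (acts as the weight of the prior)
# c: mean vote across all items
def weighted_rating(r, v, m, c)
  v = v.to_f
  m = m.to_f
  return c.to_f if (v + m).zero?   # no votes and no prior: fall back to c
  (v / (v + m)) * r + (m / (v + m)) * c
end

# Two hypothetical movies with the same 9.0 average but very different vote counts.
few_votes  = weighted_rating(9.0, 10, 1300, 6.8)
many_votes = weighted_rating(9.0, 100_000, 1300, 6.8)

puts format("10 votes:      %.2f", few_votes)
puts format("100,000 votes: %.2f", many_votes)
```

With few votes the result stays close to C, and as v grows it approaches the item's own average R.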

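With that, it’s fairly simple to approximate with spree_reviews. I use the total number of approved reviews for m and the average rating across all approved reviews for C. Just be sure to recalculate all your product ratings.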

Spree::Product.class_eval do
  # Stores avg_rating as a Bayesian estimate instead of a plain mean.
  def recalculate_rating
    reviews_count = reviews.reload.approved.count

    self.reviews_count = reviews_count
    if reviews_count > 0
      r = reviews.approved.average(:rating).to_f       # mean rating for this product
      v = reviews_count.to_f                           # approved reviews for this product
      m = Spree::Review.approved.count.to_f            # weight of the prior: all approved reviews
      c = Spree::Review.approved.average(:rating).to_f # mean rating across all approved reviews

      self.avg_rating = (v / (v + m)) * r + (m / (v + m)) * c
    else
      self.avg_rating = 0
    end
    save
  end
end
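As a rough check of how this behaves, the same weighting can be run in plain Ruby against an invented catalogue, with no Spree or database involved. The product names and ratings below are hypothetical; m and c are chosen the same way as in `recalculate_rating`.

```ruby
# Hypothetical catalogue: product name => approved star ratings (invented data).
catalogue = {
  "item one"   => [5, 5],
  "item two"   => Array.new(100, 5) + [1],
  "item three" => Array.new(100, 3)
}

all_ratings = catalogue.values.flatten
m = all_ratings.size.to_f   # total number of approved reviews
c = all_ratings.sum / m     # mean rating across every review

# Same weighting as recalculate_rating, applied to plain arrays.
scores = catalogue.transform_values do |ratings|
  v = ratings.size.to_f
  next 0.0 if v.zero?
  r = ratings.sum / v
  (v / (v + m)) * r + (m / (v + m)) * c
end

ranked = scores.sort_by { |_name, score| -score }.map(&:first)
puts ranked.inspect
```

Here item two, with its 101 ratings, ranks above item one, which has a perfect average but only two ratings. Because m and c depend on every review in the store, each stored value goes stale as reviews arrive elsewhere. On a real store, something like `Spree::Product.find_each(&:recalculate_rating)` from a Rails console would presumably refresh them all. That one-liner is an assumption and is not taken from spree_reviews.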