Wednesday, January 4, 2012

Mahout Recommendation Engine

Apache Mahout implements scalable data mining algorithms on top of Apache Hadoop. It provides classification, clustering, and collaborative filtering algorithms that can be used to analyze large-scale data and predict user behavior.

Mahout implements collaborative filtering based on:
1. User preferences
2. Item similarity (product similarity); a minimal item-based sketch follows this list.
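
As a quick illustration of the item-similarity flavor, Mahout's Taste classes can be wired up as below. This is only a sketch of my own, not the main sample of this post: the file path, the user ID 101, and the choice of LogLikelihoodSimilarity are assumptions you would replace with your own data and similarity measure.

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemBasedExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; point this at your own userid,itemid,preference CSV
        DataModel model = new FileDataModel(new File("/home/test/test_input.csv"));
        ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(model, similarity);
        // Top 5 recommendations for user 101
        List<RecommendedItem> items = recommender.recommend(101L, 5);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " : " + item.getValue());
        }
    }
}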

Here I am giving sample code for building item-similarity-based recommendations.
Requirements:
1. To build the Mahout project you need Maven.
2. Input file: a CSV whose content looks like:
userid,itemid,preference
101,202,3
101,203,5
102,202,2
Note: both userid and itemid must be of type long, and preference must be of type float.
Strings are not supported by the Mahout recommender API, so you need to resolve your data into numeric IDs before feeding it to the recommender (a minimal ID-mapping sketch follows this list).
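
If your raw data uses string identifiers (user names, SKUs, and so on), one simple way to resolve them into long IDs before writing the CSV is to keep a running map. This is my own sketch, not part of the Mahout API; the class and method names are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class IdResolver {
    private final Map<String, Long> ids = new HashMap<String, Long>();
    private long nextId = 1L;

    // Returns a stable long ID for the given string key, assigning a new one on first use
    public long resolve(String key) {
        Long id = ids.get(key);
        if (id == null) {
            id = nextId++;
            ids.put(key, id);
        }
        return id;
    }
}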

Output: the code below takes input in the above format and writes output to the given file as:
user,recom1,recom2,...,recomN
(the code below requests up to 8 recommendations per user).
Note: recommendations are arranged in descending order of recommendation strength. If customer preferences are not known, there is no such ordering and the recommender below effectively becomes a boolean recommender: either you like a product (1) or you don't (0). A boolean-preference variant is sketched after this note.
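
For that boolean (no-preference) case, Mahout also provides boolean-preference classes. The sketch below is my own illustration of that variant; the input path, the log-likelihood similarity, and the nearest-10 neighborhood are assumed choices, not requirements.

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BooleanPrefExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical CSV with only userid,itemid lines (no preference column)
        DataModel raw = new FileDataModel(new File("/home/test/test_boolean_input.csv"));
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(raw));
        UserSimilarity similarity = new LogLikelihoodSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
                new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
        // Top 5 recommendations for user 101
        System.out.println(recommender.recommend(101L, 5));
    }
}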


import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.common.Weighting;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.MemoryDiffStorage;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.slopeone.DiffStorage;

public class HiveLog {

    public static void main(String... args) throws Exception {

        // Create the data source (model) from the CSV file: userid,itemid,preference
        File inputFile = new File("/home/test/test_input.csv");
        final DataModel model = new FileDataModel(inputFile);

        // Output file: one line per user, recommendations in descending order of strength
        FileWriter fstream = new FileWriter("/home/test/recommendation.csv", true);
        BufferedWriter out = new BufferedWriter(fstream);

        // A RecommenderBuilder lets the same construction logic be reused, e.g. by an evaluator
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                DiffStorage diffStorage = new MemoryDiffStorage(model, Weighting.WEIGHTED, Long.MAX_VALUE);
                return new SlopeOneRecommender(model, Weighting.WEIGHTED, Weighting.WEIGHTED, diffStorage);
            }
        };

        Recommender recommender = recommenderBuilder.buildRecommender(model);

        // For all users in the model
        for (LongPrimitiveIterator it = model.getUserIDs(); it.hasNext();) {
            long userId = it.nextLong();

            // Get up to 8 recommendations for this user
            List<RecommendedItem> recommendations = recommender.recommend(userId, 8);
            int i = 0;
            for (RecommendedItem recommendedItem : recommendations) {
                if (i == 0) {
                    out.write(userId + "," + recommendedItem.getItemID());
                } else {
                    out.write("," + recommendedItem.getItemID());
                }
                i++;
            }
            out.newLine();
        }

        out.close();
    }
}
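
The RecommenderBuilder above is not strictly needed to build a single recommender, but it is the hook Mahout's evaluators expect. As a sketch of how you might check recommendation quality, the example below reuses the same Slope One construction and evaluates it with an average-absolute-difference metric; the 70/30 train/test split and the input path are assumed choices.

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.common.Weighting;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.MemoryDiffStorage;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.slopeone.DiffStorage;

public class EvaluateExample {
    public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("/home/test/test_input.csv"));

        // Same construction logic as the anonymous builder in the main example above
        RecommenderBuilder builder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                DiffStorage diffStorage = new MemoryDiffStorage(model, Weighting.WEIGHTED, Long.MAX_VALUE);
                return new SlopeOneRecommender(model, Weighting.WEIGHTED, Weighting.WEIGHTED, diffStorage);
            }
        };

        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Train on 70% of each user's preferences, test on the rest, using all users (1.0)
        double score = evaluator.evaluate(builder, null, model, 0.7, 1.0);
        System.out.println("Average absolute difference: " + score);
    }
}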


For more details on Mahout and its algorithm implementations, please write to me.



  
