Mahout comes with a large number of machine learning algorithms implemented to run on top of Hadoop. For item-similarity-based recommendations, Mahout has built-in algorithms that compute item similarity using several similarity measures, such as cosine similarity (SIMILARITY_COSINE) and Pearson correlation.
Item-similarity-based recommendation has two parts:
- Compute the item similarity matrix
- Generate recommendations based on the latest user-item interaction data.
Mahout's command for generating recommendations is 'recommenditembased'. The built-in job first generates the item similarity matrix and then proceeds to generate recommendations.
A typical recommenditembased command looks like this:
mahout recommenditembased --startPhase 0 --endPhase 10 -i /input -o /output -s SIMILARITY_COSINE -mp 15 -m 300 --numRecommendations 1000 --tempDir /temDir
You can pass startPhase and endPhase to control which part of the processing Mahout runs: the job starts at startPhase and stops after endPhase.
See the link below for more on the Mahout recommendation phases.
http://www.slideshare.net/vangjee/a-quick-tutorial-on-mahouts-recommendation-engine-v-04
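To make the phase options easier to experiment with, here is a minimal sketch of a wrapper function that runs only a chosen range of phases. The variable names (INPUT_DIR, OUTPUT_DIR, TEMP_DIR, SIMILARITY, DRY_RUN) and the run_phases helper are my own placeholders, not part of Mahout. By default the script only prints the commands it would run; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: run a chosen range of recommenditembased phases.
# All paths below are placeholder defaults (assumptions); override them via the environment.
INPUT_DIR="${INPUT_DIR:-/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/output}"
TEMP_DIR="${TEMP_DIR:-/tempDir}"
SIMILARITY="${SIMILARITY:-SIMILARITY_COSINE}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_phases START END: run only phases START..END of the job.
run_phases() {
  run mahout recommenditembased \
    --startPhase "$1" --endPhase "$2" \
    -i "$INPUT_DIR" -o "$OUTPUT_DIR" \
    -s "$SIMILARITY" --tempDir "$TEMP_DIR"
}

run_phases 0 0   # phase 0 only
run_phases 1 1   # phase 1 only
```

Keeping the phase numbers as arguments means the same function can be reused for every step described in the rest of this post.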
Generating similarities (frequency - run every week or so)
The first step is to compute the preference matrix, which gathers all the user-item interactions. This is phase 0.
step 1 - delete tempDir if it already exists
hadoop fs -rm -r <tempDir>
step 2 - generate the preference matrix
mahout recommenditembased --startPhase 0 --endPhase 0 -i <inputDir> -o <outputDir> -s <similarityClass> --tempDir <tempDir>
The similarity computation is phase 1, so the next step runs with both startPhase and endPhase set to 1.
step 3 - compute similarities
mahout recommenditembased --startPhase 1 --endPhase 1 -i <inputDir> -o <outputDir> -s <similarityClass> --tempDir <tempDir>
The similarity job depends on the output of step 1 and step 2, so run them before step 3.
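The three steps above can be put together in one script, roughly like the sketch below. The variable names and the refresh_similarities function are placeholders I made up for illustration. I also added -f to the hadoop fs -rm call, which (in Hadoop 2.x and later) keeps the delete from failing when the directory does not exist yet. The script prints the commands by default; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch of the weekly similarity job (steps 1-3).
# All paths below are placeholder defaults (assumptions); override them via the environment.
INPUT_DIR="${INPUT_DIR:-/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/output}"
TEMP_DIR="${TEMP_DIR:-/tempDir}"
SIMILARITY="${SIMILARITY:-SIMILARITY_COSINE}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

refresh_similarities() {
  # step 1: remove the whole temp directory (-f: no error if it is missing)
  run hadoop fs -rm -r -f "$TEMP_DIR" || return 1
  # step 2: phase 0 builds the preference matrix
  run mahout recommenditembased --startPhase 0 --endPhase 0 \
    -i "$INPUT_DIR" -o "$OUTPUT_DIR" -s "$SIMILARITY" --tempDir "$TEMP_DIR" || return 1
  # step 3: phase 1 computes the item similarities
  run mahout recommenditembased --startPhase 1 --endPhase 1 \
    -i "$INPUT_DIR" -o "$OUTPUT_DIR" -s "$SIMILARITY" --tempDir "$TEMP_DIR"
}

refresh_similarities
```

Each step returns early on failure, so a failed phase 0 will not be followed by a similarity run on stale data.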
Generating recommendations (frequency - run every 5-6 hours or daily)
Start by deleting the preference matrix folder in Hadoop if it already exists. The preference matrix has to be recomputed so that it reflects the latest, near-real-time user-item interactions.
step 1 - delete <tempDir>/preparePreferenceMatrix
hadoop fs -rm -r <tempDir>/preparePreferenceMatrix
Note that you should not delete the entire tempDir, because it contains the item similarity matrix and other intermediate data required for recommendation generation.
step 2 - generate the preference matrix
mahout recommenditembased --startPhase 0 --endPhase 0 -i <inputDir> -o <outputDir> -s <similarityClass> --tempDir <tempDir>
step 3 - delete the partialMultiply directory if it already exists
hadoop fs -rm -r <tempDir>/partialMultiply
step 4 - generate recommendations
mahout recommenditembased --startPhase 2 --endPhase 30 -i <inputDir> -o <outputDir> -s <similarityClass> -mp 15 -m 300 --numRecommendations 1000 --tempDir <tempDir>
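The recommendation refresh can be scripted the same way. The sketch below strings steps 1 to 4 together; the variable names and the refresh_recommendations function are placeholders of mine, and the -f flag on hadoop fs -rm is my addition so that a missing directory is not treated as an error. It deliberately removes only the two subdirectories and leaves the rest of tempDir alone. As written it prints the commands; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch of the frequent recommendation job (steps 1-4).
# All paths below are placeholder defaults (assumptions); override them via the environment.
INPUT_DIR="${INPUT_DIR:-/input}"
OUTPUT_DIR="${OUTPUT_DIR:-/output}"
TEMP_DIR="${TEMP_DIR:-/tempDir}"
SIMILARITY="${SIMILARITY:-SIMILARITY_COSINE}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

refresh_recommendations() {
  # step 1: drop only the old preference matrix, not the whole temp directory
  run hadoop fs -rm -r -f "$TEMP_DIR/preparePreferenceMatrix" || return 1
  # step 2: phase 0 rebuilds the preference matrix from the latest interactions
  run mahout recommenditembased --startPhase 0 --endPhase 0 \
    -i "$INPUT_DIR" -o "$OUTPUT_DIR" -s "$SIMILARITY" --tempDir "$TEMP_DIR" || return 1
  # step 3: drop the old partialMultiply output
  run hadoop fs -rm -r -f "$TEMP_DIR/partialMultiply" || return 1
  # step 4: skip phase 1 and reuse the existing similarities to generate recommendations
  run mahout recommenditembased --startPhase 2 --endPhase 30 \
    -i "$INPUT_DIR" -o "$OUTPUT_DIR" -s "$SIMILARITY" \
    -mp 15 -m 300 --numRecommendations 1000 --tempDir "$TEMP_DIR"
}

refresh_recommendations
```

Since phase 1 is skipped here, this script assumes the similarity job has already been run at least once against the same tempDir.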