Tuesday, January 27, 2009

Assignment 2






Continuing with Chapter 2, I got the same answers as the textbook for "Recommending Items" (p. 17 of PCI).






For >>> recommendations.getRecommendations(recommendations.critics,'Toby')



[(3.3477895267131013, 'The Night Listener'), (2.8325499182641614, 'Lady in the Water'), (2.5309807037655649, 'Just My Luck')]






And with sim_distance as the similarity function:






>>> recommendations.getRecommendations(recommendations.critics,'Toby',
... similarity=recommendations.sim_distance)
[(3.5002478401415877, 'The Night Listener'), (2.7561242939959363, 'Lady in the Water'), (2.5946144209447373, 'Just My Luck')]
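For reference, here is a minimal sketch of the idea behind getRecommendations: each item the person has not rated gets a score equal to the similarity-weighted average of everyone else's ratings. This is not the book's code. The names get_recommendations and sim_distance, the distance formula 1/(1+sum of squared differences), and the three-person ratings dictionary are all assumptions made up for illustration.

```python
def sim_distance(prefs, a, b):
    """Distance-based similarity (assumed form): 1 / (1 + sum of squared differences)."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    sum_sq = sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared)
    return 1.0 / (1.0 + sum_sq)


def get_recommendations(prefs, person, similarity):
    """Rank items `person` has not rated by similarity-weighted average rating."""
    totals = {}    # item -> sum of (similarity * rating)
    sim_sums = {}  # item -> sum of similarities of the people who rated it
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue  # ignore people with no positive similarity
        for item, rating in prefs[other].items():
            if item not in prefs[person]:
                totals[item] = totals.get(item, 0.0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    rankings = [(totals[item] / sim_sums[item], item) for item in totals]
    rankings.sort(reverse=True)
    return rankings


# Hypothetical ratings, not the book's critics data.
ratings = {
    'ann': {'A': 4.0, 'B': 2.0},
    'bob': {'A': 4.0, 'B': 2.0, 'C': 5.0},
    'cat': {'A': 3.0, 'B': 2.0, 'C': 1.0},
}
recs = get_recommendations(ratings, 'ann', sim_distance)
print(recs)
```

Swapping in a different similarity function changes the weights, which is why the two calls above return the same films with slightly different scores.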






Matching Products




This section uncovered an error in




>>> recommendations.topMatches(movies,'Superman Returns')




[(0.65795169495976946, 'You, Me, and Dupree'), (0.48795003647426888, 'Lady in the Water'), (0.11180339887498941, 'Snakes on a Plane'), (-0.17984719479905439, 'The Night Listener'), (-0.46625240412015717, 'Just My Luck')]




The textbook truncates these values to three decimals, but for the last entry it gives (-0.422, 'Just My Luck'), which does not match my -0.466.
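One way to check a single score by hand is to compute the Pearson correlation directly on the two lists of ratings from the critics who rated both films. The sketch below is only an illustration: the function name pearson and the two rating lists are made up, so substitute the actual paired ratings from recommendations.critics to test a real value.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of paired ratings."""
    n = len(xs)
    if n == 0 or n != len(ys):
        return 0.0
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Numerator: covariance term; denominator: product of the spread terms.
    num = sum_xy - (sum_x * sum_y) / float(n)
    den = sqrt((sum_x_sq - sum_x ** 2 / float(n)) *
               (sum_y_sq - sum_y ** 2 / float(n)))
    if den == 0:
        return 0.0  # no variation in one list, correlation undefined
    return num / den


# Hypothetical paired ratings for two films (not taken from the book).
film_a = [1.0, 2.0, 3.0, 4.0, 5.0]
film_b = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson(film_a, film_b))
```

If the hand-computed value for the real ratings agrees with one of the two numbers, that points to where the difference comes from (the data file or the printed output).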

I found feedparser through Google, downloaded the zip file, extracted it, placed feedparser.py in my Python library directory, and then followed the textbook's instructions.


Set up pydelicious and got about 2 pages of popular posts on programming.



So far so good. I used the code files that were provided on the class website, then typed the following:



>>> from pydelicious import get_popular,get_userposts,get_urlposts



>>> from deliciousrec import *



>>> delusers=initializeUserDict('programming')



>>> delusers['arturousmc']={}



>>> fillItems(delusers)



>>> import random



>>> user=delusers.keys( )[random.randint(0,len(delusers)-1)]



>>> user



u'chaostheory'



>>> import recommendations



>>> recommendations.topMatches(delusers,user)



[(0.11907894736842106, u'synewaves'), (0.11907894736842106, u'mangosi'), (0.05131578947368421, u'xulu'), (0.05131578947368421, u'wdr1'), (0.05131578947368421, u'thomd')]




>>> recommendations.getRecommendations(delusers,user)[0:10]



[(0.19082672706681769, u'http://colorschemedesigner.com/'), (0.17667044167610421, u'http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html'), (0.14665911664779163, u'http://woork.blogspot.com/2009/01/beautiful-datepickers-and-calendars-for.html'), (0.14665911664779163, u'http://nettuts.com/freebies/cheat-sheets/jquery-cheat-sheet/'), (0.13250283125707815, u'http://www.noupe.com/tools/15-incredible-mac-apps-for-freelance-web-designers.html'), (0.13250283125707815, u'http://css.dzone.com/news/how-to-develop-a-firefox-exten'), (0.10249150622876559, u'http://www.vectorials.com/index.html'), (0.10249150622876559, u'http://www.templatemonster.com/'), (0.10249150622876559, u'http://www.smashingmagazine.com/2009/01/20/50-extremely-useful-php-tools/'), (0.10249150622876559, u'http://www.smashingmagazine.com/2008/01/14/monday-inspiration-data-visualization-and-infographics/')]



>>> url=recommendations.getRecommendations(delusers,user)[0][1]



>>> recommendations.topMatches(recommendations.transformPrefs(delusers),url)



[(0.48976000741676007, u'http://www.yelp.com/biz/delancey-street-foundation-movers-los-angeles#hrid:4SlRsxSrZDu8DbEvrCWdhg'), (0.48976000741676007, u'http://www.webstandards.org/action/acid2/guide/'), (0.48976000741676007, u'http://www.wasabi.net.cn/'), (0.48976000741676007, u'http://www.theonion.com/content/news/obama_disappointed_cabinet_failed'), (0.48976000741676007, u'http://www.schematic.com/#//')]





Finally, I have added a recommendation engine to del.icio.us!



To build the item comparison dataset, I added the code the text asks for to recommendations.py, and got the following:



>>> reload(recommendations)





>>> itemsim=recommendations.calculateSimilarItems(recommendations.critics)



>>> itemsim



{'Lady in the Water': [(0.40000000000000002, 'You, Me, and Dupree'), (0.2857142857142857, 'The Night Listener'), (0.22222222222222221, 'Snakes on a Plane'), (0.21052631578947367, 'Just My Luck'), (0.090909090909090912, 'Superman Returns')], 'Snakes on a Plane': [(0.22222222222222221, 'Lady in the Water'), (0.18181818181818182, 'The Night Listener'), (0.16666666666666666, 'Superman Returns'), (0.10526315789473684, 'Just My Luck'), (0.05128205128205128, 'You, Me, and Dupree')], 'You, Me, and Dupree': [(0.40000000000000002, 'Lady in the Water'), (0.18181818181818182, 'Just My Luck'), (0.14814814814814814, 'The Night Listener'), (0.053333333333333337, 'Superman Returns'), (0.05128205128205128, 'Snakes on a Plane')], 'Just My Luck': [(0.21052631578947367, 'Lady in the Water'), (0.18181818181818182, 'You, Me, and Dupree'), (0.13333333333333333, 'The Night Listener'), (0.10526315789473684, 'Snakes on a Plane'), (0.063492063492063489, 'Superman Returns')], 'Superman Returns': [(0.16666666666666666, 'Snakes on a Plane'), (0.10256410256410256, 'The Night Listener'), (0.090909090909090912, 'Lady in the Water'), (0.063492063492063489, 'Just My Luck'), (0.053333333333333337, 'You, Me, and Dupree')], 'The Night Listener': [(0.2857142857142857, 'Lady in the Water'), (0.18181818181818182, 'Snakes on a Plane'), (0.14814814814814814, 'You, Me, and Dupree'), (0.13333333333333333, 'Just My Luck'), (0.10256410256410256, 'Superman Returns')]}
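As a rough sketch of how an item comparison dataset like this can be built: flip the person-to-item dictionary into an item-to-person dictionary, then store the top matches for every item. The function names, the similarity formula, and the two-user prefs dictionary below are assumptions for illustration, not the code from recommendations.py.

```python
def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, items in prefs.items():
        for item, rating in items.items():
            result.setdefault(item, {})[person] = rating
    return result


def sim_distance(prefs, a, b):
    """Distance-based similarity (assumed form): 1 / (1 + sum of squared differences)."""
    shared = [key for key in prefs[a] if key in prefs[b]]
    if not shared:
        return 0.0
    sum_sq = sum((prefs[a][key] - prefs[b][key]) ** 2 for key in shared)
    return 1.0 / (1.0 + sum_sq)


def top_matches(prefs, target, n=5):
    """Return the n entries most similar to `target` as (score, name) tuples."""
    scores = [(sim_distance(prefs, target, other), other)
              for other in prefs if other != target]
    scores.sort(reverse=True)
    return scores[:n]


def calculate_similar_items(prefs, n=10):
    """Map every item to its list of most similar items."""
    item_prefs = transform_prefs(prefs)
    return dict((item, top_matches(item_prefs, item, n)) for item in item_prefs)


# Hypothetical ratings from two users on two items.
prefs = {
    'u1': {'A': 1.0, 'B': 1.0},
    'u2': {'A': 2.0, 'B': 4.0},
}
similar = calculate_similar_items(prefs)
print(similar)
```

The result has the same shape as the itemsim output above: each item maps to a list of (similarity, other item) tuples sorted from most to least similar.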



>>> reload(recommendations)





>>> recommendations.getRecommendedItems(recommendations.critics,itemsim,'Toby')



[(4.5, 'Lady in the Water')]
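For comparison, here is a minimal sketch of item-based scoring: every item the user has rated contributes to the score of each similar unrated item, weighted by the item-to-item similarity. The function name get_recommended_items and the tiny data set are hypothetical. The item_sim structure follows the (similarity, item) tuple format shown in the itemsim output above.

```python
def get_recommended_items(prefs, item_sim, user):
    """Score unrated items by the user's own ratings, weighted by item similarity."""
    user_ratings = prefs[user]
    scores = {}     # item -> sum of (similarity * user's rating of a similar item)
    total_sim = {}  # item -> sum of similarities used for that item
    for item, rating in user_ratings.items():
        for similarity, other in item_sim[item]:
            if other in user_ratings:
                continue  # skip items the user has already rated
            scores[other] = scores.get(other, 0.0) + similarity * rating
            total_sim[other] = total_sim.get(other, 0.0) + similarity
    rankings = [(scores[item] / total_sim[item], item)
                for item in scores if total_sim[item] > 0]
    rankings.sort(reverse=True)
    return rankings


# Hypothetical data: one user who rated A and B but not C.
prefs = {'u': {'A': 4.0, 'B': 2.0}}
item_sim = {
    'A': [(0.5, 'C'), (0.2, 'B')],
    'B': [(0.25, 'C'), (0.2, 'A')],
}
ranked = get_recommended_items(prefs, item_sim, 'u')
print(ranked)
```

With a scheme like this, a user who has rated only a few items would normally still get a score for every unrated item that appears in the similarity lists of the items they did rate.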




Using the MovieLens Dataset
I was able to download the datasets from a site I found through Google, but I could not load them for this part of the assignment. The
error I keep getting is shown below. No luck with Python tonight; maybe Arizona will have better luck than I'm having!


>>> prefs=recommendations.loadMovieLens()


Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python26\lib\recommendations.py", line 163, in loadMovieLens
    for line in open(path+'/u.item'):
IOError: [Errno 2] No such file or directory: 'C:Python26/Lib/u.item'
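The IOError says Python could not find u.item at the path it was given, so the first thing to check is whether that directory really contains the extracted u.item and u.data files. The path in the message also has no slash after the drive letter, which may mean a backslash was lost in the string. Below is a hedged sketch of a loader that builds the paths with os.path.join and reports which file is missing. The name load_movielens and its data_dir argument are assumptions, not the textbook's function, and the parsing assumes the usual layout of pipe-separated u.item and tab-separated u.data.

```python
import io
import os


def load_movielens(data_dir):
    """Load ratings as {user: {title: rating}} from u.item and u.data in data_dir."""
    item_path = os.path.join(data_dir, 'u.item')
    data_path = os.path.join(data_dir, 'u.data')
    # Fail early with a clear message if either file is not where we expect.
    for path in (item_path, data_path):
        if not os.path.isfile(path):
            raise IOError('expected MovieLens file not found: %r' % path)

    # u.item: movie id and title are the first two pipe-separated fields.
    movies = {}
    with io.open(item_path, encoding='latin-1') as handle:
        for line in handle:
            movie_id, title = line.split('|')[0:2]
            movies[movie_id] = title

    # u.data: user id, movie id, rating, timestamp separated by tabs.
    prefs = {}
    with io.open(data_path, encoding='latin-1') as handle:
        for line in handle:
            user, movie_id, rating, _timestamp = line.rstrip('\n').split('\t')
            prefs.setdefault(user, {})[movies[movie_id]] = float(rating)
    return prefs


# Example call (hypothetical folder; a raw string keeps the backslashes intact):
# prefs = load_movielens(r'C:\data\movielens')
```

Writing the Windows path as a raw string, or with forward slashes throughout, avoids any surprises from backslash escapes.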



WEKA



The installation was pretty easy. I ran through the sample data and understand its format and processes.


I used the dataset provided on the class website, which loaded directly into WEKA. This is the summary it returned:


=== Stratified cross-validation ===

=== Summary ===


Correctly Classified Instances         235               77.5578 %
Incorrectly Classified Instances        68               22.4422 %
Kappa statistic                          0.5443
Mean absolute error                      0.1044
Root mean squared error                  0.2725
Relative absolute error                 52.0476 %
Root relative squared error             86.5075 %
Total Number of Instances              303



According to the results, 235 of the 303 instances (about 78%) were correctly classified and 68 (about 22%) were incorrectly classified. Since the large majority were classified correctly, I would say this is a good outcome.
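The percentages in the summary can be reproduced from the two instance counts with a few lines of arithmetic. This is only a quick check of WEKA's numbers; the variable names are my own.

```python
correct = 235
incorrect = 68
total = correct + incorrect  # should equal the "Total Number of Instances"

# Use 100.0 so the division is floating point on Python 2 as well.
accuracy = 100.0 * correct / total
error_rate = 100.0 * incorrect / total

print("%d instances, %.4f%% correct, %.4f%% incorrect"
      % (total, accuracy, error_rate))
# prints: 303 instances, 77.5578% correct, 22.4422% incorrect
```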
