



Now, the results of war movie keyword clustering:








One last thing to keep in mind is displaying the data. As with all data, the visualization has to balance a human-readable format against enough detail to stay relevant. The algorithms described before do a good job of finding the solution, but displaying it can be a whole other issue.
Below are 2 examples of the same data:
BAD


These 2 pictures show the exact same data, but the second is much more human readable. This is because the lines do not cross. Keeping lines from crossing is the most common way of making data easy to read. It is done by counting crossing lines, keeping track of lengths and positions, and using an optimization algorithm to rearrange the layout until as few lines as possible cross. Ironically enough, this is very often done using a genetic algorithm. Also, to keep the data from being oddly spaced, minimum and maximum line lengths are often declared. This keeps the data easy to read while the layout is still auto-generated, which is important when dealing with very large data sets.
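The crossing count mentioned above can be sketched as follows. This is a minimal illustration, not the code used to produce the pictures: the names `segments_cross` and `count_crossings`, and the layout of `positions` and `links`, are assumptions made for the example. An optimizer (genetic or otherwise) would try to minimize the number this function returns.

```python
from itertools import combinations

def segments_cross(p1, p2, p3, p4):
    """Return True if segment p1-p2 crosses segment p3-p4 strictly inside both."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = p3, p4
    den = float((y4 - y3) * (x2 - x1) - (x4 - x3) * (y2 - y1))
    if den == 0:
        return False  # parallel (or degenerate) lines are treated as not crossing
    # ua and ub are the fractions along each segment where the lines meet.
    ua = ((x4 - x3) * (y1 - y3) - (y4 - y3) * (x1 - x3)) / den
    ub = ((x2 - x1) * (y1 - y3) - (y2 - y1) * (x1 - x3)) / den
    return 0 < ua < 1 and 0 < ub < 1

def count_crossings(positions, links):
    """Count how many pairs of links cross for a given node layout.

    positions: dict mapping node name -> (x, y)   (hypothetical layout format)
    links:     list of (node_a, node_b) pairs
    """
    total = 0
    for (a, b), (c, d) in combinations(links, 2):
        if segments_cross(positions[a], positions[b], positions[c], positions[d]):
            total += 1
    return total

# Four nodes on the corners of a unit square.
positions = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
tangled = count_crossings(positions, [("A", "B"), ("C", "D")])    # the two diagonals
untangled = count_crossings(positions, [("A", "C"), ("B", "D")])  # the two vertical sides
```

Links that only share an endpoint are not counted as crossing, because the intersection has to fall strictly inside both segments.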

In this visualization, each surname gets a bubble whose size depends on the number of times the name was found. This is one of the more easily understood visualizations.
Another visualization I found showed the average S&P home price index for 14 different states between 1998 and 2008, which is pictured below.

Some of the data that I entered were the Governator's favorability ratings by quarter. This information was pulled from swivel.com.

Another way we decided to cluster the data was by age and sex of the passenger, and whether or not they survived. What makes this more interesting is the fact that a fair number of female children did not survive. The upper-right area is female child passengers, and red (did not survive) is the dominant color in that area. You can manipulate the visualizations in many ways in order to discover interesting trends in the data set.


I was also able to determine whether they were from other coalition forces. Here is the number of UK military personnel who died as a result of combat.

Matching Products
This section uncovered a discrepancy with the textbook's output:
recommendations.topMatches(movies,'Superman Returns')
[(0.65795169495976946, 'You, Me, and Dupree'), (0.48795003647426888, 'Lady in the Water'), (0.11180339887498941, 'Snakes on a Plane'), (-0.17984719479905439, 'The Night Listener'),
(-0.46625240412015717, 'Just My Luck')]
The textbook shortens the values and supplies (-0.422, 'Just My Luck') for the last entry. That differs from the -0.466 computed here by more than rounding.
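To check a value like this by hand, it helps to have the similarity score spelled out. The sketch below assumes the score is a Pearson correlation (the negative values in the output suggest a correlation-style measure); the contents of recommendations.py are not shown here, so `pearson`, `top_matches`, and the toy ratings are illustrative names and data, not the textbook's code.

```python
from math import sqrt

def pearson(prefs, a, b):
    """Pearson correlation between entries a and b over the keys they share."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[a][k] for k in shared]
    ys = [prefs[b][k] for k in shared]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    num = sum_xy - sum_x * sum_y / float(n)
    den_sq = (sum_xx - sum_x ** 2 / float(n)) * (sum_yy - sum_y ** 2 / float(n))
    if den_sq <= 0:
        return 0.0  # no variation in one of the rating lists
    return num / sqrt(den_sq)

def top_matches(prefs, target, n=5):
    """Return the n entries most similar to target as (score, name) pairs."""
    scores = [(pearson(prefs, target, other), other)
              for other in prefs if other != target]
    scores.sort(reverse=True)
    return scores[:n]

# Toy item-centric ratings (made up): movie -> {critic: rating}
toy_movies = {
    "Movie A": {"c1": 1.0, "c2": 2.0, "c3": 3.0},
    "Movie B": {"c1": 2.0, "c2": 4.0, "c3": 6.0},  # rises with Movie A
    "Movie C": {"c1": 3.0, "c2": 2.0, "c3": 1.0},  # falls as Movie A rises
}
matches = top_matches(toy_movies, "Movie A")
```

With these toy ratings, "Movie B" comes out on top with a score of 1 and "Movie C" at the bottom with a score of -1, the same ordering pattern as the output above.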
I went to Google for feedparser, downloaded the zip file, extracted it, placed the feedparser.py file in my library, and proceeded with the textbook's instructions. I set up pydelicious and got about 2 pages of popular posts on programming.
So far so good. I used the code files that were provided on the class website, then typed the following:
>>> from pydelicious import get_popular,get_userposts,get_urlposts
>>> from deliciousrec import *
>>> delusers=initializeUserDict('programming')
>>> delusers['arturousmc']={}
>>> fillItems(delusers)
>>> import random
>>> user=delusers.keys( )[random.randint(0,len(delusers)-1)]
>>> user
u'chaostheory'
>>> import recommendations
>>> recommendations.topMatches(delusers,user)
[(0.11907894736842106, u'synewaves'), (0.11907894736842106, u'mangosi'), (0.05131578947368421, u'xulu'), (0.05131578947368421, u'wdr1'), (0.05131578947368421, u'thomd')]
>>> recommendations.getRecommendations(delusers,user)[0:10]
[(0.19082672706681769, u'http://colorschemedesigner.com/'), (0.17667044167610421, u'http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html'), (0.14665911664779163, u'http://woork.blogspot.com/2009/01/beautiful-datepickers-and-calendars-for.html'), (0.14665911664779163, u'http://nettuts.com/freebies/cheat-sheets/jquery-cheat-sheet/'), (0.13250283125707815, u'http://www.noupe.com/tools/15-incredible-mac-apps-for-freelance-web-designers.html'), (0.13250283125707815, u'http://css.dzone.com/news/how-to-develop-a-firefox-exten'), (0.10249150622876559, u'http://www.vectorials.com/index.html'), (0.10249150622876559, u'http://www.templatemonster.com/'), (0.10249150622876559, u'http://www.smashingmagazine.com/2009/01/20/50-extremely-useful-php-tools/'), (0.10249150622876559, u'http://www.smashingmagazine.com/2008/01/14/monday-inspiration-data-visualization-and-infographics/')]
>>> url=recommendations.getRecommendations(delusers,user)[0][1]
>>> recommendations.topMatches(recommendations.transformPrefs(delusers),url)
[(0.48976000741676007, u'http://www.yelp.com/biz/delancey-street-foundation-movers-los-angeles#hrid:4SlRsxSrZDu8DbEvrCWdhg'), (0.48976000741676007, u'http://www.webstandards.org/action/acid2/guide/'), (0.48976000741676007, u'http://www.wasabi.net.cn/'), (0.48976000741676007, u'http://www.theonion.com/content/news/obama_disappointed_cabinet_failed'), (0.48976000741676007, u'http://www.schematic.com/#//')]
Finally I have added a search engine to del.icio.us!
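The last call above works because the preference dictionary is flipped from user -> URL to URL -> user before matching. A minimal sketch of that flip is shown below, assuming the nested-dictionary layout used throughout; `transform_prefs` and the sample data are illustrative stand-ins, not the contents of recommendations.py.

```python
def transform_prefs(prefs):
    """Flip {person: {item: score}} into {item: {person: score}}."""
    flipped = {}
    for person, items in prefs.items():
        for item, score in items.items():
            flipped.setdefault(item, {})[person] = score
    return flipped

# Made-up data in the same shape as delusers: 1.0 means the user posted the URL.
sample = {
    "user_a": {"http://example.com/a": 1.0, "http://example.com/b": 0.0},
    "user_b": {"http://example.com/a": 1.0},
}
by_url = transform_prefs(sample)
```

Once the data is keyed by URL, the same matching function that found similar users can be reused to find similar links.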
In building the item comparison dataset, I added the code the text asks for to recommendations.py, and the following happened:
>>> reload(recommendations)
>>> itemsim=recommendations.calculateSimilarItems(recommendations.critics)
>>> itemsim
{'Lady in the Water': [(0.40000000000000002, 'You, Me, and Dupree'), (0.2857142857142857, 'The Night Listener'), (0.22222222222222221, 'Snakes on a Plane'), (0.21052631578947367, 'Just My Luck'), (0.090909090909090912, 'Superman Returns')], 'Snakes on a Plane': [(0.22222222222222221, 'Lady in the Water'), (0.18181818181818182, 'The Night Listener'), (0.16666666666666666, 'Superman Returns'), (0.10526315789473684, 'Just My Luck'), (0.05128205128205128, 'You, Me, and Dupree')], 'You, Me, and Dupree': [(0.40000000000000002, 'Lady in the Water'), (0.18181818181818182, 'Just My Luck'), (0.14814814814814814, 'The Night Listener'), (0.053333333333333337, 'Superman Returns'), (0.05128205128205128, 'Snakes on a Plane')], 'Just My Luck': [(0.21052631578947367, 'Lady in the Water'), (0.18181818181818182, 'You, Me, and Dupree'), (0.13333333333333333, 'The Night Listener'), (0.10526315789473684, 'Snakes on a Plane'), (0.063492063492063489, 'Superman Returns')], 'Superman Returns': [(0.16666666666666666, 'Snakes on a Plane'), (0.10256410256410256, 'The Night Listener'), (0.090909090909090912, 'Lady in the Water'), (0.063492063492063489, 'Just My Luck'), (0.053333333333333337, 'You, Me, and Dupree')], 'The Night Listener': [(0.2857142857142857, 'Lady in the Water'), (0.18181818181818182, 'Snakes on a Plane'), (0.14814814814814814, 'You, Me, and Dupree'), (0.13333333333333333, 'Just My Luck'), (0.10256410256410256, 'Superman Returns')]}
>>> reload(recommendations)
>>> recommendations.getRecommendedItems(recommendations.critics,itemsim,'Toby')
[(4.5, 'Lady in the Water')]
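For comparison, here is a rough sketch of how an item-based recommendation can be computed from a similarity table shaped like itemsim: each unrated item gets a weighted average of the user's own ratings, weighted by similarity. The function name, the toy data, and the exact weighting are assumptions for illustration and may differ from what the textbook code does.

```python
def get_recommended_items(prefs, item_sim, user):
    """Rank items the user has not rated by a similarity-weighted average.

    prefs:    {user: {item: rating}}
    item_sim: {item: [(similarity, other_item), ...]}
    """
    rated = prefs[user]
    totals = {}    # sum of similarity * rating per candidate item
    sim_sums = {}  # sum of similarities per candidate item
    for item, rating in rated.items():
        for similarity, other in item_sim.get(item, []):
            if other in rated:
                continue  # skip items the user has already rated
            totals[other] = totals.get(other, 0.0) + similarity * rating
            sim_sums[other] = sim_sums.get(other, 0.0) + similarity
    rankings = [(totals[i] / sim_sums[i], i) for i in totals if sim_sums[i] > 0]
    rankings.sort(reverse=True)
    return rankings

# Toy data (made up): the user rated X and Y, and Z is equally similar to both.
toy_prefs = {"toby": {"X": 4.0, "Y": 2.0}}
toy_sim = {"X": [(0.5, "Z")], "Y": [(0.5, "Z")]}
toy_recs = get_recommended_items(toy_prefs, toy_sim, "toby")
```

Here "Z" is predicted at 3.0, halfway between the two ratings, because it is equally similar to both rated items.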
Using the MovieLens Dataset
I was successful in downloading the datasets from a site I found through Google. I was not successful in loading them for this part of the assignment. The error I keep getting is as follows. No luck with Python tonight, maybe Arizona will have better luck than I'm having!
>>> prefs=recommendations.loadMovieLens()
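For reference, a loader along these lines can be sketched as below. It assumes the MovieLens 100k layout, where u.item is pipe-separated (movie id, title, ...) and u.data is tab-separated (user id, movie id, rating, timestamp); which version of the dataset was downloaded is not stated, and `load_movielens` and its `path` argument are illustrative, not the textbook function. A wrong path to the data directory is one common reason a call like the one above fails, so the sketch takes the path explicitly.

```python
import io
import os

def load_movielens(path):
    """Build {user_id: {movie_title: rating}} from u.item and u.data in path."""
    # Map movie id -> title from the pipe-separated u.item file.
    titles = {}
    with io.open(os.path.join(path, "u.item"), encoding="latin-1") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 2:
                titles[fields[0]] = fields[1]

    # Read the ratings: user id, movie id, rating, timestamp (tab-separated).
    prefs = {}
    with io.open(os.path.join(path, "u.data"), encoding="latin-1") as f:
        for line in f:
            fields = line.split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            user, movie_id, rating = fields[:3]
            title = titles.get(movie_id, movie_id)  # fall back to the raw id
            prefs.setdefault(user, {})[title] = float(rating)
    return prefs
```

Calling `load_movielens` with the directory that actually contains u.item and u.data returns the same nested-dictionary shape as the critics data used earlier.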