Thursday, April 30, 2009

Portfolio Assignment 9

Our final project extended the movie clustering project we worked on earlier this year. We decided to focus our clustering on genre, year, and rating. Our goal was to see how the criteria for each genre changed over time. The most prominent example is the horror genre. Today, most people would find horror movies from the 1930s and 1940s not scary at all, and maybe a little funny, but when those movies debuted they were some of the scariest ever made. Our dendrogram was able to differentiate older horror movies from newer ones and put them in different clusters.
We chose to cluster on genre, year, and rating because we decided that adding keywords would vastly complicate our project and most likely be of minimal benefit, since keywords would not help with what we were trying to study. Many keywords, such as vampire, show up in more than one genre. Some dramas and chick flicks also have vampires in them, and those keywords would interfere with getting a good horror movie cluster.
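To make the idea concrete, here is a minimal sketch of hierarchical clustering on just genre, year, and rating. It is not our actual project code. The movie titles and numbers are made-up placeholders, and the distance function (a genre mismatch counts as 1, while year and rating gaps are scaled by an assumed 100-year span and 10-point rating scale) and the use of average linkage are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical sample data: (title, genre, year, rating out of 10).
MOVIES = [
    ("Old Horror A", "horror", 1931, 7.5),
    ("Old Horror B", "horror", 1941, 7.3),
    ("New Horror A", "horror", 2004, 7.6),
    ("New Horror B", "horror", 2007, 6.9),
    ("Romance A", "romance", 2004, 7.8),
    ("Romance B", "romance", 2008, 5.2),
]


def distance(a, b, year_span=100.0, rating_span=10.0):
    """Distance between two movies (assumed weighting, not a tuned one)."""
    genre_gap = 0.0 if a[1] == b[1] else 1.0
    year_gap = abs(a[2] - b[2]) / year_span
    rating_gap = abs(a[3] - b[3]) / rating_span
    return genre_gap + year_gap + rating_gap


def cluster_distance(c1, c2):
    """Average linkage: mean distance over all cross-cluster pairs."""
    total = sum(distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(movies, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[m] for m in movies]
    while len(clusters) > k:
        # combinations() yields index pairs with i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for cluster in agglomerate(MOVIES, 3):
        print([title for title, _, _, _ in cluster])
```

With this placeholder data, stopping at three clusters separates the older horror movies from the newer ones even though they share a genre, because the year gap keeps them apart. Stopping at two clusters merges all of the horror movies before anything joins the romance group.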
As for future improvements to our clustering algorithm, we would have liked to work out a way to differentiate between subgenres. There are multiple kinds of chick flicks, and it would be nice to have a way to separate them, especially to move the teen drama movies into their own group. Often a few legitimately good chick flicks are released, but so are several not very good teen melodramas, and they would all share the same year. Breaking the chick flick category into subgenres would keep the teen melodramas from being thrown in with the legitimately highly rated movies.