I got pydelicious to work after editing the file to change the md5 library to the hashlib library, and then I had to add the feedparser library to the Python26 folder. I don't know if it's in the right spot, but it's working, and that's good enough for me.
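If you're unsure whether a module such as feedparser landed somewhere Python can see, you can ask the interpreter where it would import it from. This is a minimal sketch for Python 3 using importlib.util.find_spec; module_location is a hypothetical helper name, and on Python 2.6 the rough equivalent is to import the module and print its __file__ attribute. Third-party modules conventionally live in the interpreter's site-packages folder.

```python
import importlib.util


def module_location(name):
    """Return the path Python would load a top-level module from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing on sys.path provides this module
        return None
    # For packages this is the package's __init__.py; built-ins report "built-in"
    return spec.origin


# Hypothetical usage: print(module_location("feedparser"))
```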
I tried to build the data set using the commands at the bottom of page 21 and got several errors:
>>> from deliciousrec import *
>>> delusers=initializeUserDict('programming')
Traceback (most recent call last):
  File "
    delusers=initializeUserDict('programming')
  File "C:\Python26\deliciousrec.py", line 9, in initializeUserDict
    for p2 in get_urlposts(p1['href']):
  File "C:\Python26\pydelicious.py", line 803, in get_urlposts
    return getrss(url = url)
  File "C:\Python26\pydelicious.py", line 794, in getrss
    return dlcs_rss_request(tag=tag, popular=popular, user=user, url=url)
  File "C:\Python26\pydelicious.py", line 418, in dlcs_rss_request
    url = DLCS_RSS + '''url/%s'''%md5.new(url).hexdigest()
NameError: global name 'md5' is not defined
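It turns out you can't just change the library name: the code still calls md5.new(), so with the md5 import gone that name is undefined. I went back to the md5 module and now have to deal with its deprecation warning instead. At least it works now.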
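The NameError happens because line 418 of pydelicious.py still calls md5.new(url), and once the md5 import is removed that name no longer exists. Changing the call as well as the import should remove the need for the deprecated module; in Python 2.6 that line would become url = DLCS_RSS + '''url/%s'''%hashlib.md5(url).hexdigest() (a suggestion, not tested against pydelicious). The sketch below computes the same kind of digest in Python 3, where hashlib requires bytes, so strings are encoded first; url_md5 is a hypothetical helper name.

```python
import hashlib


def url_md5(url):
    """Hex MD5 digest of a URL, the kind of value pydelicious puts in its url/ feed path.

    hashlib.md5 replaces the deprecated md5.new; in Python 3 it needs bytes,
    so str input is UTF-8 encoded first.
    """
    if isinstance(url, str):
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()
```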
The sections on building the data set, recommending neighbors and links, building the item comparison data set, and getting recommendations all worked. The output just didn't look nearly as neat as the book's examples because the numbers weren't rounded to three decimal places. The actual values matched, though.
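If you want the output to look like the book's, you can round the scores before printing them. A minimal sketch, assuming the recommender returns a list of (score, item) tuples as in the book's examples; round_scores and the sample values are hypothetical.

```python
def round_scores(results, digits=3):
    """Round the score in each (score, item) pair so printed output is tidier."""
    return [(round(score, digits), item) for score, item in results]


# Hypothetical raw output from a recommender call
raw = [(3.14159265, "item A"), (2.71828182, "item B")]
print(round_scores(raw))
```

Rounding only changes how the numbers print, so the rankings stay the same.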
As for the MovieLens material, once I hardcoded my file path into the loadMovieLens function, it worked great for me.
def loadMovieLens(path='C:\Python26/data/movielens'):
That bit of code was quite important, especially the one backslash that goes the other way...
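Another way around the slash problem is to pass the path as a raw string and let os.path.join build the file names. This sketch follows the general shape of loadMovieLens (movie titles from the pipe-separated u.item, ratings from the tab-separated u.data) but is a Python 3 variation, not the book's code; load_movielens is a hypothetical name, and reading the files as Latin-1 is an assumption to cope with non-UTF-8 movie titles.

```python
import os


def load_movielens(path):
    """Load MovieLens ratings into {user: {movie title: rating}}.

    Assumes u.item and u.data sit directly inside `path`.
    """
    movies = {}
    # u.item: id|title|... ; Latin-1 never fails to decode, which avoids title errors
    with open(os.path.join(path, "u.item"), encoding="latin-1") as f:
        for line in f:
            movie_id, title = line.split("|")[0:2]
            movies[movie_id] = title
    prefs = {}
    # u.data: user<TAB>movie id<TAB>rating<TAB>timestamp
    with open(os.path.join(path, "u.data"), encoding="latin-1") as f:
        for line in f:
            user, movie_id, rating, _timestamp = line.split("\t")
            prefs.setdefault(user, {})[movies[movie_id]] = float(rating)
    return prefs


# Usage on Windows; the r prefix keeps backslashes from being read as escapes:
# prefs = load_movielens(r"C:\Python26\data\movielens")
```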
Building the item-based recommendations took about a minute to complete on my desktop, which makes me glad I did not use my much slower laptop for this assignment. My outputs matched the book again, which made me quite happy.
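Part of why that step is slow is that building the item comparison data set compares every item with every other item, so the work grows roughly with the square of the number of items. The sketch below illustrates that pattern with a Euclidean-distance similarity; it is a simplified reconstruction in the spirit of the book's item-based approach, not its exact code, and every function name is hypothetical.

```python
from math import sqrt


def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result


def sim_distance(prefs, a, b):
    """Euclidean-distance similarity in (0, 1]; 0.0 if a and b share no raters."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    if not shared:
        return 0.0
    sum_sq = sum((prefs[a][k] - prefs[b][k]) ** 2 for k in shared)
    return 1 / (1 + sqrt(sum_sq))


def calculate_similar_items(prefs, n=10):
    """For every item, keep its n most similar items as (score, item) pairs.

    The nested loop compares each item with every other item, which is
    why this gets slow on a data set with many items.
    """
    item_prefs = transform_prefs(prefs)
    result = {}
    for item in item_prefs:
        scores = [(sim_distance(item_prefs, item, other), other)
                  for other in item_prefs if other != item]
        scores.sort(reverse=True)
        result[item] = scores[:n]
    return result
```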
Weka Part 1
This part seemed pretty straightforward to me: as long as you followed the book's directions, you were fine. I'm still working on understanding the algorithms, but it seemed to do a good job in most cases. It made fewer errors than the 1R (one-rule) method on the weather data set.
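The 1R baseline is simple enough to sketch: for each attribute, predict the majority class for every value of that attribute, count the errors, and keep the attribute with the fewest. This minimal version handles nominal attributes only; unlike Weka's OneR it does not deal with numeric attributes or missing values. one_r and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, class_key):
    """Return (attribute, rule, errors) for the best single-attribute rule.

    rows: list of dicts mapping attribute names (and class_key) to nominal values.
    rule: maps each value of the chosen attribute to its majority class.
    """
    best = None
    for attr in attributes:
        # Class counts for each value of this attribute
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[class_key]] += 1
        rule, errors = {}, 0
        for value, class_counts in counts.items():
            majority, hits = class_counts.most_common(1)[0]
            rule[value] = majority
            # Every instance not in the majority class is an error
            errors += sum(class_counts.values()) - hits
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy data
rows = [
    {"sky": "sunny", "wind": "weak", "play": "yes"},
    {"sky": "sunny", "wind": "strong", "play": "yes"},
    {"sky": "rain", "wind": "weak", "play": "no"},
    {"sky": "rain", "wind": "strong", "play": "no"},
    {"sky": "sunny", "wind": "strong", "play": "no"},
]
print(one_r(rows, ["sky", "wind"], "play"))
```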
Weka Part 2
I ran the J4.8 tree-building algorithm on the data set and got the following results:
=== Summary ===

Correctly Classified Instances         235               77.5578 %
Incorrectly Classified Instances        68               22.4422 %
Kappa statistic                          0.5443
Mean absolute error                      0.1044
Root mean squared error                  0.2725
Relative absolute error                 52.0476 %
Root relative squared error             86.5075 %
Total Number of Instances              303

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.83      0.29      0.774       0.83     0.801       0.809      <50
               0.71      0.17      0.778       0.71     0.742       0.809      >50_1
               0         0         0           0        0           ?          >50_2
               0         0         0           0        0           ?          >50_3
               0         0         0           0        0           ?          >50_4
Weighted Avg.  0.776     0.235     0.776       0.776    0.774       0.809

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
 137  28   0   0   0 |   a = <50
  40  98   0   0   0 |   b = >50_1
   0   0   0   0   0 |   c = >50_2
   0   0   0   0   0 |   d = >50_3
   0   0   0   0   0 |   e = >50_4
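The headline numbers in the summary can be recomputed from the confusion matrix, which is a handy sanity check on what accuracy, the Kappa statistic, and per-class precision and recall actually measure. A minimal sketch; summarize is a hypothetical helper, and it assumes Weka's layout, where rows are actual classes and columns are predicted classes.

```python
def summarize(matrix):
    """Accuracy, Cohen's kappa, and per-class precision/recall from a confusion matrix.

    matrix[i][j] = number of instances of actual class i classified as class j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    row_tot = [sum(matrix[i]) for i in range(k)]                       # actual counts
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    p_o = correct / total                                              # observed agreement
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Classes with no instances get 0, like the empty rows in the Weka output
    precision = [matrix[i][i] / col_tot[i] if col_tot[i] else 0.0 for i in range(k)]
    recall = [matrix[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(k)]
    return p_o, kappa, precision, recall


# Confusion matrix from the J4.8 run above (classes a..e)
cm = [[137, 28, 0, 0, 0],
      [40, 98, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0]]
acc, kappa, prec, rec = summarize(cm)
print(f"accuracy={acc:.4f} kappa={kappa:.4f}")  # prints accuracy=0.7756 kappa=0.5443
```

The error rate is simply 1 minus the accuracy, 68/303 or about 22.4%.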
It seems to have worked. The method seems fairly accurate, but I definitely wouldn't bet my life on it, since it misclassified about 22% of the instances (68 of 303). There were also a lot more variables to consider in this data set, and as far as machine learning goes, it did really well.