Monday, December 1, 2014

Review: Programming Collective Intelligence by Toby Segaran

Machine Learning is a hot topic these days.  O'Reilly just published a book called Thoughtful Machine Learning.  There is a .NET-focused ML book being published in January, and one or two Python ML books this spring.  R is everywhere.  There are more than a few start-ups creating machine learning APIs, in addition to the big guys like Google.  Microsoft has an Azure Machine Learning offering that looks very interesting.  And if you want to try a MOOC, a new Machine Learning class is put online for free every week.

Years and years ago, when I was studying Neural Networks in graduate school, we used a commercial add-on to MATLAB and there were very few books on the topic.  Now, rather than there being too little information, there seems to be too much.

Programming Collective Intelligence by Toby Segaran was published in 2007.  This is almost ancient history for a computer book, but it does not feel dated.  It says something that even though I received a free review ebook (full disclosure), I purchased a hard copy too.  I immediately recommended this book to a colleague.

There was a lot to like about this book.

First, the book uses Python, and I think this is a great choice.  The purpose of books like this is learning, and I can't think of a better language for teaching.  Even though I don't know Python very well, I was able to implement a .NET version of the decision tree chapter very quickly.  I never had to look up anything Python-related.  The code illustrated the points and never got in the way.

Second, the book rarely uses libraries.  In the real world, most projects that use machine learning algorithms will tend to use libraries and APIs.  But to understand what is going on and to get an intuition for the appropriate approach to a problem, you need to write the algorithms yourself.  To this end, Segaran usually implements the most basic version of the algorithm under discussion.  Each chapter includes exercises so that readers can make the implementation more sophisticated.

Third, there is a unifying theme.  I love this.  Some of the other books I've read on Machine Learning treat the subject as a grab bag of algorithms and present them like a series of articles.  But Programming Collective Intelligence writes about the algorithms only insofar as they help solve Web 2.0 problems.  This gives a coherence to the book that most others lack.

The only thing I didn't like about it was its example of a Neural Network (used in the search chapter, which is otherwise focused on web crawling).  It seemed like a poor fit, thrown in there just because people expect Neural Networks.  But this is a very minor annoyance.

In summary, this is a terrific book and I'd recommend it to anyone who wants to learn about Machine Learning and especially how to use these techniques to make their web sites better.
