Thoughtful Machine Learning. There is a .NET-focused ML book being published in January, and one or two Python ML books this spring. R is everywhere. There are more than a few start-ups creating machine learning APIs, in addition to the big guys like Google. Microsoft has an Azure Machine Learning offering that looks very interesting. And if you want to try a MOOC, a new Machine Learning class is put online for free every week.
Years and years ago when I was studying Neural Networks in graduate school, we used a commercial add-on to MATLAB and there were very few books on the topic. Now, rather than there being too little information, there seems to be too much.
Programming Collective Intelligence by Toby Segaran was published in 2009. This is almost ancient history for a computer book, but it does not feel dated. It says something that even though I received a free review ebook (full disclosure), I purchased a hard copy too. I immediately recommended this book to a colleague.
There was a lot to like about this book.
First, the book uses Python and I think this is a great choice. The purpose of books like this is learning and I can't think of a better language for teaching. Even though I don't know Python very well, I was able to implement a .NET version of the decision tree chapter very quickly. I never had to look up anything Python related. It illustrated the points and never got in the way.
Second, the book rarely uses libraries. In the real world, most projects that use machine learning algorithms will tend to use libraries and APIs. But to understand what is going on and to get an intuition for the appropriate approach to a problem, you need to write the algorithms yourself. To this end Segaran usually implements the most basic version of the algorithm under discussion. Each chapter includes exercises so that readers can make the implementation more sophisticated.
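To give a flavor of what "the most basic version" of an algorithm can look like, here is a minimal sketch of the core step of a decision tree: choosing the split with the highest information gain. This is my own illustration, not code from the book; the function names, the row layout (class label in the last column), and the sample data are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def best_split(rows):
    """Find the equality test (column == value) with the highest information gain.

    Each row is a tuple whose last element is the class label (an assumption
    of this sketch). Returns (gain, column, value); column and value are None
    when no split improves on the unsplit data.
    """
    label_col = len(rows[0]) - 1
    base = entropy([row[label_col] for row in rows])
    best = (0.0, None, None)
    for col in range(label_col):
        for value in {row[col] for row in rows}:
            matched = [r[label_col] for r in rows if r[col] == value]
            rest = [r[label_col] for r in rows if r[col] != value]
            if not matched or not rest:
                continue  # this test does not actually divide the rows
            p = len(matched) / len(rows)
            gain = base - p * entropy(matched) - (1 - p) * entropy(rest)
            if gain > best[0]:
                best = (gain, col, value)
    return best


# Hypothetical sample data: (referrer, country, plan chosen).
ROWS = [
    ("google", "US", "premium"),
    ("google", "UK", "premium"),
    ("direct", "US", "basic"),
    ("direct", "UK", "basic"),
]

gain, column, value = best_split(ROWS)
print(gain, column, value)
```

In this made-up data the referrer column separates the two plans perfectly, so the best split is on column 0. A full tree would apply `best_split` recursively to each side, which is the kind of extension the chapter exercises encourage.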
Third, there is a unifying theme. I love this. Some of the other books I've read on Machine Learning treat the subject as a grab bag of algorithms and then treat them like a series of articles. But Programming Collective Intelligence writes about the algorithms only insofar as they help solve Web 2.0 problems. This gives a coherence to the book that most others lack.
The only thing I didn't like about it was its example of a Neural Network (used in the search chapter otherwise focused on web crawling). It seemed like a poor fit and just thrown in there because people expect Neural Networks. But this is a very minor annoyance.
In summary, this is a terrific book and I'd recommend it to anyone who wants to learn about Machine Learning and especially how to use these techniques to make their web sites better.
Monday, December 1, 2014
Tuesday, November 11, 2014
Data Science, and scientific programming in general, favors the REPL approach. R, the lingua franca of data science, is basically an interactive language that allows scripting--as opposed to the other big data science language, Python, which is a scripting language that features a REPL. MATLAB (and I assume Octave, its open source imitator) is similar. Haskell and F#, academic languages used especially in math-heavy industries like Finance, also feature scripts and favor interactive exploration using a REPL. I suspect Julia is similar, but I have yet to take a look at it. All this is to point out that a Test Driven Approach in Data Science is a bit of a novelty. This is the approach that Thoughtful Machine Learning by Matthew Kirk takes.
And it's a pretty good idea. Personally I am a little more on the side of the REPL, but as a .NET developer primarily I don't have many options in that quarter. So, when I wanted to test out some machine learning algorithms, the first thing I did was create a Unit Test project and create a Test. I was excited then when the very next day I happened to see a Machine Learning book which is explicitly test driven!
I was somewhat disappointed. I thought that the initial introduction oversells TDD. I literally rolled my eyes while reading it several times. "Hypothesize, test, theorize could be called 'red-green-refactor' instead" claims the author on the 3rd page. Yeah... no. They could not be. There is nothing remotely similar about forming a hypothesis and creating a failing test; indeed, they are opposites. I would have thought the argument would have focused more on producing reproducible research or providing regressions when swapping algorithms. I don't recall these being touched on. A valuable part of the introduction was the list of risks in Machine Learning and a discussion of how to use automated tests to guard against these risks. It was good and I wish that this section had been expanded.
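To make the regression argument concrete, here is a minimal sketch of the kind of test I have in mind. It is written in Python rather than the book's Ruby, and it is not taken from the book: the toy 1-nearest-neighbor classifier, the data, and the 0.9 accuracy floor are all assumptions chosen for illustration.

```python
import unittest


def nearest_neighbor_predict(train, point):
    """Return the label of the training example closest to `point`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda example: squared_distance(example[0], point))
    return label


def accuracy(predict, train, held_out):
    """Fraction of held-out examples that `predict` labels correctly."""
    correct = sum(1 for features, label in held_out
                  if predict(train, features) == label)
    return correct / len(held_out)


# Hypothetical toy data: two well-separated clusters.
TRAIN = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
HELD_OUT = [((0.9, 1.1), "a"), ((5.2, 5.1), "b")]


class ClassifierRegressionTest(unittest.TestCase):
    # Assumed threshold; a real project would pick its own.
    MIN_ACCURACY = 0.9

    def test_accuracy_does_not_regress(self):
        # Swap in another predict function here and the test still applies.
        score = accuracy(nearest_neighbor_predict, TRAIN, HELD_OUT)
        self.assertGreaterEqual(score, self.MIN_ACCURACY)

    def test_predictions_are_reproducible(self):
        first = [nearest_neighbor_predict(TRAIN, f) for f, _ in HELD_OUT]
        second = [nearest_neighbor_predict(TRAIN, f) for f, _ in HELD_OUT]
        self.assertEqual(first, second)
```

Saved as a module, this can be run with `python -m unittest`. The point is that the test is written against `accuracy` and a threshold, not against one algorithm, so replacing the classifier leaves the safety net in place.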
Next, the author is a bit touchy on the subject of Ruby. Most Machine Learning books use Python or R, but the author favors Ruby because of the great automated test abstractions. Fair enough. I don't have a lot of experience in Python or Ruby, but I will say this: I could understand 99% of the Python code in Toby Segaran's Programming Collective Intelligence instantly, but found most of the code in Thoughtful Machine Learning to be gibberish sprinkled with pipes. Because of this I mostly read this book from the perspective, as the author puts it, of the CTO or Business Analyst.
After the introductory chapter on TDD, there is an overview of Machine Learning algorithms. I thought it was a bit superficial and suffered from introducing terms and jargon before explaining them. "The curse of dimensionality" was thrown around a few times but not discussed or defined until a sidebar a chapter or so later. I wish the introduction was more detailed and explained the criteria for picking the algorithms detailed in the book. Why were tree techniques omitted, for instance?
The rest of the book covers algorithms. The chapters follow the pattern: introduce a technique, describe a problem, write some tests to try to solve it, give a summary. I have to say that the example problems really didn't capture my imagination. When discussing k-Nearest Neighbor, the example is detecting beards and glasses in photographs. Ok... As a comparison, Segaran's book used eBay's web API as a source for price prediction. Which do you find more helpful?
In the Naive Bayes Classifier chapter, the clichéd spam detector is used. Which is fine, but none of this is very original. Why try to compete with Paul Graham's classic A Plan for Spam? Now, I understand that the primary goal of the book is teaching and that this is the canonical example, but it would be nice to see something more creative.
It was nice to see a popular Machine Learning book that covers Hidden Markov Models. I admit that I need to revisit this chapter a few more times, because I haven't fully internalized it, but this is a very interesting technique and I wish there were more popular treatments.
Finally, I read this book primarily on a Kindle (full disclosure: I got a free review eBook from the publisher). It isn't great. There is no Table of Contents for some reason and the formatting isn't as clear. When I opened the PDF, I was surprised to see how beautifully it was laid out.
In summary, I was a bit disappointed by the book but I am glad I looked at it. Get it if you are a Rubyist, really into the "TDD way", and want a fairly high-level view. Otherwise, I would recommend Toby Segaran's book. Alternatively, check out free online courses from Coursera and Udacity and edX on Machine Learning. (Or see the excellent resources on FastML.com.)
Wednesday, October 29, 2014
- Look around the internet for lists of study materials (though these were often wrong or outdated)
- Find several articles or book chapters
- Brainstorm projects and exercises that would apply to the skill
Once this was done, I had a giant study list that I could work off of, and gauge the speed of my learning. This worked well but it had the drawback that I spent almost as much time identifying study materials and thinking up exercises, as I did studying and practicing. Further, the material was uneven.
This is where a book like "Exam Ref 70-486: Developing ASP.NET MVC 4 Web Applications" by William Penberthy comes in handy. It already gathers together in one place discussions of each subject the exam covers, as well as exercises to work through, and links to further information.
The chapters are concise and relatively well-written, and since I have a good bit of experience with MVC, I could tell that the author knows his subject.
In the end I decided not to pursue taking the exam (my interests have taken me in different directions), but if I ever decide to take another Microsoft Exam, I'll definitely consider getting a book from this series.
Saturday, March 30, 2013
- O'Reilly: http://shop.oreilly.com/product/0636920027713.do
Monday, January 28, 2013
Programming ASP.NET MVC 4
I wanted to review this book (and full disclosure: I got the ebook for free from O’Reilly), because I’m looking to learn the new MVC 4 features, having used MVC 3 for a little over a year. This book was OK for that goal but there was a good bit that I already knew. Luckily, there is a section that addresses what is new and provides a link to the appropriate chapters. Those chapters were good, but tended to be rather basic, and not all that much better than what could be gleaned from the web. Having said this, I've also skimmed ASP.NET MVC 4 in Action and I think Programming MVC 4 covers the new material better and in a way better integrated into the text.
I think this book would be great for a beginner who is approaching ASP.NET MVC [#] for the first time. The introductory chapters are great and the reference application is nice (the feel is very similar to Pro ASP.NET MVC 3 Framework—which I used to learn MVC last year—though I haven’t read the latest version of that one). The book explains the ‘theory’ behind the framework well.
I liked the chapter “Client-Side Optimization Techniques”: it is basically the Cliff Notes of High Performance Web Sites, with specific applications to the .NET platform. The “Parallel, Asynchronous, and Real-Time Data Operations” chapter was surprisingly thorough, covering the subjects better than other MVC books I've looked into.
In summary, I think this book would be perfect for a beginner wanting to get a complete picture of MVC 4 and who already knows the .NET framework and C#. It would give them a good foundation for digging deeper into subjects that are important to them, though they would probably need to look elsewhere for that depth. For those of us looking for the new version 4 stuff only, it probably isn’t worth it unless you like reading and owning big giant books.
- O'Reilly: http://shop.oreilly.com/product/0636920024040.do
- Amazon: http://www.amazon.com/gp/product/1449320317
Tuesday, October 2, 2012
I found much of the book helpful, but it suffers from being uneven and unfocused. There are 14 chapters in the book but 80% of the content is in chapters 9 through 13. The first 8 chapters are introductory, repetitive, and short. It would have been better to have a single 10- or 15-page introduction than eight 2-page chapters. Even beginners could safely skip to chapter 9.
The best chapter in the book is chapter 12, “Design Patterns in jQuery.” It describes patterns by showing them in use in jQuery, and providing commentary on the actual source. If the entire book had been organized like chapter 12, this would be a five star review.
One nice thing about the book is the many references included. Almost everything was backed up by at least one, sometimes several, blog posts or articles by experts.
The eBook originally had many errors in its diagrams, but O’Reilly has updated it recently, and most of the problems seem to be fixed. I was disappointed with the Kindle version, which I read on my iPod; I had to switch to the PDF at times to understand the book. Usually O’Reilly creates excellently formatted eBooks, so that was surprising.
There was a lot of good in this book, especially in the later chapters, but I would have a hard time recommending it for purchase, when the content is available online in a searchable form for free.
- O'Reilly: http://shop.oreilly.com/product/0636920025832.do
Monday, September 17, 2012
Getting Started with D3
Its design is elegant and powerful, but it can be pretty daunting. The website includes a large number of beautiful visualizations (which I definitely recommend checking out), some bare-bones API documentation, as well as links to introductions and talks. Even so, it can be difficult to know where to begin to learn the library, and I was happy to see Dewar’s book devoted to it; I’ve been meaning to try out D3 for months but haven’t been able to work up the time or courage.
The book is structured around a half-dozen visualizations based upon NYC’s mass transit system, with each chapter describing the creation of increasingly complex output. Even the most complex visualizations are not treated in a great deal of depth, however. The source and input files are provided on the book’s website. Unfortunately, there are no examples with dynamic content.
The book might have needed closer editing. I noticed a few mistakes in the code of the printed book, and at times the writing wasn’t very good, in the style of an informal blog post. It would have been nice if the author had provided exercises.
I feel after reading the book that I know where to start with learning D3. In short: if you need a quick start guide to D3, the book will probably be helpful. If you're looking for something more from its 70 pages, you'll be disappointed.
(I read the Kindle version on my iPod, and the eBook conversion was excellent.)
- O'Reilly: http://shop.oreilly.com/product/0636920025429.do
- Amazon: http://www.amazon.com/Getting-Started-D3-Mike-Dewar/dp/1449328792