Matt Reed's Coding/Learning Blog

Monday, December 1, 2014

Review: Programming Collective Intelligence by Toby Segaran

Machine Learning is a hot topic these days. O'Reilly just published a book called Thoughtful Machine Learning. There is a .NET-focused ML book being published in January, and one or two Python ML books this spring. R is everywhere. There are more than a few start-ups creating machine learning APIs, in addition to the big guys like Google. Microsoft has an Azure Machine Learning offering that looks very interesting. And if you want to try a MOOC, a new Machine Learning class is put online for free every week.

Years and years ago, when I was studying Neural Networks in graduate school, we used a commercial add-on to MATLAB and there were very few books on the topic. Now, rather than there being too little information, there seems to be too much.

Programming Collective Intelligence by Toby Segaran was published in 2007. This is almost ancient history for a computer book, but it does not feel dated. It says something that even though I received a free review ebook (full disclosure), I purchased a hard copy too. I immediately recommended this book to a colleague.

There was a lot to like about this book.

First, the book uses Python and I think this is a great choice. The purpose of books like this is learning and I can't think of a better language for teaching. Even though I don't know Python very well, I was able to implement a .NET version of the decision tree chapter very quickly. I never had to look up anything Python related. It illustrated the points and never got in the way.

Second, the book rarely uses libraries. In the real world, most projects that use machine learning algorithms will tend to use libraries and APIs. But to understand what is going on and to get an intuition for the appropriate approach to a problem, you need to write the algorithms yourself. To this end Segaran usually implements the most basic version of the algorithm under discussion. Each chapter includes exercises so that readers can make the implementation more sophisticated.
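To give a flavor of what a "most basic version" can look like, here is a rough sketch of the core step of a decision tree: choosing the split with the highest information gain. This is my own illustration and not code from the book; the function names and the toy data are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def best_split(rows):
    """Return (gain, column, value) for the equality split with the highest information gain.

    Each row is a tuple whose last element is the class label.
    """
    base = entropy([row[-1] for row in rows])
    best = (0.0, None, None)
    for col in range(len(rows[0]) - 1):
        # sorted() only makes the result deterministic when gains tie
        for value in sorted({row[col] for row in rows}):
            left = [r[-1] for r in rows if r[col] == value]
            right = [r[-1] for r in rows if r[col] != value]
            if not left or not right:
                continue  # a split that separates nothing is useless
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            gain = base - weighted
            if gain > best[0]:
                best = (gain, col, value)
    return best


# Hypothetical toy data: (referrer, read_faq, plan)
rows = [
    ("google", "yes", "premium"),
    ("google", "no", "basic"),
    ("direct", "yes", "premium"),
    ("direct", "no", "basic"),
]
print(best_split(rows))  # (1.0, 1, 'no')
```

Here the second column predicts the plan perfectly, so splitting on it removes all of the uncertainty. A full tree would apply the same step recursively to each half of the data.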

Third, there is a unifying theme. I love this. Some of the other books I've read on Machine Learning treat the subject as a grab bag of algorithms and then treat them like a series of articles. But Programming Collective Intelligence writes about the algorithms only insofar as they help solve Web 2.0 problems. This gives a coherence to the book that most others lack.

The only thing I didn't like about it was its example of a Neural Network (used in the search chapter, which is otherwise focused on web crawling). It seemed like a poor fit, thrown in there just because people expect Neural Networks. But this is a very minor annoyance.

In summary, this is a terrific book and I'd recommend it to anyone who wants to learn about Machine Learning and especially how to use these techniques to make their web sites better.

Tuesday, November 11, 2014

Review: Thoughtful Machine Learning by Matthew Kirk

There seems to be a little tension these days between REPL people and Unit Test people. Some users of languages which feature only a Read-Eval-Print Loop (REPL) claim that unit tests are heavyweight and unnecessary. Others, I suppose, view any development workflow other than deliberate Red-Green-Refactor as deviant, unprofessional, and irresponsible. These are extreme cases. Clearly, using a REPL and writing unit tests are not mutually exclusive, but we all have our tendencies toward one side or the other.

Data Science and scientific programming, in general, favor the REPL approach. R, the lingua franca of data science, is basically an interactive language that allows scripting, as opposed to the other big data science language, Python, which is a scripting language that features a REPL. MATLAB (and I assume Octave, its open source imitator) is similar. Haskell and F#, academic languages used especially in math-heavy industries like Finance, also feature scripts and favor interactive exploration using a REPL. I suspect Julia is similar, but I have yet to take a look at it. All this is to point out that a Test Driven Approach in Data Science is a bit of a novelty. This is the approach that Thoughtful Machine Learning by Matthew Kirk takes.

And it's a pretty good idea. Personally I am a little more on the side of the REPL, but as primarily a .NET developer I don't have many options in that quarter. So, when I wanted to test out some machine learning algorithms, the first thing I did was create a Unit Test project and write a test. I was excited, then, when the very next day I happened to see a Machine Learning book which is explicitly test driven!

I was somewhat disappointed. I thought that the initial introduction oversells TDD. I literally rolled my eyes while reading it several times. "Hypothesize, test, theorize could be called 'red-green-refactor' instead" claims the author on the 3rd page. Yeah... no. They could not be. There is nothing remotely similar about forming a hypothesis and creating a failing test; indeed, they are opposites. I would have thought the argument would have focused more on producing reproducible research or providing regressions when swapping algorithms. I don't recall these being touched on. A valuable part of the introduction was the list of risks in Machine Learning and a discussion of how to use automated tests to guard against these risks. It was good and I wish that this section was expanded.
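To make the idea of tests guarding against Machine Learning risks concrete, here is a minimal sketch in Python (the book itself uses Ruby, and this is not its code). It assumes a deliberately tiny "model", a single learned threshold, and synthetic data; every name in it is hypothetical. One test checks that the model beats a majority-class baseline on held-out data, and the other checks that training is reproducible.

```python
import random
import unittest


def make_data(n, seed):
    """Hypothetical synthetic data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:
            label = 1 - label  # flip the label to simulate noise
        data.append((x, label))
    return data


def train_threshold(train):
    """Pick the cut point that minimises training error (a deliberately tiny 'model')."""
    def errors(t):
        return sum(int(x > t) != y for x, y in train)
    return min(sorted(x for x, _ in train), key=errors)


def accuracy(threshold, data):
    """Fraction of examples whose label matches the threshold rule."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)


class ThresholdModelTest(unittest.TestCase):
    def setUp(self):
        self.train = make_data(200, seed=1)
        self.held_out = make_data(200, seed=2)
        self.model = train_threshold(self.train)

    def test_beats_majority_baseline_on_held_out_data(self):
        # Guards against a model that has learned nothing useful.
        positives = sum(y for _, y in self.held_out)
        majority = max(positives, len(self.held_out) - positives) / len(self.held_out)
        self.assertGreater(accuracy(self.model, self.held_out), majority)

    def test_training_is_reproducible(self):
        # Guards against hidden randomness changing results between runs.
        self.assertEqual(self.model, train_threshold(make_data(200, seed=1)))


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThresholdModelTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same two tests would double as a regression check when swapping in a different algorithm, since only `train_threshold` and `accuracy` would need to change.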

Next, the author is a bit touchy on the subject of Ruby. Most Machine Learning books use Python or R, but the author favors Ruby because of the great automated test abstractions. Fair enough. I don't have a lot of experience in Python or Ruby, but I will say this: I could understand 99% of the Python code in Toby Segaran's Programming Collective Intelligence instantly, but found most of the code in Thoughtful Machine Learning to be gibberish sprinkled with pipes. Because of this I mostly read this book from the perspective, as the author puts it, of the CTO or Business Analyst.

After the introductory chapter on TDD, there is an overview of Machine Learning algorithms. I thought it was a bit superficial and suffered from introducing terms and jargon before explaining them. "The curse of dimensionality" was thrown around a few times but not discussed or defined until a sidebar a chapter or so later. I wish the introduction was more detailed and explained the criteria for picking the algorithms detailed in the book. Why were tree techniques omitted, for instance?

The rest of the book covers algorithms. The chapters follow the pattern: introduce a technique, describe a problem, write some tests to try to solve it, give a summary. I have to say that the example problems really didn't capture my imagination. When discussing k-Nearest Neighbor, the example is detecting beards and glasses in photographs. Ok... As a comparison, Segaran's book used eBay's web API as a source for price prediction. Which do you find more helpful?
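For anyone who hasn't met k-Nearest Neighbor, the technique itself fits in a few lines. This is a minimal Python sketch of the general idea, not code from either book, and the price-band data is invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep the closest k.
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (square metres, bedrooms) -> price band
train = [
    ((50, 1), "low"),
    ((55, 1), "low"),
    ((60, 2), "low"),
    ((120, 3), "high"),
    ((130, 4), "high"),
    ((140, 4), "high"),
]
print(knn_predict(train, (125, 3)))  # high
```

In practice the features would need to be scaled so that one of them (here, square metres) does not dominate the distance.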

In the Naive Bayes Classifier chapter, the cliche Spam detector is used. Which is fine, but this is all not very original. Why try to compete with Paul Graham's classic A Plan for Spam? Now, I understand that the primary goal of the book is teaching and that this is the canonical example, but it would be nice to see something more creative.

It was nice to see a popular Machine Learning book that covers Hidden Markov Models. I admit that I need to revisit this chapter a few more times, because I haven't fully internalized it, but this is a very interesting technique and I wish there were more popular treatments.

Finally, I read this book primarily on a Kindle (full-disclosure: I got a free review eBook from the publisher). It isn't great. There is no Table of Contents for some reason and the formatting isn't as clear. When I opened the PDF, I was surprised to see how beautifully it was laid out.

In summary, I was a bit disappointed by the book but I am glad I looked at it. Get it if you are a Rubyist, really into the "TDD way", and want a fairly high-level view. Otherwise, I would recommend Toby Segaran's book. Alternatively, check out the free online courses on Machine Learning from Coursera, Udacity, and edX. (Or see the excellent resources on FastML.com.)

Wednesday, October 29, 2014

Review: "Exam Ref 70-486: Developing ASP.NET MVC 4 Web Applications"

Last year I took and passed Microsoft's Exam 70-480 (Programming HTML5/CSS3/JavaScript). I perhaps over-studied, living and breathing JavaScript for a couple of months. This was prior to any published books on the test, so I had to find my own way. In any case, I found a process that worked well. For each skill on the Exam's website, I did the following.

Look around the internet for lists of study materials (though these were often wrong or outdated).
Find several articles or book chapters.
Brainstorm projects and exercises that would apply to the skill.

Once this was done, I had a giant study list that I could work off of, and gauge the speed of my learning. This worked well but it had the drawback that I spent almost as much time identifying study materials and thinking up exercises, as I did studying and practicing. Further, the material was uneven.

This is where a book like "Exam Ref 70-486: Developing ASP.NET MVC 4 Web Applications" by William Penberthy comes in handy. It gathers together in one place discussions of each subject the exam covers, as well as exercises to work through and links to further information.

The chapters are concise and relatively well-written, and since I have a good bit of experience with MVC, I could tell that the author knows his subject.

In the end I decided not to pursue taking the exam (my interests have taken me in different directions), but if I ever decide to take another Microsoft Exam, I'll definitely consider getting a book from this series.

Product Information:

Amazon: http://www.amazon.com/Exam-70-486-Developing-ASP-NET-Applications/dp/0735677220

Saturday, March 30, 2013

Review: "JavaScript Enlightenment"

It’s hard to keep up with O’Reilly’s output of pancake-sized JavaScript books. On the one hand, this is a refreshing trend in an industry where publishers release expensive, thousand-page books that will be obsolete within months. On the other hand, with so many small books overlapping in content, goals and target audiences, it is hard to know where to begin. JavaScript Enlightenment (<insert your own Voltaire joke here>) is one of the most recent of these short O’Reilly JavaScript offerings, and the second I've read that originated as a free web book (the other was "Learning JavaScript Design Patterns" by Addy Osmani).

This is a short book, with short sections, that covers the basics of the JavaScript language. Each section of the book follows the same pattern: it starts with a description of a concept, follows with a code snippet linked to jsfiddle, and ends with a summary. This pattern is followed so strictly that it begins to get tiresome. Phrases like “What I really need you to grok…” are continually repeated, because there are only so many ways to write a summary paragraph. The writing style is overly informal and often imprecise. No doubt this was intentional, as a way to make newcomers more comfortable. I think this was a mistake; presumably newcomers to writing code in JavaScript are not newcomers to reading books in English.

The short chapters and linked code examples, however, will probably be very helpful to beginning JavaScript programmers. And the code examples are fine, although not particularly interesting. For instance, a recurring object discussed is the ‘cody’ object, an object about the author, with properties for living, age, and gender, to which the author adds a getGender() method. Um, ok. But why on earth would anyone write code like that in JavaScript? Surely a better example could be found.

I decided to read this book because of the title, but I probably should have read the description more carefully before getting it. As an intermediate to advanced JavaScript developer, I wasn't the target audience. The title is mostly just a gimmick, and ‘Some Annotated JavaScript Snippets for the Learner’ would probably be more fitting.

This is a print version of a free online book. I read the eBook, which I received for free from the publisher, with the Kindle app on an iPod. It still retains the feel of a free online book. If you are new to JavaScript, you might want to read this book online, but I doubt you’d want to buy a hard copy, unless maybe to give thank you money to the author and publisher. In any case, there are several high quality free JavaScript language books online (such as Eloquent JavaScript) that may serve the beginner’s purpose even better, not to mention websites like Codecademy.

If you already know JavaScript reasonably well, you probably won’t get much out of either the print or the free version.

Monday, January 28, 2013

Review: Programming ASP.NET MVC 4 by Jess Chadwick, Todd Snyder, and Hrusikesh Panda

This one took me a long time to get through and I didn't even read it all that deeply. Unlike the other O’Reilly books I've read recently, this one is HUGE. It seems like O’Reilly has two types of books: giant 500 page reference tomes, and focused technology pancakes between 50 and 150 pages. I think I prefer the pancakes. They’re easier to read and they are more likely to address what you purchase them for. On the other hand, there can be a lot of repetition: if you've read their JavaScript books you know what I mean.

I wanted to review this book (and full disclosure: I got the ebook for free from O’Reilly), because I’m looking to learn the new MVC 4 features, having used MVC 3 for a little over a year. This book was OK for that goal but there was a good bit that I already knew. Luckily, there is a section that addresses what is new and provides a link to the appropriate chapters. Those chapters were good, but tended to be rather basic, and not all that much better than what could be gleaned from the web. Having said this, I've also skimmed ASP.NET MVC 4 in Action and I think Programming MVC 4 covers the new material better and in a way better integrated into the text.

I think this book would be great for a beginner who is approaching ASP.NET MVC for the first time. The introductory chapters are great and the reference application is nice (the feel is very similar to Pro ASP.NET MVC 3 Framework, which I used to learn MVC last year, though I haven’t read the latest version of that one). The book explains the ‘theory’ behind the framework well.

I liked the chapter “Client-Side Optimization Techniques”: it is basically the CliffsNotes of High Performance Web Sites, with specific applications to the .NET platform. The “Parallel, Asynchronous, and Real-Time Data Operations” chapter was surprisingly thorough, covering the subjects better than other MVC books I've looked into.

In summary, I think this book would be perfect for a beginner wanting to get a complete picture of MVC 4 and who already knows the .NET framework and C#. It would give them a good foundation for digging deeper into subjects that are important to them, though they would probably need to look elsewhere for that depth. For those of us looking for the new version 4 stuff only, it probably isn’t worth it unless you like reading and owning big giant books.

Product Information:

O'Reilly: http://shop.oreilly.com/product/0636920024040.do
Amazon: http://www.amazon.com/gp/product/1449320317

Tuesday, October 2, 2012

Review: "Learning JavaScript Design Patterns" by Addy Osmani

Since I started re-learning JavaScript after an absence of 7 or 8 years, I've found all sorts of great free resources online and purchased a ton of great books (JavaScript: The Good Parts, Maintainable JavaScript, Secrets of the JavaScript Ninja (MEAP), JavaScript Web Applications, and Professional JavaScript for Web Developers); I've read two of them so far. This new book, Learning JavaScript Design Patterns by Addy Osmani, was released both freely via Creative Commons on the internet and through O’Reilly, and I decided to add it to my stack. (Full disclosure: I got the ebook for free from O’Reilly.)

I found much of the book helpful, but it suffers from being uneven and unfocused. There are 14 chapters in the book, but 80% of the content is in chapters 9 through 13. The first 8 chapters are introductory, repetitive, and short. It would have been better to have a single 10- or 15-page introduction than eight 2-page chapters. Even beginners could safely skip to chapter 9.

Unlike other pattern books, which are systematically organized, this one has a structure that is hard to follow. In the introductory chapters a table is given of the patterns described in the rest of the book, but it doesn't include page references. Sometimes patterns are mentioned before they are actually described. The GOF patterns described rely, as you would expect, on Design Patterns by Gamma et al., but I don’t think the author described well how (or if) these patterns fit into JavaScript other than showing an implementation. The chapter on User Interface patterns was good and relies heavily on Martin Fowler’s work, but it seemed out of place where it was located. The jQuery Plug-in Design Patterns chapter was excellent and new, at least to me.

The best chapter in the book is chapter 12, “Design Patterns in jQuery.” It describes patterns by showing them in use in jQuery, and providing commentary on the actual source. If the entire book had been organized like chapter 12, this would be a five star review.

One nice thing about the book is the many references included. Almost everything was backed up by at least one, sometimes several, blog posts or articles by experts.

The eBook originally had many errors in its diagrams, but O’Reilly has updated it recently, and most of the problems seem to be fixed. I was disappointed with the Kindle version, which I read on my iPod; I had to switch to the PDF at times to understand the book. Usually O’Reilly creates excellently formatted eBooks, so that was surprising.

There was a lot of good in this book, especially in the later chapters, but I would have a hard time recommending it for purchase, when the content is available online in a searchable form for free.

Monday, September 17, 2012

Review: “Getting Started with D3” by Mike Dewar

D3 (Data-Driven Documents) is an interesting library for creating browser-based visualizations. Unlike most JavaScript libraries I’ve looked at, D3 provides tools to declaratively transform data into graphics formats, rather than providing a pre-canned monolithic solution to various charting problems. The successor of Protovis, D3 was specifically created to work with dynamic data.

Its design is elegant and powerful, but it can be pretty daunting. The website includes a large number of beautiful visualizations (which I definitely recommend checking out), some bare-bones API documentation, as well as links to introductions and talks. Even so, it can be difficult to know where to begin to learn the library, and I was happy to see Dewar’s book devoted to it; I’ve been meaning to try out D3 for months but haven’t been able to work up the time or courage.

I like the book’s format: focused, concise. With such a huge topic it would be possible to write a formidable, 600-page book that only a few people would actually read. Especially with all the resources available on the internet, what readers like me need is a push in the right direction. D3 relies on open standards and modern web browsers, but the book doesn't waste time explaining JavaScript, CSS, or SVG. There are recommended books in the preface, but nothing more. D3 supports JSON, XML and CSV as data sources, but Dewar (wisely, I think) focused solely on JSON.

The book is structured around a half-dozen visualizations based upon NYC’s mass transit system, with each chapter describing the creation of increasingly complex output. Even the most complex visualizations are not treated in a great deal of depth, however. The source and input files are provided on the book’s website. Unfortunately, there are no examples with dynamic content.

The book could have used closer editing. I noticed a few mistakes in the code of the printed book, and at times the writing wasn’t very good, slipping into the style of an informal blog post. It would have been nice if the author had provided exercises.

I feel after reading the book that I know where to start with learning D3. In short: if you need a quick start guide to D3, the book will probably be helpful. If you're looking for something more from its 70 pages, you'll be disappointed.

(I read the Kindle version on my iPod, and the eBook conversion was excellent.)

Product Information:

O'Reilly: http://shop.oreilly.com/product/0636920025429.do
Amazon: http://www.amazon.com/Getting-Started-D3-Mike-Dewar/dp/1449328792