Machine Learning Design Patterns

The Seattle Google Developers Group hosted a talk by the authors of Machine Learning Design Patterns. Though ML is not yet at a point where it’s “just another library”, there’s a body of engineering knowledge building up around ML as people get more experience using models as parts of larger software systems. This book looks to be a nice addition to that trend.

The talk highlighted three patterns out of the thirty in the book.

Imbalanced training data

Imbalanced training data is typical in anomaly detection. Here are 3 strategies:

Downsample the majority class
Upsample the minority class (see SMOTE)
Weight the classes so that errors on the minority class count for more
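
To make the three strategies concrete, here is a minimal sketch in plain Python. The function names and the 95/5 toy labels are my own illustration, not code from the talk or the book. The upsampling here is simple random duplication, not SMOTE (which synthesizes new minority points by interpolation), and the weight formula is the common "balanced" heuristic n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def _indices_by_class(labels):
    """Group example indices by their class label."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class


def downsample_majority(labels, seed=0):
    """Keep a random sample of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def upsample_minority(labels, seed=0):
    """Duplicate random examples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _indices_by_class(labels)
    n_max = max(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(idx)
        kept.extend(rng.choices(idx, k=n_max - len(idx)))
    return sorted(kept)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical toy data: 95 normal examples (0) and 5 anomalies (1).
labels = [0] * 95 + [1] * 5
down_idx = downsample_majority(labels)   # 5 examples of each class
up_idx = upsample_minority(labels)       # 95 examples of each class
weights = balanced_class_weights(labels)
```

The returned index lists would be used to select rows from the feature matrix, and the weights dictionary is the kind of value many training APIs accept as per-class weights.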

Continuous Model Evaluation

To monitor models for concept or data drift, we need a way to compare model predictions with ground truth. However, true labels for those predictions may only become available after some delay.
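
A rough sketch of the idea, assuming predictions are logged with a request ID and a day index, and that true labels arrive later keyed by the same ID. The record layout, the function names, and the 0.8 threshold are all hypothetical choices for illustration, not something shown in the talk.

```python
from collections import defaultdict


def accuracy_by_day(predictions, ground_truth):
    """Join logged predictions with labels that have arrived so far.

    predictions: iterable of (request_id, day, predicted_label)
    ground_truth: dict mapping request_id -> true label (may be incomplete)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for request_id, day, predicted in predictions:
        if request_id not in ground_truth:
            continue  # the true label has not arrived yet
        totals[day] += 1
        hits[day] += int(predicted == ground_truth[request_id])
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def days_below(accuracy, threshold):
    """Return the days whose accuracy fell below the threshold."""
    return [day for day, acc in accuracy.items() if acc < threshold]


# Hypothetical log: request "e" has no label yet, so day 3 is not scored.
predictions = [
    ("a", 1, 1), ("b", 1, 0),
    ("c", 2, 1), ("d", 2, 1),
    ("e", 3, 0),
]
ground_truth = {"a": 1, "b": 0, "c": 0, "d": 1}

daily = accuracy_by_day(predictions, ground_truth)
alerts = days_below(daily, threshold=0.8)
```

Because labels lag, the most recent days are either missing or based on few examples, so any alerting built on this would need to account for how complete each day's labels are.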

Bridged Schema

Data gets better over time. A categorical feature might gain finer-grained categories, e.g. card -> credit card, debit card, gift card. Older examples then carry only the coarse value, while newer examples carry the detailed one.

Solution:

Apply data augmentation to the old examples, replacing the one-hot encoding of the coarse category with floats taken from the estimated distribution over the finer-grained categories.
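
Here is a small sketch of that encoding, assuming the distribution is estimated from the relative frequencies of the new categories in the newer data. The function names and the 6/3/1 counts are made up for illustration.

```python
from collections import Counter

# The finer-grained categories from the example above.
NEW_CATEGORIES = ["credit card", "debit card", "gift card"]


def estimate_distribution(new_values):
    """Estimate relative frequencies of the new categories from newer data."""
    counts = Counter(new_values)
    total = sum(counts[c] for c in NEW_CATEGORIES)
    return [counts[c] / total for c in NEW_CATEGORIES]


def encode(value, distribution):
    """One-hot encode new values; spread the old coarse value over the distribution."""
    if value == "card":
        return list(distribution)
    return [1.0 if value == c else 0.0 for c in NEW_CATEGORIES]


# Hypothetical newer data: 6 credit card, 3 debit card, 1 gift card.
new_values = ["credit card"] * 6 + ["debit card"] * 3 + ["gift card"]
distribution = estimate_distribution(new_values)

old_example = encode("card", distribution)        # floats that sum to 1
new_example = encode("debit card", distribution)  # ordinary one-hot vector
```

Both old and new examples end up with the same three-column input, so one model can train on the combined data.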

Machine Learning Design Patterns by Valliappa Lakshmanan, Sara Robinson, Michael Munn
GitHub: ml-design-patterns repo
During Q&A, Sara Robinson mentioned that she had written a blog post about the writing process: Writing a technical book: from idea to print.
