Machine Learning Design Patterns

The Seattle Google Developers Group hosted a talk by the authors of Machine Learning Design Patterns. Though ML is not yet at a point where it’s “just another library”, a body of engineering knowledge is building up around it as people gain experience using models as components of larger software systems. This book looks like a useful addition to that body of knowledge.

The talk highlighted three of the thirty patterns in the book.

Imbalanced training data

Imbalanced training data is typical in anomaly detection, where the anomalies of interest are rare. Here are three strategies:

  • Downsample the majority class
  • Upsample the minority class (for example, with SMOTE)
  • Weight the classes, so that errors on the minority class cost more
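The three strategies can be sketched with NumPy as below. This is a minimal illustration assuming binary labels and in-memory arrays; the function names are hypothetical, and `smote_like` is a simplified SMOTE-style interpolation written for this sketch, not the reference SMOTE implementation. The class-weight formula is the same "balanced" heuristic scikit-learn uses, n_samples / (n_classes * class_count).

```python
import numpy as np

def downsample_majority(X, y, majority_label, rng):
    """Randomly drop majority-class rows so both classes have the minority count (binary labels)."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([keep, min_idx])
    rng.shuffle(idx)
    return X[idx], y[idx]

def upsample_minority(X, y, minority_label, rng):
    """Randomly duplicate minority-class rows so both classes have the majority count."""
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    idx = np.concatenate([maj_idx, min_idx, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

def smote_like(X_min, n_new, k, rng):
    """SMOTE-style synthesis (simplified): place each new point on the segment
    between a random minority example and one of its k nearest minority neighbours."""
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy data: 90 "normal" rows (label 0) and 10 "anomalies" (label 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

X_down, y_down = downsample_majority(X, y, majority_label=0, rng=rng)
X_up, y_up = upsample_minority(X, y, minority_label=1, rng=rng)
X_syn = smote_like(X[y == 1], n_new=80, k=3, rng=rng)
weights = balanced_class_weights(y)  # minority rows weigh 9x more than majority rows
```

Resampling is normally applied only to the training split; evaluation data should keep its natural class ratio so that metrics reflect what the model will see in production.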

Continuous Model Evaluation

To monitor models for concept or data drift, we need a way to compare the model's predictions with ground truth. But it may be necessary to wait some time before the true labels for those predictions become available.
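One way to handle delayed labels is to log every prediction with an ID at serving time, join ground truth to it whenever the label arrives, and evaluate only the predictions that have been labelled so far. The sketch below assumes an in-memory store, integer class labels, and accuracy as the metric; `ContinuousEvaluator` and its methods are hypothetical names, and a real system would typically keep these records in a database or data warehouse.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class PredictionRecord:
    prediction: int
    label: Optional[int] = None  # ground truth, filled in once it arrives

class ContinuousEvaluator:
    """Hypothetical helper: log predictions at serving time, join late labels, score."""

    def __init__(self) -> None:
        self.records: Dict[str, PredictionRecord] = {}

    def log_prediction(self, request_id: str, prediction: int) -> None:
        self.records[request_id] = PredictionRecord(prediction)

    def add_label(self, request_id: str, label: int) -> None:
        # Ground truth may arrive hours or weeks later; join it on the request ID.
        record = self.records.get(request_id)
        if record is not None:
            record.label = label

    def _labelled(self) -> List[PredictionRecord]:
        return [r for r in self.records.values() if r.label is not None]

    def coverage(self) -> float:
        """Fraction of logged predictions that already have a ground-truth label."""
        return len(self._labelled()) / len(self.records) if self.records else 0.0

    def accuracy(self) -> Optional[float]:
        """Accuracy over labelled predictions only; None if no labels have arrived."""
        labelled = self._labelled()
        if not labelled:
            return None
        return sum(r.prediction == r.label for r in labelled) / len(labelled)

    def degraded(self, baseline: float, tolerance: float = 0.05, min_labels: int = 100) -> bool:
        """Flag possible drift when accuracy drops more than `tolerance` below `baseline`."""
        acc = self.accuracy()
        if acc is None or len(self._labelled()) < min_labels:
            return False  # too little ground truth to judge yet
        return acc < baseline - tolerance

ev = ContinuousEvaluator()
for i, p in enumerate([1, 0, 1, 1]):
    ev.log_prediction(f"req-{i}", p)
ev.add_label("req-0", 1)  # prediction was correct
ev.add_label("req-1", 1)  # prediction was wrong
```

The `min_labels` guard reflects the delay described above: until enough labels have come in, an accuracy estimate is too noisy to act on.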

Bridged Schema

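Data gets better over time. A categorical feature might gain finer-grained categories, e.g. card -> credit card, debit card, gift card. Older training examples only carry the coarse value, so they no longer fit the new schema directly.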

Solution:

  • Data augmentation on old examples: replace their one-hot values with floats taken from the estimated distribution of the new, finer-grained categories
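A minimal sketch of that augmentation, assuming a hypothetical `payment_type` feature whose old value "card" was split into three subtypes. The distribution of subtypes is estimated from recent data recorded under the new schema, and each old "card" example gets those frequencies as its float encoding instead of a single one-hot value. All names and category lists here are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical new schema for a "payment_type" feature (old schema: "cash", "card").
NEW_CATEGORIES = ["cash", "credit_card", "debit_card", "gift_card"]
CARD_SUBTYPES = ["credit_card", "debit_card", "gift_card"]

def estimate_card_distribution(new_values: List[str]) -> Dict[str, float]:
    """Estimate P(subtype | card) from recent data collected under the new schema.
    Assumes at least one card payment is present."""
    counts = Counter(v for v in new_values if v in CARD_SUBTYPES)
    total = sum(counts.values())
    return {c: counts[c] / total for c in CARD_SUBTYPES}

def encode_new(value: str) -> List[float]:
    """Ordinary one-hot encoding for examples recorded with the new schema."""
    return [1.0 if c == value else 0.0 for c in NEW_CATEGORIES]

def bridge_old(value: str, card_dist: Dict[str, float]) -> List[float]:
    """Encode an old-schema value in the new schema's feature space."""
    if value == "card":
        # The coarse value is spread over the finer categories as floats.
        return [card_dist.get(c, 0.0) for c in NEW_CATEGORIES]
    return encode_new(value)  # values that did not change map one-to-one

recent = ["credit_card"] * 6 + ["debit_card"] * 3 + ["gift_card"] + ["cash"] * 5
dist = estimate_card_distribution(recent)
old_rows = ["card", "cash", "card"]
bridged = [bridge_old(v, dist) for v in old_rows]
# Each old "card" row becomes roughly [0.0, 0.6, 0.3, 0.1]
```

Because bridged and newly recorded examples share one feature layout, the model can be trained on both together.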

More