Machine Learning Design Patterns
The Seattle Google Developers Group hosted a talk by the authors of Machine Learning Design Patterns. Though ML is not yet at a point where it’s “just another library”, there’s a body of engineering knowledge building up around ML as people get more experience using models as parts of larger software systems. This book looks to be a nice addition to that trend.
The talk highlights three patterns out of the thirty in the book.
Imbalanced training data
Imbalanced training data is typical in anomaly detection. Here are three strategies for handling it:
- Downsample the majority class
- Upsample the minority class (see SMOTE)
- Weight the classes
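The first two strategies can be sketched in a few lines of plain Python (a toy illustration, not from the talk; SMOTE goes further by synthesizing new minority examples rather than repeating existing ones):

```python
import random

random.seed(0)

# Toy imbalanced dataset: label 1 (the anomaly) is the rare minority class.
majority = [(x, 0) for x in range(95)]
minority = [(x, 1) for x in range(5)]

# Strategy 1: downsample the majority class to the minority's size.
downsampled = random.sample(majority, len(minority)) + minority

# Strategy 2: upsample the minority class by sampling with replacement
# (SMOTE instead interpolates between neighboring minority points).
upsampled = majority + random.choices(minority, k=len(majority))

print(len(downsampled))  # 10 examples, balanced 5/5
print(len(upsampled))    # 190 examples, balanced 95/95
```

The third strategy keeps all the data and instead scales each class's contribution to the loss (e.g. a `class_weight` parameter in many libraries) so mistakes on the rare class cost more.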
Continuous Model Evaluation
To monitor models for concept/data drift, we need a way to compare model predictions with ground truth. But it may take some time before the true labels for those predictions become available.
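One common shape for this (a hypothetical sketch, not code from the talk): log predictions keyed by example id at serving time, then join the ground-truth labels back in as they trickle in, and evaluate only the examples labeled so far.

```python
# Predictions logged at serving time: example id -> predicted label.
predictions = {"tx1": 1, "tx2": 0, "tx3": 1, "tx4": 0}

# Ground truth arrives later (e.g. a chargeback confirms fraud);
# tx2 and tx4 are still unlabeled at evaluation time.
ground_truth = {"tx1": 1, "tx3": 0}

# Join on example id and score only the examples with known labels.
labeled = {k: (predictions[k], ground_truth[k]) for k in ground_truth}
accuracy = sum(pred == truth for pred, truth in labeled.values()) / len(labeled)
print(accuracy)  # 0.5 on the labeled subset so far
```

Tracking this rolling accuracy over time is what surfaces drift: a steady decline suggests the world (or the input data) has shifted since training.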
Bridged Schema
Data gets better over time. A categorical feature might gain finer-grained categories, e.g. card -> credit card, debit card, gift card.
Solution:
- Augment the old examples, replacing their one-hot values with floats drawn from the estimated distribution of the new, finer-grained categories
More
- Machine Learning Design Patterns by Valliappa Lakshmanan, Sara Robinson, Michael Munn
- Github: ml-design-patterns repo
- During Q&A, Sara Robinson mentioned that she had written a blog post about the writing process: Writing a technical book: from idea to print.