# Metrics

Note: To render the math equations on this page, enable `pymdownx.arithmatex` in your `mkdocs.yml` and include MathJax. See the user guide for details.
mmm-eval provides a suite of metrics to evaluate MMM performance. This guide explains each metric and how to interpret the results.
Note that these metrics do not claim to be entirely comprehensive, but instead aim to provide an overall view of MMM performance across several key dimensions.

## Overview
mmm-eval calculates several key metrics across different validation tests:

### Accuracy Metrics
- MAPE (Mean Absolute Percentage Error): Average percentage error between predictions and actual values
- SMAPE (Symmetric Mean Absolute Percentage Error): Symmetric version of MAPE that treats over- and underestimation equally
- R-squared: Proportion of variance explained by the model

### Stability Metrics
- Parameter Change: Percentage change in model parameters when the model is refreshed on updated data
- Channel Stability: Stability of media channel coefficients
- Intercept Stability: Stability of baseline parameters

## Metric Definitions

### MAPE (Mean Absolute Percentage Error)
Interpretation:

- Lower is better: 0% = perfect predictions
- Scale: Expressed as a percentage, e.g. 15.0 rather than 0.15
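
For reference, the standard definition of MAPE, with A_t the actual value and F_t the prediction at time t over n periods, is:

$$
\text{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|
$$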

### SMAPE (Symmetric Mean Absolute Percentage Error)
Interpretation:

- Lower is better: 0% = perfect predictions
- Scale: Expressed as a percentage, e.g. 15.0 rather than 0.15
- Symmetric: Treats over- and underestimation equally (unlike MAPE)
- Robust: Less sensitive to extreme values and zero actual values

Advantages over MAPE:

- Symmetry: 10% overestimation and 10% underestimation give the same SMAPE value
- Zero handling: Better handling of zero or near-zero actual values
- Bounded: Upper bound of 200%, vs. unbounded MAPE
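
SMAPE has several variants in the literature; the commonly used form consistent with the 200% upper bound noted above is:

$$
\text{SMAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{\left| F_t - A_t \right|}{\left( \left| A_t \right| + \left| F_t \right| \right) / 2}
$$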

### R-squared (Coefficient of Determination)
Interpretation:

- Range: Typically 0 to 1 (higher is better); it can be negative when the model performs worse than simply predicting the mean
- Scale: 1 = perfect fit, 0 = no predictive power
- Benchmark: > 0.8 is generally good
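
The standard definition, where \bar{A} is the mean of the actual values, is:

$$
R^2 = 1 - \frac{\sum_{t=1}^{n} \left( A_t - F_t \right)^2}{\sum_{t=1}^{n} \left( A_t - \bar{A} \right)^2}
$$

On out-of-sample data the numerator can exceed the denominator, which is how R-squared ends up negative when the model underperforms the mean baseline (see Troubleshooting below).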

## Test-Specific Metrics

### Holdout Accuracy Test Metrics
Metrics calculated on out-of-sample predictions using train/test splits.
- MAPE: Overall prediction accuracy
- SMAPE: Symmetric prediction accuracy
- R-squared: Model fit quality

### In-Sample Accuracy Test Metrics
Metrics calculated on in-sample predictions using the full dataset.
- MAPE: Model fit accuracy
- SMAPE: Symmetric model fit accuracy
- R-squared: Model fit quality

### Cross-Validation Metrics
- Mean MAPE: Average out-of-sample accuracy
- Std MAPE: Consistency of accuracy across folds
- Mean SMAPE: Average out-of-sample symmetric accuracy
- Std SMAPE: Consistency of symmetric accuracy across folds
- Mean R-squared: Average out-of-sample fit
- Std R-squared: Consistency of fit across folds
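
To make the aggregation concrete, the sketch below shows how mean and standard deviation summarise per-fold results. The fold values are made up for illustration and are not mmm-eval output.

```python
import numpy as np

# Hypothetical out-of-sample MAPE values from a 5-fold cross-validation
fold_mape = np.array([8.2, 9.1, 7.5, 10.3, 8.8])

mean_mape = fold_mape.mean()  # average accuracy across folds
std_mape = fold_mape.std()    # spread across folds; a large value signals inconsistent accuracy

print(f"Mean MAPE: {mean_mape:.1f}%, Std MAPE: {std_mape:.1f}%")
```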

### Refresh Stability Metrics
- Mean Percentage Change: Average parameter change
- Std Percentage Change: Consistency of parameter changes
- Channel-specific Stability: Stability per media channel
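
As a rough guide to how the percentage change for a single parameter can be computed (mmm-eval's exact aggregation may differ), with theta_original from the initial fit and theta_refreshed from the fit after new data is added:

$$
\Delta = \frac{\theta_{\text{refreshed}} - \theta_{\text{original}}}{\theta_{\text{original}}} \times 100
$$

The mean and standard deviation reported above summarise changes of this kind across parameters and channels.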

### Perturbation Metrics
- Percentage Change: Change in ROI estimates when input data is perturbed
- Channel-specific Sensitivity: Sensitivity of each media channel to data perturbations
- Model Robustness: Overall model stability to input noise
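
To make the percentage change concrete, the sketch below compares per-channel ROI estimates from a baseline fit with those from a fit on perturbed data. The numbers and the dictionary layout are illustrative, not mmm-eval output.

```python
# Illustrative ROI estimates (per channel) before and after adding noise to the input data
baseline_roi = {"tv": 1.8, "search": 2.4, "social": 1.1}
perturbed_roi = {"tv": 1.7, "search": 2.5, "social": 0.8}

# Percentage change in ROI per channel; large changes indicate high sensitivity to noise
sensitivity = {
    channel: (perturbed_roi[channel] - baseline_roi[channel]) / baseline_roi[channel] * 100
    for channel in baseline_roi
}
print(sensitivity)  # roughly: tv -5.6%, search +4.2%, social -27.3%
```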

### Placebo Metrics

- Shuffled Channel ROI: Estimated ROI for the spurious (shuffled) channel, ideally as close as possible to -100% (indicating the model learned that the feature had no impact).
Note that the result of this test will be heavily impacted by the choice of media prior. Consider using relatively uninformative media priors in order to reduce the chance of the model picking up spurious correlations.
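
Conceptually, the placebo test introduces a spurious channel by shuffling a spend column in time. A minimal sketch of the idea, using a hypothetical pandas DataFrame of spend data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical weekly spend data; in practice this is your MMM input data
data = pd.DataFrame({"tv_spend": [100.0, 250.0, 175.0, 300.0, 50.0]})

# Shuffle the spend column in time so it has no real relationship with the outcome
data["shuffled_spend"] = rng.permutation(data["tv_spend"].to_numpy())

# A well-calibrated model should attribute essentially no incremental effect to
# "shuffled_spend", i.e. an estimated ROI near -100%.
```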

## Interpreting Results

### Good Performance Indicators
- MAPE < 15%: Good prediction accuracy
- SMAPE < 15%: Good symmetric prediction accuracy
- R-squared > 0.8: Strong model fit
- Low perturbation sensitivity: Robust to input noise
- Low placebo ROI (≤ -50%): Correctly identifies spurious features and assigns them low effect sizes

## Thresholds and Benchmarks

### Rough Benchmarks

The table below is intended only as an approximate guide: performance depends on multiple factors, including the quality and quantity of the training data and how well the model specification suits the problem at hand.

| Metric | Excellent | Good | Acceptable | Poor |
|---|---|---|---|---|
| MAPE | < 5% | 5-10% | 10-15% | > 15% |
| SMAPE | < 5% | 5-10% | 10-15% | > 15% |
| R-squared | > 0.9 | 0.8-0.9 | 0.6-0.8 | < 0.6 |
| Parameter Change | < 5% | 5-10% | 10-20% | > 20% |
| Perturbation Change | < 5% | 5-10% | 10-15% | > 15% |
| Placebo ROI | ≤ -50% | -50% to -25% | -25% to 0% | > 0% |
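
If you want to apply these bands programmatically, a small helper like the following (illustrative, not part of mmm-eval) maps a MAPE value onto the table above:

```python
def grade_mape(mape: float) -> str:
    """Map a MAPE value (in percent) onto the rough benchmark bands above."""
    if mape < 5:
        return "Excellent"
    if mape < 10:
        return "Good"
    if mape < 15:
        return "Acceptable"
    return "Poor"


print(grade_mape(7.5))  # "Good"
```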

## Customizing Metrics

### Modifying Thresholds

If you'd like to modify the test pass/fail thresholds, fork the repository and edit the values in `mmm_eval/metrics/threshold_constants.py`.
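
The constant name below is hypothetical and only illustrates the kind of change you would make in your fork; check the file for the names your mmm-eval version actually uses.

```python
# Hypothetical excerpt of mmm_eval/metrics/threshold_constants.py -- the constant
# name is illustrative, not the library's actual identifier.

# Example: tighten the holdout accuracy threshold used for the pass/fail decision
HOLDOUT_MAPE_THRESHOLD = 10.0
```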

### Adding Custom Metrics

To add custom metrics, extend the metrics module:

```python
import numpy as np

from mmm_eval.metrics import BaseMetric


class CustomMetric(BaseMetric):
    def calculate(self, y_true, y_pred):
        # Your custom calculation; shown here as a median absolute percentage error
        return np.median(np.abs((y_true - y_pred) / y_true)) * 100
```
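
Assuming `BaseMetric` can be instantiated without arguments (check its actual signature in your mmm-eval version), you can sanity-check the metric directly on arrays:

```python
import numpy as np

y_true = np.array([100.0, 120.0, 90.0])
y_pred = np.array([110.0, 115.0, 95.0])

metric = CustomMetric()
print(metric.calculate(y_true, y_pred))  # median absolute percentage error, ~5.6 here
```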

## Best Practices

### Metric Selection
- Start with MAPE: Most intuitive for business users
- Include SMAPE: More robust alternative to MAPE for symmetric evaluation
- Include R-squared: Technical measure of fit quality
- Monitor stability: Critical for production models
- Track performance: Important for scalability

### Result Analysis
- Compare across frameworks: Use same metrics for fair comparison
- Track over time: Monitor performance as data grows
- Set business thresholds: Align with business requirements
- Document decisions: Record metric choices and rationale

## Troubleshooting

### Common Issues
- Extreme MAPE values: Check for zero or near-zero actual values
- High SMAPE values: Check for symmetric errors and zero handling
- Negative R-squared: Model performs worse than baseline
- Inconsistent metrics: Verify data preprocessing
- Missing metrics: Check test configuration

### Getting Help
- Review Tests for metric context
- Check Configuration for settings
- Join Discussions for support