Data Reference¶
mmm_eval.data
¶
Data loading and processing utilities.
Classes¶
DataLoader(data_path: str | Path)
¶
Simple data loader for MMM evaluation.
Takes a data path and loads the data.
Initialize data loader with data path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_path | str \| Path | Path to the data file (CSV, Parquet, etc.) | required |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the data file does not exist. |
Source code in mmm_eval/data/loaders.py
Functions¶
load() -> pd.DataFrame
¶
Load data from the specified path.
Returns:
Type | Description |
---|---|
DataFrame | Loaded DataFrame |
Raises:
Type | Description |
---|---|
ValueError | If the file format is not supported. |
Source code in mmm_eval/data/loaders.py
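The documented contract (FileNotFoundError at construction, ValueError from load() for an unsupported format) can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the mmm_eval implementation: `SketchLoader`, its `SUPPORTED_SUFFIXES` set, and the list-of-dicts return value are assumptions made for illustration, whereas the real loader returns a pandas DataFrame.

```python
import csv
import tempfile
from pathlib import Path


class SketchLoader:
    """Illustrative stand-in for the documented loader contract (not mmm_eval code)."""

    # Assumption: this sketch handles CSV only.
    SUPPORTED_SUFFIXES = {".csv"}

    def __init__(self, data_path):
        self.data_path = Path(data_path)
        # Fail early, mirroring the documented FileNotFoundError at construction.
        if not self.data_path.exists():
            raise FileNotFoundError(f"Data file not found: {self.data_path}")

    def load(self):
        suffix = self.data_path.suffix.lower()
        # Mirror the documented ValueError for unsupported formats.
        if suffix not in self.SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported file format: {suffix}")
        with self.data_path.open(newline="") as handle:
            return list(csv.DictReader(handle))


# Usage: write a tiny CSV to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("date,spend\n2024-01-01,10\n")
    rows = SketchLoader(csv_path).load()
```

Checking for the file in the constructor rather than in load() surfaces a bad path as soon as the loader is created, which matches where the Raises table above places FileNotFoundError.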
DataPipeline(data: pd.DataFrame, framework: str, control_columns: list[str] | None, channel_columns: list[str], date_column: str, response_column: str, revenue_column: str, min_number_observations: int = DataPipelineConstants.MIN_NUMBER_OBSERVATIONS)
¶
Data pipeline that orchestrates loading, processing, and validation.
Provides a simple interface to go from raw data file to validated DataFrame.
Initialize data pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | DataFrame containing the data | required |
framework | str | Name of a supported framework | required |
control_columns | list[str] \| None | List of control columns | required |
channel_columns | list[str] | List of channel columns | required |
date_column | str | Name of the date column | required |
response_column | str | Name of the response column | required |
revenue_column | str | Name of the revenue column | required |
min_number_observations | int | Minimum required number of observations | MIN_NUMBER_OBSERVATIONS |
Source code in mmm_eval/data/pipeline.py
Functions¶
run() -> pd.DataFrame
¶
Run the complete data pipeline: process → validate.
Returns: Validated and processed DataFrame.
Raises: Various exceptions from the processing or validation steps.
Source code in mmm_eval/data/pipeline.py
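The process → validate ordering described for run() can be illustrated with a small, library-free sketch. Everything here is hypothetical (`run_pipeline`, `lowercase_columns`, `require_date`, and the list-of-dicts data shape); it only shows that validation is applied to the processed result and that any exception it raises reaches the caller.

```python
from typing import Callable

Rows = list[dict]


def run_pipeline(
    rows: Rows,
    process: Callable[[Rows], Rows],
    validate: Callable[[Rows], None],
) -> Rows:
    """Process first, validate the processed result, then return it."""
    processed = process(rows)
    validate(processed)  # any exception raised here propagates to the caller
    return processed


def lowercase_columns(rows: Rows) -> Rows:
    """Hypothetical processing step: normalise column names to lower case."""
    return [{key.lower(): value for key, value in row.items()} for row in rows]


def require_date(rows: Rows) -> None:
    """Hypothetical validation step: every row must have a 'date' key."""
    if any("date" not in row for row in rows):
        raise KeyError("date")


result = run_pipeline(
    [{"Date": "2024-01-01", "Sales": 100.0}], lowercase_columns, require_date
)
```

In this sketch the raw input would fail `require_date` because its key is `Date`; it passes only because validation runs after the renaming step.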
DataProcessor(control_columns: list[str] | None, channel_columns: list[str], date_column: str = InputDataframeConstants.DATE_COL, response_column: str = InputDataframeConstants.RESPONSE_COL, revenue_column: str = InputDataframeConstants.MEDIA_CHANNEL_REVENUE_COL)
¶
Simple data processor for MMM evaluation.
Handles data transformations like datetime casting, column renaming, etc.
Initialize data processor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
control_columns | list[str] \| None | List of control columns | required |
channel_columns | list[str] | List of channel columns | required |
date_column | str | Name of the date column to parse and rename | DATE_COL |
response_column | str | Name of the response column to parse and rename | RESPONSE_COL |
revenue_column | str | Name of the revenue column to parse and rename | MEDIA_CHANNEL_REVENUE_COL |
Source code in mmm_eval/data/processor.py
Functions¶
process(df: pd.DataFrame) -> pd.DataFrame
¶
Process the DataFrame with configured transformations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
Returns:
Type | Description |
---|---|
DataFrame | Processed DataFrame |
Raises:
Type | Description |
---|---|
MissingRequiredColumnsError | If the required columns are not present. |
InvalidDateFormatError | If the date column cannot be parsed. |
Source code in mmm_eval/data/processor.py
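As a rough illustration of the behaviour documented for process() (required-column check, datetime casting, column renaming), the sketch below works on plain dictionaries instead of a DataFrame. The two exception classes are local stand-ins that borrow the documented names; `process_rows`, the `date_format` parameter, and the target names `date` and `response` are assumptions, not the library's actual constants.

```python
from datetime import datetime


class MissingRequiredColumnsError(Exception):
    """Local stand-in named after the documented exception."""


class InvalidDateFormatError(Exception):
    """Local stand-in named after the documented exception."""


def process_rows(rows, date_column, response_column, date_format="%Y-%m-%d"):
    """Check required columns, parse dates, and rename to fixed names."""
    required = {date_column, response_column}
    processed = []
    for row in rows:
        missing = required - set(row)
        if missing:
            raise MissingRequiredColumnsError(f"Missing columns: {sorted(missing)}")
        try:
            parsed = datetime.strptime(row[date_column], date_format)
        except ValueError as exc:
            raise InvalidDateFormatError(f"Cannot parse {row[date_column]!r}") from exc
        # "date" and "response" are hypothetical target names for this sketch.
        processed.append({"date": parsed, "response": row[response_column]})
    return processed


clean = process_rows([{"week": "2024-01-01", "sales": 100.0}], "week", "sales")
```

Raising a dedicated exception for each failure mode lets a caller distinguish a schema problem (missing columns) from a content problem (unparseable dates).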
DataValidator(framework: str, date_column: str, response_column: str, revenue_column: str, control_columns: list[str] | None, min_number_observations: int = DataPipelineConstants.MIN_NUMBER_OBSERVATIONS)
¶
Validator for MMM data with configurable validation rules.
Initialize validator with validation rules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
framework | str | a supported framework, one of | required |
date_column | str | Name of the date column | required |
response_column | str | Name of the response column | required |
revenue_column | str | Name of the revenue column | required |
control_columns | list[str] \| None | List of control columns | required |
min_number_observations | int | Minimum required number of observations for time series CV | MIN_NUMBER_OBSERVATIONS |
Source code in mmm_eval/data/validation.py
Functions¶
run_validations(df: pd.DataFrame) -> None
¶
Run all validations on the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
Returns:
Type | Description |
---|---|
None | Validation result with all errors and warnings |
Source code in mmm_eval/data/validation.py
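One plausible reading of a validator that returns None is that failures surface as exceptions. The sketch below assumes exactly that and should not be taken as the library's logic: `run_checks`, the choice of which stand-in exception each check raises, and the threshold of 3 are all hypothetical.

```python
class EmptyDataFrameError(Exception):
    """Local stand-in named after the documented exception."""


class DataValidationError(Exception):
    """Local stand-in named after the documented exception."""


def run_checks(rows, min_number_observations):
    """Run each check in turn; raise on the first failure, return None otherwise."""
    if not rows:
        raise EmptyDataFrameError("Input data is empty")
    if len(rows) < min_number_observations:
        raise DataValidationError(
            f"Need at least {min_number_observations} observations, got {len(rows)}"
        )


# Hypothetical threshold of 3; the real default comes from DataPipelineConstants.
weekly = [{"date": f"2024-01-{day:02d}"} for day in (1, 8, 15)]
outcome = run_checks(weekly, min_number_observations=3)
```

A minimum-observation check of this kind matters for time series cross-validation, where too few rows leave no room for separate training and test windows.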
Functions¶
generate_meridian_data()
¶
Load and process a Meridian-compatible dataset for E2E testing.
The Excel file should be placed at: mmm_eval/data/sample_data/geo_media.xlsx
Returns: DataFrame containing Meridian-compatible data with media channels, controls, and response variables
Source code in mmm_eval/data/synth_data_generator.py
generate_pymc_data()
¶
Generate synthetic MMM data for testing purposes.
Returns: DataFrame containing synthetic MMM data with media channels, controls, and response variables
Source code in mmm_eval/data/synth_data_generator.py
Modules¶
constants
¶
exceptions
¶
Custom exceptions for data validation and processing.
Classes¶
DataValidationError
¶
Bases: Exception
Raised when data validation fails.
EmptyDataFrameError
¶
Bases: Exception
Raised when DataFrame is empty.
InvalidDateFormatError
¶
Bases: Exception
Raised when date parsing fails.
MissingRequiredColumnsError
¶
Bases: Exception
Raised when required columns are missing.
ValidationError
¶
Bases: Exception
Base class for validation errors.
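Each class above lists Exception as its base, so, if that listing is accurate, catching ValidationError alone would not catch the other four. The sketch below mirrors the documented bases with local stand-in classes and groups them in a tuple; `DATA_ERRORS` and `describe` are hypothetical helpers, not part of mmm_eval.

```python
class ValidationError(Exception):
    """Stand-in mirroring the documented base: Exception."""


class DataValidationError(Exception):
    """Stand-in mirroring the documented base: Exception."""


class EmptyDataFrameError(Exception):
    """Stand-in mirroring the documented base: Exception."""


class InvalidDateFormatError(Exception):
    """Stand-in mirroring the documented base: Exception."""


class MissingRequiredColumnsError(Exception):
    """Stand-in mirroring the documented base: Exception."""


# Hypothetical grouping so one except clause can cover every data error.
DATA_ERRORS = (
    ValidationError,
    DataValidationError,
    EmptyDataFrameError,
    InvalidDateFormatError,
    MissingRequiredColumnsError,
)


def describe(exc):
    """Raise the given exception and report how the tuple-based handler sees it."""
    try:
        raise exc
    except DATA_ERRORS as err:
        return f"data problem: {type(err).__name__}"


# With the documented bases, the "base class" is not actually a parent class.
caught_by_base = issubclass(EmptyDataFrameError, ValidationError)
message = describe(EmptyDataFrameError("no rows"))
```

With the real package, the same pattern would import the classes from mmm_eval.data.exceptions instead of redefining them.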
loaders
¶
Data loading utilities for MMM evaluation.
Classes¶
DataLoader(data_path: str | Path)
¶
Simple data loader for MMM evaluation.
Takes a data path and loads the data.
Initialize data loader with data path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_path | str \| Path | Path to the data file (CSV, Parquet, etc.) | required |
Raises:
Type | Description |
---|---|
FileNotFoundError | If the data file does not exist. |
Source code in mmm_eval/data/loaders.py
load() -> pd.DataFrame
¶
Load data from the specified path.
Returns:
Type | Description |
---|---|
DataFrame | Loaded DataFrame |
Raises:
Type | Description |
---|---|
ValueError | If the file format is not supported. |
Source code in mmm_eval/data/loaders.py
pipeline
¶
Data pipeline for MMM evaluation.
Classes¶
DataPipeline(data: pd.DataFrame, framework: str, control_columns: list[str] | None, channel_columns: list[str], date_column: str, response_column: str, revenue_column: str, min_number_observations: int = DataPipelineConstants.MIN_NUMBER_OBSERVATIONS)
¶
Data pipeline that orchestrates loading, processing, and validation.
Provides a simple interface to go from raw data file to validated DataFrame.
Initialize data pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | DataFrame containing the data | required |
framework | str | Name of a supported framework | required |
control_columns | list[str] \| None | List of control columns | required |
channel_columns | list[str] | List of channel columns | required |
date_column | str | Name of the date column | required |
response_column | str | Name of the response column | required |
revenue_column | str | Name of the revenue column | required |
min_number_observations | int | Minimum required number of observations | MIN_NUMBER_OBSERVATIONS |
Source code in mmm_eval/data/pipeline.py
run() -> pd.DataFrame
¶
Run the complete data pipeline: process → validate.
Returns: Validated and processed DataFrame.
Raises: Various exceptions from the processing or validation steps.
Source code in mmm_eval/data/pipeline.py
processor
¶
Data processing utilities for MMM evaluation.
Classes¶
DataProcessor(control_columns: list[str] | None, channel_columns: list[str], date_column: str = InputDataframeConstants.DATE_COL, response_column: str = InputDataframeConstants.RESPONSE_COL, revenue_column: str = InputDataframeConstants.MEDIA_CHANNEL_REVENUE_COL)
¶
Simple data processor for MMM evaluation.
Handles data transformations like datetime casting, column renaming, etc.
Initialize data processor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
control_columns | list[str] \| None | List of control columns | required |
channel_columns | list[str] | List of channel columns | required |
date_column | str | Name of the date column to parse and rename | DATE_COL |
response_column | str | Name of the response column to parse and rename | RESPONSE_COL |
revenue_column | str | Name of the revenue column to parse and rename | MEDIA_CHANNEL_REVENUE_COL |
Source code in mmm_eval/data/processor.py
process(df: pd.DataFrame) -> pd.DataFrame
¶
Process the DataFrame with configured transformations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
Returns:
Type | Description |
---|---|
DataFrame | Processed DataFrame |
Raises:
Type | Description |
---|---|
MissingRequiredColumnsError | If the required columns are not present. |
InvalidDateFormatError | If the date column cannot be parsed. |
Source code in mmm_eval/data/processor.py
schemas
¶
synth_data_generator
¶
Generate synthetic data for testing.
Based on: https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html
Functions¶
generate_meridian_data()
¶
Load and process a Meridian-compatible dataset for E2E testing.
The Excel file should be placed at: mmm_eval/data/sample_data/geo_media.xlsx
Returns: DataFrame containing Meridian-compatible data with media channels, controls, and response variables
Source code in mmm_eval/data/synth_data_generator.py
generate_pymc_data()
¶
Generate synthetic MMM data for testing purposes.
Returns: DataFrame containing synthetic MMM data with media channels, controls, and response variables
Source code in mmm_eval/data/synth_data_generator.py
validation
¶
Data validation for MMM evaluation.
Classes¶
DataValidator(framework: str, date_column: str, response_column: str, revenue_column: str, control_columns: list[str] | None, min_number_observations: int = DataPipelineConstants.MIN_NUMBER_OBSERVATIONS)
¶
Validator for MMM data with configurable validation rules.
Initialize validator with validation rules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
framework | str | a supported framework, one of | required |
date_column | str | Name of the date column | required |
response_column | str | Name of the response column | required |
revenue_column | str | Name of the revenue column | required |
control_columns | list[str] \| None | List of control columns | required |
min_number_observations | int | Minimum required number of observations for time series CV | MIN_NUMBER_OBSERVATIONS |
Source code in mmm_eval/data/validation.py
run_validations(df: pd.DataFrame) -> None
¶
Run all validations on the DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
Returns:
Type | Description |
---|---|
None | Validation result with all errors and warnings |