Testing¶
This guide covers testing practices and procedures for the mmm-eval project.
Testing Philosophy¶
We follow these testing principles:
- Comprehensive coverage: Aim for high test coverage across all modules
- Fast feedback: Tests should run quickly to enable rapid development
- Reliable: Tests should be deterministic and not flaky
- Maintainable: Tests should be easy to understand and modify
- Realistic: Tests should reflect real-world usage patterns
Test Structure¶
The test suite is organized as follows:
tests/
├── test_adapters/               # Framework adapter tests
├── test_configs/                # Configuration object tests
├── test_core/                   # Core functionality tests
├── test_data/                   # Data handling tests
└── test_validation_tests/       # Metrics calculation tests
Running Tests¶
Basic Test Execution¶
# Run all tests
poetry run pytest
# Run tests with verbose output
poetry run pytest -v
# Run tests with coverage
poetry run pytest --cov=mmm_eval
# Run tests in parallel (requires the pytest-xdist plugin)
poetry run pytest -n auto
Running Specific Test Categories¶
# Run only the core functionality tests
poetry run pytest tests/test_core/
# Run only the framework adapter tests
poetry run pytest tests/test_adapters/
# Run the metrics calculation tests
poetry run pytest tests/test_validation_tests/
# Run tests matching a pattern
poetry run pytest -k "test_accuracy"
Running Tests with Markers¶
# Run integration tests only
poetry run pytest -m integration
# Run slow tests only
poetry run pytest -m slow
# Skip slow tests
poetry run pytest -m "not slow"
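These markers should be registered so pytest does not warn about unknown marks. A minimal sketch of the registration in pyproject.toml (the marker descriptions here are illustrative):
[tool.pytest.ini_options]
markers = [
    "integration: tests that exercise multiple components together",
    "slow: tests that take a long time to run",
]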
Test Types¶
Unit Tests¶
Unit tests verify individual functions and classes in isolation. They should:
- Test one specific behavior or functionality
- Use mocks for external dependencies
- Be fast and deterministic
- Have clear, descriptive names
Example unit test:
import pytest

def test_calculate_mape_returns_correct_value():
    """Test that MAPE calculation returns expected results."""
    actual = [100, 200, 300]
    predicted = [110, 190, 310]
    result = calculate_mape(actual, predicted)
    expected = 6.11  # mean of the 10%, 5%, and 3.33% per-point errors
    assert result == pytest.approx(expected, rel=1e-2)
Integration Tests¶
Integration tests verify that multiple components work together correctly. They:
- Test the interaction between different modules
- Use real data and minimal mocking
- May take longer to run
- Are marked with the @pytest.mark.integration decorator
Example integration test:
@pytest.mark.integration
def test_pymc_marketing_evaluation_workflow():
    """Test complete PyMC Marketing evaluation workflow."""
    # Setup test data
    data = load_test_data()
    # Run evaluation
    result = evaluate_framework(
        data=data,
        framework="pymc-marketing",
        config=test_config
    )
    # Verify results
    assert result.accuracy > 0.8
    assert result.cross_validation_score > 0.7
    assert result.refresh_stability > 0.6
Test Data and Fixtures¶
Using Fixtures¶
Pytest fixtures provide reusable test data and setup:
import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def sample_mmm_data():
    """Provide sample MMM data for testing."""
    rng = np.random.default_rng(42)  # seeded so the sample data is deterministic
    return pd.DataFrame({
        'date': pd.date_range('2023-01-01', periods=100),
        'sales': rng.normal(1000, 100, 100),
        'tv_spend': rng.uniform(0, 1000, 100),
        'radio_spend': rng.uniform(0, 500, 100),
        'digital_spend': rng.uniform(0, 800, 100)
    })
def test_data_validation(sample_mmm_data):
    """Test data validation with sample data."""
    validator = DataValidator()
    result = validator.validate(sample_mmm_data)
    assert result.is_valid
Test Data Management¶
- Use realistic but synthetic data
- Keep test data files small and focused (see the sketch after this list)
- Document the structure and purpose of test data
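One way to keep test data small and documented is a loader fixture for a tiny CSV checked into the repository. A minimal sketch (the tests/fixtures/sample_sales.csv path and column names are hypothetical):
from pathlib import Path

import pandas as pd
import pytest

@pytest.fixture
def small_sales_data():
    """Load a small synthetic dataset kept alongside the tests."""
    path = Path(__file__).parent / "fixtures" / "sample_sales.csv"  # hypothetical location
    return pd.read_csv(path, parse_dates=["date"])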
Mocking and Stubbing¶
When to Mock¶
Mock external dependencies to:
- Speed up tests
- Avoid network calls
- Control test conditions
- Test error scenarios
Mocking Examples¶
from unittest.mock import Mock, patch
def test_api_call_with_mock():
    """Test API call with mocked response."""
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {'status': 'success'}
        mock_get.return_value.status_code = 200
        result = fetch_data_from_api()
        assert result['status'] == 'success'
        mock_get.assert_called_once()
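Error scenarios can be simulated with side_effect. A minimal sketch, assuming fetch_data_from_api lets the underlying requests exception propagate:
import pytest
import requests
from unittest.mock import patch

def test_api_call_raises_on_timeout():
    """Test behaviour when the API times out."""
    with patch('requests.get') as mock_get:
        mock_get.side_effect = requests.exceptions.Timeout
        with pytest.raises(requests.exceptions.Timeout):
            fetch_data_from_api()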
Test Coverage¶
Coverage Goals¶
- Minimum coverage: 80% for all modules
- Target coverage: 90% for critical modules
- Critical modules: Core evaluation logic, data validation, metrics calculation
Coverage Reports¶
# Generate HTML coverage report
poetry run pytest --cov=mmm_eval --cov-report=html
# Generate XML coverage report (for CI)
poetry run pytest --cov=mmm_eval --cov-report=xml
# View coverage summary
poetry run pytest --cov=mmm_eval --cov-report=term-missing
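To enforce the coverage minimum (for example in CI), pytest-cov can fail the run when total coverage drops below a threshold:
# Fail the run if total coverage is below 80%
poetry run pytest --cov=mmm_eval --cov-fail-under=80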
Coverage Configuration¶
Configure coverage in pyproject.toml:
[tool.coverage.run]
source = ["mmm_eval"]
omit = [
    "*/tests/*",
    "*/test_*",
    "*/__pycache__/*"
]
[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "if self.debug:",
    "if settings.DEBUG",
    "raise AssertionError",
    "raise NotImplementedError",
    "if 0:",
    "if __name__ == .__main__.:",
    "class .*\\bProtocol\\):",
    "@(abc\\.)?abstractmethod"
]
Performance Testing¶
Benchmark Tests¶
For performance-critical code, use benchmark tests (these rely on the benchmark fixture from the pytest-benchmark plugin):
def test_mape_calculation_performance(benchmark):
    """Benchmark MAPE calculation performance."""
    actual = [100, 200, 300] * 1000
    predicted = [110, 190, 310] * 1000
    result = benchmark(calculate_mape, actual, predicted)
    assert result > 0
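With the plugin installed, benchmarks can be run or skipped on demand:
# Run only the benchmark tests
poetry run pytest --benchmark-only
# Skip benchmarks during a normal test run
poetry run pytest --benchmark-skip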
Debugging Tests¶
Verbose Output¶
# Run with maximum verbosity
poetry run pytest -vvv
# Show local variables on failures
poetry run pytest -l
# Stop on first failure
poetry run pytest -x
Debugging with pdb¶
def test_debug_example():
    """Example of using pdb for debugging."""
    import pdb; pdb.set_trace()  # Breakpoint
    result = complex_calculation()
    assert result > 0
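pytest can also drop into the debugger automatically when a test fails:
# Enter pdb on test failures
poetry run pytest --pdb
# Combine with -x to stop and debug the first failure
poetry run pytest -x --pdb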
Test Isolation¶
Ensure tests don't interfere with each other:
@pytest.fixture(autouse=True)
def reset_global_state():
    """Reset global state before each test."""
    # Setup
    yield
    # Teardown
    cleanup_global_state()
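pytest's built-in tmp_path and monkeypatch fixtures also help with isolation. A minimal sketch (the MMM_EVAL_OUTPUT_DIR variable and save_results helper are hypothetical):
def test_results_are_written_to_isolated_dir(tmp_path, monkeypatch):
    """Each test gets its own temporary directory and environment."""
    monkeypatch.setenv("MMM_EVAL_OUTPUT_DIR", str(tmp_path))  # hypothetical env var
    save_results({"mape": 6.11})  # hypothetical helper that honours the env var
    assert any(tmp_path.iterdir())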
Best Practices¶
Test Naming¶
- Use descriptive test names that explain the expected behavior
- Follow the pattern: test_[function]_[scenario]_[expected_result] (see the example below)
- Include edge cases and error conditions
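For example, a test covering an edge case of calculate_mape might look like this (the exact error type raised is an assumption):
import pytest

def test_calculate_mape_with_zero_actuals_raises_value_error():
    """MAPE is undefined when every actual value is zero."""
    with pytest.raises(ValueError):  # assumed error type for this edge case
        calculate_mape([0, 0, 0], [10, 20, 30])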
Test Organization¶
- Group related tests in classes (as shown in the sketch below)
- Use fixtures for common setup
- Keep tests focused and single-purpose
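A minimal sketch of grouping related tests in a class, reusing the sample_mmm_data fixture from earlier (the empty-input behaviour is an assumption):
import pandas as pd

class TestDataValidator:
    """Tests for DataValidator grouped in one place."""

    def test_accepts_valid_data(self, sample_mmm_data):
        assert DataValidator().validate(sample_mmm_data).is_valid

    def test_rejects_empty_data(self):
        result = DataValidator().validate(pd.DataFrame())  # assumes empty input is reported as invalid
        assert not result.is_valid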
Assertions¶
- Use specific assertions (assert result == expected)
- Avoid complex logic in assertions
- Use the appropriate helpers for the situation (pytest.approx, pytest.raises, etc.)
Test Data¶
- Use realistic test data
- Avoid hardcoded magic numbers
- Document test data assumptions
Documentation¶
- Write clear docstrings for test functions
- Explain complex test scenarios
- Document test data sources and assumptions
Common Pitfalls¶
Flaky Tests¶
Avoid flaky tests by:
- Not relying on timing or external services
- Using deterministic random seeds (see the sketch after this list)
- Properly mocking external dependencies
- Avoiding shared state between tests
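A common way to make randomness deterministic is an autouse fixture that seeds NumPy's global random state before every test. A minimal sketch:
import numpy as np
import pytest

@pytest.fixture(autouse=True)
def fixed_random_seed():
    """Seed global NumPy randomness so tests are reproducible."""
    np.random.seed(0)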
Slow Tests¶
Keep tests fast by:
- Using appropriate mocks
- Minimizing I/O operations
- Using efficient test data
- Running tests in parallel when possible
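pytest's --durations flag helps identify which tests are slow in the first place:
# Show the 10 slowest test durations
poetry run pytest --durations=10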
Over-Mocking¶
Don't over-mock:
- Test the actual behavior, not the implementation
- Mock only external dependencies
- Use real objects when possible
Getting Help¶
If you encounter testing issues:
- Check the pytest documentation
- Review existing tests for examples
- Ask questions in project discussions
- Consult the Contributing Guide