generate_piecewise_its_data

causalpy.data.simulate_data.generate_piecewise_its_data(N=100, interruption_times=None, baseline_intercept=10.0, baseline_slope=0.1, level_changes=None, slope_changes=None, noise_sigma=1.0, seed=None)

Generate piecewise Interrupted Time Series data with known ground truth parameters.

This function creates synthetic data for testing and demonstrating piecewise ITS / segmented regression models. The data follows the model:

y_t = β₀ + β₁t + Σₖ(level_k · I_k(t) + slope_k · R_k(t)) + ε_t

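Where:

- T_k is the k-th entry of interruption_times, with level_k and slope_k the matching entries of level_changes and slope_changes
- I_k(t) = 1 if t >= T_k else 0 (step function for level change)
- R_k(t) = max(0, t - T_k) (ramp function for slope change)
- ε_t is Gaussian noise with standard deviation noise_sigma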
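The model above can be evaluated by hand for a single time point. The following is a minimal standard-library sketch of the noiseless part of the formula, not CausalPy's implementation; the helper names step, ramp and noiseless_outcome are hypothetical and exist only for illustration.

```python
def step(t, T_k):
    """I_k(t): 1 once the interruption at T_k has occurred, else 0."""
    return 1.0 if t >= T_k else 0.0


def ramp(t, T_k):
    """R_k(t): time elapsed since the interruption at T_k, floored at 0."""
    return max(0.0, t - T_k)


def noiseless_outcome(t, intercept, slope, times, level_changes, slope_changes):
    """Hypothetical helper: beta_0 + beta_1 * t plus the summed interruption terms."""
    baseline = intercept + slope * t
    effect = sum(
        level * step(t, T_k) + dslope * ramp(t, T_k)
        for T_k, level, dslope in zip(times, level_changes, slope_changes)
    )
    return baseline + effect


# Single interruption at T = 50, level change 5.0, slope change 0.2
args = dict(intercept=10.0, slope=0.1, times=[50],
            level_changes=[5.0], slope_changes=[0.2])
before = noiseless_outcome(49, **args)  # baseline only: 10 + 0.1*49, about 14.9
at_T = noiseless_outcome(50, **args)    # step switches on: 10 + 5 + 5, about 20.0
later = noiseless_outcome(60, **args)   # step plus ramp: 10 + 6 + 5 + 0.2*10, about 23.0
```

Because the ramp is zero at t = T_k, the jump at the interruption equals the level change alone; the slope change only accumulates afterwards.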

Parameters:

N (int, default=100) – Number of time points in the series.
interruption_times (list[int], optional) – List of time indices where interruptions occur. Defaults to [50].
baseline_intercept (float, default=10.0) – The intercept (β₀) of the baseline trend.
baseline_slope (float, default=0.1) – The slope (β₁) of the baseline trend.
level_changes (list[float], optional) – List of level changes, one per interruption. Length must match interruption_times. If None, defaults to [5.0] for a single interruption.
slope_changes (list[float], optional) – List of slope changes at each interruption. Length must match interruption_times. If None, defaults to [0.0] (no slope change).
noise_sigma (float, default=1.0) – Standard deviation of the Gaussian noise.
seed (int, optional) – Random seed for reproducibility.

Returns:

df (pd.DataFrame) – DataFrame with columns:

- 't': time index (0 to N-1)
- 'y': observed outcome with noise
- 'y_true': outcome without noise (ground truth)
- 'counterfactual': baseline trend without intervention effects
- 'effect': true causal effect at each time point

params (dict) – Dictionary containing the true parameters:

- 'baseline_intercept': β₀
- 'baseline_slope': β₁
- 'level_changes': list of level changes
- 'slope_changes': list of slope changes
- 'interruption_times': list of interruption times
- 'noise_sigma': noise standard deviation
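To make the column descriptions concrete, here is a rough standard-library sketch of how the five columns could relate for a single interruption. It assumes that 'y_true' is the sum of 'counterfactual' and 'effect', which follows from the descriptions above but is not confirmed against the library source. The function simulate_rows is hypothetical, and its random draws will not match the values CausalPy produces for the same seed.

```python
import random


def simulate_rows(N=100, T=50, intercept=10.0, slope=0.1,
                  level=5.0, dslope=0.0, sigma=1.0, seed=None):
    """Hypothetical helper: one dict per time point, keyed by the documented column names."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rows = []
    for t in range(N):
        counterfactual = intercept + slope * t  # baseline trend only
        # Assumed decomposition: level shift plus accumulated slope change after T
        effect = level + dslope * (t - T) if t >= T else 0.0
        y_true = counterfactual + effect        # noiseless outcome
        y = y_true + rng.gauss(0.0, sigma)      # observed outcome with Gaussian noise
        rows.append({"t": t, "y": y, "y_true": y_true,
                     "counterfactual": counterfactual, "effect": effect})
    return rows


rows = simulate_rows(seed=42)
pre_effects = {r["effect"] for r in rows[:50]}  # contains only 0.0 before the interruption
```

Under this assumption, a model fitted to 'y' can be scored against 'effect' directly, since the pre-interruption effect is exactly zero and the post-interruption effect is known at every time point.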

Return type:

tuple[DataFrame, dict]

Examples

>>> from causalpy.data.simulate_data import generate_piecewise_its_data
>>> # Single interruption with level and slope change
>>> df, params = generate_piecewise_its_data(
...     N=100,
...     interruption_times=[50],
...     level_changes=[5.0],
...     slope_changes=[0.2],
...     seed=42,
... )
>>> df.shape
(100, 5)

>>> # Multiple interruptions
>>> df, params = generate_piecewise_its_data(
...     N=150,
...     interruption_times=[50, 100],
...     level_changes=[3.0, -2.0],
...     slope_changes=[0.1, -0.15],
...     seed=42,
... )
>>> len(params["interruption_times"])
2

>>> # Level change only (no slope change)
>>> df, params = generate_piecewise_its_data(
...     N=100,
...     interruption_times=[50],
...     level_changes=[5.0],
...     slope_changes=[0.0],
...     seed=42,
... )