Forecasting Basics (Baselines, Splits, and Error)

Module 5: Astrophysics & Machine Learning

This notebook teaches how to forecast a time series in a way that is useful for engineering: honest splits, simple baselines, and error measurements you can trust.

What you’ll learn

  • How to do a clean train/test split for time series
  • Three forecasting baselines you should always try first:
    • Last value (persistence)
    • Rolling mean
    • Seasonal naive (repeat yesterday)
  • How to measure error (MAE / RMSE)
  • How to plot forecasts so they’re easy to interpret

Data

We use a toy “space environment” index (inspired by NOAA Kp / solar activity) so the notebook runs offline.

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use("dark_background")
plt.rcParams.update(
    {
        "figure.dpi": 110,
        "axes.titlesize": 14,
        "axes.labelsize": 12,
        "xtick.labelsize": 10,
        "ytick.labelsize": 10,
        "legend.fontsize": 10,
    }
)

ACCENT = "#00d4ff"
DANGER = "#ff4d6d"
GOOD = "#7CFC00"

print("Environment ready. We'll build and evaluate simple forecasts.")
Environment ready. We'll build and evaluate simple forecasts.

1) Create a toy “space environment index” time series

We’ll generate hourly samples for ~30 days.

We’ll include:

  • Daily seasonality (things that repeat each day)
  • Occasional storms (rare spikes)
  • Noise

This is not a real Kp dataset — it’s just shaped like the problems you’ll see.

Code
rng = np.random.default_rng(2026)

t = pd.date_range("2026-01-01", periods=30 * 24, freq="1h", tz="UTC")
N = len(t)

# Daily seasonality (24h cycle)
hours = np.arange(N)
daily = 0.6 * np.sin(2 * np.pi * hours / 24) + 0.2 * np.cos(2 * np.pi * hours / 24)

# Baseline level + noise
baseline = 2.0
noise = 0.25 * rng.normal(size=N)

# Rare storm spikes
storm = np.zeros(N)
storm_centers = rng.choice(np.arange(24, N - 24), size=4, replace=False)
for c in storm_centers:
    width = rng.integers(6, 14)
    amp = rng.uniform(2.5, 4.5)
    window = np.arange(N)
    storm += amp * np.exp(-0.5 * ((window - c) / width) ** 2)

kp_toy = baseline + daily + storm + noise

# Clip to a plausible-ish Kp-like range (0..9)
kp_toy = np.clip(kp_toy, 0, 9)

s = pd.Series(kp_toy, index=t, name="kp_toy")

fig, ax = plt.subplots(figsize=(12, 4.2))
ax.plot(s.index, s.values, color=ACCENT, lw=1.4)
ax.set_title("Toy space environment index (hourly, ~30 days)")
ax.set_ylabel("index (0–9, toy)")
ax.grid(True, alpha=0.25)
plt.tight_layout()
plt.show()

print("Storm centers (UTC):")
for c in storm_centers:
    print(" -", s.index[c])

Storm centers (UTC):
 - 2026-01-19 09:00:00+00:00
 - 2026-01-08 10:00:00+00:00
 - 2026-01-28 20:00:00+00:00
 - 2026-01-04 01:00:00+00:00

2) Time-series train/test split (no cheating)

For forecasting, we never shuffle: shuffling would leak future values into the training set, and the scores would look better than any real forecast could be.

We’ll train on the first ~80% of the time range and test on the last ~20%.

Code
split = int(0.8 * len(s))
train = s.iloc[:split]
test = s.iloc[split:]

print("Train:", train.index[0], "→", train.index[-1], "(n=", len(train), ")")
print("Test: ", test.index[0], "→", test.index[-1], "(n=", len(test), ")")

fig, ax = plt.subplots(figsize=(12, 4.0))
ax.plot(train.index, train.values, color=GOOD, lw=1.2, label="train")
ax.plot(test.index, test.values, color=ACCENT, lw=1.2, label="test")
ax.axvline(test.index[0], color="white", alpha=0.25, ls="--")
ax.set_title("Train/test split (time order preserved)")
ax.set_ylabel("index (0–9, toy)")
ax.grid(True, alpha=0.25)
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
Train: 2026-01-01 00:00:00+00:00 → 2026-01-24 23:00:00+00:00 (n= 576 )
Test:  2026-01-25 00:00:00+00:00 → 2026-01-30 23:00:00+00:00 (n= 144 )
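
A single fixed split is the minimum. For more robust scores, you can also roll the evaluation origin forward so that each fold trains on the past and tests on the next window. Below is a minimal sketch; the walk_forward_splits helper and its parameters are our addition, and the rest of this notebook sticks to the single split above.

Code
# Rolling-origin ("walk-forward") evaluation sketch (illustrative only).
# Each fold trains on everything before its test window, so time order
# is preserved in every fold.
def walk_forward_splits(series: pd.Series, n_folds: int = 3, test_hours: int = 48):
    n = len(series)
    for i in range(n_folds):
        test_end = n - (n_folds - 1 - i) * test_hours
        yield series.iloc[: test_end - test_hours], series.iloc[test_end - test_hours : test_end]

for fold, (tr, te) in enumerate(walk_forward_splits(s)):
    print(f"fold {fold}: train ends {tr.index[-1]}, test {te.index[0]} → {te.index[-1]}")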

3) Baseline forecasts (start here)

Before fancy models, always try baselines:

  1. Last value (persistence): ŷ_t = y_{t−1}
  2. Rolling mean: the average of the last k hours
  3. Seasonal naive: repeat the value from 24 hours ago, ŷ_t = y_{t−24}

If your fancy model can’t beat these, it’s not useful.

Code
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Align predictions to the test index. Each baseline only uses values
# from before the hour being predicted, so there is no future leakage.
# 1) Persistence: predict the last observed value
pers = s.shift(1).loc[test.index]

# 2) Rolling mean (k hours)
k = 6
roll = s.rolling(k).mean().shift(1).loc[test.index]

# 3) Seasonal naive: repeat 24 hours ago
season = s.shift(24).loc[test.index]

# Drop any rows where a baseline can't be computed
valid = (~pers.isna()) & (~roll.isna()) & (~season.isna())

y = test.loc[valid].values
preds = {
    "persistence": pers.loc[valid].values,
    f"rolling_mean_{k}h": roll.loc[valid].values,
    "seasonal_naive_24h": season.loc[valid].values,
}

rows = []
for name, yhat in preds.items():
    rows.append({"model": name, "MAE": mae(y, yhat), "RMSE": rmse(y, yhat)})

df_metrics = pd.DataFrame(rows).set_index("model")
df_metrics = df_metrics.round(3)

display(df_metrics)
model                MAE    RMSE
persistence          0.296  0.362
rolling_mean_6h      0.435  0.545
seasonal_naive_24h   1.324  1.842
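
One convenient way to express “beats the baseline” as a single number is a skill score: the fractional RMSE improvement over a reference forecast, here persistence. This helper is an illustrative aside and isn’t used later in the notebook.

Code
# Skill score vs. the persistence baseline (illustrative aside):
# skill = 1 - RMSE(model) / RMSE(reference); 0 means "no better than
# the reference", positive means better, negative means worse.
ref_rmse = df_metrics.loc["persistence", "RMSE"]
skill = (1.0 - df_metrics["RMSE"] / ref_rmse).rename("skill_vs_persistence")
print(skill.round(3))
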
Code
# Visual comparison of forecasts on the test window

fig, ax = plt.subplots(figsize=(12, 4.4))
ax.plot(test.index, test.values, color=ACCENT, lw=1.4, label="truth (test)")

ax.plot(test.index[valid], preds["persistence"], color="#ffd166", lw=1.1, alpha=0.9, label="persistence")
ax.plot(test.index[valid], preds[f"rolling_mean_{k}h"], color="#c77dff", lw=1.1, alpha=0.9, label=f"rolling mean ({k}h)")
ax.plot(test.index[valid], preds["seasonal_naive_24h"], color=GOOD, lw=1.1, alpha=0.9, label="seasonal naive (24h)")

ax.set_title("Baseline forecasts (test window)")
ax.set_ylabel("index (0–9, toy)")
ax.grid(True, alpha=0.25)
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()

4) Error inspection (a plot is worth it)

Metrics are a summary. You also want to see:

  • where errors spike (storms?)
  • whether a model systematically lags or leads

We’ll plot the residuals for the best baseline.

Code
best = df_metrics.sort_values("RMSE").index[0]
print("Best baseline by RMSE:", best)

best_pred = preds[best]
resid = y - best_pred

fig, axes = plt.subplots(2, 1, figsize=(12, 6.4), sharex=False)

axes[0].plot(test.index[valid], resid, color=DANGER, lw=1.1)
axes[0].axhline(0, color="white", alpha=0.25)
axes[0].set_title(f"Residuals over time — {best}")
axes[0].set_ylabel("truth - pred")
axes[0].grid(True, alpha=0.25)

axes[1].hist(resid, bins=30, color=ACCENT, alpha=0.85)
axes[1].set_title("Residual distribution")
axes[1].set_xlabel("truth - pred")
axes[1].set_ylabel("count")
axes[1].grid(True, alpha=0.25)

plt.tight_layout()
plt.show()
Best baseline by RMSE: persistence
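
A quick numeric check for systematic lag: if consecutive residuals point the same way (positive autocorrelation), the forecast is trailing the truth rather than missing at random. A minimal sketch; the choice of lags here is our own.

Code
# Residual autocorrelation at a few lags (illustrative aside).
# Strongly positive values mean errors persist from hour to hour,
# i.e. the forecast systematically lags the signal.
for lag in (1, 2, 3, 24):
    r = np.corrcoef(resid[:-lag], resid[lag:])[0, 1]
    print(f"lag {lag:>2}h: residual autocorr = {r:+.3f}")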

5) What can I do next?

  • Exercise A: change k (the rolling window). When does the rolling mean become too sluggish to track the signal?
  • Exercise B: add more storms (increase size=4 in rng.choice). Which baseline handles storms best?
  • Exercise C: build a tiny “feature model” (a starting sketch follows this list):
    • Use sin/cos of hour-of-day as features
    • Fit a linear regression (optional)
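
For Exercise C, here is one possible starting point using only numpy (a sketch, not a reference solution). It encodes hour-of-day as sin/cos, fits ordinary least squares on the training window, and scores on the test window.

Code
# A possible starting point for Exercise C (sketch, not a reference solution).
# Features: intercept + sin/cos of hour-of-day. Model: ordinary least squares.
hour = s.index.hour.values
X = np.column_stack(
    [
        np.ones(len(s)),
        np.sin(2 * np.pi * hour / 24),
        np.cos(2 * np.pi * hour / 24),
    ]
)

X_train, X_test = X[:split], X[split:]
coef, *_ = np.linalg.lstsq(X_train, train.values, rcond=None)
yhat = X_test @ coef

print("feature model  MAE:", round(mae(test.values, yhat), 3))
print("feature model RMSE:", round(rmse(test.values, yhat), 3))

This model only learns the mean level and the daily cycle, so it will miss storms entirely; compare its scores against the baselines above.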

Next: in the project track, you can connect these ideas to real NOAA/NASA/ESA time series.