theta: NaN forecast intervals when len(y) == 4 #1135

@shaun0927

Observed

AutoTheta(season_length=1).forecast(y, h=3, level=[95]) returns lo-95 = nan / hi-95 = nan when len(y) == 4. A RuntimeWarning: Degrees of freedom <= 0 for slice is also emitted.

Reproducer

import numpy as np
from statsforecast.models import AutoTheta

for n in [4, 5, 6, 8, 50]:
    y = (np.arange(n).astype(float) * 0.3 +
         np.random.default_rng(0).standard_normal(n) * 0.2 + 10.0)
    out = AutoTheta(season_length=1).forecast(y, h=3, level=[95])
    width = float(out["hi-95"][0] - out["lo-95"][0])
    print(f"n={n:>3}  width@h=1 = {width!r}")

Output:

n=  4   width@h=1 = nan       # ← NaN PI
n=  5   width@h=1 = 0.040     # ← suspiciously narrow
n=  6   width@h=1 = 0.595
n=  8   width@h=1 = 0.583
n= 50   width@h=1 = 0.716

Root cause

python/statsforecast/theta.py:233:

sigma = np.std(obj["residuals"][3:], ddof=1)

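For len(y) == 4, residuals[3:] has a single element, so ddof=1 makes the divisor n - ddof equal to 0. np.std then returns NaN, and the NaN sigma propagates into the prediction intervals. For len(y) == 5 the slice has only two elements, which likely explains the suspiciously narrow interval in the output above. The [3:] burn-in matches the burn-in inside thetacalc, but it is undocumented and has no guard for short series.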
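
The failure can be shown without statsforecast. The sketch below uses a made-up residual array of length 4 (the values are assumptions for illustration, not actual model residuals) and applies the same slice and np.std call as the line quoted above:

```python
import warnings

import numpy as np

# Hypothetical residuals for a series with len(y) == 4; values are made up.
residuals = np.array([0.0, 0.1, -0.2, 0.05])

# After the 3-element burn-in only one residual is left.
tail = residuals[3:]

with warnings.catch_warnings():
    # Silence the "Degrees of freedom <= 0 for slice" RuntimeWarning here.
    warnings.simplefilter("ignore", RuntimeWarning)
    # The divisor is len(tail) - ddof = 1 - 1 = 0, so the result is NaN.
    sigma = np.std(tail, ddof=1)

print(f"len(tail) = {len(tail)}, sigma = {sigma}")
```

Any interval built by scaling a NaN sigma is itself NaN, which matches the lo-95 / hi-95 values observed.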

Suggested handling

Two reasonable options — happy to send a PR once you have a preference:

  1. Lower-bound the sample size: if len(residuals) - 3 < 4, fall back to np.std(obj["residuals"], ddof=1) and emit a UserWarning about the small sample.
  2. Adaptive burn-in: make the burn-in min(3, len(residuals) // 2) and add a short comment referencing the matching choice in thetacalc.

Option 1 is the smaller change; option 2 is more principled. Either avoids the NaN.
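
A rough sketch of both options as standalone helpers, to make the comparison concrete. The function names, the BURN_IN constant and the min_tail parameter are hypothetical and not part of statsforecast; the sketch also assumes every entry of the residual array is finite, which would need checking against thetacalc:

```python
import warnings

import numpy as np

BURN_IN = 3  # burn-in length used by the current sigma computation


def sigma_with_fallback(residuals, min_tail=4):
    """Option 1 (hypothetical helper): fall back to all residuals when
    fewer than `min_tail` remain after the burn-in."""
    residuals = np.asarray(residuals, dtype=float)
    if len(residuals) - BURN_IN < min_tail:
        warnings.warn(
            "Too few residuals after burn-in; sigma is estimated from all "
            "residuals and may be unreliable.",
            UserWarning,
        )
        return float(np.std(residuals, ddof=1))
    return float(np.std(residuals[BURN_IN:], ddof=1))


def sigma_adaptive_burn_in(residuals):
    """Option 2 (hypothetical helper): shrink the burn-in for short series."""
    residuals = np.asarray(residuals, dtype=float)
    burn_in = min(BURN_IN, len(residuals) // 2)
    return float(np.std(residuals[burn_in:], ddof=1))


# Made-up residuals for len(y) == 4, the case that currently yields NaN.
short = np.array([0.0, 0.1, -0.2, 0.05])
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    sigma_opt1 = sigma_with_fallback(short)
sigma_opt2 = sigma_adaptive_burn_in(short)
print(sigma_opt1, sigma_opt2)
```

As sketched, both helpers return a finite sigma for four residuals. Neither covers every degenerate input: option 2 still leaves a single element after the burn-in when there are fewer than three residuals, and option 1 still yields NaN for a single residual, so an explicit minimum-length check may be worth adding in either case.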
