Calculating Cpk vs Ppk for Short Production Runs

Short runs (fewer than 50 observations, or fewer than 15 rational subgroups) systematically violate the asymptotic assumptions that make process capability analysis (Cp, Cpk, Pp, Ppk) trustworthy. This how-to walks through a Python workflow that keeps the numbers honest anyway: pick the right within-subgroup sigma estimator, fall back to individuals logic when subgroups are too thin, attach a confidence bound instead of quoting a bare point estimate, and diagnose the misleading signals that low-volume data throws off. Quality engineers on prototype lines, launch builds, and clinical-scale batches routinely see Ppk < Cpk, capability that swings run-to-run, or false compliance flags, none of which are necessarily measurement-system failures. They are often mathematical artifacts of how within-subgroup variation ($\sigma_{\text{within}}$) and overall variation ($\sigma_{\text{overall}}$) behave under a constrained sampling window. For where this sits among the chart families, start from the SPC Fundamentals & Control Chart Taxonomy.

The core distinction is unchanged from the full-volume case: Cpk uses a within-subgroup sigma that estimates short-term potential, while Ppk uses the overall sample deviation that captures every source of drift. What changes on a short run is that the within estimate is fragile — a handful of subgroups of size two or three cannot pin down process noise — so the choice between an X-Bar R chart estimator, an X-Bar S chart estimator, and an Individual Moving Range (I-MR) fallback stops being cosmetic and starts driving the reported number.

Prerequisites

Confirm these are in place before computing a short-run index:

Python 3.10+ with numpy >= 1.24 and scipy >= 1.10 (pip install "numpy>=1.24" "scipy>=1.10"); pandas >= 2.0 if your measurements arrive in a DataFrame
Measurements stored in production order — capability sequence matters, because the I-MR fallback reads consecutive differences
Two-sided specification limits (USL and LSL) as plain floats; for a one-sided spec, compute only the relevant CPU or CPL
Sentinel values and non-numeric payloads already mapped to NaN and dropped by the batch data validation gate before they reach the sigma estimator
A documented subgroup rationale — one machine, one setup, one material lot per subgroup — or an explicit decision to treat the data as individuals
A Gage R&R result on file, so you can attribute any Cpk/Ppk gap to the process rather than to instrument resolution
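The one-sided case mentioned in the list above can be sketched as follows. This is a minimal illustration using the usual definitions CPU = (USL − μ) / 3σ and CPL = (μ − LSL) / 3σ; `one_sided_indices` is a hypothetical helper name, not one of the functions defined later in this how-to.

```python
from typing import Optional


def one_sided_indices(mu: float, sigma: float,
                      usl: Optional[float] = None,
                      lsl: Optional[float] = None) -> dict:
    """Return CPU and/or CPL for whichever limits are supplied (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive.")
    if usl is None and lsl is None:
        raise ValueError("Supply at least one specification limit.")
    out = {}
    if usl is not None:
        out["cpu"] = (usl - mu) / (3 * sigma)   # upper one-sided index
    if lsl is not None:
        out["cpl"] = (mu - lsl) / (3 * sigma)   # lower one-sided index
    return out


# Upper-limit-only spec: only CPU is reported.
print(one_sided_indices(mu=50.0, sigma=1.0, usl=56.0))   # {'cpu': 2.0}
```

Returning only the index that has a limit behind it avoids reporting a Cpk-style minimum against a limit that does not exist.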

Why short runs break the standard indices

Cpk measures short-term potential capability from $\sigma_{\text{within}}$, typically $\overline{R}/d_2$ (average range) or $\overline{S}/c_4$ (average standard deviation). That estimator assumes the process is stable within subgroups and that between-subgroup shift is negligible. Ppk measures actual performance from $\sigma_{\text{overall}} = \operatorname{std}(\text{all observations}, \text{ddof}=1)$, which absorbs mean shifts, setup changes, and tool wear alike.

On a short run, $\sigma_{\text{within}}$ is both noisy and frequently deflated. A handful of subgroup ranges carries too few degrees of freedom to represent true process noise (a single subgroup of $n=3$ cannot reliably estimate dispersion), and a range computed inside each subgroup ignores any shift between subgroups, so Cpk inflates. Meanwhile $\sigma_{\text{overall}}$ absorbs any transient drift, driving Ppk down. The resulting gap ($\text{Cpk} > \text{Ppk}$) is a diagnostic signal of between-subgroup instability, not proof of a defective process. When rational subgrouping cannot be enforced at low volume, the correct move is to switch to individuals logic, where $\sigma_{\text{within}} = \overline{\text{MR}}/d_2$ with $d_2 = 1.128$ for a span-2 moving range.
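A small deterministic example may make the mechanism concrete. The sketch below uses made-up data (four subgroups of three with identical spread but shifted centres) and illustrative spec limits of 44 and 56; none of these numbers come from a real process.

```python
import numpy as np

D2_N3 = 1.693   # d2 constant for subgroups of size 3

# Four subgroups of three; each has the same spread but a shifted centre.
groups = np.array([[49, 50, 51],
                   [51, 52, 53],
                   [47, 48, 49],
                   [50, 51, 52]], dtype=float)
usl, lsl = 56.0, 44.0   # illustrative spec limits

mu = groups.mean()
sigma_within = np.ptp(groups, axis=1).mean() / D2_N3   # R-bar / d2, blind to the shifts
sigma_overall = groups.ravel().std(ddof=1)             # sees spread plus shifts

margin = min(usl - mu, mu - lsl)
cpk = margin / (3 * sigma_within)
ppk = margin / (3 * sigma_overall)
print(f"sigma_within={sigma_within:.3f} sigma_overall={sigma_overall:.3f}")
print(f"Cpk={cpk:.2f} Ppk={ppk:.2f}")
```

Every subgroup has a range of 2, so the within estimate is about 1.18 no matter how far the subgroup centres move, while the overall deviation is about 1.76. Cpk comes out near 1.62 and Ppk near 1.09: the gap is produced entirely by the shifts between subgroups.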

Step-by-Step Implementation

Step 1 — Choose the within-subgroup sigma estimator from the data shape

The estimator is dictated by sample structure, not preference. Pick the constant table by subgroup size: $\overline{R}/d_2$ for $n \le 8$, $\overline{S}/c_4$ for $n \ge 9$, and $\overline{\text{MR}}/d_2$ when there is one value per run or too few complete subgroups to trust a range average. Sourcing the constant from the wrong table is the single most common cause of an inflated Cpk on a short run.

Sample structure	Within sigma estimator	Key constant
Subgroups, n ≤ 8	R̄ / d₂	d₂(n=2)=1.128 … d₂(n=8)=2.847
Subgroups, n ≥ 9	S̄ / c₄	c₄(n=9)=0.9693, c₄(n=10)=0.9727
Individuals / one value per run	MR̄ / d₂ (span-2)	d₂ = 1.128

import numpy as np

D2_TABLE = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326,
            6: 2.534, 7: 2.704, 8: 2.847, 9: 2.970}


def select_estimator(n_obs: int, subgroup_size, n_subgroups: int) -> str:
    """Return the sigma-within method appropriate to the sample shape."""
    if subgroup_size is None or subgroup_size < 2 or n_subgroups < 2:
        return "imr"          # MR-bar / d2, span-2
    if subgroup_size <= 8:
        return "rbar"         # R-bar / d2
    return "sbar"             # S-bar / c4

Verify in isolation: select_estimator(12, None, 0) must return "imr", select_estimator(30, 3, 10) must return "rbar", and select_estimator(50, 10, 5) must return "sbar".
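The c₄ constants quoted in the table do not need a lookup: they follow from the closed form $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. A minimal sketch, assuming only the standard library; the function name `c4` is chosen here for illustration.

```python
from math import exp, lgamma, sqrt


def c4(n: int) -> float:
    """Unbiasing constant for the sample standard deviation of n normal values."""
    if n < 2:
        raise ValueError("c4 is defined for n >= 2.")
    # c4 = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2), via log-gamma
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))


for n in (9, 10):
    print(n, round(c4(n), 4))   # matches the tabulated 0.9693 and 0.9727
```

Working through log-gamma keeps the ratio numerically stable for large subgroup sizes, where the gamma values themselves would overflow.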

Step 2 — Compute σ_within with the I-MR fallback wired in

Guard the reshape explicitly. A trailing partial subgroup will crash reshape, and a subgroup size with no tabulated constant surfaces as a bare KeyError on the lookup. Handle both on purpose: truncate the partial tail, raise a clear ValueError when fewer than two complete subgroups remain or when the constant is missing, and never emit a number from a path that was not intended. When the sample is too thin for reliable subgroup ranges, fall back to the span-2 moving range, exactly as the I-MR chart does for one measurement per run.

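from math import exp, lgamma, sqrt


def sigma_within(data, method: str, subgroup_size=None) -> float:
    """Short-term sigma estimate by the selected method, with hard guards."""
    data = np.asarray(data, dtype=float)            # accept lists as well as arrays
    if method == "imr":
        if data.size < 2:
            raise ValueError("Need at least 2 observations for a moving range.")
        mr_bar = np.mean(np.abs(np.diff(data)))     # span-2 moving range
        return float(mr_bar / D2_TABLE[2])
    if subgroup_size is None or subgroup_size < 2:
        raise ValueError("Subgroup methods need a subgroup_size of at least 2.")
    if method != "sbar" and subgroup_size not in D2_TABLE:
        raise ValueError(f"subgroup_size {subgroup_size} has no d2 constant in D2_TABLE.")
    k = len(data) // subgroup_size
    if k < 2:
        raise ValueError(f"Only {k} complete subgroup(s); collect more data or reduce n.")
    groups = data[: k * subgroup_size].reshape(k, subgroup_size)   # drop partial tail
    if method == "sbar":                            # S-bar / c4, the route for n >= 9
        n = subgroup_size
        c4 = sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
        return float(np.mean(np.std(groups, axis=1, ddof=1)) / c4)
    r_bar = np.mean(np.ptp(groups, axis=1))         # R-bar / d2
    return float(r_bar / D2_TABLE[subgroup_size])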

Verify: on the constant series np.zeros(10), sigma_within(..., "imr") returns 0.0; on [10, 12, 11, 14] the moving range mean is 2.0, so the I-MR estimate is 2.0 / 1.128 ≈ 1.773.

Step 3 — Compute Cpk, Ppk, and their divergence

Cpk uses $\sigma_{\text{within}}$; Ppk uses $\sigma_{\text{overall}} = s$ (ddof=1). Both take the minimum of the upper and lower one-sided indices so an off-center process is not flattered by the wider tail:

$$\text{Cpk} = \min\!\left(\frac{\text{USL} - \mu}{3\,\sigma_{\text{within}}},\; \frac{\mu - \text{LSL}}{3\,\sigma_{\text{within}}}\right), \quad \text{Ppk} = \min\!\left(\frac{\text{USL} - \mu}{3\,\sigma_{\text{overall}}},\; \frac{\mu - \text{LSL}}{3\,\sigma_{\text{overall}}}\right)$$

def cpk_ppk(data: np.ndarray, usl: float, lsl: float,
            sig_within: float, sig_overall: float) -> dict:
    """Both capability indices and the diagnostic gap between them."""
    mu = float(np.mean(data))
    cpk = min(usl - mu, mu - lsl) / (3 * sig_within)
    ppk = min(usl - mu, mu - lsl) / (3 * sig_overall)
    return {"cpk": round(cpk, 3), "ppk": round(ppk, 3),
            "gap": round(cpk - ppk, 3), "mean": round(mu, 4)}

Verify: with a symmetric spec and a perfectly stable process, $\sigma_{\text{within}} \approx \sigma_{\text{overall}}$ and the gap collapses toward 0.0. A persistent positive gap is the between-subgroup instability signal from the section above.

Step 4 — Attach a 95% lower confidence bound before you publish

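For $N < 50$, a bare Cpk is not a defensible claim. Both the AIAG SPC Reference Manual and ISO 22514 expect a lower confidence bound so the report cannot over-state capability. Use the standard large-sample (Bissell) approximation, which subtracts a standard-error term from the point estimate. Taking $z_{0.975}$ gives the lower limit of a two-sided 95% interval, which is slightly more conservative than a one-sided 95% bound (that would use $z_{0.95}$). For example, Cpk = 1.67 at N = 20 yields a lower limit of about 1.12. On very short runs treat the bound as approximate and lean toward tolerance intervals (Step 5).

$$\text{Cpk}_{L,\,95\%} = \text{Cpk} - z_{0.975}\sqrt{\tfrac{1}{9N} + \tfrac{\text{Cpk}^2}{2(N-1)}}$$

from scipy.stats import norm


def cpk_lower_bound(cpk: float, n: int, conf: float = 0.95) -> float:
    """Lower limit of the two-sided `conf` interval for Cpk (Bissell approximation)."""
    if n < 2:
        raise ValueError("Need N >= 2 for a confidence bound.")
    z = norm.ppf(1 - (1 - conf) / 2)                # z_0.975 when conf = 0.95
    half_width = z * np.sqrt(1 / (9 * n) + cpk ** 2 / (2 * (n - 1)))
    return round(float(cpk - half_width), 3)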

Verify: the bound must be strictly below the point estimate, and the gap must widen as n shrinks — cpk_lower_bound(1.67, 20) sits well below cpk_lower_bound(1.67, 200).

Step 5 — Fall back to a tolerance interval when stability is unprovable

When a short run cannot demonstrate statistical stability (too few points to run Nelson or Western Electric run rules meaningfully, or a normality test that will not clear), capability indices are the wrong tool. Report a statistical tolerance interval instead: a two-sided 95% / 99% interval is constructed to contain at least 99% of the population with 95% confidence, provided the data are approximately normal and future production behaves like the sampled run. The NIST/SEMATECH e-Handbook §6.1.6 recommends exactly this substitution for constrained sampling.

def normal_tolerance_k(n: int, coverage: float = 0.99, conf: float = 0.95) -> float:
    """Two-sided normal tolerance factor k (Howe approximation)."""
    from scipy.stats import chi2, norm
    z_p = norm.ppf(0.5 + coverage / 2)
    chi = chi2.ppf(1 - conf, n - 1)
    return round(z_p * np.sqrt((n - 1) * (1 + 1 / n) / chi), 3)

The interval is then $\mu \pm k \cdot s$. Because k grows sharply as n falls, this makes the cost of a short sample explicit instead of hiding it inside an over-confident Cpk.
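To show the interval in use, the sketch below recomputes the same Howe factor in a self-contained form and compares the resulting bounds with the spec limits. The names `howe_k` and `tolerance_interval`, the ten sample values, and the limits of 44 and 56 are all illustrative assumptions, not data from a real run.

```python
import numpy as np
from scipy.stats import chi2, norm


def howe_k(n: int, coverage: float = 0.99, conf: float = 0.95) -> float:
    """Two-sided normal tolerance factor, Howe approximation (illustrative copy)."""
    if n < 2:
        raise ValueError("Need n >= 2.")
    z_p = norm.ppf(0.5 + coverage / 2)
    chi_low = chi2.ppf(1 - conf, n - 1)       # lower-tail chi-square quantile
    return float(z_p * np.sqrt((n - 1) * (1 + 1 / n) / chi_low))


def tolerance_interval(data, coverage=0.99, conf=0.95):
    """Return (lower, upper) = x-bar -/+ k * s for roughly normal data."""
    x = np.asarray(data, dtype=float)
    k = howe_k(x.size, coverage, conf)
    centre, s = x.mean(), x.std(ddof=1)
    return centre - k * s, centre + k * s


# Hypothetical short run and spec limits, for illustration only.
sample = [49.2, 50.1, 50.8, 49.6, 50.3, 51.0, 49.9, 50.4, 49.5, 50.6]
lo, hi = tolerance_interval(sample)
usl, lsl = 56.0, 44.0
print(f"tolerance interval: [{lo:.2f}, {hi:.2f}]")
print("inside spec:", lsl <= lo and hi <= usl)
```

With ten points the factor is roughly 4.4, well above the 2.58 that a known sigma would give, so the interval is wide by design. A run passes this check only if the whole interval sits inside the spec limits.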

Verification

Confirm the full path on a minimal synthetic fixture — no live data required. Build a stable short run, compute both indices with the I-MR fallback, and assert the confidence bound sits below the point estimate and the small-N bound is the more conservative one:

import numpy as np

rng = np.random.default_rng(11)
data = rng.normal(50.0, 1.0, size=18)          # short run, one value per unit
usl, lsl = 56.0, 44.0

method = select_estimator(len(data), None, 0)   # -> "imr"
sw = sigma_within(data, method)
so = data.std(ddof=1)
idx = cpk_ppk(data, usl, lsl, sw, so)
lb = cpk_lower_bound(idx["cpk"], len(data))

assert method == "imr", "short single-value run must use the I-MR fallback"
assert lb < idx["cpk"], "confidence bound must be below the point estimate"
assert cpk_lower_bound(1.67, 18) < cpk_lower_bound(1.67, 180), "small N must be more conservative"
print(f"Cpk={idx['cpk']} Ppk={idx['ppk']} gap={idx['gap']} Cpk_L95={lb} method={method}")

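Expected output has the form Cpk=… Ppk=… gap=… Cpk_L95=… method=imr. The fixture's true capability is (56 − 50) / (3 × 1) = 2.0, so both point estimates should land near 2 with visible sampling scatter at N = 18, and the lower limit should sit clearly below them. The lb < cpk assertion is load-bearing: if it ever fails, the confidence-bound term was dropped and the report would over-state capability on exactly the samples where that is most dangerous.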

Root-Cause Table

Symptom	Cause	Fix
Cpk far exceeds Ppk on a short run	$\sigma_{\text{within}}$ deflated — too few subgroups, or between-subgroup shift excluded from the range	Treat the gap as a stability signal, not capability; fall back to I-MR sigma or investigate setup/lot changes (Steps 1–2)
Cpk looks fine but auditors reject the report	A bare point estimate was quoted for N < 50	Attach the 95% lower bound and report `Cpk = 1.67 (LCL₉₅ = 1.12)`, never the point alone (Step 4)
`KeyError` or `ValueError` on the constant lookup	Subgroup size outside the R-chart range, or S̄/c₄ needed instead of R̄/d₂	Route n ≥ 9 to the S̄/c₄ estimator; the guard in Step 2 catches the rest
Reshape crash or silently dropped rows	Trailing partial subgroup, or fewer than two complete subgroups	The Step 2 guard truncates the partial tail and raises when k < 2 — do not bypass it
Both indices biased and unstable	Non-normal, skewed, or heavy-tailed short-run data quoted as if normal	Run Anderson-Darling/Shapiro-Wilk; apply Box-Cox, or switch to the tolerance interval (Step 5)
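The last row of the table points to a normality check. A minimal sketch of that screen, assuming SciPy's `shapiro`; `normality_screen` is a hypothetical helper name, and the two series are synthetic constructions chosen so the outcome does not depend on a random seed.

```python
import numpy as np
from scipy.stats import norm, shapiro


def normality_screen(data, alpha: float = 0.05) -> dict:
    """Shapiro-Wilk screen; a pass on a small sample is weak evidence only."""
    x = np.asarray(data, dtype=float)
    if x.size < 3:
        raise ValueError("Shapiro-Wilk needs at least 3 observations.")
    stat, p = shapiro(x)
    return {"n": int(x.size), "W": float(stat), "p": float(p),
            "reject_normal": bool(p < alpha)}


# Deterministic illustrations (synthetic, not production data):
# ideal normal scores versus a strongly right-skewed series.
n = 60
normal_like = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
skewed = np.exp(np.linspace(0.0, 3.0, n))

print(normality_screen(normal_like)["reject_normal"])
print(normality_screen(skewed)["reject_normal"])
```

A rejection is a solid reason to transform the data or move to the tolerance interval. A non-rejection on a short run should be read cautiously, because the test has little power at small N.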

Always pair short-run capability with a Gage R&R result: a gage consuming more than 30% of the tolerance adds measurement noise to every observation, which inflates the sigma estimates and depresses the reported indices, so a measurement problem can masquerade as a process problem. For teams pushing results back to the floor, carry the sigma method, baseline count, and every exclusion through the same audit path used when connecting Python to MES and SCADA systems.
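The size of the gage effect can be estimated with simple variance addition. The sketch below assumes the gage error is independent of the process, so variances add, and assumes the 30% figure means the gage's 6σ spread divided by the tolerance width. Both assumptions, the helper name `observed_sigma`, and all numbers are illustrative.

```python
from math import sqrt


def observed_sigma(sigma_process: float, sigma_gage: float) -> float:
    """Observed sigma when independent gage noise adds to process variation."""
    return sqrt(sigma_process ** 2 + sigma_gage ** 2)


usl, lsl = 56.0, 44.0               # illustrative spec limits
tolerance = usl - lsl
sigma_process = 1.0                 # assumed true process sigma
sigma_gage = 0.30 * tolerance / 6   # gage spread (6 sigma) equal to 30% of tolerance

sigma_obs = observed_sigma(sigma_process, sigma_gage)
true_index = tolerance / (6 * sigma_process)
seen_index = tolerance / (6 * sigma_obs)
print(f"sigma_gage={sigma_gage:.2f} sigma_obs={sigma_obs:.3f}")
print(f"index without gage noise={true_index:.2f}, as measured={seen_index:.2f}")
```

Under these assumptions a process with a true potential index of 2.0 is reported at about 1.7 purely because of the measurement system.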

FAQ

Why is Ppk lower than Cpk on my short run?

Cpk is built from a within-subgroup sigma that only sees short-term, common-cause noise, while Ppk uses the overall sample deviation that also absorbs setup shifts, tool wear, and lot changes. On a short run the within estimate is easily deflated by too few subgroup degrees of freedom, so Cpk inflates while Ppk stays honest. Read the gap as a between-subgroup instability signal and investigate what changed run-to-run before treating either number as final.

How few observations are too few for Cpk and Ppk?

Below roughly 50 individual observations or 15 rational subgroups the asymptotic assumptions weaken enough that you must report a lower confidence bound rather than a point estimate, and below about 30 the point estimate alone is not defensible at all. If you also cannot demonstrate stability, stop quoting capability indices and report a statistical tolerance interval instead. The confidence bound and the tolerance factor both widen sharply as N falls, which is the honest cost of a short sample.
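How fast the uncertainty grows can be read from the chi-square confidence interval for a standard deviation, which underlies every capability index. A minimal sketch, assuming normal data; `sigma_interval` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import chi2


def sigma_interval(s: float, n: int, conf: float = 0.95) -> tuple:
    """Two-sided confidence interval for sigma from a normal sample of size n."""
    if n < 2:
        raise ValueError("Need n >= 2.")
    alpha = 1 - conf
    df = n - 1
    lower = s * np.sqrt(df / chi2.ppf(1 - alpha / 2, df))
    upper = s * np.sqrt(df / chi2.ppf(alpha / 2, df))
    return float(lower), float(upper)


for n in (15, 30, 50, 200):
    lo, hi = sigma_interval(1.0, n)
    print(f"N={n:>3}: sigma in [{lo:.2f}, {hi:.2f}] x s, ratio {hi / lo:.2f}")
```

At N = 15 the true sigma could plausibly be anywhere from about 0.73 to 1.58 times the sample value, and any index that divides by sigma inherits that spread.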

When do I use MR̄/d₂ instead of R̄/d₂ for σ_within?

Use the span-2 moving range estimate whenever there is one value per run, or when there are too few complete subgroups (fewer than two) to trust a range average. R̄/d₂ applies to rational subgroups of size two through eight; for nine or more, switch to S̄/c₄. Picking the wrong table is the most common source of an inflated Cpk on low-volume data, so let the sample structure — not habit — choose the estimator.

Can I trust a normality-based capability index on 15 samples?

Rarely. Fifteen points give a normality test almost no power, so a non-rejection does not confirm normality — it just fails to detect the departure. Skewed or heavy-tailed data biases both Cpk and Ppk in that regime. Apply a Box-Cox or Johnson transformation if the data supports it, or switch to a percentile-based capability index or a tolerance interval that does not lean on the normal assumption.
