How to Automate MES Data Extraction with REST APIs in Python

Pulling quality measurements from a Manufacturing Execution System over REST looks trivial until a token expires mid-pagination and half a shift's data silently disappears, or a vendor appends a field and your parser starts coercing torque readings to NaN. Both faults are invisible at the HTTP layer and only surface downstream as false out-of-control signals and deflated capability indices. This how-to is the extraction step of connecting Python to MES and SCADA systems within the broader manufacturing data ingestion and preprocessing pipeline: it builds a resilient REST client that survives token expiry, cursor pagination, rate limits, and schema drift, and hands clean, audit-traceable batches to the batch data validation gate before any control limit is ever computed.

The design goal is that the extractor never fabricates or silently drops a measurement: every page is either fully retrieved with a valid token or the job fails loudly, and every record keeps its MES transaction identity so a non-conformance investigation can trace it back to the shop floor.

Prerequisites

Confirm these are in place before running the extractor:

Python 3.10+ with requests >= 2.31, pandas >= 2.0, and pyarrow installed (pip install "requests>=2.31" "pandas>=2.0" pyarrow)
MES REST credentials: an OAuth 2.0 client-credentials pair (or a service-account JWT) with read scope on the quality endpoints
The base URL, the token endpoint path, and the measurement endpoint path from the MES API reference
The pagination style your MES uses (cursor/opaque token vs. offset+limit) — the loop below assumes cursor pagination and notes the offset variant
A documented canonical field mapping: which vendor fields map to measurement_value, timestamp_utc, station_id, and batch_lot
The intended chart type known in advance, because subgroup rules differ for an X-Bar R chart versus an I-MR chart, and the extractor should carry the grouping key through untouched
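A minimal sketch for keeping those prerequisites out of the source code, assuming the settings arrive as environment variables. The variable names (MES_BASE_URL, MES_CLIENT_ID, MES_CLIENT_SECRET, MES_TOKEN_PATH, MES_MEASUREMENT_PATH), the MESConfig and load_config names, and the default paths are all hypothetical. Substitute whatever your MES API reference and secrets store actually use.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class MESConfig:
    """Connection settings for the extractor (field names are illustrative)."""
    base_url: str
    client_id: str
    client_secret: str
    token_path: str = "/oauth/token"       # hypothetical default; take it from the API reference
    measurement_path: str = "/quality"     # hypothetical default


REQUIRED = ("MES_BASE_URL", "MES_CLIENT_ID", "MES_CLIENT_SECRET")


def load_config(env: Mapping[str, str] = os.environ) -> MESConfig:
    """Fail loudly at startup if a required setting is missing, not halfway through a pull."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing MES settings: {', '.join(missing)}")
    return MESConfig(
        base_url=env["MES_BASE_URL"],
        client_id=env["MES_CLIENT_ID"],
        client_secret=env["MES_CLIENT_SECRET"],
        token_path=env.get("MES_TOKEN_PATH", "/oauth/token"),
        measurement_path=env.get("MES_MEASUREMENT_PATH", "/quality"),
    )
```

Checking configuration up front means a missing secret surfaces as one clear startup error instead of a failed token request partway through a shift's extraction.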

Step-by-Step Implementation

Step 1 — Build a session that refreshes its own token

The most common production failure is a multi-page pull that outlives the access token: a 401 Unauthorized arrives partway through, and if the loop swallows it the result is a truncated dataset that looks like a real process shift. Wrap the session so it refreshes proactively (with a safety buffer before expiry) and re-authenticates defensively if the server revokes a token early. Mount a Retry adapter so transient 429/5xx responses back off instead of aborting the batch.

import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class MESClient:
    """REST client for MES quality endpoints with self-refreshing OAuth."""

    def __init__(self, base_url: str, client_id: str, client_secret: str,
                 token_path: str = "/oauth/token"):
        self.base_url = base_url.rstrip("/")
        self.token_path = token_path              # take the real path from your MES API reference
        self.session = requests.Session()
        retry = Retry(
            total=3,
            backoff_factor=1.0,               # urllib3 sleeps 0s, 2s, 4s between retries
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=frozenset(["GET", "POST"]),
        )                                     # Retry-After on 429/503 is honored by default
        self.session.mount("https://", HTTPAdapter(max_retries=retry))
        self._auth = (client_id, client_secret)
        self._token: str = ""
        self._token_expiry: float = 0.0       # monotonic-clock deadline for proactive refresh

    def _refresh_token(self) -> None:
        resp = self.session.post(
            f"{self.base_url}{self.token_path}",
            auth=self._auth,                               # HTTP Basic client authentication
            data={"grant_type": "client_credentials"},     # required in an OAuth 2.0 token request
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        self._token = payload["access_token"]
        # Refresh 30s before the server's stated expiry to avoid mid-page 401s.
        # monotonic() is immune to wall-clock jumps such as NTP step corrections.
        self._token_expiry = time.monotonic() + float(payload["expires_in"]) - 30

    def _send(self, endpoint: str, params: dict | None) -> requests.Response:
        headers = {"Authorization": f"Bearer {self._token}"}
        return self.session.get(
            f"{self.base_url}{endpoint}", headers=headers, params=params, timeout=30
        )

    def get(self, endpoint: str, params: dict | None = None) -> dict:
        if time.monotonic() >= self._token_expiry:      # proactive refresh
            self._refresh_token()
        resp = self._send(endpoint, params)
        if resp.status_code == 401:                     # token revoked early: re-auth once
            self._refresh_token()
            resp = self._send(endpoint, params)
        resp.raise_for_status()                         # a second 401 fails the job loudly
        return resp.json()

Verify this step in isolation by requesting a single page twice with a short-lived token (one whose expires_in minus the 30 s buffer is shorter than the pause between the two calls): the second call must trigger a refresh rather than a 401.
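The refresh rule is easier to test offline when the clock is injectable. The sketch below isolates just that rule in a hypothetical TokenCache class (it is not part of MESClient above), with a fake fetch function standing in for the token endpoint. It shows that a 60 s token with a 30 s buffer is reused before the 30 s mark and refreshed exactly at it.

```python
import time
from collections.abc import Callable


class TokenCache:
    """Hypothetical helper: caches a bearer token and refreshes it buffer_s before expiry."""

    def __init__(self, fetch: Callable[[], tuple[str, float]],
                 buffer_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # returns (access_token, expires_in_seconds)
        self._buffer = buffer_s
        self._clock = clock
        self._token = ""
        self._deadline = float("-inf")   # forces a fetch on first use

    def token(self) -> str:
        now = self._clock()
        if now >= self._deadline:
            self._token, expires_in = self._fetch()
            # Next refresh is due buffer_s seconds before the server-side expiry.
            self._deadline = now + expires_in - self._buffer
        return self._token
```

Driving it with a fake clock (for example `clock=lambda: now[0]` over a one-element list) lets a unit test step time forward deterministically instead of sleeping.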

Step 2 — Walk every page with a bounded cursor loop

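MES APIs paginate with an opaque cursor or an offset/limit pair, and page-size ceilings are often undocumented. Drive the loop off the server-returned cursor and stop only when the payload signals exhaustion; never assume a fixed page count. Respect the server's remaining rate-limit budget so you pause before it returns 429 rather than after. Many APIs report that budget in an X-RateLimit-Remaining header; the loop below assumes the MES echoes it in the payload as meta.rate_limit_remaining, with meta.rate_limit_reset giving the seconds until the window resets.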

def iter_pages(client: MESClient, endpoint: str, page_size: int = 1000):
    """Yield each page of records, honoring the server's cursor and rate limit."""
    params = {"limit": page_size, "cursor": None}   # requests drops None values, so page 1 sends no cursor
    while True:
        payload = client.get(endpoint, params=params)
        records = payload.get("data", [])
        if not records:
            break
        yield records

        cursor = payload.get("next_cursor")
        if not cursor:                              # last page reached
            break
        params["cursor"] = cursor
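        # Proactive throttle: pause only when the server reports a nearly exhausted window.
        # Assumes meta.rate_limit_reset is seconds until reset; check your vendor's convention.
        meta = payload.get("meta") or {}
        remaining = meta.get("rate_limit_remaining")
        if remaining is not None and int(remaining) <= 1:
            time.sleep(float(meta.get("rate_limit_reset", 1)))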

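The loop above reads the rate-limit budget from the JSON payload. If your MES reports it in response headers instead, the throttle decision can be computed from the headers. A minimal sketch, assuming X-RateLimit-Remaining plus an X-RateLimit-Reset header that carries seconds until the window resets; vendors differ (some send an epoch timestamp), so confirm the convention first. The throttle_delay helper is hypothetical, and using it requires access to the requests.Response, which MESClient.get above does not return.

```python
from collections.abc import Mapping


def throttle_delay(headers: Mapping[str, str], threshold: int = 1,
                   default_wait: float = 1.0) -> float:
    """Seconds to sleep before the next request, or 0.0 if the budget is healthy.

    Assumes X-RateLimit-Reset is seconds-until-reset (a vendor-specific convention).
    """
    # requests exposes resp.headers as a case-insensitive dict; plain dicts need exact keys.
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is None or int(remaining) > threshold:
        return 0.0
    reset = headers.get("X-RateLimit-Reset")
    return max(float(reset), 0.0) if reset is not None else default_wait
```

In a header-aware client the call site would be `time.sleep(throttle_delay(resp.headers))` after each page.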
For an offset-based MES, replace the cursor with params["offset"] += page_size and break when len(records) < page_size. The stop condition is load-bearing: an off-by-one that re-requests the final page duplicates rows and biases the grand mean.
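A minimal sketch of the offset variant, assuming the MES accepts offset and limit query parameters and returns rows under a data key (both assumptions; check your API reference). The get argument is any callable with the same (endpoint, params) shape as MESClient.get, and iter_offset_pages is a hypothetical name. A short or empty page ends the loop, so the final page is never re-requested.

```python
from collections.abc import Callable, Iterator


def iter_offset_pages(
    get: Callable[[str, dict], dict], endpoint: str, page_size: int = 1000
) -> Iterator[list[dict]]:
    """Yield pages from an offset/limit API; stop on the first short or empty page."""
    offset = 0
    while True:
        records = get(endpoint, {"limit": page_size, "offset": offset}).get("data", [])
        if records:
            yield records
        if len(records) < page_size:    # short (or empty) page: nothing left to fetch
            break
        offset += page_size
```

When the row count is an exact multiple of page_size, the loop costs one extra request that returns an empty page; that is the price of never guessing the total.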

Step 3 — Map to a canonical schema and quarantine drift

Vendors append fields and deprecate measurement tags without versioning the endpoint, so validate every page against a strict contract instead of trusting whatever JSON arrives. Map incoming payloads to a canonical internal shape that isolates the SPC-relevant fields; route records that fail the contract to a quarantine list with their MES identity intact rather than failing the whole batch.

import pandas as pd

CANONICAL = {                       # vendor field  -> canonical field
    "meas_val": "measurement_value",
    "ts": "timestamp_utc",
    "stn": "station_id",
    "lot": "batch_lot",
}


def to_canonical(records: list[dict]) -> tuple[pd.DataFrame, list[dict]]:
    """Return (clean canonical frame, quarantined records that broke the contract)."""
    clean, quarantine = [], []
    for r in records:
        try:
            row = {canon: r[vendor] for vendor, canon in CANONICAL.items()}
        except KeyError as err:     # schema drift: a mapped field vanished
            quarantine.append({"reason": "MISSING_FIELD", "field": err.args[0], "record": r})
            continue
        clean.append(row)
    df = pd.DataFrame(clean, columns=list(CANONICAL.values()))
    return df, quarantine

Isolating SPC fields here means a new marketing tag or a renamed audit column added upstream can never shift the column positions your control chart code depends on.
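The to_canonical function only notices a mapped field that vanishes. Appended or renamed fields pass through unnoticed, which is harmless for the SPC frame but worth logging: a rename usually shows up as one field going missing while another unexpected one appears on the same records. A minimal sketch of a per-page drift report, assuming the expected set is the vendor keys of CANONICAL (e.g. `set(CANONICAL)`) plus any non-SPC fields you have documented; drift_report is a hypothetical name.

```python
from collections import Counter


def drift_report(records: list[dict], expected: set[str]) -> dict[str, Counter]:
    """Count, per page, fields missing from or unexpected in the vendor payload."""
    missing: Counter = Counter()
    unexpected: Counter = Counter()
    for record in records:
        keys = set(record)
        missing.update(expected - keys)      # each absent expected field counted once per record
        unexpected.update(keys - expected)   # each undocumented field counted once per record
    return {"missing": missing, "unexpected": unexpected}
```

A page where `lot` is missing and `batch` is unexpected on the same records is the signature of a vendor rename rather than random data loss.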

Step 4 — Stream to memory-safe, audit-ready chunks

Months of high-frequency telemetry will exhaust RAM if loaded into one DataFrame. Compose Steps 2–3 into a generator that yields typed, downcast chunks and persists them to Parquet partitioned by date and station. Columnar, compressed storage can cut disk I/O by 60–80% versus CSV, depending on the data, while preserving the schema for downstream stages.

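import pyarrow as pa
import pyarrow.parquet as pq


def stream_mes_to_spc(client: MESClient, endpoint: str, out_dir: str, page_size: int = 1000):
    """Extract, canonicalize, downcast, and persist MES data in bounded chunks.

    log_quarantine is your audit sink (a hypothetical helper, not defined here).
    """
    for records in iter_pages(client, endpoint, page_size=page_size):
        df, quarantine = to_canonical(records)

        # Parse types, but quarantine rows that fail instead of letting NaN/NaT reach the charts.
        values = pd.to_numeric(df["measurement_value"], errors="coerce")
        stamps = pd.to_datetime(df["timestamp_utc"], utc=True, errors="coerce")
        bad = values.isna() | stamps.isna()
        quarantine += [{"reason": "UNPARSEABLE", "record": row}
                       for row in df[bad].to_dict("records")]
        if quarantine:
            log_quarantine(quarantine)          # never silently discard

        df = df[~bad].copy()
        if df.empty:                            # nothing valid to persist on this page
            continue
        # Deliberate, precision-checked downcast: verify float32 holds your tolerance.
        df["measurement_value"] = pd.to_numeric(values[~bad], downcast="float")
        df["timestamp_utc"] = stamps[~bad]
        df["station_id"] = df["station_id"].astype("category")
        df["date"] = df["timestamp_utc"].dt.strftime("%Y-%m-%d")   # partition key

        pq.write_to_dataset(
            pa.Table.from_pandas(df, preserve_index=False),
            root_path=out_dir,
            partition_cols=["date", "station_id"],
        )
        yield df                                  # also available for live dashboards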

Verification

Confirm the extraction contract holds with a minimal offline fixture — no live MES required. Stub the paged responses and assert that pagination stops cleanly, drift is quarantined, and canonical columns survive:

class _FakeClient:
    def __init__(self, pages):
        self._pages = pages
        self._i = 0

    def get(self, endpoint, params=None):
        page = self._pages[self._i]
        self._i += 1
        return page


pages = [
    {"data": [{"meas_val": 50.2, "ts": "2026-07-01T08:00:00Z", "stn": "ST-1", "lot": "L1"}],
     "next_cursor": "c2"},
    {"data": [{"meas_val": 51.0, "ts": "2026-07-01T08:00:01Z", "stn": "ST-1"}],  # drift: no 'lot'
     "next_cursor": None},
]

seen = list(iter_pages(_FakeClient(pages), "/quality"))
assert len(seen) == 2                              # both pages walked, then stopped

df, quarantine = to_canonical(seen[1])
assert df.empty                                    # drifted record excluded from clean frame
assert quarantine[0]["reason"] == "MISSING_FIELD"  # and preserved for audit
print("extraction contract holds")

Expected output: extraction contract holds. The quarantine assertion is the load-bearing one — an extractor that drops the drifted record without a trace severs the link to the MES transaction and makes the eventual root-cause investigation impossible.

Root-Cause Table

Symptom	Cause	Fix
Dataset truncates mid-shift with no error	Access token expired between pages and the loop swallowed the resulting `401`	Refresh proactively with a pre-expiry buffer and re-auth defensively on any `401` (Step 1)
Duplicate rows inflate the subgroup count	Cursor loop re-requested the final page (off-by-one stop condition)	Break on empty `data` or absent `next_cursor`; for offset APIs, stop when `len(records) < page_size` (Step 2)
Torque values arrive as `NaN` after extraction	Vendor renamed or appended a field; positional parsing coerced the wrong column	Map to a canonical schema by name and quarantine records that break the contract (Step 3)
Job crashes with `429 Too Many Requests` under load	Fixed-rate polling ignored the rate-limit window	Mount a `Retry` adapter and pause on low `X-RateLimit-Remaining` before the window closes (Steps 1–2)
`MemoryError` loading a multi-month pull	Whole response accumulated into one DataFrame	Stream bounded chunks, downcast dtypes, and persist to partitioned Parquet (Step 4)

Never blindly impute the gaps a failed page leaves behind: forward-fill only short gaps and flag longer ones for the downstream stage, since imputation across a maintenance window distorts Cp/Cpk and masks true special-cause variation. Validated batches then flow to timestamp reconciliation via the time-series alignment pipeline and to the missing-value policy for handling missing values in quality data. For compliance, log every quarantined record with a reason code and timestamp so the electronic batch record stays defensible (21 CFR Part 11; AIAG SPC Reference Manual, ch. I on data integrity; IATF 16949 §7.5.3 on control of documented information).
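The short-gap rule can be sketched with the standard library, assuming a regularly sampled series where None marks a missing reading and a gap counts as "short" if it spans at most max_gap consecutive samples. That threshold is yours to set from the process; it is not a standard value, and fill_short_gaps is a hypothetical name. Short gaps are forward-filled; longer ones, and any leading gap with nothing to carry forward, stay None and are returned as inclusive index ranges for the downstream stage. Note that pandas' `ffill(limit=...)` is not equivalent: it partially fills long gaps up to the limit instead of leaving them untouched.

```python
from typing import Optional


def fill_short_gaps(values: list[Optional[float]], max_gap: int
                    ) -> tuple[list[Optional[float]], list[tuple[int, int]]]:
    """Forward-fill gaps of at most max_gap samples; return unfilled gaps as (start, end)."""
    filled = list(values)
    flagged: list[tuple[int, int]] = []
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < len(filled) and filled[i] is None:
            i += 1
        end = i - 1                                   # inclusive end of this gap
        has_anchor = start > 0                        # a leading gap has nothing to carry forward
        if has_anchor and end - start + 1 <= max_gap:
            for j in range(start, end + 1):
                filled[j] = filled[start - 1]
        else:
            flagged.append((start, end))              # leave as None and flag it
    return filled, flagged
```

For example, with `max_gap=1` the series `[1.0, None, 2.0, None, None, None, 3.0, None]` has its single-sample gaps filled while the three-sample gap at indices 3–5 is left empty and flagged.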

FAQ

Why refresh the token 30 seconds before it actually expires?

Because a long pagination run can straddle the exact expiry instant: the client checks the clock, sees a still-valid token, sends the request, and the server rejects it as expired by the time it arrives. A pre-expiry buffer refreshes while the current token is unambiguously valid, so no page is ever requested with a token the server will reject. The defensive 401 re-auth in Step 1 is the backstop for early revocation, not the primary mechanism.

Cursor pagination or offset/limit — does it matter for SPC data?

Yes. Offset pagination re-scans from the start on every page, so if new rows are inserted mid-pull the offsets shift and you can skip or duplicate measurements — a silent bias in the grand mean. Cursor pagination anchors to a stable position and is safe against concurrent inserts. Prefer cursor when the MES offers it; if you must use offset, snapshot the query with a fixed upper time bound so the result set cannot grow underneath the loop.

Is downcasting `measurement_value` to float32 safe?

Only after you check it against your measurement tolerance. float32 carries roughly seven significant decimal digits, which is ample for most gauge resolutions, but a high-precision CMM reporting to sub-micron tolerances can lose meaningful digits. Verify that the round-trip through float32 preserves your smallest significant increment before committing to it; when in doubt, keep float64 and pay the memory cost.
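One way to run that check with the standard library, assuming "smallest significant increment" means the gauge resolution in the same units as the readings: pack each value to IEEE 754 single precision with `struct`, unpack it, and require the round-trip error to stay within half the resolution. The half-resolution criterion is a judgment call rather than a standard, and both function names are hypothetical.

```python
import struct
from collections.abc import Iterable


def to_float32(x: float) -> float:
    """Round-trip a Python float (float64) through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_is_safe(values: Iterable[float], resolution: float) -> bool:
    """True if every value survives float32 within half of the gauge resolution."""
    return all(abs(v - to_float32(v)) <= resolution / 2 for v in values)
```

With readings in millimetres, `float32_is_safe([50.213, 49.987], resolution=0.001)` is True, while `float32_is_safe([1500.12345], resolution=0.00001)` is False: float32 values near 1500 are spaced about 1.2e-4 apart, far coarser than a 10 nm resolution.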

Should the extractor drop records that fail schema validation?

No — quarantine them with a reason code and their original MES identity, never delete them. A dropped record is invisible to the eventual investigation, whereas a quarantined one preserves the audit trail and lets you measure drift rate over time. A quarantine rate climbing across successive pulls is an early warning that the vendor changed the payload, and catching it before the charts update prevents a contaminated baseline.
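A minimal sketch of that early warning, assuming you keep one (quarantined, total) count per pull, oldest first. It alerts when the latest rate exceeds an absolute ceiling or has risen on each of the last rising_pulls pulls; quarantine_alert and both default thresholds are hypothetical and should be tuned to your vendor's normal noise.

```python
def quarantine_alert(history: list[tuple[int, int]], ceiling: float = 0.01,
                     rising_pulls: int = 3) -> bool:
    """history holds (quarantined, total) per pull, oldest first."""
    rates = [q / t for q, t in history if t > 0]   # skip empty pulls
    if not rates:
        return False
    if rates[-1] > ceiling:                        # absolute ceiling breached on the latest pull
        return True
    recent = rates[-(rising_pulls + 1):]           # need rising_pulls consecutive increases
    return len(recent) == rising_pulls + 1 and all(
        later > earlier for earlier, later in zip(recent, recent[1:])
    )
```

The trend rule catches a slow climb such as 0.0%, 0.2%, 0.4%, 0.6% that never crosses the ceiling on its own, which is exactly the pattern of a vendor rolling out a payload change across stations.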
