Phase E hygiene proper: E1 progress + E2 parity probe + E3 cookbook #13
Summary
Goal: Phase E hygiene. Closes the three filed E-tasks. Distinct from the prior quirk-fixes commit (PR #12), which lifted the sharpest-edged items; this is the formal Phase E delivery.
E1 — Verbose progress reporting (always-on heartbeat)
`enrichment.enrich_with_row_data` + `localization.build_localization_index` both gain a `heartbeat=` callback (independent of `--verbose`) that fires once per scanned pak with `entries / DTs / rows / elapsed` stats. `__main__.py` wires it to a `_stderr_log` so non-verbose users see per-pak progress during the ~10-min row-decode pass. A final `Done in Xm Ys (Z.Zs total).` summary closes every run.

Sample non-verbose output:
Paks that completed silently (0 DTs AND < 5 s) are suppressed to keep the log readable.
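The heartbeat pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: `PakStats`, `make_stderr_heartbeat`, `scan_paks`, and `format_done` are hypothetical names, and the per-pak line format is an assumption. Only the `0 DTs AND < 5 s` suppression rule, the `None` default, and the `Done in Xm Ys (Z.Zs total).` shape come from the description.

```python
import sys
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, TextIO


@dataclass
class PakStats:
    """Per-pak counters. Field names are illustrative, not the real API."""
    pak: str
    entries: int
    dts: int
    rows: int
    elapsed: float  # seconds spent scanning this pak


Heartbeat = Callable[[PakStats], None]


def should_report(stats: PakStats) -> bool:
    # Suppress paks that completed silently: 0 DTs AND under 5 s.
    return not (stats.dts == 0 and stats.elapsed < 5.0)


def make_stderr_heartbeat(stream: TextIO = sys.stderr) -> Heartbeat:
    """Build a callback that writes one progress line per reportable pak."""
    def heartbeat(stats: PakStats) -> None:
        if should_report(stats):
            print(
                f"{stats.pak}: {stats.entries} entries / {stats.dts} DTs / "
                f"{stats.rows} rows / {stats.elapsed:.1f}s",
                file=stream,
            )
    return heartbeat


def scan_paks(paks: Iterable[PakStats],
              heartbeat: Optional[Heartbeat] = None) -> int:
    """Stand-in for a scan loop: fires the callback once per pak, if given."""
    total_rows = 0
    for stats in paks:
        total_rows += stats.rows
        if heartbeat is not None:  # None default keeps library callers silent
            heartbeat(stats)
    return total_rows


def format_done(total_seconds: float) -> str:
    """Format the closing summary line from a total duration in seconds."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"Done in {minutes}m {seconds}s ({total_seconds:.1f}s total)."
```

In this sketch a CLI entry point would pass `make_stderr_heartbeat()` on every run, while a library caller that omits the argument gets no output at all.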
E2 — JSON + CSV cross-format parity probe
New `validate/probe_outputs.py` builds the catalog once (FDI-only and fast, since no row decode is needed for parity checking) and runs it through all three writer formats, comparing per-category stem sets + DataTable inventory + global totals against the in-memory truth.

Result: 8/8 categories OK; DataTable inventory 697/697 OK; JSON summary matches truth.
Markdown re-parse uses a fenced-block regex (lossier than JSON / CSV) with a ≥95% threshold to tolerate edge-case formatting; JSON and CSV match the in-memory catalog exactly. JSON Schema is a Phase F follow-on.
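A toy version of the round-trip idea is sketched below, assuming a catalog that is simply a mapping from category to a set of stems. The writers, parsers, sample data, and Markdown layout are all hypothetical stand-ins rather than the actual `probe_outputs.py` logic. The point is the shape of the check: exact equality for JSON and CSV, and a recovery-ratio threshold for the regex-based Markdown re-parse.

```python
import csv
import io
import json
import re

# Hypothetical in-memory "truth": category -> set of asset stems.
truth = {
    "items": {"sword", "shield", "stillsuit"},
    "vehicles": {"ornithopter", "sandbike"},
}

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def to_json(catalog):
    return json.dumps({cat: sorted(stems) for cat, stems in catalog.items()})


def from_json(text):
    return {cat: set(stems) for cat, stems in json.loads(text).items()}


def to_csv(catalog):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["category", "stem"])
    for cat in sorted(catalog):
        for stem in sorted(catalog[cat]):
            writer.writerow([cat, stem])
    return buf.getvalue()


def from_csv(text):
    out = {}
    for row in csv.DictReader(io.StringIO(text)):
        out.setdefault(row["category"], set()).add(row["stem"])
    return out


def to_markdown(catalog):
    parts = []
    for cat in sorted(catalog):
        body = "\n".join(sorted(catalog[cat]))
        parts.append(f"## {cat}\n\n{FENCE}\n{body}\n{FENCE}\n")
    return "\n".join(parts)


# Heading line, blank line, then one fenced block of stems per category.
MD_BLOCK = re.compile(
    r"^## (?P<cat>[^\n]+)\n\n" + FENCE + r"\n(?P<body>.*?)\n" + FENCE,
    re.S | re.M,
)


def from_markdown(text):
    return {m["cat"]: set(m["body"].splitlines())
            for m in MD_BLOCK.finditer(text)}


def exact_parity(expected, parsed):
    """Per-category check that the re-parsed stem set equals the truth."""
    return {cat: parsed.get(cat) == stems for cat, stems in expected.items()}


def threshold_parity(expected, parsed, threshold=0.95):
    """Per-category check that at least `threshold` of stems were recovered."""
    report = {}
    for cat, stems in expected.items():
        recovered = len(stems & parsed.get(cat, set()))
        ratio = recovered / len(stems) if stems else 1.0
        report[cat] = ratio >= threshold
    return report
```

Usage in this sketch would be `exact_parity(truth, from_json(to_json(truth)))` for the strict formats and `threshold_parity(truth, from_markdown(to_markdown(truth)))` for Markdown, each returning a per-category pass/fail mapping.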
E3 — End-user troubleshooting cookbook
New `tools/dune-extract/TROUBLESHOOTING.md` covering:

Each section maps a verbatim error message to its cause + smallest-possible fix. Targets adopters cold-starting on Linux native / Linux Proton / WSL / native Windows.
No regression
- `probe_pak_entries.py`
- `probe_read_uasset.py`
- `probe_datatable.py Systems.pak`
- `probe_outputs.py`

The heartbeat callback is opt-in (`None` default), so any library consumer of `enrich_with_row_data` / `build_localization_index` is unaffected.

Test plan