B4: wire row decode into catalog + activate FieldFilter end-to-end #7

Merged
Sponge merged 1 commit from feature/dune-extract-b4-catalog-wiring into develop 2026-05-28 02:24:01 +00:00

Summary

Final Phase B task. Wires B3's DataTable row-decode output through FieldFilter into catalog_writer — every row that B1+B2+B3 can reach now flows into Markdown / JSON / CSV catalog outputs.

What landed

  • uasset_parser.Summary.folder_name: the package's /Game/... path. The row scan identifies a parsed .uasset by its own ground truth, bypassing the B1-FU virtual-vs-physical mapping.
  • enrichment.CategoryCatalog.rows: dict[DataTable_filename, {row_name: {field: value}}], plus a total_rows property.
  • enrichment.enrich_with_row_data(): walks every pak's physical entries, parses each DataTable-class .uasset, pairs it with the physically adjacent .uexp, filters each row through FieldFilter.filter_row, and attaches the result to the matching category.
  • __main__: calls enrich_with_row_data() after build_catalog, reports row stats, and warns when no Oodle backend is available. Setting DUNE_EXTRACT_SKIP_ROWS=1 disables the row pass.
  • catalog_writer: emits rows in all three formats:
    • Markdown: per-category 'DataTable row contents' section with column-per-field tables (✓/✗ for bools, backtick-quoted key for FText), plus a 'Rows decoded' column in the inventory.
    • JSON: categories[cat].datatable_rows + rows_decoded.
    • CSV: long-format items-rows.csv.
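
As an illustration of the rows shape described above, here is a minimal sketch of a catalog object carrying `rows` and `total_rows`. Only those two names come from this PR; the `name` field and the class body are assumptions, and the real `enrichment.CategoryCatalog` may differ.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for enrichment.CategoryCatalog. Only `rows` and
# `total_rows` are named in the PR; the `name` field is assumed.
@dataclass
class CategoryCatalog:
    name: str
    # DataTable filename -> {row_name: {field: value}}
    rows: dict[str, dict[str, dict[str, Any]]] = field(default_factory=dict)

    @property
    def total_rows(self) -> int:
        """Count decoded rows across every DataTable in this category."""
        return sum(len(table) for table in self.rows.values())


weapons = CategoryCatalog(name="Weapons")
# Values taken from the Damage row quoted in this PR.
weapons.rows["DT_WeaponStats"] = {
    "Damage": {
        "StatDisplayName": "UI/ItemStat_Damage",
        "StatStep": 0.1,
        "bIncreasingIsGoodForPlayer": True,
    }
}
print(weapons.total_rows)
```

Keeping `total_rows` as a derived property means the count cannot drift from the dict contents when rows are attached later.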

End-to-end results

FDI parse: 131,654 file entries across 37 paks  (3.3s)
Categorized 25,618 stems into 8 categories
Row data: 14 DataTables parsed, 159 rows recovered (ooz backend)
  Dungeons.pak  DT_NPCRecipes_DNG_Pit / _Pit_2   (50 rows)
  Input.pak     DT_IA_* / DT_IMC_*               (69 rows across 10 tables)
  Systems.pak   DT_ArmorStats (4) + DT_WeaponStats (36)
Wrote 9 .md + items-catalog.json + items-rows.csv (872 long-format rows)

The Weapons catalog now carries the DT_WeaponStats Damage row: StatDisplayName=UI/ItemStat_Damage, StatStep=0.1, bIncreasingIsGoodForPlayer=✓.
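
To make the long-format CSV concrete, the sketch below flattens the Damage row quoted above into one line per field. The column order follows the commit message (category, datatable, row_name, field_name, value). The helper names, the header row, and the way bools are stringified are assumptions; the real `catalog_writer` is not shown in this PR.

```python
import csv
import io

# Column names as listed in the commit message for items-rows.csv.
CSV_COLUMNS = ["category", "datatable", "row_name", "field_name", "value"]


def iter_long_rows(category, tables):
    """Yield one record per (row, field) pair. Hypothetical helper."""
    for datatable, rows in tables.items():
        for row_name, fields in rows.items():
            for field_name, value in fields.items():
                yield [category, datatable, row_name, field_name, value]


def to_long_csv(category, tables):
    """Render the records as CSV text. Writing a header row is an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(CSV_COLUMNS)
    writer.writerows(iter_long_rows(category, tables))
    return buf.getvalue()


tables = {
    "DT_WeaponStats": {
        "Damage": {
            "StatDisplayName": "UI/ItemStat_Damage",
            "StatStep": 0.1,
            # Python's default bool rendering; the real writer may differ.
            "bIncreasingIsGoodForPlayer": True,
        }
    }
}
print(to_long_csv("Weapons", tables))
```

One decoded row with three fields becomes three CSV records, which is why 159 rows expand to several hundred long-format lines.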

FieldFilter activation

Genuinely live for the first time — every row dict is .filter_row()-ed before publication. --include-asset-paths / --include-lore / --include-dev-commentary now affect real output. The 14 currently-reached tables are stat-definition tables (mechanical fields), so the flags don't visibly change THIS run; they matter once richer item tables unlock.
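
The sketch below shows why the flags can be live without changing this run's output. It is not the real `FieldFilter`: the classification rules, the field names `FlavorText` and `DevComment`, and the opt-in defaults are all assumptions made for illustration. Only the `filter_row` method name and the three `--include-*` flags come from this PR.

```python
from dataclasses import dataclass


# Hypothetical filter. The real FieldFilter's rules are not shown in this PR.
@dataclass(frozen=True)
class FieldFilter:
    include_asset_paths: bool = False      # assumed default: off
    include_lore: bool = False             # assumed default: off
    include_dev_commentary: bool = False   # assumed default: off

    def _allowed(self, field_name, value):
        # Assumed rule: string values under /Game/ count as asset paths.
        if isinstance(value, str) and value.startswith("/Game/"):
            return self.include_asset_paths
        # Assumed field names for lore and developer commentary.
        if field_name == "FlavorText":
            return self.include_lore
        if field_name == "DevComment":
            return self.include_dev_commentary
        return True  # mechanical fields always pass

    def filter_row(self, row):
        """Return a copy of the row without the fields that are filtered out."""
        return {k: v for k, v in row.items() if self._allowed(k, v)}


strict = FieldFilter()
permissive = FieldFilter(include_asset_paths=True, include_lore=True,
                         include_dev_commentary=True)

# A stat-definition row (values from this PR) has only mechanical fields.
stat_row = {"StatDisplayName": "UI/ItemStat_Damage", "StatStep": 0.1}
# An invented item row of the kind richer tables might carry.
item_row = {"Weight": 2.5, "Icon": "/Game/UI/Icons/Example", "FlavorText": "..."}

print(strict.filter_row(stat_row) == permissive.filter_row(stat_row))
print(strict.filter_row(item_row))
```

Under these assumed rules the stat row passes through unchanged with any flag combination, while the invented item row loses its path and lore fields unless the flags are set.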

No regression

| Probe | Result |
|---|---|
| B1 (probe_pak_entries.py) | Abilities 575/812 decoded, 574/574 (100%) dual-header adjacency |
| B2 (probe_read_uasset.py) | 4/4 (header MATCH, uncompressed magic, NIST AES, Oodle 60/60) |
| B3 (probe_datatable.py Systems.pak) | 5 DataTables / 60 rows parsed, value-type spread intact |
| A1 dry-run | clean; parser reports implemented |

Honest coverage note

The B1 virtual-vs-physical bottleneck that bounded B3 also bounds B4: 14 DataTables reached out of ~1,725 inventoried in the FDI. Richer per-item tables (DT_BaseItems_Weapons, shown as "—") unlock when B1 widens or the scan gains an FDI-driven name→offset mapping. Wiring is complete; surface grows with B1.

Test plan

cd tools/dune-extract
python3 validate/probe_pak_entries.py
python3 validate/probe_read_uasset.py
python3 validate/probe_datatable.py Systems.pak
python3 -m dune_extract --output-dir /tmp/test     # expect "14 DataTables parsed, 159 rows recovered"
grep -A8 "DataTable row contents" /tmp/test/ITEMS-Weapons-Paths.md
Lands the final Phase B task: every row that B1+B2+B3 can reach now
flows through FieldFilter into Markdown / JSON / CSV outputs.

Changes:
  * uasset_parser.Summary gains folder_name (the package's /Game/...
    path) so enrich_with_row_data identifies a parsed .uasset by its
    own ground truth, bypassing the B1-FU virtual-vs-physical mapping.
  * enrichment.CategoryCatalog gains rows: dict (DataTable filename ->
    {row_name: {field: value}}) + total_rows property.
  * enrichment.enrich_with_row_data(catalog, pak_map, paks_dir,
    field_filter) walks every pak's physical entries, parses any
    DataTable-class .uasset, pairs it with the adjacent .uexp, filters
    each row through FieldFilter.filter_row, and attaches to the
    matching category. Returns stats {tables_parsed, rows_recovered,
    oodle_available, scanned_*}.
  * __main__.py calls it after build_catalog; reports row stats; warns
    when no Oodle backend. DUNE_EXTRACT_SKIP_ROWS=1 disables.
  * catalog_writer emits rows in every format:
      - Markdown: per-category "DataTable row contents" section, one
        subsection per reached table with column-per-field tables
        (bools as ✓/✗, FText as `key`, truncated long strings); the
        DataTable inventory shows a "Rows decoded" column.
      - JSON: categories[cat].datatable_rows + rows_decoded.
      - CSV: long-format items-rows.csv (category, datatable,
        row_name, field_name, value).

End-to-end run: 14 DataTables / 159 rows recovered with the ooz
backend (Dungeons.pak NPCRecipes 50, Input.pak IA/IMC tables 69,
Systems.pak ArmorStats 4 + WeaponStats 36). The Weapons catalog now
carries the Damage row: StatDisplayName=UI/ItemStat_Damage,
StatStep=0.1, bIncreasingIsGoodForPlayer=✓.

FieldFilter is genuinely live for the first time — every row dict is
.filter_row()-ed before publication. The --include-asset-paths /
--include-lore / --include-dev-commentary flags now affect real
output (no visible effect on the current 14 stat-definition tables;
they'll matter once richer item tables unlock).

No regression: B1/B2/B3 probes reproduce the same numbers (B1 574/574
adjacency, B2 4/4, B3 5 tables / 60 rows). Coverage of the broader
DataTable set remains bounded by B1's physical-entry recovery.
Sponge merged commit 163364a0d9 into develop 2026-05-28 02:24:01 +00:00
Reference
Sponge/Dune-Awakening-Server-Tools!7