GDF keeps a small required core and lets unknown fields pass through - so the format can survive vendor and version changes without breaking readers. Structure only matters if it helps agents find the right chunk.
This benchmark answers:
When our agent retrieves from GDF versus the same content as plain text, do precision and recall improve - with numbers?
Hypothesis: GDF arms should score higher on citation accuracy (correct item_id) and often on recall@k because pages, breadcrumbs, and jsonl chunks align with gold labels.
Null result is valid: if plaintext matches GDF, you may not need structure for that corpus shape - you still learn something.