Name: Connecting AI to Guides
Author: Scott Annan

What to measure

Define metrics before you run the evaluation. Suggested set (from the eval kit):

Metric	Definition
recall@k	Gold expected_item_id appears in top-k retrieved chunks
precision@k	Share of retrieved chunks that are relevant
citation_accuracy	Final answer cites the correct item_id (GDF arm only)
answer_accuracy	Final answer matches gold (human or LLM judge)
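
The two retrieval metrics above can be sketched in a few lines. This is a minimal illustration, not the eval kit's own code: it assumes each retrieved chunk is identified by its item_id, that relevance labels are available as a set of item_ids, and that precision is computed over the chunks actually returned (up to k). The function names are hypothetical.

```python
from typing import Sequence, Set


def recall_at_k(retrieved_ids: Sequence[str], expected_item_id: str, k: int) -> float:
    """Return 1.0 if the gold expected_item_id is in the top-k retrieved chunks, else 0.0."""
    return 1.0 if expected_item_id in retrieved_ids[:k] else 0.0


def precision_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Share of the top-k retrieved chunks whose item_id is labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        # Nothing retrieved: report 0.0 rather than dividing by zero (an assumption).
        return 0.0
    hits = sum(1 for item_id in top_k if item_id in relevant_ids)
    return hits / len(top_k)


def mean(scores: Sequence[float]) -> float:
    """Average per-question scores into one number per arm."""
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging the per-question scores with mean gives one figure per arm, which makes the structured and plaintext runs directly comparable.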

Arms (same host, same slug):

Arm	Fetch URL
Structured (GDF)	https://guides.co/g/{slug}/gdf or https://guides.co/g/{slug}/gdf?format=jsonl
Plaintext baseline	https://guides.co/g/{slug}/gdf?format=plaintext
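
As a rough sketch, the fetch URLs for both arms can be built from one slug so the two runs never drift apart. The helper names are hypothetical, and the code assumes that format=jsonl is passed as a query parameter on the same /gdf path, mirroring the plaintext URL.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://guides.co/g"


def arm_url(slug: str, fmt: Optional[str] = None) -> str:
    """Build the /gdf URL for a guide slug, optionally with a ?format= query."""
    url = f"{BASE_URL}/{quote(slug, safe='')}/gdf"
    if fmt:
        url += "?" + urlencode({"format": fmt})
    return url


def arm_urls(slug: str) -> Dict[str, str]:
    """Return the fetch URL for each evaluation arm (same host, same slug)."""
    return {
        # Assumption: the structured arm uses the jsonl variant.
        "structured": arm_url(slug, "jsonl"),
        "plaintext": arm_url(slug, "plaintext"),
    }
```

Calling arm_urls once per guide and passing both URLs through the same fetch and indexing code keeps the response format as the only variable between arms.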

Use jsonl when your retriever indexes one record per page: each line includes item_id, title, and body.
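
A minimal sketch of turning a jsonl response into one record per page might look like the following. It assumes each non-empty line is a JSON object carrying at least the item_id, title, and body fields named above; the function name is hypothetical and any other fields are ignored.

```python
import json
from typing import Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse jsonl text lines into records with item_id, title, and body."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, such as a trailing newline at the end of the response.
            continue
        obj = json.loads(line)
        records.append(
            {
                "item_id": obj["item_id"],
                "title": obj["title"],
                "body": obj["body"],
            }
        )
    return records
```

For a response already held in memory as a string, parse_jsonl(text.splitlines()) yields the list to hand to the retriever's indexer, with item_id kept alongside each body so citations can be checked later.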