Classroom evaluation set template
A small CSV format we use to build repeatable retrieval tests with expected sources.
The idea is simple: students propose questions, and we record which sources should be retrieved for each one. That gives us repeatable tests for:
- retrieval quality (Recall@k / nDCG@k)
- coverage gaps (by domain, jurisdiction, language)
- drift over time (does retrieval change after ingestion updates?)
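The two retrieval metrics above can be sketched in a few lines. This is a minimal illustration with binary relevance (a retrieved source either is or is not in the expected list); the function names are ours, not from any particular library:

```python
import math

def recall_at_k(retrieved, expected, k):
    """Fraction of expected sources that appear in the top-k retrieved list."""
    top_k = set(retrieved[:k])
    return sum(1 for src in expected if src in top_k) / len(expected)

def ndcg_at_k(retrieved, expected, k):
    """nDCG@k with binary relevance: gain 1 if a retrieved source is expected,
    discounted by log2 of its rank, normalized by the ideal ordering."""
    expected_set = set(expected)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, src in enumerate(retrieved[:k]) if src in expected_set)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(expected), k)))
    return dcg / ideal if ideal else 0.0
```

Recall@k tells you whether the expected sources showed up at all; nDCG@k additionally rewards ranking them near the top.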
Fields (what each column means)
- query: the question students will ask.
- expected_sources: one or more source URLs/IDs that must appear in the top-k retrieval (separate multiple sources with ;).
- domain: policy / standards / academic / course / other.
- jurisdiction: the relevant region (e.g., NG, EU, US, Global).
- language: best-effort, e.g. en, fr.
- difficulty: easy / medium / hard.
- notes: why the query matters, edge cases, grading guidance.
Download the CSV header
- Download: /downloads/evaluation_set_template.csv
If you don’t want to download a file, here’s the same header row:
query,expected_sources,domain,jurisdiction,language,difficulty,notes
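A loader for this format can be sketched with the standard-library csv module. This is an illustrative validator, not a required part of the template; it enforces the header above, a non-empty query, and the ;-separated expected_sources convention:

```python
import csv
import io

REQUIRED = ["query", "expected_sources", "domain", "jurisdiction",
            "language", "difficulty", "notes"]

def load_eval_set(text):
    """Parse evaluation-set CSV text into a list of row dicts.

    Splits expected_sources on ';' and raises ValueError on a missing
    column, an empty query, or a row with no expected sources.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row["query"].strip():
            raise ValueError(f"row {line_no}: empty query")
        sources = [s.strip() for s in row["expected_sources"].split(";")
                   if s.strip()]
        if not sources:
            raise ValueError(f"row {line_no}: at least one expected source required")
        row["expected_sources"] = sources
        rows.append(row)
    return rows
```

Failing fast on malformed rows keeps grading disputes about the data out of the scoring script.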
How we recommend using it
- Start with 50–200 queries.
- Require each query to include at least one expected source.
- Tag each query (domain/jurisdiction/language) so you can slice results and see what the corpus is missing.
- Keep the file under version control so the class can see what changed and why.
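Slicing results by tag, as recommended above, can be sketched as a simple group-by. Here we assume each scored query has been turned into a dict carrying its tags plus a recall score (the key names are ours, for illustration):

```python
from collections import defaultdict

def mean_recall_by_tag(results, tag):
    """Group scored queries by a tag column (e.g. 'domain' or
    'jurisdiction') and return {tag_value: mean recall}, so weak
    slices of the corpus stand out."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r[tag]].append(r["recall"])
    return {value: sum(scores) / len(scores)
            for value, scores in buckets.items()}
```

A slice whose mean recall lags the rest usually points at a coverage gap in the corpus rather than a retrieval bug.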