Classroom evaluation set template

A small CSV format for building repeatable retrieval tests with expected sources.

We use this template when building a classroom evaluation set for retrieval.

The idea is simple: students propose questions, and we record what sources should be retrieved for those questions. That gives us repeatable tests for:

  • retrieval quality (Recall@k / nDCG@k)
  • coverage gaps (by domain, jurisdiction, language)
  • drift over time (does retrieval change after ingestion updates?)
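The two retrieval-quality metrics above can be sketched as follows. This is a minimal illustration, assuming binary relevance (a result is relevant iff it is one of the expected sources); the function names and signatures are ours, not from any particular library.

```python
import math

def recall_at_k(expected, retrieved, k):
    """Fraction of expected sources that appear in the top-k results."""
    top = set(retrieved[:k])
    return sum(1 for e in expected if e in top) / len(expected)

def ndcg_at_k(expected, retrieved, k):
    """Binary-relevance nDCG: expected sources have gain 1, everything else 0."""
    relevant = set(expected)
    dcg = sum(1 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(expected), k)))
    return dcg / ideal if ideal else 0.0
```

With these, a drift check is just running the same query set against two ingestion snapshots and comparing the per-query scores.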

Fields (what each column means)

  • query: the question students will ask.
  • expected_sources: one or more source URLs/IDs that must appear in the top‑k retrieval (separate multiple sources with ;).
  • domain: policy / standards / academic / course / other.
  • jurisdiction: the relevant region (e.g., NG, EU, US, Global).
  • language: best-effort, e.g. en, fr.
  • difficulty: easy / medium / hard.
  • notes: why the query matters, edge cases, grading guidance.

The CSV header

Copy this header row into the first line of your .csv file:

query,expected_sources,domain,jurisdiction,language,difficulty,notes
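Rows with multiple expected sources need the `;` separator split out before scoring. A minimal loading sketch with Python's standard `csv` module, using an invented example row (the query and source IDs are hypothetical, not from a real eval set):

```python
import csv
import io

# Header matches the template; the data row is a made-up example.
SAMPLE = """\
query,expected_sources,domain,jurisdiction,language,difficulty,notes
What does the regulation say about consent?,reg-2019;agency.example/reg,policy,NG,en,medium,Checks jurisdiction-specific retrieval
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # expected_sources holds one or more source IDs separated by ";"
    row["expected_sources"] = row["expected_sources"].split(";")

print(rows[0]["expected_sources"])  # -> ['reg-2019', 'agency.example/reg']
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")`.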

How we recommend using it

  1. Start with 50–200 queries.
  2. Require each query to include at least 1 expected source.
  3. Tag each query (domain/jurisdiction/language) so you can slice results and see what the corpus is missing.
  4. Keep the file under version control so the class can see what changed and why.
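Steps 2 and 3 are easy to enforce mechanically before each class merges changes. A sketch of such a check, assuming the column names and allowed values listed above (the helper names are ours):

```python
import csv

ALLOWED_DOMAINS = {"policy", "standards", "academic", "course", "other"}
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}

def validate_row(row, line_no):
    """Return a list of problems for one eval-set row (empty if clean)."""
    problems = []
    if not row["query"].strip():
        problems.append(f"line {line_no}: empty query")
    sources = [s for s in row["expected_sources"].split(";") if s.strip()]
    if not sources:
        problems.append(f"line {line_no}: needs at least 1 expected source")
    if row["domain"] not in ALLOWED_DOMAINS:
        problems.append(f"line {line_no}: unknown domain {row['domain']!r}")
    if row["difficulty"] not in ALLOWED_DIFFICULTY:
        problems.append(f"line {line_no}: unknown difficulty {row['difficulty']!r}")
    return problems

def validate_file(path):
    """Validate every row of the eval-set CSV; return all problems found."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):  # header is line 1
            problems.extend(validate_row(row, i))
    return problems
```

Running this in CI (or a pre-commit hook) keeps the version-controlled file clean as students add queries.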