AI & Agent Evaluation

Reading room

Short summaries of AI and agent evaluation research, organized by broad tags.

$ evals.index --public
posts: 1
mode: short summaries
storage: sqlite
status: listening

Filtering by rubrics.

ResearchGate — From Holistic Evaluation to Structured Criteria: A Survey of Rubrics Across the Evolving LLM Landscape

preprint · source date 2026-05-31

1. Problems / challenges / motivations - As LLMs move from task-specific systems toward open-ended agents, one scalar score is often too opaque. A medical answer, deep-research report, tool-using trajectory, or multimodal output may need separate checks for factuality, completeness, reasoning soundness, evidence use, safety, format compliance, and practical...