Results

The `results` command family works on existing local AgentV run workspaces and `index.jsonl` manifests. Use it after an eval run to inspect failures, validate manifests, export artifact layouts, combine or delete local run workspaces, or generate a shareable HTML report.

Remote result repository exchange is intentionally not part of `agentv results`. New eval runs can auto-export to a configured results repo when `auto_push: true` is set; manual remote status and sync are Dashboard and API workflows. See Dashboard Remote Results for configuration and sync behavior.

Subcommands

Subcommand	Purpose
`results report`	Generate a self-contained static HTML report from an existing run workspace
`results export`	Materialize or normalize the artifact workspace structure for a manifest
`results combine`	Combine partial local run workspaces into a new local run workspace
`results delete`	Delete one or more local run workspaces
`results summary`	Print aggregate metrics for a run
`results failures`	Show only failing cases
`results show`	Display case-level rows from a run workspace
`results validate`	Validate that a workspace or manifest resolves correctly

`results report`

The `results report` command turns an existing run workspace or `index.jsonl` manifest into a self-contained HTML report for sharing, inspection, and human review.

agentv results report <run-workspace-or-index.jsonl>

Examples:

# Generate report.html next to the run manifest
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude

# Use an explicit output path
agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude/index.jsonl \
  --out ./reports/human-review.html

What it shows:

Summary stats — total tests, passed, failed, pass rate, duration, and cost
Eval file groups — test cases grouped by eval file with pass rate, test count, and duration
Expandable details — unified assertions with pass/fail indicators and type badges, collapsible input/output
Criteria column — shows the test prompt or description inline for quick scanning

[Screenshot: AgentV results report showing an expanded failing test case with unified assertions, deterministic type badges, pass/fail indicators, evidence text, and collapsible input/output]

Option	Description
`--out`, `-o`	Output HTML file (defaults to `<run-dir>/report.html`)
`--dir`, `-d`	Working directory used to resolve the source path

`results export`

Use `results export` when you need the artifact workspace layout itself rather than a rendered report.

agentv results export <run-workspace-or-index.jsonl> [--out <dir>]

This is useful when a manifest needs to be materialized into a predictable artifact tree for other tooling, review, or archiving. The run workspace is also where generated task bundles live: `index.jsonl` rows may point to per-result `task_dir`, `eval_path`, `targets_path`, `files_path`, and `graders_path` entries. Keep those generated artifacts with the run when sharing or auditing results.
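
As a rough illustration of auditing those entries before sharing a run, the Python sketch below reads `index.jsonl` and reports any referenced path that is missing on disk. It assumes each manifest line is a JSON object and that relative path values resolve against the run directory; neither detail is confirmed by this page, and the `find_missing_artifacts` helper is hypothetical.

```python
import json
from pathlib import Path

# Per-result path keys named on this page; no other row fields are assumed.
PATH_KEYS = ("task_dir", "eval_path", "targets_path", "files_path", "graders_path")


def find_missing_artifacts(run_dir):
    """Return (line_number, key, path) for each referenced path that does not exist.

    Assumption: relative paths in index.jsonl resolve against the run directory.
    """
    run_dir = Path(run_dir)
    missing = []
    with open(run_dir / "index.jsonl", encoding="utf-8") as manifest:
        for line_number, line in enumerate(manifest, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            if not isinstance(row, dict):
                continue  # skip rows that are not JSON objects
            for key in PATH_KEYS:
                value = row.get(key)
                if not isinstance(value, str) or not value:
                    continue  # rows "may" carry these keys, so absence is fine
                if not (run_dir / value).exists():
                    missing.append((line_number, key, value))
    return missing
```

An empty return value means every path the manifest references was found; anything else is a list of entries worth checking before the run is archived or handed to a reviewer.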

Inspection helpers

For lightweight terminal workflows:

agentv results summary .agentv/results/runs/<timestamp>
agentv results failures .agentv/results/runs/<timestamp>
agentv results show .agentv/results/runs/<timestamp> --test-id my-case
agentv results validate .agentv/results/runs/<timestamp>
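
These helpers can also be chained from a script. The sketch below builds the argument lists for the four commands shown above and runs them in order with `subprocess`. It assumes `agentv` is on `PATH`; the function names are hypothetical, and because this page does not document exit codes, the sketch only collects them rather than interpreting them.

```python
import subprocess


def inspection_commands(run_dir, test_id=None):
    """Build argv lists for the inspection subcommands documented on this page."""
    commands = [
        ["agentv", "results", "validate", run_dir],
        ["agentv", "results", "summary", run_dir],
        ["agentv", "results", "failures", run_dir],
    ]
    if test_id is not None:
        # --test-id is the only flag used here; it appears in the example above.
        commands.append(["agentv", "results", "show", run_dir, "--test-id", test_id])
    return commands


def run_inspection(run_dir, test_id=None):
    """Run each command and collect (argv, return code, stdout) tuples."""
    results = []
    for argv in inspection_commands(run_dir, test_id):
        completed = subprocess.run(argv, capture_output=True, text=True, check=False)
        results.append((argv, completed.returncode, completed.stdout))
    return results
```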

For a review-centric workflow built around these artifacts, see Human Review Checkpoint.

Remote results sync/status

The CLI contract is deliberately narrow: `agentv results` manages local result artifacts only. It does not expose `results remote status` or `results remote sync` subcommands.

Use these supported remote workflows instead:

Automatic publishing: configure `projects[].results.auto_push: true`; new `agentv eval` and `agentv pipeline bench` runs push their artifacts after the run completes.
Manual Dashboard sync: run `agentv dashboard`, open the project, and use Sync Project.
Manual API sync: while Dashboard is running, call `GET /api/projects/:projectId/remote/status` or `POST /api/projects/:projectId/remote/sync` for project-scoped automation. Single-project sessions also expose `GET /api/remote/status` and `POST /api/remote/sync`.
Git escape hatch: for advanced recovery, inspect or repair the configured `projects[].results.path` clone with `git` directly, then sync again.
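
For the manual API sync route, a minimal Python sketch might look like the following. The endpoint paths come from the list above. Everything else is an assumption: this page does not state where Dashboard listens, so the base URL is a required argument; the helper names are hypothetical; and the response is assumed to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote


def remote_request(base_url, action, project_id=None):
    """Return (method, url) for a remote status or sync call."""
    methods = {"status": "GET", "sync": "POST"}
    if action not in methods:
        raise ValueError(f"unknown action: {action!r}")
    if project_id is None:
        path = f"/api/remote/{action}"  # single-project sessions
    else:
        path = f"/api/projects/{quote(project_id, safe='')}/remote/{action}"
    return methods[action], base_url.rstrip("/") + path


def call_remote(base_url, action, project_id=None):
    """Send the request. Assumption: the response body is JSON."""
    method, url = remote_request(base_url, action, project_id)
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping URL construction separate from the network call lets automation log or dry-run the request before it touches the remote results repo. Pass the address at which your Dashboard is actually running as `base_url`.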