Harness Regression Evaluator

Evaluate proposed harness changes against benchmark prompts and red-team scenarios before adoption.

evaluation, benchmarks, harness, regression

Install this skill

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime codex

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime claude

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime generic --target ./my-agent/skills

Operational Summary

Use when: Proposed harness changes need benchmark and red-team evaluation before adoption.

Execution mode: may-run-local-verification

Approval boundary: Safe to run for harness evaluation; require human approval before applying any accepted harness change.

Readiness: Experimental

Security reviewed: no

Lifecycle: Active

ID: agentops/harness-regression-evaluator

Runtimes: codex, claude, generic

Tool dependencies: 0

API dependencies: 0

Tools: Not documented

APIs: Not documented
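
Example (illustrative only): the skill's internals are not documented here, so the following is a minimal sketch of how a regression gate consistent with the approval boundary above might look. All names (CaseResult, find_regressions, evaluate_change), the 0-to-1 score scale, and the 0.02 tolerance are assumptions made for illustration, not part of the skill. The sketch compares per-case scores under the current and proposed harness, rejects the change if any benchmark or red-team case regresses, and otherwise marks it as accepted pending human approval rather than applying it.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical record for one benchmark prompt or red-team scenario."""
    case_id: str
    suite: str        # "benchmark" or "red-team"
    baseline: float   # score under the current harness, assumed in [0, 1]
    candidate: float  # score under the proposed harness, assumed in [0, 1]


def find_regressions(results, tolerance=0.02):
    """Return cases whose candidate score falls below baseline by more than tolerance."""
    return [r for r in results if r.baseline - r.candidate > tolerance]


def evaluate_change(results, tolerance=0.02):
    """Summarize the evaluation; never applies the change itself."""
    regressions = find_regressions(results, tolerance)
    # Even a clean run only yields a recommendation: a human must approve adoption.
    verdict = "reject" if regressions else "accept-pending-human-approval"
    return {"verdict": verdict, "regressions": [r.case_id for r in regressions]}


# Made-up scores, for illustration only.
results = [
    CaseResult("bench-001", "benchmark", 0.90, 0.91),
    CaseResult("redteam-004", "red-team", 1.00, 0.80),
]
report = evaluate_change(results)
print(report)
```

With these made-up scores the red-team case regresses, so the verdict is "reject". A run with no regressions returns "accept-pending-human-approval", which keeps the final adoption decision with a human reviewer.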