Is your skill still doing the right thing?

Learn about a new tool to test if your SKILL.md is followed by popular agents. Verify skill loading, file access, commands, tokens, and output for confident workflow behavior.

Claude Code, Codex, Cursor Agent, OpenCode, skillgym

Overview

Skillgym is a new testing tool for agent skills. You write tests that verify your SKILL.md is followed by the most popular agents (Codex, Claude Code, OpenCode, and Cursor Agent) and that the workflow behaves as expected. Assertions can cover loaded skills, files read, commands invoked, tokens used, and returned text.
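To make the five assertion targets concrete, here is a minimal TypeScript sketch of the kind of check such a test performs. It does not use skillgym's actual API: the `AgentRun` and `Expectations` types, the `checkRun` function, the file and command names, and the sample data are all hypothetical and exist only for illustration.

```typescript
// Hypothetical record of one agent run. This is NOT skillgym's real data model.
interface AgentRun {
  loadedSkills: string[];
  filesRead: string[];
  commands: string[];
  totalTokens: number;
  output: string;
}

// Hypothetical expectations for a single test case.
interface Expectations {
  skill: string;
  mustRead: string[];
  mustRun: string[];
  maxTokens: number;
  outputIncludes: string;
}

// Returns human-readable failures; an empty array means the run passed.
function checkRun(run: AgentRun, expected: Expectations): string[] {
  const failures: string[] = [];
  if (!run.loadedSkills.includes(expected.skill)) {
    failures.push(`skill not loaded: ${expected.skill}`);
  }
  for (const file of expected.mustRead) {
    if (!run.filesRead.includes(file)) {
      failures.push(`file not read: ${file}`);
    }
  }
  for (const cmd of expected.mustRun) {
    // A command counts as invoked if any recorded command starts with it.
    if (!run.commands.some((c) => c.startsWith(cmd))) {
      failures.push(`command not invoked: ${cmd}`);
    }
  }
  if (run.totalTokens > expected.maxTokens) {
    failures.push(`token budget exceeded: ${run.totalTokens} > ${expected.maxTokens}`);
  }
  if (!run.output.includes(expected.outputIncludes)) {
    failures.push(`output missing text: ${expected.outputIncludes}`);
  }
  return failures;
}

// Made-up sample data, for illustration only.
const sampleRun: AgentRun = {
  loadedSkills: ["release-notes"],
  filesRead: ["SKILL.md", "CHANGELOG.md"],
  commands: ["git log --oneline -n 20"],
  totalTokens: 4200,
  output: "Release notes drafted for v1.2.0",
};

const expectations: Expectations = {
  skill: "release-notes",
  mustRead: ["SKILL.md"],
  mustRun: ["git log"],
  maxTokens: 5000,
  outputIncludes: "Release notes",
};

console.log(checkRun(sampleRun, expectations));
```

Returning a list of failures rather than throwing on the first mismatch lets one run report every deviation from the skill at once, which is useful when the same test is repeated across several agents.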

Links

https://github.com/callstackincubator/skillgym
Skillgym benchmarks AI agent skills using TypeScript-based execution assertions.
https://incubator.callstack.com/skillgym
Skillgym benchmarks AI agents using TypeScript assertions and token snapshots.

Tech stack

Claude Code

Anthropic's agentic coding tool: Unleash Claude's raw power directly in your terminal or IDE to turn complex, hours-long workflows into a single command.

Claude Code is Anthropic’s powerful agentic coding assistant, designed for high-velocity development. It operates natively within your terminal, IDE (VS Code, JetBrains), or via a web interface, allowing you to delegate complex tasks like feature building, bug fixing, and codebase navigation. The agent plans, edits files, executes commands, and creates commits, maintaining awareness of your entire project structure. Internally, Anthropic engineers using Claude Code reported a 67% increase in productivity, demonstrating its capacity to deliver significant gains for Pro and Max plan users.

https://claude.com/code

Codex

Codex is OpenAI's autonomous AI software engineering agent: it executes full development tasks in a sandboxed cloud environment.

Codex is OpenAI's cloud-based software engineering agent, built on a specialized model such as `codex-1` (a fine-tuned version of `o3`). It follows an asynchronous delegation model: developers assign complete tasks through the ChatGPT interface instead of only receiving suggestions. The agent works independently in a secure, isolated cloud container provisioned with the user's GitHub repository and environment. It reads code, writes new features, fixes bugs, runs tests, and drafts pull requests (PRs) for review. Access is provided through ChatGPT Plus, Pro, and Enterprise plans.

https://openai.com/codex

Cursor Agent

An AI-native code editor that uses a specialized agent to write, refactor, and navigate entire codebases via natural language.

Cursor Agent (Composer) transforms the IDE into an active collaborator by indexing your local files and documentation to execute complex, multi-file edits. It handles the heavy lifting: refactoring legacy modules, fixing TypeScript errors across your project, and generating boilerplate with high-context accuracy. By combining Claude 3.5 Sonnet with a custom RAG engine, it eliminates the manual hunt-and-peck of traditional coding, allowing developers to ship features by describing intent rather than typing every line.

https://cursor.com

OpenCode

OpenCode is the open-source AI coding agent (CLI tool), integrating LLMs like GPT-5 and Claude Sonnet 4 directly into the terminal for fast, context-aware development.

OpenCode is the open-source AI coding agent, built for terminal-first developers who demand speed and privacy. It connects your local files, Git history, and a choice of LLMs (e.g., OpenAI's GPT-5 Nano, Anthropic's Claude Sonnet 4) to execute complex tasks directly from the command line. The tool bypasses IDE and browser dependencies, allowing developers to triage issues, fix errors, or implement features with commands like `opencode fix error in main.go`. With over 26,000 GitHub stars by October 2025, OpenCode delivers a secure, context-aware coding partner that keeps your code local and your workflow efficient.

https://opencode.ai

skillgym

A testing tool that benchmarks AI agent skills using TypeScript-based execution assertions and token snapshots.

Skillgym lets you write tests that verify your SKILL.md is followed by Codex, Claude Code, OpenCode, and Cursor Agent. Tests assert on the skills an agent loaded, the files it read, the commands it invoked, the tokens it used, and the text it returned, so you can confirm the workflow behaves as expected.

https://github.com/callstackincubator/skillgym
