
That tedious cycle of crafting great prompts, then losing them, then starting over? There's finally a better way. Claude Skills and subagents are changing how we build with AI—and the token savings alone might surprise you.
You craft the perfect prompt after 20 minutes of iteration. It works brilliantly. Three days later, you need that same behavior again, so you... start prompting from scratch, because who remembers where they saved that template? Welcome to what I call the prompt engineering hamster wheel—and it's a fundamentally broken workflow.
Claude Skills are Anthropic's answer to this reusable prompt problem, but they're much more than glorified prompt storage. They introduce a fundamentally different approach to context management, token economics, and AI workflow architecture that could save you hundreds of dollars monthly while dramatically improving response quality.
Every token in your AI's context window costs you in three compounding ways: direct spend on input tokens, added latency as the model processes a bloated prompt, and degraded response quality as relevant information competes with noise.
Let's put real numbers on this. A typical MCP setup for development work can easily carry around 32,000 tokens of tool metadata, loaded into every single message whether you use those tools or not. At Claude Opus 4.6's $5 per million input tokens, those idle MCP descriptions add $0.16 to every message. Send 50 messages daily over a 20-day work month? That's $160/month in pure overhead—before you account for the latency and quality impacts.
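The arithmetic above is easy to check. This small sketch uses the article's assumed figures (32,000 tokens of metadata, $5 per million input tokens, 50 messages a day, 20 working days):

```python
# Back-of-envelope cost of idle MCP tool metadata per message and per month,
# using the article's assumed figures.
MCP_METADATA_TOKENS = 32_000
PRICE_PER_MILLION_INPUT = 5.00   # USD, assumed model pricing
MESSAGES_PER_DAY = 50
WORK_DAYS_PER_MONTH = 20

cost_per_message = MCP_METADATA_TOKENS / 1_000_000 * PRICE_PER_MILLION_INPUT
monthly_overhead = cost_per_message * MESSAGES_PER_DAY * WORK_DAYS_PER_MONTH

print(f"${cost_per_message:.2f} per message")    # $0.16 per message
print(f"${monthly_overhead:.0f}/month overhead") # $160/month overhead
```

Swap in your own message volume and model pricing; the overhead scales linearly with both.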
The fundamental problem isn't the tools themselves—it's the eager loading pattern that dumps everything into context upfront, regardless of relevance.
At their core, Claude Skills are reusable instruction sets that AI agents can automatically access when relevant to a conversation. You write a skill.md file with metadata and instructions, drop it into a .claude/skills/ directory, and Claude handles the rest.
Skills use progressive disclosure across three distinct levels, each with its own context budget:
Metadata (loaded at startup): Skill name (max 64 characters) and description (max 1,024 characters). Costs roughly 100 tokens per skill—negligible overhead even with hundreds registered.
Skill body (loaded on invocation): The full instruction set inside skill.md, up to ~5,000 tokens. Only enters context when the agent determines the skill is relevant.
Referenced files (loaded on demand): Additional markdown files, folders, or scripts within the skill directory. No practical limit, loaded only when instructions reference them and the current task requires it.
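The three levels can be sketched as a toy loader. The class and its methods are hypothetical (Claude manages this internally); the character limits and token figures are the ones above:

```python
# Hypothetical sketch of progressive disclosure: only metadata is in context
# at startup; the body and referenced files enter context later, on demand.

class Skill:
    def __init__(self, name, description, body, references=None):
        self.name = name                    # max 64 characters
        self.description = description      # max 1,024 characters
        self.body = body                    # up to ~5,000 tokens
        self.references = references or {}  # extra files, loaded on demand

    def metadata(self):
        """Level 1: always in context (~100 tokens per skill)."""
        return {"name": self.name, "description": self.description}

    def invoke(self):
        """Level 2: the full body enters context only when relevant."""
        return self.body

    def read_reference(self, path):
        """Level 3: a referenced file, read only when the task needs it."""
        return self.references[path]

skill = Skill(
    name="code-reviewer",
    description="Reviews pull requests following team conventions",
    body="## Code Review Guidelines\n1. Check security first...",
    references={"docs/style.md": "Use 4-space indentation."},
)

startup_context = [skill.metadata()]  # the only part every message pays for
```

Note that `startup_context` never contains the body or the style guide; those costs are deferred until a conversation actually needs the skill.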
A basic skill looks like this:
```markdown
---
name: code-reviewer
description: Reviews pull requests following team conventions and security best practices
---

## Code Review Guidelines

When reviewing code:

1. Check for security vulnerabilities first
2. Verify adherence to team style guide in /docs/style.md
3. Ensure proper test coverage
4. Reference related Linear ticket in comments
```
Key insight: Skills are lazy-loaded context. The agent doesn't consume the full instruction set upfront—it progressively discloses information to itself, pulling in only what's needed for the current step.
The real power isn't just storage—it's automatic relevance detection. When you start a conversation, Claude scans skill metadata to understand what expertise is available. When it detects a skill might be relevant, it loads the full body. If that body references additional files, it reads those too, but only on demand.
This means you could register 300 skills and still consume fewer tokens at startup than a typical MCP setup. In practice, most conversations invoke one or two skills while the rest remain invisible to the context window.
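That claim checks out arithmetically, using the per-skill metadata cost from above against the earlier eager-loading example:

```python
TOKENS_PER_SKILL_METADATA = 100   # rough per-skill metadata cost from above
EAGER_MCP_TOKENS = 32_000         # the earlier eager-loading MCP example

skills_registered = 300
startup_tokens = skills_registered * TOKENS_PER_SKILL_METADATA

print(startup_tokens)                     # 30000
print(startup_tokens < EAGER_MCP_TOKENS)  # True
```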
Before we go further, let's clear up a common misconception. Skills aren't "better MCPs"—they're solving different problems:
MCP is an open standard that lets any LLM interact with external applications. Before MCP, connecting M models to N tools required M × N custom integrations. MCP collapses that to M + N: each model implements the protocol once, each tool exposes it once, and they all interoperate.
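For concreteness, here is the integration arithmetic with illustrative counts (five models, twenty tools are made-up numbers):

```python
models, tools = 5, 20

custom_integrations = models * tools   # pre-MCP: every model-tool pair needs glue code
mcp_implementations = models + tools   # with MCP: one protocol adapter per side

print(custom_integrations)  # 100
print(mcp_implementations)  # 25
```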
Skills are "glorified prompts" (in the best possible way). They give agents expertise on how to approach tasks, what conventions to follow, when to use which tools, and how to structure output.
Say you connect GitHub's MCP server to your agent. MCP gives the agent the ability to create pull requests, list issues, and search repositories. But it doesn't tell the agent:
That's what a GitHub workflow skill provides—the playbook for using the tools effectively.
The real insight: MCP's eager loading pattern is the problem, not MCP itself. Skills prove that lazy-loading works. So why can't MCP tool access be lazy-loaded too?
Subagents are specialized child agents with isolated context windows and dedicated tool sets. Two properties make them powerful:
A subagent starts with a clean context window, pre-loaded with its own system prompt and only assigned tools. Everything it reads, processes, and generates stays in its own context. The main agent only sees the final result.
Each subagent gets its own MCP servers and skills. The main agent doesn't know about (or pay for) tools it never directly uses.
Once a subagent finishes its task, its entire context gets discarded. Tool metadata, intermediate reasoning, API responses—all gone. Only the result flows back to the main agent.
Imagine a subagent that researches a library's API. It might run several searches, skim pages of documentation (much of it irrelevant), and chase a few dead ends before landing on the answer.
You pay for the subagent's token usage, but all that intermediate work—the dead ends, irrelevant pages, failed searches—gets discarded. None of it compounds into the main agent's context, so every subsequent message stays clean and cheap.
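A minimal sketch of the isolation pattern, with context windows modeled as plain lists. Everything here is hypothetical structure, not a real Claude API; the research steps are invented for illustration:

```python
# Hypothetical sketch: a subagent accumulates its own context, and only the
# final result survives; all intermediate tokens are discarded with it.

def run_research_subagent(task):
    # Fresh, isolated context: system prompt plus the assigned task only.
    context = ["system: you research library APIs", f"task: {task}"]
    # Intermediate work stays inside this context...
    context.append("search: pagination docs -> 3 irrelevant pages")
    context.append("read: api-reference.md -> found cursor-based pagination")
    result = "Use cursor-based pagination via the `after` parameter."
    return result  # the context dies with this function frame

main_context = ["user: how do I paginate this API?"]
summary = run_research_subagent("find the pagination scheme")
main_context.append(f"subagent result: {summary}")

# The main agent pays for the one-line summary, not the research trail.
```

The design choice to return a string rather than the subagent's transcript is the whole point: the main conversation never carries the dead ends.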
You can design setups where MCP servers are only accessible through specific subagents, never loaded on the main agent at all. Instead of carrying ~32,000 tokens of tool metadata in every message, the main agent carries nearly zero.
When it needs to open a pull request, it spawns a GitHub subagent with the GitHub MCP server and the team's GitHub workflow skill loaded.
The subagent does its work, returns a clean result, and disappears. The main conversation continues without any GitHub-related bloat.
Bottom line: Subagents let you have the MCP tool ecosystem without paying the context window tax on every message.
Here's how to start implementing this approach:
Look through your recent conversations and identify the prompts and instructions you find yourself rewriting again and again. Those repeat performances are your first skill candidates.
Start simple with a high-frequency use case:
```markdown
---
name: api-documentation
description: Creates comprehensive API documentation following OpenAPI standards
---

## Documentation Standards

When documenting APIs:

1. Use OpenAPI 3.0 format
2. Include example requests/responses
3. Document all error codes
4. Add rate limiting information
5. Reference authentication in /docs/auth-guide.md
```
Group related tools into specialized subagents. A GitHub subagent, for example, can own the GitHub MCP server and the workflow skills that go with it, so the main agent never loads either.
Track these metrics before and after: input tokens per message, monthly API spend, and response latency.
The prompt engineering hamster wheel exists because we've been thinking about AI interactions wrong—as isolated conversations instead of reusable systems. Claude Skills and subagents represent a fundamental shift toward treating AI as infrastructure: modular, efficient, and composable.
Skills solve the reusability problem through lazy-loaded expertise. Subagents solve the context bloat problem through isolated, specialized workers. Together, they can cut your token costs by 80%+ while dramatically improving response quality and development velocity. The real question isn't whether to adopt this architecture—it's how quickly you can migrate your current workflows to take advantage of it.