
Battlecat AI — Built on the AI Maturity Framework

Claude Skills and Subagents: Escaping the Prompt Engineering Hamster Wheel
L3 Supervisor · Practice · Advanced · 8 min read


That tedious cycle of crafting great prompts, then losing them, then starting over? There's finally a better way. Claude Skills and subagents are fundamentally changing how we build with AI—and the token savings alone will shock you.

Tags: Claude Skills · MCP (Model Context Protocol) · subagents · token economics · context management · prompt engineering · AI workflow optimization · Claude

You craft the perfect prompt after 20 minutes of iteration. It works brilliantly. Three days later, you need that same behavior again, so you... start prompting from scratch, because who remembers where they saved that template? Welcome to what I call the prompt engineering hamster wheel—and it's a fundamentally broken workflow.

Claude Skills are Anthropic's answer to this reusable prompt problem, but they're much more than glorified prompt storage. They introduce a fundamentally different approach to context management, token economics, and AI workflow architecture that could save you hundreds of dollars monthly while dramatically improving response quality.

Why This Matters: The Hidden Cost of Current Workflows

Every token in your AI's context window costs you in three compounding ways:

  • Direct cost: You're paying per token through API usage or hitting usage limits faster
  • Latency tax: More input tokens mean slower responses due to attention mechanism overhead
  • Quality degradation: LLMs demonstrably perform worse when context is cluttered with irrelevant information

Let's put real numbers on this. A typical MCP setup for development work might include:

  • AWS servers (infrastructure): ~8,500 tokens for 13 tools
  • GitHub (code search): ~2,000 tokens for 26 tools
  • Linear (project management): ~3,250 tokens for 33 tools
  • Sentry (error tracking): ~12,500 tokens for 22 tools
  • Plus others: ~5,750 tokens for miscellaneous tools

That's roughly 32,000 tokens of tool metadata loaded into every single message, whether you use those tools or not. At Claude Opus 4.6's $5 per million input tokens, those idle MCP descriptions add $0.16 to every message. Send 50 messages daily over a 20-day work month? That's $160/month in pure overhead—before you account for the latency and quality impacts.
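The arithmetic above is easy to verify for yourself. A minimal sketch, using the article's own figures (swap in your actual message volume and model pricing):

```python
# Back-of-envelope cost of idle MCP tool metadata per message,
# using the figures from this article (assumed, not measured).
TOKENS_PER_MESSAGE = 32_000   # idle tool metadata loaded every message
PRICE_PER_MTOK = 5.00         # $ per million input tokens
MESSAGES_PER_DAY = 50
WORKDAYS_PER_MONTH = 20

cost_per_message = TOKENS_PER_MESSAGE / 1_000_000 * PRICE_PER_MTOK
monthly_overhead = cost_per_message * MESSAGES_PER_DAY * WORKDAYS_PER_MONTH

print(f"${cost_per_message:.2f} per message")     # prints "$0.16 per message"
print(f"${monthly_overhead:.0f}/month overhead")  # prints "$160/month overhead"
```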

The fundamental problem isn't the tools themselves—it's the eager loading pattern that dumps everything into context upfront, regardless of relevance.


What Claude Skills Actually Are

At their core, Claude Skills are reusable instruction sets that AI agents can automatically access when relevant to a conversation. You write a `SKILL.md` file with metadata and instructions, drop it into its own folder under `.claude/skills/`, and Claude handles the rest.

The Three-Level Loading System

Skills use progressive disclosure across three distinct levels, each with its own context budget:

  1. Metadata (loaded at startup): Skill name (max 64 characters) and description (max 1,024 characters). Costs roughly 100 tokens per skill, negligible overhead even with hundreds registered.

  2. Skill body (loaded on invocation): The full instruction set inside `SKILL.md`, up to ~5,000 tokens. Only enters context when the agent determines the skill is relevant.

  3. Referenced files (loaded on demand): Additional markdown files, folders, or scripts within the skill directory. No practical limit, loaded only when instructions reference them and the current task requires it.
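On disk, the three levels map onto a simple directory layout. A hypothetical sketch (file names other than `SKILL.md` are illustrative):

```
.claude/skills/code-reviewer/
├── SKILL.md            # metadata (level 1) + instructions (level 2)
├── security-checks.md  # referenced file (level 3), loaded on demand
└── scripts/
    └── lint-report.sh  # helper script (level 3), read only when needed
```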

A basic skill looks like this:

```markdown
---
name: code-reviewer
description: Reviews pull requests following team conventions and security best practices
---

## Code Review Guidelines

When reviewing code:
1. Check for security vulnerabilities first
2. Verify adherence to team style guide in /docs/style.md
3. Ensure proper test coverage
4. Reference related Linear ticket in comments
```

Key insight: Skills are lazy-loaded context. The agent doesn't consume the full instruction set upfront—it progressively discloses information to itself, pulling in only what's needed for the current step.

Auto-Invocation Changes Everything

The real power isn't just storage—it's automatic relevance detection. When you start a conversation, Claude scans skill metadata to understand what expertise is available. When it detects a skill might be relevant, it loads the full body. If that body references additional files, it reads those too, but only on demand.

This means you could register 300 skills and still consume fewer tokens at startup than a typical MCP setup. In practice, most conversations invoke one or two skills while the rest remain invisible to the context window.


Skills vs MCP: Apples and Oranges (That Work Together)

Before we go further, let's clear up a common misconception. Skills aren't "better MCPs"—they're solving different problems:

  • MCP (Model Context Protocol) gives an agent capabilities—the "what" it can do
  • Skills give an agent expertise—the "how" to do it well

MCP is an open standard that lets any LLM interact with external applications. Before MCP, connecting M models to N tools required M × N custom integrations. MCP collapses that to M + N: each model implements the protocol once, each tool exposes it once, and they all interoperate.

Skills are "glorified prompts" (in the best possible way). They give agents expertise on how to approach tasks, what conventions to follow, when to use which tools, and how to structure output.

A Concrete Example

Say you connect GitHub's MCP server to your agent. MCP gives the agent the ability to create pull requests, list issues, and search repositories. But it doesn't tell the agent:

  • How your team structures PRs
  • That you always include a testing section
  • That you tag by change type
  • That you reference Linear tickets in titles

That's what a GitHub workflow skill provides—the playbook for using the tools effectively.
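That playbook could be captured in a skill like this (a hypothetical sketch; the conventions are the ones listed above, and the name is illustrative):

```markdown
---
name: github-pr-workflow
description: Applies team conventions when creating or reviewing pull requests via the GitHub MCP tools
---

## Pull Request Conventions

When creating a pull request:
1. Structure the description with Summary, Changes, and Testing sections
2. Always include the Testing section, even for trivial changes
3. Tag the PR by change type (feat, fix, chore, docs)
4. Reference the related Linear ticket ID in the PR title
```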

The real insight: MCP's eager loading pattern is the problem, not MCP itself. Skills prove that lazy-loading works. So why can't MCP tool access be lazy-loaded too?


Subagents: The Best of Both Worlds

Subagents are specialized child agents with isolated context windows and dedicated tool sets. Two properties make them powerful:

Isolated Context

A subagent starts with a clean context window, pre-loaded with its own system prompt and only assigned tools. Everything it reads, processes, and generates stays in its own context. The main agent only sees the final result.

Isolated Tools

Each subagent gets its own MCP servers and skills. The main agent doesn't know about (or pay for) tools it never directly uses.

The Token Economics Magic

Once a subagent finishes its task, its entire context gets discarded. Tool metadata, intermediate reasoning, API responses—all gone. Only the result flows back to the main agent.

Imagine a subagent that researches a library's API. It might:

  • Search across multiple documentation sources
  • Read through dozens of pages
  • Try several queries before finding the right answer
  • Generate thousands of tokens of intermediate work

You pay for the subagent's token usage, but all that intermediate work—the dead ends, irrelevant pages, failed searches—gets discarded. None of it compounds into the main agent's context, so every subsequent message stays clean and cheap.

Practical Architecture

You can design setups where MCP servers are only accessible through specific subagents, never loaded on the main agent at all. Instead of carrying ~32,000 tokens of tool metadata in every message, the main agent carries nearly zero.

When it needs to open a pull request, it spawns a GitHub subagent with:

  • GitHub MCP server (2,000 tokens of tool metadata)
  • Code review skill (500 tokens when invoked)
  • Team workflow skill (300 tokens when invoked)

The subagent does its work, returns a clean result, and disappears. The main conversation continues without any GitHub-related bloat.
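In Claude Code, that setup can be sketched as a subagent definition file (a hypothetical example: the frontmatter fields follow the `.claude/agents/` convention, but the tool names here are placeholders for whatever your GitHub MCP server actually exposes):

```markdown
---
name: github-agent
description: Handles GitHub operations: creating pull requests, searching code, and managing issues. Use for any task that touches GitHub.
tools: mcp__github__create_pull_request, mcp__github__search_code, mcp__github__list_issues
---

You are a GitHub specialist. Follow the team's code review and
workflow skills when opening pull requests, and return only a
concise summary of what you did to the main agent.
```

Because the `tools` list lives in the subagent's frontmatter, the GitHub MCP metadata is only loaded into the subagent's context, never the main agent's.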

Bottom line: Subagents let you have the MCP tool ecosystem without paying the context window tax on every message.


Building Your Skills Architecture

Here's how to start implementing this approach:

1. Audit Your Current Prompt Patterns

Look through your recent conversations and identify:

  • Prompts you've used multiple times
  • Instructions you find yourself repeating
  • Specific workflows or formats you always want

2. Create Your First Skill

Start simple with a high-frequency use case:

```markdown
---
name: api-documentation
description: Creates comprehensive API documentation following OpenAPI standards
---

## Documentation Standards

When documenting APIs:
1. Use OpenAPI 3.0 format
2. Include example requests/responses
3. Document all error codes
4. Add rate limiting information
5. Reference authentication in /docs/auth-guide.md
```

3. Design Your Subagent Architecture

Group related tools into specialized subagents:

  • Code subagent: GitHub, Linear, code analysis tools
  • Infrastructure subagent: AWS, monitoring, deployment tools
  • Research subagent: Documentation search, web search, knowledge bases

4. Measure the Impact

Track these metrics before and after:

  • Average tokens per message
  • Response latency
  • Monthly API costs
  • Subjective quality of responses
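A minimal sketch for the cost comparison, assuming you can pull per-message input-token counts from your usage dashboard or API logs (the sample figures below are placeholders, not measurements):

```python
# Compare estimated monthly cost before and after migrating to
# skills + subagents. Token counts are placeholder data; substitute
# figures from your own usage logs.
PRICE_PER_MTOK = 5.00  # $ per million input tokens

before = [38_000, 41_500, 36_200, 39_800]  # input tokens per message
after = [6_100, 5_400, 7_200, 5_900]

def monthly_cost(samples, messages_per_month=1_000):
    """Average tokens per message, scaled to a monthly dollar cost."""
    avg_tokens = sum(samples) / len(samples)
    return avg_tokens / 1_000_000 * PRICE_PER_MTOK * messages_per_month

savings = monthly_cost(before) - monthly_cost(after)
print(f"Estimated monthly savings: ${savings:.2f}")
```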

The Bottom Line

The prompt engineering hamster wheel exists because we've been thinking about AI interactions wrong—as isolated conversations instead of reusable systems. Claude Skills and subagents represent a fundamental shift toward treating AI as infrastructure: modular, efficient, and composable.

Skills solve the reusability problem through lazy-loaded expertise. Subagents solve the context bloat problem through isolated, specialized workers. Together, they can cut your token costs by 80%+ while dramatically improving response quality and development velocity. The real question isn't whether to adopt this architecture—it's how quickly you can migrate your current workflows to take advantage of it.

Try This Now

  1. Audit your recent Claude conversations and identify 3-5 prompt patterns you've used multiple times
  2. Create your first skill using the `.claude/skills/` directory structure for your most common use case
  3. Install and configure MCP servers like GitHub or Linear, then design a subagent architecture to isolate their context
  4. Measure your current average tokens per message using Claude's usage dashboard before implementing skills
  5. Set up a code review skill that references your team's style guide and workflow documentation
