Agent Skills: How Addy Osmani Is Turning AI Coding Agents Into Senior Engineers

Addy Osmani just open-sourced the secret weapon: 20 production-grade skills that teach AI agents to code like seasoned engineers instead of unsupervised interns.


Your AI agent just shipped code without tests, a spec, or a review. It looked fine. You deployed it. Then production called at 3 AM. Sound familiar?

Addy Osmani, Director of Google Cloud AI, has an answer to that problem. Agent Skills is an open-source collection of 20 production-grade engineering practices packaged as Markdown files. Drop them into Claude Code, Cursor, Windsurf, Copilot, Codex, or any agent that accepts system prompts, and watch your AI pair programmer start behaving like a staff engineer instead of an eager intern.

The vibe coding problem: why smart AI writes bad code

Here’s the paradox: your AI agent is fluent in every programming language, knows design patterns that would make architects weep, and can explain Redux in 47 different ways. But it still ships code like someone who learned programming from Reddit threads at midnight.

The problem isn’t intelligence. It’s guardrails.

Without explicit discipline, AI agents follow the path of least resistance:

  • Spec? Skip it. “I’ll start coding and figure it out.”
  • Tests? Later. “Let me get the feature working first.”
  • Review? Nah. “Looks right to me. Ship it.”

The gap between AI that writes code and AI that ships products is discipline. That’s where Agent Skills comes in.

Think of it like the difference between a talented but reckless driver and a professional chauffeur. Same hands on the wheel. The chauffeur follows procedures: mirrors checked, seatbelt first, speed limits respected. The reckless driver is faster until they crash.

Agent Skills teaches your agent to be the chauffeur.

What Agent Skills actually is

Agent Skills is a collection of 20 Markdown files, each encoding a workflow that senior engineers use. It is MIT licensed, hosted on GitHub, and designed to work with any AI agent via system prompts.

Addy’s framing is perfect: “Skills encode the workflows, quality gates, and best practices that senior engineers use.”

The core structure revolves around 7 slash commands that mirror a development lifecycle:

1. /spec — Specification Before Code

Write a PRD (Product Requirements Document) before touching code. Forces the agent to ask the right questions: What’s the user problem? What are edge cases? What’s success?

# Spec Example
## Problem Statement
Users want to export data as CSV but the current export is missing metadata columns.

## Requirements
- Include timestamp of export
- Add data provenance (which API version?)
- Metadata in header row
- Handle 100k+ rows efficiently

## Success Criteria
- Export completes in <5s for 100k rows
- Metadata validated against schema
- No data loss or truncation
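A spec-first rule is easier to enforce when it is checkable. As a hypothetical gate (the function name, the required-section list, and the assumption that specs are Markdown files with `##` headings are illustrative, not taken from Agent Skills), a script could refuse to start implementation until every section of the template above exists and has content:

```python
import re

# Hypothetical gate: section names mirror the spec template above.
REQUIRED_SECTIONS = ("Problem Statement", "Requirements", "Success Criteria")


def missing_spec_sections(spec_markdown: str) -> list[str]:
    """Return required '## ' sections that are absent or have no content under them."""
    sections: dict[str, str] = {}
    current = None
    for line in spec_markdown.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # A new level-2 heading starts a new section.
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            # Accumulate non-heading text under the current section.
            sections[current] += line.strip()
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


spec = """# Spec Example
## Problem Statement
Users want to export data as CSV but the current export is missing metadata columns.

## Requirements
- Include timestamp of export

## Success Criteria
"""
print(missing_spec_sections(spec))  # ['Success Criteria']
```

An agent (or a CI job) would only move on to /plan once this returns an empty list.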

2. /plan — Decompose Into Atomic Tasks

Break the spec into tiny, verifiable tasks. Each task is small enough to code in 5-15 minutes. This forces incremental thinking instead of “build the whole thing at once.”
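The sizing rule becomes verifiable if every task carries an estimate and a concrete way to prove it is done. This is a hypothetical sketch: the `Task` fields and the `plan_problems` helper are illustrative names, not part of the Agent Skills files.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    estimate_minutes: int
    verification: str  # how completion is proven, e.g. "unit test passes"


def plan_problems(tasks: list[Task], max_minutes: int = 15) -> list[str]:
    """List every task that is too large for one short sitting or has no way to verify it."""
    problems = []
    for task in tasks:
        if task.estimate_minutes > max_minutes:
            problems.append(f"{task.title}: {task.estimate_minutes} min > {max_minutes} min, split it")
        if not task.verification.strip():
            problems.append(f"{task.title}: no verification step")
    return problems


plan = [
    Task("Create streaming CSV writer", 15, "unit test writes three rows"),
    Task("Add metadata header", 10, "test checks the '#' header line"),
    Task("Build the whole export feature", 120, ""),
]
for problem in plan_problems(plan):
    print(problem)
```

The third task is flagged twice: it is far too large, and nothing says how anyone would know it works.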

3. /build — One Vertical Slice at a Time

Implement features as thin slices that work end-to-end. Add CSV export, test it, ship it with a feature flag. Then add metadata support as the next slice. This is how senior engineers prevent “half-built” features from reaching production.

4. /test — Tests Are Proof

TDD mentality: reproduce the bug as a failing test first. Watch it fail. Fix the code. Watch it pass. The test becomes proof that you understood the problem and solved it.

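# Test-first approach: export_to_csv doesn't exist yet, so this test fails first (red)
def test_csv_export_includes_metadata():
    sample_data = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    export_data = export_to_csv(sample_data)
    # splitlines() avoids a trailing empty string when the output ends with a newline
    lines = export_data.splitlines()

    # First line should be the metadata comment
    assert lines[0].startswith("# Exported:")
    assert "version=" in lines[0]

    # Verify data integrity: one metadata line + one column header + one line per row
    assert len(lines) == len(sample_data) + 2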
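To finish the red-green cycle, the smallest implementation that turns that test green might look like the sketch below. `export_to_csv` and `sample_data` are hypothetical (the test assumes them but nothing defines them), and `API_VERSION` plus the column names are assumptions chosen only to satisfy the test's assertions.

```python
import csv
import io
from datetime import datetime, timezone

API_VERSION = "v2"  # hypothetical source of the export's data provenance


def export_to_csv(rows: list[dict]) -> str:
    """Return CSV text: a '# Exported:' metadata line, a column header, then one line per row.

    Assumes at least one row, which is all the test requires.
    """
    buffer = io.StringIO()
    timestamp = datetime.now(timezone.utc).isoformat()
    buffer.write(f"# Exported: {timestamp} version={API_VERSION}\n")
    # lineterminator="\n" keeps line splitting predictable (the csv module defaults to "\r\n").
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_data = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
lines = export_to_csv(sample_data).splitlines()
assert lines[0].startswith("# Exported:")
assert "version=" in lines[0]
assert len(lines) == len(sample_data) + 2  # metadata line + column header + rows
```

Once it passes, the refactor step can change the internals freely, because the test pins the observable behavior.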

5. /review — Code Health Across 5 Axes

Review checklist: Correctness (does it work?), Readability (can humans parse it?), Architecture (is it structured right?), Security (can it be attacked?), Performance (is it fast enough?). Most junior engineers miss 3-4 of these.
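One way to stop a reviewer, human or agent, from silently skipping axes is to require each one to be marked as checked explicitly. The `Review` class below is a hypothetical illustration of that idea, not an artifact from Agent Skills:

```python
# The five axes named in the /review step.
AXES = ("correctness", "readability", "architecture", "security", "performance")


class Review:
    """Hypothetical 5-axis review record: an axis counts only once it is explicitly checked."""

    def __init__(self) -> None:
        self.checked: set[str] = set()
        self.findings: dict[str, list[str]] = {axis: [] for axis in AXES}

    def check(self, axis: str, *findings: str) -> None:
        """Mark an axis as reviewed, recording any open findings for it."""
        if axis not in AXES:
            raise ValueError(f"unknown review axis: {axis!r}")
        self.checked.add(axis)
        self.findings[axis].extend(findings)

    def unchecked_axes(self) -> list[str]:
        return [axis for axis in AXES if axis not in self.checked]

    def approved(self) -> bool:
        """Approve only when all five axes were reviewed and none has open findings."""
        return not self.unchecked_axes() and not any(self.findings.values())


review = Review()
review.check("correctness")
review.check("readability")
review.check("security", "user input is interpolated into a SQL string")
print(review.unchecked_axes())  # ['architecture', 'performance']
print(review.approved())        # False
```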

6. /code-simplify — Clarity Over Cleverness

Apply Chesterton’s Fence and the Rule of 500: don’t refactor code you don’t understand, and keep functions under 500 lines. Delete code that’s too clever. Simplicity is a feature.
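Here is a generic illustration (not from Agent Skills) of clarity over cleverness with Chesterton’s Fence respected: a clever leap-year one-liner, its clear rewrite, and a check that the two agree before the clever one is deleted. Note the fence: the clever version returns `24` for 2024 rather than `True`, so a caller comparing the result with `== True` would behave differently after the swap. That is exactly the kind of dependency to understand before simplifying.

```python
# Clever: short, but relies on truthiness tricks and returns an int (e.g. 24) for some leap years.
def is_leap_clever(y):
    return not y % 4 and (y % 100 or not y % 400)


# Clear: states the Gregorian rule directly and always returns a bool.
def is_leap_year(year: int) -> bool:
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


# Pin the old behavior (by truthiness) before deleting the clever version.
assert all(bool(is_leap_clever(y)) == is_leap_year(y) for y in range(1, 3001))
print(is_leap_clever(2024), is_leap_year(2024))  # 24 True
```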

7. /ship — Staged Rollouts Win

Deploy to 1% of users first. Monitor for 24 hours. Ramp to 10%. Monitor again. Staged rollouts catch 80% of bugs before they hit everyone.
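The ramp-and-monitor loop can be written as a small decision function. This is a sketch under stated assumptions: the 1% and 10% stages and the 24-hour soak come from the text, while the 50% stage, the error-rate threshold, and the function name are hypothetical.

```python
# Hypothetical ramp: 1% and 10% come from the text; 50% and 100% are assumed follow-ups.
STAGES = [1, 10, 50, 100]
SOAK_HOURS = 24        # monitor each stage for a day before ramping
MAX_ERROR_RATE = 0.01  # assumed threshold; derive yours from the pre-launch baseline


def next_rollout_percent(current: int, hours_observed: float, error_rate: float) -> int:
    """Decide the next rollout percentage from the health of the current stage.

    `current` must be one of STAGES.
    """
    if error_rate > MAX_ERROR_RATE:
        return 0  # unhealthy: roll back to nobody
    if hours_observed < SOAK_HOURS:
        return current  # still soaking: hold at the current stage
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]  # healthy: ramp one stage
```

Keeping the decision in one pure function makes the rollout policy itself testable, for example `next_rollout_percent(1, 24, 0.001)` moves to 10 while a 5% error rate at any stage rolls back to 0.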


The 20 skills: your agent’s playbook

The full suite includes:

  • spec-driven-development: Always spec first
  • incremental-implementation: Thin vertical slices
  • test-driven-development: Red-Green-Refactor
  • browser-testing-with-devtools: Chrome DevTools exposed through MCP, so agents get “eyes” in the browser
  • security-and-hardening: OWASP Top 10, three-tier boundary model
  • context-engineering: Feed agents the right info at the right time
  • source-driven-development: Ground decisions in official docs
  • code-review-and-quality: 5-axis review, changes sized to roughly 100 lines
  • documentation-as-code: API docs, runbooks, inline comments
  • performance-profiling: Measure before optimizing
  • database-migrations: Safe schema changes without downtime
  • ci-cd-best-practices: Automated gates, no manual deploys
  • Plus 8 more covering observability, error handling, and team workflows.

Each skill includes:

  • Explanation of what it is
  • Anti-rationalization table (the killer feature)
  • Workflow steps you can follow
  • Code examples and templates
  • Common pitfalls to avoid

The anti-rationalization tables: why this matters

This is the secret sauce.

Each skill has a table that lists every excuse your brain (and your agent) will make to skip the practice, plus the concrete counter-argument:

| Excuse | Reality |
|---|---|
| “I’ll add tests later” | “Later” never comes. Tests catch 70% of bugs before they’re bugs. Add them first. |
| “This is just a small change” | “Small” is how most incidents start. |
| “We don’t need a spec, I know what to build” | “Knowing” and “knowing with aligned stakeholders” are different things. 40% of rework happens because specs were misaligned. |
| “Performance review can happen after launch” | “After” means production is degraded for real users. Profile now. |
| “This code is simple enough to understand without docs” | “Understand” = you, today. In 6 months, you won’t remember why you did this. In 2 years, someone else will hate you. |

The tables exist because AI agents will rationalize shortcuts. They’re trained on vast amounts of public code, which means they’ve absorbed both good practices and common corner-cutting.

Agent Skills explicitly calls out the rationalizations and blocks them with facts.
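The tables work on the agent’s reasoning. The same excuses can also be blocked mechanically. A hypothetical CI check for the first row (“I’ll add tests later”) might flag source files changed without a matching test change. This is not part of Agent Skills, and the `src/` and `tests/test_<name>.py` layout is an assumption:

```python
from pathlib import PurePosixPath


def changed_without_tests(changed_files: list[str]) -> list[str]:
    """Return changed src/*.py files whose test file (tests/test_<name>.py) was not changed too.

    The src/ and tests/ layout is an assumption; adapt the mapping to your repository.
    """
    changed = set(changed_files)
    missing = []
    for path in changed_files:
        p = PurePosixPath(path)
        if p.suffix == ".py" and p.parts[:1] == ("src",):
            if f"tests/test_{p.name}" not in changed:
                missing.append(path)
    return missing


changed = ["src/export.py", "src/csv_writer.py", "tests/test_csv_writer.py", "README.md"]
print(changed_without_tests(changed))  # ['src/export.py']
```

In CI, the file list could come from `git diff --name-only origin/main...HEAD`, failing the build when the result is non-empty, so “later” stops being an option.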


How to use it: three entry points

Agent Skills works with any LLM-powered code editor. Here’s how to get started:

Setup for Claude Code

# 1. Clone the repository
git clone https://github.com/addyosmani/agent-skills.git ~/agent-skills

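# 2. Copy the skills into Claude Code's personal skills directory
#    (copying the contents avoids ending up with ~/.claude/skills/skills)
mkdir -p ~/.claude/skills
cp -r ~/agent-skills/skills/. ~/.claude/skills/

# 3. In your project root, append instructions to CLAUDE.md, which Claude Code loads automatically
cat >> CLAUDE.md << 'EOF'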
You are an AI software engineer. Follow these skills:
- spec-driven-development
- incremental-implementation
- test-driven-development
- code-review-and-quality

Always read the relevant skill before starting a task.
EOF

Setup for Cursor

In Cursor, add the following to your User Rules in Cursor Settings (Cursor’s equivalent of a system prompt):

Load skills from ~/agent-skills/skills/
Always follow:
1. Write spec before code
2. Break into atomic tasks
3. Test-first development
4. 5-axis code review
5. Staged rollouts for production

The Agent Personas

Pre-built personas turn any agent into a specialized reviewer:

@code-reviewer: Review this PR like a Staff Engineer would
@test-engineer: Validate the test coverage
@security-reviewer: Check for OWASP issues
@performance-engineer: Profile this code

Use them in chat:

@code-reviewer: review this function for correctness, readability, architecture
@security-reviewer: does this handle untrusted input safely?
@performance-engineer: will this scale to 1M requests/day?

A real example: building CSV export the right way

Let’s say your agent needs to add CSV export to your app. Without Agent Skills:

[Agent] I'll add CSV export. Give me 10 minutes.
[You] Okay, let's go!
[Agent] *writes code* Done!
[You] Where are the tests?
[Agent] Tests? I thought you wanted it fast.

With Agent Skills:

Step 1: /spec

Agent writes the PRD:

# CSV Export Feature Specification

## Problem
Users need to export filtered datasets as CSV with metadata.

## Functional Requirements
- Export supports up to 1M rows
- Includes data version and export timestamp in header
- Handle special characters and UTF-8 correctly
- Support column selection

## Non-Functional Requirements
- Export <5 seconds for 100k rows
- <100MB memory footprint
- Resumable for large exports

## Success Metrics
- Users can export and import without data loss
- P99 export time under 5 seconds
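The “P99 export time” metric is the 99th percentile of measured export durations. A sketch using the nearest-rank definition (one of several conventions; monitoring systems often interpolate, so their numbers can differ slightly), with made-up sample durations:

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical export durations in seconds: 95 fast runs and 5 slower ones.
durations = [1.2] * 95 + [3.9, 4.1, 4.4, 4.8, 7.5]
p99 = percentile_nearest_rank(durations, 99)
print(p99, p99 < 5.0)  # 4.8 True
```

With 100 samples, P99 is the 99th-smallest value, so the single 7.5 s outlier does not count against the 5-second target.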

Step 2: /plan

Agent breaks it down:

  1. Create CSV writer class with streaming support
  2. Add unit tests for CSV generation
  3. Implement metadata header logic
  4. Add e2e test: export → import → verify
  5. Add performance test: 1M row export
  6. Integration test: UI → CSV → Database
  7. Staged rollout: 5% of users for 24h

Step 3: /build (one slice)

Agent implements step 1 only:

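# csv_writer.py
import csv
from datetime import datetime, timezone


class StreamingCSVWriter:
    def __init__(self, output_file, metadata=None, fieldnames=None):
        self.output_file = output_file
        self.metadata = metadata or {}
        self.fieldnames = fieldnames  # if None, taken from the first row's keys
        self.writer = None  # csv.DictWriter, created lazily once the columns are known

    def write_metadata_header(self):
        """Write the export timestamp and metadata as a single '#' comment line."""
        timestamp = datetime.now(timezone.utc).isoformat()
        parts = [f"exported: {timestamp}"]
        parts += [f"{key}: {value}" for key, value in self.metadata.items()]
        self.output_file.write("# " + ", ".join(parts) + "\n")

    def write_rows(self, data_iterator):
        """Stream rows one at a time without loading everything in memory."""
        for row in data_iterator:
            if self.writer is None:
                self.writer = csv.DictWriter(
                    self.output_file, fieldnames=self.fieldnames or list(row)
                )
                self.writer.writeheader()
            self.writer.writerow(row)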

Step 4: /test

Agent writes tests before the full implementation:

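# test_csv_writer.py
import os
import tracemalloc
from io import StringIO

from csv_writer import StreamingCSVWriter


def test_csv_export_metadata_header():
    output = StringIO()
    writer = StreamingCSVWriter(output, metadata={"version": "2.0"})
    writer.write_metadata_header()

    result = output.getvalue()
    assert "version: 2.0" in result
    assert result.startswith("#")


def test_csv_export_large_file():
    # Verify peak Python-heap allocations stay under 100MB for 1M rows.
    # Write to os.devnull, not StringIO: a StringIO would hold the whole export in memory.
    def generate_rows():
        for i in range(1_000_000):
            yield {"id": i, "name": f"user_{i}"}

    with open(os.devnull, "w", newline="") as sink:
        writer = StreamingCSVWriter(sink)
        tracemalloc.start()
        writer.write_rows(generate_rows())
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

    assert peak < 100 * 1024 * 1024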

Step 5: /review

Before shipping, the agent reviews against 5 axes:

  • Correctness: Does it preserve data integrity? ✓
  • Readability: Is the code easy to understand? ✓ (streaming pattern is clear)
  • Architecture: Is it structured right? ✓ (single responsibility: just CSV writing)
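  • Security: Can it be attacked? Partly (the csv module escapes quotes and delimiters, but values beginning with =, +, - or @ can still run as formulas when the file is opened in a spreadsheet)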
  • Performance: Is it fast enough? ✓ (streaming prevents memory blowup)
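For an export feature, the Security axis has a specific trap worth checking: CSV (formula) injection. The csv module handles quoting, but a cell such as `=SUM(A1:A9)` can still be evaluated when a user opens the file in a spreadsheet. OWASP’s CSV Injection guidance suggests prefixing such cells with a single quote. A minimal sketch of that mitigation, with hypothetical helper names:

```python
# Leading characters that spreadsheet apps may interpret as a formula (per OWASP's CSV Injection page).
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value):
    """Prefix a single quote to text cells that a spreadsheet could evaluate as a formula."""
    if isinstance(value, str) and value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value


def neutralize_row(row: dict) -> dict:
    """Apply neutralize_cell to every value in a CSV row."""
    return {key: neutralize_cell(value) for key, value in row.items()}


print(neutralize_row({"id": -5, "name": "=SUM(A1:A9)", "note": "ok"}))
# {'id': -5, 'name': "'=SUM(A1:A9)", 'note': 'ok'}
```

It would wrap the row stream, for example `writer.write_rows(neutralize_row(r) for r in rows)`. The trade-off: a text value like `"-5"` also gains a quote, so numeric columns are best kept as numbers, which the sketch leaves untouched.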

Step 6: /ship

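Deploy behind a feature flag (render_csv_export_button is a hypothetical stand-in for your UI code):

if settings.CSV_EXPORT_ENABLED:
    render_csv_export_button()  # hypothetical: show the CSV export button only when the flag is on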

Ship to 5% of users. Monitor for 24 hours. No issues? Ramp to 100%.
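A boolean flag is all-or-nothing, so “ship to 5%” needs a way to choose which users see the feature. A common approach, sketched here with assumed names (`rollout_bucket`, `is_enabled`), hashes the user ID into one of 100 stable buckets. The same user always lands in the same bucket, so ramping from 5% to 100% only ever adds users:

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map (feature, user) to a stable bucket in 0..99.

    Including the feature name gives each feature an independent split of users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True for users whose bucket falls below the rollout percentage."""
    return rollout_bucket(user_id, feature) < rollout_percent


users = [f"user_{i}" for i in range(10_000)]
enabled = sum(is_enabled(u, "csv_export", 5) for u in users)
print(enabled)  # about 5% of 10,000 users
```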


Why this matters for the industry

Agent Skills isn’t just a tool. It’s a bet on the future of AI-assisted development.

We’re at an inflection point. In 2024, AI coding felt like magic. By 2026, it’s table stakes. But there’s a gap: AI that writes code is now mainstream, but AI that ships products is still rare.

The difference? Discipline at scale.

Today, junior engineers onboard with a style guide and a code review checklist. By 2030, AI agents will onboard with Agent Skills. Not because they’re dumb and need babysitting, but because discipline is how you build reliable systems.

Addy’s insight is that you can’t just hope your agent does the right thing. You have to encode what the right thing is. Agent Skills is that encoding.

Companies that adopt this approach can expect to:

  • Cut production incidents from two or three per quarter to near zero
  • Reduce review time (agents follow the checklist before human review)
  • Improve code quality systematically (no more “vibe coding”)
  • Ship faster with more confidence

This is the software engineering equivalent of shifting from “good vibes” to “standard procedures.”


The takeaway: install today

Agent Skills is open source, MIT licensed, and maintained by Addy Osmani. No lock-in. No vendor dependency. Just 20 Markdown files that encode 30 years of engineering best practices.

Get it here: https://github.com/addyosmani/agent-skills

Next time you spin up Claude Code or Cursor for a project:

  1. Clone Agent Skills
  2. Add the skills to your system prompt
  3. Use /spec before coding
  4. Write tests first
  5. Review against 5 axes
  6. Ship staged

Your AI pair programmer doesn’t need magic. It needs discipline, and Agent Skills provides it.

Welcome to engineering that actually scales.
