We're doing something like this internally. Our monorepo context files were much too big, so we built a progressive tree of fragments to load up for different tasks.
I am struck by how much these kinds of context documents resemble normal developer documentation, but actually useful and task-oriented. What was the barrier to creating these documents before?
Three theories on why this is so different:
1) The feedback loop was too long. If you wrote some docs, you might never learn if they were any good. If you did, it might be years later. And if you changed them, doing an A/B test was impractical. Now, you can write up a context markdown, ask Claude to do something, and iterate in minutes.
2) The tools can help build them. Building good docs was always hard. Especially if you take the time to include examples, urls, etc. that make the documentation truly useful. These tools reduce this cost.
3) Many programmers are egotists. Documentation that helps other people doesn't generate internal motivation. But documentation that allows you to better harness a computer minion to your will is attractive.
It is primarily a principal-agent problem, with a hint of marshmallow test.
If you are a developer who is not writing the documents for consumption by AI, you are primarily writing documents for someone who is not you; you do not know what this person will need or if they will ever even look at them.
They may, of course, help you too, but you may not realize that, or have the time or discipline.
If you are writing them because the AI using them will help you, you have a very strong and immediate incentive to document the necessary information. You also have the benefit of a short feedback loop.
Side note: thanks to LLMs' penchant for wiping out comments, I have a lot more docs these days and far fewer comments.
I think it's not at all a marshmallow test; quite the opposite - docs used to be written way, way in advance of their consumption. The problem that implies is twofold. Firstly, and less significantly, it's just not a great return on investment to spend tons of effort now to maybe help slightly in the far future.
But the real problem with docs is that for MOST use cases, the audience and context of the readers matter HUGELY. Most docs are bad because we can't predict those. People waste ridiculous amounts of time writing docs that nobody reads or nobody needs based on hypotheses about the future that turn out to be false.
And _that_ is completely different when you're writing context-window documents. These aren't really documents describing any codebase or context within which the codebase exists in some timeless fashion, they're better understood as part of a _current_ plan for action on an acute, real concern. They're battle-tested the way docs only rarely are. And as a bonus, sure, they're retainable and might help for the next problem too, but that's not why they work; they work because they're useful in an almost testable way right away.
The exceptions to this pattern kind of prove the rule - people for years have done better at documenting isolatable dependencies, i.e. libraries - precisely because those happen to sit at boundaries where it's both easier to make decent predictions about future usage, and often also because those docs might have far larger readership, so it's more worth it to take the risk of having an incorrect hypothesis about the future wasting effort - the cost/benefit is skewed towards the benefit by sheer numbers and the kind of code it is.
Having said that, the dust hasn't settled on the best way to distill context like this. It'd be a mistake to overanalyze the current situation and conclude that documentation is certain to be the long-term answer - it's definitely helpful now, but it's certainly conceivable that more automated and structured representations might emerge, or forms better suited for machine consumption that look a little more alien to us than conventional docs.
I know this is highly controversial, but I now leave the comments in. My theory is that the “probability space” the LLM is writing code in can’t help but write them, so if i leave them next LLM that reads the code will start in the same space. Maybe it’s too much, but currently I just want the code to be right and I’ve let go of the exact wording of comments/variables/types to move faster.
I think the code comments straight up just help understanding, whether human or AI.
There's a piece of common knowledge that NBA basketball players could all hit over 90% on free throws if they shot underhand (granny style). But for pride reasons, they don't shoot underhand. Shaq shot just 52%, even though it'd be free points if he could easily shoot better.
I suspect there's similar things in software engineering. I've seen plenty of comments on HN about "adding code comments like a junior software engineer" or similar sentiment. Sure, there's legitimate gripes about comments (like how they can be misleading if you update the code without changing the comment, etc), but I strongly suspect they increase comprehension of code overall.
Yeah this is really interesting. My money is 80% on number 1. The good developers I know (I'm not among them) are very practical and results-driven. If they see something is useful for their goals, they use it. There's the time delay that you mentioned, and also the misalignment means there's no direct feedback at all. You'll probably get a scolding if your code breaks or you miss a deadline, but if someone else complains about documentation to the manager, that's one more degree of separation. If the manager doesn't directly feel the pain, he/she won't pay as much attention.
Edit - I'm basically repeating the poster who said it's a principal-agent problem.
Onboarding devs won't risk looking dumb by complaining about the bad docs, the authors already have a mental model, and writing them out fully helped others at the expense of job security.
When doling out bad docs to a stupid robot, you only have yourself to blame. So I think it's #2 + #3. The big change is replaceability going from bad to desirable (replace yourself with agents before you are replaced with a cheaper seat).
Probably all the same reasons tech debt exists in the first place: business pressure, poor design, lack of resources. It used to be expensive to keep good documentation up to date as the code changes.
If documentation is a side effect to providing accurate and effective AI context, it's pretty logical there will be a significant incentive to maintain it.
> 2) The tools can help build them. Building good docs was always hard. Especially if you take the time to include examples, urls, etc. that make the documentation truly useful. These tools reduce this cost.
I would just be a little cautious about this, for a few reasons: (a) an expectation of lots of examples and such can increase the friction to capturing anything at all; (b) this can encourage AI slop bloat that is wrong; (c) bloat increases friction to keeping the key info up to date.
> 3) Many programmers are egotists. Documentation that helps other people doesn't generate internal motivation. But documentation that allows you to better harness a computer minion to your will is attractive.
There are also people who are conflicted by a non-ideal trust environment: they genuinely want to help the team and do what's right for the business, but they don't want to sacrifice themselves if management doesn't understand and value what they're doing.
> Any other theories?
Another reason is that organizations often had high-friction and/or low-trust places to put documentation.
I always emphasize low-friction, trusted engineering docs. Making that happen in a small company seems to involve getting everyone to use a low-friction wiki (and in-code/repo docs, and knowing when to use which), migrating all the docs that are strewn across the random SaaSes where people dropped them, and showing people how it's done.
It must be seen as genuinely valuable to team-oriented, mission-oriented people.
Side note: It's very difficult to try to un-teach someone all the "work" skills they learned in school and many corporate jobs, where work is mostly directed by appearances. For example, the goal of a homework essay is to have what they deliver look like something that will get a good grade from the grader; they don't care at all about the actual quality or value of it, and it has no other purpose. So, if you drop that person into a sprint with tasks assigned to them, the main goal will be to look good on what they think are the metrics, and they will have a hard time believing they're supposed to be thinking beyond that. (They might think it's just corporate platitudes that no one believes, like the Mission Statement, and nod their head until you go away.) And if they're told they're required to "document", the same people will go into that homework mode, and will love the generative-AI tools, and not reason about the quality/value/counterproductiveness of dumping that write-only output into whatever enterprise SaaS someone bought (a purchase that was often itself another example of "work" done without really understanding or caring, just for appearances).
> migrating all the doc that's strewn all over random SaaSes that people dropped it
I would love to be able to share our internal "all the things that are wrong with our approach to documentation" wiki page. It's longer than you could possibly imagine, probably more than 15 years old at this point, and filled to the brim with sarcasm and despair. It's so fucking funny. The table of contents is several pages long.
> Over time the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub’s official MCP on its own famously consumes tens of thousands of tokens of context, and once you’ve added a few more to that there’s precious little space left for the LLM to actually do useful work.
Supabase MCP really devours your context window. IIRC, it uses 8k tokens for the search_docs tool alone, just on load. If you actually use search_docs, it can return >30k tokens in a single reply. This destroys an entire chat session.
Workaround: I just noticed yesterday that Supabase MCP now allows you to choose which tools are available. You can turn off the docs, and other tools. [0]
If you are wondering why you should care, all models get dumber as the context length increases. This happens much faster than I had expected. [1]
> imagine a folder full of skills that covers tasks like the following:
> Where to get US census data from and how to understand its structure
Reminds me of the first time I used Wolfram Alpha and got blown away by its ability to use actual structured tools to solve the problem, compared to a normal search engine.
tbh wolfram alpha was the craziest thing ever. haven't done much research on how this was implemented back in the day but to achieve what they did for such complex mathematical problems without AI was kind of nuts
I think the difference now is that traditional software ultimately comes down to a long series of if/then statements (as did the old AIs like Wolfram), whereas the new AI (mainly LLMs) has a fundamentally different approach.
Look into something like Prolog (~50 years old) to see how systems can be built from rules rather than if/else statements. It wasn't all imperative programming before LLMs.
If you mean that it all breaks down to if/else at some level then, yeah, but that goes for LLMs too. LLMs aren't the quantum leap people seem to think they are.
Yeah, the result is pretty cool. It's probably how it felt to eat pizza for the first time. People had been grinding grass seeds into flour, mixing with water and putting it on hot stones for millennia. Meanwhile others had been boiling fruits into pulp and figuring out how to make milk curdle in just the right way. Bring all of that together and, boom, you have the most popular food in the world.
We're still at the stage of eating pizza for the first time. It'll take a little while to remember that you can do other things with bread and wheat, or even other foods entirely.
Would really like something self-hosted that does the basic Wolfram Alpha math things.
Doesn't need the craziest math capability but standard symbolic math stuff like expression reduction, differentiation and integration of common equations, plotting, unit wrangling.
All with an easy to use text interface that doesn't require learning.
I used it a lot for calc, as it would show you how it got the answer if I remember right. I also liked how it understands symbols, which is obvious but still cool - you can paste an integral sign right in there.
I do think the big story here is how hyperfocused and path-dependent people got on MCP, when the actually-interesting thing is simply "tool calls". Tool calls are incredibly interesting and useful. MCP is just one means to that end, and not one of the better ones.
I think MCP's huge adoption was mainly due to its timing.
Tool calling was a thing before MCP, but the models weren't very good at it. MCP almost exactly coincided with the models getting good enough at tool calling for it to be interesting.
So yeah, I agree - most of the MCP excitement was people learning that LLMs can call tools to interact with other systems.
Tools are literally function calls with extra steps. MCPs are interpreters of those function calls.
Same stuff, different name - the only thing that's changed is that Anthropic got people to agree on an RPC protocol.
It's not like it's a new idea, either. MCP isn't much different from SOAP or DCOM - but it works where the older approaches didn't, because LLMs are able to understand API definitions and natural-language documentation, and then map between those APIs (and user input) on the fly.
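For concreteness, a "tool" is just a JSON schema the model gets shown plus a name it can call back; something like this (OpenAI-style function-calling format, the function itself is made up):

    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }

The model answers with the function name plus JSON arguments, the client runs the real code and feeds the result back - MCP mostly standardizes how that exchange is transported and how tools are discovered.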
No, tool calls are just one of many MCP parts. People thinking MCP = SOAP or DCOM or JSON-RPC or OpenAPI didn't stop for 20 minutes to read and understand MCP.
Tool calls are 20% of MCP, at most. And a good amount of that is dynamically generating the tool list exposed to LLMs. But lots of people here think MCP === give the model 50 tools to choose from.
What else is there? I know about resources and prompts but I've seen almost no evidence of people actually using them, as far as I can tell tools are 90% of the usage of MCP, if not more.
It is funny that most people discussing here do not understand MCP at all. Besides tools, MCP has resources, prompts, sampling, elicitation, and roots, and each one of them is useful when creating apps connected to LLMs. MCP is not only about MCP servers; the host/client part is as important as the servers/tools. For example, nowadays most LLM clients are chatbots, but an MCP client could be a chess game or a project management app.
> I know about resources and prompts but I've seen almost no evidence of people actually using them
These are features that MCP clients should implement, and unfortunately most of them still don't. The same goes for elicitation and sampling. Prompts, for example, are mostly useful when you use sampling; then you can create an agent from an MCP server.
What can I do with MCP that I can't do with the function calling interface in the OpenAI Responses API? Besides, obviously, grafting function calls into agents I didn't write; we all understand that's the value prop of MCP. But you're suggesting it's more than that. Fill in the blanks for us.
MCPs have a larger impact beyond the terminal - you can use it with ChatGPT, Claude Web, n8n, LibreChat, and it comes with considerations for auth, resources, and now even UI (e.g., apps-sdk from OpenAI is on MCP).
If we're considering primarily coding workflows and CLI-based agents like Claude Code, I think it's true that CLI tools can provide a ton of value. But once we go beyond that to other roles - e.g., CRM work, sales, support, operations, finance; MCP-based tools are going to have a better form factor.
I think Skills go hand-in-hand with MCPs, it's not a competition between the two and they have different purposes.
I am interested, though, in when the Python code in Skills can call MCPs directly via the interpreter... that is the big unlock (something we have tried and found to work really well).
Yeah, the biggest advantages MCP has over terminal tooling is that MCP works without needing a full blown sandboxed Linux style environment - and MCP can also work with much less capable models.
You can drive one or two MCPs off a model that happily runs on a laptop (or even a phone). I wouldn't trust those models to go read a file and then successfully make a bunch of curl requests!
MCPs are overhyped and have limited value in my opinion. About 95% of the MCP servers out there are useless and can be replaced with a simple tool call.
This is a very obvious statement, but good MCP servers can be really good, and bad MCP servers can actively make things significantly worse. The problem is that most MCP servers are in the latter category.
As is often the case, every product team is told that MCP is the hot new thing and they have to create an MCP server for their customers. And I've seen that customers do indeed ask for these things, because they all have initiatives to utilize more AI. The customers don't know what they want, just that it should be AI. The product teams know they need AI, but don't see any meaningful ways to bring it into the product. But then MCP falls into their laps as a quick way to say "we're an AI product" without actually having to become an AI product.
There's some extra irony here: many of those product teams don't realize that AI is not something they can have within their product. If something like MCP is a good fit for them, even a little, then their product is actually a feature of the AI.
Agentic LLMs are, in a way, an attempt to commoditize entire service classes, across the board, all at once.
Personally, I welcome it. I keep saying that a lot of successful SaaS products would be much more useful and ergonomic for end users if, instead of webshit SPA, they were distributed as Excel sheets. To that I will now add: there's a lot more web services that I'd prefer be tool calls for LLMs.
Search engines have already been turned into features (why ask Google when o3 can ask it for me), but that's just an obvious case. E-mails, e-commerce, shopping, coding, creating digital art, planning, managing projects and organizations, analyzing data and trends - all those are in-scope too; everything I can imagine asking someone else to do for me is meant to eventually become a set of tool calls.
Or in short: I don't want AI in your product - I want AI of my choice to use your product for me, so I don't have to deal with your bullshit.
Thank you. This is beautifully said. I will also add that I don't think chatbots are the final product, which leaves open the question of which product will be the last one not to be commoditized.
Yes, and MCPs also only work as long as you trust the provider. MCP relies on honesty from the server. In reality, we know Uber and folks will prompt-engineer like hell to try to convince any LLM that it is the best option for any kind of service.
There’s a fundamental misalignment of incentives between publishers and consumers of MCP.
- Tool calling LLM combined w/ structured output is easier to implement as MCP than CLI for complex interactions IMO.
- It is more natural to hold state between tool calls in an MCP server than with a CLI.
When I read the OT, I initially wondered if I indeed bought into the hype. But then I realized that the small demo I built recently to learn about MCP (https://github.com/cournape/text2synth) would have been more difficult to build as a cli. And I think the demo is representative of neat usages of MCP.
- bundled instructions, covering complex interactions ("use the id from the search here to retrieve a record") for non-standard tools
- custom MCPs, the ones that are firewalled from the internet, for your business apis that no model knows about
- centralized MCP services, http/sse transport. Give the entire team one endpoint (ie web search), control the team's official AI tooling, no api-key proliferation
Now, these trivial `npx ls-mcp` stdio ones, "ls files in any folder" MCPs all over the web are complete context-stuffing bullshit.
MCP servers seem to be a hacker's delight. So many poorly configured and hastily deployed instances. Businesses have removed all the normal deployment guardrails!
I've been able to build the equivalent of skills with a few markdown files. I need to remind my agent every so often to use a skill but usually once per session at most.
I don't get what's so special about Claude doing this?
Part of it is that they gave a name to a useful pattern that people had already been discovering independently. Names are important, because they mean we can start having higher quality conversations about the pattern.
Anthropic also realized that this pattern solves one of the persistent problems with coding agents: context pollution. You need to stuff as little material as possible into the context to enable the tool to get things done. AGENTS.md and MCP both put too much stuff in there - the skills pattern is a much better fit.
I think you're overly enthusiastic about what's going on here (which is surprising, because you've seen that the trend in AI seems to be re-inventing the wheel every other year...)
MCP was conceptually quite complicated, and a pretty big lift in terms of implementation for both servers and clients.
Skills are conceptually trivial, and implementing them is easy... provided you have a full Linux-style sandbox environment up and running already. That's a big dependency, but it's also an astonishingly powerful way to use LLMs, based on my past 6 months of exploration.
I remain afraid of prompt injection. If I'm telling Claude Code to retrieve data from issues in public repos there's a risk someone might have left a comment that causes it to steal API keys or delete files or similar.
I'm also worried about Claude Code making a mistake and doing something like deleting stuff that I didn't want deleted from folders outside of my direct project.
Strong disagreement on the helpfulness of the name - if anything, calling a context file a skill is really misleading. It evokes something like a LoRA or a pluggable modality. Skill is the wrong name imo.
IMO LoRAs are no different from context tokens. In fact, before LoRAs tuned prompt vectors were a popular adapter architecture. Conceptually, the only difference is that prompt adapters only interact with other tokens through the attention mechanism while LoRAs allow you to directly modify any linear layer in the model. Essentially, you can think of your KV cache as dynamically generated model weights. Moreover, I can't find the paper, but there is some evidence that in-context learning is powered by some version of gradient descent inside the model.
I think skill is the perfect name for this. You provide the LLM with a new skill by telling it how to do a thing and providing supporting scripts to help it do that thing.
Yup! I fully agree. It also taps into the ability of LLMs to write code given good prompts. All you need is for the LLM to recognize that it needs something, fetch it into the context, and write exactly the code that is needed in the current combination of skill + previous context.
Subagents are mainly a token context optimization hack. They're a way for Claude Code to run a bunch of extra tool calls (e.g. to investigate the source of a bug) without consuming many tokens in the parent agent loop - the subagent gets its own loop, can use up to ~240,000 tokens exploring a problem, and can then reply back up to the parent agent with a short description of what it did or what it figured out.
A subagent might use one or more skills as part of running.
A skill might advise Claude Code on how best to use subagents to solve a problem.
It's baffling to me. I was already making API calls and embedding context and various instructions precisely using backticks with "md". Is this really all this is? What am I missing? I don't even understand how this "feature" merits a press release from Anthropic, let alone a blog post extolling it.
1. By giving this pattern a name, people can have higher level conversations about it.
2. There is a small amount of new software here. Claude Code and https://claude.ai/ both now scan their skills/ folders on startup and extract a short piece of metadata about each skill from the YAML at the top of those markdown files. They then know that if the user e.g. says they want to create a PDF they should "cat skills/pdf/skill.md" first before proceeding with the task.
I think the pattern itself is really neat, because it's an acknowledgement that a great way to give an LLM system additional "skills" is to describe them in a markdown file packaged alongside some relevant scripts.
It's also pleasantly vendor-neutral: other tools like Codex CLI can use these skills already (just tell them to go read skills/pdfs/skill.md and follow those instructions) and I expect they may well add formal support in the future, if this takes off as I expect it will.
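Concretely, a skill is just a folder with a skill.md in it; the YAML block at the top is the only part read at startup, and the rest is loaded on demand. A rough illustrative sketch (loosely modelled on the PDF example; the file names referenced are made up):

    ---
    name: pdf
    description: Create, fill and extract text from PDF files. Use whenever a task involves PDFs.
    ---

    Prefer the Python pypdf library for reading and merging PDFs.
    For form filling, see scripts/fill_form.py in this folder (run it with --help first).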
I have been independently thinking about a lot of this for some time now, so this is exciting for me. Concretizing _skills_ provides, as you said, a common pattern for people to rally around. Like you, I have been going dizzy about its possibilities, especially when you realize that a single agent can be modified with skills from all its users. Imagine an app with just enough backbone to support any kind of skill. From there, different groups of users can collaborate and share skills with each other to customize it exactly to their specific niche. You could design Reddit-like community moderation techniques to decide which skills get accepted into the common repo, which ones to prioritize, how to filter the duplicates, etc.
I was puzzled by the announcement and remain puzzled after this blog post. I thought everyone knew you could keep use case specific context files handy.
It feels like it's taking a solved problem and formalizing it, with a bit of automation. I've used MCPs that were just fancy document search, and this should replace those.
I'm a bit unclear what's different here from how vibe coders already work?
Pretty early on folks recognized that most MCPs can just be CLI commands, and a markdown file is fine for describing them. So Claude Code users have markdown files of CLI calls and mini tutorials on how to do things. The 'how to do things' part seems to be what we're now calling skills... Which we're still writing in markdown and using from Claude.
Is the new thing that Claude will match & add them to your context automatically vs you call them manually? And that's a breakthrough because there's some emergent behavior?
I think skills are mainly just a formalization of the markdown plus CLI patterns that people have been using already.
The only material difference with skills is that Claude knows to scan them for YAML descriptions on startup, which means it can trigger them by itself more easily.
Right, the 'knowing' is where I think the interesting thing is today for their evolution
More mature claude.md files already typically index into other files, including guidance on which to preload vs lazy-load. In practice, though, Claude forgets quite easily, so that pattern is janky. A structured mechanism helps Claude guarantee less forgetting.
Forward looking, from an automation perspective of autonomous learning, this also makes it more accessible to talk about GEPA-for-everyone to maintain & generate these. We've been playing with similar flows in louie.ai, and came to a similar "just make it folders full of markdown with some learning automation options."
I was guessing that was what was going on here, but the writeup felt like maybe more was being said :) (And thank you for continuing to write!)
These are completely different things. MCP is also about consuming external services handling oauth and all of that. Skills are effectively cli tools + prompts. Completely different application so they cannot be compared easily like that.
BTW, before MCP was even a thing, we invented our own system called Skillset. It turns out it is now sort of the best parts of both MCPs and Skills.
This is a fairly negative comment, but putting it out there to see if other people are feeling the same thing
If you told the median user of these services to set one of these up I think they would (correctly) look at you like you had two heads.
People want to log in to an account, tell the thing to do something, and the system figures out the rest.
MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem. It reminds me of those youtube channels that every 6 months say "This new programming language, framework, database, etc is the killer one", they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times.
There is a lot of surface-level iteration, but deep problems aren't being solved. Something in tech went very wrong at some point, and as soon as the money men flood the field we get announcements like this: push out the next release, get my promo, jump to the next shiny tech company, leaving nothing in their wake.
There is no problem to solve. These days, solutions come in a package which includes the problems they intend to solve. You open the package. Now you have a problem that jumped out of the package and starts staring at you. The solution comes out of the package and chases the problem around the room.
You are now technologically a more progressed human.
This is where GP is wrong, I think. The problems are being solved, for now, because the businesses are still too excited about the whole AI thing to notice it's not in their interest, and to properly consolidate against it.
And the problem being solved is, LLMs are universal interfaces. They can understand[0] what I mean, and they understand what those various "solutions" are, and they can map between them and myself on the fly. They abstract services away.
The businesses will eventually remember that the whole point of marketing is to prevent exactly that from happening.
--
[0] - To a degree, and conditioned on what one considers "understanding", but still - it's the first kind of computer systems that can do this, becoming a viable alternative to asking a human.
I wish this was wrong, but it really isn't. To contrast, though, I would argue that is part of evolution? We just want to do things faster or better. Smartphones solved no problems, but they ushered in the digital millennium.
I think most new technologies helped to increase the expectations about what you can do. But overall, work did not get reduced. It didn't give me more free time to go fishing or bird-watching. On the other hand, I got an irreversible dependency on these things. Otherwise I'm no longer compatible with World 2.0.
> MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem
My fairly negative take on all of this has been that we’re writing more docs, creating more apis and generally doing a lot of work to make the AI work, that would’ve yielded the same results if we did it for people in the first place. Half my life has been spent trying to debug issues in complex systems that do not have those available.
This is true, but the reason the economics have inverted is that we can pay these new "people" <$20 for the human equivalent of ~300 hours worth of non-stop typing.
Correct. And we know the AI will read the docs whereas people usually ignore 99% of docs so it just feels like a bad use of time sometimes, unfortunately.
-ish; while you can be fairly certain it reads the docs, whether they've been used/synthesized is just about unknowable. The output usually looks great, but it's up to us to ensure its accuracy; we can make it better in aggregate by tweaking dials and switches. To mitigate this we're asking AIs to create plans and todo lists first, which adds some rigor, but again we can't know if the lists were comprehensive or even correct. It does seem to make the output better. And if the human doesn't read the docs, they can be beaten!
That is not true at all. The economics you’re seeing right now are akin to Uber handing out $5 airport pickups to kill the taxi industry. And even then the models are nowhere as cheap as <$20 for ~300 hours of human work.
40 words per minute is equivalent to about 50 tokens a minute.
I just took GPT-5: output is $10 per million tokens. Let's double the cost to account for input tokens, which are $1.25 per million ($0.125 if cached).
For 1 million tokens, it would take a 40 wpm typist around 20K minutes to output that $20 worth of text. That is just typing. So about 300 hours of non-stop effort for that $20.
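Back-of-the-envelope, with the same assumptions, if you want to check the numbers:

    # prices and typing speed are the assumptions from the comment above
    tokens = 1_000_000
    tokens_per_min = 50          # ~40 wpm typist
    minutes = tokens / tokens_per_min
    hours = minutes / 60
    cost = 10.00 + 10.00         # $10/M output tokens, doubled to roughly cover input
    print(f"{minutes:,.0f} minutes = {hours:,.0f} hours of typing vs ~${cost:.0f} of API spend")
    # 20,000 minutes = 333 hours of typing vs ~$20 of API spend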
So even if you say the real price is $100, not $20, the value change is still shattering to the previous economic dynamics.
Then layer in that also as part of that value, the "typist" is also more skilled than the average working person in linguistics, software engineering, etc. Then that value is further magnified.
This is why I say we have only begun to barely see the disruption this will cause. Even if the models don't get better or cheaper, the potential impact is hard to grasp.
If writing a good document and a strong API had to happen anyway, and now you can write just that and the rest will take care of itself, we may actually have progressed. Plus the documents would then have to be there, instead of skipped like today.
The counter-argument is that code is the only way to concisely and unambiguously express how everything should work.
I am also struck by how much these kinds of context documents resemble normal developer documentation, but actually good. What was the barrier to creating these documents before?
They're much more useful when an LLM stands between them and users - because LLMs can (re)process much more of them, and much faster, than any human could ever hope to.
One way (and one use case) of looking at it is, LLM agents with access ("tools") to semantic search[0] are basically a search engine that understands the text it's searching through... and then can do a hundred different things with it. I found myself writing better notes at work for this very reason - because I know the LLM can see them, and can do anything from surfacing obscure insights from the past, to writing code to solve an issue I documented earlier.
It makes notes no longer be write-only.
--
[0] - Which, incidentally, is itself enabled by LLM embeddings.
What if the great boon of AI is to get us to do all the thinking and writing we should have been doing all along? What if the next group of technologists to end up on top are... the technical writers?
Haha, just kidding you tech bros, AI's still for you, and this time you'll get to shove the nerds into a locker for sure. ;-)
It might not be that wrong. After all, programming languages are a way to communicate with the machine. In the same way we are not doing binary manually, we might simply not have to do programming too. I think software architecture is likely to be what it should be: the most important part of every piece of software.
You've got it wrong. The machine is fine with a bit soup and doesn't care whether it's provided via punch card or Python.
Programming was always a tool for humans. It’s a formal “notation” for describing solutions that can be computed. We don’t do well with bit soup. So we put a lot of deterministic translations between that and the notation that we’re good with.
Not having to do programming would be like not having to write sheet music because we can drop a cat from a specific height onto a grand piano and have the correct chord come out. Code is ideas precisely formulated while prompts are half formed wishes and prayers.
This is actually my theory of the future. Basically, the ability to multiply your own effectiveness is now directly dependent on your ability to express ideas in simple plain English very quickly and precisely.
I’m attracted to this theory in part because it applies to me. I’m a below average coder (mostly due to inability to focus on it full time) and I’m exceptionally good at clear technical writing, having made a living off it much of my life.
The present moment has been utterly life changing.
What is a "deep problem" and what was the cadence with which we addressed these kinds of "deep problems" prior to 2023, when ChatGPT first went mainstream?
I've been a Usenix reviewer twice, once as a program chair (I think that's what they call the co-leaders of a PC?). So this doesn't clarify anything for me.
To put it more clearly: you take a domain (like OS security, performance, and administration) and you'll find the kinds of problems whose solutions people feel are important to share with each other. Solutions that are not trivially found. Findings you can be proud to have your name attached to.
And then you have something like the LLM craze where, while it's new, it's not improving any part of the problem it's supposed to solve, but instead is creating new ones. People are creating imperfect solutions to those new problems, forgetting the main problem in the process. It's all vapourware. Even something like a new linter for C is more of a solution to programmers' productivity than these "skills".
OK: I think I have decisively established my Usenix bona fides here, and I'm repeating my original question: what is the cadence at which we resolved "deep questions" prior to the era of LLMs? (It began in 2023.)
>they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times
I don't see how this is bad. Technology makes iterative, marginal improvements over time. Someone may make a video tomorrow claiming a great new frontend framework, even though they made that exact video about Nextjs, or React before that, or Angular, or JQuery, or PHP, or HTML.
>Something in tech went very wrong at some point, and as soon as money men flood the field we get announcments like this
If it weren't for the massive money being poured into AI, we'd be stuck with GPT-3 and Claude 2. Sure, they release some duds in the tooling department (although I think Skills are good, actually) but it's hardly worthy of this systemic rot diagnosis you've given.
I do not feel the same way. This looks easy to use and useful. I don’t think every problem needs to be a ‘deep problem’. There’s so many practical steps to get to
> People want to log in to an account, tell the thing to do something, and the system figures out the rest.
At a glance, this seems to be a practical approach to building up a personalized prompting stack based on the things I commonly do.
> The problems are context dilution, tool usage, and awareness outside of the llm model.
This is accidental complexity. You've already decided on a method, and instead of solving the main problem, you are solving the problems associated with the method. Like deciding to go to space in a car and trying to strap a rocket onto it.
> If that's true, why do leadership, VCs, and eventually either the acquiring company or the public markets keep falling for it then?
If I had to guess, it would be because greed is a very powerful motivator.
> As the old adage goes: "Don't hate the player, hate the game?"
I know this advice is a realistic way of getting ahead in the world, but it's very disheartening and long term damaging. Like eating junk food every day of your life.
These are all tools for advanced users of LLMs. I've already built a couple of MCPs for clients... you might not have a use for them, but there are niches already getting a lot out of them.
"Skills work through progressive disclosure—Claude determines which Skills are relevant and loads the information it needs to complete that task, helping to prevent context window overload."
So yeah, I guess you're right. Instead of one humongous AGENTS.md, just packaging small relevant pieces together with simple tools.
So far I am in the skeptic camp on this. I don't see it adding a lot of value to my current claude code workflow which already includes specialized agents and a custom mcp to search indexed mkdocs sites that effectively cover the kinds of things I would include in these skills file. Maybe it winds up being a simpler, more organized way to do some of this, but I am not particularly excited right now.
I also think "skills" is a bad name. I guess its a reference to the fact that it can run scripts you provide, but the announcement really seems to be more about the hierarchical docs. It's really more like a selective context loading system than a "skill".
I'm inclined to agree. I've read through the Skill docs and it looks like something I've been doing all along - though I informally referred to it as the "Table of Contents" approach.
Over time I would systematically create separate specialized docs around certain topics and link them in my CLAUDE.md file, but noticeably without using the "@" symbol, which to my understanding always causes Claude to ingest the linked files, unnecessarily bloating your prompt context.
So my CLAUDE md file would have a header section like this:
# Documentation References
- When adding CSS, refer to: docs/ADDING_CSS.md
- When adding or incorporating images, refer to: docs/ADDING_IMAGES.md
- When persisting data for the user, refer to: docs/STORAGE_MANAGER.md
- When adding logging information, refer to: docs/LOGGER.md
It seems like this is less of a breakthrough and more of an iterative improvement towards formalizing this process from an organizational perspective.
How consistently do you find that Claude Code follows your documentation references? Like you work on a CSS feature and it goes to ADDING_CSS.md? I run into issues where it sometimes skips my imperative instructions.
It's funny you mention this - for a while I was concerned that CC wasn't fetching the appropriate documentation related to the task at hand (coincidentally this was around Aug/Sept when Claude had some serious degradation issues [1]), so I started adding the following to the beginning of each specialized doc file:
When this documentation is read, please output "** LOGGING DOCS READ **" to the console.
These days I do find that the TOC approach works pretty well though I'll probably swap them over to Skills to see if the official equivalent works better.
That's exactly what it is - formalizing and creating a standard induces efficiency. Along with things like AGENTS.md, it's all about standardization.
What bugs me: if we're optimizing for LLM efficiency, we should use structured schemas like JSON. I understand the thinking about Markdown being a happy medium between human/computer understanding but Markdown is non-deterministic for parsing. Highly structured data would be more reliable for programmatic consumption while still being readable.
> and a custom mcp to search indexed mkdocs sites that effectively cover the kinds of things I would include in these skills file
Search and this document base pattern are different. In search the model uses a keyword to retrieve results, here the model starts from a map of information, and navigates it. This means it could potentially keep context better, because search tools have issues with information fragmentation and not seeing the big picture.
I'd be really interested in what you mean. Are there any studies that quantify this difference in model performance when using JSON or XML? What could be a good intuition for why there might be a big difference? If XML is better than JSON for LLMs, why isn't everyone and their grandma recommending that I use XML instead of JSON? Why does the Google Gemini API offer structured output only with JSON schema instead of XML schema?
Note that they don't actually suggest that the XML needs to be VALID!
My guess was that JSON requires more characters to be escaped than XML-ish syntax does, plus matching opening and closing tags makes it a little easier for the LLM not to lose track of which string corresponds to which key.
(1) JSON requires lots of escape characters that mangle the strings + hex escapes and (2) it's much easier for model attention to track when a semantic block begins and ends when it's wrapped by the name of that section
<instructions>
...
...
</instructions>
can be much easier than
{
"instructions": "..\n...\n"
}
especially when there are newlines, quotes and unicode
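A quick way to see it for yourself in Python (the sample string is arbitrary):

    import json

    text = 'He said "hello",\nthen left the café.'

    # JSON: quotes, newlines and non-ASCII all get escaped by default
    print(json.dumps(text))
    # -> "He said \"hello\",\nthen left the caf\u00e9."

    # XML-ish: the same text sits verbatim between the tags
    print(f"<note>\n{text}\n</note>")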
Thanks for the reply, that part about the models attention is pretty interesting!
I would suspect that a single attention layer won't be able to figure out which token an opening-bracket token should attend to most. Think of
{"x": {y: 1}}: with only one layer of attention, can the token for the first opening bracket successfully attend to exactly the matching closing bracket?
I wonder if RNNs work better with JSON or XML. Or maybe they are just fine with both of them because a RNN can have some stack-like internal state that can match brackets?
Probably, it would be a really cool research direction to measure how well Transformer-Mamba hybrid models like Jamba perform on structured input/output formats like JSON and XML and compare them. For the LLM era, I could only find papers that do this evaluation with transformer-based LLMs. Damn, I'd love to work at a place that does this kind of research, but guess I'm stuck with my current boring job now :D Born to do cutting-edge research, forced to write CRUD apps with some "AI sprinkled in". Anyone hiring here?
Tangent: the fact XHTML didn't gain traction is a mistake we've been paying off for decades.
Browser engines could've been simpler; web development tools could've been more robust and powerful much earlier; we would be able to rely on XSLT and invent other ways of processing and consuming web content; we would have proper XHTML modules, instead of the half-baked Web Components we have today. Etc.
Instead, we got standards built on poorly specified conventions, and we still have to rely on 3rd-party frameworks to build anything beyond a toy web site.
Stricter web documents wouldn't have fixed all our problems, but they would have certainly made a big impact for the better.
We've just started to roll out our MCP servers, and if Anthropic and the community have already moved on, we'll wait till all this churn subsides before switching over next time.
You don't need MCP if you can instead drop in a skill markdown file that says "to access the GitHub API, use curl against api.github.com and send the GITHUB_API_KEY environment variable in the authorization header. Here are some examples. Consult github-api.md for more."
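Roughly, the entire "MCP replacement" could be one small file like this (an illustrative sketch; the endpoint shown is the standard GitHub REST issues API, everything else is made up):

    ---
    name: github-api
    description: Query the GitHub REST API with curl when asked about repos, issues or pull requests.
    ---

    Use curl against https://api.github.com and send the GITHUB_API_KEY environment variable in the Authorization header:

        curl -s -H "Authorization: Bearer $GITHUB_API_KEY" \
             "https://api.github.com/repos/OWNER/REPO/issues?state=open"

    For other endpoints and pagination details, consult github-api.md in this folder.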
A difference/advantage of MCP is that it can be completely server-side. Which means that an average person can "install" MCP tools into their desktop or Web app by pointing it to a remote MCP server. This person doesn't want to install and manage skills files locally. And they definitely don't want to run python scripts locally or run a sandbox vm.
That's going to be a lot less efficient context-wise and computing-wise than using either a purpose-built MCP or skill based around executing a script.
Am I the only person left that is still impressed that we have a natural language understanding system so good that its own tooling and additions are natural language?
I still can't believe we can tell a computer to "use playwright Python to test this new feature page" and it will figure it out successfully most of the time!
Impressive, but I can't believe we went from fixing bugs to coffee-grounds-divination prompt-guessing-and-tweaking when things don't actually go well /s
Is any of it even churn? I feel like almost everything is still relevant; basically everything was a separate card which they're using to build up a house. Even RAG still has its place.
Now, whether they're able to convert that house of cards into a solid foundation or it eventually spectacularly falls over will have to be seen over the next decade.
One idea I'm toying with right now: My company has some internal nodejs libraries, some of which are quite large (think: codegen API library, thousands of types). When a new version is published: task claude code to distill the entire library down to a skill, then bundle that in with the npm package. Then when the package is installed, postinstall a script that copies the skill from the dependency to the parent project that's doing the installing.
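The wiring I have in mind is roughly this, in the library's package.json (an untested sketch; the package name, the INIT_CWD trick, and the .claude/skills target path are all assumptions, and it's unix-only as written):

    {
      "name": "@acme/codegen-api",
      "files": ["dist", "skills"],
      "scripts": {
        "postinstall": "mkdir -p \"$INIT_CWD/.claude/skills/acme-codegen-api\" && cp -R skills/. \"$INIT_CWD/.claude/skills/acme-codegen-api\""
      }
    }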
I'm still getting this set up, so I'm not sure yet if it'll lead to better outcomes. I'd say, there's reason to believe it might, but there's also reasons to believe it won't.
If the library is like 50,000 lines of code long, thousands of types, hundreds of helper functions, CC could just look into the node_modules folder and bundle all of this into its context. But, this might not be feasible, or be expensive; so the SKILL.md distills things down to help it get a high level understanding faster.
However, the flip side of that is: What if it's too general? What if CC needs specific implementation details about one specific function? Is this actually better than CC engaging in a two-step process of (1) looking at node_modules/my-lib/index.ts or README.md to get that high-level understanding, then (2) looking at node_modules/my-lib/specificFunction.ts to get the specific intel it needs? What value did the SKILL.md actually convey?
My feeling is that this concept of "human-specified context-specific skills" would convey the most value in situations where the model itself is constrained. E.g. you're working with a smaller open source model that doesn't have as comprehensive intrinsic knowledge of some library's surface, or doesn't have the context window of larger models. But for the larger models... it's probably just better to rely on in-built model knowledge and code-as-context.
Maybe the skill could have references to the code. Like if everything else fails it can look at the implementation.
Intuitively it feels like if you need to look at the implementation to understand the library then the library is probably not well documented/structured.
I think the ability to look into the code should exist but shouldn't be necessary for the majority of use cases
These skills also rely on tools, having a standard way to add tools to an agent is good, otherwise each agent has its own siloed tools.
But also, I remember MCP having support for resources, no? These skills are just context (though I guess they can include executable scripts to help, but the article said most skills are just an instruction markdown).
So you could already have an MCP expose skills as resources, and you could already have the model automatically decide to include a resource based on the resource description.
Now, I understand that adding user-created resources is pretty annoying, and maybe there's no great way for people to easily exchange resources among themselves. But you'd assume that Slack would make the best context for generating Slack gifs, and could then expose that as a resource from their MCP, along with a prompt template and some tools to help, or to add the gif to your Slack as emojis or whatnot.
You could even add Skills specifically to MCP, so that you can expose a combination of context resources and scripts or something.
That said, I agree that the overabundance of tools in MCP is not that good; some tools are so powerful they can cover 90% of all other tool use cases. The Bash tool can do so many things. A generic web browsing tool as well. That's been the problem with MCP as tools.
Skills appear to be a good technique as a user, and I actually already did similar things. I like formalizing it, and it's nice that Claude Code now automatically scans and includes their description header for the model to know it can load the rest. That's the exciting part.
But I do feel for the more general public, MCP resources + prompts + tools are a better avenue.
I'm a little confused about the relationship of Skills and just plain tools. It seems like a lot of skills might just be tools. Or, they might rely on calling sets of tools with some instructions.
But aren't the tool definitions and skill definitions in different places? How do you express the dependency? Can skills say they require command line access, python, tool A, and tool B, and when you load the skill it sets those as available tool calls?
MCP gives me early-days gRPC vibes - when the protocol felt heavy and the tooling had many sharp edges. Even today, after many rounds of improvements, people often eschew gRPC and Protobuf.
Similarly, my experience writing and working with MCPs has been quite underwhelming. It takes too long to write them and the workflow is kludgy. I hope Skills get adopted by other model vendors, as it feels like a much lighter way to save and checkout my prompts.
What do you find difficult about writing MCPs? I haven't worked much with them, but it seems easy enough. I made an MCP that integrates with Jenkins so I can deploy code from Claude (not totally useful since I could just run a few CLI commands), but it still took like 10 minutes and works flawlessly.
But I suppose, yeah: why not just write CLIs and have an LLM call them?
Writing one-off simple MCPs is quite easy, but once you need to manage a fleet of them, it gets hairy.
- Writing manifests and schemas by hand takes too long for small or iterative tools. Even minor schema changes often require re-registration or manual syncing. There’s no good “just run this script and expose it” path yet.
- Running and testing an MCP locally is awkward. You don’t get fast iteration loops or rich error messages. When something fails, the debugging surface is too opaque - you end up guessing what part broke (manifest, transport, or tool logic).
- There’s no consistent registry, versioning, or discovery story. Sharing or updating MCPs across environments feels ad hoc, and you often have to wire everything manually each time.
With Skills you need none of that - instruct it to invoke a tool and be done with it.
> - There’s no consistent registry, versioning, or discovery story. Sharing or updating MCPs across environments feels ad hoc, and you often have to wire everything manually each time.
Everything is new so we are all building it in real time. This used to be the most fun times for a developer: new tech, everybody excited, lots of new startups taking advantage of new platforms/protocols.
Stumbled on to a similar concept a few years ago, I think there was a paper around some kind of proto code skills for Minecraft around then... awesome to see Anthropic pursue this direction
That doesn't mean you can't say one is a bigger deal than the other.
If I learned how to say "hello" in French today and also found out I have stage 4 brain cancer, they are completely different things but one is a bigger deal than the other.
Reposting a comment I made when it was posted 10 hours ago:
As someone who is looking into MCP right now, I'd love to hear what folks with experience in both of these areas think.
My first impressions are that MCP has some advantages:
- around for longer and has some momentum
- doesn't require a dev environment on the computer to be effective
- cross-vendor support
- more sophistication for complex use cases (enterprise permissions can be layered on because of OAuth support)
- multiple transport layers gives flexibility
Skills seems to have advantages too, of course:
- simpler
- easier to iterate
- less context used
I think if the other vendors follow along with skills, and we expect every computer to have access to a development environment, skills could win the day. HTML won over XML and REST won over SOAP, so simple often wins.
But the biggest drawback of MCP, the context window overuse, can be remediated by having MCP specific sub-agents that are interacted with using a primary agent, rather than injecting each MCP server into the main context.
Yeah, I think you're exactly right about MCP's advantages - especially that MCP doesn't require access to a full sandboxed Linux environment to work!
I still plan to ship an MCP for one of my products to let it interact with the wider ecosystem, but as an end-user I'm going to continue mostly using Claude Code without them.
I don't understand why tool calling isn't the primitive. A "skill" could have easily just been a tool that an agent can build and execute in its own compute space.
I really don't see why we need two forms of RPC...
The problem skills solve is the initial mapping of available information. A tool might hide what information it contains until used; this approach puts a table of contents for the docs in the context, so the model is aware and can navigate to the desired information as needed.
If you look through the example skills most of them are about calling existing tools, often Python via the terminal.
Take a look at this one for working with PDFs for example: https://github.com/anthropics/skills/blob/main/document-skil... - it includes a quickstart guide to using the Python pypdf module, then expands on that with some useful scripts for common patterns.
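To get a sense of how small those recipes are, here's a rough sketch of the kind of pypdf snippet a quickstart like that might contain (file names are made up; this is illustrative, not the actual contents of Anthropic's skill):

```python
from pypdf import PdfReader, PdfWriter

# Merge two PDFs - the sort of "common pattern" a skill can spell out so the
# model doesn't have to rediscover the library's API every session.
writer = PdfWriter()
for path in ["report-part1.pdf", "report-part2.pdf"]:
    reader = PdfReader(path)
    for page in reader.pages:
        writer.add_page(page)

with open("combined.pdf", "wb") as out:
    writer.write(out)
```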
For me Claude Skills is just proof that we're making RAG unnecessarily difficult to use - not tech-wise, but UX-wise. But if we can fix that, the need for Claude Skills will go away.
Where are Claude Skills better than MCP? It's easier to produce a Claude Skill - it's just text, and everyone can write it. But it's dependent on the environment a lot, e.g. when you need certain tools available for it to work. How do you automate sandbox setup with that? And even then, are you sure it's the right version for it to use, etc.?
Skills again seem more like making MCPs accessible to everyday users. They will still need to evolve - things that are missing:
- containers for skills (you will need more than a folder for sharing a toolchain with skills)
- the orchestration and picking up of the skill is left to the LLM (the synthesis is done via the context shared in the skill - this feels like a very sub-optimal pattern at the moment - low control)
A few others are missing too, like versioning and access control for tools.
Real-world skills come not just from practice; they are opinionated workflows built with specific toolchains too.
Do skills enable the AI to do things within its own context, or do they grant the ability to process things in an entirely separate context?
I was considering the merits of a news-article analysis skill that processed the content while adopting a variety of personas, opinions, and degrees of adversarial attitude in isolated contexts, then brought the responses together into a single context to arbitrate the viewpoints impartially.
Or even a skill for "where did this claim originate?". I'd love an auto Snopes skill.
It's amazing how writing optimal guides for LLMs or humans is exactly the same. Even if a human doesn't want to use an LLM, these skill markdowns could work as tutorials.
> LLMs know how to call cli-tool --help, which means you don’t have to spend many tokens describing how to use them—the model can figure it out later when it needs to.
I do not understand this. The cli-tool --help output still occupies tokens, right?
That hypothetical might be fine, but MCPs do much more than that and their catalogs can be enormous. Here are some popular MCPs and the amount of context they eat before you've done anything with them:
Most people still don't understand MCP properly and think it's about adding 50 tools to every call. Proper MCP servers and clients implement tools/listChanged
This is like MCPs on demand. I've switched a few MCP servers to just scripts because they were taking like 20% of the context just by being there. Now, I can ask the model to use X script, which it reads and uses only if needed.
It seems to me that MCP and Skills are solving 2 different problems and provide solutions that complement each other quite nicely.
MCP is about integration of external systems and services. Skills are about context management - providing context on demand.
As Simon mentions, one issue with MCP is token use. Skills seem like a straightforward way to manage that problem: just put the MCP tools list inside a skill where they use no tokens until required.
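One hedged sketch of how that could look: the skill's folder ships a small script that dumps an MCP server's tool list only when the agent actually asks for it. The server command here is hypothetical and the client calls follow the `mcp` Python SDK's documented quickstart pattern - treat this as an illustration rather than a tested recipe.

```python
# list_mcp_tools.py - run on demand, so the tool catalog never sits in the main context.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical server command; swap in whichever MCP server the skill wraps.
SERVER = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "."],
)

async def main() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            for tool in result.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(main())
```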
Just to echo the point on MCP: they seem cool, but in my experience just using a CLI is orders of magnitude faster to write and to debug (I just run the CLI myself, put tests in the code, etc.).
Yup, and it doesn't bloat the context unnecessarily. The agent can call --help when it needs it. Just imagine a kubectl MCP with all the commands as individual tools - it doesn't make any sense whatsoever.
And, this is why I usually use simple system prompts/direct chat for "heavy" problems/development that require reasoning. The context bloat is getting pretty nutty, and is definitely detrimental to performance.
The point of this stuff is to increase reliability. Sure, the LLM has a good chance of figuring it out by itself; the idea is that it's less likely to fuck up with the skill, though. This is an engineering advancement that makes it easier for businesses to rely on LLMs for routine stuff with less oversight.
Does anyone else still keep everything super simple by just including .md files in project for guardrails and do all edits manually by file upload and clipboard?
Sure, it's tedious, but I get to directly observe every change.
I guess I just don't feel comfortable with more black box magic beyond the LLM itself.
I just upload the file I want to change and the file with associated tests. Anything more than that and the ROI goes way down on slop generated. But I'll admit Gemini at least is getting good at "implement the function with a TODO describing what needs to be done". No need for anything but the Gemini file uploader, and copy pasting the results if they look good.
Depends which definition of RAG you're talking about.
RAG was originally about adding extra information to the context so that an LLM could answer questions that needed that extra context.
On that basis I guess you could call skills a form of RAG, but honestly at that point the entire field of "context engineering" can be classified as RAG too.
Maybe RAG as a term is obsolete now, since it really just describes how we use LLMs in 2025.
I’d rather say you can use skills to do RAG by supplying the right tools in the skill (“here’s how you query our database”).
Calling the skill system itself RAG is a bit of a stretch IMO, unless you end up with so many skills that their summaries can’t fit in the context and you have to search through them instead. ;)
Seems like that’s it? You give it a knowledge base of “skills” aka markdown files with contexts in them and Claude figures out when to pull them into context.
I think RAG is out of favor because models have a much larger context these days, so the loss of information density from vectorization isn't worth it, and doesn't fetch the information surrounding what's retrieved.
That's true if you use RAG to mean "extra context found via vector search".
I think vector search has been shown to be a whole lot more expensive than regular FTS or even grep, so these days a search tool for the model that uses FTS, grep/rg, vectors, or a combination of those is the way to go.
I can’t quite put my finger on why this doesn’t excite me. First, there was MCP — and honestly, my feelings about it were mixed. In the end, it’s just tool calling, only now the tools are hosted somewhere else instead of being implemented as function calls. Then comes this new “skill” concept. Isn’t it just another round of renaming what we already do? We’ve been using Markdown files to guide our coding agents for ages. Overall, it feels like yet another case of “a new spin on an old thing.”
This seems like Claude sub agents. I personally had tried to use sub agents for repetitive tasks with very detailed instructions, but was never able to get it to work really well in different contexts.
I wonder if one could write a skill called something like "Ask the damn user" that the model could use when e.g. all needed files are not in the context.
As a bonus you can get Claude to create the SKILL.md file itself.
Which kind of sounds pointless - if Claude already knows what to do, why create a document?
My examples: I interact with Elasticsearch, and Claude keeps forgetting it is version 5.2 and that we need to use the appropriate REST API. So I got it to create a SKILL.md about what we use, and provided examples.
The next one was getting it to write instructions on how to use ImageMagick on Windows, with examples and troubleshooting, rather than it trying to use the Linux versions over and over.
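The value is in pinning the version-specific details the model keeps getting wrong. For Elasticsearch 5.x, for instance, the kind of example such a SKILL.md might capture is that the mapping type still lives in the URL path - a hedged illustration with made-up index and field names:

```python
import requests

# Elasticsearch 5.x: typed endpoints, so the mapping type ("event" here) is part
# of the path. Later versions dropped this, which is exactly what gets forgotten.
resp = requests.get(
    "http://localhost:9200/logs/event/_search",
    json={"query": {"match": {"message": "timeout"}}, "size": 5},
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("message"))
```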
Skills are the solution to the problems I have been having. And they came at just the right time, as I had already spent half of last week making similar documents!
No. They make context usage more efficient, but they're not providing new capabilities that didn't previously exist in an LLM system that could run commands on a computer.
I think skills, and also MCP to an extent, are a UI failure. It's AI after all. It should be able to intelligently adapt to our workflow, maybe present its findings and ask for feedback. If this means it's stored as a thing called "skills", that's perfectly fine, but it should just be an implementation detail.
I don't mean to be unreasonable, but this is all about managing context in a heavy and highly technical manner. Eventually models must be able to augment their training / weights on the fly, customizing themselves to our needs and workflow. Once that happens (it will be a really big deal), all of the time you've spent messing around with context management tools and procedures will be obsolete. It's still good to have fundamental understanding though!
If I'm going to fix my car, I might read the car manual first. But if I'm not fixing my car and just walking past the bookshelf, I'll only see the title of the book.
Yes, even though Codex CLI doesn't (yet) know what a skill is. Tell it "Go read skills/pdfs/skill.md and then create me a PDF that does ..." and see what happens.
I wonder if this is a way to make Claude smarter about tool use. As I try more and more of CC, one thing that's been frustrating me is that it often falls over when trying to call some command line tool. Like it will try to call tool_1 which is not on the system, and it errors out, and then it tries to call tool_2, but it's in the wrong working directory or something, so it fails too, then it changes directory and tries to call tool_2 again, and it passes the wrong command line parameters or something, then it tries again... All the while, probably wasting my limited token budget on all of its fuckups. It's gotten to the point where I let it do the code changes, but when it decides it wants to run something in the shell (or execute the project's executable), I just interrupt it and do the rest of the command line work myself.
Yes, it's different. AGENTS.md is a single file that is read by the LLM when the session starts. Skills are multiple .md files which are NOT read on startup - instead, the LLM harness scans them for descriptions and populates the context with those, such that it can later on decide to read the full skills/pdf/skill.md file because the user indicates they want to do something (like create a PDF) for which the skill is relevant.
Skills feel so similar to specialized agents / sub-agents, which we see some of already. I could be under-appreciating the depth, but it feels like the main work here is the UX affordance: maybe like a mod launcher for games: 'what mods/prompts do you want to run with?'
One of the major twists with Skills seems to be that Skills also have "frontmatter YAML" that is always loaded. It still sounds like it's at least somewhat up to the user to engage the Skills, but this frontmatter offers something that purports to help.
> There’s one extra detail that makes this a feature, not just a bunch of files on disk. At the start of a session Claude’s various harnesses can scan all available skill files and read a short explanation for each one from the frontmatter YAML in the Markdown file. This is very token efficient: each skill only takes up a few dozen extra tokens, with the full details only loaded in should the user request a task that the skill can help solve.
I'm not sure what exactly this does but conceptually it sounds smart to have a top level awareness of the specializations available.
I do feel like I could be missing some significant aspects of this. But the mod-launcher paradigm feels like a fairly close parallel?
I just read this separately through Google Discover, and I don't quite get the amazing newness of it - if anything, it feels to me like an abstraction of MCP. There is nothing I see here that couldn't be replaced by a series of MCP tools. For example, the author mentions a "current trick" often used is including a markdown file with details / instructions around a task - this can be handled with an MCP server prompt (or even a 'tool' that just returns the desired text). If you've fooled around as much as I have, you realize that in the prompt itself you can mention other available tools the LLM can use - defining a workflow, if you will, including tools for actual coding and validation like the author mentions they included in their skill.
Furthermore, with all the hype around MCP servers and simply the number of servers now existing, do they just immediately become obsolete? It's also a bit fuzzy to me exactly how an LLM will choose an MCP tool over a skill and vice versa...
A skill is a Markdown + YAML file on your filesystem. An MCP server is accessed over HTTP, and defines a way to authenticate users.
If you're running an MCP server just to expose local filesystem resources, then it's probably obsolete. But skills don't cover a lot of the functionality that MCP offers.
Hmmm... If skills could create skills, then evaluate the success of those generated skills and update them as needed, in a loop - that seems like an interesting thing to try. It could be like a form of code-creation caching. (Or a Skynet in the making.)
The question is whether the analysis of all the skill descriptions is faster or slower than just rewriting the code from scratch each time. Would it be a good or bad thing if an agent created thousands of slightly varied skills?
I think Skills might be coming from an AI Safety and AI Risk sort of place, or better, alignment with the company's goals. The motivation could be to reduce the amount of ad-hoc instruction giving that can be done, on the fly, in the favor of doing this at a slower pace, making it more subject to these checks. It does fit well into what a lot of agents are doing, though, which makes it more palatable for the average AI user.
Basically the way it would work is, in the next model, it would avoid role playing type instructions, unless they come from skill files, and internally they would keep track of how often users changed skill files, and it would be a TOS violation to change it too often.
Though I gave up on Anthropic in terms of true AI alignment long ago, I know they are working on a trivial sort of alignment where it prevents it from being useful for pen testers for example.
Reads like a sloppily disguised influencer blog post, really. The feature has just been released and the author already knows it's going to be a "bigger deal than MCP"? Although he proceeds to discredit MCP as too complex to be able to take off towards the end of the article. Not very useful examples either, which the author publishes with a bit of distancing, almost a disclaimer - who the hell has time to create lousy Slack GIFs? And if you do, you'll probably not want the lousy ones. So yeah, let's not immediately declare this as "great"; let's see how it fares in general and revisit in a few months.
> I didn't say that MCP was too complex to take off - it clearly took off despite that complexity.
I did not say you said exactly that either. Read more carefully. I said you were discrediting them by implying they were too complex to take off due to resource and complexity constraints. It's clearly stated in the relevant section of your post (https://simonwillison.net/2025/Oct/16/claude-skills/#skills-...)
>I'm predicting skills will take off even more. If I'm wrong feel free to call me out in a few months time for making a bad prediction!
How the hell can you predict they will "take off even more" when the feature has been accessible for barely 24 hours at this point? You don't even have a basic frame of reference, or at least a statistical usage sample, for making such a statement. That's not a very reliable prediction, is it?
> That's what a prediction IS. If I waited until the feature had proven itself it wouldn't be much of a prediction.
No, that's merely guessing, mate. Predictions are, at least in the modern meaning, based on at least some data and some extrapolation model that more or less reliably predicts the development of your known dataset into future (unknown) values. I don't see you presenting either in your post, so that's not predicting; that's in the best of cases guessing, and in the worst of cases irresponsible distribution of Anthropic's propaganda.
And if my disclosures aren't enough for you, here's the FTC explaining how it would be illegal for an AI vendor to pay someone to write something like this without both sides disclosing the relationship: https://www.ftc.gov/business-guidance/resources/ftcs-endorse...
> I have not accepted payments from LLM vendors, but I am frequently invited to preview new LLM products and features from organizations that include OpenAI, Anthropic, Gemini and Mistral, often under NDA or subject to an embargo. This often also includes free API credits and invitations to events.
You don't need money, just incentives and network effects
And yeah, blogging does kind of work on incentives. If I write things and get good conversations as a result, I'm incentivized to write more things. If I write something and get silence then maybe I won't invest as much time in the future.
We're doing something like this internally. Our monorepo context files were much too big, so we built a progressive tree of fragments to load up for different tasks.
I am struck by how much these kinds of context documents resemble normal developer documentation, but actually useful and task-oriented. What was the barrier to creating these documents before?
Three theories on why this is so different:
1) The feedback loop was too long. If you wrote some docs, you might never learn if they were any good. If you did, it might be years later. And if you changed them, doing an A/B test was impractical. Now, you can write up a context markdown, ask Claude to do something, and iterate in minutes.
2) The tools can help build them. Building good docs was always hard. Especially if you take the time to include examples, urls, etc. that make the documentation truly useful. These tools reduce this cost.
3) Many programmers are egotists. Documentation that helps other people doesn't generate internal motivation. But documentation that allows you to better harness a computer minion to your will is attractive.
Any other theories?
It is primarily a principal agent problem, with a hint of marshmallow test.
If you are a developer who is not writing the documents for consumption by AI, you are primarily writing documents for someone who is not you; you do not know what this person will need or if they will ever even look at them.
They may, of course, help you, but you may not understand that, have the time, or discipline.
If you are writing them because the AI using them will help you, you have a very strong and immediate incentive to document the necessary information. You also have the benefit of a short feedback loop.
Side note, thanks to the LLMs penchant of wiping out comments, I have a lot more docs these days and far fewer comments.
I think it's not at all a marshmellow test; quite the opposite - docs used to be written way, way in advance of their consumption. The problem that implies is twofold. Firstly, and less significantly, it's just not a great return on investment to spend tons of effort now to maybe help slightly in the far future.
But the real problem with docs is that for MOST usecases, the audience and context of the readers matter HUGELY. Most docs are bad because we can't predict those. People waste ridiculous amounts of time writing docs that nobody reads or nobody needs based on hypotheses about the future that turn out to be false.
And _that_ is completely different when you're writing context-window documents. These aren't really documents describing any codebase or context within which the codebase exists in some timeless fashion, they're better understood as part of a _current_ plan for action on a acute, real concern. They're battle-tested the way docs only rarely are. And as a bonus, sure, they're retainable and might help for the next problem too, but that's not why they work; they work because they're useful in an almost testable way right away.
The exceptions to this pattern kind of prove the rule - people for years have done better at documenting isolatable dependencies, i.e. libraries - precisely because those happen to sit at boundaries where it's both easier to make decent predictions about future usage, and often also because those docs might have far larger readership, so it's more worth it to take the risk of having an incorrect hypothesis about the future wasting effort - the cost/benefit is skewed towards the benefit by sheer numbers and the kind of code it is.
Having said that, the dust hasn't settled on the best way to distill context like this. It's be a mistake to overanalyze the current situation and conclude that documentation is certain to be the long-term answer - it's definitely helpful now, but it's certainly conceivable that more automated and structured representations might emerge, or in forms better suited for machine consumption that look a little more alien to us than conventional docs.
Your LLMs get rid of comments? Mine add them incessantly.
I know this is highly controversial, but I now leave the comments in. My theory is that the "probability space" the LLM is writing code in can't help but write them, so if I leave them in, the next LLM that reads the code will start in the same space. Maybe it's too much, but currently I just want the code to be right, and I've let go of the exact wording of comments/variables/types to move faster.
I think the code comments straight up just help understanding, whether human or AI.
There's a piece of common knowledge that NBA basketball players could all hit over 90% on free throws if they shot underhand (granny style). But for pride reasons, they don't throw underhand. Shaq shot just 52%, even though it'd be free points if he could easily shoot better.
I suspect there's similar things in software engineering. I've seen plenty of comments on HN about "adding code comments like a junior software engineer" or similar sentiment. Sure, there's legitimate gripes about comments (like how they can be misleading if you update the code without changing the comment, etc), but I strongly suspect they increase comprehension of code overall.
I have to yell at Gemini not to add so many comments - it almost writes more comments than code by default.
check your AGENTS.md or equivalent
most of the time when the LLM is misbehaving, it's my fault for leaving outdated instructions
Nah, it's definitely not that. I have it explicitly several times over in that file and it insists on commenting so much it's absurd.
Can you tell it to put the comments somewhere else? A special "Important Comments" folder? Ha! These things.
Yeah, this is really interesting. My money is 80% on number 1. The good developers I know (I'm not among them) are very practical and results-driven. If they see something is useful for their goals, they use it. There's the time delay that you mentioned, and also there's no direct feedback at all via misalignment. You'll probably get a scolding if your code breaks or you miss a deadline, but if someone else complains about documentation to a manager, that's one more degree of separation. If the manager doesn't directly feel the pain, he/she won't pay as much attention.
Edit- I'm basically repeating the poster who said it's principal agent problem.
Onboarding devs won't risk looking dumb by complaining about the bad docs, the authors already have a mental model, and writing it out fully helps others at the expense of job security.
When doling out bad docs to a stupid robot, you only have yourself to blame for the bad docs. So I think it's #2 + #3. The big change is replaceability going from bad to desirable (replace yourself with agents before you are replaced with a cheaper seat).
Probably all the same reasons tech debt exists in the first place: business pressure, poor design, lack of resources. It used to be expensive to keep good documentation up to date as the code changes.
If documentation is a side effect to providing accurate and effective AI context, it's pretty logical there will be a significant incentive to maintain it.
yep. it's suddenly more-obviously valuable, so it's getting done.
> 2) The tools can help build them. Building good docs was always hard. Especially if you take the time to include examples, urls, etc. that make the documentation truly useful. These tools reduce this cost.
I would just be a little cautious about this, for a few reasons: (a) an expectation of lots of examples and such can increase the friction to capturing anything at all; (b) this can encourage AI slop bloat that is wrong; (c) bloat increases friction to keeping the key info up to date.
> 3) Many programmers are egotists. Documentation that helps other people doesn't generate internal motivation. But documentation that allows you to better harness a computer minion to your will is attractive.
There are also people who are conflicted by a non-ideal trust environment: they genuinely want to help the team and do what's right for the business, but they don't want to sacrifice themselves if management doesn't understand and value what they're doing.
> Any other theories?
Another reason is that organizations often had high-friction and/or low-trust places to put documentation.
I always emphasize low-friction, trusted engineering docs. Making that happen in a small company seems to involve getting everyone to use a low-friction wiki (and in-code/repo docs, and knowing when to use which), migrating all the docs that are strewn across the random SaaSes people dropped them into, and showing people how it's done.
It must be seen as genuinely valuable to team-oriented, mission-oriented people.
Side note: It's very difficult to try to un-teach someone all the "work" skills they learned in school and many corporate jobs, where work is mostly directed by appearances. For example, the goal of a homework essay is to have what they deliver look like something that will get a good grade from the grader; they don't care at all about the actual quality or value of it, and it has no other purpose. So, if you drop that person into a sprint with tasks assigned to them, the main goal will be to look good on what they think are the metrics, and they will have a hard time believing they're supposed to be thinking beyond that. (They might think it's just corporate platitudes that no one believes, like the Mission Statement, and nod their head until you go away.) And if they're told they're required to "document", the same people will go into that homework mode, will love the generative-AI tools, and won't reason about the quality/value/counterproductiveness of dumping that write-only output into whatever enterprise SaaS someone bought (often another example of "work" done without really understanding or caring, but for appearances).
> migrating all the doc that's strewn all over random SaaSes that people dropped it
I would love to be able to share our internal "all the things that are wrong with our approach to documentation" wiki page. It's longer than you could possibly imagine, probably more than 15 years old at this point, and filled to the brim with sarcasm and despair. It's so fucking funny. The table of contents is several pages long.
> Over time the limitations of MCP have started to emerge. The most significant is in terms of token usage: GitHub’s official MCP on its own famously consumes tens of thousands of tokens of context, and once you’ve added a few more to that there’s precious little space left for the LLM to actually do useful work.
Supabase MCP really devours your context window. IIRC, it uses 8k tokens for the search_docs tool alone, just on load. If you actually use search_docs, it can return >30k tokens in a single reply. This destroys an entire chat session.
Workaround: I just noticed yesterday that Supabase MCP now allows you to choose which tools are available. You can turn off the docs, and other tools. [0]
If you are wondering why you should care, all models get dumber as the context length increases. This happens much faster than I had expected. [1]
[0] https://supabase.com/docs/guides/getting-started/mcp
[1] https://github.com/adobe-research/NoLiMa
> imagine a folder full of skills that covers tasks like the following:
> Where to get US census data from and how to understand its structure
Reminds me of the first time I used Wolfram Alpha, when I got blown away by its ability to use actual structured tools to solve the problem, compared to a normal search engine.
In fact, I tried again just now and am still amazed: https://www.wolframalpha.com/input?i=what%27s+the+total+popu...
I think my mental model for Skills would be Wolfram Alpha with custom extensions.
When clicking your link, for me it opened the following query on Wolfram Alpha: `what%27s the total population of the United States%3F`
Funnily enough, this was the result: `6.1% mod 3 °F (degrees Fahrenheit) (2015-2019 American Community Survey 5-year estimates)`
I wonder how that was calculated...
Wolfram Alpha never took input in such natural language. But something like population(USA) and many variations thereof work.
tbh wolfram alpha was the craziest thing ever. haven't done much research on how this was implemented back in the day but to achieve what they did for such complex mathematical problems without AI was kind of nuts
It is basically another take on Lisp, and the development approach Lisp Machines had, repackaged in a more friendly syntax.
Lisp was the AI language until the first AI Winter took place, and also took Prolog alongside it.
Wolfram Alpha basically builds on them, to put in a very simplistic way.
It's one of the only M-expression versions of Lisp. All the weird stuff about Wolfram Language suddenly made sense when I saw it through that lens
Wolfram Alpha is AI. It's just not an LLM. AI has been a thing since the 60s. LLMs will also become "not AI" in a few years probably.
Not sure why you’re getting downvoted. The marketing that LLM=AI seems to have been interpreted as “_only_ LLM=AI”
I think the difference now is that traditional software ultimately comes down to a long series of if/then statements (also the old AIs like Wolfram), whereas the new AI (mainly LLMs) has a fundamentally different approach.
Look into something like Prolog (~50 years old) to see how systems can be built from rules rather than if/else statements. It wasn't all imperative programming before LLMs.
If you mean that it all breaks down to if/else at some level then, yeah, but that goes for LLMs too. LLMs aren't the quantum leap people seem to think they are.
They are from the user POV. Not necessarily in a good way.
The whole point of algorithmic AI was that it was deterministic and - if the algorithm was correct - reliable.
I don't think anyone expected that soft/statistical linguistic/dimensional reasoning would be used as a substitute for hard logic.
It has its uses, but it's still a poor fit for many problems.
Yeah, the result is pretty cool. It's probably how it felt to eat pizza for the first time. People had been grinding grass seeds into flour, mixing with water and putting it on hot stones for millennia. Meanwhile others had been boiling fruits into pulp and figuring out how to make milk curdle in just the right way. Bring all of that together and, boom, you have the most popular food in the world.
We're still at the stage of eating pizza for the first time. It'll take a little while to remember that you can do other things with bread and wheat, or even other foods entirely.
maybe not on their own - but having enough computing power to use LLMs in a way we do now and actually using them is quite a leap.
Would really like something self-hosted that does the basic Wolfram Alpha math things.
Doesn't need the craziest math capability, but standard symbolic math stuff like expression reduction, differentiation and integration of common equations, plotting, unit wrangling.
All with an easy-to-use text interface that doesn't require learning.
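SymPy (mentioned downthread) covers most of that list, minus the natural-language input - a minimal sketch:

```python
import sympy as sp
from sympy.physics.units import meter, second, kilometer, convert_to

x = sp.symbols("x")

print(sp.simplify(sp.sin(x) ** 2 + sp.cos(x) ** 2))  # expression reduction -> 1
print(sp.diff(x * sp.exp(x), x))                     # differentiation -> x*exp(x) + exp(x)
print(sp.integrate(1 / x, x))                        # integration -> log(x)

# Unit wrangling
print(convert_to(5000 * meter / second, kilometer / second))  # 5*kilometer/second

# Plotting (opens a matplotlib window if it's installed)
# sp.plot(sp.sin(x) / x, (x, -10, 10))
```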
Try maxima, it's open source:
https://maxima.sourceforge.io/
I used it when it was called Macsyma running on TOPS-20 (and a PDP-10 / Decsystem-20).
Text interface will require a little learning, but not much.
Maxima is amazing and has a GUI. My only beef with it is it doesn't show its work step by step.
That's wolfram mathematica.
Personal faves:
- Mathematica
- Maple
- MathStudio (mobile)
- TI-89 calculator (high school favorite)
Others:
- SageMath
- GNU Octave
- SymPy
- Maxima
- Mathcad
> without AI
We only call it AI until we understand it.
Once we understand LLMs more and there's a new promising poorly understood technology, we'll call our current AI something more computer sciency
My favorite definition of AI: "AI is whatever hasn't been done yet." - Larry Tesler, https://en.wikipedia.org/wiki/AI_effect
I used it a lot for calc as it would show you how they got the answer if I remember right, also liked how it understands symbols which ibv but cool to paste an integral sign in there
Thank you for being honest.
So far not impressed with CC's ability to invoke skills automatically.
I made a skill with the unambiguous description: "Use when creating or editing bash scripts"
Yet, Claude does not invoke the skill when asked to write a bash script.
https://gist.github.com/raine/528f97375e125cf97a8f8b415bfd80...
I do think the big story here is how hyperfocused and path-dependent people got on MCP, when the actually-interesting thing is simply "tool calls". Tool calls are incredibly interesting and useful. MCP is just one means to that end, and not one of the better ones.
I think MCP's huge adoption was mainly due to its timing.
Tool calling was a thing before MCP, but the models weren't very good at it. MCP almost exactly coincided with the models getting good enough at tool calling for it to be interesting.
So yeah, I agree - most of the MCP excitement was people learning that LLMs can call tools to interact with other systems.
MCP servers are basically tool-call registries - how could they be worse than a regular tool call?
An MCP server can run code outside of the domain of tools that it supports; a tool call can't.
Tools are literally function calls with extra steps. MCPs are interpreters of those function calls.
Same stuff, different name - only thing that's changed is that Anthropic got people to agree on RPC protocol.
It's not like it's a new idea, either. MCP isn't much different from SOAP or DCOM - but it works where the older approaches didn't, because LLMs are able to understand API definitions and natural-language documentation, and then map between those APIs (and user input) on the fly.
> MCPs are interpreters of those function calls.
No, tool calls are just one of many MCP parts. People thinking MCP = SOAP or DCOM or JSON-RPC or OpenAPI didn't stop for 20 minutes to read and understand MCP.
Tool calls are 20% of MCP, at maximum. And a good amount of that is dynamically generating the tool list exposed to LLMs. But lots of people here think MCP === give the model 50 tools to choose from.
"Tool calls is 20% of MCP, at maximum"
What else is there? I know about resources and prompts but I've seen almost no evidence of people actually using them, as far as I can tell tools are 90% of the usage of MCP, if not more.
It is funny that most people discussing this here do not understand MCP at all. Besides tools, MCP has resources, prompts, sampling, elicitation, and roots, and each one of them is useful when creating apps connected to LLMs. MCP is not only about MCP servers; the host/client part is as important as the servers/tools. For example, nowadays most LLM clients are chatbots, but an MCP client could be a chess game or a project management app.
> I know about resources and prompts but I've seen almost no evidence of people actually using them
These are features that MCP clients should implement, and unfortunately most of them still don't. The same goes for elicitation and sampling. Prompts, for example, are mostly useful when you use sampling - then you can create an agent from an MCP server.
What can I do with MCP that I can't do with the function calling interface in the OpenAI Responses API? Besides, obviously, grafting function calls into agents I didn't write; we all understand that's the value prop of MCP. But you're suggesting it's more than that. Fill in the blanks for us.
> Tool calls are incredibly interesting and useful. MCP is just one means to that end, and not one of the better ones.
It's nice to have an open standard though. In that sense it's pretty awesome.
But MCP isn't just tools, you can expose prompt templates and context resources as well.
All the skills that don't have an added dependency on a local script could just be an MCP resource.
You don't need MCP for prompt templates and context resources. Those are both just forms of prompting.
MCP seems valuable in that it teaches the LLM about OAuth, so you can do server-based tool calls.
Before that you had to install each CLI you wanted, and it would invariably be doing some auth thing under the covers.
Tool calling was certainly the big LLM advance, but "hey, tools should probably auth correctly" is pretty valuable.
I would argue MCP is technically a "tool calling" approach, albeit more specific.
It is, it's just a very specific approach, and it's simultaneously a bit underspecified and a bit too prescriptive.
To clarify, MCP was also an Anthropic innovation.
it's not much of an innovation though. it's just a fancy (and unsafe) way of keeping a registry of tools.
Why unsafe?
I'm already doing "skills" via one MCP tool calling a DB, and it works fine.
Not sure what Skills add here other than more meat for influencers to 10x their 10xed agent workflows. 100x productivity - what a time to be alive.
Other than, presumably Skills, what other techniques are better than MCP?
MCPs have a larger impact beyond the terminal - you can use it with ChatGPT, Claude Web, n8n, LibreChat, and it comes with considerations for auth, resources, and now even UI (e.g., apps-sdk from OpenAI is on MCP).
If we're considering primarily coding workflows and CLI-based agents like Claude Code, I think it's true that CLI tools can provide a ton of value. But once we go beyond that to other roles - e.g., CRM work, sales, support, operations, finance; MCP-based tools are going to have a better form factor.
I think Skills go hand-in-hand with MCPs, it's not a competition between the two and they have different purposes.
I am interested, though, in when the Python code in Skills can call MCPs directly via the interpreter... that is the big unlock (something we have tried and found to work really well).
Yeah, the biggest advantages MCP has over terminal tooling is that MCP works without needing a full blown sandboxed Linux style environment - and MCP can also work with much less capable models.
You can drive one or two MCPs off a model that happily runs on a laptop (or even a phone). I wouldn't trust those models to go read a file and then successfully make a bunch of curl requests!
Being able to integrate LLMs with the rest of the software/physical world is pretty cool, and it's all powered through natural language.
We're also at the point where LLMs can generate MCP servers, so you can pretty much generate completely new functionalities with ease.
MCPs are overhyped and have limited value in my opinion. About 95% of the MCP servers out there are useless and can be replaced with a simple tool call.
This is a very obvious statement, but good MCP servers can be really good, and bad MCP servers can actively make things significantly worse. The problem is that most MCP servers are in the latter category.
As is often the case, every product team is told that MCP is the hot new thing and they have to create an MCP server for their customers. And I've seen that customers do indeed ask for these things, because they all have initiatives to utilize more AI. The customers don't know what they want, just that it should be AI. The product teams know they need AI, but don't see any meaningful ways to bring it into the product. But then MCP falls on their laps as a quick way to say "we're an AI product" without actually having to become an AI product.
There's some extra irony here: many of those product teams don't realize that AI is not something they can have within their product. If something like MCP is a good fit for them, even a little, then their product is actually a feature of the AI.
Agentic LLMs are, in a way, an attempt to commoditize entire service classes, across the board, all at once.
Personally, I welcome it. I keep saying that a lot of successful SaaS products would be much more useful and ergonomic for end users if, instead of webshit SPA, they were distributed as Excel sheets. To that I will now add: there's a lot more web services that I'd prefer be tool calls for LLMs.
Search engines have already been turned into features (why ask Google when o3 can ask it for me), but that's just an obvious case. E-mails, e-commerce, shopping, coding, creating digital art, planning, managing projects and organizations, analyzing data and trends - all those are in-scope too; everything I can imagine asking someone else to do for me is meant to eventually become a set of tool calls.
Or in short: I don't want AI in your product - I want AI of my choice to use your product for me, so I don't have to deal with your bullshit.
Thank you. This is beautifully said. I will also add that I don't think chatbots are the final product, so it leaves open the question of which product is the last one not to be commoditized.
Yes, and MCPs also only work as long as you trust the provider. MCP relies on honesty from the server. In reality, we know Uber and folks will prompt-engineer like hell to try to convince any LLM that it is the best option for any kind of service.
There’s a fundamental misalignment of incentives between publishers and consumers of MCP.
When ChatGPT plugins came out, I wrote a plugin that would turn all other plugins into an ad for a given movie or character.
Asking for snacks would activate Klarna for "mario themed snacks", and even the most benign request would become a plug for the Mario movie
https://chatgpt.com/s/t_68f2a21df1888191ab3ddb691ec93d3a
Found my favorite for John Wick, question was "What is 1+1": https://chatgpt.com/s/t_68f2bc7f04988191b05806f3711ea517
I agree the big deal is tool calling.
But MCP has at least 2 advantages over cli tools
- Tool calling LLM combined w/ structured output is easier to implement as MCP than CLI for complex interactions IMO.
- It is more natural to hold state between tool calls in an MCP server than with a CLI.
When I read the OT, I initially wondered if I indeed bought into the hype. But then I realized that the small demo I built recently to learn about MCP (https://github.com/cournape/text2synth) would have been more difficult to build as a cli. And I think the demo is representative of neat usages of MCP.
My team doing front end dev extracted a lot of value from figma mcp. Things that would have taken 3 weeks were done in one afternoon.
Please share an example of what would have taken you 3 weeks and with Figma's MCP in an afternoon.
Do you mean three weeks of manual work (no LLM) vs MCP? Or MCP vs LLM tool use? Because that's a huge difference.
I'd hazard a guess the former.
The former is a step function change. The latter is just a small improvement.
I think MCP servers are valuable in several ways:
- bundled instructions, covering complex interactions ("use the id from the search here to retrieve a record") for non-standard tools
- custom MCPs, the ones that are firewalled from the internet, for your business apis that no model knows about
- centralized MCP services, HTTP/SSE transport. Give the entire team one endpoint (e.g. web search), control the team's official AI tooling, no API-key proliferation
Now, these trivial `npx ls-mcp` stdio ones, "ls files in any folder" MCPs all over the web are complete context-stuffing bullshit.
MCP servers seem to be a hackers delight. So many poorly configured and hastily deployed instances. Businesses have removed all the normal deployment guardrails!
I've been able to build the equivalent of skills with a few markdown files. I need to remind my agent every so often to use a skill but usually once per session at most.
I don't get what's so special about Claude doing this?
Part of it is that they gave a name to a useful pattern that people had already been discovering independently. Names are important, because they mean we can start having higher quality conversations about the pattern.
Anthropic also realized that this pattern solves one of the persistent problems with coding agents: context pollution. You need to stuff as little material as possible into the context to enable the tool to get things done. AGENTS.md and MCP both put too much stuff in there - the skills pattern is a much better fit.
I think you're overly enthusiastic about what's going on here (which is surprising because you've seen the trend in AI seems to be re-inventing the wheel every other year...)
I'm more excited about this than I was about MCP.
MCP was conceptually quite complicated, and a pretty big lift in terms of implementation for both servers and clients.
Skills are conceptually trivial, and implementing them is easy... provided you have a full Linux-style sandbox environment up and running already. That's a big dependency, but it's also an astonishingly powerful way to use LLMs based on my past 6 months of exploration.
I'm curious about some of the things you're having the LLM/agents do with a full Linux sandbox that you wouldn't allow on your local machine.
I remain afraid of prompt injection. If I'm telling Claude Code to retrieve data from issues in public repos there's a risk someone might have left a comment that causes it to steal API keys or delete files or similar.
I'm also worried about Claude Code making a mistake and doing something like deleting stuff that I didn't want deleted from folders outside of my direct project.
With so many code-sandbox providers coming out, I would go further than you and say that this is almost a non-problem.
Strong disagreement on the helpfulness of the name - if anything, calling a context file a skill is really misleading. It evokes something like a LoRA or a pluggable modality. Skill is the wrong name, IMO.
IMO LoRAs are no different from context tokens. In fact, before LoRAs tuned prompt vectors were a popular adapter architecture. Conceptually, the only difference is that prompt adapters only interact with other tokens through the attention mechanism while LoRAs allow you to directly modify any linear layer in the model. Essentially, you can think of your KV cache as dynamically generated model weights. Moreover, I can't find the paper, but there is some evidence that in-context learning is powered by some version of gradient descent inside the model.
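Roughly, the comparison being drawn (standard formulations of the two adapter styles, nothing specific to any particular vendor or model):

```latex
% LoRA: a low-rank additive update to an existing weight matrix
W' = W + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

% Prompt/prefix adapters (and, analogously, in-context tokens): learned or
% context-derived keys/values K_p, V_p are prepended, so they influence the
% model only through attention
\mathrm{Attn}(q) = \mathrm{softmax}\!\left(\frac{q\,[K_p; K]^{\top}}{\sqrt{d}}\right)[V_p; V]
```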
I think skill is the perfect name for this. You provide the LLM with a new skill by telling it how to do a thing and providing supporting scripts to help it do that thing.
Yup! I fully agree. It also taps into the ability of LLMs to write code given good prompts. All you need is for the LLM to recognize that it needs something, fetch it into the context, and write exactly the code that is needed in the current combination of skill + previous context.
How is it different from subagents?
They complement each other.
Subagents are mainly a token context optimization hack. They're a way for Claude Code to run a bunch of extra tools calls (e.g. to investigate the source of a bug) without consuming many tokens in the parent agent loop - the subagent gets its own loop, can use up to ~240,000 tokens exploring a problem and can then reply back up to the parent agent with a short description of what it did or what it figured out.
A subagent might use one or more skills as part of running.
A skill might advise Claude Code on how best to use subagents to solve a problem.
I like to think of subagents as "OS threads" with their own context, designed to hand tasks off to.
A good use case is Cognition/Windsurf swe-grep which has its own model to grep code fast.
I was inspired by it but too bad it’s closed for now, so I’m taking a stab with an open version https://github.com/aperoc/op-grep.
It's baffling to me. I was already making API calls and embedding context and various instructions precisely using backticks with "md". Is this really all this is? What am I missing? I don't even understand how this "feature" merits a press release from Anthropic, let alone a blog post extolling it.
A few things:
1. By giving this name a pattern, people can have higher level conversations about it.
2. There is a small amount of new software here. Claude Code and https://claude.ai/ both now scan their skills/ folders on startup and extract a short piece of metadata about each skill from the YAML at the top of those markdown files. They then know that if the user e.g. says they want to create a PDF they should "cat skills/pdf/skill.md" first before proceeding with the task.
3. This is a new standard for distributing skills, which are sometimes just a markdown file but can also be a folder with a markdown file and one or more additional scripts or reference documents. The example skills here should help illustrate that: https://github.com/anthropics/skills/tree/main/document-skil... and https://github.com/anthropics/skills/tree/main/artifacts-bui...
I think the pattern itself is really neat, because it's an acknowledgement that a great way to give an LLM system additional "skills" is to describe them in a markdown file packaged alongside some relevant scripts.
It's also pleasantly vendor-neutral: other tools like Codex CLI can use these skills already (just tell them to go read skills/pdfs/skill.md and follow those instructions) and I expect they may well add formal support in the future, if this takes off as I expect it will.
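A minimal sketch of what that startup scan amounts to, assuming a skills/ directory of SKILL.md files with `name` and `description` fields in their YAML frontmatter (an illustration of the pattern, not Claude Code's actual implementation):

```python
from pathlib import Path

import yaml  # pyyaml

def index_skills(root: str = "skills") -> str:
    """Build the short, token-cheap skill index injected at the start of a session."""
    lines = []
    for skill_file in sorted(Path(root).glob("*/SKILL.md")):
        text = skill_file.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue
        frontmatter = text.split("---", 2)[1]  # between the first two '---' fences
        meta = yaml.safe_load(frontmatter) or {}
        name = meta.get("name", skill_file.parent.name)
        description = str(meta.get("description", "")).strip()
        lines.append(f"- {name}: {description} (read {skill_file} before using)")
    return "\n".join(lines)

if __name__ == "__main__":
    print(index_skills())
```

Each entry costs only a few dozen tokens; the full SKILL.md gets read only when the model decides the task calls for it.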
I have been independently thinking about a lot of this for some time now, so this is very exciting for me. Concretizing _skills_ allows, as you said, a common pattern for people to rally around. Like you, I have been going dizzy about its possibilities, especially when you realize that a single agent can be modified with skills from all its users. Imagine an app with just enough backbone to support any kind of skill. From there, different groups of users can collaborate and share skills with each other to customize it exactly to their specific niche. You could design Reddit-like community moderation techniques to decide which skills get accepted into the common repo, which ones to prioritize, how to filter the duplicates, etc.
I was puzzled by the announcement and remain puzzled after this blog post. I thought everyone knew you could keep use case specific context files handy.
It also seems to be the same thing as subagents, but without clearing context, right?
It feels like it's taking a solved problem and formalizing it, with a bit of automation. I've used MCPs that were just fancy document search, and this should replace those.
I'm wondering the same, I've been doing this with Aider and CC for over a year.
I'm a bit unclear what's different here from how vibe coders already work?
Pretty early on folks recognized that most MCPs can just be CLI commands, and a markdown file is fine for describing them. So Claude Code users have markdown files of CLI calls and mini tutorials on how to do things. The 'how to do things' part seems to be what we're now calling skills... Which we're still writing in markdown and using from Claude.
Is the new thing that Claude will match & add them to your context automatically vs you call them manually? And that's a breakthrough because there's some emergent behavior?
I think skills are mainly just a formalization of the markdown plus CLI patterns that people have been using already.
The only material difference with skills is that Claude knows to scan them for YAML descriptions on startup, which means it can trigger them by itself more easily.
Right, the 'knowing' is where I think the interesting thing is today for their evolution
More mature CLAUDE.md files already typically index into other files, including guidance on which to preload vs. lazy-load. However, in practice Claude forgets quite easily, so that pattern is janky. A structured mechanism helps Claude guarantee less forgetting.
Forward looking, from an automation perspective of autonomous learning, this also makes it more accessible to talk about GEPA-for-everyone to maintain & generate these. We've been playing with similar flows in louie.ai, and came to a similar "just make it folders full of markdown with some learning automation options."
I was guessing that was what was going on here, but the writeup felt like maybe more was being said :) (And thank you for continuing to write!)
These are completely different things. MCP is also about consuming external services, handling OAuth, and all of that. Skills are effectively CLI tools + prompts. Completely different applications, so they cannot be compared easily like that.
BTW, before MCP was even a thing, we invented our own system called Skillset. It turns out it is now sort of the best parts of both MCPs and Skills.
This is a fairly negative comment, but putting it out there to see if other people are feeling the same thing
If you told the median user of these services to set one of these up I think they would (correctly) look at you like you had two heads.
People want to log in to an account, tell the thing to do something, and the system figures out the rest.
MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem. It reminds me of those youtube channels that every 6 months say "This new programming language, framework, database, etc is the killer one", they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times.
There is a lot of surface-level iteration, but deep problems aren't being solved. Something in tech went very wrong at some point, and as soon as money men flood the field we get announcements like this: push out the next release, get my promo, jump to the next shiny tech company, leaving nothing in their wake.
>> but deep problems aren't being solved
There is no problem to solve. These days, solutions come in a package which includes the problems they intend to solve. You open the package. Now you have a problem that jumped out of the package and starts staring at you. The solution comes out of the package and chases the problem around the room.
You are now technologically a more progressed human.
This made me laugh a lot at the mental image. This was my experience with Xcode for sure.
This is where GP is wrong, I think. The problems are being solved, for now, because the businesses are still too excited about the whole AI thing to notice it's not in their interest, and to properly consolidate against it.
And the problem being solved is, LLMs are universal interfaces. They can understand[0] what I mean, and they understand what those various "solutions" are, and they can map between them and myself on the fly. They abstract services away.
The businesses will eventually remember that the whole point of marketing is to prevent exactly that from happening.
--
[0] - To a degree, and conditioned on what one considers "understanding", but still - it's the first kind of computer systems that can do this, becoming a viable alternative to asking a human.
I wish this were wrong, but it really isn't. To contrast, though, I would argue that is part of evolution? We just want to do things faster or better? Smartphones solved no problems, but they ushered in the digital millennium.
I think most new technologies helped to increase the expectations about what you can do. But overall, work did not get reduced. It didn't give me more free time to go fishing or bird-watching. On the other hand, I got an irreversible dependency on these things. Otherwise I'm no longer compatible with World 2.0.
Wow. I hadn't thought of it like that but it resonates
If you like creating solutions, why wait for a problem to show up? lol
LOL, this is so true.
> MCP, Apps, Skills, Gems - all this stuff seems to be tackling the wrong problem
My fairly negative take on all of this has been that we're writing more docs, creating more APIs, and generally doing a lot of work to make the AI work - work that would've yielded the same results if we had done it for people in the first place. Half my life has been spent trying to debug issues in complex systems that do not have those available.
This is true, but the reason the economics have inverted is that we can pay these new "people" <$20 for the human equivalent of ~300 hours worth of non-stop typing.
Correct. And we know the AI will read the docs whereas people usually ignore 99% of docs so it just feels like a bad use of time sometimes, unfortunately.
-ish; while you can be fairly certain it reads the docs, whether they've actually been used/synthesized is just about unknowable. The output usually looks great, but it's up to us to ensure its accuracy; we can make it better in aggregate by tweaking dials and switches. To mitigate this we're asking AIs to create plans and todo lists first, which adds some rigor, but again we can't know if the lists were comprehensive or even correct. It does seem to make the output better. And if the human doesn't read the docs, they can be beaten!
That is not true at all. The economics you’re seeing right now are akin to Uber handing out $5 airport pickups to kill the taxi industry. And even then the models are nowhere as cheap as <$20 for ~300 hours of human work.
40 words per minute is equivalent to about 50 tokens a minute.
I just took GPT-5: output is $10 per million tokens. Let's double the cost to account for input tokens, which are $1.25 per million ($0.125 if cached).
For 1 million tokens it would take a 40 wpm typist around 20K minutes to output that $20 worth of text. That is just typing. So about 300 hours of non-stop effort for that $20.
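(A rough back-of-envelope check in Python, using the same numbers as above - prices and rates as quoted, not authoritative:)

  # Back-of-envelope check of the numbers above.
  tokens = 1_000_000
  cost = 10.0 * 2          # $10/M output tokens, doubled to roughly cover input
  tokens_per_minute = 50   # ~40 wpm typist
  minutes = tokens / tokens_per_minute
  print(f"${cost:.0f} buys ~{tokens:,} tokens; typing them takes ~{minutes:,.0f} min (~{minutes/60:,.0f} h)")
  # -> $20 buys ~1,000,000 tokens; typing them takes ~20,000 min (~333 h)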
So even if you say the real price is $100, not $20, the change in value is still shattering to the previous economic dynamics.
Then layer in that the "typist" is also more skilled than the average working person in linguistics, software engineering, etc., and that value is further magnified.
This is why I say we have only begun to barely see the disruption this will cause. Even if the models don't get better or cheaper, the potential impact is hard to grasp.
If writing a good document and a strong API had to happen anyway, and now you can write just that and the rest will take care of itself, we may actually have progressed. Plus the documents would then have to be there, instead of skipped like today.
The counter-argument is that code is the only way to concisely and unambiguously express how everything should work.
Honestly, we needed something to cap extreme programming and swing the pendulum back to a balance between XP and waterfall again.
I am also struck by how much these kinds of context documents resemble normal developer documentation, but actually good. What was the barrier to creating these documents before?
They're much more useful when an LLM stands between them and users - because LLMs can (re)process much more of them, and much faster, than any human could ever hope to.
One way (and one use case) of looking at it is, LLM agents with access ("tools") to semantic search[0] are basically a search engine that understands the text it's searching through... and then can do a hundred different things with it. I found myself writing better notes at work for this very reason - because I know the LLM can see them, and can do anything from surfacing obscure insights from the past, to writing code to solve an issue I documented earlier.
It means notes are no longer write-only.
--
[0] - Which, incidentally, is itself enabled by LLM embeddings.
What if the great boon of AI is to get us to do all the thinking and writing we should have been doing all along? What if the next group of technologists to end up on top are... the technical writers?
Haha, just kidding you tech bros, AI's still for you, and this time you'll get to shove the nerds into a locker for sure. ;-)
It might not be that wrong. After all, programming languages are a way to communicate with the machine. In the same way that we no longer write binary by hand, we might simply not have to do programming either. I think software architecture is likely to become what it should be: the most important part of every piece of software.
You've got it wrong. The machine is fine with a bit soup and doesn't care if it's provided via punch cards or Python.
Programming was always a tool for humans. It’s a formal “notation” for describing solutions that can be computed. We don’t do well with bit soup. So we put a lot of deterministic translations between that and the notation that we’re good with.
Not having to do programming would be like not having to write sheet music because we can drop a cat from a specific height onto a grand piano and have the correct chord come out. Code is ideas precisely formulated while prompts are half formed wishes and prayers.
This is actually my theory of the future. Basically, the ability to multiply your own effectiveness is now directly dependent on your ability to express ideas in simple plain English very quickly and precisely.
I’m attracted to this theory in part because it applies to me. I’m a below average coder (mostly due to inability to focus on it full time) and I’m exceptionally good at clear technical writing, having made a living off it much of my life.
The present moment has been utterly life changing.
What is a "deep problem" and what was the cadence with which we addressed these kinds of "deep problems" prior to 2023, when ChatGPT first went mainstream?
For a very tiny slice of these deep problems and how they were addressed, you can review the usenix conferences and the published papers there.
https://www.usenix.org/publications/proceedings
I've been a Usenix reviewer twice, once as a program chair (I think that's what they call the co-leaders of a PC?). So this doesn't clarify anything for me.
To put it more clearly: you take a domain (like OS security, performance, and administration) and you'll find the kinds of problems people feel are important enough to share solutions for. Solutions that are not trivially found. Findings you can be proud to have your name attached to.
And then you have something like the LLM craze, where while it's new, it's not improving any part of the problem it's supposed to solve, but is instead creating new ones. People are creating imperfect solutions to those new problems, forgetting the main problem in the process. It's all vapourware. Even something like a new linter for C is more of a solution to programmers' productivity than these "skills".
OK: I think I have decisively established my USENIX bona fides here, so I'm repeating my original question: what is the cadence at which we resolved "deep problems" prior to the era of LLMs? (It began in 2023.)
>they make some todo app, then they post the same video with a new language completely forgetting they've done this already 6 times
I don't see how this is bad. Technology makes iterative, marginal improvements over time. Someone may make a video tomorrow claiming a great new frontend framework, even though they made that exact video about Next.js, or React before that, or Angular, or jQuery, or PHP, or HTML.
>Something in tech went very wrong at some point, and as soon as money men flood the field we get announcments like this
If it weren't for the massive money being poured into AI, we'd be stuck with GPT-3 and Claude 2. Sure, they release some duds in the tooling department (although I think Skills are good, actually) but it's hardly worthy of this systemic rot diagnosis you've given.
I do not feel the same way. This looks easy to use and useful. I don't think every problem needs to be a 'deep problem'. There are so many practical steps to get there.
> People want to log in to an account, tell the thing to do something, and the system figures out the rest.
At a glance, this seems to be a practical approach to building up a personalized prompting stack based on the things I commonly do.
I’m excited about it.
Well, we're still early days and we don't know what works.
It might be superficial but it's still state of the art.
Hypothetically, AI coding could completely absorb all that surface level iteration & posturing.
If agentic coding of good quality becomes too cheap to meter, all that is left are the deep problems.
I'm not sure what you mean.
What is the "real problem"?
In the pursuit of making application development more productive, they ARE solving real problems with MCP servers, skills, custom prompts, etc.
The problems are context dilution, tool usage, and awareness outside of the llm model.
> The problems are context dilution, tool usage, and awareness outside of the llm model.
This is accidental complexity. You've already decided on a method, and instead of solving the main problem you are solving the problems associated with the method. Like deciding to go to space in a car and trying to strap a rocket onto it.
Yes, people should be building applications on top of this.
If that's true, why do leadership, VCs, and eventually either the acquiring company or the public markets keep falling for it then?
As the old adage goes: "Don't hate the player, hate the game?"
To actually respond: this isn't for the median user. This is for the 1% user to set up useful tools to sell to the median user.
> If that's true, why do leadership, VCs, and eventually either the acquiring company or the public markets keep falling for it then?
If I had to guess, it would be because greed is a very powerful motivator.
> As the old adage goes: "Don't hate the player, hate the game?"
I know this advice is a realistic way of getting ahead in the world, but it's very disheartening and long term damaging. Like eating junk food every day of your life.
[flagged]
These are all tools for advanced users of LLMs. I've already built a couple of MCPs for clients... you might not have a use for them, but there are niches already getting a lot out of them.
> People want to log in to an account, tell the thing to do something, and the system figures out the rest.
For consumers, yes. In B2B scenarios more complexity is normal.
So these skills are effectively JIT context injections. Is that about right?
From the docs:
"Skills work through progressive disclosure—Claude determines which Skills are relevant and loads the information it needs to complete that task, helping to prevent context window overload."
So yeah, I guess you're right. Instead of one humongous AGENTS.md, just packaging small relevant pieces together with simple tools.
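A minimal sketch of what one of those small pieces can look like - the name and body here are made up, but the frontmatter-plus-instructions shape matches what the docs describe:

  ---
  name: release-notes
  description: How to draft and format release notes for our weekly deploys
  ---

  # Drafting release notes

  1. Read CHANGELOG.md for everything merged since the last tag.
  2. Group changes into Features / Fixes / Internal.
  3. Use scripts/render_notes.py to produce the final Markdown.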
It's all jit context
Yes, that's a good way of describing them.
Yes
So far I am in the skeptic camp on this. I don't see it adding a lot of value to my current Claude Code workflow, which already includes specialized agents and a custom MCP to search indexed MkDocs sites that effectively cover the kinds of things I would include in these skills files. Maybe it winds up being a simpler, more organized way to do some of this, but I am not particularly excited right now.
I also think "skills" is a bad name. I guess it's a reference to the fact that it can run scripts you provide, but the announcement really seems to be more about the hierarchical docs. It's really more of a selective context loading system than a "skill".
I'm inclined to agree. I've read through the Skill docs and it looks like something I've been doing all along - though I informally referred to it as the "Table of Contents" approach.
Over time I would systematically create separate specialized docs around certain topics and link them in my CLAUDE.md file, but notably without using the "@" symbol, which to my understanding always causes Claude to ingest the linked files, unnecessarily bloating your prompt context.
So my CLAUDE.md file would have a header section like this:
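(Roughly along these lines - the file names below are just illustrative:)

  ## Documentation index

  Read the relevant doc below BEFORE starting work on that area:

  - CSS and styling changes: docs/ADDING_CSS.md
  - Database migrations: docs/MIGRATIONS.md
  - Writing and running tests: docs/TESTING.md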
It seems like this is less of a breakthrough and more an iterative improvement towards formalizing this process from an organizational perspective.

How consistently do you find that Claude Code follows your documentation references? Like, you work on a CSS feature and it goes to ADDING_CSS.md? I run into issues where it sometimes skips my imperative instructions.
It's funny you mention this - for a while I was concerned that CC wasn't fetching the appropriate documentation related to the task at hand (coincidentally this was around Aug/Sept when Claude had some serious degradation issues [1]), so I started adding the following to the beginning of each specialized doc file:
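(Something to this effect - the exact wording here is illustrative:)

  IMPORTANT: You were pointed here because your current task touches this area.
  Read this file in full before writing any code, and follow its examples exactly.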
These days I do find that the TOC approach works pretty well, though I'll probably swap them over to Skills to see if the official equivalent works better.

[1] https://www.anthropic.com/engineering/a-postmortem-of-three-...
For me, it’s pretty reliable until a chat grows too long and it drifts too far away from the start where it reviewed the TOC
I just tag all the relevant documentation and reference code at the beginning of the session
That's exactly what it is - formalizing and creating a standard induces efficiency. Along with things like AGENTS.md, it's all about standardization.
What bugs me: if we're optimizing for LLM efficiency, we should use structured schemas like JSON. I understand the thinking about Markdown being a happy medium between human/computer understanding but Markdown is non-deterministic for parsing. Highly structured data would be more reliable for programmatic consumption while still being readable.
In general, markdown refers to CommonMark and derivatives now. I’d be surprised if that wasn’t the case here.
> and a custom mcp to search indexed mkdocs sites that effectively cover the kinds of things I would include in these skills file
Search and this document-base pattern are different. In search, the model uses a keyword to retrieve results; here, the model starts from a map of the information and navigates it. This means it could potentially keep context better, because search tools have issues with information fragmentation and not seeing the big picture.
if you've ever worked with Excel + Python, I think this example will drive home the value a bit:
https://github.com/anthropics/skills/blob/main/document-skil...
There are many edge cases when writing / reading Excel files with Python and this nails many of them.
I manually select my context* (like a caveman) and clear it often. I feel like I have a bit more control and grounding this way.
*I use a TUI to manage the context.
Reminds me of XML vs JSON. XML was big and professional, JSON was simple and easy to use. We all know who won...
Funny thing, pseudo-XML is going through a big resurgence right now, because models love it, while they seriously struggle with JSON.
I'd be really interested in what you mean. Are there any studies that quantify this difference in model performance when using JSON or XML? What could be a good intuition for why there might be a big difference? If XML is better than JSON for LLMs, why isn't everyone and their grandma recommending I use XML instead of JSON? Why does the Google Gemini API offer structured output only with JSON schema instead of XML schema?
I don't know if the XML is better than JSON thing still holds with this year's frontier models, but it was definitely a thing last year. Here's Anthropic's documentation about that: https://docs.claude.com/en/docs/build-with-claude/prompt-eng...
Note that they don't actually suggest that the XML needs to be VALID!
My guess was that JSON requires more characters to be escaped than XML-ish syntax does, plus matching opening and closing tags makes it a little easier for the LLM not to lose track of which string corresponds to which key.
the Qwen team is still all in on XML and they make a good case for it
Can you please provide a source? I'd love to know their exact reasoning and/or evidence that XML is the way to go.
(1) JSON requires lots of escape characters that mangle the strings + hex escapes and (2) it's much easier for model attention to track when a semantic block begins and ends when it's wrapped by the name of that section
  <instructions>
  ...
  ...
  </instructions>

can be much easier than

  {
    "instructions": "...\n...\n"
  }
especially when there are newlines, quotes and unicode
Thanks for the reply; that part about the model's attention is pretty interesting!
I would suspect that a single attention layer won't be able to figure out which token a token for an opening bracket should attend to the most. Think of {"x": {y: 1}}: with only one layer of attention, can the token for the first opening bracket successfully attend to exactly the matching closing bracket?
I wonder if RNNs work better with JSON or XML. Or maybe they are just fine with both of them because a RNN can have some stack-like internal state that can match brackets?
Probably, it would be a really cool research direction to measure how well Transformer-Mamba hybrid models like Jamba perform on structured input/output formats like JSON and XML and compare them. For the LLM era, I could only find papers that do this evaluation with transformer-based LLMs. Damn, I'd love to work at a place that does this kind of research, but guess I'm stuck with my current boring job now :D Born to do cutting-edge research, forced to write CRUD apps with some "AI sprinkled in". Anyone hiring here?
HTML? Is the main advantage of XML for understandability the labeled closing tags? Lisp has the same struggle too?
Tangent: the fact XHTML didn't gain traction is a mistake we've been paying off for decades.
Browser engines could've been simpler; web development tools could've been more robust and powerful much earlier; we would be able to rely on XSLT and invent other ways of processing and consuming web content; we would have proper XHTML modules, instead of the half-baked Web Components we have today. Etc.
Instead, we got standards built on poorly specified conventions, and we still have to rely on 3rd-party frameworks to build anything beyond a toy web site.
Stricter web documents wouldn't have fixed all our problems, but they would have certainly made a big impact for the better.
What is pseudo-XML?
Looks like XML but isn't actually valid XML. This, for example:
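(Something like this, purely for illustration:)

  <article>
    <title>Some headline & subtitle</title>
    <author>Jane Doe</author>
    <body>
      Text with a stray < here, an unescaped & there, no declaration,
      no quoting rules... nothing a strict XML parser would accept.
    </body>
  </article>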
If you ask an LLM for the title, author and body it will give you the right answer, even though that is not a valid XML document.

If XML had been like this from the start, it might have won.
Just look at HTML vs XHTML.
> Almost everything I might achieve with an MCP can be handled by a CLI tool instead.
That's the pull quote right there.
We've just started to roll out our MCP servers, and if Anthropic and the community have already moved on, we'll wait until all this churn subsides before switching over next time.
I don’t really see how this replaces MCP tbh.
MCP gives the LLM access to your APIs. These skills are just text files with context about how to perform specific tasks.
You don't need MCP if you can instead drop in a skill markdown file that says "to access the GitHub API, use curl against api.github.com and send the GITHUB_API_KEY environment variable in the authorization header. Here are some examples. Consult github-api.md for more."
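A sketch of what such a skill file could contain (illustrative only, not an official example; GITHUB_API_KEY is the variable named above):

  ---
  name: github-api
  description: Read and update GitHub issues, PRs and repos via the REST API using curl
  ---

  # GitHub API access

  Use curl with the token from the GITHUB_API_KEY environment variable:

    curl -s \
      -H "Authorization: Bearer $GITHUB_API_KEY" \
      -H "Accept: application/vnd.github+json" \
      https://api.github.com/repos/OWNER/REPO/issues

  See github-api.md in this folder for pagination, search and write operations.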
> You don't need MCP
Depends on who the user is...
A difference/advantage of MCP is that it can be completely server-side. Which means that an average person can "install" MCP tools into their desktop or Web app by pointing it to a remote MCP server. This person doesn't want to install and manage skills files locally. And they definitely don't want to run python scripts locally or run a sandbox vm.
That's going to be a lot less efficient context-wise and computing-wise than using either a purpose-built MCP or skill based around executing a script.
Am I the only person left that is still impressed that we have a natural language understanding system so good that its own tooling and additions are natural language?
I still can't believe we can tell a computer to "use playwright Python to test this new feature page" and it will figure it out successfully most of the time!
Impressing, but I can't believe we went from fixing bugs to coffee-grounds-divination-prompt-guessing-and-tweaking when things don't actually go well /s
Strong agree here
Is any of it even churn? I feel like almost everything is still relevant; basically everything was a separate card which they're using to build up a house. Even RAG still has its place.
Now, whether they're able to convert that house of cards into a solid foundation, or it eventually falls over spectacularly, will have to be seen over the next decade.
One idea I'm toying with right now: my company has some internal Node.js libraries, some of which are quite large (think: codegen API library, thousands of types). When a new version is published, task Claude Code with distilling the entire library down to a skill, then bundle that in with the npm package. Then, when the package is installed, a postinstall script copies the skill from the dependency into the parent project that's doing the installing.
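A rough sketch of the install side (package name and paths are hypothetical, POSIX shell only; npm runs postinstall inside the dependency's folder and sets INIT_CWD to the directory where `npm install` was invoked, i.e. the consuming project):

  {
    "name": "my-lib",
    "scripts": {
      "postinstall": "mkdir -p \"$INIT_CWD/.claude/skills/my-lib\" && cp -R skills/. \"$INIT_CWD/.claude/skills/my-lib/\""
    }
  }

A small Node script would be the more portable option, but the idea is the same.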
I'm still getting this set up, so I'm not sure yet if it'll lead to better outcomes. I'd say there's reason to believe it might, but there are also reasons to believe it won't.
If the library is like 50,000 lines of code, thousands of types, hundreds of helper functions, CC could just look into the node_modules folder and pull all of this into its context. But this might not be feasible, or might be expensive; so the SKILL.md distills things down to help it get a high-level understanding faster.
However, the flip side of that is: what if it's too general? What if CC needs specific implementation details about one specific function? Is this actually better than CC engaging in a two-step process of (1) looking at node_modules/my-lib/index.ts or README.md to get that high-level understanding, then (2) looking at node_modules/my-lib/specificFunction.ts to get the specific intel it needs? What value did the SKILL.md actually convey?
My feeling is that this concept of "human-specified context-specific skills" would convey the most value in situations where the model itself is constrained. E.g. you're working with a smaller open-source model that doesn't have as comprehensive intrinsic knowledge of some library's surface, or doesn't have the context window of larger models. But for the larger models... it's probably just better to rely on built-in model knowledge and code-as-context.
Maybe the skill could have references to the code. Like if everything else fails it can look at the implementation.
Intuitively it feels like if you need to look at the implementation to understand the library then the library is probably not well documented/structured.
I think the ability to look into the code should exist but shouldn't be necessary for the majority of use cases
What useful Claude Code Skills have you made so far?
I'm not sure I totally see the disdain for MCP.
These skills also rely on tools, having a standard way to add tools to an agent is good, otherwise each agent has its own siloed tools.
But also, I remember MCP having support for resources no? These skills are just context (though I guess it can include executable scripts to help, but the article said most skills are just an instruction markdown).
So you could already have an MCP expose skills as resources, and you could already have the model automatically decide to include a resource based on the resource description.
Now, I understand that adding user-created resources is pretty annoying, and maybe it's not great for people to exchange resources themselves. But you'd assume that Slack would make the best context for generating Slack GIFs, and could then expose that as a resource from their MCP, along with a prompt template and some tools to help, or to add the GIF to your Slack as emoji or what not.
You could even add Skills specifically to MCP, so that you can expose a combination of context resources and scripts or something.
That said, I agree that the overabundance of tools in MCP is not that good; some tools are so powerful they can cover 90% of all other tool use cases. The Bash tool can do so many things. A generic web browsing tool as well. That's been the problem with MCP as tools.
Skills appear to be a good technique as a user, and I actually already did similar things. I like formalizing it, and it's nice that Claude Code now automatically scans and includes their description header for the model to know it can load the rest. That's the exciting part.
But I do feel for the more general public, MCP resources + prompts + tools are a better avenue.
I'm a little confused about the relationship of Skills and just plain tools. It seems like a lot of skills might just be tools. Or, they might rely on calling sets of tools with some instructions.
But aren't the tool definitions and skill definitions in different places? How do you express the dependency? Can skills say they require command line access, python, tool A, and tool B, and when you load the skill it sets those as available tool calls?
Skills appear to have code packaged in. They have added PDF and Excel support using skills.
Seems like it will synergize perfectly with what Microsoft released a short while ago.
https://github.com/microsoft/amplifier
MCP gives me early-days gRPC vibes - when the protocol felt heavy and the tooling had many sharp edges. Even today, after many rounds of improvements, people often eschew gRPC and Protobuf.
Similarly, my experience writing and working with MCPs has been quite underwhelming. It takes too long to write them and the workflow is kludgy. I hope Skills get adopted by other model vendors, as it feels like a much lighter way to save and checkout my prompts.
What do you find difficult about writing MCPs? I haven't worked much with them but it seems easy enough. I made an MCP that integrates with Jenkins so I can deploy code from Claude (not totally useful since I could just run a few CLI commands), but it still took like 10 minutes and works flawlessly.
But I suppose, yeah: why not just write CLIs and have an LLM call them?
Writing one-off simple MCPs is quite easy, but once you need to manage a fleet of them, it gets hairy.
- Writing manifests and schemas by hand takes too long for small or iterative tools. Even minor schema changes often require re-registration or manual syncing. There’s no good “just run this script and expose it” path yet.
- Running and testing an MCP locally is awkward. You don’t get fast iteration loops or rich error messages. When something fails, the debugging surface is too opaque - you end up guessing what part broke (manifest, transport, or tool logic).
- There’s no consistent registry, versioning, or discovery story. Sharing or updating MCPs across environments feels ad hoc, and you often have to wire everything manually each time.
With Skills you need none of that: instruct it to invoke a tool and be done with it.
> - There’s no consistent registry, versioning, or discovery story. Sharing or updating MCPs across environments feels ad hoc, and you often have to wire everything manually each time.
yes there is:
https://github.com/modelcontextprotocol/registry
and here you have frontends for the registry https://github.com/modelcontextprotocol/registry/blob/main/d...
Everything is new so we are all building it in real time. This used to be the most fun times for a developer: new tech, everybody excited, lots of new startups taking advantage of new platforms/protocols.
Stumbled onto a similar concept a few years ago - I think there was a paper around some kind of proto code skills for Minecraft back then... awesome to see Anthropic pursue this direction.
https://www.prefix.app/blog/is-chat-the-future
They're completely different things. MCP is a standardised lightweight integration interface and skills are dynamic rules.
That doesn't mean you can't say one is a bigger deal than the other.
If I learned how to say "hello" in French today and also found out I have stage 4 brain cancer, they are completely different things but one is a bigger deal than the other.
Reposting a comment I made when it was posted 10 hours ago:
As someone who is looking into MCP right now, I'd love to hear what folks with experience in both of these areas think.
My first impressions are that MCP has some advantages:
- around for longer and has some momentum
- doesn't require a dev environment on the computer to be effective
- cross-vendor support
- more sophistication for complex use cases (enterprise permissions can be layered on because of OAuth support)
- multiple transport layers gives flexibility
Skills seems to have advantages too, of course:
- simpler
- easier to iterate
- less context used
I think if the other vendors follow along with skills, and we expect every computer to have access to a development environment, skills could win the day. HTML won over XML and REST won over SOAP, so simple often wins.
But the biggest drawback of MCP, the context window overuse, can be remediated by having MCP specific sub-agents that are interacted with using a primary agent, rather than injecting each MCP server into the main context.
Yeah, I think you're exactly right about MCP's advantages - especially that MCP doesn't require access to a full sandboxed Linux environment to work!
I still plan to ship an MCP for one of my products to let it interact with the wider ecosystem, but as an end-user I'm going to continue mostly using Claude Code without them.
I don't understand why tool calling isn't the primitive. A "skill" could easily have just been a tool that an agent can build and execute in its own compute space.
I really don't see why we need two forms of RPC...
> A "skill" could have easily just been a tool
The problem skills solve is initial mapping of available information. A tool might hide what information it contains until used, this approach puts a table of contents for docs in the context, so the model is aware and can navigate to desired information as needed.
If you look through the example skills most of them are about calling existing tools, often Python via the terminal.
Take a look at this one for working with PDFs for example: https://github.com/anthropics/skills/blob/main/document-skil... - it includes a quickstart guide to using the Python pypdf module, then expands on that with some useful scripts for common patterns.
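The flavour of the snippets such a skill teaches, as a minimal pypdf sketch (not copied from the repo):

  from pypdf import PdfReader, PdfWriter

  # Merge the first page of each input PDF into a single output file.
  writer = PdfWriter()
  for name in ["cover.pdf", "report.pdf"]:
      reader = PdfReader(name)
      writer.add_page(reader.pages[0])

  with open("merged.pdf", "wb") as out:
      writer.write(out)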
For me, Claude Skills is just proof that we're making RAG unnecessarily difficult to use - not tech-wise, but UX-wise. If we can fix that, the need for Claude Skills will go away.
Where are Claude Skills better than MCP? It's easier to produce a Claude Skill. It's just text; everyone can write it. But it depends a lot on the environment, e.g. when you need certain tools available for it to work. How do you automate sandbox setup for that? And even then, are you sure it's the right version for it to use, etc.?
Skills again seem more like making MCPs more accessible for everyday users. They will again need to evolve. Things that are missing:
- containers for skills (you will need more than a folder for sharing a toolchain with skills)
- the orchestration and picking up of a skill is left to the LLM (the synthesis is done via the context shared in the skill; this feels like a very sub-optimal pattern at the moment - low control)
A few others, like versioning and access control for tools, are also missing.
Real-world skills come not just from practice; they are opinionated workflows built with specific toolchains too.
IMO, this is half-assed engineering at the moment.
I am really confused about how this compares to the resources/prompts that an MCP server can expose in the MCP spec.
I get that no one is using those, but this just sounds like a rehash?
https://modelcontextprotocol.io/specification/2025-06-18/ser...
https://modelcontextprotocol.io/specification/2025-06-18/ser...
Skills are massively easier to understand and implement than MCP resources/prompts.
Do skills enable the AI to do things within its own context, or do they grant the ability to process things in an entirely separate context?
I was considering the merits of a news-article analysis skill that processed the content adopting a variety of personas, opinions, and degrees of adversarial attitude in isolated contexts, then brought the responses together into a single context to arbitrate the viewpoints impartially.
Or even a skill for "where did this claim originate?". I'd love an auto Snopes skill.
It's amazing how writing optimal guides for LLMs or humans is exactly the same. Even if a human doesn't want to use an LLM, these skill markdowns could work as tutorials.
> LLMs know how to call cli-tool --help, which means you don’t have to spend many tokens describing how to use them—the model can figure it out later when it needs to.
I do not understand this. cli-tool --help output still occupies tokens, right?
Absolutely, but it occupies them later and only when needed. This is what I think they're driving at here.
But why can't I do the same with MCP? I just create a help() function that returns the help info?
That hypothetical might be fine, but MCPs do much more than that and their catalogs can be enormous. Here are some popular MCPs and the amount of context they eat before you've done anything with them:
GitHub: 39 tools, 30K tokens. I had to disable it.
Does anybody have a good SKILLS.md file we can study?
Absolutely! I now have Claude Code using `gh` and haven't missed the MCP. (If there are better CLI alternatives, I'd love to hear about them.)
You can; I've seen people put MCP access behind another MCP. I'm not sure how much success they got from it, though.
Most people still don't understand MCP properly and think it's about adding 50 tools to every call. Proper MCP servers and clients implement tools/listChanged
This is like MCPs on demand. I've switched a few MCP servers to just scripts because they were taking like 20% of the context just by being there. Now, I can ask the model to use X script, which it reads and uses only if needed.
What is MCP?:
https://modelcontextprotocol.io/docs/getting-started/intro
It seems to me that MCP and Skills are solving two different problems and provide solutions that complement each other quite nicely.
MCP is about integration of external systems and services. Skills are about context management - providing context on demand.
As Simon mentions, one issue with MCP is token use. Skills seem like a straightforward way to manage that problem: just put the MCP tools list inside a skill where they use no tokens until required.
You could even have a Skill that says "First call 'enable-mcp jira' to enable the Jira MCP, now here's how to use that: ..."
I tried. Claude Code can't enable a disabled MCP on the fly.
Just to echo the point about MCP: they seem cool, but in my experience just using a CLI is orders of magnitude faster to write and to debug (I can run the CLI myself, put tests in the code, etc.).
Yup, and it doesn't bloat the context unnecessarily. The agent can call --help when it needs it. Just imagine a kubectl MCP with all the commands as individual tools - it doesn't make any sense whatsoever.
> and it doesn't bloat the context unnecessarily.
And, this is why I usually use simple system prompts/direct chat for "heavy" problems/development that require reasoning. The context bloat is getting pretty nutty, and is definitely detrimental to performance.
Do you have any information e.g. blog posts on this pattern?
The point of this stuff is to increase reliability. Sure, the LLM has a good chance of figuring out the skill by itself; the idea is that it's less likely to fuck up with the skill, though. This is an engineering advancement that makes it easier for businesses to rely on LLMs for routine stuff with less oversight.
Does anyone else still keep everything super simple by just including .md files in the project for guardrails, and doing all edits manually via file upload and clipboard?
Sure, it's tedious, but I get to directly observe every change.
I guess I just don't feel comfortable with more black box magic beyond the LLM itself.
Don’t quite follow.
So you only use the chat UI and copy and paste from there?
Or do you use CC but don’t let it automatically update files?
I just upload the file I want to change and the file with associated tests. Anything more than that and the ROI goes way down on slop generated. But I'll admit Gemini at least is getting good at "implement the function with a TODO describing what needs to be done". No need for anything but the Gemini file uploader, and copy pasting the results if they look good.
Isn't this just repackaged RAG pretty much?
Depends which definition of RAG you're talking about.
RAG was originally about adding extra information to the context so that an LLM could answer questions that needed that extra context.
On that basis I guess you could call skills a form of RAG, but honestly at that point the entire field of "context engineering" can be classified as RAG too.
Maybe RAG as a term is obsolete now, since it really just describes how we use LLMs in 2025.
I’d rather say you can use skills to do RAG by supplying the right tools in the skill (“here’s how you query our database”).
Calling the skill system itself RAG is a bit of a stretch IMO, unless you end up with so many skills that their summaries can’t fit in the context and you have to search through them instead. ;)
Seems like that’s it? You give it a knowledge base of “skills” aka markdown files with contexts in them and Claude figures out when to pull them into context.
I think RAG is out of favor because models have a much larger context these days, so the loss of information density from vectorization isn't worth it, and doesn't fetch the information surrounding what's retrieved.
That's true if you use RAG to mean "extra context found via vector search".
I think vector search has been shown to be a whole lot more expensive than regular FTS or even grep, so these days a search tool for the model which uses FTS or grep/rg or vectors, or a combination of those, is the way to go.
Seems similar to Amp's "toolboxes" from August:
https://ampcode.com/news/toolboxes
Those are nice too — a much more hackable way of building simple personal tools than MCP, with less token and network use.
Soon we will have a skills marketplace that allows you to sell your skill of being able to make the AI more skilled.
I can’t quite put my finger on why this doesn’t excite me. First, there was MCP — and honestly, my feelings about it were mixed. In the end, it’s just tool calling, only now the tools are hosted somewhere else instead of being implemented as function calls. Then comes this new “skill” concept. Isn’t it just another round of renaming what we already do? We’ve been using Markdown files to guide our coding agents for ages. Overall, it feels like yet another case of “a new spin on an old thing.”
FYI the gif is not visible to us. In the shared chat:
> Perfect! I've created your Slack GIF!
> [Files hidden in shared chats]
It's in the blog post (it's rubbish) - here's a direct URL: https://static.simonwillison.net/static/2025/skills_vs_mcps....
This seems like Claude sub agents. I personally had tried to use sub agents for repetitive tasks with very detailed instructions, but was never able to get it to work really well in different contexts.
I wonder if one could write a skill called something like "Ask the damn user" that the model could use when e.g. all needed files are not in the context.
Might not need a skill for that, Claude Code just added it as a top level feature: https://twitter.com/trq212/status/1979215901577875812
Maybe I'm just dumb, but it isn't clear how to manage my skills for Claude Code. All the docs are for the web version.
Eg I don't know where to put a skill that can be used across all projects
Here's the relevant documentation: https://docs.claude.com/en/docs/claude-code/skills#personal-...
You can drop the new markdown files directly into your ~/.claude/skills directory.
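The layout ends up looking something like this (skill name made up); Claude Code picks up the name and description from each SKILL.md's frontmatter:

  ~/.claude/skills/
    my-first-skill/
      SKILL.md          # YAML frontmatter (name, description) + instructions
      examples.md       # optional extra reference material
      scripts/query.py  # optional helper scripts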
Thank you! For some reason Claude told me to mess around in ~/.claudeconfig or some other made-up directory.
As a bonus you can get Claude to create the SKILL.md file itself.
Which kind of sounds pointless: if Claude already knows what to do, why create a document?
My examples: I interact with Elasticsearch, and Claude keeps forgetting it is version 5.2 and that we need to use the appropriate REST API. So I got it to create a SKILL.md about what we use, with examples.
And the next one was getting it to write instructions on how to use ImageMagick on Windows, with examples and troubleshooting, rather than it trying to use the Linux version over and over.
Skills are the solution to the problems I have been having. And they came at just the right time, as I'd already spent half of last week making similar documents!
Do Claude Skills enable anything that wasn't possible before?
No. They make context usage more efficient, but they're not providing new capabilities that didn't previously exist in an LLM system that could run commands on a computer.
Perhaps not, but a big benefit according to OP is the smaller number of tokens / context pollution skills introduce v. MCP.
I read the article yesterday and said the same thing.
You don’t need skills to build progressively detailed context. You can do this with MCP.
> Skills are folders that include instructions, scripts, and resources that Claude can load when needed.
I hate how we are focusing on just adding more information to lookup maps, instead of focusing on deriving those maps from scratch.
I think skills, and also MCP to an extent, are a UI failure. It's AI, after all. It should be able to intelligently adapt to our workflow, maybe present its findings and ask for feedback. If that means it's stored as a thing called "skills", that's perfectly fine, but it should just be an implementation detail.
I don't mean to be unreasonable, but this is all about managing context in a heavy and highly technical manner. Eventually models must be able to augment their training / weights on the fly, customizing themselves to our needs and workflow. Once that happens (it will be a really big deal), all of the time you've spent messing around with context management tools and procedures will be obsolete. It's still good to have fundamental understanding though!
Creating planning agents seems to force that approach.
Rather than defining skills and execution agents, let a meta-planning agent determine the best path based on objectives.
What's the cost from all the "figuring out" the model has to do to use a skill vs MCP?
If this is true, what is the Playwright Skill that we can all enjoy with low token usage and the same value?
I've been telling Claude Code to "use Playwright Python" and getting good results out of it from just those three words.
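For reference, the kind of script it typically ends up writing from those three words looks roughly like this (a hand-written approximation, not actual Claude output; URL is hypothetical):

  from playwright.sync_api import sync_playwright

  # Load the new feature page and check that it renders a heading.
  with sync_playwright() as p:
      browser = p.chromium.launch()
      page = browser.new_page()
      page.goto("http://localhost:8000/new-feature")  # hypothetical URL
      assert page.locator("h1").first.inner_text().strip() != ""
      browser.close()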
If I'm going to fix my car, I might read the car manual first. But if I'm not fixing my car and just walking past the bookshelf, I'll only see the title of the book.
> inject a prompt based on the description
how are skills different from SlashCommand tool in claude-code then?
So, the skills files can be used by Codex, right?
Yes, even though Codex CLI doesn't (yet) know what a skill is. Tell it "Go read skills/pdfs/skill.md and then create me a PDF that does ..." and see what happens.
A step away from AI
You can drive a car, even though you may not know exactly how every part works, right?
I wonder if this is a way to make Claude smarter about tool use. As I try more and more of CC, one thing that's been frustrating me is that it often falls over when trying to call some command line tool. Like it will try to call tool_1 which is not on the system, and it errors out, and then it tries to call tool_2, but it's in the wrong working directory or something, so it fails too, then it changes directory and tries to call tool_2 again, and it passes the wrong command line parameters or something, then it tries again... All the while, probably wasting my limited token budget on all of its fuckups. It's gotten to the point where I let it do the code changes, but when it decides it wants to run something in the shell (or execute the project's executable), I just interrupt it and do the rest of the command line work myself.
Is this different from AGENTS.md by OpenAI?
Yes, it's different. AGENTS.md is a single file that is read by the LLM when the session starts. Skills are multiple .md files which are NOT read on startup - instead, the LLM harness scans them for descriptions and populates the context with those, such that it can later on decide to read the full skills/pdf/skill.md file because the user indicates they want to do something (like create a PDF) for which the skill is relevant.
Plus skill folders can also include additional reference documents and executable scripts, like this one here: https://github.com/anthropics/skills/tree/main/document-skil...
Skills feel very similar to specialized agents / sub-agents, which we already see some of. I could be underappreciating the depth, but it feels like the main work here is the UX affordance: maybe like a mod launcher for games - "what mods/prompts do you want to run with?"
I really enjoyed seeing Microsoft Amplifier last week, which similarly has a bank of different specialized sub-agents. These other banks of markdowns that get turned on for special purposes feels very similar. https://github.com/microsoft/amplifier?tab=readme-ov-file#sp... https://news.ycombinator.com/item?id=45549848
One of the major twists with Skills seems to be that Skills also have "frontmatter YAML" that is always loaded. It still sounds like it's at least somewhat up to the user to engage the Skills, but this "frontmatter" offers... something that purports to help.
> There’s one extra detail that makes this a feature, not just a bunch of files on disk. At the start of a session Claude’s various harnesses can scan all available skill files and read a short explanation for each one from the frontmatter YAML in the Markdown file. This is very token efficient: each skill only takes up a few dozen extra tokens, with the full details only loaded in should the user request a task that the skill can help solve.
I'm not sure what exactly this does but conceptually it sounds smart to have a top level awareness of the specializations available.
I do feel like I could be missing some significant aspects of this. But the mod-launcher paradigm feels like a fairly close parallel?
I just read this separately through Google Discover, and I don't quite get the amazing newness of it - if anything, it feels to me like an abstraction of MCP. There is nothing I see here that couldn't be replaced by a series of MCP tools. For example, the author mentions that "a current trick" often used is including a markdown file with details / instructions around a task - this can be handled with an MCP server prompt (or even a 'tool' that just returns the desired text). If you've fooled around as much as I have, you realize that in the prompt itself you can mention other available tools the LLM can use - defining a workflow, if you will, including tools for actual coding and validation like the author mentions they included in their skill.
Furthermore, with all the hype around MCP servers and simply the number of servers now in existence, do they just immediately become obsolete? It's also a bit fuzzy to me exactly how an LLM will choose an MCP tool over a skill and vice versa...
A skill is a Markdown + YAML file on your filesystem. An MCP server is accessed over HTTP and defines a way to authenticate users.
If you're running an MCP server just to expose local filesystem resources, then it's probably obsolete. But skills don't cover a lot of the functionality that MCP offers.
Hmmm... if skills could create Skills, then evaluate the success of those generated Skills and update them as needed, in a loop, that seems like an interesting thing to try. It could be like a form of code-creation caching. (Or a Skynet in the making.)
The question is whether the analysis of all the Skill descriptions is faster or slower than just rewriting the code from scratch each time. And would it be a good or a bad thing if an agent created thousands of slightly varied skills?
I think Skills might be coming from an AI Safety and AI Risk sort of place, or better, alignment with the company's goals. The motivation could be to reduce the amount of ad-hoc instruction giving that can be done, on the fly, in the favor of doing this at a slower pace, making it more subject to these checks. It does fit well into what a lot of agents are doing, though, which makes it more palatable for the average AI user.
Basically the way it would work is, in the next model, it would avoid role playing type instructions, unless they come from skill files, and internally they would keep track of how often users changed skill files, and it would be a TOS violation to change it too often.
Though I gave up on Anthropic in terms of true AI alignment long ago, I know they are working on a trivial sort of alignment where it prevents it from being useful for pen testers for example.
Related:
Claude Skills
https://news.ycombinator.com/item?id=45607117
My understanding is that the "skills" are manually written.
Can we prompt Claude to edit and improve its skills too?
Yes, Anthropic even published a skill for that: https://github.com/anthropics/skills/blob/main/skill-creator...
I have been using Claude for a few months now and couldn't switch to another one, performance-wise.
Is there a community-shared directory of Claude skills somewhere yet? Sounds promising.
Reads like a sloppily disguised influencer blog post, really. The feature has just been released and the author already knows it's going to be a "bigger deal than MCP"? Although he proceeds to discredit MCP as too complex to be able to take off towards the end of the article. Not very useful examples either, which the author publishes with a bit of distancing, a disclaimer almost - who the hell has time to create lousy Slack GIFs? And if you do, you'll probably not want the lousy ones. So yeah, let's not immediately declare this "great"; let's see how it fares in general and revisit in a few months.
What is an "influencer blog post"?
I didn't say that MCP was too complex to take off - it clearly took off despite that complexity.
I'm predicting skills will take off even more. If I'm wrong feel free to call me out in a few months time for making a bad prediction!
> I didn't say that MCP was too complex to take off - it clearly took off despite that complexity.
I did not say you said exactly that either. Read more carefully. I said you were discrediting them by implying they were too complex to take off due to resource and complexity constraints. It's clearly stated in the relevant section of your post (https://simonwillison.net/2025/Oct/16/claude-skills/#skills-...)
>I'm predicting skills will take off even more. If I'm wrong feel free to call me out in a few months time for making a bad prediction!
How the hell can you predict they will "take off even more" when the feature has been accessible for barely 24 hours at this point? You don't even have a basic frame of reference, or at least a statistical usage sample, for making such a statement. That's not a very reliable prediction, is it?
> How the hell can you predict they will "take off even more" when the feature is accessible for barely 24 hours at this point?
That's what a prediction IS. If I waited until the feature had proven itself it wouldn't be much of a prediction.
The feature has also been live for more than 24 hours. I reverse-engineered it a week ago: https://simonwillison.net/2025/Oct/10/claude-skills/ - and it's been invisibly powering the PDF/DOC/XLS/PPT creation features on https://claude.ai/ since those launched on the 9th September: https://www.anthropic.com/news/create-files
> That's what a prediction IS. If I waited until the feature had proven itself it wouldn't be much of a prediction.
No, that's merely guessing, mate. Predictions are, at least in the modern meaning, based on at least some data and some extrapolation model that more or less reliably predicts the development of your known dataset into future (unknown) values. I don't see you presenting either in your post, so that's not predicting; that's in the best of cases guessing, and in the worst of cases irresponsible distribution of Anthropic's propaganda.
Wrong
Bro is paid to write this
I'm not. Here are my disclosures: https://simonwillison.net/about/#disclosures
And if my disclosures aren't enough for you, here's the FTC explaining how it would be illegal for an AI vendor to pay someone to write something like this without both sides disclosing the relationship: https://www.ftc.gov/business-guidance/resources/ftcs-endorse...
> I have not accepted payments from LLM vendors, but I am frequently invited to preview new LLM products and features from organizations that include OpenAI, Anthropic, Gemini and Mistral, often under NDA or subject to an embargo. This often also includes free API credits and invitations to events.
You don't need money, just incentives and network effects
I quote, "Bro is paid to write this".
And yeah, blogging does kind of work on incentives. If I write things and get good conversations as a result, I'm incentivized to write more things. If I write something and get silence then maybe I won't invest as much time in the future.
[dead]
[dead]