I'm also wondering how he is able to see calls to AI providers directly in the browser, via client-side API calls? That's strange to me. Also, how is he able to peer into the RAG architectures? I don't get that. Maybe GPT-4.1 allows unauthenticated requests? Is there an OAuth setup that allows client-side requests to OpenAI?
The article is basically a description of where to look for clues. Perhaps they've contracted with some of these companies and don't want to break some NDA by naming them, but still know a lot about how they work.
The thing that drives me nuts is that most "AI Applications" are just adding crappy chat to a web app. A true AI application should have AI driven workflows that automate boring or repetitive tasks without user intervention, and simplify the UI surface of the application.
I'm firmly of the opinion that, as a general rule, if you're directly embedding the output of a model into a workflow and you're not one of a handful of very big players, you're probably doing it wrong.[1]
If we overlook that non-determinism isn't really compatible with a lot of business processes and assume you can make the model spit out exactly what you need, you can't get around the fact that an LLM is going to be a slower and more expensive way of getting the data you need in most cases.
LLMs are fantastic for building things. Use them to build quickly and pivot where needed and then deploy traditional architecture for actually running the workloads. If your production pipeline includes an LLM somewhere in the flow, you need to really, seriously slow down and consider whether that's actually the move that makes sense.
[1] - There are exceptions. There are always exceptions. It's a general rule not a law of physics.
True, even OpenAI built their castle in nVidia's kingdom. And nVidia built their castle in TSMC's kingdom. And TSMC built their castle in ASML's kingdom.
My question with these is always "what happens when the model doesn't need prompting?". For example, there was a brief period where IDE integrations for coding agents were a huge value add - folks spent eons crafting clever prompts and integrations to get the context right for the model. Then... Claude, Gemini, Codex, and Grok got better. All indications are that engineers are pivoting to using foundation model vended coding toolchains and their wrappers.
This is rapidly becoming a more extreme version of the classic "what if google does that?" as the foundation model vendors don't necessarily need to target your business or even think about it to eat it.
5% prompt engineering, 95% orchestration. And no, you cannot vibe-code your way to cloning my apps. I have paid subscriptions, so why aren't you doing it then? Oh, because models degrade severely over 500 lines.
LLMs are the new AJAX. AJAX made pages dynamic, LLMs make pages interactive.
The reason is that VCs need to show that their flagship investments have "traction", so they manufacture ecosystem interest by funding and encouraging ecosystem product usage. It's a small price to pay. If someone builds a wrapper that gets 100 business users, the token usage gets passed down to the foundation layer. Big scheme.
This is a kind of global app store all over again, where all these companies are clients of only a few true AI companies and try to distinguish themselves within the bounds of the underlying models and APIs, just like apps were trying to find niches within the bounds of the APIs and exposed hardware of the underlying iPhones. API version bugs are now model updates. And of course, all are at the mercy of their respective Leviathan.
It's wild. I work with some Fortune 500 engineers who don't spend a lot of time prompting AI, and just a few quick prompts like 'output your code in <code lang="whatever">...</code> tags' (a trick that most people in the prompting world are very familiar with, but that virtually no one outside the bubble knows about) can improve AI code generation outputs to almost 100%.
It doesn't have to be this way and it won't be this way forever, but this is the world we live in right now, and it's unclear how many years (or weeks) it'll be until we don't have to do this anymore
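As a rough illustration of the tag trick above (the prompt wording, the regex, and the canned response are all invented for this sketch, not taken from the comment):

```python
import re

def build_prompt(task: str) -> str:
    # Ask the model to fence its answer in a known tag so the output can be
    # extracted mechanically instead of parsed out of free-form prose.
    return (
        f"{task}\n\n"
        'Output only the final code, wrapped in <code lang="python">...</code> tags.'
    )

def extract_code(response: str):
    # Pull out whatever landed between the tags, ignoring any chatter around it.
    match = re.search(r'<code lang="[^"]*">(.*?)</code>', response, re.DOTALL)
    return match.group(1).strip() if match else None

# Canned response standing in for a real model call:
fake_response = 'Sure!\n<code lang="python">print("hello")</code>\nDone.'
print(extract_code(fake_response))  # -> print("hello")
```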
Another slop article that could probably be good if the author were interested in writing it, but instead they dumped everything into an LLM, and now I can't tell what's real and what's not, and I get no sense of which parts of the findings the author found important or interesting compared to the rest.
I have to wonder, are people voting this up after reading the article fully, and I'm just wrong and this sort of info dump with LLM dressing is desirable? Or are people skimming it and upvoting? Or is it more of an excuse to talk about the topic in the title? What level of cynicism should I be on here, if any?
Interesting article and plausible conclusions but the author needs to provide more details to back up their claims. The author has yet to release anything supporting their approach on their Github.
https://github.com/tejakusireddy
Honestly it sounds about right: at the end of the day, most companies will always be an interesting UI and workflow around some commodity tech, but, that's valuable. Not all of it may be defensible, but still valuable.
I decided to flag this article because it has to be fake.
The author never explains how he is able to intercept these API calls to OpenAI, etc. I definitely believe tons of these companies are just wrappers, but they'd be doing the "wrapping" in their backend, with only a couple (dumb) companies doing the calls directly to OpenAI from the front end where they could be traced.
This article is BS. My guess is it was probably AI generated because it doesn't make any sense.
I find it shocking that most comments here just accept the article as fact and discuss the implications.
The message might not even be wrong. But why is everybody's BS detection on ice in the AI topic space? Come on, people, you can all do better than this!
Thanks for flagging. Though whenever such a made-up thing is flagged, we lose the chance to discuss this (meta) topic. People need to be aware how prevalent this is. By just hiding it every time we notice, we're preventing everybody from reading the kind of comment you wrote and recalibrating their BS-meters.
And 99% of software development is just feeding data into a compiler. But that sort of misses the point, doesn't it?
AI has created a new interface with a higher level abstraction that is easier to use. Of course everyone is going to use it (how many people still code assembler?).
The point is what people are doing with it is still clever (or at least has potential to be).
I disagree. Software development is not limited to LLM-type responses and incorporates proper logic. You are at the mercy of the LLM when you build an "AI" interface on top of the LLM APIs. 73% of these "AI" companies will collapse when the original API-providing company comes up with a simple option (Gemini for Sheets, for example); they will disappear. It is already happening.
AI software is not long-lasting; its results are not deterministic.
First, someone has to develop those models and that's currently being done with VC backing. Second, running those models is still not profitable, even if you self host (obviously true because everything is self hosted eventually).
Burning VC money isn't a long term business model and unless your business is somehow both profitable on Llama 8b (or some such low power model) _and_ your secret sauce can't be easily duplicated, you're in for a rough ride.
The only barrier between AI startups at this point is access to the best models, and that's dependent on being able to run unprofitable models that spend someone else's money.
Investing in a startup that's basically just a clever prompt is gambling on the first mover's advantage because that's the only advantage they can have.
What differentiates a product is not the commodity layer it’s built on (databases, programming languages, open source libraries, OS apis, hosting, etc) but how it all gets glued together into something useful and accessible.
It would be a bad strategy for most startups to do anything other than prompt engineering in their AI implementations for the same reason it would be a bad idea for most startups to write low-level database code instead of SQL queries. You need to spend your innovation tokens wisely.
One of the biggest problems frontier models will face going forward is how many tasks require expertise that cannot be achieved through Internet-scale pre-training.
Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).
A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).
So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and looks for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.
To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.
I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.
The question that isn't answered completely in the article is how useful the pipelines are for these startups. The article certainly implies that for at least some of these startups there is very little value added in the wrapper.
Got any links to explanations of why fine tuning open models isn’t a productive solution?
Besides renting the GPU time, what other downsides exist on today’s SOTA open models for doing this?
When people are desperate to invest, they often don't care what someone actually can do but more about what they claim they can do. Getting investors these days is about how much bullshit you can shovel as opposed to how much real shit you shoveled before.
Prompt engineering and using an expensive general model in order to prove your market, and then putting in the resources to develop a smaller(cheaper) specialized model seems like a good idea?
Are people down to have a bunch of specialized models? The expectation set by OpenAI and everyone else is that you will have one model that can do everything for you.
It's like how we've seen basically all gadgets meld into the smartphone. People don't have Garmins and beepers and clock radios anymore (or dedicated phones!). It's all on the screen that fits in your pocket. Any would-be gadget is now just an app.
Having everything in my phone is a great convenience for me as a consumer. Pockets are small, and you only have a small number of them in any outfit.
But cloud services run in... the cloud. It's as big as you need it to be. My cloud service can have as many backing services as I want. I can switch them whenever I want. Consumers don't care.
"One model that can do everything for you" is a nice story for the hyper scalers because only companies of their size can pull that off. But I don't think the smartphone analogy holds. The convenience in that world is for the the developers of user-facing apps. Maybe some will want to use an everything model. But plenty will try something specialized. I expect the winner to be determined by which performs better. Developers aren't constrained by size or number of pockets.
> The expectation set by OpenAI and everyone else has set is that you will have one model that can do everything for you.
I don’t think that’s the expectation set by “everyone else” in the AI space, even if it arguably is for OpenAI (which has always, at least publicly, had something of a focus on eventual omnicapable superintelligence.) I think Google Antigravity is evidence of this: there’s a main, user selected coding model, but regardless of which coding model is used, there are specialized models used for browser interaction and image generation. While more and more capabilities are at least tolerably supported by the big general purpose models, the range of specialized models seems to be increasing rather than decreasing, and seems likely that, for conplex efforts, combining a general purpose model with a set of focussed, task-specific models will be a useful approach for the forseeable future.
I think of foundational models like CPUs. They're the core of powerful, general-purpose computers, and will likely remain popular and common for most computing solutions. But we also have GPUs, microcontrollers, FPGAs, etc. that don't just act as the core of a wide variety of solutions, but are also paired alongside CPUs for specific use cases that need specialization.
Foundational models are not great for many specific tasks. Assuming that one architecture will eventually work for everything is like saying that x86/amd64/ARM will be all we ever need for processors.
Specialized models are cheaper. For a company you're looking for some task that needs to be done millions of times per day, and where general models can do it well enough that people will pay you more than the general model's API cost to do it. Once you've validated that people will pay you for your API wrapper you can train a specialized model to increase your profit and if necessary lower your pricing so people won't pay OpenAI directly.
It's probably the direction it will go, at least in the near term.
It seems right now like there is a tradeoff between creativity and factuality, with creative models being good at writing and chatting, and factuality models being good at engineering and math.
It's why we are getting these specific -code models.
It's really an implementation decision. The end user doesn't need to know their request is routed to a certain model. A smaller specialized model might have identical output to a larger general purpose model, but just be cheaper and faster to run.
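A toy sketch of that kind of routing (the model names and the heuristic are invented purely for illustration):

```python
def route_model(prompt: str) -> str:
    # Hypothetical heuristic: short, template-like requests go to a small,
    # cheap specialized model; open-ended work goes to the big general one.
    looks_simple = len(prompt) < 500 and "step by step" not in prompt.lower()
    return "small-specialized-model" if looks_simple else "large-general-model"

print(route_model("Classify this support ticket: printer is on fire"))
print(route_model("Plan a multi-step refactor of the billing service, step by step, considering..."))
```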
I still use the Garmin I bought in 2010. I refuse to turn on my phone's location tracking. Also the single-purpose interface is better and safer than switching between apps and contexts on a general purpose device.
Not really because the money involved is relatively small. The bubble is where people are using D8s to push square kilometers of dirt around for data centers that need new nuclear power plants built, to house millions of obsolete Nvidia GPUs that need new fabs constructed to make, using yet more D8s..
Not to be too pedantic, but code is a kind of specification. I think making the blanket statement "Prompt is code" is inaccurate, but there does exist a methodology of writing prompts as if they are specifications that can be reliably converted to computational actions, and I believe we're heading toward that.
For me, 2023 was an entire year of weekly demos that, looking back, were basically a "Look at this dank prompt I wrote" followed by thunderous applause from the audience (which was mostly, but not exclusively, upper management).
Hell, man, I attended a session at an AWS event last year that was entirely the presenter opening Claude and writing random prompts to help with AWS stuff... Like, thanks dude... That was a great use of an hour. I left 15 minutes in.
We have a team that's been working on an "Agent" for about 6 months now. It started as prompt engineering; then they were like "no, we need to add more value" and developed a ton of tools and integrations and "connectors" and evals etc. The last couple of weeks were a "repivot" going back full circle to "Let's simplify all that by prompt engineering and give it a sandbox environment to run publicly documented CLIs. You know, like Claude Code."
The funny thing is I know where it's going next...
I can't take anyone seriously who uses prompt engineering unironically. I see those emails come through at work and all I can do is roll my eyes and move on
what level of seriousness does "context engineering" deserve?
But did it work? This is the sticking point with me now. I've seen slides, architecture diagrams, job descriptions, roadmaps and other docs now from about a dozen different companies doing AI Agent projects. And while it's completely feasible to build the systems they're describing, what I have not seen yet is evidence of any of them working.
When you press them on this, they have all sorts of ideas like a judge LLM that takes the outputs, comes up with modified SOPs and feeds those into the prompts of the mixture-of-experts LLMs. But I don't think that works, I've tried closing that loop and all I got was LLMs flailing around.
Wait ...
You mean teams are already building their own solutions to existing solutions? Software development will live on in eternity then.
They are just reselling OpenAI subscriptions at a markup. Surprise!
A long time ago a mentor of mine said,
"In tech, often an expert is someone that know one or two things more than everyone else. When things are new, sometimes that's all it takes."
It's no surprise it's just prompt engineering. Every new tech goes that way - mainly because innovation is often adding one or two things more the the existing stack.
I remember being told that the secret of good consultancy is knowing what to read on your way to the meeting
Very true. And these days it takes a lot less effort than before, getting LLMs to summarize shit, which is one task they inarguably shine at.
They make too many mistakes for me to rely on their summaries for consulting. Repeating one of those is a great way to embarrass yourself in front of a client and damage your reputation
It’s easy to underestimate the amount of testing “just” prompt/context engineering takes to get above average results.
And then you need to see what variations work best with different models.
My POCs for personal AI projects take time to get this right. It’s not like the API calls are the hard portion of the software.
I'm always more interested in the 'less is more' strategy, taking things away from the already hyper-complicated stack, reviewing first principles and simplifying for the same effectiveness. This is ever more rare.
I think this sense of “less is more” roughly means refactoring? I think the reason these go south so often is because we’re likely moving complexity around rather than removing it. Removing a layer from the stack means making a different layer more complex to take over for it.
Why is this post published in November 2025 talking about GPT-4?
I'm suspicious of their methodology:
> Open DevTools (F12), go to the Network tab, and interact with their AI feature. If you see: api.openai.com, api.anthropic.com, api.cohere.ai You’re looking at a wrapper. They might have middleware, but the AI isn’t theirs.
But... everyone knows that you shouldn't make requests directly to those hosts from your web frontend because doing so exposes your API key in a way that can be stolen by attackers.
If you have "middleware" that's likely to solve that particular problem - but then how can you investigate by intercepting traffic?
Something doesn't smell right about this investigation.
It does later say:
> I found 12 companies that left API keys in their frontend code.
So that's 12 companies, but what about the rest?
Providers such as OpenAI have client keys so your client application can call the providers directly. Many developers prefer them as they save roundtrip costs and latency.
https://platform.openai.com/docs/api-reference/realtime-sess...
Do those still only work for the voice APIs though?
I've been hoping they would extend that to other APIs, and I'd love to see the same kind of mechanism for other providers.
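For reference, the flow the realtime-sessions docs describe (as I read them; treat the endpoint payload and response fields here as approximate) is that your backend mints a short-lived client secret and only that ephemeral credential reaches the browser:

```python
import os
import requests

def mint_client_token() -> str:
    # Runs on YOUR server: trade the real API key for a short-lived client
    # secret that the browser can then use to call the provider directly.
    resp = requests.post(
        "https://api.openai.com/v1/realtime/sessions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4o-realtime-preview"},  # illustrative model name
        timeout=10,
    )
    resp.raise_for_status()
    # The ephemeral credential is in the response; the exact field layout may
    # differ from this sketch, so check the linked docs before relying on it.
    return resp.json()["client_secret"]["value"]
```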
That's a big LLM smell when it mentions old models like GPT-4.
> just prompt engineering
This dismisses a lot of actual hard work. The scaffolding required to get SOTA performance is non-trivial!
Eg how do you build representative evals and measure forward progress?
Also, tool calling, caching, etc is beyond what folks normally call “prompt engineering”.
If you think it’s trivial though - go build a startup and raise a seed round, the money is easy to come by if you can show results.
prompt engineering + CRUD is likely much more fair.
And many companies are "just CRUD".
The money is easy to come by because wealthy investors, while they don't want to pay any more in taxes, are desperate to find possible returns in an economy that sucks outside of ballooning healthcare and the AI bubble... not because they need the money but because NUMBER MUST GO UP.
And more so than even most VC markets, raising for an "AI" company is more about who you know than what results you can show.
If anyone is actually showing significant results, where's the actual output of the AI-driven software boom (beyond just LLMs making coders more efficient by being a better Google)? I don't see any real signs of it. All I see is people doing aftermarket modifications on the shovels; I've yet to see any of the end users of these shovels coming down from the hills with sacks of real gold.
What’s your opinion on any of the plethora of unicorns in domain-specific AI, like Harvey? ($100m ARR from what I could find on a cursory search)
https://www.forbes.com/sites/iainmartin/2025/10/29/legal-ai-...
This is like when people say that you should short the market if you think it's going to crash. People have different risk premiums.
> Eg how do you build representative evals and measure forward progress?
This assumes that those companies do evaluations. In my experience, seeing a huge amount of internal AI projects at my company (FAANG), there's not even 5% that have any sort of eval in place.
Yeah, I believe that lots of startups don’t have evals either, but as soon as you get paying customers you’re gonna need something to prevent accidentally regressing as you tune your scaffolding, swap in newer models, etc.
This is a big chasm that I could well believe a lot of founders fail to cross.
It’s really easy to build an impressive-looking tech demo, much harder to get and retain paying customers and continuously improve.
But! Plenty of companies are actually doing this hard work.
See for example this post: https://news.ycombinator.com/item?id=46025683
This should be the top comment.
But ... what else should they be doing? What's the expectation here?
For example, in the 90's, a startup that offered a nice UI for a legacy console-based system would have been a great idea. What's wrong with that?
IMO nothing wrong with it. Just misleading to call yourself an AI company when you actually make a CRUD app. I think if these companies were honest about what they’re doing nobody would be upset. There’s an obvious deliberate attempt to give an impression of technical complexity/competence that isn’t there.
I assume it works because the ecosystem is, as you say, so new. Non-technical observers have trouble distinguishing between LLM companies and CRUD companies
So, what is an AI company? What do they sell? AI models? agents? Are they building these from scratch or using some pre-trained base models/agents?
I don’t have a problem with a company calling themselves an AI company if they use OpenAI behind the scenes.
The thing that annoys me is when clearly non-AI companies try to brand themselves as AI: like how Long Island Iced Tea tried to brand themselves as a blockchain company or WeWork tried to brand themselves as a tech company.
If we’re complaining about AI startups not building their own in house LLMs, that really just seems like people who are not in the arena criticizing those who are.
They should be creating tiny domain specific models, because someday OpenAI will stop selling dollars for a nickel.
They should compete in the crucible of the free market. If prompt engineering is indeed a profitable industry then so be it. I for one am just tired of all things software being dominated by this hype funded AI frenzy.
I think the point is more to point out the inherent danger presented when your platform is just a wrapper, but is being sold as more than that.
A lot of these startups have little to no moat, but they're raking in money like no one's business. That's exactly what happened in the dotcom bubble.
Actual AI. Not being "AI" users.
Being LLM users would be fine but they pretend they do AI.
AI is an ecosystem that includes users at all layers and innovation at all those layers - infra, databases, models, agents, portals, UIs and so on. What do you mean by doing AI?
Btw, the so-called AI devs or model developers are "users" of the databases and all the underlying layers of the stack.
Everything is a spectrum.
At what point can you claim that you did "it"?
Do you have to use an open source model instead of an API? Do you have to fine tune it? How much do you need to? Do you have to create synthetic data for training? Do you have to gather your own data? Do you need to train from scratch? Do you need to come up with a novel architecture?
10 years ago, if you gathered some data, trained a linear model to determine the likelihood your client would default on their loan, and used that to decide how much, if any, to loan them, you're absolutely doing "actual AI".
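That kind of "actual AI" fits in a dozen lines. A minimal sketch on synthetic data (logistic regression being the classic linear model for a default-probability classifier; all numbers here are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic applicants: income (k$), debt-to-income ratio, years employed.
X = rng.normal(loc=[60, 0.3, 5], scale=[20, 0.1, 3], size=(1000, 3))
# Synthetic "defaulted" label, loosely driven by debt-to-income ratio.
y = (X[:, 1] + rng.normal(0, 0.05, size=1000) > 0.35).astype(int)

model = LogisticRegression().fit(X, y)

applicant = [[45, 0.42, 2]]  # income 45k, DTI 0.42, 2 years employed
p_default = model.predict_proba(applicant)[0, 1]
# Use the estimated default risk to decide how much, if anything, to lend.
max_loan = 0 if p_default > 0.5 else int(20_000 * (1 - p_default))
print(f"estimated default risk {p_default:.2f}, offer up to ${max_loan}")
```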
---
For any other software you could ask all the same questions, but about using a high-level language, frameworks, dependencies, hiring consultants / a firm, using an LLM, no-code, etc.
At what point does outsourcing some portion of the end product become no longer doing the thing?
What’s actual AI in this context?
Isn't this true for most startups out there, even before AI? Some sort of bundle/wrapper around existing technology? I worked auditing companies, and we used a particular system that cost tens of thousands of dollars per user per year, and we charged customers up to a million to generate reports with it. The platform didn't have anything proprietary other than the UX; under the hood it was a few common tools, some of them open source. We could have created our own product, but our margins were so huge it didn't make sense to set up a software development unit, or even bother with outsourcing it.
This post hovers on something I came to the week after ChatGPT dropped in 2023.
If an AI company has an AGI, what incentive do they actually have to sell it as a product, especially if it’s a 10x cost/productivity/reliability silicon engineer? Just undercut the competition by building their services from scratch.
That is lower than I expected. There are just a handful of companies that create LLMs. They are all more or less similar. So all the automation is in using them, which is prompt engineering if you see it that way.
The bigger question is, this is the same story as with apps on mobile phones. Apple and Google could easily replicate your app if they wanted to, and they did too. That danger is much higher with these AI startups. The LLMs are already there in terms of functionality; all the creators figured out the value is in vertical integration, and all of them are doing it. In that sense all these startups are just showing them what to build. Even Perplexity and Cursor are in danger.
Do not forget that a product idea needs to meet a certain ROI to be stolen. Big Tech won't go after opportunities that do not generate billion-level revenue. This leaves some room for applications where you can earn decent money.
That is not how companies work. What you said may be true for the immediate short term, but over time every team in the company needs to show improvement and set yearly milestones. All these startups will then become functionality they want to push that quarter. Yes, it doesn't mean the death of the startup, but a struggle.
I can believe that many startups are doing prompt engineering and agents, but in a sense this is like saying 90% of startups are using cloud providers, mainly AWS and Azure.
There is absolutely no point in reinventing the wheel to create a generic LLM and spend a fortune to run GPUs while there are providers giving this power cheaply.
In addition, there may be value in getting to market quickly with existing LLM providers, proving out the concept, then building / training specialized models if needed once you have traction.
See: https://en.wikipedia.org/wiki/Lean_startup
It is beyond annoying that the article is totally generated by AI. I appreciate the author (hopefully) spending effort in trying to figure out these AI systems, but the obviously LLM-written, unedited content makes me not trust the article.
What makes you believe that anything in the article is real?
The author seems to not exist and it's unclear where the data underlying the claims is even coming from since you can't just go and capture network traffic wherever you like.
A little due diligence please.
This makes no sense to me? I don't understand why a company, even if it is using GPT or Claude as their true backend, is going to leave API calls in Javascript that anyone can find. Sure maybe a couple would, but 73% of those tested? Surely your browser is going to talk to their webserver, and yup sure it'll then go off and use Claude etc then return the answer to you, but surely they're not all going to just skin an easily-discoverable website over the big models?
I don't believe any of this. Why aren't we questioning the source of how the author is apparently able to figure out some sites are using REDIS etc etc?
It's very confusing in the text of the article, at times it sounds like the author is using heuristic methods (like timings) but at times it sounds like they somehow have access to network traffic from the provider's backend. I could 100% believe that a ton of these companies are making API calls to providers directly from an SPA, but the flow diagrams in the article seem to specifically rule that out as an explanation.
I might allow them more credit if the article wasn't in such an obviously LLM-written style. I've seen a few cases like this, now, where it seems like someone did some very modest technical investigation or even none at all and then prompted an LLM to write a whole article based on it. It comes out like this... a whole lot of bullet points and numbered lists, breathless language about the implications, but on repeated close readings you can't tell what they actually did.
It's unfortunate that, if this author really did collect this data, their choice to have an LLM write the article and in the process obscure the details has completely undermined their credibility.
It makes perfect sense when you consider that the average Javascript developer does not know that business logic can exist outside of React components.
Prompt engineering isn't as simple as writing prompts in English. It's still engineering the data flow, when data is relevant, the systems that the AI can access and search, the tools that the AI can use, etc.
Is it, though? Apparently the current best practice is just to allow the LLM untethered access to everything and try to control access by preventing prompt injection...
Well it took me 2 full-time weeks to properly implement a RAG-based system so that it found actually relevant data and did not hallucinate. Had to:
- write an evaluation pipeline to automate quality testing
- add a query rewriting step to explore more options during search
- add hybrid BM25+vector search with proper rank fusion (see the sketch below)
- tune all the hyperparameters for best results (like the weight bias for BM25 vs. vector, how many documents to retrieve for analysis, how to chunk documents based on semantics)
- parallelize the search pipeline to decrease wait times
- add moderation
- add a reranker to find best candidates
- add background embedding calculation of user documents
- lots of failure cases to iron out so that the prompt worked for most cases
There's no "just give LLM all the data", it's more complex than that, especially if you want best results and also full control of data (we run all of that using open source models because user data is under NDA)
Sounds like you vibe coded a RAG system in two weeks, which isn't very hard. Any startup can do it.
I've debugged single difficult bugs before for two weeks, a whole feature that takes two weeks is an easy feature to build.
I already had experience with RAG before so I had a head start. You're right that it's not rocket science, but it's not just "press F to implement the feature" either
P.S. No vibe coding was used. I only used LLM-as-a-judge to automate quality testing when tuning the parameters, before passing it to human QA
"did not hallucinate"
Sorry to nitpick, but this is not technically possible no matter how much RAG you throw at it. I assume you just mean "hallucinates a lot less"
You're right, bad wording
whoa, two weeks
@apwell23 while the author didn’t say how s/he measured QA, creating the QA process was literally the first bullet.
You still need to find the correct data, and get it to the LLM. IMO, a lot of it is data engineering work with API calls to an LLM as an extra step. I'm currently doing a lot of ETL work with Airflow (and whatever data {warehouses, lakes, bases} are needed) to get the right data to a prompt engineering flow. The prompt engineering flow is literally a for loop of Google Docs in a Google Drive that non-tech people, but domain experts in their field, can access.
It's up to the domain experts and me to understand where giving it data will tone down the hallucinative nonsense an LLM puts out, and where we should not give data because we need the problem-solving skills of the LLM itself. A similar process applies to tool use, which in our case means pre-selected Python scripts that it is allowed to run.
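To make that shape of pipeline concrete, a stripped-down sketch; the Drive/Docs helper and the final LLM hand-off are placeholders for whatever clients are actually in use, not real library calls:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def list_prompt_docs():
    # Placeholder: fetch the prompt documents the domain experts maintain in a
    # shared Drive folder, using whatever Drive/Docs client you actually have.
    return [{"name": "triage_prompt", "text": "You are a claims triage assistant..."}]

def run_prompt_flow():
    # The "prompt engineering flow": loop over the docs, attach the data the
    # upstream ETL tasks prepared, and hand the result to the model.
    for doc in list_prompt_docs():
        context = "...rows pulled from the warehouse by earlier tasks..."
        prompt = f"{doc['text']}\n\nData:\n{context}"
        print(f"would send {doc['name']} ({len(prompt)} chars) to the LLM")

with DAG(
    dag_id="etl_to_prompt_flow",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # A real pipeline would have extract/transform tasks feeding this one.
    PythonOperator(task_id="run_prompt_flow", python_callable=run_prompt_flow)
```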
Can you describe what the use case is?
Nah. There's no such thing as prompt engineering. It doesn't exist. Engineering involves applying scientific principles to solve real world problems. There are no clear scientific principles to apply here. It's all instinct, hunches, educated guesses, and heuristics with maybe some sort of feedback loop. And that's fine, it can still produce useful results. Just don't call it engineering. Maybe artisanal prompt crafting? Or prompt alchemy?
Prompt engineering is the new Search Engine Optimization.
Not sure if we called it engineering ten years ago.
Human speech is "engineering data flow"
Painting is "engineering data flow"
Directing a movie is "engineering data flow"
Playing the guitar is "engineering data flow"
This statement merely reveals a bias to apply high value to the word "engineering" and to the identity "engineer".
Ironic in that silicon valley lifted that identity and it's not even legally recognized as a licensed profession.
Imagine you are a top-of-the-line engineer...
Engineering data flow... sure, we all like to use big words.
The new 10x engineering is writing "please don't write bugs" in a markdown file.
Where is this guy sitting that he is able to collect all of this data? And why is he able to release it all in a blog post? (my company wouldn't allow me to collect and release customer data like this.)
Another red flag with the article is that the author's LinkedIn profile link at the bottom leads to a non-existing page.
Is Teja Kusireddy a real person? Or is this maybe just an experiment from some AI company (or other actor) to see how far they can push it? A Google search by that name doesn't find anything not related to the article.
The article should be flagged. Otoh, this should get discussed.
He seems real. Goes by Teja K. Seems to be a startup founder.
He may be real, but the article is fake BS. There is simply no way he'd be in a position to intercept the calls, and he never explains it.
There is nothing difficult about monitoring network traffic in this way for desktop or native apps.
Did you read the article? It claims to have knowledge of network traffic between the startup's devices and the AI providers' devices.
Do you have any link that supports this?
It sounds like some of these companies call the OpenAI or Anthropic APIs directly from their frontend. Later, the author also mentions "response time patterns for every major AI API," so maybe there's some information about the backend leaking that way even if the API calls are bridged.
But I'd like to know an actual answer to this, too, especially since large parts of this post read as if they were written by an LLM.
> It sounds like some of these companies call the OpenAI or Anthropic APIs directly from their frontend.
Which would be a major security hole. And sure, lots of startups have major security holes, but not enough that he could come up with these BS statistics.
I'm a little dismayed at how high up this has been voted given the data is guaranteed to be made up.
> > It sounds like some of these companies call the OpenAI or Anthropic APIs directly from their frontend.
> Which would be a major security hole.
An officially supported security hole
https://platform.openai.com/docs/api-reference/realtime-sess...
"I found 12 companies that left API keys in their frontend code. I reported them all. None responded."
They claim to have found that.
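For what it's worth, that kind of leak is usually trivially greppable in a shipped bundle. A minimal sketch (the `sk-` prefix is OpenAI's conventional key format, the URL is hypothetical, and the regex is deliberately loose):

```python
import re
import requests

def find_leaked_keys(bundle_url: str):
    # Download a public JS bundle and flag strings shaped like provider API
    # keys. This only surfaces candidates; it proves nothing by itself.
    js = requests.get(bundle_url, timeout=10).text
    return sorted(set(re.findall(r"sk-[A-Za-z0-9_-]{20,}", js)))

# Hypothetical bundle URL:
# print(find_leaked_keys("https://example-ai-startup.com/static/js/main.js"))
```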
Yeah, TBH my BS detector is going off because this article never explains how he is able to intercept these calls.
To be able to call OpenAI directly from the front end, you'd need to include the OpenAI key, which would be a huge security hole. I don't doubt that many of these companies are just wrappers around the big LLM providers, but they'd be calling the APIs from their backend, where nothing should be interceptable. And sure, I believe a few of them are dumb enough to call OpenAI from the frontend, but that would be a minority.
This whole thing smells fishy, and I call BS unless the author provides more details about how he intercepted the calls.
> Yeah, TBH my BS detector is going off because this article never explains how he is able to intercept these calls.
You mean, except for explaining what he's doing 4-5 times? He literally restates it over and over. Half the article is about the various indicators he used. THERE ARE EXAMPLES OF THEM.
There's this bit:
> Monitored their network traffic for 60-second sessions
> Decompiled and analyzed their JavaScript bundles
Also there's this whole explanation:
> The giveaways when I monitored outbound traffic:
> Requests to api.openai.com every time a user interacted with their "AI"
> Request headers containing OpenAI-Organization identifiers
> Response times matching OpenAI’s API latency patterns (150–400ms for most queries)
> Token usage patterns identical to GPT-4’s pricing tiers
> Characteristic exponential backoff on rate limits (OpenAI’s signature pattern)
Also there's these bits:
> The Methodology (Free on GitHub next week):
> - The complete scraping infrastructure
> - API fingerprinting techniques
> - Response time patterns for every major AI API
One time he even repeats himself by stating what he's doing as Playwright pseudocode, in case plain English isn't enough.
This was also really funny:
> One company’s “revolutionary natural language understanding engine” was literally this: [clientside code with prompt + direct openai API call].
And there's also this bit at the end of the article:
> The truth is just an F12 away.
There's more because LITERALLY HALF THE ARTICLE IS HIM DOING THE THING YOU COMPLAIN HE DIDN'T DO.
In case it's still not clear, he was capturing local traffic while automating with Playwright, as well as analyzing client-side JS.
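A minimal sketch of that setup, assuming Playwright's request event API (the hostname list and the 60-second window just mirror what the article describes; the target URL is hypothetical):

    import { chromium } from "playwright";

    const PROVIDER_HOSTS = ["api.openai.com", "api.anthropic.com", "api.cohere.ai"];

    (async () => {
      const browser = await chromium.launch();
      const page = await browser.newPage();

      // Log every outbound request the page itself makes to a model provider.
      page.on("request", (req) => {
        const { hostname, pathname } = new URL(req.url());
        if (PROVIDER_HOSTS.includes(hostname)) {
          console.log(req.method(), hostname + pathname);
          console.log("  openai-organization:", req.headers()["openai-organization"] ?? "(none)");
        }
      });

      await page.goto("https://example-ai-startup.test"); // hypothetical target
      // ...drive the "AI" feature, then watch traffic for ~60 seconds...
      await page.waitForTimeout(60_000);
      await browser.close();
    })();

Note that this only sees browser-originated traffic; it says nothing about what a startup's backend does, which is exactly the objection being raised upthread.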
> Monitored their network traffic for 60-second sessions
How can he monitor what's going on between a startup's backend and OpenAI's server?
> The truth is just an F12 away
That's just not how this works. You can see the network traffic between your browser and some service. In 12 cases that was OpenAI or similar. Fine. But that's not 73%. What about the rest? He literally has a diagram claiming that the startups contact an LLM service behind the scenes. That's what's not described: how does he measure that?
You are not bothered that the only sign the author even exists is this one article and the previous one? Together with the claim to be a startup founder? Anybody can claim that. It doesn't automatically provide credibility.
I believe he's saying that a large number of the startups he tested did not have their own backend to mediate. It was literally direct front-end calls to openai. And if this sounds insane, remember that openai actually supports this: https://platform.openai.com/docs/api-reference/realtime-sess...
Presumably OpenAI didn't add that for fun, either, so there must be non-zero demand for it.
>Response times matching OpenAI’s API latency patterns (150–400ms for most queries)
This also matches the latency of a large number of DB queries and non-OpenAI LLM inference requests.
>Token usage patterns identical to GPT-4’s pricing tiers
What? Yes this totally smells real.
He also mentions backoff patterns, and I'm not sure how he'd distinguish those from the perfectly standard backoff you see in any normal API.
Given the ridiculousness of these claims, I believe there's a reason he didn't include the fingerprinting methodology in this article.
I'm also wondering how he is able to see calls to AI providers directly in the browser; client-side API calls? That's strange to me. Also, how is he able to peer into the RAG architectures? I don't get that. Maybe GPT-4.1 allows unauthenticated requests? Is there an OAuth setup that allows client-side requests to OpenAI?
Yea I just posted a similar comment. I'm sure some websites just skin OpenAI/Claude etc, but ALL of them? It makes no sense.
There's a link in the preview of TFA that unlocks the rest of the article, looks like this for me:
https://medium.com/@teja.kusireddy23/i-reverse-engineered-20...
The article is basically a description of where to look for clues. Perhaps they've contracted with some of these companies and don't want to break some NDA by naming them, but still know a lot about how they work.
> Perhaps they've contracted with some of these companies and don't want to break some NDA by naming them, but still know a lot about how they work.
This makes literally no sense. Why would any companies (let alone most of them) contract with this guy who seems hell bent on exposing them all.
The article is simply made up, most likely by an LLM.
The thing that drives me nuts is that most "AI Applications" are just adding crappy chat to a web app. A true AI application should have AI driven workflows that automate boring or repetitive tasks without user intervention, and simplify the UI surface of the application.
I'm firmly of the opinion that, as a general rule, if you're directly embedding the output of a model into a workflow and you're not one of a handful of very big players, you're probably doing it wrong.[1]
If we overlook that non-determinism isn't really compatible with a lot of business processes and assume you can make the model spit out exactly what you need, you can't get around the fact that an LLM is going to be a slower and more expensive way of getting the data you need in most cases.
LLMs are fantastic for building things. Use them to build quickly and pivot where needed and then deploy traditional architecture for actually running the workloads. If your production pipeline includes an LLM somewhere in the flow, you need to really, seriously slow down and consider whether that's actually the move that makes sense.
[1] - There are exceptions. There are always exceptions. It's a general rule not a law of physics.
73% of AI startups are building their castle in someone else's kingdom.
It's worse than that, someone else's models, someone else's smartphone operating systems, it's every conceivable disadvantage.
Every city should have its own municipal chip fabrication plant!
If you break up AT&T the Bell System will collapse!
Not sure if you're familiar, but it did collapse. It's all one company again.
-1: there's lots of "kingdoms" (openai, anthropic, google, plus open source) - if one king comes for your castle, you can move in minutes.
True, even OpenAI built their castle in nVidia's kingdom. And nVidia built their castle in TSMC's kingdom. And TSMC built their castle in ASML's kingdom.
Lastly, we need the FDIC meme "Backed by the full faith and credit of the U.S. Government" for good measure, haha.
TSMC bought a huge chunk of ASML's shares before taking the plunge on EUV -- enough to get them a board seat.
My question with these is always "what happens when the model doesn't need prompting?". For example, there was a brief period where IDE integrations for coding agents were a huge value add - folks spent eons crafting clever prompts and integrations to get the context right for the model. Then... Claude, Gemini, Codex, and Grok got better. All indications are that engineers are pivoting to using foundation model vended coding toolchains and their wrappers.
This is rapidly becoming a more extreme version of the classic "what if google does that?" as the foundation model vendors don't necessarily need to target your business or even think about it to eat it.
5% prompt engineering, 95% orchestration. And no, you cannot vibe-code your way in and clone my apps. I have paid subscriptions; why aren't you doing it, then? Oh, because models degrade severely over 500 lines.
LLMs are the new AJAX. AJAX made pages dynamic; LLMs make pages interactive.
I'm surprised by the number of people who are running head first into AI wrapper start-ups.
Either you have a smash-and-grab strategy or you are awful at risk analysis.
Do you want to be right, or do you want to make money? You'll be correct in 5-10 years. Do you wait and do nothing until then?
The reason is that VCs need to show that their flagship investments have "traction", so they manufacture ecosystem interest by funding and encouraging the use of ecosystem products. It's a small price to pay. If someone builds a wrapper that gets 100 business users, that token usage gets passed down to the foundation layer. Big scheme.
This is the global app store all over again: all these companies are clients of only a few true AI companies and try to distinguish themselves within the bounds of the underlying models and APIs, just like apps trying to find niches within the bounds of the APIs and exposed hardware of the underlying iPhones. API version bugs are now model updates. And of course, all are at the mercy of their respective Leviathan.
Flagged. Please don't post items on HN where we have to pay or hand over PII to read it. Thanks.
pls don't create new guidelines
Ditto.
It's wild. I work with some Fortune 500 engineers who don't spend a lot of time prompting AI, and just a few quick prompts like "output your code in <code lang="whatever">...</code> tags" (a trick that most people in the prompting world are very familiar with, but that virtually no one outside the bubble knows about) can improve AI code-generation output to almost 100%.
It doesn't have to be this way and it won't be this way forever, but this is the world we live in right now, and it's unclear how many years (or weeks) it'll be until we don't have to do this anymore
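For anyone outside the bubble, the trick is just to tell the model exactly what wrapper to put its code in and then parse only that wrapper. A minimal sketch (the tag format matches the comment above; the helper name is made up):

    // Ask for output in an explicit wrapper so it can be extracted reliably,
    // instead of fishing code out of free-form prose.
    const SYSTEM_PROMPT =
      'Output your code in <code lang="python">...</code> tags and nothing else.';

    // Pull the code back out of the model's reply. The regex only trusts the
    // tags we asked for, so any surrounding chatter is ignored.
    function extractCode(reply: string): string | null {
      const match = reply.match(/<code lang="[^"]*">([\s\S]*?)<\/code>/);
      return match ? match[1].trim() : null;
    }

    // extractCode('Sure! <code lang="python">print("hi")</code>') -> 'print("hi")'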
Another slop article that could probably be good if the author were interested in writing it, but instead they dumped everything into an LLM, and now I can't tell what's real and what's not, and I get no sense of which parts of the findings the author found important or interesting compared to the rest.
I have to wonder, are people voting this up after reading the article fully, and I'm just wrong and this sort of info dump with LLM dressing is desirable? Or are people skimming it and upvoting? Or is it more of an excuse to talk about the topic in the title? What level of cynicism should I be on here, if any?
Interesting article and plausible conclusions, but the author needs to provide more details to back up their claims. The author has yet to release anything supporting their approach on their GitHub. https://github.com/tejakusireddy
https://archive.ph/Zjs2J
98% of all websites are just database wrappers
2% are unjust?
I don't care how you get to a system that does something useful.
That's actually lower than I would have thought.
73% of startups are just writing computer programs
And 73% of SaaS companies are just CRUD.
Honestly, it sounds about right: at the end of the day, most companies will always be an interesting UI and workflow around some commodity tech, but that's valuable. Not all of it may be defensible, but it's still valuable.
73% of statistics are wrong
I decided to flag this article because it has to be fake.
The author never explains how he is able to intercept these API calls to OpenAI, etc. I definitely believe tons of these companies are just wrappers, but they'd be doing the "wrapping" in their backend, with only a couple (dumb) companies doing the calls directly to OpenAI from the front end where they could be traced.
This article is BS. My guess is it was probably AI generated because it doesn't make any sense.
I find it shocking that most comments here just accept the article as fact and discuss the implications.
The message might not even be wrong. But why is everybody's BS detection on ice in the AI topic space? Come on, people, you can all do better than this!
Thanks for flagging. Though whenever such a made-up thing is flagged, we lose the chance to discuss this (meta) topic. People need to be aware of how prevalent this is. By just hiding it every time we notice, we're preventing everybody from reading the kind of comment you wrote and recalibrating their BS-meters.
And 99% of software development is just feeding data into a compiler. But that sort of misses the point, doesn't it?
AI has created a new interface with a higher level abstraction that is easier to use. Of course everyone is going to use it (how many people still code assembler?).
The point is what people are doing with it is still clever (or at least has potential to be).
I disagree. Software development is not limited to LLM-type responses and incorporates proper logic. You are at the mercy of the LLM when you build an "AI" interface on top of the LLM APIs. 73% of these "AI" companies will collapse when the original API provider comes up with a simple option of its own (Gemini for Sheets, for example); they will disappear. It is already happening.
AI software is not long-lasting; its results are not deterministic.
Maybe one day i can ask my tech in natural language for the weather...could you imagine?
Wait...nvm.
Isn’t it a bit like saying, “X% of startups are just writing code”?
So? 73% of SaaS startups are DB connectors & queries.
The difference is, if your company “moat” is a “prompt” on a commodity engine, there is no moat.
Google even said they have no moat, when clearly the moat is people that trust them and not any particular piece of technology.
the orchestration layer is the moat, ask any LLM and they will give paragraphs explaining why this is...
And 73% of paas are deploy scripts for existing software. It's how the industry works.
If tokens aren't profitable, then prices per token are likely to go up. If that's all these businesses are, they're all very sensitive to token prices.
Not with open-weight models you can deploy yourself. Different economics, but not vulnerable to price increases.
First, someone has to develop those models and that's currently being done with VC backing. Second, running those models is still not profitable, even if you self host (obviously true because everything is self hosted eventually).
Burning VC money isn't a long term business model and unless your business is somehow both profitable on Llama 8b (or some such low power model) _and_ your secret sauce can't be easily duplicated, you're in for a rough ride.
The only barrier between AI startups at this point is access to the best models, and that's dependent on being able to run unprofitable models that spend someone else's money.
Investing in a startup that's basically just a clever prompt is gambling on the first mover's advantage because that's the only advantage they can have.
73% of AI blog post statistics are bogus. Subscribe to learn more.
And out of that 73%, 99% of them don't even do the obvious step of trying to actually optimize/engineer their damn prompts!
https://github.com/zou-group/textgrad
and bonus, my rant about this circa 2023 in the context of Stable Diffusion models: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
Flagged. AI written article with questionable sources behind a wall that requires handing over PII.
The really impressive thing about AI startups is not that they sell wrappers around (whatever), but that they are not complete vaporware.
It’s because the LLM is a commodity.
What differentiates a product is not the commodity layer it’s built on (databases, programming languages, open source libraries, OS apis, hosting, etc) but how it all gets glued together into something useful and accessible.
It would be a bad strategy for most startups to do anything other than prompt engineering in their AI implementations for the same reason it would be a bad idea for most startups to write low-level database code instead of SQL queries. You need to spend your innovation tokens wisely.
Yep, I just use ChatGPT. I can write better prompts and data for my own use cases.
Atlas himself doesn't carry as much weight as "engineering" does in that headline.
That's like saying "73% of business is just meetings"
One of the biggest problems frontier models will face going forward is how many tasks require expertise that cannot be achieved through Internet-scale pre-training.
Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).
A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).
So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and they look for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.
To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.
I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.
The question the article doesn't completely answer is how useful these startups' pipelines actually are. It certainly implies that for at least some of these startups there is very little value added in the wrapper.
Got any links to explanations of why fine-tuning open models isn't a productive solution? Besides renting the GPU time, what other downsides exist on today's SOTA open models for doing this?
When people are desperate to invest, they often don't care what someone actually can do but more about what they claim they can do. Getting investors these days is about how much bullshit you can shovel as opposed to how much real shit you shoveled before.
Thus has it always been. Thus will it always be.
Prompt engineering and using an expensive general model in order to prove your market, and then putting in the resources to develop a smaller (cheaper) specialized model, seems like a good idea?
Are people down to have a bunch of specialized models? The expectation OpenAI and everyone else has set is that you will have one model that can do everything for you.
It's like how we've seen basically all gadgets meld into the smartphone. People don't have Garmins and beepers and clock radios anymore (or dedicated phones!). It's all on the screen that fits in your pocket. Any would-be gadget is now just an app.
Having everything in my phone is a great convenience for me as a consumer. Pockets are small, and you only have a small number of them in any outfit.
But cloud services run in... the cloud. It's as big as you need it to be. My cloud service can have as many backing services as I want. I can switch them whenever I want. Consumers don't care.
"One model that can do everything for you" is a nice story for the hyper scalers because only companies of their size can pull that off. But I don't think the smartphone analogy holds. The convenience in that world is for the the developers of user-facing apps. Maybe some will want to use an everything model. But plenty will try something specialized. I expect the winner to be determined by which performs better. Developers aren't constrained by size or number of pockets.
> The expectation set by OpenAI and everyone else has set is that you will have one model that can do everything for you.
I don't think that's the expectation set by "everyone else" in the AI space, even if it arguably is for OpenAI (which has always, at least publicly, had something of a focus on eventual omnicapable superintelligence). I think Google Antigravity is evidence of this: there's a main, user-selected coding model, but regardless of which coding model is used, there are specialized models for browser interaction and image generation. While more and more capabilities are at least tolerably supported by the big general-purpose models, the range of specialized models seems to be increasing rather than decreasing, and it seems likely that, for complex efforts, combining a general-purpose model with a set of focused, task-specific models will be a useful approach for the foreseeable future.
I think of foundational models like CPUs. They're the core of powerful, general-purpose computers, and will likely remain popular and common for most computing solutions. But we also have GPUs, microcontrollers, FPGAs, etc. that don't just act as the core of a wide variety of solutions, but are also paired alongside CPUs for specific use cases that need specialization.
Foundational models are not great for many specific tasks. Assuming that one architecture will eventually work for everything is like saying that x86/amd64/ARM will be all we ever need for processors.
Specialized models are cheaper. For a company you're looking for some task that needs to be done millions of times per day, and where general models can do it well enough that people will pay you more than the general model's API cost to do it. Once you've validated that people will pay you for your API wrapper you can train a specialized model to increase your profit and if necessary lower your pricing so people won't pay OpenAI directly.
It's probably the direction it will go, at least in the near term.
It seems right now like there is a tradeoff between creativity and factuality, with creative models being good at writing and chatting, and factuality models being good at engineering and math.
It's why we are getting these specific -code models.
It's really an implementation decision. The end user doesn't need to know their request is routed to a certain model. A smaller specialized model might have identical output to a larger general purpose model, but just be cheaper and faster to run.
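A toy sketch of that implementation decision, with made-up model names and a deliberately naive routing rule, just to show the shape (the caller never learns which model handled the request):

    type Model = "small-code-model" | "big-general-model"; // hypothetical names

    // Naive router: send code-shaped requests to the cheap specialized model
    // and everything else to the general-purpose one. A real router would use
    // a classifier, but the caller-facing API is identical either way.
    function pickModel(prompt: string): Model {
      const looksLikeCode = /```|function |def |class |SELECT /.test(prompt);
      return looksLikeCode ? "small-code-model" : "big-general-model";
    }

    // callProvider is a stand-in for whatever SDK or HTTP call you actually use.
    declare function callProvider(model: Model, prompt: string): Promise<string>;

    async function complete(prompt: string): Promise<string> {
      return callProvider(pickModel(prompt), prompt);
    }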
Happy with my Garmin :-)
I still use the Garmin I bought in 2010. I refuse to turn on my phone's location tracking. Also the single-purpose interface is better and safer than switching between apps and contexts on a general purpose device.
My coffee maker app is quite disappointing.
I imagine you’re being facetious but I wouldn’t count food-related products for the most part. It’s not like Claude is brewing a pot for me anyway lol
People talk about an AI bubble. I think this is the real bubble.
Not really, because the money involved is relatively small. The bubble is where people are using D8s to push square kilometers of dirt around for data centers that need new nuclear power plants built, to house millions of obsolete Nvidia GPUs that need new fabs, constructed with yet more D8s, to make...
(D8 apparently refers to a specific Caterpillar bulldozer, not some Kubernetes-style abbreviation.)
Why is slop with ridiculous or impossible claims at the top of HN?
Wait til you hear what GPT 5 is
What is it? A gpt-4o wrapper?
Prompt is code.
prompt as code is a pipe-dream.
The machine model for natural language doesn't exist; it is too ambiguous to be useful for many applications.
Hence, we limited natural language to create programming languages whose machine model is well defined.
In math, we created formalism to again limit language to a subset that can be reasoned with.
I've always said, determinism has been holding the field back.
Prompt is specification, not code.
Not to be too pedantic, but code is a kind of specification. I think the blanket statement "Prompt is code" is inaccurate, but there does exist a methodology of writing prompts as if they are specifications that can be reliably converted into computational actions, and I believe we're heading toward that.
Yeah, I assumed someone would say this.
My manager gives me specifications, which I turn into code.
My manager is not coding.
This is how to look at it.
100% of AI startups are just multiplying matrices. 100% of tech startups are just database engineering.
It's still early in the paradigm and most startups will fail but those that succeed will embed themselves in workflows.