Show HN: Velvet – Store OpenAI requests in your own DB

usevelvet.com

109 points by elawler24 5 days ago

Hey HN! We’re Emma and Chris, founders of Velvet (https://www.usevelvet.com).

Velvet proxies OpenAI calls and stores the requests and responses in your PostgreSQL database. That way, you can analyze logs with SQL (instead of a clunky UI). You can also set headers to add caching and metadata (for analysis).
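
For example, a cost-per-user rollup over your logs might look something like this (a rough sketch; the table and column names are illustrative, your actual schema may differ):

  import psycopg2

  conn = psycopg2.connect("postgresql://user:pass@localhost:5432/velvet_logs")
  with conn, conn.cursor() as cur:
      # Estimated token spend per user over the last week.
      cur.execute("""
          SELECT metadata->>'user_id'               AS user_id,
                 SUM((usage->>'total_tokens')::int) AS total_tokens,
                 COUNT(*)                           AS requests
          FROM llm_logs
          WHERE created_at > now() - interval '7 days'
          GROUP BY 1
          ORDER BY total_tokens DESC;
      """)
      for user_id, total_tokens, requests in cur.fetchall():
          print(user_id, total_tokens, requests)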

Backstory: We started by building some more general AI data tools (like a text-to-SQL editor). We were frustrated by the lack of basic LLM infrastructure, so we ended up pivoting to focus on the tooling we wanted. Many existing apps, like Helicone, were hard for us to use as power users. We just wanted a database.

Scale: We’ve already warehoused 50M requests for customers, and have optimized the platform for scale and latency. We built the proxy on Cloudflare Workers, and the added latency is minimal. We’ve also built some complex “yak shaving” features, such as decomposing OpenAI Batch API requests so you can track each log individually. One of our early customers (https://usefind.ai/) makes millions of OpenAI requests per day, at up to 1,500 requests per second.

Vision: We’re trying to build development tools that have as little UI as possible and can be controlled entirely with headers and code. We also want to blend cloud and on-prem for the best of both worlds, allowing for both automatic updates and complete data ownership.

Here are some things you can do with Velvet logs:

- Observe requests, responses, and latency

- Analyze costs by metadata, such as user ID

- Track batch progress and speed

- Evaluate model changes

- Export datasets for fine-tuning gpt-4o-mini

(this video shows how to do each of those: https://www.youtube.com/watch?v=KaFkRi5ESi8)

--

To see how it works, try chatting with our demo app that you can use without logging in: https://www.usevelvet.com/sandbox

Setting up your own proxy is 2 lines of code and takes ~5 mins.
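
Roughly, the setup is just pointing the OpenAI SDK at the proxy and adding an auth header (illustrative values below; check the docs for the exact base URL and header name):

  from openai import OpenAI

  client = OpenAI(
      base_url="https://gateway.usevelvet.com/v1",            # placeholder proxy URL
      default_headers={"velvet-api-key": "YOUR_VELVET_KEY"},  # placeholder header name
  )

  # Requests made through this client hit OpenAI as usual and get logged to your DB.
  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Hello"}],
  )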

Try it out and let us know what you think!

DeveloperErrata 4 days ago

Seems neat - I'm not sure if you do anything like this, but one thing that would be useful with RAG apps (esp. at big scales) is vector-based search over cache contents. What I mean is that users can phrase the same question (which has the same answer) in tons of different ways. If I could pass a raw user query into your cache and get back the end result for a previously computed query (even if the current phrasing is a bit different from the cached phrasing), then not only would I avoid having to submit a new OpenAI call, but I could also avoid having to run my entire RAG pipeline. So kind of like a "meta-RAG" system that avoids having to run the actual RAG system for queries that are sufficiently similar to a cached query, or like an "approximate" cache.

  • davidbarker 4 days ago

    I was impressed by Upstash's approach to something similar with their "Semantic Cache".

    https://github.com/upstash/semantic-cache

      "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."
    • OutOfHere 4 days ago

      I strongly advise not relying on embedding distance alone for it because it'll match these two:

      1. great places to check out in Spain

      2. great places to check out in northern Spain

      Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.

      • DeveloperErrata 4 days ago

        I agree, a naive approach to approximate caching would probably not work for most use cases.

        I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to only output True if 2 queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but would output False in your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
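
        Roughly what I'm imagining, as a sketch (the in-memory list stands in for a real vector DB, and the threshold, prompt, and model are arbitrary):

          import numpy as np
          from openai import OpenAI

          client = OpenAI()
          cache = []  # (query, embedding, answer) tuples; a vector DB in practice

          def embed(text):
              r = client.embeddings.create(model="text-embedding-3-small",
                                           input=text)
              return np.array(r.data[0].embedding)

          def same_question(a, b):
              # Stage 2: strict equivalence check with a small model.
              prompt = ("Do these two queries ask for the same answer? "
                        f"Reply yes or no.\n1. {a}\n2. {b}")
              v = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[{"role": "user", "content": prompt}])
              return v.choices[0].message.content.strip().lower().startswith("yes")

          def lookup(query, threshold=0.85):
              q = embed(query)
              for text, emb, answer in cache:
                  # Stage 1: cheap cosine-similarity filter over cached queries.
                  sim = float(q @ emb / (np.linalg.norm(q) * np.linalg.norm(emb)))
                  if sim >= threshold and same_question(query, text):
                      return answer
              return None  # cache miss: run the full RAG pipeline and cache it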

        • OutOfHere 4 days ago

          Any LLM can output it, but yes, a tuned LLM can benefit from a shorter prompt.

  • OutOfHere 4 days ago

    That would totally destroy the user experience. Users change their query so they can get a refined result, not so they get the same tired result.

    • pedrosorio 4 days ago

      Even across users it’s a terrible idea.

      Even in the simplest of applications, where all you’re doing is passing “last user query” + “retrieved articles” into OpenAI (and nothing else that differs between users, like previous queries or user data that may be necessary to answer), this will be a bad experience in many cases.

      Queries A and B may have similar embeddings (similar topic) and it may be correct to retrieve the same articles for context (which you could cache), but they can still be different questions with different correct answers.

    • elawler24 4 days ago

      Depends on the scenario. In a threaded query, or multiple queries from the same user, you’d want different outputs. If 20 different users are looking for the same result, a cache would return the right answer immediately at no marginal cost.

      • OutOfHere 4 days ago

        That's not the use case of the parent comment:

        > for queries that are sufficiently similar

  • elawler24 4 days ago

    Thanks for the detail! This is a use case we plan to support, and it will be configurable (for when you don’t want it). Some of our customers run into this when different users ask a similar query - “NY-based consumer founders” vs “consumer founders in NY”.

OutOfHere 4 days ago

A cache is better when it's local rather than on the web. And I certainly don't need to pay anyone to cache local request responses.

  • knowaveragejoe 4 days ago

    How would one achieve something similar locally, short of just running a proxy and stuffing the request/response pairs into a DB? I'm sure it wouldn't be too terribly hard to write something, but I figure something open source already exists for OpenAI-compatible APIs.

    • w-ll 4 days ago

      Recently did this workflow.

      Started with an nginx proxy with rules to cache based on URL/params. Wanted more control over it and explored lua/redis APIs, and opted to build an app to be a little smarter about what I wanted. The extra EC2 cost is negligible compared to the cache savings.
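
      Something like this, as a toy Python sketch of the idea rather than my actual nginx/lua setup (no caching, streaming, or error handling):

        import json, sqlite3

        import httpx
        from flask import Flask, Response, request

        app = Flask(__name__)
        db = sqlite3.connect("llm_logs.db", check_same_thread=False)
        db.execute("CREATE TABLE IF NOT EXISTS logs (request TEXT, response TEXT)")

        @app.post("/v1/chat/completions")
        def proxy():
            body = request.get_json()
            upstream = httpx.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": request.headers.get("Authorization", "")},
                json=body, timeout=60,
            )
            # Store the request/response pair, then pass the response through.
            db.execute("INSERT INTO logs VALUES (?, ?)",
                       (json.dumps(body), upstream.text))
            db.commit()
            return Response(upstream.content, status=upstream.status_code,
                            content_type="application/json")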

      • doubleorseven 4 days ago

        Yes! It's amazing how many things you can do with lua in nginx. I had a server that served static websites where the files and the certificates for each website were stored in a bucket. Over 20k websites with 220ms overhead if the certificate wasn't cached.

    • OutOfHere 4 days ago

      There are any number of databases and language-specific caching libraries. A custom solution or the use of a proxy isn't necessary.
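
      e.g. something as small as this (a toy sketch; swap lru_cache for a persistent store like Redis or SQLite in practice):

        from functools import lru_cache

        from openai import OpenAI

        client = OpenAI()

        @lru_cache(maxsize=4096)  # in-process cache; use a persistent store for real workloads
        def cached_completion(prompt, model="gpt-4o-mini"):
            r = client.chat.completions.create(
                model=model, messages=[{"role": "user", "content": prompt}])
            return r.choices[0].message.content

        # Repeated identical prompts are served from memory, with no API call made.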

  • nemothekid 4 days ago

    As I understand it, your data remains local, as it leverages your own database.

    • manojlds 4 days ago

      Why do I even have to use this SaaS? This should be an open source lib, or just a practice that I implement myself.

      • dsmurrell 4 days ago

        Implement it yourself then and save your $$ at the expense of your time.

        • torlok 4 days ago

          If you factor in dealing with somebody's black box code 6 months into a project, you'll realise you're saving both money and time.

          • OutOfHere 3 days ago

            It's not as complicated as you make it. There are numerous caching libraries, and databases have been a thing for decades.

        • manojlds 4 days ago

          Like, this is not a big thing to implement - that's my point. There are already libraries like OpenLLMetry that sink to a DB. We are doing something like this already.

          • nemothekid 3 days ago

            Yes, the ol' Dropbox "you can already build such a system yourself quite trivially by getting an FTP account" comment. Even after 17 years, people still feel the need to make this point.

angoragoats 4 days ago

I don't understand the problem that's being solved here. At the scale you're talking about (e.g. millions of requests per day with FindAI), why would I want to house immutable log data inside a relational database, presumably alongside actual relational data that's critical to my app? It's only going to bog down the app for my users.

There are plenty of other solutions (examples include Presto, Athena, Redshift, or straight up jq over raw log files on disk) which are better suited for this use case. Storing log data in a relational DB is pretty much always an anti-pattern, in my experience.

  • philip1209 4 days ago

    Philip here from Find AI. We store our Velvet logs in a dedicated DB. It's Postgres now, but we will probably move it to ClickHouse at some point. Our main app DB is also Postgres, so everybody just knows how it works and all of our existing BI tools support it.

    Here's a video about what we do with the data: https://www.youtube.com/watch?v=KaFkRi5ESi8

  • elawler24 4 days ago

    It's a standalone DB, just for LLM logging. Since it's your DB, you can configure data retention and migrate data to an analytics DB / warehouse if cost or latency becomes a concern. And we're happy to support whatever DB you require (ClickHouse, BigQuery, Snowflake, etc.) in a managed deployment.

    • angoragoats 4 days ago

      I guess I should have elaborated to say that even if you're spinning up a new database expressly for this purpose (which I didn't see specifically called out in your docs anywhere as a best practice), you're starting off on the wrong foot. Maybe I'm old-school, but relational databases should be for relational data. This data isn't relational, it's write-once log data, and it belongs in files on disk, or in purpose-built analytics tools, if it gets too large to manage.

      • elawler24 4 days ago

        Got it. We can store logs to your purpose-built analytics DB of choice.

        PostgreSQL (Neon) is our free self-serve offering because it’s easy to spin up quickly.

phillipcarter 4 days ago

Congrats on the launch! I love the devex here and things you're focusing on.

Have you had thoughts on how you might integrate data from an upstream RAG pipeline, say as part of a distributed trace, to aid in debugging the core "am I talking to the LLM the right way" use case?

  • elawler24 4 days ago

    Thanks! You can layer on as much detail as you need by including meta tags in the header, which is useful for tracing RAG and agent pipelines. But would love to understand your particular RAG setup and whether that gives you enough granularity. Feel free to email me too - emma@usevelvet.com
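
    For example, something like this (the header names are illustrative; extra_headers is the standard per-request option in the OpenAI Python SDK):

      from openai import OpenAI

      client = OpenAI(base_url="https://gateway.usevelvet.com/v1")  # placeholder URL

      resp = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{"role": "user", "content": "Summarize the retrieved passages ..."}],
          extra_headers={  # illustrative metadata keys, queryable in the logged JSON
              "velvet-meta-trace-id": "rag-run-42",
              "velvet-meta-step": "synthesis",
              "velvet-meta-retriever": "pgvector",
          },
      )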

simple10 4 days ago

Looks cool. Just out of curiosity, how does this compare to other OpenLLMetry-type observability tools like Arize, Traceloop, LangSmith, LlamaTrace, etc.?

From personal experience, they're all pretty simple to install and use. Then mileage varies in analyzing and taking action on the logs. Does Velvet offer something the others do not?

For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

RAG support would be great to add to Velvet. Specifically pgvector and pinecone traces. But maybe Velvet already supports it and I missed it in the quick read of the docs.

  • elawler24 4 days ago

    Velvet takes <5 mins to get set up in any language, which is why we started as a proxy. We offer managed / custom deployments for enterprise customers, so we can support your client requirements.

    We warehouse logs directly to your DB, so you can do whatever you want with the data. Build company ops on top of the DB, run your own evals, join with other tables, hash data, etc.

    We’re focusing on backend eng workflows so it’s simple to run continuous monitoring, evals, and fine-tuning with any model. Our interface will focus on surfacing data and analytics to PMs and researchers.

    For pgvector/pinecone RAG traces - you can start by including meta tags in the header. Those values will be queryable in the JSON object.

    Curious to learn more though - feel free to email me at emma@usevelvet.com.

  • marcklingen 4 days ago

    disclosure: founder/maintainer of Langfuse (OSS LLM application observability)

    I believe proxy-based implementations like Velvet are excellent for getting started and solve for the immediate debugging use case; simply changing the base path of the OpenAI SDK makes things really simple (the other solutions mentioned typically require a few more minutes to set up).

    At Langfuse (similarly to the other solutions mentioned above), we prioritize asynchronous and batched logging, which is often preferred for its scalability and zero impact on uptime and latency. We have developed numerous integrations (for OpenAI specifically, an SDK wrapper), and you can also use our SDKs and decorators to integrate with any LLM.

    > For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

    I can echo this. We observe many self-hosted deployments in larger enterprises and HIPAA-related companies, thus we made it very simple to self-host Langfuse. Especially when PII is involved, self-hosting makes adopting an LLM observability tool much easier in larger teams.

reichertjalex 4 days ago

Very nice! I really like the design of the whole product, very clean and simple. Out of curiosity, do you have a designer, or did you take inspiration from any other products (for the landing page, dashboard, etc) when you were building this? I'm always curious how founders approach design these days.

  • elawler24 4 days ago

    I’m a product designer, so we tend to approach everything from first principles. Our aim is to keep as much complexity in code as possible, and only surface UI when it solves a problem for our users. We like using tools like Vercel and Supabase - so a lot of UI inspiration comes from the way they surface data views. The AI phase of the internet will likely be less UI focused, which allows for more integrated and simple design systems.

ramon156 4 days ago

> we were frustrated by the lack of LLM infrastructure

May I ask what specifically you were frustrated about? It seems like there are more than enough solutions.

  • elawler24 4 days ago

    There were plenty of UI-based low code platforms. But they required that we adopt new abstractions, use their UI, and log into 5 different tools (logging, observability, analytics, evals, fine-tuning) just to run basic software infra. We didn’t feel these would be long-term solutions, and just wanted the data in our own DB.

turnsout 4 days ago

Nice! Sort of like Langsmith without the Langchain, which will be an attractive value proposition to many developers.

  • efriis 4 days ago

    Howdy Erick from LangChain here! Just a quick clarification that LangSmith is designed to work great for folks not using LangChain as well :)

    Check out our quickstart for an example of what that looks like! https://docs.smith.langchain.com/

    • turnsout 4 days ago

      TIL! LangSmith is great.

ji_zai 4 days ago

Neat! I'd love to play with this, but the site doesn't open (403: Forbidden).

  • elawler24 4 days ago

    Might be a Cloudflare flag. Can you email me your IP address and we'll look into it? emma@usevelvet.com.

codegladiator 4 days ago

Error: Forbidden

403: Forbidden ID: bom1::k5dng-1727242244208-0aa02a53f334

hiatus 4 days ago

This seems to require sharing our data we provide to OpenAI with yet another party. I don't see any zero-retention offering.

  • elawler24 4 days ago

    The self-serve version is hosted (it’s easy to try locally), but we offer managed deployments where you bring your own DB. In this case your data is 100% yours, in your PostgreSQL. That’s how Find AI uses Velvet.

    • knowaveragejoe 4 days ago

      Where is this mentioned? Is there a GitHub repo (etc.) somewhere so that someone can use this without the hosted version?

      • elawler24 4 days ago

        Right now, it’s a managed service that we set up for you (we’re still a small team). Email me if you’re interested and I can share details - emma@usevelvet.com.

bachback 4 days ago

Interesting. Seems more of an enterprise offering. Is it OpenAI-only for now, and do you plan to expand to other vendors? Anything open source?

  • elawler24 4 days ago

    We already support OpenAI and Anthropic endpoints, and can add models/endpoints quickly based on your requirements. We plan to expand to Llama and other self-hosted models soon. Do you have a specific model you want supported?

  • beepbooptheory 4 days ago

    I guess I don't understand what this is now. If it's just proxying requests and storing them in a DB, couldn't it be literally any API?

    • elawler24 4 days ago

      We could support any API. We’re focused on building data pipelines and tooling for LLM use cases.