Ask HN: Why don't many platform data products offer self-hosting?

4 points by arshbot 5 days ago

I'm leading a new team to unify many of our data sources at my company, and I'm searching for vendors/tools covering everything from BI to data exploration (by means of a data catalogue) to data ingestion.

A hard requirement is that we need to own our data, because data leaks are incredibly detrimental to our customers (even email leaks can allow our customers to be targeted by phishing and scams for years to come).

But looking around, the industry leaders like Atlan or RudderStack or whatever so often don't offer a way to self-host. To be clear, we do not care about paying; we have a high budget.

We just need to take ownership of our data because a breach can kill our company, whereas that is not necessarily true for a dedicated data platform (see Snowflake).

pradeepchhetri 5 days ago

I would recommend looking at ClickHouse[0], which is Apache-licensed and completely open source. You can self-host it, and it is one of the fastest-growing OLAP databases. It has good integrations with BI tools and can be used for a wide variety of use cases. It is widely adopted[1]. I would recommend giving it a try and validating it for your use case.

I work for ClickHouse and am available for any questions or help you need.

[0]: https://clickhouse.com/

[1]: https://clickhouse.com/docs/en/about-us/adopters

  • djfobbz 5 days ago

    I don't work for ClickHouse, but I second this!!! Absolute juggernaut of a db warehouse/analytics product.

evantahler 5 days ago

In the data movement (ETL/ELT) space, check out Airbyte. You can self-host it!