Our container platform is in production. It has GPUs. Here's an early look

blog.cloudflare.com

188 points by jgrahamc a day ago

ckastner a day ago

> To add GPU support, the Google team introduced nvproxy which works using the same principles as described above for syscalls: it intercepts ioctls destined to the GPU and proxies a subset to the GPU kernel module.

This does still expose the host's kernel to a potentially malicious workload, right?

If so, could this be mitigated by (continuously) running a QEMU VM with GPUs passed through via VFIO, and running whatever Workers need within that VM?

The Debian ROCm Team faces similar challenge, we want to do CI [1] for our stack and all our dependent packages, but cannot rule out potentially hostile workloads. We spawn QEMU VMs per test (instead of the model described above) but that's because our tests must also be run against the relevant distribution's kernel and firmwares.

Incidentally, I've been monitoring the Firecracker VFIO GitHub issue linked in the article. Upstream does not have a use case for and thus no resources dedicated to implement this, but there's a community meeting [2] coming up in October to discuss the future of this feature request.

[1]: https://ci.rocm.debian.net

[2]: https://github.com/firecracker-microvm/firecracker/issues/11...

hinkley a day ago

I’ve been looking at distributed CI and for now I’m just going to be running workloads queued by the owner of the agent. That doesn’t eliminate hostile workloads but it does present a similar surface area to simply running the builds locally.
I’ve been thinking about QEMM or firecracker instead of just containers for a more robust solution. I have some time before anyone would ask me about GPU workloads, but do you think firecracker is on track to get there or would I be better off learning QEMM?
- ckastner a day ago
  
  Amazon/AWS has no use case for VFIO in Firecracker. They're open to the community adding support and have a community meeting soon, but I wouldn't get my hopes up.
  QEMU can work -- I say can, because it doesn't work with all GPUs. And with consumer GPUs, VFIO is generally not an officially supported use case. We got it working, but with lots of trial and error, and there are still some problematic corner cases.
  - hinkley a day ago
    
    What would you say is the sort of time horizon for turnkey operation of one commonly available video card, half a dozen, and OEM cards in high end laptops (eg, MacBook Pro)? Years? Decades? Heat death?
    
    ckastner 21 hours ago
    
    I don't think I fully understand your question. If, with turnkey operation you mean virtualization, enterprise GPUs already officially support it now, and it already works with consumer GPUs, at least the discrete ones.
ec109685 a day ago

If the calls first pass through a memory safe language as what gvisor does, isn’t the attack surface greatly reduced?
It does seem however that Firecracker + GPU support (or https://github.com/cloud-hypervisor/cloud-hypervisor) is most promising though.
It’s surprising that AWS doesn’t have a need for Lambda but with GPU’s to motivate them to bring GPU’s to firecracker.
- donavanm 14 hours ago
  
  Im not familiar with this cases specifics, but AWS also has an approach of virtualizing actual hardware interfaces (like nvme/pcie) to the host through dedicated hardware/firmware. I wouldnt be surprised if their solution was to map physical devices (partitions of) as a “hardware” device on the host and pass it directly through to the fire cracker instances. Especially if they can isolate multiple firecracker/lambda instances of a customer to a single physical device.
- ckastner a day ago
  
  > If the calls first pass through a memory safe language as what gvisor does, isn’t the attack surface greatly reduced?
  The runtime may be memory safe, but I'm thinking of the GPU workloads which nvproxy seems to pass on to the device via the host's kernel. Say I find a security issue in the GPU's driver, and manage to exploit it with some malicious CUDA workload.
  - ec109685 a day ago
    
    Would having a VM inbetween help in that case? It seems like protecting against malicious GPU workloads requires the GPU to off virtualization to avoid this exploit.
    This is helpful in explaining why AWS hasn't been excited to ship this use case in firecracker.
    
    ckastner 20 hours ago
    
    It would probably not stop all theoretically possible attacks, but it would stop many of them.
    Say you find a bug in the GPU driver that let's you execute arbitrary code as root. That still all happens within the VM. To attack the host, you'd still need to break out of the VM, and if the VM is unprivileged (which I assume it is), you'd next need gain privileges on the host.
    There are other channels -- perhaps you can get the GPU to do something funky on PCI level, perhaps you can get the GPU to crash the host -- but VM isolation does add a solid layer of protection.

LukeLambert a day ago

This is really cool and I can't wait to read all about it. Unfortunately, I've missed a month of blog posts because Cloudflare changed their blog's RSS URL without notice. If you change blogging platforms and can't implement a 301, please leave a post letting subscribers know where to find the new feed. RSS isn't dead!

jgrahamc a day ago

We did? That's nuts if we did. What URL were you using?
EDIT: It looks like some people may have been using ghost.blog.cloudflare.com/rss because we used to use Ghost but the actual URL was/is blog.cloudflare.com/rss. We're setting up a redirect for anyone who was using the ghost. URL.
- LukeLambert a day ago
  
  Yes, it was the Ghost URL. Thank you for correcting it! I read just about every post, so I have a lot of catching up to do.
  - jgrahamc a day ago
    
    Sorry about the interruption! We migrated away from Ghost and not sure how you ended up with that URL but we're adding a redirect. Have a good catch up :-)
- Traubenfuchs a day ago
  
  Hacker News is my favorite C-Suite level support forum of cloudflare and stripe.
NicoJuicy a day ago

Just works? https://blog.cloudflare.com/rss
- jgrahamc a day ago
  
  Yes, that should be the URL and I don't think that's changed. Just wondering what URL the parent was hitting.

pjmlp a day ago

So I just discovered that Cloudfare now owns the trademark for Sun's "The Network is the Computer".

"Cloudflare serves the entire world — region: earth. Rather than asking developers to provision resources in specific regions, data centers and availability zones, we think “The Network is the Computer”. "

https://blog.cloudflare.com/the-network-is-the-computer/

DonHopkins a day ago

Did they also get the old DEC t-shirt trademark: "The Network Is The Network and The Computer Is The Computer. We regret the confusion."
IBM mocked Sun with: "When they put the dot into dot-com, they forgot how they were going to connect the dots," after sassily rolling out Eclipse just to cast a dark shadow on Java. Badoom psssh!
https://www.itbusiness.ca/news/ibm-brings-on-demand-computin...
remram a day ago

> The global scheduler is built on Cloudflare Workers, Durable Objects, and KV, and decides which Cloudflare location to schedule the container to run in. Each location then runs its own scheduler, which decides which metals within that location to schedule the container to run on.
So they just use the term "location" instead of "region".

tarasglek 11 hours ago

I love they built all this infra for running js to avoid building a container runtime and ended up building a container platform using all the hypervisors. On more serious note, I do not understand why they can't fix the 500mb upload limit. I hit that with r2 registry and ended switching away instead changing all the dockepush tooling. Not super excited re using more weird tools rather than fixing platform

ericpauley 5 hours ago

Agreed; after 7 hopeful years this feels like a declaration of defeat for V8 isolate-based FaaS.

breatheoften 13 hours ago

Why does it take 4 minutes (after being optimized from 8 minutes!) to move a 30 GB (compressed) docker image ...? The read slowness of docker registries continues to surprise me ...

simfree 13 hours ago

Perhaps they are throttling this transfer to 1Gbps so as to not slam their network or disk I/O? It does seem quite slow.

aconz2 4 hours ago

surprised they didn't go straight into cloud-hypervisor, though I haven't actually tested with gpu yet but it is on my todo list. OCI layers can use zstd compression. I wonder if they are defeating layer sharing by splitting in 500 mb chunks. Lambda splits your image into chunks and shares at the block layer (I believe even same chunk different (user's?) container on a single host). Esp for 15 GB images I'd think using lazy pulling with nydus/stargz or whatever would be beneficial. I'd like to test out snapshotting, though my testing already boots a guest and runs a container in ~170ms; and I'm not actually sure how you write the guest init to signal it is ready for snapshotting and then wait properly (maybe you just sleep 1000?) so it resumes from the snapshot in a good state. I know fly has written about their use of snapshotting but I don't think it went into that detail. Cool stuff overall though, not worrying about locations and the yucky networking to do so seems nice

pier25 15 hours ago

I like using Workers for smallish http services. The uptime, pricing, and latency are fantastic. I would never use them for anything complex as the vendor lock in is quite strong and the dev experience still needs to improve.

Containers on the edge with low cold starts, scalability, the same reliability as Workers, etc would be super cool. In part to avoid the lock in but also to be able to use other languages like Go (which Workers don't support natively).

tarasglek 11 hours ago

This is why I switched to deno deploy. Much of the same benefits and much more portable stack.
- pier25 an hour ago
  
  It's great you can run Deno anywhere you want. Their KV service is phenomenal too.
  Personally I don't want to keep using JS in the server anymore. As more time passes I feel like TS is a hack compared to the elegance of something like Go.
catskull 15 hours ago

Workers use the web worker API so theoretically there’s less lock in. I’ve also found wrangler pretty good, what problems have you run into?
- pier25 13 hours ago
  
  You can't really run the Worker code without modifications somewhere else afaik (unless you're using something like Hono with an adapter). And for most use cases, you're not going to be using Workers without KV, DO, etc.
  I've hit a bunch of issues and limitations with Wrangler and Workers locally over the years.
  Eg:
  https://github.com/cloudflare/workers-sdk/issues/2964
  https://github.com/cloudflare/workerd/issues/1897

thefounder a day ago

So this will be similar to Google Appengine(now Google run) ? If that’s the case I would love to give it a try but then I need close SQL server nearby and other open source services as well

tomrod a day ago

This seems like a pretty big deal.

I want to like CloudFlare over DO/AWS. I like their DevX focus too -- I could see issues if devs can't get into the abstractions though.

Any red flags folks would stake regarding CF? I know they are widely used but not sure where the gotchas are.

wmf a day ago

Is Cloudflare the one that goes from free to "call for pricing" ($100K+) at the drop of a hat?
- mnahkies a day ago
  
  I think they have some incredibly low pricing for what most small companies need. I also think they've done a very good job of carving out pieces that more sophisticated setups need into the enterprise tier, which does constitute a big jump.
  One that bit me was https://developers.cloudflare.com/cloudflare-for-platforms/c... - we found an alternative solution that didn't require upgrading to their enterprise plan (yet), but it was a pretty compelling reason to upgrade and if I was doing it again I'd probably choose upgrading over implementing our solution. On balance I'm not sure we actually saved money in the end, considering opportunity cost
- ignoramous a day ago
  
  One data point but, one among our toy services has been pushing 30TB/mo to 60TB/mo for over a year now, and we haven't got the call: https://news.ycombinator.com/item?id=39521228
  - trallnag a day ago
    
    Can you share what kind of toy is shuffling around so much data?
- jgrahamc a day ago
  
  https://blog.cloudflare.com/cloudflares-commitment-to-free/
  - mkl 8 hours ago
    
    The https://developers.cloudflare.com/calls/turn/overview/ link in that article 404s. Looks like it should be https://developers.cloudflare.com/calls/turn/?
    
    jgrahamc 7 hours ago
    
    I'll get that fixed. Thanks for reporting it.
ec109685 a day ago

Their solution isn’t GA yet.
For headless browsers, the latency benefits of “container anywhere” seems high. For things like AI inference, running on the edge seems way less beneficial than running on the cheapest location possible which would be larger regional data centers.
- hinkley a day ago
  
  One would hope that “larger regional data centers” are not that far from The Edge. But the problem isn’t physics or the speed of light, it’s operational.
  The operational excellence required to have every successful Internet company manage deployments to a dozen regions just isn’t there. Most of us struggle with three, my last gig tried to do two, which isn’t economical because you always try to handle one region going dark which means you need at least 200% capacity, where 3 data centers only need 150 + ??%, and 4 need 133 + ??%. It has all of the consistency problems of n > 1 and few if any of the advantages.
  We need more help from the CDNs of the world to run compute heavy operations at the edge. And if they choose to send them 10-20ms away to a beefier data center I think that’s probably fine. Just don’t make us have to have the sort of operational discipline that requires.
  - ec109685 a day ago
    
    Given how slow AI inference is (and for training it doesn't matter at all), the advantage of it being a few milliseconds closer to the user is greatly diminished. The latency to egress to a regional data center is inconsequential.
    Good point about at the very least not exposing placement to customers. That is a definite win.
    
    DrStartup 17 hours ago
    
    We’re gonna have so much local inference available.
srockets a day ago

Won't apply to everyone (most?), but some compliance assurances your customers may require can't be fulfilled by Cloudflare. And personally, I would hope their laissez faire attitude towards protecting hate speech should damage their business, but I suspect most people not targeted by such just don't give a damn.
- thefounder 10 hours ago
  
  What’s hate speech to you is free speech to someone else.
  - tomrod an hour ago
    
    That's not a real distinction, pseudo-koan aside.
- jgrahamc a day ago
  
  but some compliance assurances your customers may require can't be fulfilled by Cloudflare.
  Such as? See: https://www.cloudflare.com/trust-hub/compliance-resources/

lysace a day ago

Lots of cool stuff in this blog post. Impressive work on many fronts!

If I understand correctly, you will be running actual third party compute workloads/containers in hundreds of network interexchange locations.

Is that in line with what the people running these locations have in mind? Can you scale this? Aren't these locations often very power/cooling-constrained?

SantaCruz11 a day ago

Edgegap has been doing this for 5 years.

artemisart 35 minutes ago

I can't find any GPU capabilities on their website and it seems to be 100% focused on game backend hosting, not general workers. Do you have more information?

roboben a day ago

What I am always missing in these posts: How do they limit network bandwidth? Since these are all multi-tenant services, how do they make sure a container or isolated browser is not taking all the network bandwidth of a host?

tscolari a day ago

You probably can do this through the proc filesystem/cgroups. If you think about it, you can use cgroups to limit the bandwidth, so you can also use it to measure it.

dopylitty a day ago

I like the dig at "first generation" clouds.

There really is a wide gulf between the services provided by the older cloud providers (AWS, Azure) and the newer ones (fly.io, CloudFlare etc).

AWS/Azure provide very leaky abstractions (VMs, VPCs) on top of very old and badly designed protocols/systems (IP, Windows, Linux) . That's fine for people who want to spend all their time janitoring VMs, operating systems, and networks but for developers who just want to write code that provides a service it's much better to be able to say to the cloud provider "Here's my code, you make sure it's running somewhere" and let the cloud provider deal with the headaches. Even the older providers' PaaS services have too many knobs to deal with (I don't want to think about putting a load balancer in front of ECS or whatever)

abadpoli a day ago

This undersells the fact that there’s a lot more to infrastructure management than “janitoring”. You and many others may want to just say “here’s my code, ship it”, but there’s also a massive market of people that _need_ the customization and deep control over things like load balancers, because they’re pumping petabytes of data through it and using a cloud-managed LB is leaving money and performance on the table. Or there are companies that _need_ the strong isolation between regions for legal and security reasons, even if it comes with added complexity.
A lot of developers get frustrated at AWS or Azure because they want to deploy their hobby app on it and realize it’s too difficult dealing with stuff like IAM - it’s like trying to dig a small hole in your garden and someone suggests you go buy a Caterpillar Excavator, when all you needed was a hand trowel. The reason this persists is because AWS doesn’t target the hobby developer - it targets the massive enterprise that does need the customization and power it provides, despite the complexity. There are, thankfully, other companies that have come in to serve up cloud hand trowels.
There is no “one size fits all” cloud. There probably never will be. They’re all going to coexist for the foreseeable future.
bigcat12345678 a day ago

Hn now clearly are swarmed by grandiose novice techs.
10 years ago, no such superficial assessment would appear on first page.
This set of words bear little substance and engineering facts.
> AWS/Azure provide very leaky abstractions (VMs, VPCs) on top of very old and badly designed protocols/systems (IP, Windows, Linux) .
AWS cannot be made parallel, they themselves are 2 gens
AWS gen1
Azure gcp gen 2
Gen1 is on vm, ecs ebs s3, for web2 era
Gen2 is on cluster computing which was enable by vm
The then "leaky abstraction" is the mandated abstraction at the time
And GPUs today is about 70s's CPU
For example, you don't have any form of abstracted runtime on GPU, it's like running dos system
It's more leaky than 00s ' vm
spaceywilly 16 hours ago

Amazon has had serverless functions for a long time now. I built an iOS app with a backend in AWS and it was as you say, “here’s my code, you make sure it’s running somewhere.” I uploaded my Typescript code, set up an API gateway to call the lambda function and… that’s it. No load balancer, no ECS management. It’s been running for years and I haven’t to do anything other than pay the bill every month.

CSMastermind a day ago

Does it say anywhere what GPUs they have available?

I really need NVIDIA RTX 4000, 5000, A4000, or A6000 GPUs for their ray tracing capabilities.

Sadly I've been very limited in the cloud providers I can find that support them.

asciimike a day ago

The short answer here is that NVIDIA doesn't like Cloud Service Partners using RTX cards, as they are "professional" cards (they are also significantly cheaper than the corresponding data center cards). IIRC, A40, L40, and L40S have ray tracing, and might be more available on CSPs. Otherwise, the GPU marketplaces that aren't "true" CSPs will likely have RTX cards.
Paperspace (now DO), Vultr, Coreweave, Crusoe, should all have something with ray tracing.
- CSMastermind a day ago
  
  Incredibly helpful, thank you!
  We did try on the T4 and A10G but the raytracing failed even though those cards claim to support it.
  We ended up on Paperspace for the time being but they depreciated their support for Windows so I've been looking for alternatives. Will check out the provides you mentioned. Thanks again.
willtemperley a day ago

I doubt they'll commit to specific models before containers go GA in 2025 but they'll likely be NVIDIA:
https://www.cloudflare.com/en-gb/press-releases/2023/cloudfl...
singhrac a day ago

You can as of very recently get A6000s on Hetzner, which is a pretty good deal (but not serverless, so you need a consistent load).
- CSMastermind a day ago
  
  Super helpful, thank you!

attentive 16 hours ago

> Cloudflare serves the entire world — region: earth

Is that true for China though?

lofaszvanitt a day ago

Why is Cloudflare trying to create a walled-garden internet within the internet?

surfingdino a day ago

Looks like CloudFlare will soon be using "All other clouds are behind ours." slogan.

DonHopkins a day ago

"We're the silver lining."
"We'll keep you on the edge of your seat."
"Nice parade you got there. It sure would be a shame if somebody were to rain on it."
- surfingdino 21 hours ago
  
  "That's how dynamic pricing works, baby" (CF taking in learnings from the Oasis/Ticketmaster heist)

halfcat a day ago

> ”Remote Browser Isolation provides Chromium browsers that run on Cloudflare, in containers, rather than on the end user’s own computer. Only the rendered output is sent to the end user.”

It turns out we don’t need React Server Components after all. In the future we will just run the entire browser on the server.

srockets a day ago

What old is new again.

vednig a day ago

> We rely on it in Production

They really have a great engineering team

Toritori12 a day ago

[dead]