From Custom to Open: Scalable Network Probing and HTTP/3 Readiness with Prometheus

Slack Engineering, March 31, 2026

Why it matters

As HTTP/3 and QUIC become standard, legacy monitoring tools often fail to provide visibility into UDP-based traffic. Open-sourcing these capabilities in the Prometheus Blackbox Exporter (BBE) lets engineers monitor modern network protocols without relying on fragmented or proprietary solutions.

Key takeaways

Slack faced a critical observability gap when rolling out HTTP/3 because existing SaaS and internal tools lacked support for UDP-based QUIC probing.
An engineering intern developed and open-sourced QUIC support for the Prometheus Blackbox Exporter (BBE) using the quic-go library.
The implementation integrated a new HTTP/3 transport into BBE's client while maintaining existing configuration patterns and composability.
The new probing system enables a unified view of HTTP/1.1, HTTP/2, and HTTP/3 metrics within Grafana for easier correlation and debugging.
Open-sourcing the contribution future-proofs Slack's infrastructure and provides the wider Prometheus community with native HTTP/3 monitoring capabilities.
Future roadmap items include Server Name Indication (SNI) routing validation and hop-by-hop end-to-end network path visualization.

Keywords

HTTP/3

The Problem: Legacy Tooling and Its Limitations

Slack currently takes a hybrid approach to network measurement, covering both internal traffic (such as traffic between AWS Availability Zones) and external traffic (from the public internet into Slack’s infrastructure). The tooling is a mix of commercial SaaS offerings and custom network testing solutions that our internal teams built over time. This setup was sufficient for our needs.

When we began rolling out HTTP/3 support on the edge, we encountered a significant challenge: a lack of client-side observability.

Since HTTP/3 is built on top of the QUIC transport protocol, it uses UDP instead of the traditional TCP. This fundamental shift to a new transport meant that existing monitoring tools and SaaS solutions were not capable of probing our new HTTP/3 endpoints for metrics.

At that time, there was a major gap in the market:

None of the SaaS observability tools we investigated supported HTTP/3 probing out of the box.
Our internal Prometheus Blackbox Exporter (BBE), a cornerstone of our monitoring, didn’t have native support for QUIC.

Without the ability to probe the hundreds of thousands of HTTP/3 endpoints in our new infrastructure, we couldn’t get the client-side visibility we needed to detect regressions to HTTP/2 or to collect accurate round-trip measurements.
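To make the idea of a "regression to HTTP/2" concrete, here is a minimal Go sketch. It is not taken from the Blackbox Exporter source. It assumes the probe can see the negotiated protocol as a string in the form used by net/http's Response.Proto (for example "HTTP/2.0"), and the function names parseHTTPVersion and regressed are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHTTPVersion converts a protocol string such as "HTTP/2.0" (the form
// used by net/http's Response.Proto) into a number, here 2.
func parseHTTPVersion(proto string) (float64, error) {
	const prefix = "HTTP/"
	if !strings.HasPrefix(proto, prefix) {
		return 0, fmt.Errorf("unexpected protocol string %q", proto)
	}
	return strconv.ParseFloat(strings.TrimPrefix(proto, prefix), 64)
}

// regressed reports whether the observed protocol is older than the version
// the probe expected, for example HTTP/2 observed when HTTP/3 was expected.
func regressed(observedProto string, expected float64) (bool, error) {
	got, err := parseHTTPVersion(observedProto)
	if err != nil {
		return false, err
	}
	return got < expected, nil
}
```

With this sketch, regressed("HTTP/2.0", 3) reports a regression, while a protocol string that parses to 3 does not. The same comparison could equally be made on a version metric in a dashboard rather than in code.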

The Intern Who Made It Happen

The Open Source Contribution

Our intern, Sebastian Feliciano, scoped, implemented, and ultimately open-sourced QUIC support for Prometheus BBE.

Choosing the Right HTTP Client: The first step was selecting a QUIC-capable HTTP client. After careful consideration, Sebastian chose quic-go as the foundation for the new functionality, because it is widely adopted across other open source projects and provides first-class support for building HTTP clients in Go.

Here’s how Sebastian integrated quic-go into BBE’s HTTP client:

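// Excerpt; uses github.com/quic-go/quic-go and github.com/quic-go/quic-go/http3.
// tlsConfig is assumed to be built earlier in the surrounding code.
http3Transport := &http3.Transport{
    TLSClientConfig: tlsConfig,
    QUICConfig:      &quic.Config{}, // zero value: quic-go defaults
}

// The probe keeps using a standard net/http client; only the transport changes.
client = &http.Client{
    Transport: http3Transport,
}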

Maintaining Composability: Sebastian had to add this new logic while following the Blackbox Exporter’s existing architecture, ensuring the new features maintained the tool’s configuration patterns.
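One way to picture this composability is a client constructor that chooses its transport from the module configuration, so callers use the resulting client the same way regardless of protocol. The sketch below is an illustration under stated assumptions, not the Blackbox Exporter's actual code: the HTTPProbe struct, its EnableHTTP3 field, newClient, and stubTransport are all hypothetical names. The stub stands in for a real transport (such as the http3.Transport shown above) so that the example compiles with only the standard library.

```go
package main

import (
	"errors"
	"net/http"
)

// HTTPProbe is a hypothetical, trimmed-down stand-in for a module's HTTP
// settings. EnableHTTP3 is an illustrative field name, not a confirmed
// Blackbox Exporter option.
type HTTPProbe struct {
	EnableHTTP3 bool
}

// stubTransport stands in for a real transport (such as quic-go's
// http3.Transport) so that this sketch needs only the standard library.
type stubTransport struct{ name string }

// RoundTrip satisfies http.RoundTripper; the stub never sends a request.
func (s stubTransport) RoundTrip(*http.Request) (*http.Response, error) {
	return nil, errors.New(s.name + ": stub transport does not send requests")
}

// newClient returns a standard *http.Client whose transport is chosen by the
// module configuration. Callers use the client the same way in both cases.
func newClient(cfg HTTPProbe, defaultRT, h3RT http.RoundTripper) (*http.Client, error) {
	if !cfg.EnableHTTP3 {
		return &http.Client{Transport: defaultRT}, nil
	}
	if h3RT == nil {
		return nil, errors.New("HTTP/3 requested but no HTTP/3 transport provided")
	}
	return &http.Client{Transport: h3RT}, nil
}
```

Because the choice is isolated in one constructor, code that issues requests and records metrics does not need to know which protocol is in use.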

The result of this work was a functional and configurable HTTP/3 probe within Prometheus. By open-sourcing his contribution, Sebastian provided a solution that the entire Prometheus community could use, and by following existing patterns and earning community buy-in, he successfully landed the HTTP/3 feature.

Final Step: Integration

Making an open-source contribution as an intern is a huge accomplishment. As many of us know, maintainers don’t always merge PRs quickly, especially for new features. With a limited internship timeline, Sebastian couldn’t wait, so he took matters into his own hands and architected an in-house system that used the new upstream features to probe HTTP/3 endpoints.

Operational Improvements

Single Pane of Glass: We now have a unified view of HTTP/1.1, HTTP/2, and HTTP/3 metrics in Grafana, which makes it easier to compare protocols and correlate them with other telemetry.

Better and More Reliable Alerts: With the new probes, we can create more reliable alerts on the health and performance of our HTTP/3 endpoints.

Easier Correlation: Having all our data in one place makes it easier to correlate HTTP/3 performance with other metrics and debug issues faster.

The Open Source Win

Community Benefit: This contribution benefits the wider Prometheus community, helping other organizations facing the same challenges with HTTP/3 adoption. By building this support, we have future-proofed our observability for the ongoing adoption of QUIC and HTTP/3.

Looking Ahead

While this is a major step, our work isn’t done. Future improvements could include advanced features such as:

Server Name Indication (SNI) routing tests
- Validating that the SNI extension is correctly handled by our edge infrastructure. This ensures that when a client requests a specific hostname over a shared IP (like a CDN or a multi-tenant load balancer), the gateway routes the traffic to the intended backend and serves the matching TLS certificate, preventing misrouting errors.
End-to-end path visualization
- Moving beyond simple “up/down” checks by mapping the network path hop by hop from the monitoring agent to the service endpoint. This provides a visual representation of the path, making it possible to pinpoint exactly where latency spikes or packets are lost.
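As a rough illustration of what an SNI routing test might check, the Go sketch below connects to a given address, sends a chosen hostname in the TLS ClientHello, and relies on the standard library's handshake verification to confirm that the certificate served is valid for that name. This is a sketch under stated assumptions rather than a planned design: checkSNI is a hypothetical helper, and it uses TLS over TCP for simplicity. For QUIC endpoints, the same ServerName would presumably be set on the TLS configuration handed to the HTTP/3 transport.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"time"
)

// checkSNI connects to addr (host:port, typically a shared IP), sends
// serverName via SNI, and lets the TLS handshake verify that the certificate
// presented is valid for that name. On success it returns the DNS names
// listed in the leaf certificate. A nil roots pool means the system roots.
func checkSNI(addr, serverName string, roots *x509.CertPool) ([]string, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{
		ServerName: serverName, // sent in the ClientHello and used for verification
		RootCAs:    roots,
	})
	if err != nil {
		return nil, fmt.Errorf("TLS handshake for %q via %s failed: %w", serverName, addr, err)
	}
	defer conn.Close()

	// After a verified handshake the peer certificate list is non-empty.
	leaf := conn.ConnectionState().PeerCertificates[0]
	return leaf.DNSNames, nil
}
```

Running the helper against the same IP with several hostnames would show whether each name receives a certificate that matches it. A handshake error for one hostname but not the others points at a routing or certificate mismatch for that name.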

We invite others in the community to try out this new QUIC support in Prometheus Blackbox Exporter and join us in building the next generation of observability tools. The HTTP/3 options are described in the configuration documentation in the Prometheus Blackbox Exporter repository.

Conclusion

There were a few takeaways from this project:

1. Monitor first, and migrate second

This should go without saying, but getting observability right as a precursor to migration makes everything faster. We know that the industry is going towards QUIC, but proving to ourselves that it’s the right move long term enables us to invest more into its future.

2. Contributing to open source pays dividends

It feels good to give back to the open source communities that provide us with so much. When a game-changing protocol like QUIC comes along and existing technologies don’t yet support it, everyone wins when we fill the gap, and we win when everyone decides to support it long term.

3. Bet on your interns

We were incredibly fortunate to have landed Sebastian as an intern for our team. His proactiveness and creativity in problem solving helped us push the QUIC migration across the line, and gave us tangible exposure to the benefits of black-box monitoring.

This journey from having an observability gap to an open-sourced solution perfectly illustrates our commitment to simplicity and scalability. As HTTP/3 adoption grows industry-wide, we’re committed to keeping our monitoring tools ahead of the curve. We welcome community feedback and contributions to help evolve these capabilities further.

