Why it matters: Agent Memory solves the 'context rot' problem where LLM performance degrades as context windows grow. By providing a managed, retrieval-based persistent memory layer, engineers can build smarter agents that retain long-term knowledge across sessions without increasing token costs or latency.
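A retrieval-based memory layer of the kind described can be sketched in a few lines. This is a hypothetical toy, not Agent Memory's actual API: real systems score relevance with vector embeddings, while simple keyword overlap stands in here to keep the example dependency-free.

```python
class AgentMemory:
    """Toy persistent memory: store facts once, then retrieve only the
    top-k relevant ones per query instead of replaying full history."""

    def __init__(self):
        self.memories: list[str] = []  # survives across sessions

    def store(self, fact: str) -> None:
        self.memories.append(fact)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank stored facts by word overlap with the query (an
        # embedding similarity search in a real system).
        q = set(query.lower().split())
        return sorted(
            self.memories,
            key=lambda m: len(q & set(m.lower().split())),
            reverse=True,
        )[:k]

mem = AgentMemory()
mem.store("user prefers Rust for systems work")
mem.store("user is deploying on AWS us-east-1")
mem.store("user's birthday is in March")

# Only the relevant memories enter the prompt, so token count stays
# flat no matter how much the agent has accumulated.
context = mem.retrieve("what language does the user prefer for systems work?")
```

The point of the pattern: the prompt carries a small, query-dependent slice of memory rather than the whole history, which is what keeps token costs and latency from growing with session length.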
Why it matters: AI models often provide outdated information because crawlers ignore standard SEO signals. This tool ensures AI agents ingest current data by enforcing canonical paths via redirects, improving the accuracy of LLM-generated answers about your technical products.
Why it matters: Unweight addresses the memory bandwidth bottleneck in LLM inference without the quality loss of quantization. By enabling lossless compression and on-chip decompression, engineers can fit more models on existing hardware and reduce latency, making high-performance inference more cost-effective.
Why it matters: Maintaining architectural consistency in a massive, multi-cloud ecosystem is vital for security and scale. This approach allows engineers to build on shared abstractions, ensuring that acquisitions and new services integrate seamlessly while supporting advanced AI and agentic workflows.
Why it matters: At hyperscale, even 0.1% performance regressions waste enormous amounts of power. Meta’s AI agents automate performance optimization, saving hundreds of megawatts and thousands of engineering hours. This demonstrates how LLMs can encode domain expertise to manage infrastructure efficiency autonomously.
Why it matters: Circular dependencies can paralyze recovery during outages. By using eBPF and cgroups, engineers can enforce network isolation for deployment scripts without impacting production traffic, ensuring that critical infrastructure remains deployable even when primary services are offline.
Why it matters: Quantum computing threats such as "store now, decrypt later" attacks jeopardize current encryption. Meta’s framework provides a scalable roadmap for organizations to transition to PQC standards, ensuring long-term data security without compromising system performance or incurring excessive costs.
Why it matters: Building agentic AI requires chaining multiple models, which increases latency and failure risks. Cloudflare’s unified API simplifies multi-provider management, provides cost transparency, and offers a low-latency path for custom and third-party models at the edge.
Why it matters: This article provides a blueprint for optimizing LLM infrastructure by decoupling inference stages. It demonstrates how to maximize expensive GPU utilization and reduce latency for long-context agentic applications through clever software engineering and cache management.
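One idea behind decoupled inference stages can be illustrated with a cache over the expensive prefill step. This is a hypothetical sketch, not the article's implementation: `expensive_prefill` stands in for running the model over a prompt to build its KV cache, and the hash-keyed dict stands in for real cache management.

```python
import hashlib

_prefill_cache: dict[str, str] = {}
prefill_calls = 0  # instrumentation for the example only

def expensive_prefill(prompt: str) -> str:
    """Placeholder for the GPU-heavy prefill pass over the prompt."""
    global prefill_calls
    prefill_calls += 1
    return f"kv-cache-{hashlib.sha256(prompt.encode()).hexdigest()[:8]}"

def prefill(prompt: str) -> str:
    """Reuse the prefill result when the same long prefix recurs."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _prefill_cache:
        _prefill_cache[key] = expensive_prefill(prompt)
    return _prefill_cache[key]

# Agentic workloads repeat the same long prefix (system prompt plus
# tool definitions) on every turn, so the hit rate is high.
system_prompt = "You are a coding agent with access to these tools: ..."
kv1 = prefill(system_prompt)  # computed once
kv2 = prefill(system_prompt)  # served from cache
```

Separating the compute-bound prefill from the memory-bound decode stage, and caching the former, is what lets the expensive GPUs spend their cycles on new tokens rather than re-reading old context.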
Why it matters: This unified inference layer simplifies building complex AI agents by eliminating provider lock-in and centralizing cost management. It allows engineers to switch models with one line of code while ensuring high reliability and low latency across distributed global infrastructure.
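The "switch models with one line of code" claim boils down to a routing abstraction like the following. The provider functions and route table are illustrative assumptions, not Cloudflare's actual API surface.

```python
from dataclasses import dataclass

@dataclass
class Completion:
    provider: str
    text: str

# Stand-ins for real provider SDK calls.
def _call_openai(prompt: str) -> Completion:
    return Completion("openai", f"[gpt] {prompt}")

def _call_anthropic(prompt: str) -> Completion:
    return Completion("anthropic", f"[claude] {prompt}")

ROUTES = {
    "openai/gpt-4o": _call_openai,
    "anthropic/claude-sonnet": _call_anthropic,
}

def run(model: str, prompt: str) -> Completion:
    # A real gateway would also handle retries, fallbacks, and
    # per-model cost accounting behind this single entry point.
    return ROUTES[model](prompt)

a = run("openai/gpt-4o", "hello")            # one provider...
b = run("anthropic/claude-sonnet", "hello")  # ...swapped in one line
```

Because callers code against `run` rather than a provider SDK, centralizing reliability, latency routing, and cost tracking in the gateway requires no changes at call sites.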