Explore the latest engineering posts and summaries

Search by topic, company, or concept and scan results quickly.

Posts indexed: 815

Last indexed: Jul 29, 2026

Salesforce Engineering · Jan 27, 2026

Inside Salesforce Edge: Automating Global Rollback for 1.5 Trillion Requests in 10 Minutes

Why it matters: For global-scale perimeter services, traditional sequential rollbacks are too slow. This architecture demonstrates how to achieve 10-minute global recovery through warm-standby blue-green deployments and synchronized autoscaling, ensuring high availability for trillions of requests.

Salesforce Edge manages a global perimeter platform handling 1.5 trillion monthly requests across 21+ points of presence.
Transitioned from sequential regional rollbacks taking up to 12 hours to a global blue-green model that recovers in 10 minutes.
Implemented parallel blue and green Kubernetes deployments to maintain a warm standby fleet capable of immediate full-load handling.
Customized Horizontal Pod Autoscalers (HPA) to ensure the inactive fleet scales identically to the active fleet, preventing capacity mismatches.
Automated traffic redirection using native Kubernetes labels and selectors instead of external L7 routing tools like Argo.
Integrated TCP connection draining and controlled traffic cutover to preserve four-nines availability during global rollback events.
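
The label-and-selector cutover described above can be illustrated with a toy in-memory model. This is only a sketch of the general idea, not Salesforce's implementation: it does not call the Kubernetes API, and the `Pod`, `Service`, and `cut_over` names, the `app`/`color` labels, and the pod names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    labels: dict


@dataclass
class Service:
    selector: dict

    def endpoints(self, pods):
        # A pod receives traffic only if it carries every selector key/value.
        return [
            p.name
            for p in pods
            if all(p.labels.get(k) == v for k, v in self.selector.items())
        ]


def cut_over(service, color):
    # Hypothetical helper: repoint the Service at the other fleet by
    # changing one selector value; the pods themselves are left untouched.
    service.selector = {**service.selector, "color": color}


# Both fleets run side by side; only the selector decides who serves traffic.
pods = [
    Pod("edge-blue-0", {"app": "edge", "color": "blue"}),
    Pod("edge-blue-1", {"app": "edge", "color": "blue"}),
    Pod("edge-green-0", {"app": "edge", "color": "green"}),
    Pod("edge-green-1", {"app": "edge", "color": "green"}),
]
service = Service({"app": "edge", "color": "green"})

before = service.endpoints(pods)  # green fleet is active
cut_over(service, "blue")         # roll back to the warm blue fleet
after = service.endpoints(pods)
```

Because the standby fleet is already running and scaled, the rollback in this model is a single selector change rather than a redeploy, which is the property the summary attributes to the blue-green design.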

#sre #dist #security


GitHub Engineering · Jan 27, 2026

Help shape the future of open source in Europe

Why it matters: This initiative influences how open source projects are funded and regulated in the EU. Developer input ensures policies support both commercial growth and the maintenance of critical non-commercial libraries essential to the global software ecosystem.

The European Commission is developing the "Towards European Open Digital Ecosystems" strategy to provide funding and a strategic framework for the open source sector.
The initiative focuses on strengthening technological sovereignty in critical areas such as AI, cloud computing, and cybersecurity.
GitHub advocates for a European Sovereign Tech Fund to support the maintenance of essential libraries and programming languages.
The strategy aims to improve public procurement and capital access for OSS businesses while ensuring the sustainability of non-commercial projects.
Developers and maintainers are invited to provide feedback to the European Commission by February 3 to shape future digital policy.

#culture #security


Cloudflare Blog · Jan 27, 2026

Building a serverless, post-quantum Matrix homeserver

Why it matters: This proof of concept demonstrates how to transform heavy, stateful communication protocols into serverless architectures. It reduces operational overhead and costs to near zero while future-proofing security with post-quantum encryption at the edge.

Ported the Matrix homeserver protocol to Cloudflare Workers using TypeScript and the Hono framework.
Replaced traditional stateful infrastructure with serverless primitives: D1 for SQL, KV for caching, R2 for media, and Durable Objects for state resolution.
Achieved a scale-to-zero cost model, eliminating the fixed overhead of running dedicated virtual private servers.
Integrated post-quantum cryptography by default using hybrid X25519MLKEM768 key agreement for TLS 1.3 connections.
Leveraged Cloudflare's global edge network to reduce latency by executing homeserver logic in over 300 locations.
Maintained end-to-end encryption (Megolm) while adding a quantum-resistant transport layer for defense-in-depth.

#dist #security #finops


Netflix Tech Blog · Jan 26, 2026

The AI Evolution of Graph Search at Netflix

Why it matters: Translating natural language to complex DSLs reduces friction for subject matter experts interacting with massive, federated datasets. This approach bridges the gap between intuitive human intent and rigid technical schemas, improving productivity across hundreds of enterprise applications.

Netflix is evolving its Graph Search platform to support natural language queries using Large Language Models (LLMs).
The system translates ambiguous user input into a structured Filter Domain Specific Language (DSL) for federated GraphQL data.
Accuracy is maintained by ensuring syntactic, semantic, and pragmatic correctness through schema validation and controlled vocabularies.
The architecture utilizes Retrieval-Augmented Generation (RAG) to provide domain-specific data processing without replacing existing UIs.
Pre-processing and context engineering are critical to prevent LLM hallucinations and ensure fields match the underlying index.
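
The schema and controlled-vocabulary checks mentioned above might look roughly like the following. This is a minimal sketch under stated assumptions, not Netflix's Filter DSL or validator: the `SCHEMA` contents, the `(field, value)` clause shape, and the `validate_filter` function are hypothetical.

```python
# Hypothetical index schema: each field maps to its controlled vocabulary,
# or to None when any value is accepted.
SCHEMA = {
    "genre": {"drama", "comedy", "documentary"},
    "country": {"US", "KR", "BR"},
    "release_year": None,
}


def validate_filter(clauses, schema):
    """Return a list of problems found in LLM-generated (field, value) clauses."""
    errors = []
    for field_name, value in clauses:
        if field_name not in schema:
            # The model referenced a field that the index does not have.
            errors.append(f"unknown field: {field_name}")
        elif schema[field_name] is not None and value not in schema[field_name]:
            # The field exists, but the value is outside its vocabulary.
            errors.append(f"value {value!r} not allowed for {field_name}")
    return errors


good = validate_filter([("genre", "drama"), ("release_year", 2024)], SCHEMA)
bad = validate_filter([("mood", "dark"), ("genre", "thriller")], SCHEMA)
```

In this sketch a generated filter is only executed when the error list is empty; a non-empty list could be fed back to the model or shown to the user.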

#data #mlp #dist


GitHub Engineering · Jan 26, 2026

Power agentic workflows in your terminal with GitHub Copilot CLI

Why it matters: GitHub Copilot CLI brings agentic AI to the terminal, bridging the gap between IDEs and system-level tasks. By automating environment setup, debugging, and GitHub interactions via MCP, it significantly boosts developer velocity and reduces the cognitive load of manual CLI operations.

GitHub Copilot CLI enables agentic AI workflows directly within the terminal, reducing context switching between IDEs and command-line environments.
The tool automates complex terminal tasks such as repository cloning, dependency management, and process troubleshooting like identifying and killing PIDs.
It supports multimodal capabilities, allowing users to upload screenshots of UI bugs for automated analysis and suggested code fixes.
Integration with the Model Context Protocol (MCP) allows the CLI to interact with custom agents for specialized tasks like accessibility reviews or security audits.
Developers can query GitHub-specific data, such as open issues or PRs, and delegate multi-step tasks to coding agents without leaving the command line.

#mlp #culture


Microsoft Azure Blog · Jan 26, 2026

Maia 200: The AI accelerator built for inference

Why it matters: Maia 200 represents a shift toward custom first-party silicon optimized for LLM inference. It offers engineers high-performance FP4/FP8 compute and a flexible software stack, significantly reducing the cost and latency of deploying massive models like GPT-5.2 at scale.

Maia 200 is built on a TSMC 3nm process, featuring 140 billion transistors and delivering 10 petaFLOPS of FP4 and 5 petaFLOPS of FP8 performance.
The memory architecture utilizes 216GB of HBM3e at 7 TB/s alongside 272MB of on-chip SRAM to maximize token generation throughput.
It employs a custom Ethernet-based scale-up network providing 2.8 TB/s of bidirectional bandwidth for clusters of up to 6,144 accelerators.
The software ecosystem includes the Maia SDK with a Triton compiler, PyTorch integration, and a low-level programming language (NPL).
Engineered for efficiency, it achieves 30% better performance per dollar than existing hardware for models like GPT-5.2 and synthetic data generation.

#mlp #dist #finops


Cloudflare Blog · Jan 26, 2026

Cable cuts, storms, and DNS: a look at Internet disruptions in Q4 2025

Why it matters: Understanding global connectivity disruptions helps engineers build more resilient, multi-homed architectures. It highlights the fragility of physical infrastructure like submarine cables and the impact of BGP routing and government policy on service availability.

Q4 2025 saw over 180 global Internet disruptions caused by government mandates, physical infrastructure damage, and technical failures.
Tanzania implemented a near-total Internet shutdown during its presidential election, resulting in a 90% traffic drop and fluctuations in BGP address space announcements.
Submarine cable cuts, specifically to the PEACE and WACS systems, significantly impacted connectivity in Pakistan and Cameroon.
Infrastructure vulnerabilities in Haiti led to multiple outages for Digicel users due to international fiber optic cuts.
Beyond physical damage, disruptions were linked to hyperscaler cloud platform issues and ongoing military conflicts affecting regional network stability.

#sre #dist #culture


Cloudflare Blog · Jan 23, 2026

Route leak incident on January 22, 2026

Why it matters: This incident highlights how minor automation errors in BGP policy configuration can cause global traffic disruptions. It underscores the risks of permissive routing filters and the importance of robust validation in network automation to prevent large-scale route leaks.

An automated routing policy change intended to remove IPv6 prefix advertisements for a Bogotá data center caused a major BGP route leak in Miami.
The removal of specific prefix lists from policy statements resulted in overly permissive terms, unintentionally redistributing peer routes to other providers.
The incident lasted 25 minutes, causing significant congestion on Miami backbone infrastructure and affecting both Cloudflare customers and external parties.
The leak was classified as a mixture of Type 3 and Type 4 route leaks according to RFC 7908, violating standard valley-free routing principles.
Impact was limited to IPv6 traffic and was mitigated by manually reverting the configuration and pausing the automation platform.
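
Why removing a prefix list can make a policy term overly permissive can be shown with a simplified model. This is an illustrative sketch, not Cloudflare's configuration or router behavior: the `term_accepts` function is hypothetical, the prefixes come from the IPv6 documentation range `2001:db8::/32`, and a term is reduced here to a single optional prefix-list condition.

```python
import ipaddress


def term_accepts(route, prefix_list):
    """Decide whether an export term accepts a route.

    prefix_list=None models a term whose prefix-list condition was removed:
    with no condition left to restrict it, the term matches every route.
    """
    if prefix_list is None:
        return True
    net = ipaddress.ip_network(route)
    return any(net.subnet_of(allowed) for allowed in prefix_list)


# Hypothetical site-local prefix list and candidate routes.
site_prefixes = [ipaddress.ip_network("2001:db8:1::/48")]
routes = ["2001:db8:1:10::/64", "2001:db8:ffff::/48"]

# With the prefix list in place, only the site's own route is exported.
exported_before = [r for r in routes if term_accepts(r, site_prefixes)]

# After the list is removed, the unrelated route is exported as well.
exported_after = [r for r in routes if term_accepts(r, None)]
```

A validation step that rejects any generated term left without a restricting condition is one way automation could catch this class of change before it is pushed.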

#sre #dist


Salesforce Engineering · Jan 22, 2026

How Agentforce, Data, and Apps Turned the Salesforce Stack into Agentforce 360

Why it matters: This article details the architectural shift from fragmented point solutions to a unified AI stack. It provides a blueprint for solving data consistency and metadata scaling challenges, essential for engineers building reliable, real-time agentic systems at enterprise scale.

Salesforce unified its data, agent, and application layers into the Agentforce 360 stack to ensure consistent context and reasoning across all surfaces.
The platform uses Data 360 as a universal semantic model, harmonizing signals from streaming, batch, and zero-copy sources into a single pane of glass.
Engineers addressed metadata scaling by treating metadata as data, enabling efficient indexing and retrieval for massive entity volumes.
A harmonization metamodel defines mappings and transformations to generate canonical customer profiles from heterogeneous data sources.
The architecture centralizes freshness and ingest control to maintain identical answers across different AI agents and applications.
Real-time event correlation is optimized to update unified context immediately while balancing storage costs for large-scale personalization.

#data #mlp #dist


Microsoft Azure Blog · Jan 22, 2026

Beyond boundaries: The future of Azure Storage in 2026

Why it matters: Azure Storage is shifting from passive storage to an active, AI-optimized platform. Engineers must understand these scale and performance improvements to architect systems capable of handling the high-concurrency, high-throughput demands of autonomous agents and LLM lifecycles.

Azure Storage is evolving into a unified platform supporting the full AI lifecycle, from frontier model training to large-scale inferencing and agentic applications.
Blob scaled accounts now support millions of objects across hundreds of scale units, enabling massive datasets for training and tuning.
Azure Managed Lustre (AMLFS) has expanded to support 25 PiB namespaces and 512 GBps throughput to maximize GPU utilization in high-performance computing.
Deep integration with frameworks like Microsoft Foundry, Ray, and LangChain facilitates seamless data grounding and low-latency context serving for RAG architectures.
Elastic SAN and Azure Container Storage (ACStor) are being optimized for 'agentic scale' to handle the high concurrency and query volume of autonomous agents.
New storage tiers and performance updates, such as Premium SSD v2 and Cold/Archive tiers for Azure Files, focus on reducing TCO for mission-critical workloads.

#data #mlp #dist
