Why it matters: This system provides real-time, statistically robust insights into content safety, enabling platforms to proactively identify and mitigate harms. It's crucial for maintaining user trust and scaling content moderation efficiently with AI.

  • Pinterest developed an AI-assisted system to measure the "prevalence" of policy-violating content, defined as the percentage of total views that land on violating content.
  • This system addresses the shortcomings of report-only metrics, which often miss under-reported harms and lack statistical power.
  • It utilizes ML-assisted sampling from daily user impressions, leveraging production risk scores for efficiency while ensuring unbiased prevalence estimates.
  • A multimodal LLM (vision + text) enables bulk labeling of sampled content, significantly reducing latency and cost compared to human review.
  • Inverse-probability weighting ensures unbiased, design-consistent prevalence metrics, decoupling measurement from enforcement model thresholds.
  • Continuous calibration, human validation, and periodic checks against SME-labeled gold sets maintain LLM accuracy and detect model drift.
  • The system provides daily, statistically powered insights for faster interventions and effective content safety tracking.
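The sampling-plus-reweighting idea above can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the inclusion probabilities, risk-score threshold, and field names are all assumptions, but the Horvitz-Thompson-style inverse-probability weighting is the standard way such a design stays unbiased even when high-risk content is deliberately oversampled.

```python
import random

def sample_impressions(impressions, base_rate=0.001, boost=0.05):
    """Sample impressions, oversampling those with high production risk scores.

    Each impression is a dict with a 'risk_score' in [0, 1] and a
    'violating' flag (the label a human or LLM would later assign).
    The inclusion probability is kept alongside each sampled item so
    the estimator can undo the oversampling. All parameters here are
    illustrative, not Pinterest's actual values.
    """
    sample = []
    for imp in impressions:
        p = boost if imp["risk_score"] > 0.5 else base_rate
        if random.random() < p:
            sample.append((imp, p))
    return sample

def prevalence_estimate(sample, total_views):
    """Inverse-probability-weighted (Horvitz-Thompson style) prevalence.

    Each labeled item counts as 1/p impressions, cancelling the bias
    introduced by risk-score-driven sampling.
    """
    weighted_violations = sum(imp["violating"] / p for imp, p in sample)
    return weighted_violations / total_views
```

Because the weight 1/p exactly offsets each item's chance of being sampled, the estimate stays decoupled from whatever thresholds the enforcement models use, which is the property the post emphasizes.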

Why it matters: Engineers can now deploy Python applications globally on Cloudflare Workers with full package support and exceptionally fast cold starts. This significantly improves serverless Python development, offering a highly performant and flexible platform for a wide range of edge computing use cases.

  • Cloudflare Python Workers now support any Pyodide-compatible package, including pure Python and many dynamic libraries, enhancing developer flexibility.
  • A uv-first workflow and pywrangler tooling simplify package installation and global deployment of Python applications on the Workers platform.
  • Significant cold start performance improvements have been achieved through dedicated memory snapshots, making Python Workers 2.4x faster than AWS Lambda and 3x faster than Google Cloud Run for package-heavy applications.
  • The platform offers a free tier and supports various use cases, from FastAPI apps and HTML templating to real-time chat with Durable Objects and image generation.
  • These advancements provide a Python-native serverless experience with global deployment and minimal latency.

Why it matters: This article demonstrates a practical approach to de-biasing recommendation systems by integrating direct user feedback via surveys into ML model training. Engineers can learn how to move beyond pure engagement metrics to build more user-centric and high-quality content platforms.

  • Pinterest implemented in-app Pinner surveys to gather direct user feedback on content visual quality, moving beyond traditional engagement metrics.
  • Surveys collected at least 10 ratings per image across 5k Pins spanning diverse interest verticals; scores were averaged to reduce individual subjectivity and improve data reliability.
  • A machine learning model was trained using this aggregated survey data, mapping image embedding features to a single score (0-1) indicating perceived visual quality.
  • This ML model is integrated into Pinterest's core recommendation systems, including Homefeed, Related Pins, and Search, to promote higher quality content.
  • The approach aims to de-bias recommendation systems, prevent the promotion of low-quality "clickbait," and align content delivery with user well-being and satisfaction.
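The pipeline described above (aggregate many per-image ratings, then learn a mapping from image embeddings to a single 0-1 quality score) can be made concrete with a small sketch. The rating scale, minimum-rater threshold, and sigmoid scoring head are assumptions for illustration; Pinterest's actual model and features are not public in this summary.

```python
import math
from statistics import mean

def aggregate_ratings(ratings_by_pin, min_ratings=10):
    """Average per-Pin survey ratings into training targets.

    Pins with too few raters are dropped to keep targets reliable.
    Ratings are assumed to be on a 1-5 scale and are rescaled to
    [0, 1] to match the model's output range (scale is an assumption).
    """
    targets = {}
    for pin_id, ratings in ratings_by_pin.items():
        if len(ratings) >= min_ratings:
            targets[pin_id] = (mean(ratings) - 1) / 4
    return targets

def quality_score(embedding, weights, bias=0.0):
    """Map an image embedding to a single quality score in (0, 1).

    A linear layer plus sigmoid stands in for whatever model head
    Pinterest trained on the aggregated survey targets.
    """
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1 / (1 + math.exp(-z))
```

The resulting score is then just one more signal a ranking system can consume alongside engagement features.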

Why it matters: This incident underscores the critical impact of configuration management in distributed systems. It highlights how rapid, global deployments without gradual rollouts and robust error handling can lead to widespread outages, even from seemingly minor code paths.

  • A 25-minute Cloudflare outage on Dec 5, 2025, impacted 28% of HTTP traffic due to a configuration change.
  • The incident stemmed from disabling an internal WAF testing tool, intended to mitigate a React Server Components vulnerability (CVE-2025-55182).
  • A global configuration system, lacking gradual rollout, propagated a change that triggered a Lua runtime error in the FL1 proxy.
  • The failure was a Lua "attempt to index a nil value" error on `rule_result.execute`: when a killswitch skipped an "execute" action rule, no result object was produced, exposing a bug that had gone undetected for years.
  • This highlights the need for robust type systems and safe deployment practices, especially for critical infrastructure.
  • Cloudflare acknowledges similar past incidents and is prioritizing enhanced rollouts and versioning to prevent future widespread impacts.
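The failure mode is easy to reproduce in miniature. The sketch below is a Python analogy of the Lua bug, not Cloudflare's code: the function and field names are invented, but the shape is the same, a skipped rule yields no result object, and a later unguarded field access crashes the request path.

```python
def evaluate_rule(rule, killswitch_active):
    """Return a rule result, or None when a killswitch skips the rule.

    Mirrors (in Python) the Lua path where a skipped "execute" rule
    left rule_result as nil. Names are illustrative.
    """
    if killswitch_active and rule["action"] == "execute":
        return None  # skipped: no result object is ever built
    return {"execute": rule["action"] == "execute", "matched": True}

def process(rule, killswitch_active):
    rule_result = evaluate_rule(rule, killswitch_active)
    # Buggy version: rule_result["execute"] with rule_result == None
    # raises TypeError here -- the Python analogue of Lua's
    # "attempt to index a nil value". The None guard below is the fix;
    # a type system that models "result or nothing" would have forced it.
    if rule_result is not None and rule_result["execute"]:
        return "executed"
    return "skipped"
```

The lesson the post draws is that this class of bug is cheap to prevent (a null check, or a type system that makes the absent case unrepresentable) but expensive to discover via a global, non-gradual config rollout.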

Why it matters: GitHub Copilot Spaces significantly reduces the time engineers spend hunting for context during debugging by providing AI with project-specific knowledge. This leads to faster, more accurate solutions and streamlined development workflows.

  • GitHub Copilot Spaces enhances AI debugging by providing project-specific context like files, pull requests, and issues, leading to more accurate suggestions.
  • Spaces act as dynamic knowledge bundles, automatically syncing with linked content to ensure Copilot always has up-to-date information.
  • Users create a space, add relevant project assets (e.g., security docs, architecture overviews, specific issues), and define custom instructions for Copilot's behavior.
  • Copilot leverages this curated context to generate detailed debugging plans and propose code changes, citing its sources for transparency and auditability.
  • The integrated coding agent can then create pull requests with before/after versions, explanations, and references to the guiding instructions and files.

Why it matters: This article highlights how open video codecs like AV1 drive significant improvements in streaming quality and network efficiency. It showcases a successful large-scale rollout across diverse devices, offering valuable insights into optimizing content delivery and user experience.

  • Netflix's AV1 codec adoption has reached 30% of all streaming, becoming their second most-used codec due to its superior efficiency.
  • AV1 delivers higher video quality (4.3 VMAF points over AVC) with one-third less bandwidth and 45% fewer buffering interruptions.
  • The rollout began with Android mobile in 2020 using the dav1d software decoder, expanding to smart TVs, web browsers, and Apple devices with hardware support.
  • This advanced codec significantly improves network efficiency for Netflix's Open Connect CDN and partner ISPs by reducing overall internet bandwidth consumption.
  • AV1 enables advanced features like HDR10+ streaming and cinematic film grain, enhancing the overall viewing experience for members.

Why it matters: This article demonstrates how Pinterest achieves high-performance AI at significantly lower costs by prioritizing open-source models and fine-tuning with domain-specific data. It's crucial for engineers seeking efficient, scalable, and cost-effective AI development strategies.

  • Pinterest is strategically shifting AI investments toward fine-tuned open-source models, achieving similar quality at less than 10% of the cost of proprietary solutions.
  • The competitive edge in AI is moving from large general-purpose LLMs to domain-specific data, personalization, and deep product integration.
  • Pinterest develops user recommendation systems and visual foundation models in-house, leveraging unique, large-scale datasets.
  • For text-based LLMs, Pinterest utilizes a mix of open-source and third-party proprietary models.
  • Open-source multimodal LLMs are enabling differentiation through fine-tuning with proprietary data and end-to-end optimization.
  • The Pinterest Assistant exemplifies this, using an agentic multimodal LLM to route tasks to specialized, Pinterest-native tools, prioritizing tool quality.
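The agentic routing pattern behind the Assistant can be sketched minimally. In the real system an LLM decides which specialized tool to invoke; the keyword matcher below just makes the control flow concrete, and the tool names and handlers are invented for illustration.

```python
def route(query, tools):
    """Toy router: dispatch a query to the first tool whose keywords match.

    A production agent would have a multimodal LLM choose the tool and
    fill its arguments; this sketch only shows the dispatch structure.
    """
    q = query.lower()
    for name, (keywords, handler) in tools.items():
        if any(k in q for k in keywords):
            return name, handler(query)
    return "fallback", None

# Hypothetical Pinterest-native tools; names and behavior are invented.
TOOLS = {
    "visual_search": ({"looks like", "similar"},
                      lambda q: f"image results for: {q}"),
    "board_ideas":   ({"board", "ideas"},
                      lambda q: f"board suggestions for: {q}"),
}
```

The post's point is that once routing is in place, most of the quality comes from the tools themselves, which is where proprietary data and fine-tuning pay off.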

Why it matters: This article demonstrates how to overcome legacy observability challenges by pragmatically integrating AI agents and context engineering, offering a blueprint for unifying fragmented data without costly overhauls.

  • Pinterest faced fragmented observability data (logs, traces, metrics) due to legacy infrastructure predating OpenTelemetry, hindering efficient root-cause analysis.
  • They adopted a pragmatic solution using AI agents and a Model Context Protocol (MCP) server to unify disparate observability signals without a full infrastructure overhaul.
  • The MCP server allows AI agents to interact simultaneously with various data pillars (metrics, logs, traces, change events) to find correlations and build hypotheses.
  • This "context engineering" approach aims to provide intelligent agents with comprehensive data, leading to faster, clearer root-cause analysis and actionable insights.
  • The initiative represents a "shift-left" (proactive integration) and "shift-right" (production visibility) strategy, leveraging AI to overcome existing observability limitations.
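The core move of correlating pillars can be shown in a toy form: given error spikes from metrics and recent change events, link each spike to the changes that immediately preceded it. This is an illustration of the kind of cross-signal query an MCP-backed agent might run, not Pinterest's schema; the event shapes and 15-minute window are assumptions.

```python
from datetime import datetime, timedelta

def correlate(error_spikes, change_events, window_minutes=15):
    """Link each error spike to change events in the preceding window.

    Inputs are dicts with an 'at' datetime; spikes carry a 'service',
    changes carry a 'change' identifier. Output is a list of
    root-cause hypotheses an agent could rank or investigate further.
    """
    window = timedelta(minutes=window_minutes)
    hypotheses = []
    for spike in error_spikes:
        suspects = [ev["change"] for ev in change_events
                    if spike["at"] - window <= ev["at"] <= spike["at"]]
        hypotheses.append({"spike": spike["service"], "suspects": suspects})
    return hypotheses
```

The value of the MCP layer is that the agent can issue this kind of join across metrics, logs, traces, and change events without the underlying stores ever having been unified.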

Why it matters: Custom agents in GitHub Copilot empower engineering teams to embed their unique rules and workflows directly into their AI assistant. This streamlines development, ensures consistency across the SDLC, and automates complex tasks, boosting efficiency and adherence to standards.

  • GitHub Copilot now supports custom agents, extending its AI assistance across the entire software development lifecycle, not just code generation.
  • These Markdown-defined agents act as domain experts, integrating team-specific rules, tools, and workflows for areas like observability, security, and IaC.
  • Custom agents can be deployed at repository, organization, or enterprise levels and are accessible via Copilot CLI, VS Code Chat, and github.com.
  • They enable engineers to enforce standards, automate multi-step tasks, and integrate third-party tools directly within their development environment.
  • A growing ecosystem of partner-built agents is available for various domains, including security, databases, DevOps, and incident management.
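To make "Markdown-defined agents" concrete, a custom agent is roughly a Markdown file of instructions with some metadata. The file below is purely illustrative: the field names, path, and tool identifiers are assumptions, not GitHub's documented schema, but it conveys the shape of embedding team rules into an agent definition.

```markdown
---
name: observability-reviewer
description: Reviews changes against our logging and metrics standards.
tools: ["read", "search"]
---

You are our observability domain expert. When reviewing code:

- Require structured logging (no bare print/console statements).
- Flag any new endpoint that lacks latency and error-rate metrics.
- Reference our internal runbook conventions when suggesting fixes.
```

Checked into a repository (or distributed at the organization or enterprise level), such a definition is what lets Copilot act as a consistent domain expert across the CLI, VS Code Chat, and github.com.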

Why it matters: This article highlights how Azure Local provides engineers with flexible, sovereign, and resilient cloud capabilities on-premises or at the edge. It enables deploying AI and critical workloads while meeting strict compliance and operational autonomy requirements, even in disconnected environments.

  • Azure Local extends Azure public cloud infrastructure to customer datacenters and distributed locations, ensuring control, resilience, and operational autonomy for mission-critical workloads.
  • It addresses data sovereignty and compliance needs, enabling AI, scalable compute, and advanced analytics to run locally or at the edge.
  • Key advancements include General Availability for Microsoft 365 Local, NVIDIA RTX GPUs for on-premises AI, and Azure Migrate support.
  • Preview features like AD-less deployments, Rack-Aware Clustering, multi-rack deployments, and fully disconnected operations enhance flexibility and autonomy.
  • Leveraging Azure Arc, Azure Local provides a unified platform for hybrid and disconnected environments, supporting diverse industries like manufacturing and public sector.
  • Integration with Azure IoT and Microsoft Fabric facilitates intelligent physical operations and real-time insights from operational data.