Why it matters: AI agents require a massive shift in infrastructure. Traditional containers are too heavy for the one-to-one scaling that agents demand. V8 isolates allow the ephemeral, high-concurrency execution needed to make agentic workflows economically and technically viable at global scale.
Why it matters: This milestone demonstrates how massive-scale infrastructure can handle record-breaking DDoS attacks (31.4 Tbps) autonomously. It showcases the power of pushing security and compute to the edge using eBPF and XDP, allowing for high-performance, distributed application hosting.
Why it matters: This framework shows how to automate subjective quality control at scale. By aligning LLMs with expert rubrics and business metrics, engineers can proactively optimize user engagement and content discovery before titles even launch.
Why it matters: GitHub Copilot CLI streamlines development by bringing AI-powered code generation and autonomous agents directly into the terminal. This reduces context switching, enabling faster iterative building and automated error correction within the local environment.
Why it matters: Using Postgres for queues is convenient but risky. High-churn queue tables generate dead tuples that bloat tables and indexes. If long-running transactions prevent autovacuum from reclaiming them, the resulting bloat and I/O overhead can degrade the entire database's performance, potentially bringing down the application.
Why it matters: Managing shared infrastructure limits is critical when scaling LLM applications. This architecture demonstrates how to balance high-volume autonomous agents with human-in-the-loop workflows, ensuring fairness and prioritizing high-value tasks without hitting rate-limit failures.
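The prioritization idea behind this can be sketched as a shared token bucket that drains a priority queue, so human-in-the-loop requests are released before bulk agent traffic. This is a minimal illustrative sketch, not the article's implementation; class and parameter names are assumptions.

```python
import time
from heapq import heappush, heappop

class PriorityRateLimiter:
    """Token bucket shared by agent and human-in-the-loop traffic.

    Queued requests carry a priority; when tokens are available,
    lower-numbered (higher-value) priorities are released first, so
    interactive work is not starved by high-volume agent jobs.
    """

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec          # token refill rate
        self.capacity = burst             # max tokens held at once
        self.tokens = float(burst)
        self.last = time.monotonic()
        self._queue = []                  # heap of (priority, seq, request_id)
        self._seq = 0                     # tie-breaker keeps FIFO within a priority

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def submit(self, request_id: str, priority: int) -> None:
        """Enqueue a request; e.g. priority 0 = human-in-the-loop, 1 = agent."""
        heappush(self._queue, (priority, self._seq, request_id))
        self._seq += 1

    def drain(self) -> list[str]:
        """Release as many queued requests as current tokens allow."""
        self._refill()
        released = []
        while self._queue and self.tokens >= 1:
            self.tokens -= 1
            released.append(heappop(self._queue)[2])
        return released
```

With a burst of 2 tokens, five queued agent jobs plus one human request drain as the human request first, then the oldest agent job; the rest wait for refill, which is the fairness property the architecture is after.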
Why it matters: Meta's approach provides a blueprint for maintaining large open-source dependencies without getting stuck in permanent forks. By using dual-stack architectures and namespace mangling, they enabled safe upgrades and A/B testing for critical infrastructure serving billions of users.
Why it matters: This report highlights how minor configuration errors, cache stampedes, and credential management issues can cause massive service disruptions. It provides a blueprint for improving resilience through killswitches, infrastructure isolation, and automated monitoring of dependencies.
Why it matters: Scaling AI agents for enterprise datasets requires balancing throughput with strict governance. This architecture shows how to overcome rate limits and latency issues while maintaining the explainability and security essential for autonomous CRM systems.
Why it matters: Configuration errors are a leading cause of large-scale outages. This article highlights how Meta uses automated canarying, ML-driven alerting, and a blameless culture to maintain system stability while scaling deployment speed in an AI-accelerated environment.