Curated topic

data

Posts tagged with data

Cloudflare Blog · Apr 28, 2026

Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions

Why it matters: Monitoring global disruptions helps engineers distinguish between application bugs and systemic infrastructure failures. These events underscore the importance of multi-region redundancy and of the technical mechanisms, such as BGP and filtering, that govern global Internet reachability.

Uganda's election-related shutdown reduced domestic traffic from 72 Gbps to 1 Gbps, demonstrating the impact of state-mandated mobile network suspensions.
Iran's first Q1 shutdown involved a massive loss of announced IPv6 address space, specifically impacting ISPs like Asiatech and RASANA.
Technical analysis suggests Iran's second shutdown utilized filtering rather than BGP route withdrawals, as IP space remained announced despite traffic drops.
Physical infrastructure failures, including three grid collapses in Cuba and cable damage in Congo, caused significant regional connectivity gaps.
Military actions in Ukraine and the Middle East directly impacted hyperscaler cloud infrastructure and regional network stability.
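
The distinction drawn above between route withdrawals and filtering can be sketched as a simple heuristic: if traffic collapses while address space stays announced, filtering is the more likely mechanism. This is a minimal illustration, not Cloudflare's methodology; the `Snapshot` fields, the thresholds, and the sample numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    announced_prefixes: int  # prefixes visible in BGP (hypothetical metric)
    traffic_gbps: float      # observed traffic volume (hypothetical metric)


def classify_disruption(before: Snapshot, during: Snapshot,
                        traffic_drop: float = 0.5,
                        prefix_drop: float = 0.5) -> str:
    """Label a disruption; thresholds are illustrative assumptions.

    Assumes the baseline snapshot has non-zero traffic and prefixes.
    """
    traffic_ratio = during.traffic_gbps / before.traffic_gbps
    prefix_ratio = during.announced_prefixes / before.announced_prefixes
    if traffic_ratio > 1 - traffic_drop:
        return "no major disruption"
    if prefix_ratio <= 1 - prefix_drop:
        # Address space disappeared from the routing table.
        return "route withdrawal likely"
    # Traffic fell but the address space is still announced.
    return "filtering likely"


baseline = Snapshot(announced_prefixes=1000, traffic_gbps=100.0)
print(classify_disruption(baseline, Snapshot(990, 5.0)))
print(classify_disruption(baseline, Snapshot(100, 5.0)))
```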

#dist #security #data

Pinterest Engineering · Apr 27, 2026

From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation at Pinterest

Why it matters: Optimizing for sparse conversion events is a major challenge in ad tech. This architecture shows how to effectively combine sparse labels with dense engagement signals using parallel DCN v2 and multi-task learning to drive significant business value and advertiser RoAS.

Pinterest developed a dedicated candidate generation model to optimize for lower-funnel conversions, addressing the sparsity and noise of offsite purchase signals.
The architecture utilizes a two-tower model with parallel DCN v2 and MLP layers, decoupling the learning of explicit feature interactions from implicit abstract patterns.
To mitigate data sparsity, the model uses multi-task learning, supplementing conversion labels with log-weighted engagement data based on click duration.
Feature engineering combines real-time context via GraphSAGE embeddings with long-term user history processed through a Transformer.
A unified multi-task architecture with a single dot-product head was adopted to simplify retrieval while maintaining performance across multiple objectives.
Training incorporates hard negatives from non-engaged impressions to better reflect the actual distribution of served ads and improve model robustness.
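
One way to picture the log-weighted engagement signal mentioned above is as a per-example training weight: conversions keep full weight, while click-only examples get a weight that grows with the logarithm of click duration. The formula, the `conversion_weight` and `engagement_scale` parameters, and the function name are assumptions for illustration; the summary does not give Pinterest's actual weighting.

```python
import math


def sample_weight(converted: bool, click_duration_s: float,
                  conversion_weight: float = 1.0,
                  engagement_scale: float = 0.1) -> float:
    """Return a training weight for one example (hypothetical scheme)."""
    if converted:
        # Sparse but high-value label: keep at full weight.
        return conversion_weight
    # Dense engagement label: log1p compresses very long clicks so they
    # do not dominate the loss; negative durations are clamped to zero.
    return engagement_scale * math.log1p(max(click_duration_s, 0.0))


for duration in (0, 3, 30, 300):
    print(duration, round(sample_weight(False, duration), 3))
```

The logarithm means a 300-second click counts for more than a 30-second click, but far less than ten times as much.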

#mlp #data

Netflix Tech Blog · Apr 24, 2026

Scaling Camera File Processing at Netflix

Why it matters: This article illustrates how to scale specialized domain workflows by integrating industry-standard tools into cloud-native infrastructure. It provides a blueprint for 'buy vs. build' decisions and demonstrates high-throughput media processing using distributed compute platforms.

Netflix developed the Media Production Suite (MPS) to automate media workflows and metadata management for global film productions.
The system integrates FilmLight's API (FLAPI) as a core image processing engine instead of building a custom solution from scratch.
MPS uses FLAPI to parse camera metadata, normalize it into a searchable schema, and ensure data integrity via ASC Media Hash Lists.
The platform automates VFX plate generation using open standards like ACES Metadata Files (AMF) and Framing Decision Lists (ASC FDL).
By containerizing FLAPI with Docker and deploying on the Cosmos platform, Netflix achieves massive horizontal scaling in the cloud.
The 'Media Processing Factory' approach moves heavy compute tasks from local workstations to cloud infrastructure for better reliability.

#dist #data

Spotify Engineering · Apr 22, 2026

Background Coding Agents: Supercharging Downstream Consumer Dataset Migrations (Honk, Part 4)

Why it matters: Automating dataset migrations at scale reduces developer toil and prevents technical debt. By using background agents to update downstream consumers, organizations can accelerate infrastructure evolution without overwhelming product teams with manual migration tasks.

Spotify utilizes background coding agents to automate the migration of thousands of downstream dataset consumers.
The system integrates with Backstage and Fleet Management to track progress and manage automated pull requests across the organization.
Automation reduces manual toil for product teams by programmatically updating code references to new dataset versions.
The approach shifts the migration burden from data consumers to automated infrastructure, accelerating platform evolution.
Validation and automated testing are used to ensure that background code changes maintain the integrity of downstream data pipelines.

#data #sre #dist

Airbnb Engineering · Apr 21, 2026

Building a fault-tolerant metrics storage system at Airbnb

Why it matters: Scaling observability for 1,000+ services requires balancing multi-tenant isolation with operational efficiency. Airbnb's approach to shuffle sharding and automated control planes provides a blueprint for building resilient, petabyte-scale metrics systems that avoid 'flying blind' during outages.

Airbnb built an internal metrics system managing 50M samples/sec and 1.3B active time series across 2.5PB of data.
Adopted service-based multi-tenancy to ensure stable grouping and precise resource attribution for over 1,000 services.
Implemented shuffle sharding to isolate tenant workloads, preventing localized failures or traffic spikes from impacting the entire fleet.
Developed a centralized control plane to automate tenant onboarding and dynamically manage ingestion limits and configurations.
Enhanced reliability using shadow clusters, write guardrails, and query sharding to normalize performance across variable read payloads.
Optimized storage compaction and query execution to maintain a p99 latency under 30 seconds for large-scale data requests.
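
The shuffle sharding idea above can be sketched in a few lines: each tenant is deterministically assigned a small subset of nodes, so two tenants rarely share their whole subset and one noisy tenant cannot saturate the fleet. This is a generic illustration, not Airbnb's implementation; the node names, shard size, and hashing scheme are assumptions.

```python
import hashlib
import random


def shuffle_shard(tenant: str, nodes: list[str], shard_size: int) -> list[str]:
    """Pick a stable pseudo-random subset of nodes for a tenant."""
    # Derive a seed from the tenant name so the choice is repeatable.
    seed = int.from_bytes(hashlib.sha256(tenant.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on input ordering.
    return sorted(rng.sample(sorted(nodes), shard_size))


nodes = [f"ingester-{i}" for i in range(12)]  # hypothetical fleet
for tenant in ("search", "payments", "listings"):
    print(tenant, shuffle_shard(tenant, nodes, shard_size=3))
```

A caveat of this simple version is that adding or removing a node can reshuffle many assignments; production systems typically use a more stable selection scheme.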

#sre #dist #data

Engineering at Meta · Apr 21, 2026

Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

Why it matters: This modernization shows how to scale semantic search for massive datasets. By combining hybrid retrieval with LLM-based evaluation, engineers can improve search relevance and engagement while overcoming the bottlenecks of manual labeling and keyword-matching limitations.

Re-architected Facebook Groups Search using a hybrid retrieval system that combines lexical pathways via Unicorn with semantic pathways via a Search Semantic Retriever (SSR).
Implemented a 12-layer, 200-million-parameter SSR model that encodes natural language into dense vectors for approximate nearest neighbor search using Faiss.
Deployed a Multi-Task Multi-Label (MTML) ranking architecture to simultaneously predict multiple engagement signals such as clicks, likes, and shares.
Integrated an automated model-based evaluation framework using Llama 3 as a judge to scale relevance scoring and accelerate the model development lifecycle.
Successfully addressed user friction points in content discovery and validation, leading to measurable improvements in search engagement and relevance.
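
To make the hybrid retrieval step concrete, the sketch below merges a lexical result list and a semantic result list into a single ranking. It uses reciprocal rank fusion, a common general-purpose merging technique chosen here for illustration; the summary does not say how Meta combines the two pathways, and the document IDs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; k=60 is the conventional RRF constant."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in any list earn a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]    # hypothetical keyword-match results
semantic = ["b", "d", "a"]   # hypothetical nearest-neighbor results
print(reciprocal_rank_fusion([lexical, semantic]))  # ['b', 'a', 'd', 'c']
```

Documents found by both pathways ("a" and "b") rise above those found by only one, which is the usual motivation for hybrid retrieval.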

#mlp #data

PlanetScale Tech Blog · Apr 21, 2026

Approaches to tenancy in Postgres

Why it matters: Choosing the right multi-tenancy model is critical for database scalability and security. This guide helps engineers avoid common pitfalls like RLS complexity or schema sprawl, favoring a performant shared-schema approach that scales to thousands of tenants.

Shared-schema is the recommended multi-tenancy approach, using a tenant_id column to isolate data within common tables.
Avoid schema-per-tenant or database-per-tenant models unless tenants require unique schema structures, as they complicate migrations.
Use BIGINT for tenant identifiers to ensure performance and stability compared to string-based IDs.
Lead most indexes with the tenant_id column to optimize query performance across large, multi-tenant tables.
Enforce tenant isolation at the application layer rather than relying on Postgres Row-Level Security (RLS) to avoid silent failures.
Leverage declarative partitioning with tenant_id as the partition key to manage large datasets and improve maintenance.
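
The shared-schema recommendations above can be sketched as a table keyed by `tenant_id`, an index that leads with `tenant_id`, and a data-access helper that always applies the tenant filter. To keep the example runnable without a server, it uses Python's built-in `sqlite3` as a stand-in for Postgres; the `orders` table, its columns, and the helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        tenant_id BIGINT NOT NULL,      -- numeric tenant identifier
        id        BIGINT NOT NULL,
        total     NUMERIC NOT NULL,
        PRIMARY KEY (tenant_id, id)     -- tenant_id leads the key
    );
    -- Secondary index also leads with tenant_id.
    CREATE INDEX orders_tenant_total ON orders (tenant_id, total);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10), (1, 2, 25), (2, 1, 99)])


def orders_for_tenant(conn, tenant_id: int, min_total: float = 0):
    """Application-layer isolation: every query filters on tenant_id."""
    rows = conn.execute(
        "SELECT id, total FROM orders "
        "WHERE tenant_id = ? AND total >= ? ORDER BY id",
        (tenant_id, min_total),
    )
    return rows.fetchall()


print(orders_for_tenant(conn, 1))  # [(1, 10), (2, 25)]
print(orders_for_tenant(conn, 2))  # [(1, 99)]
```

Routing all reads through a helper like this keeps the tenant filter in one place, which is one way to enforce isolation in the application rather than through row-level security.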

#data #sre

Pinterest Engineering · Apr 20, 2026

Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication at Pinterest

Why it matters: Redundant processing of duplicate URLs wastes massive computational resources. This automated, data-driven approach to normalization reduces infrastructure costs and improves data quality by identifying content identity before expensive rendering or ingestion steps occur.

Pinterest developed MIQPS (Minimal Important Query Param Set) to automate URL normalization and content deduplication across millions of merchant domains.
The algorithm classifies query parameters as 'neutral' (noise like tracking IDs) or 'non-neutral' (content-defining like product IDs) through empirical testing.
URLs are grouped by parameter patterns to evaluate parameter importance within specific contexts, such as distinguishing between product and category pages.
The system uses visual content fingerprints to determine if stripping a parameter changes the rendered page, ensuring high-fidelity deduplication.
By normalizing URLs before ingestion, Pinterest significantly reduces redundant fetching, rendering, and computational waste in its media platform.
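
Once a set of content-defining parameters is known for a URL pattern, the normalization step itself is straightforward. The sketch below assumes that set is already available (here passed in as `important_params`, a hypothetical input) and shows only the stripping and canonical ordering; it does not cover how MIQPS learns the set. Dropping the fragment and lowercasing the host are additional assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, important_params: set[str]) -> str:
    """Keep only content-defining query parameters, in sorted order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in important_params
    )
    # Rebuild the URL without neutral parameters or the fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


a = normalize_url("https://Shop.example.com/item?utm_source=x&id=42&ref=abc", {"id"})
b = normalize_url("https://shop.example.com/item?id=42&utm_campaign=y", {"id"})
print(a)       # https://shop.example.com/item?id=42
print(a == b)  # True
```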

#data #dist #finops

Salesforce Engineering · Apr 20, 2026

How Agentforce Lead Nurturing Agents Generated $100M+ Pipeline Under Rate-Limited Infrastructure

Why it matters: This article demonstrates how to build scalable, autonomous AI agent systems that overcome infrastructure constraints like rate limits. It provides a blueprint for moving from LLM prototypes to production-grade systems that drive significant business value through automated workflows.

Salesforce transitioned Sales Cloud from a system of record to a system of action using autonomous Agentforce Lead Nurturing agents.
The system automates lead outreach, qualification, and scheduling by monitoring inbound signals in real-time without human triggers.
Engineers implemented deterministic workflows on top of LLM generation to maintain response consistency and contextual relevance.
A centralized queue-based architecture regulates dispatch across execution contexts to manage strict LLM and email rate limits.
The internal deployment successfully generated over $100M in pipeline and 10,000 opportunities through horizontal scaling.
The architecture eliminates response latency by processing leads continuously, overcoming the capacity limits of manual sales workflows.
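
The queue-based dispatch described above can be illustrated with a queue drained by a token bucket: work is accepted immediately but released only as fast as the downstream limit allows. This is a generic sketch, not Salesforce's architecture; the class name, the `rate_per_s` and `burst` parameters, and the explicit `now` argument (used to keep the example deterministic) are assumptions.

```python
from collections import deque


class RateLimitedDispatcher:
    """A FIFO queue drained by a token bucket (illustrative only)."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0
        self.queue = deque()

    def submit(self, job: str) -> None:
        # Accept work immediately; dispatch happens later.
        self.queue.append(job)

    def dispatch(self, now: float) -> list[str]:
        """Release as many queued jobs as the bucket allows at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        sent = []
        while self.queue and self.tokens >= 1:
            self.tokens -= 1
            sent.append(self.queue.popleft())
        return sent


d = RateLimitedDispatcher(rate_per_s=1, burst=2)
for job in "abcde":
    d.submit(job)
print(d.dispatch(0.0))  # ['a', 'b']
print(d.dispatch(1.0))  # ['c']
print(d.dispatch(1.5))  # []
print(d.dispatch(3.5))  # ['d', 'e']
```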

#mlp #dist #data

Cloudflare Blog · Apr 17, 2026

Introducing the Agent Readiness score. Is your site agent-ready?

Why it matters: As AI agents become primary web consumers, sites must transition from human-centric to machine-readable formats. Adopting these standards ensures content is accurately indexed by LLMs, reduces scraping overhead, and enables automated agentic workflows and commerce.

Cloudflare launched isitagentready.com, a tool that audits websites for AI agent compatibility across discoverability, content, access control, and capabilities.
A new Cloudflare Radar dataset tracks global adoption of AI standards, revealing that while 78% of sites use robots.txt, fewer than 4% support Markdown content negotiation.
The readiness score evaluates support for emerging standards like the Model Context Protocol (MCP), API Catalogs (RFC 9727), and Web Bot Auth.
Cloudflare overhauled its own developer documentation to serve as a model for agent-friendly design, utilizing Markdown and structured metadata to lower LLM processing costs.
The audit tool provides specific prompts that developers can give to coding agents to automatically implement missing standards and improve site scores.
The initiative introduces support for agentic commerce protocols, including x402 and the Universal Commerce Protocol (UCP), to facilitate automated transactions.
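
Markdown content negotiation, mentioned above, can be pictured as a server inspecting the request's `Accept` header and returning Markdown when the client prefers it. The sketch below is a simplified parser for illustration only; it assumes agents signal their preference with `text/markdown`, handles only basic `q` values and wildcards, and the function names are hypothetical.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def choose_representation(header: str,
                          offered=("text/html", "text/markdown")) -> str:
    """Pick the offered type the client prefers; the first offer is the default."""
    prefs = parse_accept(header)

    def quality(media: str) -> float:
        main = media.split("/")[0]
        return prefs.get(media, prefs.get(f"{main}/*", prefs.get("*/*", 0.0)))

    best = max(offered, key=quality)  # ties resolve to the earlier offer
    return best if quality(best) > 0 else offered[0]


print(choose_representation("text/markdown, text/html;q=0.8"))  # text/markdown
print(choose_representation("text/html,*/*;q=0.8"))             # text/html
```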

#mlp #data #dist
