Posts tagged with data
Why it matters: Engineers face increasing data fragmentation across SaaS silos. This post details how to build a unified context engine using knowledge graphs, multimodal processing, and prompt optimization (DSPy) to enable effective RAG and agentic workflows over proprietary enterprise data.
- Dropbox Dash functions as a universal context engine, integrating disparate SaaS applications and proprietary content into a unified searchable index.
- The system utilizes custom crawlers to navigate complex API rate limits, diverse authentication schemes, and granular permission systems (ACLs).
- Content enrichment involves normalizing files into markdown and using multimodal models for scene extraction in video and transcription in audio.
- Knowledge graphs are employed to map relationships between entities across platforms, providing deeper context for agentic queries.
- The engineering team leverages DSPy for programmatic prompt optimization and 'LLM-as-a-judge' frameworks for automated evaluation.
- The architecture explores the Model Context Protocol (MCP) to standardize how LLMs interact with external data sources and tools.
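The entity-mapping idea behind the knowledge graph can be sketched in a few lines. This is an illustrative toy, not Dash's implementation; the `KnowledgeGraph` class, edge labels, and entity IDs are all hypothetical.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy graph linking entities that appear across different SaaS silos."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges[src].add((relation, dst))

    def context_for(self, entity: str) -> list[tuple[str, str]]:
        """Return directly related entities to enrich an agentic query."""
        return sorted(self.edges[entity])

kg = KnowledgeGraph()
# The same person shows up in a docs tool and a ticketing tool; edges unify them.
kg.add_edge("user:alice", "authored", "gdoc:q3-roadmap")
kg.add_edge("user:alice", "owns", "jira:PROJ-42")
kg.add_edge("gdoc:q3-roadmap", "mentions", "jira:PROJ-42")

assert kg.context_for("user:alice") == [
    ("authored", "gdoc:q3-roadmap"),
    ("owns", "jira:PROJ-42"),
]
```

An agent answering "what is Alice working on?" would traverse these cross-platform edges rather than searching each silo independently.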
Why it matters: The GitHub Innovation Graph provides a rare, large-scale dataset on open-source activity. It validates the global impact of developer contributions and offers data-driven insights into how software collaboration influences economic policy, AI development, and geopolitical trends.
- GitHub released its second full year of data for the Innovation Graph, providing aggregated statistics on global public software development activity.
- The update includes refreshed bar chart races for global metrics such as git pushes, repositories, developers, and organizations.
- Academic researchers are utilizing the dataset to study global collaboration networks, software economic complexity, and digital production in emerging markets.
- The data has been integrated into major global reports, including the Stanford AI Index and the WIPO Global Innovation Index, to track AI and innovation trends.
- Future goals focus on improving data accessibility and expanding metrics to better support researchers and policymakers in the open-source ecosystem.
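As a rough illustration of how such aggregated data gets turned into a bar-chart-race frame, the sketch below ranks economies by git pushes within a quarter. The record shape `(economy, quarter, pushes)` is an assumption for illustration, not the exact schema GitHub publishes.

```python
from collections import defaultdict

# Hypothetical Innovation Graph-style rows: (economy, quarter, git pushes).
rows = [
    ("US", "2024-Q1", 120), ("IN", "2024-Q1", 90),
    ("US", "2024-Q2", 140), ("IN", "2024-Q2", 150),
]

def top_economies(rows, quarter):
    """Rank economies by git pushes within one quarter (one race frame)."""
    totals = defaultdict(int)
    for economy, q, pushes in rows:
        if q == quarter:
            totals[economy] += pushes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

assert top_economies(rows, "2024-Q2") == [("IN", 150), ("US", 140)]
```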
Why it matters: Translating natural language to complex DSLs reduces friction for subject matter experts interacting with massive, federated datasets. This approach bridges the gap between intuitive human intent and rigid technical schemas, improving productivity across hundreds of enterprise applications.
- Netflix is evolving its Graph Search platform to support natural language queries using Large Language Models (LLMs).
- The system translates ambiguous user input into a structured Filter Domain Specific Language (DSL) for federated GraphQL data.
- Accuracy is maintained by ensuring syntactic, semantic, and pragmatic correctness through schema validation and controlled vocabularies.
- The architecture utilizes Retrieval-Augmented Generation (RAG) to provide domain-specific data processing without replacing existing UIs.
- Pre-processing and context engineering are critical to prevent LLM hallucinations and ensure fields match the underlying index.
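The validation layer described above can be sketched as a schema check on LLM output before it ever reaches the index. The field names, DSL shape, and vocabularies below are illustrative assumptions, not Netflix's actual Graph Search schema.

```python
# Assumed schema and controlled vocabulary for a toy filter DSL.
SCHEMA = {"title": str, "releaseYear": int, "genre": str}
VOCAB = {"genre": {"drama", "comedy", "thriller"}}

def validate_filter(dsl: dict) -> list[str]:
    """Return a list of errors; an empty list means the DSL matches the index."""
    errors = []
    for field, value in dsl.items():
        if field not in SCHEMA:
            # Syntactic check: the LLM hallucinated a field the index lacks.
            errors.append(f"unknown field: {field}")
        elif not isinstance(value, SCHEMA[field]):
            # Semantic check: right field, wrong type.
            errors.append(f"bad type for {field}")
        elif field in VOCAB and value not in VOCAB[field]:
            # Pragmatic check: value outside the controlled vocabulary.
            errors.append(f"value not in controlled vocabulary: {value}")
    return errors

# A well-formed filter passes; a hallucinated field is rejected up front.
assert validate_filter({"genre": "drama", "releaseYear": 2021}) == []
assert validate_filter({"direcor": "Nolan"}) == ["unknown field: direcor"]
```

Rejecting invalid DSL before execution is what lets ambiguous natural language be retried or repaired without ever issuing a bad query against federated data.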
Why it matters: This article details the architectural shift from fragmented point solutions to a unified AI stack. It provides a blueprint for solving data consistency and metadata scaling challenges, essential for engineers building reliable, real-time agentic systems at enterprise scale.
- Salesforce unified its data, agent, and application layers into the Agentforce 360 stack to ensure consistent context and reasoning across all surfaces.
- The platform uses Data 360 as a universal semantic model, harmonizing signals from streaming, batch, and zero-copy sources into a single pane of glass.
- Engineers addressed metadata scaling by treating metadata as data, enabling efficient indexing and retrieval for massive entity volumes.
- A harmonization metamodel defines mappings and transformations to generate canonical customer profiles from heterogeneous data sources.
- The architecture centralizes freshness and ingest control to maintain identical answers across different AI agents and applications.
- Real-time event correlation is optimized to update unified context immediately while balancing storage costs for large-scale personalization.
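The harmonization-metamodel idea can be sketched as declarative field mappings from heterogeneous sources into one canonical profile. The source names, field names, and merge policy are hypothetical, chosen only to make the pattern concrete.

```python
# Declarative mappings: raw field name per source -> canonical field name.
MAPPINGS = {
    "crm":      {"email_addr": "email", "full_name": "name"},
    "commerce": {"contactEmail": "email", "displayName": "name"},
}

def harmonize(source: str, record: dict) -> dict:
    """Rename a raw record's fields into the canonical vocabulary."""
    mapping = MAPPINGS[source]
    return {canon: record[raw] for raw, canon in mapping.items() if raw in record}

def canonical_profile(records: list[tuple[str, dict]]) -> dict:
    """Merge harmonized records; earlier sources win, later ones fill gaps."""
    profile: dict = {}
    for source, record in records:
        for field, value in harmonize(source, record).items():
            profile.setdefault(field, value)
    return profile

profile = canonical_profile([
    ("crm", {"email_addr": "a@example.com"}),
    ("commerce", {"contactEmail": "a@example.com", "displayName": "Alice"}),
])
assert profile == {"email": "a@example.com", "name": "Alice"}
```

Because the mappings live in data rather than code, adding a new source is a metamodel change, not a pipeline rewrite, which is the point of treating metadata as data.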
Why it matters: Azure Storage is shifting from passive storage to an active, AI-optimized platform. Engineers must understand these scale and performance improvements to architect systems capable of handling the high-concurrency, high-throughput demands of autonomous agents and LLM lifecycles.
- Azure Storage is evolving into a unified platform supporting the full AI lifecycle, from frontier model training to large-scale inferencing and agentic applications.
- Blob scaled accounts now support millions of objects across hundreds of scale units, enabling massive datasets for training and tuning.
- Azure Managed Lustre (AMLFS) has expanded to support 25 PiB namespaces and 512 GBps throughput to maximize GPU utilization in high-performance computing.
- Deep integration with frameworks like Microsoft Foundry, Ray, and LangChain facilitates seamless data grounding and low-latency context serving for RAG architectures.
- Elastic SAN and Azure Container Storage (ACStor) are being optimized for 'agentic scale' to handle the high concurrency and query volume of autonomous agents.
- New storage tiers and performance updates, such as Premium SSD v2 and Cold/Archive tiers for Azure Files, focus on reducing TCO for mission-critical workloads.
Why it matters: Cross-agent memory allows AI tools to learn codebase conventions autonomously, reducing manual context-setting. Its just-in-time verification ensures agents don't act on stale data, significantly improving the reliability of AI-generated code and reviews in complex, evolving repositories.
- GitHub Copilot is evolving into a multi-agent ecosystem where agents share a cumulative knowledge base across the development lifecycle.
- The system uses cross-agent memory to learn codebase conventions and patterns without requiring explicit user instructions for every session.
- To solve the problem of stale data, GitHub implemented 'just-in-time verification' rather than expensive offline curation services.
- Memories are stored with specific code citations, which agents verify via real-time read operations to ensure relevance to the current branch.
- Memory creation is handled as a tool call, allowing agents to autonomously document facts like API synchronization requirements or logging patterns.
- The feature is currently in public preview and is fully opt-in for Copilot coding agent, CLI, and code review users.
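The just-in-time verification pattern can be sketched as: store each memory with a code citation, then re-read the cited file on the current branch before trusting it. The data structures here are hypothetical illustrations, not Copilot internals.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    fact: str            # e.g. a learned codebase convention
    path: str            # file the fact was derived from
    cited_snippet: str   # exact code the memory cites

def is_fresh(memory: Memory, repo_files: dict[str, str]) -> bool:
    """A memory is usable only if its citation still exists on this branch."""
    current = repo_files.get(memory.path, "")
    return memory.cited_snippet in current

repo = {"logging.py": "logger = structlog.get_logger()\n"}
m = Memory("Use structlog for all logging", "logging.py",
           "structlog.get_logger()")
assert is_fresh(m, repo)

# After a refactor removes the cited code, the memory is skipped, not applied.
repo["logging.py"] = "import logging\n"
assert not is_fresh(m, repo)
```

The cheap read at query time replaces an expensive offline curation job: stale memories are filtered exactly when they would otherwise cause a wrong suggestion.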
Why it matters: Engineers must balance speed-to-market with customizability. This ecosystem simplifies the 'build vs. buy' decision by providing pre-vetted models and agents that integrate with existing stacks while ensuring governance and cost optimization through cloud consumption commitments.
- Microsoft Marketplace provides a central catalog of over 11,000 AI models and 4,000 apps to support build, buy, or hybrid AI strategies.
- Pro-code developers can access foundational models from Anthropic, Meta, and OpenAI via Azure Foundry to maintain full control over custom logic and IP.
- Low-code development is enabled through Microsoft Copilot Studio, allowing teams to build agents grounded in organizational data with minimal coding.
- Ready-made agents and multi-agent systems can be deployed directly into Microsoft 365 Copilot to accelerate time-to-value for common business use cases.
- Governance tools like Private Azure Marketplace allow IT teams to curate approved solutions and maintain oversight of AI deployments.
- Marketplace transactions can be applied toward Microsoft Azure Consumption Commitment (MACC), helping organizations optimize cloud spend and procurement.
Why it matters: This acquisition signals a shift from chaotic web scraping to structured, licensed data for AI. For engineers, it introduces new patterns like pub/sub content indexing and machine-to-machine payments (x402), moving away from inefficient crawling toward a sustainable, automated web economy.
- Cloudflare has acquired Human Native, a UK-based marketplace that transforms unstructured multimedia content into high-quality, licensed AI training data.
- The acquisition aims to address the strain on the internet's economic model caused by skyrocketing crawl-to-referral ratios from AI bots.
- Cloudflare is developing an 'AI Index' using a pub/sub model, allowing websites to push structured updates to developers in real time instead of relying on blind crawling.
- The integration supports Cloudflare's existing tools like AI Crawl Control and Pay Per Crawl, giving content owners granular control over bot access.
- Cloudflare is partnering with Coinbase on the x402 Foundation to establish protocols for machine-to-machine transactions and digital resource payments.
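The pub/sub indexing pattern, in contrast to polling crawlers, can be shown with a minimal in-memory broker. This is a generic sketch of the idea, not the actual AI Index protocol; topic names and update shapes are made up.

```python
from collections import defaultdict

class ContentIndex:
    """Toy pub/sub broker: sites push structured updates, consumers subscribe."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, update: dict) -> None:
        """One push from the site fans out to every subscriber immediately."""
        for callback in self.subscribers[topic]:
            callback(update)

received = []
index = ContentIndex()
index.subscribe("news/example.com", received.append)
index.publish("news/example.com",
              {"url": "https://example.com/post/1", "title": "Launch", "rev": 2})
assert received == [
    {"url": "https://example.com/post/1", "title": "Launch", "rev": 2}
]
```

The economic contrast with crawling is that here the publisher sends one structured update per change, rather than absorbing thousands of speculative fetches that rarely convert into referrals.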
Why it matters: Traditional engagement metrics like watch time don't always reflect true user interest. By integrating direct survey feedback into ranking models, engineers can reduce noise, improve long-term retention, and better align content with niche user preferences in large-scale recommendation systems.
- Facebook Reels transitioned from relying solely on engagement metrics like watch time to integrating direct user feedback via the User True Interest Survey (UTIS) model.
- The UTIS model acts as a lightweight alignment layer trained on binarized survey responses to predict user satisfaction and content relevance.
- Research indicated that traditional interest heuristics only achieved 48.3% precision, highlighting the gap between engagement signals and true user interest.
- The system addresses sampling and nonresponse bias by weighting survey data to ensure the training set accurately reflects the broader user base.
- Integrating survey-based interest matching led to significant improvements in long-term user retention, engagement, and satisfaction across video surfaces.
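Two of the ingredients above, binarizing survey responses and reweighting them against nonresponse bias, can be sketched directly. The 1-5 scale, the threshold, and the inverse-propensity-style weights are illustrative assumptions, not Meta's actual parameters.

```python
def binarize(score: int, threshold: int = 4) -> int:
    """Map a 1-5 satisfaction score to a binary interest label."""
    return 1 if score >= threshold else 0

def weighted_positive_rate(responses: list[int], weights: list[float]) -> float:
    """Estimate the positive rate, weighting each respondent by how
    underrepresented their segment is among survey responders."""
    total = sum(weights)
    return sum(binarize(s) * w for s, w in zip(responses, weights)) / total

# Heavy users answer surveys more often, so light users get larger weights.
scores  = [5, 4, 2, 5, 1]
weights = [0.5, 0.5, 2.0, 0.5, 2.0]
rate = weighted_positive_rate(scores, weights)
assert abs(rate - 1.5 / 5.5) < 1e-9   # far below the unweighted 3/5
```

Without the weights, the over-surveyed heavy users would dominate the label distribution, and the alignment layer would learn their preferences rather than the broader user base's.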
Why it matters: As AI adoption scales, engineers need unified tools to manage model lifecycles, security, and compliance. Microsoft’s integrated approach reduces operational risk and simplifies the deployment of responsible, agentic AI systems across complex multicloud environments.
- Microsoft was recognized as a Leader in the 2025-2026 IDC MarketScape for Unified AI Governance Platforms.
- Microsoft Foundry serves as the developer control plane for model development, evaluation, deployment, and monitoring.
- Microsoft Agent 365 provides a centralized IT control plane for managing and securing agentic AI across the enterprise.
- Integrated security features include real-time jailbreak detection, agent identity management via Entra, and AI-specific threat protection in Defender.
- Automated compliance tools in Microsoft Purview support over 100 regulatory frameworks for hybrid and multicloud environments.