Posts tagged with data

Airbnb Engineering, Mar 12, 2026

Recommending Travel Destinations to Help Users Explore

Why it matters: This approach demonstrates how to adapt NLP architectures for travel recommendations by balancing short-term intent with long-term history. It addresses the cold-start problem for dormant users while improving geolocation accuracy through multi-task learning.

Developed a Transformer-based framework that treats user actions like bookings, views, and searches as tokens to predict destination intent.
Integrated long-term historical interests with short-term contextual signals to capture both stable preferences and immediate travel needs.
Implemented a dual-training strategy to balance 'active' users with recent activity and 'dormant' users who haven't visited the platform recently.
Utilized multi-task learning with city-level and region-level prediction heads to improve the model's understanding of geolocation relationships.
Deployed the model in search autosuggest and abandoned search email notifications, leading to significant gains in bookings and user engagement.
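
The multi-task idea in the summary above can be sketched as a combined loss over a city head and a region head. This is a minimal illustration under stated assumptions, not Airbnb's implementation: the `CITY_TO_REGION` mapping, the `region_weight` value, and the use of a weighted sum of cross-entropy losses are all hypothetical choices made for the example.

```python
import math

# Hypothetical city -> region mapping, for illustration only.
CITY_TO_REGION = {"paris": "europe", "rome": "europe", "tokyo": "asia"}


def softmax(logits):
    """Convert a dict of raw scores into probabilities."""
    peak = max(logits.values())
    exps = {label: math.exp(score - peak) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def cross_entropy(logits, target):
    """Negative log-likelihood of the target label."""
    return -math.log(softmax(logits)[target])


def multi_task_loss(city_logits, region_logits, true_city, region_weight=0.5):
    """City loss plus a weighted region loss (the weight is an assumption)."""
    true_region = CITY_TO_REGION[true_city]
    city_loss = cross_entropy(city_logits, true_city)
    region_loss = cross_entropy(region_logits, true_region)
    return city_loss + region_weight * region_loss
```

Because the region label is derived from the city label, a prediction of the wrong city in the right region is penalized less than one in the wrong region, which is one plausible way a region head could teach the model geographic proximity.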

#mlp #data

Salesforce Engineering, Mar 11, 2026

Beyond CRM: How Salesforce Engineered an Enterprise Agent Platform for Any Workload

Why it matters: It demonstrates how to build a scalable, trust-first AI agent architecture. By integrating deterministic graphs with unstructured data and open standards like MCP, it provides a blueprint for enterprise-grade AI orchestration and governance beyond simple chat interfaces.

Salesforce is evolving its architecture into a general-purpose enterprise agent platform capable of handling non-CRM workloads through Agentforce and Data 360.
The platform uses AgentScript and AgentGraph to provide deterministic structure and orchestration for non-deterministic AI reasoning flows.
Data 360 acts as a unified context system, harmonizing structured and unstructured data with deep metadata enrichment for more accurate agent grounding.
A dedicated trust layer manages identity, credential context, and policy enforcement to protect against prompt injection and unauthorized data access.
The architecture supports open standards like Model Context Protocol (MCP) and Agent-to-Agent (A2A) for cross-platform tool invocation and orchestration.
Agents utilize short-term, long-term, and episodic memory combined with user personalization profiles to improve reasoning reliability.

#data #mlp #security

Cloudflare Blog, Mar 10, 2026

Building a security overview dashboard for actionable insights

Why it matters: Security teams are overwhelmed by data noise. This architecture demonstrates how to transform massive telemetry into prioritized, actionable insights using a distributed system of specialized microservices, reducing incident response times and closing critical configuration gaps.

Cloudflare revamped its Security Overview dashboard to prioritize actionable 'Security Action Items' over raw data visibility, reducing noise for security teams.
The system addresses the 'configuration gap' by surfacing the status of security tools, identifying when features are inactive or incorrectly set to 'Log Only' mode.
The backend architecture utilizes specialized microservices called 'checkers' that scale independently to process over 10 million insights daily.
Checkers operate via two mechanisms: scheduled deep-inspection tasks for complex configurations and real-time event handlers for immediate risk detection.
Deep-linking between the overview and analytics dashboards reduces the 'tab switching tax' by automatically applying relevant filters during incident investigation.
Action items are categorized by criticality and type, allowing defenders to move from reactive monitoring to proactive control of their security posture.
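
The two checker mechanisms described above (scheduled inspection and real-time event handling) could look roughly like the following. This is a hedged sketch only: the `ActionItem` type, the mode strings `"disabled"` and `"log_only"`, the criticality labels, and the event shape are hypothetical and are not taken from Cloudflare's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionItem:
    zone: str
    feature: str
    criticality: str  # "high" or "moderate" in this sketch
    message: str


def check_feature_modes(zone, features):
    """Hypothetical checker: flag features that are inactive or only logging."""
    items = []
    for name, mode in sorted(features.items()):
        if mode == "disabled":
            items.append(ActionItem(zone, name, "high", f"{name} is inactive"))
        elif mode == "log_only":
            items.append(ActionItem(zone, name, "moderate", f"{name} is in Log Only mode"))
    return items


def on_config_change(event):
    """Real-time path: re-check only the zone named in the event."""
    return check_feature_modes(event["zone"], event["features"])


def scheduled_scan(all_zones):
    """Scheduled path: inspect every zone and list the most critical items first."""
    items = [
        item
        for zone, features in all_zones.items()
        for item in check_feature_modes(zone, features)
    ]
    order = {"high": 0, "moderate": 1}
    return sorted(items, key=lambda item: order[item.criticality])
```

Keeping the check itself in one function and calling it from both paths means the scheduled and event-driven checkers cannot disagree about what counts as a configuration gap.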

#security #dist #data

Cloudflare Blog, Mar 10, 2026

Investigating multi-vector attacks in Log Explorer

Why it matters: Engineers need holistic visibility to combat multi-vector attacks. By centralizing edge telemetry and Zero Trust events, teams can correlate disparate signals, significantly reducing detection time and improving forensic accuracy without managing complex log pipelines.

Cloudflare Log Explorer integrates 14 new datasets to provide 360-degree visibility across the Application Services and Cloudflare One portfolios.
The platform correlates telemetry from HTTP requests, L3/L4 network logs, and Zero Trust events to identify sophisticated multi-vector attacks.
Zone-scoped logs capture edge interactions like DNS queries, WAF events, and Page Shield audits before traffic reaches origin servers.
Account-scoped logs track identity-based authentication, administrative changes, and device posture to monitor internal security health.
Centralized logging at the edge helps security analysts reduce Mean Time to Detect (MTTD) by providing a unified interface for deep-dive forensics.
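
As a rough illustration of correlating signals across datasets, the sketch below flags source IPs that appear in more than one dataset within a short time window. It is an assumption-laden example, not Log Explorer's query engine: the event fields (`ip`, `ts`, `dataset`), the dataset names in the usage, and the five-minute window are placeholders.

```python
from collections import defaultdict


def correlate_by_ip(events, window_seconds=300, min_datasets=2):
    """Return IPs seen in at least `min_datasets` datasets within one time window."""
    by_ip = defaultdict(list)
    for event in events:
        by_ip[event["ip"]].append(event)

    flagged = {}
    for ip, ip_events in by_ip.items():
        ip_events.sort(key=lambda event: event["ts"])
        for index, first in enumerate(ip_events):
            # Collect every event that starts within the window opened by `first`.
            in_window = [
                event for event in ip_events[index:]
                if event["ts"] - first["ts"] <= window_seconds
            ]
            datasets = {event["dataset"] for event in in_window}
            if len(datasets) >= min_datasets:
                flagged[ip] = sorted(datasets)
                break
    return flagged
```

An IP that probes over HTTP and then trips a firewall rule two minutes later is flagged, while an IP whose events are fifteen minutes apart is not.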

#security #data

Salesforce Engineering, Mar 9, 2026

Engineering Platform Trust: Cutting Customer Case Volume 20x with Petabyte-Scale Health Signals

Why it matters: This system demonstrates how to transform massive, fragmented telemetry into actionable insights. By standardizing health metrics and isolating analytics from production, engineers can proactively identify risks, reduce support overhead, and ensure platform stability at a petabyte scale.

Salesforce's Technical Health Score (THS) quantifies implementation health across five pillars: Security, Efficiency, Operational Excellence, Customization, and Observability.
The architecture processes petabytes of telemetry via an off-core analytics platform, ensuring zero impact on live transactional workloads.
Diverse metrics are normalized into a 1–100 scale using distribution-based methods to compare organizations against peers of similar complexity.
A signal-qualification framework filters for actionability, ensuring the score reflects customer-controlled configurations rather than platform-level issues.
This proactive approach has cut support case volume 20x for customers who maintain high technical health scores.
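
One common distribution-based way to put heterogeneous metrics on a shared 1–100 scale is a percentile rank within a peer group. The sketch below assumes that approach; the summary does not say which method Salesforce uses, and the function name, the mid-rank tie handling, and the `higher_is_better` flag are illustrative choices.

```python
from bisect import bisect_left, bisect_right


def percentile_score(value, peer_values, higher_is_better=True):
    """Map a raw metric onto 1-100 using its mid-rank percentile among peers."""
    if not peer_values:
        raise ValueError("peer_values must not be empty")
    ordered = sorted(peer_values)
    below = bisect_left(ordered, value)           # peers strictly below the value
    ties = bisect_right(ordered, value) - below   # peers equal to the value
    percentile = (below + 0.5 * ties) / len(ordered)  # in [0, 1]
    if not higher_is_better:
        percentile = 1.0 - percentile
    return round(1 + 99 * percentile)
```

Because the score depends only on rank within the peer group, a metric measured in milliseconds and one measured in error counts become directly comparable, and an organization is judged against peers of similar complexity rather than against an absolute threshold.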

#data #sre #security

Engineering at Meta, Mar 9, 2026

How Advanced Browsing Protection Works in Messenger

Why it matters: It demonstrates how to implement privacy-preserving security features in end-to-end encrypted environments. Engineers can learn how to balance cryptographic privacy primitives like PIR and OPRF with the practical performance requirements of large-scale real-time messaging.

Messenger's Advanced Browsing Protection (ABP) uses Private Information Retrieval (PIR) to check links against a malicious URL database without revealing user activity to the server.
The system employs Oblivious Pseudorandom Functions (OPRF) to ensure the server cannot see the specific content of the client's query during the lookup process.
To handle URL prefix matching for subpaths, the system groups links by domain rather than requiring exact matches, preventing multiple queries that could leak data.
ABP addresses the privacy-efficiency tradeoff by sharding the database into buckets, carefully managing the number of bits leaked to the server to optimize performance.
The architecture is designed to scale to millions of potentially malicious websites while maintaining low latency for users within end-to-end encrypted chats.
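
The bucket tradeoff can be illustrated with a hash-prefix scheme: the client reveals only the first few bits of a hash of the domain, and the server answers from the matching bucket. This sketch shows the bucketing step alone and assumes plain SHA-256 for simplicity; it does not implement PIR or OPRF, and the function names and the 32-bit cap are hypothetical.

```python
import hashlib


def bucket_id(domain, leaked_bits):
    """Bucket index formed by the first `leaked_bits` bits of SHA-256(domain)."""
    if not 0 <= leaked_bits <= 32:
        raise ValueError("this sketch supports 0 to 32 leaked bits")
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    prefix = int.from_bytes(digest[:4], "big")  # first 32 bits of the hash
    return prefix >> (32 - leaked_bits)


def build_buckets(blocklist_domains, leaked_bits):
    """Server side: group blocklisted domains by their bucket index."""
    buckets = {}
    for domain in blocklist_domains:
        buckets.setdefault(bucket_id(domain, leaked_bits), set()).add(domain.lower())
    return buckets
```

Each extra leaked bit halves the expected bucket size, which makes the private lookup cheaper, but it also doubles the number of buckets the server can tell apart, so the server learns more about which domain was queried.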

#security #dist #data

Pinterest Engineering, Mar 6, 2026

Unified Context-Intent Embeddings for Scalable Text-to-SQL

Why it matters: Scaling Text-to-SQL in large enterprises fails with simple RAG due to schema complexity. By encoding historical analyst intent and governance metadata into embeddings, engineers can build agents that provide trustworthy, context-aware queries instead of just syntactically correct ones.

Pinterest evolved its Text-to-SQL system into a production Analytics Agent by focusing on analytical intent rather than just raw SQL syntax.
The system utilizes unified context-intent embeddings, which translate historical SQL queries into semantically rich natural language descriptions using LLMs.
A three-step pipeline injects domain context, such as glossary terms and metric definitions, before converting SQL to structured text summaries.
Retrieval is enhanced by structural and statistical patterns, extracting validated join keys and aggregation logic from historical query data.
A governance-aware ranking system prioritizes trustworthy data by incorporating table tiers, usage signals, and documentation quality from the PinCat catalog.
This approach addresses the challenges of a massive data warehouse by grounding AI outputs in patterns that have historically worked for human analysts.
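
A governance-aware ranking step might blend retrieval similarity with trust signals along these lines. The weights, the `1 / tier` transform, the field names, and the table names in the usage are assumptions for illustration; the summary does not describe Pinterest's actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    table: str
    similarity: float   # embedding similarity to the question, 0..1
    tier: int           # 1 = most trusted (assumed convention)
    usage: float        # normalized usage signal, 0..1
    doc_quality: float  # normalized documentation quality, 0..1


def governance_score(candidate, weights=(0.6, 0.2, 0.1, 0.1)):
    """Blend similarity with governance signals; the weights are hypothetical."""
    w_sim, w_tier, w_usage, w_doc = weights
    tier_signal = 1.0 / candidate.tier  # tier 1 -> 1.0, tier 3 -> about 0.33
    return (
        w_sim * candidate.similarity
        + w_tier * tier_signal
        + w_usage * candidate.usage
        + w_doc * candidate.doc_quality
    )


def rank(candidates):
    """Order candidates from most to least trustworthy-and-relevant."""
    return sorted(candidates, key=governance_score, reverse=True)
```

With these weights, a well-documented tier-1 table can outrank a scratch table that matches the question slightly better on similarity alone, which is the behavior the summary attributes to governance-aware ranking.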

#data #mlp

Netflix Tech Blog, Mar 6, 2026

Scaling Global Storytelling: Modernizing Localization Analytics at Netflix

Why it matters: Scaling localization requires moving from siloed data pipelines to a centralized architecture. By consolidating business logic and focusing on backend reliability, engineers reduce technical debt and ensure data consistency across global teams while unlocking granular user behavior insights.

Netflix modernized its localization analytics by consolidating fragmented pipelines and siloed dashboards into a unified backend architecture.
The team implemented a 'write once, read many' strategy, centralizing complex business logic into core tables to ensure data consistency across domains.
An audit of over 40 tools led to the prioritization of backend consolidation over frontend patches to reduce long-term maintenance burdens.
They addressed 'Not-So-Tech Debt' by improving the user experience and creating intuitive metrics like unified Language Asset Consumption.
Future initiatives include event-level analytics to capture granular data, such as subtitle reading speed, to optimize member engagement and style guidelines.

#data #culture

Cloudflare Blog, Mar 6, 2026

From the endpoint to the prompt: a unified data security vision in Cloudflare One

Why it matters: This unified approach addresses the 'endpoint-to-prompt' challenge, ensuring security policies follow data across tools and AI interfaces. For engineers, it simplifies visibility and control over sensitive information without sacrificing productivity or creating siloed security gaps.

Cloudflare One introduces browser-based RDP clipboard controls to manage data movement between local devices and remote sessions.
Operation mapping now enriches logs with specific SaaS actions, such as 'SendPrompt' in ChatGPT, to simplify policy authoring and forensic analysis.
Endpoint DLP is being integrated into the Cloudflare One Client to protect data in use, specifically monitoring and controlling OS clipboard activity.
New CASB integrations provide security scanning for Microsoft 365 Copilot, identifying sensitive data risks within AI-driven workflows.
The unified vision aims to secure data across its entire lifecycle: in transit, at rest, in use on endpoints, and at the AI prompt interface.

#security #data

Salesforce Engineering, Mar 5, 2026

How Data 360 Optimized Kubernetes Scheduling Architecture, Delivering 13% Cost Savings

Why it matters: Optimizing Kubernetes scheduling for bursty Spark workloads resolves the conflict between cost efficiency and job stability. By moving from reactive consolidation to proactive bin-packing, engineers can achieve significant cost savings without triggering disruptive pod evictions.

Salesforce's Data 360 team optimized Kubernetes scheduling for Spark workloads, managing 2 million daily applications at global scale.
The default LeastAllocated strategy caused node fragmentation by spreading executors across the cluster, leaving many nodes underutilized.
Reactive autoscaling with Karpenter led to job instability, as evicting executors for consolidation triggered expensive Spark task retries.
The team implemented a custom scheduler using the MostAllocated scoring strategy via the NodeResourcesFit plugin to prioritize high-density bin-packing.
This proactive placement logic ensures executors are packed onto existing nodes before spinning up new capacity, reducing fragmentation.
The architectural shift delivered a 13% reduction in infrastructure costs while maintaining high reliability for critical data workloads.
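
The difference between the two scoring strategies can be shown with a simplified, single-resource model. This is a sketch of the scoring idea only, not the kube-scheduler source: the real NodeResourcesFit plugin scores several resources with configurable weights, and the node names and CPU figures in the usage are made up.

```python
def least_allocated_score(requested, allocatable):
    """Favor emptier nodes: higher score for more free capacity (spreading)."""
    return 100 * (allocatable - requested) / allocatable


def most_allocated_score(requested, allocatable):
    """Favor fuller nodes: higher score for higher utilization (bin-packing)."""
    return 100 * requested / allocatable


def pick_node(nodes, pod_cpu, score_fn):
    """Pick the best-scoring node that fits; nodes maps name -> (used, allocatable)."""
    best_name, best_score = None, -1.0
    for name, (used, allocatable) in sorted(nodes.items()):
        requested = used + pod_cpu  # utilization if the pod landed on this node
        if requested > allocatable:
            continue  # the pod does not fit here
        score = score_fn(requested, allocatable)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Given a nearly full node and a nearly empty one, the least-allocated score sends a new executor to the empty node, while the most-allocated score fills the busy node first and leaves the empty one free to be removed without evicting anything.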

#data #finops #dist
