Curated topic
Why it matters: This article introduces A-SFT, a novel post-training algorithm for generative recommenders. It addresses key challenges like noisy reward models and lack of counterfactual data, offering a practical way to improve recommendation quality by better aligning models with user preferences.
Why it matters: This simplifies complex cloud-to-cloud data migrations, especially from AWS S3 to Azure Blob Storage, reducing operational overhead and costs. Engineers can now securely and efficiently move large datasets, accelerating multicloud strategies and leveraging Azure's advanced analytics and AI.
Why it matters: Engineers must process massive volumes of unstructured multimedia data efficiently. This integration demonstrates how specialized architectures can achieve deep multimodal understanding at exabyte scale while maintaining low computational overhead and high search relevance.
Why it matters: This article is crucial for engineers building GenAI products, demonstrating how to integrate privacy-aware infrastructure and data lineage to manage complex data flows, ensure compliance, and accelerate innovation responsibly.
Why it matters: This article details how Pinterest uses advanced ML and LLMs to understand complex user intent, moving beyond simple recommendations to goal-oriented assistance. It offers a practical blueprint for building robust, extensible recommendation systems from limited initial data.
Why it matters: DSF revolutionizes AI network scaling by overcoming traditional fabric limitations. Its disaggregated architecture, packet spraying, and advanced congestion control ensure high-performance, lossless connectivity for massive GPU clusters, crucial for the future of large-scale AI model training.
Why it matters: This article details how Netflix built a real-time distributed graph to unify disparate data from microservices, enabling complex relationship analysis and personalized experiences. It showcases a robust stream processing architecture for internet-scale data.
Why it matters: This article offers engineers actionable design principles to reduce IT hardware's environmental impact, fostering sustainability and cost savings through circularity and emissions reduction in data center infrastructure.
Why it matters: Postgres 18's new I/O methods offer performance gains, but their effectiveness depends heavily on storage architecture. Understanding the trade-offs between io_uring and worker processes helps engineers optimize database throughput and cost-efficiency for I/O-bound workloads.
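For context, the choice between these I/O methods is a single configuration setting. A minimal sketch of the relevant `postgresql.conf` fragment, with illustrative values (tune to your storage and workload):

```
# postgresql.conf — Postgres 18 asynchronous I/O (illustrative values)

# io_method selects the I/O implementation:
#   sync     — synchronous reads (pre-18 behavior)
#   worker   — dedicated I/O worker processes (the default)
#   io_uring — Linux io_uring, issuing async I/O from the backend itself
io_method = io_uring

# Used only with io_method = worker: number of I/O worker processes
io_workers = 3
```

Benchmarking both `worker` and `io_uring` against your actual storage (local NVMe vs. network-attached volumes) is the practical way to pick, since the article's point is that neither wins universally.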
Why it matters: Building reliable LLM applications requires moving beyond ad-hoc testing. This framework shows engineers how to implement a rigorous, code-like evaluation pipeline to manage the unpredictability of probabilistic AI components and ensure consistent performance at scale.
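To make the "code-like evaluation pipeline" idea concrete, here is a minimal, illustrative sketch (not the article's actual framework): each eval case pairs a prompt with a deterministic check, and the suite reports a pass rate against a threshold rather than a single pass/fail, acknowledging that model outputs are probabilistic. The `fake_model` function is a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # deterministic assertion on the model output

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your provider's client.
    return "PARIS" if "capital of France" in prompt else "unknown"

def run_suite(model, cases, threshold=0.9):
    # Run every case, score the fraction that pass, and gate on a threshold.
    passed = sum(case.check(model(case.prompt)) for case in cases)
    rate = passed / len(cases)
    return rate, rate >= threshold

cases = [
    EvalCase("What is the capital of France?", lambda out: "paris" in out.lower()),
    EvalCase("Answer in one word: capital of France", lambda out: len(out.split()) == 1),
]

rate, ok = run_suite(fake_model, cases)
print(f"pass rate: {rate:.0%}, meets threshold: {ok}")
```

Run in CI like any test suite, this turns ad-hoc prompt checking into a repeatable, versioned gate on model behavior.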