Posts tagged with data

GitHub Engineering · Feb 11, 2026

GitHub availability report: January 2026

Why it matters: This report highlights the risks of major infrastructure upgrades and model configuration changes in high-scale environments. It underscores the importance of robust rollback procedures and the need for load testing to detect resource contention before production deployment.

GitHub Copilot experienced a significant outage on January 13 due to a configuration error during a model update, with error rates peaking at 100%.
The Copilot recovery was delayed by secondary availability issues with upstream provider OpenAI's GPT-4.1 model.
On January 15, a major version upgrade to data store infrastructure caused resource contention, leading to widespread latency across GitHub services.
The infrastructure incident impacted 1.8% of web and API requests, primarily affecting unauthenticated users through slow queries and timeouts.
Both incidents were mitigated by rolling back to previous stable versions. GitHub is now working on improved high-load validation and configuration safeguards.

#sre #data #mlp

Salesforce Engineering · Feb 11, 2026

Against the Clock: How Data 360 Launched the Informatica Help Agent in 24 Days

Why it matters: This article demonstrates how a robust data foundation like Data 360 enables rapid AI deployment. It provides a blueprint for handling large-scale unstructured data and meeting aggressive deadlines through architectural reuse and automated data preparation.

Leveraged Data 360 to unify and index over 100,000 unstructured documents into a searchable knowledge base for AI agents.
Met a strict 24-day post-acquisition deadline by prioritizing production-grade foundations over complex edge cases.
Automated the cleanup of raw HTML documentation, removing noise like headers and navigation menus to improve retrieval precision.
Used a sitemap-crawling feature and Python workflows to ingest diverse content sources into a standardized format.
Implemented metadata tagging and optimized chunking strategies to handle complex product versioning and ensure high retrieval accuracy.
Achieved an 80% resolution rate with only 5% human escalation, demonstrating the effectiveness of the data-centric approach.
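
The HTML cleanup step described above can be illustrated with a minimal sketch. It uses only Python's standard-library `html.parser` and is not the pipeline from the article: the `NOISE_TAGS` set, the `ContentExtractor` class, and the `clean_html` function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Tags treated as page chrome in this sketch (an assumption, not the article's list).
NOISE_TAGS = {"header", "nav", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # number of noise tags currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when we are outside every noise tag.
        if text and self.noise_depth == 0:
            self.parts.append(text)


def clean_html(raw_html: str) -> str:
    """Return the document text with header/nav/footer noise removed."""
    parser = ContentExtractor()
    parser.feed(raw_html)
    parser.close()
    return "\n".join(parser.parts)
```

Calling `clean_html` on a page returns one line per retained text node, which could then be passed to whatever chunking and indexing step follows.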

#data #mlp

Salesforce Engineering · Feb 9, 2026

How Agentic Memory Enables Durable, Reliable AI Agents Across Millions of Enterprise Users

Why it matters: This architecture solves the statelessness problem in AI agents, enabling long-term context and reliability at scale. It provides a blueprint for building governable, auditable AI systems that maintain user trust while reducing prompt noise and latency through structured memory layers.

Agentic Memory transforms stateless AI agents into durable collaborators by externalizing memory into a structured, persistent data layer linked to a profile graph.
The architecture separates short-term session context from long-term memory, ensuring continuity across different communication channels and sessions.
To ensure reliability, the system uses a pipeline with confidence scoring, write/read gates, and hybrid semantic validation to filter and update memory records.
Adaptive context allows agents to dynamically prioritize and prune information in real time, reducing latency and noise compared to raw prompt injection.
Structured reasoning and session-level tracing provide an auditable history of agent decisions, making AI behavior explainable and compliant with enterprise standards.
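
One way to picture the confidence-scored write and read gates is the sketch below. It is an illustration under stated assumptions, not the Agentic Memory implementation: `MemoryRecord`, `MemoryStore`, the thresholds, and the rule that a stronger existing record wins are all hypothetical, and the semantic validation step is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryRecord:
    key: str
    value: str
    confidence: float  # score in [0, 1] from an upstream scorer (hypothetical)


class MemoryStore:
    def __init__(self, write_threshold: float = 0.7):
        self.write_threshold = write_threshold
        self.records: Dict[str, MemoryRecord] = {}

    def write(self, candidate: MemoryRecord) -> bool:
        """Write gate: drop low-confidence candidates and keep the stronger record."""
        if candidate.confidence < self.write_threshold:
            return False
        existing = self.records.get(candidate.key)
        if existing is not None and existing.confidence > candidate.confidence:
            return False
        self.records[candidate.key] = candidate
        return True

    def read(self, key: str, min_confidence: float = 0.0) -> Optional[str]:
        """Read gate: surface a record only if it meets the caller's confidence bar."""
        record = self.records.get(key)
        if record is None or record.confidence < min_confidence:
            return None
        return record.value
```

In this sketch a caller with stricter requirements passes a higher `min_confidence` on read, so the same stored memory can be visible to one task and withheld from another.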

#mlp #data #dist

Microsoft Azure Blog · Feb 9, 2026

Five Reasons to attend SQLCon

Why it matters: This event represents a critical convergence of traditional SQL expertise and modern AI-driven data platforms. It provides engineers with direct access to product teams and hands-on training to align their data strategy with the latest advancements in Azure and Microsoft Fabric.

SQLCon is co-located with the Microsoft Fabric Community Conference, offering dual access to SQL and Fabric ecosystems.
The event features 50 technical sessions covering SQL Server, Azure SQL, performance tuning, and security governance.
Hands-on workshops provide repeatable scripts and patterns for database migration, modernization, and AI integration.
Over 30 members of the SQL product team will be present to share roadmap updates and engineering insights.
Sessions will demonstrate new capabilities in SQL tooling, including Copilot integrations and Fabric SQL experiences.

#data #culture

Microsoft Azure Blog · Feb 5, 2026

Claude Opus 4.6: Anthropic’s powerful model for coding, agents, and enterprise workflows is now available in Microsoft Foundry

Why it matters: This integration brings Anthropic's most advanced reasoning to Azure, enabling engineers to build secure, agentic workflows with a 1M token context window. It simplifies the path to production by combining frontier intelligence with enterprise-grade governance and data connectivity.

Claude Opus 4.6 is now available in Microsoft Foundry, integrating Anthropic's most advanced reasoning model with Azure's secure infrastructure.
The model features a 1M token context window (beta) and 128K max output, optimized for large-scale codebases and complex document analysis.
Integration with Foundry IQ enables agents to access and act on data across Microsoft 365, Fabric, and the web.
Engineers can leverage the model for autonomous coding tasks, including refactoring, bug detection, and full-lifecycle development.
The platform provides enterprise-grade governance, access controls, and operational tools to accelerate the transition from experimentation to production.
Specific industry applications include high-context financial analysis, legal drafting, and cybersecurity workflows.

#mlp #data #security

Pinterest Engineering · Feb 5, 2026

Next Generation DB Ingestion at Pinterest

Why it matters: Transitioning from batch to real-time ingestion is critical for modern data-driven apps. Pinterest's architecture shows how to use CDC and Iceberg to reduce latency from days to minutes while cutting costs and ensuring compliance through efficient row-level updates and unified pipelines.

Pinterest replaced fragmented, high-latency batch ingestion with a unified CDC-based framework using Flink, Spark, and Apache Iceberg.
The system captures changes from MySQL, TiDB, and KVStore via a custom CDC service, writing events to Kafka with sub-second latency.
A dual-table architecture uses append-only CDC tables for change logs and Base tables for mirrored snapshots updated via Spark's MERGE INTO.
Standardizing on Iceberg's Merge-on-Read (MOR) strategy significantly reduced storage and compute costs compared to Copy-on-Write (COW).
The framework supports row-level deletions natively, improving data compliance and handling petabyte-scale data across thousands of pipelines.
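
The dual-table idea (an append-only change log merged into a mirrored Base snapshot) can be sketched in plain Python. This is a conceptual stand-in for a `MERGE INTO` statement, not Spark or Iceberg code, and the event fields (`op`, `id`, `seq`, `row`) are assumptions made for the example.

```python
def merge_into(base, cdc_events):
    """Apply CDC events to a base snapshot keyed by primary key.

    Each event is a dict with hypothetical fields:
    op ("insert" | "update" | "delete"), id, seq (ordering), row.
    """
    # Deduplicate: keep only the latest event per key, ordered by sequence number.
    latest = {}
    for event in cdc_events:
        current = latest.get(event["id"])
        if current is None or event["seq"] > current["seq"]:
            latest[event["id"]] = event

    merged = dict(base)  # shallow copy so the input snapshot is left untouched
    for key, event in latest.items():
        if event["op"] == "delete":
            merged.pop(key, None)       # matched and deleted: remove the row
        else:
            merged[key] = event["row"]  # matched: update; not matched: insert
    return merged
```

The change log itself is never modified here, which mirrors the append-only role of the CDC table; only the returned snapshot reflects deletes and updates.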

#data #dist #finops

Salesforce Engineering · Feb 5, 2026

Re-Architecting Enterprise Applications for an Agentic System of Action

Why it matters: This shift moves beyond AI wrappers to fundamental architectural changes. It enables software to handle edge cases and cross-domain coordination autonomously, reducing the need for human intervention while maintaining reliability through governed action contracts.

Enterprise software is shifting from static systems of record to dynamic systems of action by integrating agentic reasoning into core architectures.
Deterministic workflows are combined with agents in a hybrid model where agents handle situational judgment within governed, validated execution paths.
Applications must be decomposed into explicit, machine-readable actions that encode business intent, preconditions, and constraints.
One-way system events are reimagined as conversational entry points, allowing agents to maintain state and continuity across multiple communication channels.
Scalable orchestration requires managing interdependent agents that share structured context while balancing reasoning fidelity with latency and cost.
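
A machine-readable action that encodes intent, preconditions, and constraints might look roughly like the sketch below. The `ActionContract` class, the `run_action` helper, and the refund example (including the 500 limit) are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Check = Tuple[str, Callable[[Context], bool]]  # (description, predicate)


@dataclass
class ActionContract:
    name: str
    intent: str
    preconditions: List[Check] = field(default_factory=list)
    constraints: List[Check] = field(default_factory=list)

    def violations(self, context: Context) -> List[str]:
        """Return the description of every failed precondition or constraint."""
        checks = self.preconditions + self.constraints
        return [description for description, check in checks if not check(context)]


def run_action(contract: ActionContract, context: Context,
               handler: Callable[[Context], Any]) -> Any:
    """Execute the handler only when the contract holds for this context."""
    failed = contract.violations(context)
    if failed:
        raise ValueError(f"{contract.name} blocked: {', '.join(failed)}")
    return handler(context)


# Hypothetical refund action; names and the 500 limit are made up for illustration.
refund = ActionContract(
    name="issue_refund",
    intent="Return money to a customer for a delivered order",
    preconditions=[("order is delivered", lambda ctx: ctx["status"] == "delivered")],
    constraints=[("amount within agent limit", lambda ctx: ctx["amount"] <= 500)],
)
```

Here the agent decides whether a refund is appropriate, while the contract decides whether it is allowed, which is one reading of judgment operating inside a governed execution path.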

#mlp #dist #data

Microsoft Azure Blog · Feb 4, 2026

Enhanced storage resiliency with Azure NetApp Files Elastic zone-redundant service

Why it matters: It provides a managed, high-availability storage solution that ensures zero data loss and seamless failover across availability zones. This simplifies disaster recovery for mission-critical workloads like SAP HANA and SQL Server while optimizing costs and metadata performance.

Azure NetApp Files Elastic ZRS provides synchronous data replication across three or more availability zones within a single region.
The service features automated, service-managed failover that maintains the same mount targets and endpoints during zone-level outages.
It supports NFS and SMB protocols with enterprise-grade management capabilities including snapshots, clones, and storage tiering.
The architecture is cost-optimized, allowing for volumes as small as 1 GiB and reducing costs compared to manual cross-zone replication.
Future updates will introduce simultaneous multi-protocol access (NFS, SMB, and Object REST API) and custom region pairs for disaster recovery.
Optimized for metadata-heavy workloads, the service uses a shared QoS architecture to maintain low-latency operations during file enumeration.

#sre #dist #data

Cloudflare Blog · Feb 3, 2026

Improve global upload performance with R2 Local Uploads

Why it matters: Engineers can significantly reduce upload latency for global users without managing complex multi-region replication logic. It provides the performance of a local edge cache with the reliability and strong consistency of centralized object storage.

Cloudflare R2 launched Local Uploads in open beta to improve global write performance by up to 75%.
Data is initially written to a storage location near the client and then asynchronously replicated to the bucket's home region.
The system maintains strong consistency, ensuring objects are immediately accessible for reads after the initial write.
The architecture uses R2 Gateway Workers for routing and Durable Objects for distributed metadata management.
Synthetic benchmarks show Time to Last Byte (TTLB) dropping from 2 s to 500 ms for cross-region 5 MB uploads.
The feature is specifically designed for globally distributed workloads like media uploads and telemetry collection.

#dist #data #sre

Microsoft Azure Blog · Feb 2, 2026

PostgreSQL on Azure supercharged for AI

Why it matters: PostgreSQL is evolving into a central hub for AI development. By integrating vector search, LLM orchestration, and seamless IDE workflows directly into the managed database service, Microsoft reduces the friction of building and scaling intelligent, data-driven applications.

Azure HorizonDB is introduced as a PostgreSQL-compatible cloud service optimized for scale-out and ultra-low latency.
An integrated VS Code extension enables provisioning and managing PostgreSQL instances directly within the IDE.
In-database AI features allow developers to invoke LLMs via SQL for text classification and embedding generation.
DiskANN vector indexing and semantic ranking support high-performance similarity searches for AI agents.
Native Model Context Protocol (MCP) server support connects PostgreSQL directly to Microsoft Foundry's agent framework.
Zero-ETL mirroring to Microsoft Fabric and Parquet file support streamline real-time analytics and data movement.

#data #mlp #dist
