Curated topic

data

Posts tagged with data

Cloudflare Blog · Dec 15, 2025

ChatGPT's rivals, Kwai's quiet rise: the top Internet services of 2025

Why it matters: This report offers insight into evolving user behavior, platform dominance, and emerging tech trends such as AI and digital finance. Engineers can use the data to inform product strategy and infrastructure planning, and to understand the competitive landscape of Internet services.

Cloudflare's 2025 report ranks top internet services based on anonymized DNS query data from its 1.1.1.1 resolver, highlighting shifts in popularity across nine categories.
Generative AI saw significant competition, with Claude, Gemini, and Perplexity challenging ChatGPT, and Gemini reaching the #2 spot by year-end.
The social media landscape shifted: Instagram rose to #5 overall, while TikTok and X declined, and Kwai gained traction in emerging markets.
Asian e-commerce platforms like Shopee and Temu joined Amazon in the global top 3, indicating a significant regional climb.
Google, Facebook, and Apple remained the top three overall internet services, with Microsoft and Instagram showing strong growth in their rankings.
Digital finance services like Stripe and neobank Nubank demonstrated continued dominance and growth, alongside a surge in cryptocurrency traffic for platforms like OKX.

#data #mlp

Cloudflare Blog · Dec 15, 2025

The 2025 Cloudflare Radar Year in Review: The rise of AI, post-quantum, and record-breaking DDoS attacks

Why it matters: This review offers critical insights into evolving Internet trends, including AI's impact on web traffic, the rise of post-quantum security, and network performance, essential for engineers building and securing online services.

Global Internet traffic grew 19% in 2025, with Starlink traffic doubling and Googlebot leading verified bot activity for search and AI training.
Post-quantum encrypted web traffic reached 52% of human-generated requests, highlighting a significant shift in security adoption.
AI-related crawling surged, with Googlebot's dual-purpose crawls dominating and "user action" crawling increasing 15x. AI bots were also frequently blocked via robots.txt.
Meta's llama-3-8b-instruct was the most popular model on Workers AI, primarily used for text generation tasks.
iOS devices accounted for 35% of mobile traffic globally, while HTTP/2 and HTTP/3 adoption continued to rise.
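
The robots.txt blocking mentioned above can be illustrated with a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example URL, and the helper name `is_allowed` are assumptions made for illustration; they are not taken from the Cloudflare report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is blocked, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/posts/1"))
```

Note that robots.txt is advisory: it only has an effect when the crawler chooses to honor it.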

#security #data #mlp

PlanetScale Tech Blog · Dec 15, 2025

$50 PlanetScale Metal is GA for Postgres

Why it matters: Engineers can now access high-performance, NVMe-backed Postgres hardware at a fraction of the previous cost. The decoupling of storage and compute allows for better resource optimization and cost efficiency for diverse workloads, from small high-traffic apps to large data-heavy systems.

PlanetScale Metal for Postgres now offers smaller instances starting at $50/month with 1GiB RAM.
Storage and compute are now decoupled, allowing for up to 300GB of storage per GiB of RAM.
All instances utilize locally attached NVMe drives to ensure low latency and high reliability.
Users can choose from eight storage capacities ranging from 10GB to 1.2TB across various CPU/RAM tiers.
The service supports online resizing and is available on AWS with both Intel and ARM CPU options.

#data #finops #sre

Salesforce Engineering · Dec 12, 2025

4x Faster: How AI-Assisted Development Accelerated Building New SQL Dialects for Zero Copy Connectors

Why it matters: Scaling data virtualization across 100+ platforms requires handling diverse SQL semantics. By combining AI-driven configuration with massive automated validation, engineers can accelerate connector development by 4x while ensuring cross-engine query correctness and consistency.

Transitioned from manual C++ SQL transformations to a JSON-based configuration-driven dialect framework to scale connector development.
Leveraged AI agents to interpret remote SQL documentation and generate approximately 2,000 lines of JSON configuration per dialect.
Implemented a test-driven AI workflow that uses an ordered suite of tests to refine dialect sections and prevent regressions.
Developed an automated validation pipeline executing 25,000 queries to compare Hyper's local execution against remote engine results.
Created a closed-loop feedback system where remote error messages and result deviations are fed back into the AI model for iterative refinement.
Achieved a 4x reduction in engineering effort, cutting dialect construction time from 40 days to 10 days per engine.
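
The result-comparison step of such a validation pipeline might look roughly like the sketch below. It is not Salesforce's implementation: the function names, the order-insensitive multiset comparison, and the float rounding tolerance are all assumptions chosen for illustration.

```python
from collections import Counter


def normalize(row, ndigits=6):
    """Round floats so tiny numeric differences between engines are ignored."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def compare_results(local_rows, remote_rows, ndigits=6):
    """Compare two result sets as multisets of rows (row order is ignored).

    Returns the rows that appear on only one side, which could then be
    reported back as a deviation for the dialect under test.
    """
    local = Counter(normalize(r, ndigits) for r in local_rows)
    remote = Counter(normalize(r, ndigits) for r in remote_rows)
    return {
        "only_local": sorted((local - remote).elements(), key=repr),
        "only_remote": sorted((remote - local).elements(), key=repr),
    }


def results_match(local_rows, remote_rows, ndigits=6):
    """True when both engines returned the same rows, up to rounding."""
    diff = compare_results(local_rows, remote_rows, ndigits)
    return not diff["only_local"] and not diff["only_remote"]
```

Ignoring row order is only appropriate for queries without an ORDER BY clause; ordered queries would need a positional comparison instead.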

#data #dist #mlp

Microsoft Azure Blog · Dec 11, 2025

Introducing GPT-5.2 in Microsoft Foundry: The new standard for enterprise AI

Why it matters: This article introduces GPT-5.2, a new enterprise AI model available in Microsoft Foundry and designed for complex problem-solving and agentic execution. It offers advanced reasoning, context handling, and robust governance, setting a new standard for reliable and secure AI development in professional settings.

GPT-5.2 is generally available in Microsoft Foundry, designed for enterprise AI with advanced reasoning and agentic capabilities.
It offers deeper logical chains, richer context handling, and agentic execution to generate shippable artifacts like code and design docs.
Built on a new architecture, it delivers superior performance, efficiency, and reasoning depth, with enhanced safety and integrations.
Two versions are available: GPT-5.2 for complex problem-solving and GPT-5.2-Chat for efficient everyday tasks and learning.
Optimized for agent scenarios, it supports multi-step logical chains, context-aware planning, and end-to-end task coordination.
Includes enterprise-grade safety, governance, and managed identities for secure and compliant AI adoption.
Enables building AI agents for analytics, app modernization, data pipelines, and customer experiences across industries.

#mlp #dist #data

Microsoft Azure Blog · Dec 11, 2025

Azure Storage innovations: Unlocking the future of data

Why it matters: These Azure Storage innovations provide engineers with enhanced scalability, performance, and simplified management for AI workloads, from training to inference, enabling more efficient development and deployment of advanced AI solutions.

Azure Blob Storage is enhanced for the entire AI lifecycle, offering exabyte scale, tens of Tbps of throughput, and millions of IOPS to power GPU-intensive AI model training and deployment.
Azure Managed Lustre (AMLFS) 2.0 (preview) provides a high-performance parallel file system for petabyte-scale AI training data, supporting 25 PiB namespaces and up to 512 GBps throughput, with Hierarchical Storage Management (HSM) integration for Azure Blob Storage.
AMLFS includes new auto-import and auto-export features to efficiently move data between Lustre and Blob Storage, optimizing GPU utilization and streamlining the AI data pipeline.
Premium Blob Storage delivers consistently low latency and up to 3x faster retrieval, which matters for AI inferencing workloads such as Retrieval-Augmented Generation (RAG) agents and for enterprise data security.
The LangChain Azure Blob Loader is introduced, offering improved security, memory efficiency, and up to 5x faster performance for open-source AI frameworks.
New AI-driven tools like Storage Discovery and Copilot simplify exabyte-scale data management and analysis through intuitive dashboards and natural language queries.

#data #mlp

Pinterest Engineering · Dec 10, 2025

LLM-Powered Relevance Assessment for Pinterest Search

Why it matters: This approach enables faster, more cost-effective evaluation of search ranking models in A/B tests. Engineers can detect smaller, more nuanced effects, accelerating product iteration and improving user experience by deploying features with higher confidence.

Pinterest uses fine-tuned open-source LLMs to automate search relevance assessment, overcoming the limitations of costly and slow human annotations.
The LLMs are trained on a 5-level relevance guideline using a cross-encoder architecture and comprehensive Pin textual features, supporting multilingual search.
This approach significantly reduces labeling costs and time, enabling much larger and more sophisticated stratified query sampling designs.
Stratified sampling, based on query interest and popularity, ensures sample representativeness and drastically reduces measurement variance.
The implementation led to a significant reduction in Minimum Detectable Effects (MDEs) from 1.3-1.5% to <= 0.25%, accelerating A/B experiment velocity and feature deployment.
Paired sampling and sDCG@K are used to measure the relevance impact of A/B experiments on search ranking.
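
To see why stratification reduces measurement variance, here is a minimal sketch of the textbook stratified mean estimator. The strata, weights, and scores are made-up illustrative values, and the function name is hypothetical; Pinterest's actual sampling design and metric computation are not shown in this summary.

```python
from statistics import mean, variance


def stratified_mean(strata):
    """Estimate a population mean from stratified samples.

    strata: list of (population_weight, sample_values) pairs, where the
    weights sum to 1 and each sample holds at least two values.
    Returns (estimate, variance_of_estimate).
    """
    estimate = sum(w * mean(vals) for w, vals in strata)
    # Only the within-stratum variance contributes to the estimator's variance.
    var = sum(w ** 2 * variance(vals) / len(vals) for w, vals in strata)
    return estimate, var


if __name__ == "__main__":
    # Two hypothetical query strata with very different relevance levels.
    strata = [(0.5, [0.8, 1.0, 1.2]), (0.5, [2.8, 3.0, 3.2])]
    est, var = stratified_mean(strata)

    # Treating the same six values as one simple random sample instead.
    pooled = [v for _, vals in strata for v in vals]
    srs_var = variance(pooled) / len(pooled)
    print(est, var, srs_var)
```

When strata differ strongly from each other but are homogeneous inside, the stratified variance is much smaller than the pooled one, which is what allows a smaller minimum detectable effect for the same labeling budget.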

#mlp #data

Microsoft Azure Blog · Dec 10, 2025

Actioning agentic AI: 5 ways to build with news from Microsoft Ignite 2025

Why it matters: This article details significant AI platform advancements from Microsoft Ignite, offering developers more model choices and improved semantic understanding for building robust, secure, and flexible AI applications and agents.

Microsoft Ignite 2025 showcased significant advancements in agentic AI and cloud solutions, emphasizing rapid developer adoption.
Microsoft Foundry now integrates Claude models (Sonnet, Opus) alongside OpenAI's GPT, providing developers with diverse model choices for AI application and agent development.
This model diversity in Microsoft Foundry offers flexibility, enterprise-grade security, compliance, and governance for building AI solutions.
New Microsoft IQ offerings aim to enhance semantic understanding, connecting productivity apps, analytics platforms, and AI development environments.

#mlp #data

Pinterest Engineering · Dec 8, 2025

How Pinterest Built a Real‑Time Radar for Violative Content using AI

Why it matters: This system provides real-time, statistically robust insights into content safety, enabling platforms to proactively identify and mitigate harms. It's crucial for maintaining user trust and scaling content moderation efficiently with AI.

Pinterest developed an AI-assisted system to measure "prevalence" of policy-violating content, focusing on the percentage of total views.
This system addresses the shortcomings of report-only metrics, which often miss under-reported harms and lack statistical power.
It utilizes ML-assisted sampling from daily user impressions, leveraging production risk scores for efficiency while ensuring unbiased prevalence estimates.
A multimodal LLM (vision + text) enables bulk labeling of sampled content, significantly reducing latency and cost compared to human review.
Inverse-probability weighting ensures unbiased, design-consistent prevalence metrics, decoupling measurement from enforcement model thresholds.
Continuous calibration, human validation, and periodic checks against SME-labeled gold sets maintain LLM accuracy and detect model drift.
The system provides daily, statistically powered insights for faster interventions and effective content safety tracking.
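
The role of inverse-probability weighting can be sketched as follows. This is a generic ratio (Hájek-style) estimator under assumed inputs, not Pinterest's exact formula: the tuple layout, the function names, and the sample values are hypothetical.

```python
def ipw_prevalence(samples):
    """View-weighted prevalence with inverse-probability weights.

    samples: iterable of (views, is_violative, inclusion_prob) tuples, where
    inclusion_prob is the probability that the item was drawn into the sample
    (for example, higher for items with high risk scores).
    """
    samples = list(samples)
    violative_views = sum(v * y / p for v, y, p in samples)
    total_views = sum(v / p for v, _, p in samples)
    return violative_views / total_views


def naive_prevalence(samples):
    """Unweighted share of violative views in the sample (biased if oversampled)."""
    samples = list(samples)
    return sum(v * y for v, y, _ in samples) / sum(v for v, _, _ in samples)


if __name__ == "__main__":
    # Hypothetical sample: the risky item was 5x more likely to be sampled.
    sample = [(100, 1, 0.5), (100, 0, 0.1)]
    print(naive_prevalence(sample))  # overstates prevalence
    print(ipw_prevalence(sample))    # corrects for the oversampling
```

Because each sampled item is divided by its own inclusion probability, the sampling scheme can favor high-risk content for labeling efficiency without inflating the resulting estimate.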

#mlp #data #security

Pinterest Engineering · Dec 5, 2025

Improving Quality of Recommended Content through Pinner Surveys

Why it matters: This article demonstrates a practical approach to de-biasing recommendation systems by integrating direct user feedback via surveys into ML model training. Engineers can learn how to move beyond pure engagement metrics to build more user-centric and high-quality content platforms.

Pinterest implemented in-app Pinner surveys to gather direct user feedback on content visual quality, moving beyond traditional engagement metrics.
The survey design collected at least 10 ratings per image for 5k Pins across diverse interest verticals, averaging scores to ensure data reliability and reduce subjectivity.
A machine learning model was trained using this aggregated survey data, mapping image embedding features to a single score (0-1) indicating perceived visual quality.
This ML model is integrated into Pinterest's core recommendation systems, including Homefeed, Related Pins, and Search, to promote higher quality content.
The approach aims to de-bias recommendation systems, prevent the promotion of low-quality "clickbait," and align content delivery with user well-being and satisfaction.
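
The label-aggregation step described above could be sketched like this. The 1-to-5 rating scale, the function name, and the input layout are assumptions; the summary only states that at least 10 ratings per image were averaged and that the model outputs a score between 0 and 1.

```python
from collections import defaultdict
from statistics import mean


def aggregate_labels(ratings, min_ratings=10, scale=(1, 5)):
    """Average survey ratings per Pin and rescale them to the range 0..1.

    ratings: iterable of (pin_id, score) pairs on the assumed rating scale.
    Pins with fewer than min_ratings responses are dropped as unreliable.
    """
    by_pin = defaultdict(list)
    for pin_id, score in ratings:
        by_pin[pin_id].append(score)

    lo, hi = scale
    return {
        pin_id: (mean(scores) - lo) / (hi - lo)
        for pin_id, scores in by_pin.items()
        if len(scores) >= min_ratings
    }


if __name__ == "__main__":
    ratings = [("pin_a", 5)] * 10 + [("pin_b", 1)] * 9 + [("pin_c", 3)] * 12
    print(aggregate_labels(ratings))  # pin_b is dropped: only 9 ratings
```

The resulting per-Pin scores would serve as regression targets for a model that takes image embeddings as input.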

#mlp #data
