Curated topic
Why it matters: As open source scales globally and AI-generated contributions surge, engineers must shift from ad-hoc management to formal governance and automated triage. This shift is vital for building sustainable projects that can absorb the increased volume without burning out maintainers.
Why it matters: Claude Sonnet 4.6 brings frontier-level reasoning and a 1M token context window to Microsoft Foundry. For engineers, this enables more efficient large-scale code analysis, sophisticated browser automation, and better cost-performance control for agentic workflows in enterprise environments.
Why it matters: This approach demonstrates how to scale LLM-driven automation by replacing black-box fine-tuning with deterministic DSLs. It ensures reliability and debuggability for mission-critical workflows while significantly reducing the operational overhead of model maintenance.
Why it matters: Transitioning to GPU serving for lightweight ranking allows engineers to deploy sophisticated architectures like MMoE-DCN. This shift significantly improves prediction accuracy and business metrics without sacrificing the strict latency requirements of real-time recommendation systems.
Why it matters: GitHub Agentic Workflows lower the barrier for complex repository automation by replacing rigid YAML with intent-driven Markdown. This enables 'Continuous AI,' allowing teams to automate cognitive tasks like issue triage and CI debugging while maintaining strict security and audit guardrails.
Why it matters: Scaling LLM post-training requires solving complex distributed systems problems like GPU synchronization. This framework allows engineers to focus on model innovation rather than infrastructure, enabling faster iteration on domain-specific AI experiences at scale.
Why it matters: As AI models scale to trillions of parameters, low-bit inference is essential for maintaining low latency and cost-efficiency. It allows engineers to deploy sophisticated models on existing hardware by optimizing memory usage and maximizing throughput via specialized GPU cores.
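The memory argument behind low-bit inference can be sketched with a toy symmetric int8 weight quantizer. This is an illustrative example of the general technique, not any specific engine's kernel; the matrix shape and quantization scheme are assumptions for demonstration.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric per-tensor quantization: map [-max, max] onto [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A hypothetical 4096x4096 weight matrix, as found in a transformer layer.
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)

# float32 (4 bytes/weight) -> int8 (1 byte/weight): 4x memory reduction,
# which is what lets larger models fit on existing hardware.
print(w.nbytes // q.nbytes)  # 4
err = np.abs(w - dequantize(q, scale)).mean()
```

Real deployments go further (int4, grouped scales, specialized tensor-core kernels), but the core trade of a small, bounded rounding error for a large memory and bandwidth win is the same.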
Why it matters: Pantone's approach provides a blueprint for scaling niche domain expertise via agentic AI. It demonstrates how a multi-agent architecture supported by a robust NoSQL database like Azure Cosmos DB can transform static data into interactive, high-value creative tools.
Why it matters: As AI agents become primary web consumers, serving them raw HTML is inefficient and costly. This feature treats agents as first-class citizens, reducing LLM token costs by 80% and improving parsing accuracy by delivering clean, structured data directly at the network edge, which also simplifies data ingestion pipelines for agent-friendly applications.
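The token savings are easy to see by comparing an HTML fragment with an equivalent structured-text rendering. The snippet and the 4-characters-per-token heuristic below are illustrative assumptions; actual savings depend on the page and the model's tokenizer.

```python
# Why structured text beats raw HTML for agent consumption: the markup
# (tags, classes, attributes) is pure token overhead for an LLM.
html = (
    '<div class="card"><h2 class="title">Pricing</h2>'
    '<ul class="list"><li class="item">Free: $0/mo</li>'
    '<li class="item">Pro: $20/mo</li></ul></div>'
)
markdown = "## Pricing\n- Free: $0/mo\n- Pro: $20/mo"

def approx_tokens(text: str) -> int:
    # Rough proxy: ~4 characters per token for English-like text.
    return max(1, len(text) // 4)

savings = 1 - approx_tokens(markdown) / approx_tokens(html)
print(f"approx token savings: {savings:.0%}")
```

Even on this tiny fragment the structured version is a fraction of the size; on real pages laden with navigation, scripts, and styling, the gap is what drives headline figures like an 80% cost reduction.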