Curated topic

mlp

Posts tagged with mlp

GitHub EngineeringApr 17, 2026

Building an emoji list generator with the GitHub Copilot CLI

Why it matters: This demonstrates how AI-assisted development and specialized SDKs can drastically reduce the time needed to build functional internal tools. It highlights the shift from manual coding to high-level planning and architectural review using modern LLMs.

The project demonstrates using GitHub Copilot CLI's 'plan mode' with Claude Sonnet 4.6 to architect a new tool.
Integration of @github/copilot-sdk allows for intelligent text-to-emoji mapping within the application.
The terminal user interface (TUI) was built using @opentui/core for a responsive command-line experience.
Development involved an iterative process where the AI asked clarifying questions to refine the tech stack and project scope.
The final implementation was generated using Claude Opus 4.7, showcasing the speed of AI-assisted prototyping.

#mlp #culture

Read original

Cloudflare BlogApr 17, 2026

Introducing the Agent Readiness score. Is your site agent-ready?

Why it matters: As AI agents become primary web consumers, sites must transition from human-centric to machine-readable formats. Adopting these standards ensures content is accurately indexed by LLMs, reduces scraping overhead, and enables automated agentic workflows and commerce.

Cloudflare launched isitagentready.com, a tool that audits websites for AI agent compatibility across discoverability, content, access control, and capabilities.
A new Cloudflare Radar dataset tracks global adoption of AI standards, revealing that while 78% of sites use robots.txt, fewer than 4% support Markdown content negotiation.
The readiness score evaluates support for emerging standards like the Model Context Protocol (MCP), API Catalogs (RFC 9727), and Web Bot Auth.
Cloudflare overhauled its own developer documentation to serve as a model for agent-friendly design, utilizing Markdown and structured metadata to lower LLM processing costs.
The audit tool provides specific prompts that developers can give to coding agents to automatically implement missing standards and improve site scores.
The initiative introduces support for agentic commerce protocols, including x402 and the Universal Commerce Protocol (UCP), to facilitate automated transactions.

#mlp #data #dist

Read original

Cloudflare BlogApr 17, 2026

Agents that remember: introducing Agent Memory

Why it matters: Agent Memory solves the 'context rot' problem where LLM performance degrades as context windows grow. By providing a managed, retrieval-based persistent memory layer, engineers can build smarter agents that retain long-term knowledge across sessions without increasing token costs or latency.

Cloudflare's Agent Memory is a managed service providing persistent, retrieval-based storage for AI agents.
It mitigates 'context rot' by extracting key information from conversations instead of filling the context window.
The API supports core operations: ingest (bulk extraction), remember (explicit storage), and recall (synthesized retrieval).
It integrates directly with Cloudflare Workers via bindings and offers a REST API for external agent frameworks.
Memory profiles can be shared across multiple agents and users, enabling collective knowledge and tribal knowledge retention.

#mlp #data

Read original

Cloudflare BlogApr 17, 2026

Introducing Flagship: feature flags built for the age of AI

Why it matters: Traditional feature flags add latency or fail in serverless environments. Flagship integrates flags into the edge runtime, enabling safe, high-performance deployments and autonomous AI releases without manual intervention or performance penalties.

Cloudflare introduces Flagship, a native feature flag service built on the OpenFeature CNCF standard.
Designed for AI-driven development, enabling autonomous agents to safely deploy and test code via controlled rollouts.
Eliminates external network latency by evaluating flags directly at the edge using Cloudflare Workers KV and Durable Objects.
Provides a native Worker binding that allows for typed flag evaluation without HTTP overhead or SDK initialization delays.
Uses a distributed architecture where Durable Objects act as the source of truth and KV handles global replication.

#dist #sre #mlp

Read original

Cloudflare BlogApr 17, 2026

Redirects for AI Training enforces canonical content

Why it matters: AI models often provide outdated information because crawlers ignore standard SEO signals. This tool ensures AI agents ingest current data by enforcing canonical paths via redirects, improving the accuracy of LLM-generated answers about your technical products.

Cloudflare's new 'Redirects for AI Training' feature converts existing canonical tags into HTTP 301 redirects specifically for verified AI crawlers.
The tool addresses the failure of AI bots to honor standard SEO signals like noindex meta tags and deprecation banners on legacy documentation.
It utilizes the cf.verified_bot_category field to identify training bots like GPTBot and ClaudeBot while leaving human and search engine traffic unaffected.
The feature automatically detects non-self-referencing canonical tags in HTML and issues redirects at the edge before the response is served.
This approach scales better than manual redirect rules by leveraging existing SEO infrastructure already present on most web pages.
Cloudflare Radar now includes response status code analysis to help developers monitor how AI crawlers interact with their content.

#mlp #dist #data

Read original

Cloudflare BlogApr 17, 2026

Unweight: how we compressed an LLM 22% without sacrificing quality

Why it matters: Unweight addresses the memory bandwidth bottleneck in LLM inference without the quality loss of quantization. By enabling lossless compression and on-chip decompression, engineers can fit more models on existing hardware and reduce latency, making high-performance inference more cost-effective.

Cloudflare's Unweight achieves 15-22% lossless compression of LLM weights, ensuring bit-exact outputs without specialized hardware.
It targets the memory bandwidth bottleneck by compressing the redundant 8-bit exponent field in BF16 weights using Huffman coding.
Decompression occurs in fast on-chip shared memory, feeding tensor cores directly to minimize latency and avoid main memory round-trips.
The system selectively compresses MLP weights, saving approximately 3GB of VRAM for Llama-3.1-8B models.
An autotuner optimizes execution strategies based on weight matrix shape and batch size for maximum efficiency.

#mlp #dist #finops

Read original

Engineering at MetaApr 16, 2026

Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

Why it matters: At hyperscale, even 0.1% regressions waste massive power. Meta’s AI agents automate performance optimization, saving hundreds of megawatts and thousands of engineering hours. This demonstrates how LLMs can encode domain expertise to manage infrastructure efficiency autonomously.

Meta developed a unified AI agent platform to automate both proactive performance optimization (offense) and regression mitigation (defense).
The platform utilizes 'MCP Tools' for standardized code execution and 'Skills' to encode senior engineers' domain expertise into LLM reasoning.
Automated investigation has compressed manual regression analysis from approximately 10 hours down to just 30 minutes.
The system has recovered hundreds of megawatts of power, equivalent to the annual energy consumption of hundreds of thousands of homes.
AI agents now handle the entire pipeline from identifying an efficiency opportunity to generating a ready-to-review pull request.

#sre #finops #mlp

Read original

Cloudflare BlogApr 16, 2026

Cloudflare’s AI Platform: an inference layer designed for agents

Why it matters: Building agentic AI requires chaining multiple models, which increases latency and failure risks. Cloudflare’s unified API simplifies multi-provider management, provides cost transparency, and offers a low-latency path for custom and third-party models at the edge.

Cloudflare has unified AI Gateway and Workers AI into a single inference layer, allowing developers to call third-party models like OpenAI and Anthropic via the AI.run() binding.
The platform supports over 70 models from 12+ providers, enabling multimodal applications involving text, image, video, and speech through a single API and credit system.
A new 'Bring Your Own Model' feature leverages Replicate’s Cog technology to containerize and deploy custom fine-tuned models directly to Cloudflare's global network.
AI Gateway provides centralized cost management and observability, allowing teams to track AI spend using custom metadata like team IDs or specific application workflows.
The infrastructure is optimized for agentic AI, featuring automatic retries on upstream failures and low-latency paths to minimize time-to-first-token in chained model calls.

#mlp #dist #finops

Read original

Cloudflare BlogApr 16, 2026

Building the foundation for running extra-large language models

Why it matters: This article provides a blueprint for optimizing LLM infrastructure by decoupling inference stages. It demonstrates how to maximize expensive GPU utilization and reduce latency for long-context agentic applications through clever software engineering and cache management.

Implemented Prefill Decode (PD) disaggregation to separate compute-bound prefill tasks from memory-bound decode tasks, maximizing GPU efficiency.
Developed a token-aware load balancer that estimates in-flight tokens to spread load evenly across prefill and decode endpoint pools.
Achieved a 3x improvement in intertoken latency, reducing p90 time per token from ~100ms to 20-30ms.
Utilized x-session-affinity headers to improve prompt caching, increasing input token cache hit ratios from 60% to 80%.
Integrated Moonshot AI’s Mooncake Transfer Engine to facilitate efficient KV cache sharing across multiple GPUs for extra-large model instances.
Optimized infrastructure specifically for agentic workflows which involve high input token volumes and long-context system prompts.

#mlp #dist #sre

Read original

Cloudflare BlogApr 16, 2026

Cloudflare’s AI Platform: an inference layer designed for agents

Why it matters: This unified inference layer simplifies building complex AI agents by eliminating provider lock-in and centralizing cost management. It allows engineers to switch models with one line of code while ensuring high reliability and low latency across distributed global infrastructure.

Cloudflare AI Gateway now offers a unified API (AI.run) to access 70+ models from 12+ providers including OpenAI, Anthropic, and Google.
Centralized cost management allows developers to monitor AI spend across multiple providers using custom metadata for granular usage tracking.
New 'Bring Your Own Model' support leverages Replicate’s Cog technology to containerize and deploy custom or fine-tuned models to Workers AI.
Built-in reliability features include automatic retries on upstream failures and zero-setup default gateways to mitigate provider outages.
The platform is optimized for agentic workflows, reducing latency in multi-step inference chains through Cloudflare's global edge network.

#mlp #dist #finops

Read original

Page 2 of 27

Prev 1 2 3 4...27 Next