Posts tagged with dist
Why it matters: This article provides a detailed blueprint for achieving high availability and fault tolerance for distributed databases on Kubernetes in a multi-cloud environment. Engineers can learn best practices for managing stateful services, mitigating risks, and designing resilient systems at scale.
- Airbnb achieved high availability for a distributed SQL database by deploying it across multiple Kubernetes clusters, each in a different AWS Availability Zone, a complex but effective strategy.
- They addressed challenges of running stateful databases on Kubernetes, particularly node replacements and upgrades, using custom Kubernetes operators and admission hooks.
- A custom Kubernetes operator coordinates node replacements, ensuring data consistency and preventing service disruption during various event types.
- Deploying across three independent Kubernetes clusters in different AWS AZs significantly limits the blast radius of infrastructure or deployment issues.
- AWS EBS provides rapid volume reattachment and durability, with tail latency spikes mitigated by read timeouts, transparent retries, and stale reads.
- Overprovisioning database clusters ensures sufficient capacity even if an entire AZ or Kubernetes cluster fails.
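The coordination logic described above can be sketched in miniature: replace one node at a time, and refuse any replacement that would drop the cluster below its quorum floor. This is a schematic illustration, not Airbnb's actual operator; the class and method names are invented for the example.

```python
# Schematic sketch of node-replacement coordination; names are illustrative,
# not Airbnb's operator API.

class NodeReplacementCoordinator:
    """Replaces database nodes one at a time so quorum is never lost."""

    def __init__(self, cluster, min_healthy):
        self.cluster = cluster          # node name -> "healthy" | "draining"
        self.min_healthy = min_healthy  # quorum floor the operator must respect

    def healthy_count(self):
        return sum(1 for s in self.cluster.values() if s == "healthy")

    def replace(self, old_node, new_node):
        # Refuse the replacement if it would drop the cluster below quorum,
        # mirroring the admission-hook style gating described above.
        if self.healthy_count() - 1 < self.min_healthy:
            raise RuntimeError("replacement would violate quorum; deferring")
        self.cluster[old_node] = "draining"   # stop routing writes to it
        self.cluster[new_node] = "healthy"    # bring up the replacement first
        del self.cluster[old_node]            # then remove the drained node


cluster = {"db-0": "healthy", "db-1": "healthy", "db-2": "healthy"}
coord = NodeReplacementCoordinator(cluster, min_healthy=2)
coord.replace("db-0", "db-3")
print(sorted(cluster))  # → ['db-1', 'db-2', 'db-3']
```

The real operator additionally has to sequence data movement and handle concurrent events, but the "add replacement before removing the old node" ordering is the core safety property.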
Why it matters: This article highlights the extreme difficulty of debugging elusive, high-impact performance issues in complex distributed systems during migration. It showcases the systematic troubleshooting required to uncover subtle interactions between applications and their underlying infrastructure.
- Pinterest encountered a rare, severe latency issue (100x slower) when migrating its memory-intensive Manas search infrastructure to Kubernetes.
- The in-house Manas search system, critical for recommendations, uses a two-tier root-leaf node architecture, with leaf nodes handling query processing, retrieval, and ranking.
- Debugging revealed sharp P100 latency spikes every few minutes on individual leaf nodes during index retrieval and ranking phases, affecting roughly one in a million requests.
- Initial extensive troubleshooting, including dedicated nodes, removed cgroups, and OS-level profiling, failed to isolate the root cause of the performance degradation.
- The problem persisted even when running Manas outside its container directly on the host, suggesting a subtle interaction unique to the Kubernetes provisioning on the AMI.
Why it matters: This article details Pinterest's strategic move from Hadoop to Kubernetes for data processing at scale. It offers valuable insights into the challenges and benefits of modernizing big data infrastructure, providing a blueprint for other organizations facing similar migration decisions.
- Pinterest is migrating from its aging Hadoop 2.x (Monarch) data platform to a new Kubernetes (K8s) based system, Moka, for massive-scale data processing.
- The shift to K8s is driven by needs for enhanced container isolation, security, improved performance with Spark, lower operational costs, and better developer velocity.
- Kubernetes offers built-in container support, streamlined deployment via Terraform/Helm, and a rich ecosystem of monitoring, logging, and scheduling frameworks.
- Performance optimizations include leveraging newer JDKs, GPU support, ARM/Graviton instances, and Kubernetes' native autoscaling capabilities.
- Key design challenges involve integrating EKS into Pinterest's existing infrastructure and replacing core Hadoop functionalities like YARN UI, job submission, resource management, log aggregation, and security.
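To make the "replacing YARN job submission" point concrete, the sketch below assembles a spark-submit invocation targeting a Kubernetes API server instead of a YARN ResourceManager. The `--master k8s://` and `spark.kubernetes.*` flags are standard Spark-on-Kubernetes options; the image name, API server address, and job jar are placeholders, not Pinterest's actual values.

```python
# Sketch: how a Spark job targets Kubernetes instead of YARN. Flags are
# standard spark-submit Kubernetes options; concrete values are placeholders.

def k8s_spark_submit_args(app_jar, image, api_server, namespace="spark"):
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",   # K8s API server replaces the YARN RM
        "--deploy-mode", "cluster",          # the driver runs in a pod
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]

args = k8s_spark_submit_args(
    "local:///opt/jobs/etl.jar",             # hypothetical job jar
    "registry.example.com/spark:3.5",        # hypothetical image
    "https://eks.example.com:443",
)
print(args[2])  # → k8s://https://eks.example.com:443
```

In practice a platform layer (Moka, in Pinterest's case) wraps this so users never construct the command by hand.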
Why it matters: Engineers often struggle to balance robust security with system performance. This approach demonstrates how to implement scalable, team-level encryption at rest using HSMs without sacrificing the speed of file sharing or the functionality of content search in a distributed environment.
- Dropbox developed a team-based encryption system using Hardware Security Modules (HSM) for secure key generation and storage.
- The architecture solves the performance bottleneck of re-encrypting 4MB file blocks during cross-team sharing operations.
- Unique top-level keys allow enterprise teams to instantly disable access to their data, providing granular control over sensitive information.
- The system balances high security with usability, maintaining features like content search that are often lost in traditional end-to-end encryption.
- This security framework serves as the foundation for protecting AI-driven tools like Dropbox Dash and its universal search capabilities.
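The performance point in the second bullet comes down to a key hierarchy: the 4MB block is encrypted once under a data key, and cross-team sharing re-wraps only that small data key under the other team's key. The toy XOR keystream below is a stand-in for a real AEAD cipher such as AES-GCM, and the keys here are random bytes, whereas Dropbox's top-level keys live in HSMs; only the wrapping structure is the point.

```python
import hashlib, secrets

# Toy envelope-encryption sketch: the XOR keystream stands in for a real
# AEAD cipher (e.g. AES-GCM). Real top-level keys are HSM-backed.

def keystream_xor(key: bytes, data: bytes) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def wrap(team_key: bytes, data_key: bytes) -> bytes:
    return keystream_xor(team_key, data_key)       # unwrap is the same XOR

block = b"x" * 4 * 1024 * 1024                     # a 4 MB file block
data_key = secrets.token_bytes(32)
ciphertext = keystream_xor(data_key, block)        # encrypted exactly once

team_a, team_b = secrets.token_bytes(32), secrets.token_bytes(32)
wrapped_a = wrap(team_a, data_key)
# Sharing with team B touches 32 bytes, not 4 MB: unwrap with A, wrap with B.
wrapped_b = wrap(team_b, wrap(team_a, wrapped_a))

assert keystream_xor(wrap(team_b, wrapped_b), ciphertext) == block
```

Deleting a team's top-level key orphans every data key wrapped under it, which is what makes the "instantly disable access" property in the third bullet possible.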
Why it matters: This article demonstrates how to significantly accelerate ML development and deployment by leveraging Ray for end-to-end data pipelines. Engineers can learn to build more efficient, scalable, and faster ML iteration systems, reducing costs and time-to-market for new features.
- Pinterest expanded Ray's role from ML training to the entire ML infrastructure, including feature development, sampling, and label modeling, to accelerate iteration.
- A Ray Data native pipeline API was developed for on-the-fly feature transformations, eliminating slow Spark backfills and costly feature joins.
- Efficient Iceberg bucket joins were implemented in Ray, enabling dynamic dataset joining at runtime and reducing feature experimentation from days to hours.
- Ray-based Iceberg write mechanisms facilitate data persistence, caching transformed features for reuse, enhancing iteration efficiency and production data generation.
- This integrated Ray architecture provides a more scalable, efficient, and faster end-to-end ML development and deployment process.
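The bucket-join idea can be shown in plain Python: when both datasets are pre-partitioned by `hash(key) % n_buckets` (as Iceberg bucket partitioning does), the join only pairs matching buckets and never shuffles either full dataset. The function names and sample rows below are illustrative, not Ray's or Pinterest's APIs.

```python
from collections import defaultdict

# Plain-Python sketch of a bucket join over hash-partitioned datasets.
# Names and data are illustrative.

def bucketize(rows, key, n_buckets):
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % n_buckets].append(row)
    return buckets

def bucket_join(left, right, key, n_buckets=4):
    lb = bucketize(left, key, n_buckets)
    rb = bucketize(right, key, n_buckets)
    out = []
    for b in range(n_buckets):                 # each bucket joins independently,
        index = {r[key]: r for r in rb[b]}     # so buckets can run in parallel
        for row in lb[b]:
            if row[key] in index:
                out.append({**row, **index[row[key]]})
    return out

pins = [{"pin_id": 1, "feat_a": 0.3}, {"pin_id": 2, "feat_a": 0.7}]
labels = [{"pin_id": 2, "label": 1}]
print(bucket_join(pins, labels, "pin_id"))
# → [{'pin_id': 2, 'feat_a': 0.7, 'label': 1}]
```

Because bucket assignments are fixed by the table layout, no runtime repartitioning is needed, which is where the days-to-hours speedup in feature experimentation comes from.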
Why it matters: This article demonstrates how Pinterest optimizes ad retrieval by strategically using offline ANN to reduce infrastructure costs and improve efficiency for static contexts, complementing real-time online ANN. This is crucial for scaling ad platforms.
- Pinterest employs both online and offline Approximate Nearest Neighbors (ANN) for ad retrieval, balancing real-time personalization with cost efficiency.
- Online ANN handles dynamic user behavior but struggles with scalability and cost as ad inventories expand.
- Offline ANN precomputes ad candidates, significantly reducing infrastructure costs (up to 80%) by minimizing online lookup and repetitive searches.
- Offline ANN is ideal for stable query contexts, delivering high throughput and low latency, though it lacks real-time adaptability.
- Pinterest's "Similar Item Ads" use case demonstrated offline ANN's superior engagement, conversion, and cost-effectiveness over its online counterpart.
- The adoption of IVF algorithms for larger ad indexes necessitated offline ANN to control escalating infrastructure expenses.
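The offline/online trade-off reduces to where the similarity search runs: offline ANN precomputes top-k ad candidates per stable query context into a lookup table, so serving becomes a dictionary read instead of a per-request vector search. The vectors, IDs, and function names below are illustrative only, and brute-force cosine scoring stands in for a real ANN index such as IVF.

```python
import math

# Sketch of offline candidate precomputation; brute-force cosine scoring
# stands in for a real ANN index. All data and names are illustrative.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def precompute_candidates(contexts, ads, k=2):
    # Runs in batch, off the request path.
    table = {}
    for ctx_id, ctx_vec in contexts.items():
        scored = sorted(ads, key=lambda ad: cosine(ctx_vec, ad[1]), reverse=True)
        table[ctx_id] = [ad_id for ad_id, _ in scored[:k]]
    return table

contexts = {"pin_123": [1.0, 0.0]}           # a stable query context
ads = [("ad_a", [0.9, 0.1]), ("ad_b", [0.0, 1.0]), ("ad_c", [0.7, 0.7])]
table = precompute_candidates(contexts, ads)
print(table["pin_123"])                       # serving is now an O(1) lookup
```

The cost saving comes from amortizing the search across all requests that share a context; the price is staleness, which is why dynamic user contexts stay on the online path.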
Why it matters: This framework helps engineers proactively identify bottlenecks, evaluate capacity, and ensure system reliability through robust, decentralized, and automated load testing integrated with CI/CD.
- Airbnb's Impulse is a decentralized load-testing-as-a-service framework for robust system performance evaluation.
- It features a context-aware load generator, an out-of-process dependency mocker, a traffic collector, and a testing API generator.
- The load generator uses Java/Kotlin for flexible test logic, containerized for isolation, scalability, and cost-efficiency.
- The dependency mocker enables selective stubbing of HTTP, Thrift, and GraphQL dependencies with configurable latency, isolating the system under test (SUT).
- Impulse integrates with CI/CD for automated testing across warm-up, steady-state, and peak phases, using synthetic or collected traffic.
- Its architecture empowers self-service load tests, minimizing manual effort and enhancing proactive issue detection.
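The phased ramp mentioned above can be sketched as a simple runner that walks through warm-up, steady-state, and peak target rates. This is a schematic in Python for brevity; Impulse's actual generator is a containerized Java/Kotlin service, and the phase names and rates here are illustrative.

```python
import time

# Schematic phased load-test runner; rates and names are illustrative.

PHASES = [("warm-up", 50), ("steady-state", 200), ("peak", 500)]  # target req/s

def run_load_test(send_request, phase_seconds=1):
    results = {}
    for name, rps in PHASES:
        sent = 0
        deadline = time.monotonic() + phase_seconds
        # Send until the phase's request budget or time window is exhausted.
        while time.monotonic() < deadline and sent < rps * phase_seconds:
            send_request()            # call into the SUT; dependencies mocked out
            sent += 1
        results[name] = sent
    return results

calls = []
stats = run_load_test(lambda: calls.append(1))
print(stats["peak"])  # → 500
```

Running this from CI/CD after each deploy is what turns load testing from a manual campaign into a regression gate.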
Why it matters: This article details how Pinterest scaled its recommendation system to leverage vast lifelong user data, significantly improving personalization and user engagement through innovative ML models and efficient serving infrastructure.
- Pinterest's TransActV2 significantly enhances personalization by modeling up to 16,000 lifelong user actions, a 160x increase over previous systems.
- It introduces a Next Action Loss (NAL) as an auxiliary task, improving user action forecasting beyond traditional CTR models.
- To handle long sequences efficiently, TransActV2 uses Nearest Neighbor (NN) selection at inference, feeding only the most relevant actions to the model.
- The system employs a multi-headed transformer encoder architecture with causal masking and explicit action features.
- Industrial-scale deployment challenges are addressed through NN feature logging, on-device NN search, and custom OpenAI Triton kernels for low-latency serving.
- Lifelong behavior modeling captures evolving, multi-seasonal, and less-frequent user interests, leading to richer personalization.
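The inference-time NN selection works by scoring the lifelong action history against the candidate item's embedding and keeping only the top-k most similar actions, so the transformer sees a short, relevant sequence instead of all 16,000 actions. The embeddings, action names, and function below are invented for illustration.

```python
# Sketch of inference-time nearest-neighbor action selection.
# Embeddings and names are illustrative, not Pinterest's features.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_relevant_actions(candidate_emb, action_history, k):
    # action_history: list of (action_id, embedding) pairs spanning years.
    ranked = sorted(action_history,
                    key=lambda a: dot(candidate_emb, a[1]),
                    reverse=True)
    return [action_id for action_id, _ in ranked[:k]]  # short sequence for the model

history = [("saved_recipe", [0.9, 0.1]),
           ("clicked_shoe_ad", [0.1, 0.9]),
           ("searched_pasta", [0.8, 0.2])]
candidate = [1.0, 0.0]                       # e.g. a food Pin's embedding
print(select_relevant_actions(candidate, history, k=2))
# → ['saved_recipe', 'searched_pasta']
```

Because the transformer's cost grows with sequence length, shrinking 16,000 actions to the k most relevant is what keeps serving latency within budget.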
Why it matters: This article demonstrates how to automate the challenging process of migrating and scaling stateful Hadoop clusters, significantly reducing manual effort and operational risk. It offers a blueprint for managing large-scale distributed data infrastructure efficiently.
- Pinterest developed Hadoop Control Center (HCC) to automate complex migration and scaling operations for its large, stateful Hadoop clusters on AWS.
- Traditional manual scale-in procedures for Hadoop clusters were tedious, error-prone, and involved many steps like updating exclude files, monitoring data drainage, and managing ASGs.
- HCC enables in-place cluster migrations by introducing new Auto Scaling Groups (ASGs) with updated AMIs/instance types, avoiding costly and risky full cluster replacements.
- The tool streamlines scaling-in by managing node decommissioning and ensuring HDFS data replication to new nodes before termination, preventing data loss or workload impact.
- HCC provides a centralized platform for various Hadoop-related tasks, including ASG resizing, node status monitoring, YARN application reporting, and AWS event tracking.
- Its architecture includes a manager node for API calls and caching, and worker nodes per VPC to manage clusters, facilitating automated and efficient cluster administration.
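The scale-in sequence HCC automates follows a fixed order: mark the node for decommission, wait for HDFS to finish re-replicating its blocks, and only then terminate it. The sketch below captures that ordering against a fake cluster object; the method names are stand-ins, not the real HDFS or ASG APIs.

```python
# Sketch of the automated scale-in sequence; the cluster object is a
# stand-in, not the real HDFS/ASG APIs.

def drain_and_terminate(cluster, node):
    cluster.add_to_exclude(node)             # step 1: update the exclude file
    while cluster.under_replicated_blocks(node) > 0:
        cluster.replicate_step(node)         # step 2: wait for data drainage
    cluster.terminate(node)                  # step 3: safe to shrink the ASG

class FakeCluster:
    def __init__(self, blocks):
        self.blocks = blocks                 # node -> under-replicated block count
        self.excluded = set()
        self.terminated = set()
    def add_to_exclude(self, node): self.excluded.add(node)
    def under_replicated_blocks(self, node): return self.blocks[node]
    def replicate_step(self, node): self.blocks[node] -= 1
    def terminate(self, node): self.terminated.add(node)

c = FakeCluster({"dn-7": 3})
drain_and_terminate(c, "dn-7")
print("dn-7" in c.terminated)  # → True
```

Encoding the ordering in a tool rather than a runbook is what removes the error-prone manual steps: termination simply cannot happen before drainage completes.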
Why it matters: This article details how to build secure, privacy-preserving enterprise search and AI features. It offers a blueprint for integrating external data without compromising user data, leveraging RAG, federated search, and strict access controls. Essential for engineers building secure data platforms.
- Slack's enterprise search and AI uphold strict security and privacy by keeping customer data within Slack's trust boundary, utilizing an AWS escrow VPC for LLMs.
- The system employs Retrieval Augmented Generation (RAG) instead of training Large Language Models (LLMs) on customer data, ensuring data privacy and preventing retention.
- Enterprise search operates on a federated, real-time model, never storing external source data in Slack's databases, but rather fetching it via partner APIs.
- Access to external content is strictly permissioned based on the user's existing Access Control Lists (ACLs) and requires explicit user/admin consent, adhering to the principle of least privilege.
- External data and permissions are always up-to-date with the source system, ensuring accuracy and compliance.
- Search Answer summaries generated by the AI are ephemeral, shown to the user and immediately discarded, further enhancing privacy.
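The permission-aware retrieval step can be sketched as: results fetched live from a partner API are filtered against the requesting user's ACLs before anything reaches the LLM prompt, and nothing is persisted along the way. Every function and document below is an illustrative stand-in, not Slack's actual APIs.

```python
# Sketch of ACL-filtered RAG retrieval; all names and data are illustrative.

def fetch_external_results(query):
    # In the real system this is a live partner-API call; results are
    # never written to a database.
    return [{"doc": "Q3 roadmap", "allowed_users": {"alice"}},
            {"doc": "HR review notes", "allowed_users": {"bob"}}]

def retrieve_for_user(user, query):
    results = fetch_external_results(query)
    # Enforce least privilege: only documents the user can already see
    # in the source system make it into the context.
    return [r["doc"] for r in results if user in r["allowed_users"]]

def answer(user, query):
    context = retrieve_for_user(user, query)
    prompt = f"Answer {query!r} using only: {context}"  # sent to an escrowed LLM
    return prompt                                       # summary is ephemeral

print(retrieve_for_user("alice", "roadmap"))  # → ['Q3 roadmap']
```

Because permissions are checked against the source system at query time rather than mirrored into an index, the last two bullets (freshness and ephemerality) fall out of the design rather than needing separate enforcement.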