Dropbox Tech Blog
https://dropbox.tech/

Why it matters: Engineers face increasing data fragmentation across SaaS silos. This post details how to build a unified context engine using knowledge graphs, multimodal processing, and prompt optimization (DSPy) to enable effective RAG and agentic workflows over proprietary enterprise data.
- Dropbox Dash functions as a universal context engine, integrating disparate SaaS applications and proprietary content into a unified searchable index.
- The system uses custom crawlers to navigate complex API rate limits, diverse authentication schemes, and granular permission systems (ACLs).
- Content enrichment involves normalizing files into markdown and using multimodal models for video scene extraction and audio transcription.
- Knowledge graphs map relationships between entities across platforms, providing deeper context for agentic queries.
- The engineering team leverages DSPy for programmatic prompt optimization and 'LLM as a judge' frameworks for automated evaluation (see the sketch after this list).
- The architecture explores the Model Context Protocol (MCP) to standardize how LLMs interact with external data sources and tools.
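To make the DSPy bullet concrete, here is a minimal sketch of programmatic prompt optimization. The signature fields, the containment-check metric (a stand-in for a real 'LLM as a judge'), and the training example are hypothetical illustrations, not Dash's actual pipeline.

```python
import dspy

# Assumed model endpoint; swap in whatever LM your deployment uses.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GroundedAnswer(dspy.Signature):
    """Answer a question using retrieved enterprise context."""
    context: str = dspy.InputField(desc="passages pulled from the unified index")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="answer grounded in the context")

rag = dspy.ChainOfThought(GroundedAnswer)

# Toy metric standing in for an LLM-as-judge scorer.
def grounded(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

trainset = [
    dspy.Example(
        context="Dash indexes content from connected SaaS apps.",
        question="What does Dash index?",
        answer="content from connected SaaS apps",
    ).with_inputs("context", "question")
]

# DSPy compiles few-shot demonstrations instead of hand-tuned prompt strings.
optimized = dspy.BootstrapFewShot(metric=grounded).compile(rag, trainset=trainset)
```

The point of the pattern is that prompts become compiled artifacts: change the metric or the training set, recompile, and the prompt updates without manual string surgery.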
Why it matters: Building a scalable feature store is essential for real-time AI applications that require low-latency retrieval of complex user signals across hybrid environments. This approach enables engineers to move quickly from experimentation to production without managing underlying infrastructure.
- Dropbox Dash utilizes a custom feature store to manage data signals for real-time machine learning ranking across fragmented company content.
- The system bridges a hybrid infrastructure consisting of on-premises low-latency services and a Spark-native cloud environment for data processing.
- Engineers selected Feast as the framework for its modular architecture and clear separation between feature definitions and infrastructure management (a minimal Feast sketch follows this list).
- To meet sub-100ms latency requirements, the store uses an in-house DynamoDB-compatible solution (Dynovault) for high-concurrency parallel reads.
- The architecture supports both batch processing of historical data and real-time streaming ingestion to capture immediate user intent.
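As an illustration of the Feast bullet, here is a minimal feature-repo sketch. The entity, signal names, and parquet source are hypothetical; in the described setup, Dynovault would sit behind Feast's pluggable online-store configuration rather than the defaults assumed here.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entity and signals; the real feature set is Dash-internal.
user = Entity(name="user", join_keys=["user_id"])

signals_source = FileSource(
    path="data/user_signals.parquet",  # batch source; streaming can be layered on
    timestamp_field="event_ts",
)

user_signals = FeatureView(
    name="user_signals",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="docs_opened_7d", dtype=Int64),
        Field(name="query_click_rate", dtype=Float32),
    ],
    source=signals_source,
)
```

At serving time, a ranking service would call `FeatureStore.get_online_features(...)` against the low-latency online store to stay inside the sub-100ms budget, while the same definitions drive batch materialization from the Spark side.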
Why it matters: This article showcases how intern-led projects drive critical production improvements in ML observability, storage latency, and developer productivity, highlighting the practical application of AI in enterprise-scale infrastructure.
- Dropbox's 2025 intern program integrated 28 engineering interns into high-impact projects supporting Dropbox Dash, an AI-powered universal search tool.
- Interns refactored the file history tracking system within the metadata infrastructure, significantly reducing operational costs and simplifying legacy systems.
- The ML Platform team developed 'AI Sentinel,' a monitoring system providing real-time operational visibility into the health of machine learning model deployments.
- Storage Core improvements included implementing health-aware routing in Magic Pocket to mitigate PUT latencies during scheduled disk restarts.
- The Web Developer Experience team built an AI-powered automation tool for code migrations that automatically generates pull requests for developers.
Why it matters: As AI moves from search to agents, managing the context window is critical. This article explains how to prevent performance degradation and context rot by curating tools and data, ensuring models remain fast and accurate even as capabilities expand.
- Dropbox Dash transitioned from a standard RAG search system to an agentic AI capable of planning and executing complex tasks.
- Context engineering was introduced to solve the 'analysis paralysis' caused by giving the model too many tool options and definitions.
- The team uses the Model Context Protocol (MCP) but optimizes it to reduce token consumption and prevent performance degradation.
- To combat 'context rot,' Dash limits tool definitions in the context window and filters for only the most relevant data (a toy tool-selection sketch follows this list).
- Specialized agents are deployed for tasks requiring deeper reasoning, preserving precision without overwhelming the primary model.
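The tool-curation idea can be shown in a few lines. Below is a toy sketch that ranks tool definitions by relevance to the query and keeps only the top k before they reach the context window; the hashing "embedding" and tool list are placeholders for a real embedding model and real MCP tool schemas.

```python
import hashlib
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    """Toy hashing embedding; a production system would use a real model."""
    v = np.zeros(DIM)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        v[idx] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def select_tools(query: str, tools: list[dict], k: int = 5) -> list[dict]:
    """Keep only the k tool definitions most relevant to the query, so the
    model's context window isn't flooded with schemas it will never call."""
    q = embed(query)
    scored = sorted(tools, key=lambda t: -float(embed(t["description"]) @ q))
    return scored[:k]

tools = [
    {"name": "search_files", "description": "full-text search over indexed documents"},
    {"name": "create_event", "description": "create a calendar event with attendees"},
    {"name": "summarize_thread", "description": "summarize an email or chat thread"},
]
print([t["name"] for t in select_tools("find the Q3 planning doc", tools, k=1)])
```

Every pruned definition is tokens the model never has to read, which is the same lever behind trimming MCP payloads.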
Why it matters: Engineers must process massive unstructured multimedia data efficiently. This integration demonstrates how specialized architectures can achieve deep multimodal understanding at exabyte scale while maintaining low computational overhead and high search relevance.
- Dropbox is integrating Mobius Labs' Aana models into Dropbox Dash to enhance multimodal search and understanding.
- The Aana architecture is designed for high efficiency, significantly reducing computational requirements compared to traditional multimodal models.
- Unlike siloed processing, Aana analyzes the relationships between text, audio, and video to interpret complex scenes and actions.
- The system is built to handle 'Dropbox scale,' processing exabytes of rich media content across various applications.
- This integration allows users to query multimedia files for specific insights without manual tagging or folder navigation.
Why it matters: HQQ enables engineers to deploy massive LLMs on consumer-grade hardware with minimal setup. By removing the need for calibration data and drastically reducing quantization time, it simplifies the pipeline for optimizing and testing state-of-the-art models at scale.
- Introduces Half-Quadratic Quantization (HQQ), a data-free quantization technique for large language models.
- Achieves quantization speeds up to 50x faster than GPTQ, processing a Llama-2-70B model in under 5 minutes.
- Eliminates the need for calibration datasets, removing data bias and cutting the computational overhead of the quantization step.
- Uses a sparsity-promoting loss built on hyper-Laplacian distributions to model weight outliers better than standard squared error (objective sketched after this list).
- Demonstrates that 2-bit quantized large models can outperform smaller full-precision models within similar memory constraints.
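In rough form, and with notation of my own choosing rather than the paper's, the objective behind these bullets is a heavy-tailed penalty on the quantization error, made tractable with half-quadratic splitting:

```latex
% W: original weights; Q_z, Q_z^{-1}: quantize/dequantize with parameters z.
% The hyper-Laplacian penalty \phi (p < 1) models outliers better than
% squared error, which over-penalizes the heavy tails of weight residuals.
\min_{z}\; \phi\big(W - Q_z^{-1}(Q_z(W))\big),
\qquad \phi(x) = \lVert x \rVert_p^p,\quad 0 < p < 1.

% Half-quadratic splitting introduces an auxiliary variable W_e and alternates
% between a generalized soft-thresholding update for W_e and a closed-form
% update for the quantization parameters z:
\min_{z,\,W_e}\; \phi(W_e) \;+\; \tfrac{\beta}{2}\,
\big\lVert W_e - \big(W - Q_z^{-1}(Q_z(W))\big) \big\rVert_2^2 .
```

Because each alternating step has a closed-form or elementwise solution, no gradient descent over calibration data is needed, which is what makes the method data-free and fast.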
Why it matters: Building reliable LLM applications requires moving beyond ad-hoc testing. This framework shows engineers how to implement a rigorous, code-like evaluation pipeline to manage the unpredictability of probabilistic AI components and ensure consistent performance at scale.
- LLM pipelines involve complex probabilistic stages, such as intent classification and retrieval, that require systematic evaluation to prevent regressions.
- Dropbox Dash moved from ad-hoc testing to an evaluation-first approach, treating every model or prompt change with the same rigor as production code.
- A hybrid dataset strategy combines public benchmarks such as MS MARCO for baselining with internal production logs that capture real-world user behavior.
- Synthetic data generation using LLMs helps create evaluation sets for diverse content types, including tables, images, and factual lookups.
- Traditional NLP metrics like BLEU and ROUGE are often inadequate for RAG systems, necessitating more actionable, task-specific rubrics (a minimal regression gate is sketched after this list).
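As a sketch of what "the same rigor as production code" can look like, the snippet below gates a retrieval change on a recall metric, the way a unit test gates a merge. The case shape, baseline score, and `retrieve` callable are hypothetical stand-ins for whatever stage of the pipeline is under test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]  # doc IDs a correct retrieval should surface

def recall_at_k(retrieve: Callable[[str], list[str]],
                cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of cases where at least one relevant doc appears in the top k."""
    hits = sum(bool(case.relevant_ids & set(retrieve(case.query)[:k]))
               for case in cases)
    return hits / len(cases)

BASELINE = 0.82  # hypothetical score recorded from the previous release

def check_no_regression(score: float, tolerance: float = 0.01) -> None:
    """Fail the build, not the user, when retrieval quality drops."""
    assert score >= BASELINE - tolerance, f"retrieval recall regressed: {score:.3f}"

# Stubbed retriever for illustration; in CI this would call the real pipeline.
cases = [EvalCase("quarterly revenue table", {"doc-7"})]
check_no_regression(recall_at_k(lambda q: ["doc-7", "doc-2"], cases))
```

Task-specific rubrics (for tables, images, factual lookups) slot in as additional metrics alongside recall, each with its own recorded baseline.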
Why it matters: As AI workloads push GPU power consumption beyond the limits of traditional air cooling, liquid cooling becomes essential. This project demonstrates a viable path for maintaining hardware reliability and efficiency in high-density data centers.
- Dropbox engineers developed a custom liquid cooling system for GPU servers during Hack Week 2025 to address the thermal demands of AI workloads.
- The team built a prototype from scratch using radiators, pumps, reservoirs, and manifolds when pre-assembled units were unavailable.
- Stress tests revealed that liquid cooling reduced operating temperatures by 20–30°C compared to standard air-cooled production systems.
- The project enabled reduced fan speeds for secondary components, leading to quieter operation and potential power savings.
- The initiative serves as a proof-of-concept for future-proofing data center infrastructure against the rising power consumption of next-gen GPUs.
- Future plans include expanding testing with dedicated liquid cooling labs across multiple Dropbox data centers.
Why it matters: Dropbox's jump to 90% AI adoption provides a blueprint for scaling developer productivity. It shows how combining leadership alignment with a mix of third-party and internal tools can transform the SDLC and overcome developer skepticism toward AI-assisted workflows.
- Dropbox achieved over 90% AI tool adoption among engineers by 2025 through strong leadership alignment and a structured change management plan.
- The engineering organization utilizes AI across the entire software development lifecycle, including code generation, testing, debugging, and incident resolution.
- A three-pronged strategy was employed: evaluating external tools like GitHub Copilot, developing custom internal AI solutions, and fostering a culture of knowledge sharing.
- Initial adoption challenges, such as distrust of output quality and workflow friction, were addressed through peer-to-peer training and clear performance metrics.
- The company balances third-party integrations with in-house development to solve specific organizational problems while building internal machine learning expertise.
Why it matters: Engineers often struggle to balance robust security with system performance. This approach demonstrates how to implement scalable, team-level encryption at rest using HSMs without sacrificing the speed of file sharing or the functionality of content search in a distributed environment.
- Dropbox developed a team-based encryption system using Hardware Security Modules (HSMs) for secure key generation and storage.
- The architecture solves the performance bottleneck of re-encrypting 4MB file blocks during cross-team sharing operations (see the key-wrapping sketch after this list).
- Unique top-level keys allow enterprise teams to instantly disable access to their data, providing granular control over sensitive information.
- The system balances high security with usability, maintaining features like content search that are often lost in traditional end-to-end encryption.
- This security framework serves as the foundation for protecting AI-driven tools like Dropbox Dash and its universal search capabilities.
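The 4MB-block bullet suggests a standard envelope-encryption pattern. The sketch below, using the Python `cryptography` package, shows why cross-team sharing can re-wrap a 32-byte data key instead of re-encrypting the block itself; the key names and software-held team keys are simplifications (the described design keeps team keys inside an HSM), not Dropbox's actual implementation.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical key material: in the described design, team keys live in an
# HSM and only wrap/unwrap operations cross its boundary.
team_a_key = AESGCM.generate_key(bit_length=256)
team_b_key = AESGCM.generate_key(bit_length=256)

# Encrypt a 4 MB file block once, under its own random data key.
block = os.urandom(4 * 1024 * 1024)
data_key = AESGCM.generate_key(bit_length=256)
block_nonce = os.urandom(12)
ciphertext = AESGCM(data_key).encrypt(block_nonce, block, None)

def wrap(team_key: bytes, key: bytes) -> bytes:
    """Wrap a 32-byte data key under a team key (nonce prepended)."""
    nonce = os.urandom(12)
    return nonce + AESGCM(team_key).encrypt(nonce, key, None)

def unwrap(team_key: bytes, blob: bytes) -> bytes:
    return AESGCM(team_key).decrypt(blob[:12], blob[12:], None)

# Cross-team sharing re-wraps 32 bytes of key material instead of
# re-encrypting the 4 MB block.
wrapped_a = wrap(team_a_key, data_key)
wrapped_b = wrap(team_b_key, data_key)
assert AESGCM(unwrap(team_b_key, wrapped_b)).decrypt(block_nonce, ciphertext, None) == block
```

Revoking a team's top-level key invalidates every data key wrapped under it, which is how access can be disabled instantly without touching the stored blocks.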