Dropbox Tech Blog
https://dropbox.tech/

Why it matters: Engineers face increasing data fragmentation across SaaS silos. This post details how to build a unified context engine using knowledge graphs, multimodal processing, and prompt optimization (DSPy) to enable effective RAG and agentic workflows over proprietary enterprise data.
- Dropbox Dash functions as a universal context engine, integrating disparate SaaS applications and proprietary content into a unified searchable index.
- The system uses custom crawlers to navigate complex API rate limits, diverse authentication schemes, and granular permission systems (ACLs).
- Content enrichment involves normalizing files into markdown and using multimodal models for video scene extraction and audio transcription.
- Knowledge graphs map relationships between entities across platforms, providing deeper context for agentic queries.
- The engineering team leverages DSPy for programmatic prompt optimization and 'LLM as a judge' frameworks for automated evaluation (see the sketch after this list).
- The architecture explores the Model Context Protocol (MCP) to standardize how LLMs interact with external data sources and tools.
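To make the DSPy bullet concrete, here is a minimal sketch of programmatic prompt optimization. The signature fields, the containment-check metric (a stand-in for a real 'LLM as a judge'), and the training example are hypothetical illustrations, not Dash's actual pipeline.

```python
import dspy

# Assumed model endpoint; swap in whatever LM your deployment uses.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GroundedAnswer(dspy.Signature):
    """Answer a question using retrieved enterprise context."""
    context: str = dspy.InputField(desc="passages pulled from the unified index")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="answer grounded in the context")

rag = dspy.ChainOfThought(GroundedAnswer)

# Toy metric standing in for an LLM-as-judge scorer.
def grounded(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

trainset = [
    dspy.Example(
        context="Dash indexes content from connected SaaS apps.",
        question="What does Dash index?",
        answer="content from connected SaaS apps",
    ).with_inputs("context", "question")
]

# DSPy compiles few-shot demonstrations instead of hand-tuned prompt strings.
optimized = dspy.BootstrapFewShot(metric=grounded).compile(rag, trainset=trainset)
```

The point of the pattern is that prompts become compiled artifacts: change the metric or the training set, recompile, and the prompt updates without manual string surgery.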
Why it matters: Building a scalable feature store is essential for real-time AI applications that require low-latency retrieval of complex user signals across hybrid environments. This approach enables engineers to move quickly from experimentation to production without managing underlying infrastructure.
- Dropbox Dash utilizes a custom feature store to manage data signals for real-time machine learning ranking across fragmented company content.
- The system bridges a hybrid infrastructure consisting of on-premises low-latency services and a Spark-native cloud environment for data processing.
- Engineers selected Feast as the framework for its modular architecture and clear separation between feature definitions and infrastructure management (a minimal Feast sketch follows this list).
- To meet sub-100ms latency requirements, the store uses an in-house DynamoDB-compatible solution (Dynovault) for high-concurrency parallel reads.
- The architecture supports both batch processing of historical data and real-time streaming ingestion to capture immediate user intent.
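As an illustration of the Feast bullet, here is a minimal feature-repo sketch. The entity, signal names, and parquet source are hypothetical; in the described setup, Dynovault would sit behind Feast's pluggable online-store configuration rather than the defaults assumed here.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entity and signals; the real feature set is Dash-internal.
user = Entity(name="user", join_keys=["user_id"])

signals_source = FileSource(
    path="data/user_signals.parquet",  # batch source; streaming can be layered on
    timestamp_field="event_ts",
)

user_signals = FeatureView(
    name="user_signals",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="docs_opened_7d", dtype=Int64),
        Field(name="query_click_rate", dtype=Float32),
    ],
    source=signals_source,
)
```

At serving time, a ranking service would call `FeatureStore.get_online_features(...)` against the low-latency online store to stay inside the sub-100ms budget, while the same definitions drive batch materialization from the Spark side.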
Why it matters: This article showcases how intern-led projects drive critical production improvements in ML observability, storage latency, and developer productivity, highlighting the practical application of AI in enterprise-scale infrastructure.
- Dropbox's 2025 intern program integrated 28 engineering interns into high-impact projects supporting Dropbox Dash, an AI-powered universal search tool.
- Interns refactored the file history tracking system within the metadata infrastructure, significantly reducing operational costs and simplifying legacy systems.
- The ML Platform team developed 'AI Sentinel,' a monitoring system providing real-time operational visibility into the health of machine learning model deployments.
- Storage Core improvements included implementing health-aware routing in Magic Pocket to mitigate PUT latencies during scheduled disk restarts.
- The Web Developer Experience team built an AI-powered automation tool for code migrations that automatically generates pull requests for developers.
Why it matters: As AI moves from search to agents, managing the context window is critical. This article explains how to prevent performance degradation and context rot by curating tools and data, ensuring models remain fast and accurate even as capabilities expand.
- Dropbox Dash transitioned from a standard RAG search system to an agentic AI capable of planning and executing complex tasks.
- Context engineering was introduced to solve the 'analysis paralysis' caused by giving the model too many tool options and definitions.
- The team uses the Model Context Protocol (MCP) but optimizes it to reduce token consumption and prevent performance degradation.
- To combat 'context rot,' Dash limits tool definitions in the context window and filters for only the most relevant data (a toy tool-selection sketch follows this list).
- Specialized agents are deployed for tasks requiring deeper reasoning, preserving precision without overwhelming the primary model.
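The tool-curation idea can be shown in a few lines. Below is a toy sketch that ranks tool definitions by relevance to the query and keeps only the top k before they reach the context window; the hashing "embedding" and tool list are placeholders for a real embedding model and real MCP tool schemas.

```python
import hashlib
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    """Toy hashing embedding; a production system would use a real model."""
    v = np.zeros(DIM)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        v[idx] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def select_tools(query: str, tools: list[dict], k: int = 5) -> list[dict]:
    """Keep only the k tool definitions most relevant to the query, so the
    model's context window isn't flooded with schemas it will never call."""
    q = embed(query)
    scored = sorted(tools, key=lambda t: -float(embed(t["description"]) @ q))
    return scored[:k]

tools = [
    {"name": "search_files", "description": "full-text search over indexed documents"},
    {"name": "create_event", "description": "create a calendar event with attendees"},
    {"name": "summarize_thread", "description": "summarize an email or chat thread"},
]
print([t["name"] for t in select_tools("find the Q3 planning doc", tools, k=1)])
```

Every pruned definition is tokens the model never has to read, which is the same lever behind trimming MCP payloads.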
Why it matters: Engineers must process massive unstructured multimedia data efficiently. This integration demonstrates how specialized architectures can achieve deep multimodal understanding at exabyte scale while maintaining low computational overhead and high search relevance.
- Dropbox is integrating Mobius Labs' Aana models into Dropbox Dash to enhance multimodal search and understanding.
- The Aana architecture is designed for high efficiency, significantly reducing computational requirements compared to traditional multimodal models.
- Unlike siloed processing, Aana analyzes the relationships between text, audio, and video to interpret complex scenes and actions.
- The system is built to handle 'Dropbox scale,' processing exabytes of rich media content across various applications.
- This integration allows users to query multimedia files for specific insights without manual tagging or folder navigation.
Why it matters: HQQ enables engineers to deploy massive LLMs on consumer-grade hardware with minimal setup. By removing the need for calibration data and drastically reducing quantization time, it simplifies the pipeline for optimizing and testing state-of-the-art models at scale.
- Introduces Half-Quadratic Quantization (HQQ), a data-free quantization technique for large language models.
- Achieves quantization speeds up to 50x faster than GPTQ, processing a Llama-2-70B model in under 5 minutes.
- Eliminates the need for calibration datasets, removing data bias and cutting the computational overhead of the quantization step.
- Uses a sparsity-promoting loss built on hyper-Laplacian distributions to model weight outliers better than standard squared error (objective sketched after this list).
- Demonstrates that 2-bit quantized large models can outperform smaller full-precision models within similar memory constraints.
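In rough form, and with notation of my own choosing rather than the paper's, the objective behind these bullets is a heavy-tailed penalty on the quantization error, made tractable with half-quadratic splitting:

```latex
% W: original weights; Q_z, Q_z^{-1}: quantize/dequantize with parameters z.
% The hyper-Laplacian penalty \phi (p < 1) models outliers better than
% squared error, which over-penalizes the heavy tails of weight residuals.
\min_{z}\; \phi\big(W - Q_z^{-1}(Q_z(W))\big),
\qquad \phi(x) = \lVert x \rVert_p^p,\quad 0 < p < 1.

% Half-quadratic splitting introduces an auxiliary variable W_e and alternates
% between a generalized soft-thresholding update for W_e and a closed-form
% update for the quantization parameters z:
\min_{z,\,W_e}\; \phi(W_e) \;+\; \tfrac{\beta}{2}\,
\big\lVert W_e - \big(W - Q_z^{-1}(Q_z(W))\big) \big\rVert_2^2 .
```

Because each alternating step has a closed-form or elementwise solution, no gradient descent over calibration data is needed, which is what makes the method data-free and fast.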
Why it matters: Building reliable LLM applications requires moving beyond ad-hoc testing. This framework shows engineers how to implement a rigorous, code-like evaluation pipeline to manage the unpredictability of probabilistic AI components and ensure consistent performance at scale.
- LLM pipelines involve complex probabilistic stages, such as intent classification and retrieval, that require systematic evaluation to prevent regressions.
- Dropbox Dash moved from ad-hoc testing to an evaluation-first approach, treating every model or prompt change with the same rigor as production code.
- A hybrid dataset strategy combines public benchmarks such as MS MARCO for baselining with internal production logs that capture real-world user behavior.
- Synthetic data generation using LLMs helps create evaluation sets for diverse content types, including tables, images, and factual lookups.
- Traditional NLP metrics like BLEU and ROUGE are often inadequate for RAG systems, necessitating more actionable, task-specific rubrics (a minimal regression gate is sketched after this list).
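As a sketch of what "the same rigor as production code" can look like, the snippet below gates a retrieval change on a recall metric, the way a unit test gates a merge. The case shape, baseline score, and `retrieve` callable are hypothetical stand-ins for whatever stage of the pipeline is under test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str
    relevant_ids: set[str]  # doc IDs a correct retrieval should surface

def recall_at_k(retrieve: Callable[[str], list[str]],
                cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of cases where at least one relevant doc appears in the top k."""
    hits = sum(bool(case.relevant_ids & set(retrieve(case.query)[:k]))
               for case in cases)
    return hits / len(cases)

BASELINE = 0.82  # hypothetical score recorded from the previous release

def check_no_regression(score: float, tolerance: float = 0.01) -> None:
    """Fail the build, not the user, when retrieval quality drops."""
    assert score >= BASELINE - tolerance, f"retrieval recall regressed: {score:.3f}"

# Stubbed retriever for illustration; in CI this would call the real pipeline.
cases = [EvalCase("quarterly revenue table", {"doc-7"})]
check_no_regression(recall_at_k(lambda q: ["doc-7", "doc-2"], cases))
```

Task-specific rubrics (for tables, images, factual lookups) slot in as additional metrics alongside recall, each with its own recorded baseline.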
Why it matters: As AI workloads push GPU power consumption beyond the limits of traditional air cooling, liquid cooling becomes essential. This project demonstrates a viable path for maintaining hardware reliability and efficiency in high-density data centers.
- Dropbox engineers developed a custom liquid cooling system for GPU servers during Hack Week 2025 to address the thermal demands of AI workloads.
- The team built a prototype from scratch using radiators, pumps, reservoirs, and manifolds when pre-assembled units were unavailable.
- Stress tests revealed that liquid cooling reduced operating temperatures by 20–30°C compared to standard air-cooled production systems.
- The project enabled reduced fan speeds for secondary components, leading to quieter operation and potential power savings.
- The initiative serves as a proof-of-concept for future-proofing data center infrastructure against the rising power consumption of next-gen GPUs.
- Future plans include expanding testing with dedicated liquid cooling labs across multiple Dropbox data centers.
Why it matters: Dropbox's jump to 90% AI adoption provides a blueprint for scaling developer productivity. It shows how combining leadership alignment with a mix of third-party and internal tools can transform the SDLC and overcome developer skepticism toward AI-assisted workflows.
- Dropbox achieved over 90% AI tool adoption among engineers by 2025 through strong leadership alignment and a structured change management plan.
- The engineering organization utilizes AI across the entire software development lifecycle, including code generation, testing, debugging, and incident resolution.
- A three-pronged strategy was employed: evaluating external tools like GitHub Copilot, developing custom internal AI solutions, and fostering a culture of knowledge sharing.
- Initial adoption challenges, such as distrust of output quality and workflow friction, were addressed through peer-to-peer training and clear performance metrics.
- The company balances third-party integrations with in-house development to solve specific organizational problems while building internal machine learning expertise.
Why it matters: Engineers often struggle to balance robust security with system performance. This approach demonstrates how to implement scalable, team-level encryption at rest using HSMs without sacrificing the speed of file sharing or the functionality of content search in a distributed environment.
- Dropbox developed a team-based encryption system using Hardware Security Modules (HSMs) for secure key generation and storage.
- The architecture solves the performance bottleneck of re-encrypting 4MB file blocks during cross-team sharing operations (see the key-wrapping sketch after this list).
- Unique top-level keys allow enterprise teams to instantly disable access to their data, providing granular control over sensitive information.
- The system balances high security with usability, maintaining features like content search that are often lost in traditional end-to-end encryption.
- This security framework serves as the foundation for protecting AI-driven tools like Dropbox Dash and its universal search capabilities.
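The 4MB-block bullet suggests a standard envelope-encryption pattern. The sketch below, using the Python `cryptography` package, shows why cross-team sharing can re-wrap a 32-byte data key instead of re-encrypting the block itself; the key names and software-held team keys are simplifications (the described design keeps team keys inside an HSM), not Dropbox's actual implementation.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical key material: in the described design, team keys live in an
# HSM and only wrap/unwrap operations cross its boundary.
team_a_key = AESGCM.generate_key(bit_length=256)
team_b_key = AESGCM.generate_key(bit_length=256)

# Encrypt a 4 MB file block once, under its own random data key.
block = os.urandom(4 * 1024 * 1024)
data_key = AESGCM.generate_key(bit_length=256)
block_nonce = os.urandom(12)
ciphertext = AESGCM(data_key).encrypt(block_nonce, block, None)

def wrap(team_key: bytes, key: bytes) -> bytes:
    """Wrap a 32-byte data key under a team key (nonce prepended)."""
    nonce = os.urandom(12)
    return nonce + AESGCM(team_key).encrypt(nonce, key, None)

def unwrap(team_key: bytes, blob: bytes) -> bytes:
    return AESGCM(team_key).decrypt(blob[:12], blob[12:], None)

# Cross-team sharing re-wraps 32 bytes of key material instead of
# re-encrypting the 4 MB block.
wrapped_a = wrap(team_a_key, data_key)
wrapped_b = wrap(team_b_key, data_key)
assert AESGCM(unwrap(team_b_key, wrapped_b)).decrypt(block_nonce, ciphertext, None) == block
```

Revoking a team's top-level key invalidates every data key wrapped under it, which is how access can be disabled instantly without touching the stored blocks.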