Why it matters: Effective RAG systems depend on high-quality search ranking. Using LLMs to scale relevance labeling allows engineers to train more accurate models faster, overcoming the scalability and privacy limitations of traditional human-only labeling workflows.
Why it matters: As AI models scale to trillions of parameters, low-bit inference is essential for maintaining low latency and cost-efficiency. It allows engineers to deploy sophisticated models on existing hardware by optimizing memory usage and maximizing throughput via specialized GPU cores.
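The memory side of that argument is simple arithmetic. The sketch below (illustrative figures only; a hypothetical 70B-parameter model, weights alone, ignoring KV-cache and activation overhead) shows why dropping from 16-bit to 4-bit weights is what makes "existing hardware" feasible:

```python
# Rough memory-footprint arithmetic for weight storage at different precisions.
# Numbers are illustrative; real deployments also need KV-cache, activations,
# and runtime overhead on top of this.

def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """GiB needed to hold the weights alone at a given precision."""
    return num_params * bits_per_weight / 8 / (1024 ** 3)

params = 70e9  # hypothetical 70B-parameter model
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(params, bits):6.1f} GiB")
```

At 4 bits the same weights take a quarter of the fp16 footprint, which is the difference between needing a multi-GPU node and fitting on a single accelerator.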
Why it matters: AI is shifting from experimental to essential in the SDLC. Dropbox's experience shows that combining off-the-shelf tools with custom solutions for specific monorepo constraints can measurably increase PR throughput and improve developer satisfaction at scale.
Why it matters: Engineers face increasing data fragmentation across SaaS silos. This post details how to build a unified context engine using knowledge graphs, multimodal processing, and prompt optimization (DSPy) to enable effective RAG and agentic workflows over proprietary enterprise data.
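To make the knowledge-graph idea concrete, here is a toy sketch of grounding retrieval in graph structure rather than raw text chunks. All entities, relations, and identifiers below are hypothetical examples; a real context engine would populate the graph from SaaS connectors and use far richer retrieval than a one-hop lookup:

```python
# Toy knowledge-graph retrieval: store facts as (subject, relation, object)
# triples and serialize an entity's one-hop neighborhood as prompt context.
# Entity and relation names here are made up for illustration.

from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # subject -> list of (relation, object) pairs
        self.edges = defaultdict(list)

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.edges[subj].append((rel, obj))

    def neighborhood(self, entity: str) -> list[str]:
        """One-hop facts about an entity, serialized for a prompt."""
        return [f"{entity} --{rel}--> {obj}" for rel, obj in self.edges[entity]]

kg = KnowledgeGraph()
kg.add("InvoiceService", "owned_by", "PaymentsTeam")
kg.add("InvoiceService", "documented_in", "confluence:INV-42")
print("\n".join(kg.neighborhood("InvoiceService")))
```

The payoff over chunk-based RAG is that relationships (ownership, lineage, references) survive retrieval instead of being lost at chunk boundaries.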
Why it matters: Building a scalable feature store is essential for real-time AI applications that require low-latency retrieval of complex user signals across hybrid environments. This approach enables engineers to move quickly from experimentation to production without managing underlying infrastructure.
Why it matters: Intern-led projects can drive real production improvements in ML observability, storage latency, and developer productivity, showing how AI is put to practical use in enterprise-scale infrastructure.

Why it matters: As AI moves from search to agents, managing the context window is critical. This article explains how to prevent performance degradation and context rot by curating tools and data, ensuring models remain fast and accurate even as capabilities expand.
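The core curation idea can be sketched in a few lines: rank candidate tools or documents by relevance and keep only what fits a token budget, rather than stuffing everything into the prompt. This is a minimal illustration, not the article's system; the keyword-overlap scorer and whitespace token counter are placeholders where production systems would use embeddings or a learned ranker:

```python
# Context curation sketch: keep only the most relevant items that fit a token
# budget. score() and count_tokens are deliberately naive placeholders.

def score(query: str, text: str) -> int:
    """Placeholder relevance: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def curate(query, items, budget_tokens, count_tokens=lambda s: len(s.split())):
    ranked = sorted(items, key=lambda t: score(query, t), reverse=True)
    kept, used = [], 0
    for item in ranked:
        cost = count_tokens(item)
        if used + cost <= budget_tokens:
            kept.append(item)
            used += cost
    return kept

tools = [
    "search_wiki: look up internal wiki pages",
    "create_ticket: open a Jira ticket",
    "query_metrics: fetch latency metrics from the metrics store",
]
print(curate("why is latency high", tools, budget_tokens=10))
```

Keeping the prompt small and on-topic is exactly the defense against context rot: irrelevant tools never reach the model, so they cannot distract it.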
Why it matters: Engineers must process massive unstructured multimedia data efficiently. This integration demonstrates how specialized architectures can achieve deep multimodal understanding at exabyte scale while maintaining low computational overhead and high search relevance.
Why it matters: HQQ enables engineers to deploy massive LLMs on consumer-grade hardware with minimal setup. By removing the need for calibration data and drastically reducing quantization time, it simplifies the pipeline for optimizing and testing state-of-the-art models at scale.
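To see why "no calibration data" matters, here is a generic data-free quantization sketch using plain round-to-nearest with a per-row scale and zero-point, computed from the weights alone. To be clear, this is NOT HQQ's method; HQQ additionally optimizes the quantization parameters with a half-quadratic solver against a robust error objective, which is what recovers accuracy at low bit-widths:

```python
# Generic calibration-free weight quantization: round-to-nearest with a
# per-row (min, scale) pair derived from the weights themselves, so no
# input data is ever needed. Illustrative only; not the HQQ algorithm.

import numpy as np

def quantize_rows(w: np.ndarray, bits: int = 8):
    qmax = 2 ** bits - 1
    w_min = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - w_min) / qmax
    q = np.clip(np.round((w - w_min) / scale), 0, qmax).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q, scale, w_min = quantize_rows(w)
err = np.abs(dequantize(q, scale, w_min) - w).max()
print(f"max abs reconstruction error: {err:.4f}")
```

Because everything is derived from the weight tensor itself, quantization is a single fast pass over the model, which is the property that makes HQQ-style pipelines quick to run and easy to test.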
Why it matters: Building reliable LLM applications requires moving beyond ad-hoc testing. This framework shows engineers how to implement a rigorous, code-like evaluation pipeline to manage the unpredictability of probabilistic AI components and ensure consistent performance at scale.
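A "code-like" eval pipeline can be as simple as a fixed dataset of cases, a scorer per case, and a pass-rate gate that fails the run the way a unit-test suite would. The sketch below is a minimal illustration of that pattern, not the framework from the article; `fake_model` is a stand-in for a real LLM call, and the substring check is the simplest possible scorer:

```python
# Minimal eval harness: fixed cases, a per-case scorer, and a pass-rate gate
# suitable for CI. fake_model is a hypothetical stand-in for an LLM call.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_substring: str

def fake_model(prompt: str) -> str:
    # Placeholder for an actual model call.
    return "Paris is the capital of France."

def run_evals(cases, model, min_pass_rate: float = 0.9):
    passed = sum(c.expected_substring in model(c.prompt) for c in cases)
    rate = passed / len(cases)
    return rate, rate >= min_pass_rate

cases = [EvalCase("What is the capital of France?", "Paris")]
rate, gate_ok = run_evals(cases, fake_model)
print(f"pass rate: {rate:.0%}, gate {'passed' if gate_ok else 'failed'}")
```

Treating the threshold as a hard gate is the key design choice: probabilistic components get a statistical bar instead of exact-match assertions, but a regression still blocks the merge.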