GitHub Availability Report: November 2025
Why It Matters
This report highlights common infrastructure challenges like rate limiting, certificate management, and configuration errors. It offers valuable insights into incident response, mitigation strategies, and proactive measures for maintaining high availability in complex distributed systems.
Key Takeaways
- GitHub experienced three incidents in November 2025, affecting Dependabot, Git operations, and Copilot services.
- A Dependabot incident was caused by hitting GitHub Container Registry rate limits, resolved by adjusting job rates and increasing limits.
- All Git operations failed due to an expired TLS certificate for internal service-to-service communication, mitigated by certificate replacement and service restarts.
- A Copilot outage for the Claude Sonnet 4.5 model resulted from a misconfiguration in an internal service, which was resolved by reverting the change.
- Post-incident actions include adding new monitoring, auditing certificates, accelerating automation for certificate management, and improving cross-service deploy safeguards.
Content Preview
In November, we experienced three incidents that resulted in degraded performance across GitHub services.
November 17 16:52 UTC (lasting 2 hours and 16 minutes)
On November 17, 2025, from 16:52 to 19:08 UTC, Dependabot was hitting a rate limit in GitHub Container Registry (GHCR) and was unable to complete about 57% of jobs within SLO.
To mitigate the issue, we lowered the rate at which Dependabot started jobs and increased the GHCR rate limit. Together, these changes resolved the incident.
Longer term, we’re adding new monitors and alerts to help prevent this in the future.
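As an illustration of the first mitigation (lowering the rate at which jobs start), here is a minimal token-bucket sketch. It is not Dependabot's actual scheduler: the `TokenBucket` class, its parameters, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Hypothetical throttle: allows at most `rate_per_sec` job starts per
    second on average, with bursts of up to `capacity` starts."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock          # injectable so the bucket can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self):
        """Return True if a job may start now, False if it should wait."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.try_acquire()` before starting each job and requeue the job when it returns `False`. Lowering `rate_per_sec` is the knob that corresponds to starting jobs more slowly.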
November 18 20:30 UTC (lasting 1 hour and 4 minutes)
On November 18, 2025, from 20:30 to 21:34 UTC, we experienced failures on all Git operations, including both SSH and HTTP Git client interactions, as well as raw file access. These failures also impacted products that rely on Git operations.
The root cause was an expired TLS certificate used for internal service-to-service communication. We mitigated the incident by replacing the expired certificate and restarting impacted services. Once those services were restarted we saw a full recovery.
We have updated our alerting to cover the expired certificate, and we are performing an audit of other certificates in this area to ensure they also have the proper alerting and automation before expiration. In parallel, we are accelerating efforts to eliminate our remaining manually managed certificates, so that certificate management for all service-to-service communication is fully automated.
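A certificate audit of this kind can be sketched as a check over an inventory of expiry dates. The example below is an assumption-laden illustration, not GitHub's tooling: the inventory shape, the service names, and the 30-day `ALERT_WINDOW_DAYS` threshold are hypothetical. The expiry strings use the `notAfter` format that Python's `ssl` module reports for a certificate.

```python
import ssl
from datetime import datetime, timezone

ALERT_WINDOW_DAYS = 30  # hypothetical alerting threshold


def days_until_expiry(not_after, now):
    """Days between `now` (an aware datetime) and a certificate's notAfter
    string, e.g. 'Jan  5 09:34:43 2030 GMT'. Negative means already expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now.timestamp()) / 86400.0


def audit(inventory, now, window_days=ALERT_WINDOW_DAYS):
    """Return (name, days_remaining) for every certificate that is expired
    or expires within `window_days`, most urgent first."""
    flagged = []
    for name, not_after in inventory.items():
        remaining = days_until_expiry(not_after, now)
        if remaining <= window_days:
            flagged.append((name, remaining))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical inventory; names and dates are made up for illustration.
    inventory = {
        "internal-rpc": "Jun 15 12:00:00 2030 GMT",
        "internal-cache": "Dec 31 12:00:00 2031 GMT",
    }
    now = datetime(2030, 6, 1, 12, 0, tzinfo=timezone.utc)
    for name, remaining in audit(inventory, now):
        print(f"{name}: {remaining:.1f} days remaining")
```

Running a check like this on a schedule, and alerting on any non-empty result, is one way to catch a manually managed certificate before it expires.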
November 28 05:59 UTC (lasting 2 hours and 24 minutes)
On November 28, 2025, between approximately 05:59 and 08:24 UTC, Copilot experienced an outage affecting the Claude Sonnet 4.5 model. Users attempting to use this model received an HTTP 400 error indicating no model was available until an alternative model was selected. Other models were not impacted.
The issue was caused by a misconfiguration deployed to an internal service, which caused Claude Sonnet 4.5 to be erroneously listed as unavailable. The problem was identified and mitigated by reverting the configuration change. We are working to improve cross-service deploy safeguards to prevent similar incidents in the future.
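The report does not describe what the improved safeguards will look like. One possible shape, sketched here purely as an assumption, is a pre-deploy check that blocks a configuration change if it would mark a currently available model as unavailable without explicit acknowledgement. The function names, the config shape (model name mapped to an availability flag), and the model identifiers are all hypothetical.

```python
def newly_unavailable(current, proposed):
    """Models available in `current` that are missing or disabled in `proposed`.
    Both arguments map a model name to an availability flag."""
    return {
        name
        for name, available in current.items()
        if available and not proposed.get(name, False)
    }


def check_deploy(current, proposed, acknowledged=frozenset()):
    """Raise ValueError if the change disables a model nobody acknowledged."""
    unexpected = newly_unavailable(current, proposed) - set(acknowledged)
    if unexpected:
        raise ValueError(
            f"config change would disable models: {sorted(unexpected)}"
        )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    current = {"claude-sonnet-4.5": True, "other-model": True}
    proposed = {"claude-sonnet-4.5": False, "other-model": True}
    try:
        check_deploy(current, proposed)
    except ValueError as err:
        print(f"deploy blocked: {err}")
```

The design choice is to fail closed: an intentional removal still goes through, but only when the model name is passed in `acknowledged`.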
Follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the engineering section on the GitHub Blog.
The post GitHub Availability Report: November 2025 appeared first on The GitHub Blog.