Partial outage on Ghost(Pro)

Write-up

The root cause of this incident was an invisible change made by our upstream provider, which caused one of our database servers to behave as though it was under significantly more load than usual.

We then spent far too long, first trying to find the source of the load and fix it, and then, after discovering the issue was upstream, waiting for their resolution rather than failing over to a new server.

We completed a detailed postmortem process. The incident lasted for far longer than we feel is acceptable, and although the root cause was upstream, the time to recovery was entirely within our control.

We’re making significant changes to how we respond to similar incidents in the future, including:

An improved team approach to analysing and remediating high-impact incidents
Better default remediation steps for DB performance issues
Changes to our DB cluster setups to make performance issues easier to mitigate
Our upstream vendor has also changed their policy on when they will make changes and how they will communicate those changes to us

These changes will help us recover service significantly faster in the future.

We apologise to everyone who was impacted by this issue. If you have further questions, please reach out to support@ghost.org.
