Back in March 2023, we urgently patched a high-impact DDoS vulnerability that we had introduced in our own code via a patch release of the silverstripe/graphql module. In theory, an attacker could have used this vulnerability to repeatedly force the regeneration of the GraphQL schema, which would have impacted the availability of the site for other users.
Given the somewhat unusual situation, we thought it would be worthwhile to review the incident to improve our process and minimise the risk of this happening again. Specifically, these factors led us to do a review:
- the severity of the issue
- the fact that the change had been implemented recently
- the alteration to our regular vulnerability management process.
This is a summary of that review.
Underlying issues that led to the vulnerability
The issue that the patch was meant to fix had been identified as a high-impact bug: the GraphQL schema of some Silverstripe CMS sites would intermittently become corrupted for no apparent reason, which stopped key CMS features from working.
Despite the high impact of the issue, it was left unaddressed for several months after the initial investigation because of interdependencies with other teams at Silverstripe and pressure on the CMS Squad to hit the Silverstripe CMS 5 milestone.
When we finally did get a chance to look at the issue, we felt under time pressure to come up with a solution quickly so we could ship a fix before we had to get back to CMS 5 work. This was compounded by the fact that the issue was intermittent and that we didn’t know how to trigger it systematically. As a result, we created a stopgap solution without fully understanding the problem or the downstream impact.
Even so, the potential DDoS attack vector was called out at the outset of the work. We mitigated that specific scenario, but failed to identify all possible variations on the original attack.
More broadly, inconsistencies around where to store the GraphQL schema made things worse. They made diagnosing the issue more difficult and affect the overall resilience of our GraphQL implementation to this day. As a team, we also do not have a full appreciation of how the GraphQL module is put together because of the departure of several key staff over the last two years.
Aspects that mitigated the problem
The CMS Squad identified the attack vector a few days after the patch was released. When patching undisclosed security vulnerabilities, we normally wait until the next regular release. Internally, the team quickly called out that following this regular process would delay the release of a fix and allow the vulnerability to be deployed to more sites.
We took the time to validate how many times the vulnerable silverstripe/graphql patch releases had been downloaded by looking at Packagist install statistics and our own internal installation statistics on Silverstripe Cloud. This allowed us to better evaluate the impact of deviating from our regular security process.
The team was able to adapt the regular security process to quickly reverse the patch and protect Silverstripe CMS projects.
We briefly considered implementing mitigation measures rather than completely reverting the vulnerable patch, but concluded that simply reverting it would be safer. Had we tried to keep the patch and address only the vulnerability, there was a chance that, given the time pressure, we would not have removed the vulnerability entirely.
- We’ll review how and when the GraphQL schema generation works and better align it with other practices in Silverstripe CMS.
- We’ll improve our understanding of the GraphQL module, so we can better maintain it.
- When potential security issues are identified during the development process, we’ll take a step back and make sure we fully understand them before proceeding.