Categorically Unsafe Software

TIME TO CLASS UP THE JOINT!

By: Senior Technical Advisors Bob Lord and Jack Cable, Senior Advisor Lauren Zabierek

In many of our writings about the secure by design initiative, we use phrases like “classes of coding error”, “classes of vulnerability”, or “categories of defect”. You may wonder why we place so much emphasis on grouping defects together rather than focusing on individual ones. In fact, we’ve had many people ask us why we urge software manufacturers to eliminate entire classes of defect like cross-site scripting (XSS), SQL injection (SQLi), directory traversal, and memory unsafety, as called for in our Secure by Design Pledge. To illustrate how focusing on quality and eliminating groups of errors improves security, we offer our current thinking. 

From patterns to progress

Copying successful industries. While it might seem like a novel approach to making software more secure, root cause analysis and mitigation of repeated classes of defect is the norm in industries that have significantly higher levels of quality and safety. In the aviation industry, experts analyze safety-related incidents to understand not just the proximate causes of the problem, but also the multiple contributing factors, thereby allowing them to recommend changes to pilot, crew, and air traffic controller training, plane maintenance, and cockpit and aircraft design, among others.

Recognizing patterns. To improve product quality and security at scale, we need to spot patterns of recurring defect so that we can move from addressing each defect one at a time to eliminating them from the start. What kinds of coding errors do developers repeatedly make at any given software manufacturer, or across the industry? When we group negative outcomes into classes, we can start to engage in systems thinking to understand the real root causes and potential remedies. Moreover, this helps us to better predict how these product defects will be exploited by malicious actors, thereby giving defenders a leg up when leveraging limited resources.

Analyzing trends over time. Pattern detection is just the first step. We also need to understand how those patterns change over time. Are classes of defect increasing or decreasing for any given software product, or across the entire industry? By looking at trends, we can start to see which software companies are making progress and which need to initiate a quality improvement program.
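
As a rough illustration of this kind of trend analysis, the Python sketch below counts defects per class per year and reports the change between two years. The records, the class labels, and the function names `counts_by_year` and `trend` are all hypothetical and exist only for this example; a real analysis would draw on a manufacturer's own defect tracker or on published vulnerability data.

```python
from collections import Counter

# Hypothetical defect records: (year fixed, defect class). Illustrative data only.
DEFECTS = [
    (2021, "SQLi"), (2021, "XSS"), (2021, "XSS"),
    (2022, "XSS"), (2022, "SQLi"), (2022, "directory traversal"),
    (2023, "XSS"),
]

def counts_by_year(defects):
    """Return {defect_class: {year: count}} for the given records."""
    table = {}
    for (year, defect_class), count in Counter(defects).items():
        table.setdefault(defect_class, {})[year] = count
    return table

def trend(yearly_counts, first_year, last_year):
    """Change in count between two years; a missing year counts as zero."""
    return yearly_counts.get(last_year, 0) - yearly_counts.get(first_year, 0)

if __name__ == "__main__":
    table = counts_by_year(DEFECTS)
    for defect_class, yearly in sorted(table.items()):
        print(defect_class, yearly, trend(yearly, 2021, 2023))
```

A negative trend for a class suggests progress, while a flat or rising one suggests that fixes are being applied one defect at a time rather than to the class as a whole.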

Generalizing remedies. By thinking about classes of defect, we can think beyond the symptom of the problem, and start to reason more about ways to generalize remedies. This line of thinking also serves to shift the responsibility of security from the least capable to the most equipped. Instead of software developers asking, “How can I fix this one defect?” they can ask “How can I prevent all similar defects?” Rather than fixing one SQL injection (SQLi) defect, why not eliminate them entirely, as some companies appear to have done? 
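
To make the difference between fixing one defect and preventing the class concrete, here is a minimal Python sketch using the standard library's `sqlite3` module. It contrasts a query built by string concatenation with a parameterized query, one well-known technique for preventing SQL injection. The table, the helper names, and the payload are assumptions made up for this example, not a description of how any particular company eliminated the class.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is concatenated into the SQL text,
    # so the database parses attacker-controlled characters as SQL.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Parameterized query: the input is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

def make_demo_db():
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn

if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # returns every row in the table
    print(find_user_safe(conn, payload))    # returns no rows
```

Patching `find_user_unsafe` alone fixes one defect. The class-level remedy is to make the concatenation pattern unavailable, for example by routing every query through a parameterized interface and rejecting string-built SQL in code review or linting.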

Remediation scale. Some classes of coding error can be eliminated with relatively low effort, while some may require significant effort. Until we learn how widespread various classes of defect are, we won’t be able to distinguish between classes of defect that companies should eliminate on their own, and those that require a coordinated effort by the larger software ecosystem.

Shift left. In the context of software development, people use the phrase “shift left” to mean conducting certain activities earlier in the software development lifecycle (SDLC). Part of the idea is that preventing coding mistakes is cheaper than catching and fixing them later in the timeline. The phrases “shift left” and “eliminating classes of vulnerability” are two sides of the same coin. If you truly align your product security program to prevent defects as early in the development cycle as possible, meaning that you are shifting security left on a timeline, you must contemplate ways to eliminate entire classes of coding error.

Developer ecosystems. Google’s March 2024 whitepaper titled “Secure by Design at Google” includes this important observation: “The security posture of software products and services is an emergent property of the developer ecosystem in which they are designed, implemented and deployed”. They further write that “careful design of developer ecosystems can drastically lower the incidence of certain kinds of defects, and in some cases practically eliminate them”. The idea is that repeated types of software defects are not the fault of the individual software developer. That means that “developer training” might not be the best remedy. Instead, they argue, those repeat offender defects are an emergent property of the tools and practices that the company has given its coders. Some tools make it nearly impossible for the developer to avoid coding errors. As one example, see the prevalence of memory safety defects in languages like C/C++ compared to others like Swift, C#, Java, Rust, Python, JavaScript, Go, and Ruby.
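
One way to picture a tool that makes the safe path the default is an HTML rendering helper that escapes every value unless it has been explicitly marked as safe. The Python sketch below is an illustration under stated assumptions: the `SafeHtml` type and the `render` function are hypothetical names invented here, not an API from the Google whitepaper or from any real framework, and the sketch handles only HTML text content, not attribute, URL, or script contexts.

```python
from html import escape

class SafeHtml:
    """Marks a string as already-safe HTML. Hypothetical type for illustration."""

    def __init__(self, markup):
        self.markup = markup

def render(template, **values):
    """Fill a str.format-style template written by the developer.

    Every value is HTML-escaped unless it is already a SafeHtml instance,
    so forgetting to escape is no longer a mistake a developer can make.
    """
    escaped = {
        key: value.markup if isinstance(value, SafeHtml) else escape(str(value))
        for key, value in values.items()
    }
    return SafeHtml(template.format(**escaped))

if __name__ == "__main__":
    comment = "<script>alert(1)</script>"  # untrusted input
    page = render("<p>{comment}</p>", comment=comment)
    print(page.markup)
```

The design choice is that the developer has to opt out of escaping rather than remember to opt in, which is the sense in which the defect becomes a property of the tooling instead of the individual.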

Costs

Reducing costs to software manufacturers. Software defects can be reported at any time, and pulling software developers off other tasks to address them can be expensive and disruptive to project schedules. As any business looks to save where it can, it is useful to examine what insecurity costs the company. To achieve economies of scale instead, companies should invest in the tools and resources needed to prevent the introduction of entire classes of defect and achieve secure outcomes like secure developer ecosystems.

Reducing costs to the customers. Applying software updates is not a trivial matter in businesses, small or large – not to mention the costs of an intrusion. We now know that security will not be achieved by simply “patching harder.” Therefore, reducing the number of critical security fixes can reduce the load on IT professionals, and improve customer security.

Increasing costs to the threat actors. When we eliminate entire classes of defect, we make it harder for threat actors to exploit simple vulnerabilities. That raises their cost to conduct malicious cyber activity. If product teams eliminate enough of the classic defects, they may price some threat actors out of the market. 

Aren’t we doing that already?

Some software companies are already working to eliminate classes of coding error. Some have even accomplished that goal for the most common classes. But there is evidence that the industry as a whole is not making sufficient progress. In fact, many top software products fail to protect their customers from exploitation of the most common classes of defect. We read in the news about these common defects causing significant damage to companies and government agencies.

Let’s compare two documents: MITRE’s 2007 paper titled “Unforgivable Vulnerabilities” and their 2023 analysis of “Stubborn Weaknesses”. It would be reasonable to hope that the classes of defect from 2007 would have been eliminated and that the 2023 report would contain all new classes of defect that are more expensive to exploit. We would hope that the software industry would eliminate top classes of defect every few years because doing so would increase the cost to threat actors.

Sadly, the truth is that the software industry has made inadequate progress since 2007, the year the iPhone was introduced. Of the 13 “unforgivable vulnerabilities”, most are still present in the 2023 report in one form or another. We are still plagued by classes of defect like memory unsafety, XSS, SQL injection, directory traversal, and improper input sanitization. What’s especially noteworthy is that for most of these classes of defect, we have known of ways to prevent them at scale for years, and even decades. Some damaging and costly cyber intrusions were likely preventable.

Over the past 17+ years, many software companies have prioritized fixing software defects found in customer deployments over fixing them in their product designs, thereby putting customers at risk and leading to significant real-world harms.

The CWE/CPE challenge

The above two reports rely on the Common Vulnerabilities and Exposures (CVE) program, and in particular, CVE Numbering Authorities’ (CNA) commitment to provide timely, complete, and correct CVE records. Especially important to root cause analysis are the Common Weakness Enumeration (CWE) and Common Platform Enumeration (CPE) fields. The CWE explains the type of coding error that created the defect. The CPE provides information about product and platform naming.

Today, some CNAs ensure CWE and CPE fields are included with all CVE records they create, but many are not as diligent. Until all CNAs and software manufacturers fully commit themselves to ensuring that their CVE records (and CWE/CPE fields) are timely, complete, and correct, we will struggle to become a data-driven industry. Incomplete or inaccurate data will keep us in the dark about the root causes of cyber intrusions and inhibit our ability to prevent them at scale. 
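
To show what a completeness check on these fields might look like, the Python sketch below flags records that lack CWE or CPE data. The record layout, the `EXAMPLE-n` identifiers, and the function names are simplified assumptions for illustration only; the real CVE record format is richer and structured differently. The CWE identifiers used (CWE-79 for cross-site scripting, CWE-89 for SQL injection) are real entries.

```python
# Hypothetical, simplified records. Identifiers and products are placeholders.
RECORDS = [
    {"id": "EXAMPLE-1", "cwe": ["CWE-79"],
     "cpe": ["cpe:2.3:a:example:app:1.0:*:*:*:*:*:*:*"]},
    {"id": "EXAMPLE-2", "cwe": [],
     "cpe": ["cpe:2.3:a:example:app:1.1:*:*:*:*:*:*:*"]},
    {"id": "EXAMPLE-3", "cwe": ["CWE-89"]},
]

REQUIRED_FIELDS = ("cwe", "cpe")

def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

def completeness_report(records):
    """Return (number of complete records, {record id: missing fields})."""
    incomplete = {}
    for record in records:
        missing = missing_fields(record)
        if missing:
            incomplete[record["id"]] = missing
    return len(records) - len(incomplete), incomplete

if __name__ == "__main__":
    complete, incomplete = completeness_report(RECORDS)
    print(f"{complete} of {len(RECORDS)} records are complete")
    for record_id, missing in incomplete.items():
        print(record_id, "is missing", ", ".join(missing))
```

A check like this only tests presence. It says nothing about whether the CWE chosen is the correct root cause, which still depends on the diligence of the CNA.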

But we’re seeing signs of progress. Recently, Microsoft announced that they would “now publish root cause data for Microsoft CVEs using the Common Weakness Enumeration (CWE™) industry standard”. Their blog post is worth reading to understand their perspective. This development, along with the commitment of 68 software manufacturers as part of our pledge to do the same, is exciting, and we hope all CNAs and software manufacturers follow suit.

Conclusion

The next time you see a security-sensitive update for a software product, don’t focus on how the threat actors are abusing that specific coding error. Ask yourself what class of coding error it belongs to. If it belongs to one of the “unforgivable vulnerabilities” from the 2007 paper, or one of the recurring “stubborn weaknesses”, ask yourself why the software manufacturer shipped the product with that defect when systemic preventions are well-known. More importantly, ask those software companies what they are doing to eliminate that entire class of defect. As customers, we should demand that companies stop the practice of shipping defective software, and then only fixing the problems that are found in the field, often after a customer has been injured. 

The bottom line is that a secure by design software development program necessitates formal efforts to eliminate entire classes of defect before the product ships rather than playing whack-a-mole with defects that appear on customer systems in production. For the software manufacturer, a secure by design program that works to eliminate entire classes of defect is likely to be cheaper in the long run and will create a higher quality product that requires fewer emergency fixes. Such a program should be part of the company’s business strategy. For the customers, it will reduce the burden of applying so many urgent software updates. For our country, it will result in greater security and safety. It is the norm in other industries to perform root cause analysis and to work towards eliminating classes of defect, and it is long past time for it to be the norm in the software industry.

For more information on developing software that is secure by design, please see CISA’s Secure by Design whitepaper. We encourage all software manufacturers to demonstrate their commitment by taking CISA’s Secure by Design Pledge.