Scoring Methodology¶
At a Glance
This page describes a 7-dimension scoring framework used to rank software targets for vulnerability research prioritization. Each target receives a weighted composite score (minimum 14, maximum 70) and is assigned to one of four priority tiers: Critical, High, Medium, or Low. The methodology is semi-quantitative and fully reproducible, enabling consistent comparative ranking across diverse software targets.
Overview¶
Choosing where to focus vulnerability research effort requires balancing many factors: how widely deployed a target is, how much attack surface it exposes, how severe a compromise would be, and how tractable the codebase is for analysis. A purely qualitative approach (gut feel, reputation) introduces bias and is difficult to communicate or reproduce. A purely quantitative approach demands data that is often unavailable or unreliable across heterogeneous software projects.
This framework takes a semi-quantitative middle path. Each target is scored on seven criteria using a simple 1-to-5 scale, with each criterion carrying an explicit weight that reflects its relative importance to security impact. The resulting composite score provides a defensible, comparable ranking without overstating precision.
Scoring Criteria¶
The table below defines all seven dimensions. Each criterion is scored from 1 (lowest relevance) to 5 (highest relevance), then multiplied by its weight.
| Criterion | Weight | 1 (Low) | 5 (High) |
|---|---|---|---|
| Deployment Scale | 3x | Niche or specialized use | Billions of installations worldwide |
| Cross-Platform Presence | 1x | Single OS or architecture | All major platforms plus embedded/IoT |
| Protocol/Input Exposure | 3x | No external or untrusted input | Directly network-facing, parses untrusted data |
| Privilege Level | 2x | Unprivileged userspace process | Kernel, root, or hypervisor context |
| Dependency Footprint | 2x | Standalone with few downstream consumers | Foundational library with thousands of dependents |
| Codebase Complexity | 1x | Small, simple, modern codebase | Large, complex, legacy codebase |
| Historical CVE Density | 2x | Few known CVEs over project lifetime | Frequent high-severity CVEs |
Total weight: 3 + 1 + 3 + 2 + 2 + 1 + 2 = 14
Criterion Rationale¶
Deployment Scale (3x): The number of installations directly determines how many systems are affected by a vulnerability. A flaw in software running on billions of devices has categorically more impact than one in a niche tool. This criterion receives the highest weight alongside Protocol/Input Exposure.
Cross-Platform Presence (1x): Software that runs across operating systems and architectures broadens the potential attack surface and complicates patching. However, cross-platform presence alone does not make software more exploitable, so it receives a lower weight.
Protocol/Input Exposure (3x): Network-facing software that parses untrusted input is the primary entry point for remote exploitation. This criterion shares the highest weight with Deployment Scale because exposure determines reachability, and reachability determines blast radius.
Privilege Level (2x): A vulnerability in kernel or hypervisor code can lead to full system compromise, while a bug in an unprivileged process may be contained by OS-level isolation. Privilege level shapes the severity ceiling of any discovered vulnerability.
Dependency Footprint (2x): Libraries consumed by thousands of downstream projects (such as OpenSSL or zlib) multiply impact through transitive dependency chains. A single flaw propagates across every project that links against the library.
Codebase Complexity (1x): Large, legacy codebases with complex control flow are harder to audit and more likely to harbor latent bugs. This criterion is weighted lower because complexity affects research tractability more than it affects impact.
Historical CVE Density (2x): A track record of frequent, high-severity vulnerabilities signals ongoing risk and suggests that additional undiscovered flaws are likely. Past CVE density is an imperfect but useful proxy for current attack surface quality.
Composite Score and Priority Tiers¶
The composite score is the sum of all weighted criterion scores:
Composite = (Deployment Scale x 3) + (Cross-Platform x 1) + (Protocol/Input Exposure x 3) + (Privilege Level x 2) + (Dependency Footprint x 2) + (Codebase Complexity x 1) + (Historical CVE Density x 2)
With every criterion scored at 1, the minimum composite is 14. With every criterion scored at 5, the maximum is 70.
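To make the arithmetic concrete, here is a minimal Python sketch of the composite calculation. The dictionary keys are my own shorthand for the seven criteria; the methodology does not prescribe any particular implementation.

```python
# Weights from the criteria table (total 14). Key names are illustrative shorthand.
WEIGHTS = {
    "deployment_scale": 3,
    "cross_platform": 1,
    "protocol_exposure": 3,
    "privilege_level": 2,
    "dependency_footprint": 2,
    "codebase_complexity": 1,
    "cve_density": 2,
}

def composite(scores: dict[str, int]) -> int:
    """Sum of each 1-5 criterion score multiplied by its weight."""
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be scored 1-5, got {value}")
    return sum(WEIGHTS[name] * value for name, value in scores.items())

# Bounds check: all 1s gives the minimum, all 5s the maximum.
assert composite({k: 1 for k in WEIGHTS}) == 14
assert composite({k: 5 for k in WEIGHTS}) == 70
```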
Targets are assigned to priority tiers based on their composite score:
| Tier | Score Range | Description |
|---|---|---|
| Critical | 55 to 70 | Highest-priority targets for vulnerability research |
| High | 40 to 54 | Important targets with significant security relevance |
| Medium | 25 to 39 | Moderate-priority targets worth monitoring |
| Low | Below 25 | Lower-priority targets in this context |
Low Tier Population
The Low tier is expected to be sparsely populated in this section. The targets covered here are security-critical software by selection, so most will score Medium or above. A Low score does not imply the software is unimportant, only that it ranks lower relative to other high-value targets.
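The tier boundaries translate directly into a small helper. This is a sketch; the function name is my own.

```python
def assign_tier(composite: int) -> str:
    """Map a composite score (14-70) onto the four priority tiers."""
    if composite >= 55:
        return "Critical"
    if composite >= 40:
        return "High"
    if composite >= 25:
        return "Medium"
    return "Low"

# Boundary checks against the tier table.
assert assign_tier(55) == "Critical"
assert assign_tier(54) == "High"
assert assign_tier(25) == "Medium"
assert assign_tier(24) == "Low"
```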
Worked Example: curl/libcurl¶
To illustrate the scoring process, consider curl/libcurl, the ubiquitous data transfer library and command-line tool.
| Criterion | Score | Rationale | Weighted |
|---|---|---|---|
| Deployment Scale | 5 | Installed on virtually every Linux, macOS, and Windows system; embedded in countless applications | 15 |
| Cross-Platform Presence | 5 | Runs on all major operating systems, embedded platforms, and IoT devices | 5 |
| Protocol/Input Exposure | 4 | Parses URLs, HTTP headers, TLS handshakes, and dozens of protocols from network input; typically client-initiated rather than always-listening | 12 |
| Privilege Level | 2 | Generally runs as an unprivileged user process; not a kernel component | 4 |
| Dependency Footprint | 5 | libcurl is a foundational dependency for thousands of projects across every language ecosystem | 10 |
| Codebase Complexity | 4 | Large C codebase (~170k lines), supports 28+ protocols, significant legacy surface | 4 |
| Historical CVE Density | 4 | Over 140 CVEs to date, including multiple high-severity memory safety issues | 8 |
Composite Score: 15 + 5 + 12 + 4 + 10 + 4 + 8 = 58 (Critical tier)
curl/libcurl lands in the Critical tier, which aligns with its status as one of the most widely deployed and actively researched open-source projects.
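The table above can be checked mechanically with a self-contained sketch (criterion key names are my own shorthand):

```python
# Weights from the criteria table; keys are illustrative shorthand.
WEIGHTS = {
    "deployment_scale": 3,
    "cross_platform": 1,
    "protocol_exposure": 3,
    "privilege_level": 2,
    "dependency_footprint": 2,
    "codebase_complexity": 1,
    "cve_density": 2,
}

# curl/libcurl scores from the worked example table.
curl_scores = {
    "deployment_scale": 5,
    "cross_platform": 5,
    "protocol_exposure": 4,
    "privilege_level": 2,
    "dependency_footprint": 5,
    "codebase_complexity": 4,
    "cve_density": 4,
}

curl_composite = sum(WEIGHTS[k] * v for k, v in curl_scores.items())
print(curl_composite)  # 58 -> Critical tier (55 to 70)
```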
Scoring Workflow¶
The following diagram summarizes the end-to-end process for scoring a target.
```mermaid
flowchart TD
    A[Identify Target] --> B[Score Each Criterion\n1 to 5]
    B --> C[Apply Weights\nto Each Score]
    C --> D[Sum Weighted Scores\nComposite Total]
    D --> E{Assign Priority Tier}
    E -->|55-70| F[Critical]
    E -->|40-54| G[High]
    E -->|25-39| H[Medium]
    E -->|Below 25| I[Low]
```

Limitations and Caveats¶
This scoring methodology is designed for practical use, not theoretical precision. Several limitations apply:
- Scores are best-effort estimates, not precise measurements. Criteria like "Deployment Scale" and "Historical CVE Density" rely on publicly available data that may be incomplete or outdated. Scores should be treated as informed approximations.
- The rubric is designed for comparative ranking, not absolute assessment. A score of 58 does not have inherent meaning in isolation. Its value comes from enabling consistent comparison across targets scored with the same framework.
- Weights reflect general security priorities and may not apply to every context. An organization focused exclusively on kernel security, for example, might weight Privilege Level more heavily. The weights chosen here optimize for broad vulnerability research prioritization.
- Temporal sensitivity is not captured. A target's score can shift as deployment patterns change, new CVEs emerge, or codebases are rewritten. Scores represent a snapshot and should be revisited periodically.
Applying the Rubric to New Targets¶
To score a target not already covered in this section:
1. Gather baseline data. Identify the software's deployment footprint, supported platforms, input parsing surface, typical privilege context, downstream dependents, codebase size, and CVE history.
2. Score each criterion independently. Use the 1-to-5 anchors in the criteria table as reference points. When in doubt, compare against targets that have already been scored to maintain consistency.
3. Apply weights and sum. Compute the composite score using the formula above.
4. Assign the priority tier. Use the tier boundaries to classify the target.
5. Document rationale. Record brief justifications for each criterion score, as shown in the worked example. This ensures the score is reproducible and can be challenged or updated by others.
When scoring, aim for consistency across targets rather than precision on any single score. The framework's value lies in producing a defensible, comparable ranking across the full set of prioritized targets.
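Putting the steps together, one possible shape for a scored-target record keeps the per-criterion rationale alongside each score, so the result stays reproducible. All names here are my own; this is a sketch, not a prescribed format.

```python
from dataclasses import dataclass

# Weights from the criteria table; keys are illustrative shorthand.
WEIGHTS = {
    "deployment_scale": 3,
    "cross_platform": 1,
    "protocol_exposure": 3,
    "privilege_level": 2,
    "dependency_footprint": 2,
    "codebase_complexity": 1,
    "cve_density": 2,
}

@dataclass
class Criterion:
    score: int      # 1-5 per the rubric anchors
    rationale: str  # brief justification, recorded per the final step

def score_target(criteria: dict[str, Criterion]) -> tuple[int, str]:
    """Return (composite, tier) for a fully scored target."""
    total = sum(WEIGHTS[name] * c.score for name, c in criteria.items())
    if total >= 55:
        tier = "Critical"
    elif total >= 40:
        tier = "High"
    elif total >= 25:
        tier = "Medium"
    else:
        tier = "Low"
    return total, tier

# Hypothetical target scored mid-range (3) on every criterion.
example = {name: Criterion(3, "placeholder rationale") for name in WEIGHTS}
print(score_target(example))  # (42, 'High')
```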
Glossary¶
| Term | Definition |
|---|---|
| AFL | American Fuzzy Lop, coverage-guided fuzzer |
| ASan | AddressSanitizer, memory error detector |
| CVE | Common Vulnerabilities and Exposures |
| AFL++ | Community-maintained successor to AFL, the de facto standard coverage-guided fuzzer |
| AEG | Automatic Exploit Generation, automated creation of working exploits from vulnerability information |
| ANTLR | ANother Tool for Language Recognition, parser generator used by grammar-aware fuzzers like Superion |
| AST | Abstract Syntax Tree, tree representation of source code structure used by static analyzers |
| BOD | Binding Operational Directive, mandatory cybersecurity directives issued by CISA |
| BOF | Buffer Overflow, writing data beyond allocated memory bounds, a common memory safety vulnerability |
| CFG | Control Flow Graph, directed graph representing all possible execution paths through a program |
| CGC | Cyber Grand Challenge, DARPA competition for autonomous vulnerability detection and patching |
| ClusterFuzz | Google's distributed fuzzing infrastructure that powers OSS-Fuzz |
| CodeQL | GitHub's query-based static analysis engine that treats code as a queryable database |
| CFAA | Computer Fraud and Abuse Act, US federal law governing computer security violations |
| CNA | CVE Numbering Authority, organization authorized to assign CVE IDs |
| CNNVD | China National Vulnerability Database of Information Security |
| CNVD | China National Vulnerability Database |
| Concolic | Concrete + Symbolic, execution that runs concrete values while tracking symbolic constraints |
| Corpus | Collection of seed inputs used by a coverage-guided fuzzer as the basis for mutation |
| Coverity | Synopsys commercial static analysis platform with deep interprocedural analysis |
| CPG | Code Property Graph, unified representation combining AST, CFG, and data-flow graph, used by Joern |
| CVSS | Common Vulnerability Scoring System, standard for rating vulnerability severity |
| CWE | Common Weakness Enumeration, categorization of software weakness types |
| DAST | Dynamic Application Security Testing, testing running applications for vulnerabilities |
| DBI | Dynamic Binary Instrumentation, modifying program behavior at runtime without recompilation |
| DFG | Data Flow Graph, graph representing how data values propagate through a program |
| DPA | Differential Power Analysis, extracting cryptographic keys by analyzing power consumption variations |
| Frida | Dynamic instrumentation toolkit for injecting scripts into running processes |
| Harness | Glue code connecting a fuzzer to its target, defining how fuzzed input is delivered |
| HWASAN | Hardware-assisted AddressSanitizer, ARM-based variant of ASan with lower overhead |
| IAST | Interactive Application Security Testing, combines elements of SAST and DAST during testing |
| Infer | Meta's open-source static analyzer based on separation logic and bi-abduction |
| JVN | Japan Vulnerability Notes, Japanese vulnerability information portal |
| KLEE | Symbolic execution engine built on LLVM for automatic test generation |
| LLM | Large Language Model, neural network trained on text/code, used for bug detection and code generation |
| LSAN | LeakSanitizer, detector for memory leaks, often used alongside AddressSanitizer |
| Meltdown | CPU vulnerability exploiting out-of-order execution to read kernel memory from user space |
| MITRE | Non-profit organization that maintains CVE, CWE, and ATT&CK frameworks |
| MTTR | Mean Time to Remediate, average duration from vulnerability disclosure to patch deployment |
| MSan | MemorySanitizer, detector for reads of uninitialized memory |
| NVD | National Vulnerability Database, NIST-maintained repository of vulnerability data |
| NIST | National Institute of Standards and Technology, US agency maintaining security standards and NVD |
| OpenSSF | Open Source Security Foundation, Linux Foundation project for open-source security |
| OSS-Fuzz | Google's free continuous fuzzing service for open-source software |
| OWASP | Open Worldwide Application Security Project, community producing security guides and tools |
| RCE | Remote Code Execution, vulnerability allowing an attacker to run arbitrary code on a target system |
| RL | Reinforcement Learning, ML paradigm where agents learn through reward-based feedback |
| S2E | Selective Symbolic Execution, whole-system analysis platform combining QEMU with KLEE |
| SARIF | Static Analysis Results Interchange Format, standard for exchanging static analysis findings |
| SAST | Static Application Security Testing, analyzing source code for vulnerabilities without execution |
| SCA | Software Composition Analysis, identifying known vulnerabilities in third-party dependencies |
| Seed | Initial input provided to a fuzzer as the starting point for mutation |
| Semgrep | Lightweight open-source static analysis tool using pattern-matching rules |
| Side-channel | Attack vector exploiting physical implementation artifacts rather than algorithmic flaws |
| SMT | Satisfiability Modulo Theories, solver used by symbolic execution to find inputs satisfying path constraints |
| Spectre | Family of CPU vulnerabilities exploiting speculative execution to leak data across security boundaries |
| SQLi | SQL Injection, injecting malicious SQL into queries via unsanitized user input |
| SSRF | Server-Side Request Forgery, tricking a server into making requests to unintended destinations |
| SymCC | Compilation-based symbolic execution tool, two to three orders of magnitude faster than KLEE |
| Taint analysis | Tracking the flow of untrusted data from sources to security-sensitive sinks |
| VDP | Vulnerability Disclosure Program, formal process for receiving vulnerability reports |
| TOCTOU | Time-of-Check-Time-of-Use, race condition between validating a resource and using it |
| TSan | ThreadSanitizer, detector for data races in multithreaded programs |
| UAF | Use-After-Free, accessing memory after it has been deallocated |
| UBSan | UndefinedBehaviorSanitizer, detector for undefined behavior in C/C++ |
| Valgrind | Dynamic binary instrumentation framework for memory debugging and profiling |
| XSS | Cross-Site Scripting, injecting malicious scripts into web pages viewed by other users |
| Fine-tuning | Adapting a pre-trained ML model to a specific task using additional training data |
| AUTOSAR | Automotive Open System Architecture, standardized software framework for automotive ECUs |
| CAN | Controller Area Network, vehicle bus standard for microcontroller communication |
| DNP3 | Distributed Network Protocol, used in SCADA and utility systems |
| EDK II | EFI Development Kit II, open-source UEFI firmware development environment |
| OPC UA | Open Platform Communications Unified Architecture, industrial automation protocol |
| RTOS | Real-Time Operating System, OS designed for real-time applications with deterministic timing |
| Abstract interpretation | Mathematical framework for approximating program behavior using abstract domains |
| Dataflow analysis | Tracking how values propagate through a program to detect bugs like taint violations |