
Stateful Fuzzing

At a Glance

Gap: Fuzzing targets with complex internal state and multi-step interaction sequences
Severity: High; network protocols, APIs, and session-based systems are inadequately served
Current State: A handful of research tools exist (AFLNet, StateAFL, SGFuzz) but no mature, general-purpose solution
Key Barrier: State space explosion when combining input grammar with protocol state machines

The Stateful Problem

Most fuzzing tools operate on a fundamentally stateless model: send one input, observe one output, reset. Coverage-guided fuzzers like AFL++ and libFuzzer are designed around this cycle; they feed a single byte sequence to a target, collect coverage, and move on. Grammar-aware fuzzers like Nautilus generate structurally valid individual inputs. Even large-scale continuous fuzzing platforms like OSS-Fuzz run stateless harnesses that process one input per execution.

This stateless model works exceptionally well for parsers, decoders, and library APIs that process self-contained inputs. But a large and critical class of software is inherently stateful: its behavior depends not just on the current input but on the entire sequence of prior interactions. Fuzzing these targets effectively requires understanding and exploring their state machines, a problem that remains largely unsolved in practice.

Network Protocols with State Machines

Network protocols are defined by state machines. A TLS handshake requires a specific sequence of messages (ClientHello, ServerHello, Certificate, etc.) in a specific order, with each message's validity depending on the current protocol state. An SMTP session requires HELO/EHLO before MAIL FROM, which must precede RCPT TO, which must precede DATA. Sending messages out of order produces immediate rejection, not interesting bug-triggering behavior.

A coverage-guided fuzzer that mutates individual bytes within a single message will almost never produce a valid multi-step protocol interaction from scratch. The overwhelming majority of mutated message sequences are rejected at the protocol state machine level, exercising only error-handling code. The deep protocol logic (where the interesting bugs live) remains unreachable.
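To make the rejection rate concrete, the sketch below enumerates every command sequence of a given length against a deliberately simplified SMTP ordering table and counts how many reach the DATA state. The table, the command labels, and the helper names are illustrative assumptions; a real server (RFC 5321) accepts more commands and transitions, and byte-level mutation fares worse than this command-level enumeration because most mutated bytes do not even form a valid command.

```python
from itertools import product

# Assumed, simplified command set; anything not in NEXT counts as rejected.
COMMANDS = ["EHLO", "MAIL", "RCPT", "DATA", "QUIT"]

# (current state, command) -> next state
NEXT = {
    ("INIT", "EHLO"): "GREETED",
    ("GREETED", "MAIL"): "MAIL",
    ("MAIL", "RCPT"): "RCPT",
    ("RCPT", "RCPT"): "RCPT",   # several recipients are allowed
    ("RCPT", "DATA"): "DATA",
}


def reaches_data(sequence):
    """True if every command is accepted in order and the session ends in DATA."""
    state = "INIT"
    for command in sequence:
        state = NEXT.get((state, command))
        if state is None:       # out-of-order command: session rejected
            return False
    return state == "DATA"


def count_reaching_data(length):
    """Return (sequences that reach DATA, total sequences) for one length."""
    sequences = list(product(COMMANDS, repeat=length))
    hits = sum(reaches_data(seq) for seq in sequences)
    return hits, len(sequences)
```

With five commands, exactly one of the 625 sequences of length four reaches DATA in this toy model, which is the effect the paragraph above describes: almost all of the search effort lands in rejection paths.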

Multi-Step API Interactions

REST APIs, database interfaces, and RPC services exhibit similar stateful behavior. Creating a resource, modifying it, and then accessing it in a specific state may trigger a bug that no single API call could reach. Authentication flows require login before accessing protected endpoints. Transaction-based systems require begin/commit/rollback sequences.
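A minimal sketch of why sequences matter for APIs, assuming a hypothetical in-memory resource store with a deliberately planted bug (the `ToyStore` class and its operations are invented for illustration and do not correspond to any real service). No single call fails; only one specific ordering of four calls does.

```python
from itertools import product


class ToyStore:
    """Hypothetical resource API with a planted sequence-dependent bug."""

    def __init__(self):
        self.items = {}
        self.archived = set()

    def create(self):
        self.items["r1"] = {"value": 0}

    def archive(self):
        if "r1" in self.items:
            self.archived.add("r1")

    def delete(self):
        # Planted bug: the archive index is not cleaned up on delete.
        self.items.pop("r1", None)

    def read(self):
        if "r1" in self.archived:
            return self.items["r1"]["value"]   # KeyError after archive + delete
        return self.items.get("r1", {}).get("value")


OPS = ["create", "archive", "delete", "read"]


def find_crashing_sequences(max_len):
    """Exhaustively run every call sequence up to max_len on a fresh store."""
    crashes = []
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            store = ToyStore()                 # reset state between sequences
            try:
                for op in seq:
                    getattr(store, op)()
            except KeyError:
                crashes.append(seq)
    return crashes
```

In this toy, sequences of up to three calls never crash; the only failing sequence of length four is create, archive, delete, read. Exhaustive enumeration works here only because the operation set is tiny, which is exactly what stops scaling for real APIs.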

Authentication and Session-Dependent Behavior

Session management introduces state that persists across requests. A vulnerability might only manifest when a user is authenticated with specific privileges, when a session has been active for a certain duration, or when multiple concurrent sessions interact. Fuzzing these conditions requires maintaining session state across a sequence of interactions.

Resource Lifecycle Bugs

Some of the most critical stateful bugs involve resource lifecycle errors that span multiple operations: use-after-free across sessions (where one request frees a resource that a concurrent request still references), file descriptor leaks that accumulate over many requests, and race conditions between connection setup and teardown. These bugs are invisible to single-request fuzzing.

The Protocol Fuzzing Bottleneck

Synopsys Defensics addresses protocol fuzzing with over 300 pre-built protocol test suites, but it is a commercial, closed-source platform that is not coverage-guided. Open-source protocol fuzzing remains a largely manual, expert-driven activity. Each new protocol requires custom tooling, grammar specification, and state machine modeling.

Current Approaches

AFLNet

AFLNet (Pham et al., ICST 2020) adapts AFL for stateful network protocol fuzzing. It intercepts network traffic between the fuzzer and a server target, parses message boundaries, and maintains a model of the server's state machine inferred from observed response codes. AFLNet uses this state model to guide mutation toward state transitions that reach deeper protocol states.

AFLNet represents a significant step forward, demonstrating that coverage-guided fuzzing can be extended to stateful targets. It has found real bugs in live555 (RTSP), LightFTP, and Exim (SMTP). However, AFLNet's state inference is based on response codes, which provides only a coarse approximation of the server's internal state. It cannot distinguish between different internal states that produce the same response code.
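The response-code idea can be sketched in a few lines. This is not AFLNet's implementation; it only illustrates, under the assumption that each response code is treated as a state and that state `0` stands for "no response seen yet", how a state machine can be inferred from observed sessions. The FTP-style request labels and sessions are made up for the example.

```python
from collections import defaultdict


def infer_state_machine(sessions):
    """Build {state: {request: {next states}}} from (request, code) pairs.

    States are response codes; 0 is the assumed initial state.
    """
    machine = defaultdict(lambda: defaultdict(set))
    for session in sessions:
        state = 0
        for request, code in session:
            machine[state][request].add(code)
            state = code
    return {state: dict(edges) for state, edges in machine.items()}


# Hypothetical observed sessions (request type, server response code).
SESSIONS = [
    [("USER", 331), ("PASS", 230), ("CWD", 250), ("LIST", 150)],
    [("USER", 331), ("PASS", 530)],
    [("USER", 331), ("PASS", 230), ("CWD", 250), ("CWD", 250)],
]

MACHINE = infer_state_machine(SESSIONS)
```

The coarseness discussed above is visible in the result: every successful `CWD` maps to state 250 regardless of which directory the server is now in, so internally different states collapse into one node of the inferred model.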

StateAFL

StateAFL (Natella, Empirical Software Engineering, 2022) improves on AFLNet by inferring protocol states from in-memory data structures rather than network response codes. It instruments the server to capture snapshots of key data structures after each message exchange, using these snapshots to build a more detailed state model. This finer-grained state tracking enables StateAFL to distinguish states that AFLNet conflates.

StateAFL's approach is more precise but also more invasive: it requires identifying which data structures represent protocol state and instrumenting them. This limits portability across targets and requires target-specific setup.
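As a rough illustration of memory-based state identification (not StateAFL's actual mechanism), the sketch below hashes a snapshot of a few long-lived variables into a short state identifier. The snapshot fields are hypothetical, and the exact SHA-256 hash is a simplification: a practical tool needs a fuzzier notion of equality so that incidental differences such as counters or timestamps do not create spurious states.

```python
import hashlib
import json


def state_id(snapshot):
    """Map a snapshot of long-lived variables to a short state identifier."""
    blob = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:8]


# Two hypothetical server snapshots taken after a request that returned the
# same response code in both cases.
after_cwd_home = {"authenticated": True, "cwd": "/home", "transfer_mode": "binary"}
after_cwd_tmp = {"authenticated": True, "cwd": "/tmp", "transfer_mode": "binary"}

distinct = state_id(after_cwd_home) != state_id(after_cwd_tmp)
```

Here `distinct` is true: the two snapshots would be indistinguishable to a response-code model but map to different identifiers once internal data is taken into account.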

SGFuzz

SGFuzz (Ba et al., USENIX Security 2022) takes a different approach: instead of modeling the protocol from the network side, it derives state feedback from the program itself. It uses static analysis to identify state variables in the target program and instruments them to provide state-aware feedback to the fuzzer. The fuzzer prioritizes inputs that trigger state transitions, rather than just new code coverage.
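The feedback idea can be sketched as follows. This is a simplified illustration rather than SGFuzz's algorithm: it assumes the harness can report, per input, the set of covered edges and the sequence of values taken by one instrumented state variable, and it keeps an input if either is new. The class and method names are invented for the example.

```python
class StateFeedback:
    """Keep inputs that add code coverage OR a new state-variable sequence."""

    def __init__(self):
        self.seen_edges = set()
        self.seen_state_traces = set()

    def is_interesting(self, edges, state_trace):
        edges = set(edges)
        trace = tuple(state_trace)
        new_coverage = not edges <= self.seen_edges
        new_states = trace not in self.seen_state_traces
        self.seen_edges |= edges
        self.seen_state_traces.add(trace)
        return new_coverage or new_states
```

A short usage example: the third input covers no new edges, yet it is kept because it drives the state variable through a sequence that has not been observed before.

```python
feedback = StateFeedback()
first = feedback.is_interesting({1, 2}, ["INIT", "AUTH"])            # new edges
repeat = feedback.is_interesting({1, 2}, ["INIT", "AUTH"])           # nothing new
deeper = feedback.is_interesting({1, 2}, ["INIT", "AUTH", "DATA"])   # new state trace
```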

Boofuzz

Boofuzz is a fork and successor to the Sulley fuzzing framework, focused on network protocol fuzzing. Boofuzz uses a definition-based approach: users define protocol message formats and valid sequences as Python data structures, and the fuzzer generates mutations within those constraints.

Boofuzz is practical and widely used for protocol testing, but it is not coverage-guided: it relies on the completeness of the user-defined protocol model rather than on runtime feedback. This makes it effective for known protocol structures but unable to discover unexpected behaviors that fall outside the model.
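The definition-based approach can be illustrated without any library. The sketch below is not Boofuzz's API; the message definitions, payload list, and function names are assumptions chosen to show the principle: each message is a template with fuzzable fields, and each test case sends valid versions of the preceding messages before the mutated one.

```python
# Hypothetical message definitions: literal parts plus one fuzzable field each.
MESSAGES = {
    "user": ["USER", " ", {"fuzz": "anonymous"}, "\r\n"],
    "pass": ["PASS", " ", {"fuzz": "guest"}, "\r\n"],
}

# Valid ordering of messages, as declared by the user.
SEQUENCE = ["user", "pass"]

# A tiny, made-up payload library.
PAYLOADS = ["", "A" * 1024, "%n%n%n", "../../etc/passwd"]


def render(name, override=None):
    """Render a message, using the default field value unless overridden."""
    parts = []
    for part in MESSAGES[name]:
        if isinstance(part, dict):
            parts.append(part["fuzz"] if override is None else override)
        else:
            parts.append(part)
    return "".join(parts)


def generate_cases():
    """Yield test cases: valid prefix messages followed by one fuzzed message."""
    for index, target in enumerate(SEQUENCE):
        prefix = [render(name) for name in SEQUENCE[:index]]
        for payload in PAYLOADS:
            yield prefix + [render(target, payload)]
```

Every generated case stays inside the declared model, which is both the strength and the limitation noted above: nothing here reacts to what the target actually does at runtime.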

Limitations of Current Tools

State Space Explosion

The core challenge is combinatorial. A protocol with $n$ message types and $m$ states has $O(n \times m)$ possible transitions at each step. A sequence of $k$ messages produces $O((n \times m)^k)$ possible interaction traces. Even for simple protocols, the state space grows exponentially with sequence length. Current tools use heuristics to prune this space (prioritizing novel state transitions, limiting sequence length, or focusing on specific protocol phases), but these heuristics are protocol-specific and may miss deep bugs.
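A quick calculation shows how fast this bound grows. The values $n = 10$ message types and $m = 5$ states are arbitrary, chosen only to represent a small protocol.

```python
def trace_upper_bound(n_messages, m_states, k_steps):
    """Upper bound (n * m)^k on interaction traces of length k."""
    return (n_messages * m_states) ** k_steps


for k in (2, 4, 8):
    print(f"k={k}: {trace_upper_bound(10, 5, k):,} traces")
# k=2: 2,500 traces
# k=4: 6,250,000 traces
# k=8: 39,062,500,000,000 traces
```

Doubling the sequence length squares the bound, so even a fuzzer executing millions of sequences per hour cannot enumerate traces of modest length.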

State Space Management

No existing tool provides a principled, general-purpose approach to managing state space explosion in protocol fuzzing. Techniques from model checking (partial-order reduction, symmetry reduction, bounded model checking) have not been widely adopted in the fuzzing community, despite their potential relevance.

Grammar Plus State

Grammar-aware fuzzers handle structured input formats. Stateful fuzzers handle multi-step interactions. But real protocols require both: each message in a sequence must be grammatically valid and appropriate for the current protocol state. Combining grammar-aware mutation with state-guided exploration is an unsolved integration problem. Nautilus can generate valid individual messages but has no concept of message sequences. AFLNet handles sequences but treats message content as flat byte arrays.

Grammar-State Integration

A fuzzer that combines grammar-aware mutation (generating syntactically valid messages) with state-guided exploration (targeting interesting protocol state transitions) would represent a significant advance. This integration is technically challenging because the grammar constraints and state constraints interact: the valid grammar productions depend on the current state, and the reachable states depend on the message content.
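One half of this interaction, grammar productions that depend on the current state, can be sketched with a toy protocol. Everything below (states, message kinds, templates, field values) is hypothetical. The sketch does not address the harder half: here the next state depends only on the message kind, whereas in real protocols it can depend on message content.

```python
import random

# Message templates that are valid in each state (assumed toy protocol).
GRAMMAR = {
    "INIT": {"HELLO": ["HELLO {name}"]},
    "READY": {
        "PUT": ["PUT {key} {value}"],
        "GET": ["GET {key}"],
        "BYE": ["BYE"],
    },
}

# (state, message kind) -> next state
NEXT_STATE = {
    ("INIT", "HELLO"): "READY",
    ("READY", "PUT"): "READY",
    ("READY", "GET"): "READY",
    ("READY", "BYE"): "CLOSED",
}

# Candidate values for template fields, including a few unusual ones.
FIELDS = {
    "name": ["alice", "bob"],
    "key": ["k1", "k2"],
    "value": ["0", "-1", "A" * 64],
}


def generate_session(rng, max_len=6):
    """Generate a message sequence that is valid for the state at each step."""
    state, messages = "INIT", []
    while state in GRAMMAR and len(messages) < max_len:
        kind = rng.choice(sorted(GRAMMAR[state]))
        template = rng.choice(GRAMMAR[state][kind])
        values = {field: rng.choice(options) for field, options in FIELDS.items()}
        messages.append(template.format(**values))   # unused fields are ignored
        state = NEXT_STATE[(state, kind)]
    return messages
```

Calling `generate_session(random.Random(0))` always yields a session that starts with a `HELLO` message and never places a message after `BYE`, so every test case passes the state machine's ordering checks and spends its mutations on field content instead.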

Lack of Protocol-Specific Tools

Most stateful fuzzing research evaluates on a small set of well-known protocols (FTP, SMTP, RTSP, TLS). Fuzzing tools for newer, more complex protocols (HTTP/2 with its multiplexed streams, gRPC with bidirectional streaming, QUIC with its integrated TLS and congestion control, MQTT with its publish/subscribe model) are largely absent or require extensive custom development.

graph LR
    subgraph Current["Current Fuzzing Capability"]
        A[Stateless Input Fuzzing] --> B[Grammar-Aware Fuzzing]
    end

    subgraph Gap["Capability Gap"]
        C[State Machine Exploration] --> D[Grammar + State Integration]
        D --> E[Multi-Party Protocol Fuzzing]
    end

    B -.->|"Partial bridge"| C

    style Current fill:#0f3460,stroke:#16213e,color:#e0e0e0
    style Gap fill:#533483,stroke:#16213e,color:#e0e0e0

Opportunities

LLM-Assisted State Machine Inference

One of the most promising near-term directions is using large language models to bootstrap protocol state machine models. ChatAFL has demonstrated this approach, using ChatGPT to extract message types, valid state transitions, and field constraints from RFC documents. This significantly reduces the manual effort of building protocol models for stateful fuzzing.

LLM-Enriched Protocol Models

LLMs trained on protocol specifications (RFCs, API documentation) can infer state machines, message formats, and validity constraints that would otherwise require weeks of manual reverse engineering. Combining LLM-inferred protocol models with coverage-guided stateful fuzzing could make protocol fuzzing practical for a much wider range of targets. See LLM Integration for broader discussion of this opportunity.

Hybrid Approaches

Combining symbolic execution with stateful fuzzing could address the state space explosion problem. Symbolic execution can reason about which message sequences lead to interesting states without actually executing all possible sequences. Tools like angr and SymCC could potentially solve the path constraints needed to reach deep protocol states, generating message sequences that a pure fuzzer would take astronomically long to discover.

Snapshot-Based Stateful Fuzzing

Rather than replaying an entire message sequence for each test, snapshot-based approaches save the target's state at interesting points and resume fuzzing from those snapshots. This amortizes the cost of reaching deep states across many test cases. Combining VM-level snapshots (as in S2E) with protocol-aware fuzzing could dramatically improve deep-state coverage.
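The amortization argument can be sketched with an in-process stand-in. In the example below, `copy.deepcopy` plays the role of a process or VM snapshot, and `ToyServer` is a hypothetical target with a planted bug that is reachable only after a two-message prefix; none of this reflects how S2E or any real snapshot fuzzer is implemented.

```python
import copy


class ToyServer:
    """Hypothetical stateful target with a bug behind a two-message prefix."""

    def __init__(self):
        self.state = "INIT"
        self.messages_handled = 0

    def handle(self, message):
        self.messages_handled += 1
        if self.state == "INIT" and message == "HELLO":
            self.state = "AUTH"
        elif self.state == "AUTH" and message == "LOGIN":
            self.state = "READY"
        elif self.state == "READY" and message == "CRASH":
            raise RuntimeError("deep-state bug")


def fuzz_from_snapshot(prefix, candidates):
    """Replay the prefix once, then try each candidate from a copy of that state."""
    server = ToyServer()
    for message in prefix:
        server.handle(message)            # cost of reaching the deep state, paid once
    snapshot = copy.deepcopy(server)      # stand-in for a process/VM snapshot

    crashes = []
    for message in candidates:
        trial = copy.deepcopy(snapshot)   # restore instead of replaying the prefix
        try:
            trial.handle(message)
        except RuntimeError:
            crashes.append(message)
    return crashes, snapshot.messages_handled
```

For example, `fuzz_from_snapshot(["HELLO", "LOGIN"], ["PING", "CRASH", "QUIT"])` returns `(["CRASH"], 2)`: the prefix is executed twice in total (two messages, once) rather than once per candidate.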

Hardware-Assisted State Tracking

Hardware performance counters and Intel Processor Trace could provide low-overhead state tracking for stateful targets. By monitoring memory access patterns and control flow at the hardware level, fuzzers could infer internal state transitions without invasive source-code instrumentation, making stateful fuzzing applicable to closed-source protocol implementations.

Implications

For Tool Builders

Stateful fuzzing represents a high-impact gap with clear demand. Network protocol implementations are a primary target for vulnerability researchers, yet the tooling is fragmented and immature. A tool that combines grammar-aware mutation, state-guided exploration, and coverage feedback (with a user-friendly protocol specification interface) would address a significant market need.

The specification burden is the key adoption barrier. Tools that require weeks of manual protocol modeling will only be used by specialists. LLM-assisted specification inference (as demonstrated by ChatAFL) and learning-based state model construction (as in StateAFL) point toward a future where the upfront cost of stateful fuzzing is dramatically lower.

For Security Researchers

Researchers should recognize that stateful bugs are systematically under-tested by current tools. When auditing protocol implementations, network services, or session-based applications, the absence of stateful fuzzing findings does not mean the absence of stateful bugs. Manual protocol analysis, combined with tools like AFLNet or Boofuzz, provides partial coverage, but deep stateful bugs remain likely in any complex protocol implementation.

For Organizations

Organizations deploying network services should consider stateful fuzzing as a distinct testing activity, separate from standard unit-test fuzzing. Synopsys Defensics provides commercial protocol testing for standardized protocols, but custom protocols and APIs require bespoke fuzzing campaigns. Investing in protocol-specific fuzzing infrastructure (even at a manual, expert-driven level) can uncover critical vulnerabilities that standard testing misses entirely.

The Protocol Security Market

As IoT, automotive, and industrial control systems proliferate, the number of custom and specialized protocols grows rapidly. Tools that can fuzz arbitrary stateful protocols with minimal manual specification represent a significant and growing market opportunity.




Glossary

| Term | Definition |
| --- | --- |
| AFL | American Fuzzy Lop, coverage-guided fuzzer |
| ASan | AddressSanitizer, memory error detector |
| CVE | Common Vulnerabilities and Exposures |
| AFL++ | Community-maintained successor to AFL, the de facto standard coverage-guided fuzzer |
| AEG | Automatic Exploit Generation, automated creation of working exploits from vulnerability information |
| ANTLR | ANother Tool for Language Recognition, parser generator used by grammar-aware fuzzers like Superion |
| AST | Abstract Syntax Tree, tree representation of source code structure used by static analyzers |
| BOF | Buffer Overflow, writing data beyond allocated memory bounds, a common memory safety vulnerability |
| CFG | Control Flow Graph, directed graph representing all possible execution paths through a program |
| CGC | Cyber Grand Challenge, DARPA competition for autonomous vulnerability detection and patching |
| ClusterFuzz | Google's distributed fuzzing infrastructure that powers OSS-Fuzz |
| CodeQL | GitHub's query-based static analysis engine that treats code as a queryable database |
| Concolic | Concrete + Symbolic, execution that runs concrete values while tracking symbolic constraints |
| Corpus | Collection of seed inputs used by a coverage-guided fuzzer as the basis for mutation |
| Coverity | Synopsys commercial static analysis platform with deep interprocedural analysis |
| CPG | Code Property Graph, unified representation combining AST, CFG, and data-flow graph, used by Joern |
| CVSS | Common Vulnerability Scoring System, standard for rating vulnerability severity |
| CWE | Common Weakness Enumeration, categorization of software weakness types |
| DAST | Dynamic Application Security Testing, testing running applications for vulnerabilities |
| DBI | Dynamic Binary Instrumentation, modifying program behavior at runtime without recompilation |
| DFG | Data Flow Graph, graph representing how data values propagate through a program |
| DPA | Differential Power Analysis, extracting cryptographic keys by analyzing power consumption variations |
| Frida | Dynamic instrumentation toolkit for injecting scripts into running processes |
| Harness | Glue code connecting a fuzzer to its target, defining how fuzzed input is delivered |
| HWASAN | Hardware-assisted AddressSanitizer, ARM-based variant of ASan with lower overhead |
| IAST | Interactive Application Security Testing, combines elements of SAST and DAST during testing |
| Infer | Meta's open-source static analyzer based on separation logic and bi-abduction |
| KLEE | Symbolic execution engine built on LLVM for automatic test generation |
| LLM | Large Language Model, neural network trained on text/code, used for bug detection and code generation |
| LSAN | LeakSanitizer, detector for memory leaks, often used alongside AddressSanitizer |
| Meltdown | CPU vulnerability exploiting out-of-order execution to read kernel memory from user space |
| MITRE | Non-profit organization that maintains CVE, CWE, and ATT&CK frameworks |
| MSan | MemorySanitizer, detector for reads of uninitialized memory |
| NVD | National Vulnerability Database, NIST-maintained repository of vulnerability data |
| NIST | National Institute of Standards and Technology, US agency maintaining security standards and NVD |
| OSS-Fuzz | Google's free continuous fuzzing service for open-source software |
| OWASP | Open Worldwide Application Security Project, community producing security guides and tools |
| RCE | Remote Code Execution, vulnerability allowing an attacker to run arbitrary code on a target system |
| RL | Reinforcement Learning, ML paradigm where agents learn through reward-based feedback |
| S2E | Selective Symbolic Execution, whole-system analysis platform combining QEMU with KLEE |
| SARIF | Static Analysis Results Interchange Format, standard for exchanging static analysis findings |
| SAST | Static Application Security Testing, analyzing source code for vulnerabilities without execution |
| SCA | Software Composition Analysis, identifying known vulnerabilities in third-party dependencies |
| Seed | Initial input provided to a fuzzer as the starting point for mutation |
| Semgrep | Lightweight open-source static analysis tool using pattern-matching rules |
| Side-channel | Attack vector exploiting physical implementation artifacts rather than algorithmic flaws |
| SMT | Satisfiability Modulo Theories, solver used by symbolic execution to find inputs satisfying path constraints |
| Spectre | Family of CPU vulnerabilities exploiting speculative execution to leak data across security boundaries |
| SQLi | SQL Injection, injecting malicious SQL into queries via unsanitized user input |
| SSRF | Server-Side Request Forgery, tricking a server into making requests to unintended destinations |
| SymCC | Compilation-based symbolic execution tool that is 2-3 orders of magnitude faster than KLEE |
| Taint analysis | Tracking the flow of untrusted data from sources to security-sensitive sinks |
| TOCTOU | Time-of-Check-Time-of-Use, race condition between validating a resource and using it |
| TSan | ThreadSanitizer, detector for data races in multithreaded programs |
| UAF | Use-After-Free, accessing memory after it has been deallocated |
| UBSan | UndefinedBehaviorSanitizer, detector for undefined behavior in C/C++ |
| Valgrind | Dynamic binary instrumentation framework for memory debugging and profiling |
| XSS | Cross-Site Scripting, injecting malicious scripts into web pages viewed by other users |
| Fine-tuning | Adapting a pre-trained ML model to a specific task using additional training data |
| Abstract interpretation | Mathematical framework for approximating program behavior using abstract domains |
| Dataflow analysis | Tracking how values propagate through a program to detect bugs like taint violations |