Ghidra-Stubs Mypy Python Type Stubs Problems Discussion
Introduction
Ghidra, a powerful reverse engineering framework developed by the National Security Agency (NSA), includes a feature to generate Python type stubs (ghidra-stubs) for its Java-based API. These stubs enable static type checking with MyPy, a popular Python type checker. However, some users have found that valid scripts fail type checking because of problems in the generated stubs. This article walks through those problems, with examples and potential fixes for each.
Bug Description
The core issue is that certain valid Python scripts, which should ideally pass type checking when using MyPy with the --strict flag, fail due to discrepancies in the generated type stubs. This can lead to confusion and hinder the development process, as developers rely on type checking to catch errors early.
Reproducing the Bug
To reproduce the bug, one needs to run specific example scripts through MyPy with the --strict flag. These scripts utilize Ghidra APIs, and the expectation is that they should pass type checking without errors. However, due to issues in the type stubs, errors are flagged, indicating a mismatch between the expected types and the actual types.
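One way to script the check is sketched below. It assumes MyPy is installed for the active interpreter and that the example is saved as test.py (the file name that appears in the error messages in this article). The helper names are illustrative, not part of Ghidra or MyPy.

```python
import subprocess
import sys
from typing import List


def build_mypy_command(script: str) -> List[str]:
    """Build the command line for a strict MyPy run on one script."""
    # "python -m mypy" uses the mypy installed for this interpreter.
    return [sys.executable, "-m", "mypy", "--strict", script]


def run_strict_check(script: str) -> int:
    """Run MyPy and return its exit code (0 means no errors were reported)."""
    result = subprocess.run(
        build_mypy_command(script), capture_output=True, text=True
    )
    print(result.stdout, end="")
    return result.returncode
```

Calling run_strict_check("test.py") prints whatever MyPy reports for the script and returns a non-zero code when errors are found.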
Expected Behavior
The expected behavior is that when running MyPy with the --strict flag on valid Ghidra Python scripts, the type checker should pass without any errors. This ensures that the scripts are type-safe and that the developer can rely on the type system to catch potential issues.
Environment
The issues have been observed in the following environment:
- OS: Ubuntu 22.04.5 LTS
- Java Version: openjdk 21.0.7 2025-04-15
- Ghidra Version: 11.4
- Ghidra Origin: GitHub Releases
- Python: 3.8.20
- MyPy: 1.14.1
- Ghidra-Stubs: 11.4
- Ghidra-Stubs Origin: ghidra_11.4_PUBLIC/docs/ghidra_stubs/ghidra_stubs-11.4-py3-none-any.whl
Additional Context
While a similar issue was reported in Ghidra Issue #8018, it was deemed not closely related enough to warrant commenting there. This highlights the need for a separate discussion to address the specific problems encountered with the generated MyPy type stubs.
Examples of Type Checking Issues
To illustrate the problems, let's examine a few examples where type checking fails despite the code being valid.
1. No Typed Parameters and Missing None Return Type
Consider the following Python script:
import ghidra.features.bsim.query.protocol as ghprotocol

insertreq = ghprotocol.InsertRequest()
When this script is run through MyPy, it results in the following error:
test.py:3: error: Call to untyped function "InsertRequest" in typed context [no-untyped-call]
This error occurs even though a type stub for InsertRequest exists:
class InsertRequest(BSimQuery[ResponseInsert]):
    def __init__(self):
        ...
The issue arises because the __init__ method has no typed parameters and no return type, so MyPy treats it as an untyped function. As several Stack Overflow answers note, explicitly specifying the return type as None resolves this. By changing the __init__ method to:
class InsertRequest(BSimQuery[ResponseInsert]):
    def __init__(self) -> None:
        ...
the problem is fixed. The type stub generation code intentionally skips return types for void functions. This may reflect behavior differences between MyPy versions, or it may be an oversight in the generator. Either way, the example shows why return types should be stated explicitly, especially for methods like __init__ that don't return a value.
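The same behavior can be reproduced without Ghidra. The sketch below uses two hypothetical classes that differ only in the return annotation on __init__. Under mypy --strict, constructing the first one is reported as a call to an untyped function (and its definition is also reported as missing annotations), while the second passes.

```python
class UntypedInit:
    # No annotations at all: MyPy treats this __init__ as untyped, so
    # constructing the class under --strict reports [no-untyped-call].
    def __init__(self):
        self.ready = True


class TypedInit:
    # The "-> None" annotation alone is enough to make the method typed.
    def __init__(self) -> None:
        self.ready = True


untyped = UntypedInit()  # flagged by mypy --strict
typed = TypedInit()      # accepted by mypy --strict

# The difference is also visible at runtime in the annotations mapping.
print(UntypedInit.__init__.__annotations__)
print(TypedInit.__init__.__annotations__)
```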
2. Overloading Inherited Methods
Another common issue arises with overloaded inherited methods. According to MyPy issue #5146, automatic inheritance of overloaded signatures is not fully supported. This means that signatures from the superclass need to be repeated in the subclass alongside the added overload and marked with @typing.overload. The Ghidra type stub generation code, however, only includes the added overloads and doesn't repeat the inherited ones, leading to type checking errors.
Consider the following script:
import __main__
import ghidra.app.cmd.function as ghfunction
import ghidra.program.model.symbol as ghsymbol
import ghidra.program.model.address as ghaddress

cmd = ghfunction.ApplyFunctionDataTypesCmd(
    list(), ghaddress.AddressSet(), ghsymbol.SourceType.USER_DEFINED, False, False
)
cmd.applyTo(__main__.currentProgram)
This script produces the following error:
test.py:9: error: Missing positional argument "monitor" in call to "applyTo" of "BackgroundCommand" [call-arg]
Despite the existence of the following type stubs:
class ApplyFunctionDataTypesCmd(ghidra.framework.cmd.BackgroundCommand[ghidra.program.model.listing.Program]):
    ...
class BackgroundCommand(Command[T], typing.Generic[T]):
    def applyTo(self, obj: T, monitor: ghidra.util.task.TaskMonitor) -> bool:
        ...
class Command(java.lang.Object, typing.Generic[T]):
    def applyTo(self, obj: T) -> bool:
        ...
MyPy fails to recognize the one-argument applyTo method from Command on the ApplyFunctionDataTypesCmd class. This is because BackgroundCommand defines a new applyTo method, which replaces the inherited one as far as MyPy is concerned. To fix this, the definition of BackgroundCommand should be changed to:
class BackgroundCommand(Command[T], typing.Generic[T]):
    @typing.overload
    def applyTo(self, obj: T) -> bool:
        ...
    @typing.overload
    def applyTo(self, obj: T, monitor: ghidra.util.task.TaskMonitor) -> bool:
        ...
By repeating the signature of applyTo inherited from Command and using the @typing.overload decorator, MyPy can correctly resolve the method signatures, thus addressing the error. This underscores the necessity of explicitly handling overloaded methods in type stubs to ensure accurate type checking.
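The pattern can be tried outside Ghidra with a minimal sketch. The classes below are simplified, hypothetical stand-ins for the Ghidra types (the generic parameter is replaced by str), and because this is a regular module rather than a stub file, the overloads are followed by one real implementation.

```python
from typing import Optional, overload


class TaskMonitor:
    """Hypothetical stand-in for ghidra.util.task.TaskMonitor."""


class Command:
    def applyTo(self, obj: str) -> bool:
        return True


class BackgroundCommand(Command):
    # Both call forms are declared, including the one inherited from Command.
    @overload
    def applyTo(self, obj: str) -> bool: ...
    @overload
    def applyTo(self, obj: str, monitor: TaskMonitor) -> bool: ...

    # Outside a stub file, overloads need one real implementation.
    def applyTo(self, obj: str, monitor: Optional[TaskMonitor] = None) -> bool:
        return True


cmd = BackgroundCommand()
one_arg = cmd.applyTo("program")                 # matches the first overload
two_arg = cmd.applyTo("program", TaskMonitor())  # matches the second overload
```

With only the two-argument signature declared on BackgroundCommand, the first call would be rejected with the same missing "monitor" error shown above.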
3. Special Cases
There are also several special cases where functions don't entirely align with runtime behavior or don't interact smoothly with Jython conversions. These cases require specific attention and might not have a one-size-fits-all solution.
For example, the default <E extends Exception> void withTransaction(String description, ExceptionalCallback<E> callback) throws E method on ghidra.framework.model.DomainObject produces a MyPy error when a Python function is passed to the callback parameter. However, this works perfectly fine at runtime. The mismatch could be due to MyPy's limitations in specifying type coercions. Possible solutions include generating more overloads for Python types or incorporating Java type stubs to provide a more comprehensive type representation.
Another instance involves the returnCommit parameter of the public static void commitParamsToDatabase(HighFunction highFunction, boolean useDataTypes, HighFunctionDBUtil.ReturnCommitOption returnCommit, SourceType source) throws DuplicateNameException, InvalidInputException method on ghidra.program.model.pcode.HighFunctionDBUtil. The documentation suggests that returnCommit is optional, but it's not marked as such in the type stubs. Developers often pass None to parameters like this, which are optional in practice but not in the stubs, without encountering runtime issues. This category of problems is challenging to address automatically through stub generation, as the optional nature of parameters isn't always encoded in Java types. Since any class type parameter can hypothetically accept null, distinguishing between truly optional and non-optional parameters requires a more nuanced approach.
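To illustrate what marking such a parameter as optional means for the type checker, here is a minimal sketch. The enum and function are hypothetical, simplified stand-ins rather than the real Ghidra API, and the "default" result is purely illustrative.

```python
from enum import Enum
from typing import Optional


class ReturnCommitOption(Enum):
    """Hypothetical stand-in for HighFunctionDBUtil.ReturnCommitOption."""
    NO_COMMIT = 0
    COMMIT = 1


def commit_params(
    use_data_types: bool, return_commit: Optional[ReturnCommitOption]
) -> str:
    # Optional[...] tells MyPy that None is an accepted argument here.
    if return_commit is None:
        return "default"
    return return_commit.name


print(commit_params(True, None))
print(commit_params(True, ReturnCommitOption.COMMIT))
```

Without Optional[...] in the annotation, the first call would be rejected by MyPy even though it runs without error.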
Addressing the Issues
To address these type checking issues, several strategies can be employed. First, the type stub generation code should be modified to explicitly include -> None for methods like __init__ that do not return a value. This ensures that MyPy correctly interprets these methods as typed functions.
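As a rough sketch of that first change, the function below renders a single stub line and maps a Java void return to an explicit None. It is not the actual Ghidra generator; render_method and its argument layout are assumptions made for illustration.

```python
from typing import List, Tuple


def render_method(name: str, params: List[Tuple[str, str]], java_return: str) -> str:
    """Render one stub method; a Java void return becomes an explicit None."""
    args = ", ".join(["self"] + [f"{pname}: {ptype}" for pname, ptype in params])
    # Always emit a return annotation so MyPy never sees an untyped function.
    py_return = "None" if java_return == "void" else java_return
    return f"def {name}({args}) -> {py_return}: ..."


print(render_method("__init__", [], "void"))
# prints: def __init__(self) -> None: ...
```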
Second, the generation code needs to handle overloaded methods more effectively. This involves repeating inherited signatures in subclasses and marking them with @typing.overload. By doing so, MyPy can accurately resolve method calls and avoid false positives during type checking.
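The idea of repeating inherited signatures can be sketched as follows. The PARENTS and DECLARED tables are hypothetical inputs standing in for whatever class model the real generator uses, and the sketch ignores complications such as interfaces and genuine overrides that share the same parameter types.

```python
from typing import Dict, List

# Hypothetical inputs: each class's parent and the signatures it declares itself.
PARENTS: Dict[str, str] = {"BackgroundCommand": "Command"}
DECLARED: Dict[str, Dict[str, List[str]]] = {
    "Command": {"applyTo": ["(self, obj: T) -> bool"]},
    "BackgroundCommand": {
        "applyTo": ["(self, obj: T, monitor: TaskMonitor) -> bool"]
    },
}


def collect_signatures(cls: str, method: str) -> List[str]:
    """Return inherited signatures first, then the class's own, without duplicates."""
    chain: List[str] = []
    current = cls
    while current in DECLARED:
        chain.append(current)
        current = PARENTS.get(current, "")
    signatures: List[str] = []
    for owner in reversed(chain):  # walk from the root class down
        for sig in DECLARED[owner].get(method, []):
            if sig not in signatures:
                signatures.append(sig)
    return signatures


def render_overloads(cls: str, method: str) -> str:
    sigs = collect_signatures(cls, method)
    # A single signature needs no @typing.overload decorator.
    decorator = "@typing.overload\n" if len(sigs) > 1 else ""
    return "\n".join(f"{decorator}def {method}{sig}: ..." for sig in sigs)


print(render_overloads("BackgroundCommand", "applyTo"))
```

For BackgroundCommand this emits both applyTo signatures, each preceded by @typing.overload, which matches the corrected stub shown in the second example.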
Finally, special cases require individual attention. Generating more overloads for Python types or integrating Java type stubs may resolve some issues. For parameters that are optional but not marked as such in the Java code, manual adjustments to the type stubs might be necessary. This could involve adding Optional[...] types or providing default values in the function signatures.
Conclusion
While Ghidra's generated MyPy type stubs significantly enhance the development experience, there are still areas for improvement. Addressing the issues with untyped functions, overloaded inherited methods, and the special cases described above would make type checking more accurate and the development workflow smoother. With these fixes in place, developers could rely on MyPy to catch errors early in their Ghidra Python scripts, and the ghidra-stubs package would remain a valuable asset for the Ghidra community.