Given a Python package that contains several modules, I want to find all the usages of the methods and functions defined in the package. I am thinking of something like PyCharm's Find Usages, which, given a function or method, shows you all the lines in which that method/function is called.

Let's say my package has a lot of modules and I want to look for the usages of the functions and methods defined in module_x. Using inspect and dir I can find all the callables defined in module_x:

import importlib
import inspect

# module_x stands for the module whose definitions we want to track
module = importlib.import_module('module_x')
module_file = module.__file__

module_x_callables = []

for name in dir(module):
    member = getattr(module, name)
    if not callable(member):
        continue
    module_x_callables.append(member)
    # builtins and C extensions have no __file__, so fall back to None
    member_file = getattr(inspect.getmodule(member), '__file__', None)
    print(name)
    print(member)
    print('member_file: {}'.format(member_file))
    # only report a line for callables defined in module_x itself, not imported ones
    if member_file == module_file:
        # getsourcelines() gives a 1-based line number (findsource() is 0-based)
        _, line_no = inspect.getsourcelines(member)
        print(line_no)
    print()
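A possibly simpler variant, sketched under the assumption that only plain functions matter: a function keeps the __module__ of the module it was defined in, so comparing it with module.__name__ filters out anything module_x merely imported. The defined_functions helper and the in-memory stand-in for module_x are hypothetical, for illustration only.

```python
import inspect
import types


def defined_functions(module):
    """Map each function defined (not merely imported) in module to its first line."""
    return {
        name: obj.__code__.co_firstlineno
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        # functions imported from elsewhere keep their original __module__
        if obj.__module__ == module.__name__
    }


# A throwaway stand-in for module_x so the example runs on its own.
source = '''\
from os.path import join

def test_func():
    return 1

def other():
    return test_func()
'''
module_x = types.ModuleType('module_x')
exec(source, module_x.__dict__)

print(defined_functions(module_x))  # {'other': 6, 'test_func': 3}
```

Note that join, although callable and visible in module_x, is dropped because its __module__ names the module it really comes from.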

Note: I think methods inside classes will not be captured by this approach, but never mind. Let's say that I only want to find all the usages of functions defined in module_x.

My question is: how can I scan the other modules in the package to see whether they use any of the functions defined in module_x and, if they do, get the line numbers?

I tried using ast, walking the tree and collecting every ast.Call node. This does return all the calls, but I don't know how to check whether the called functions are defined in module_x. I also thought about using regex, but there could be two functions called test_func in two different modules. With that approach, how would I know which one is being called?

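import ast

with open(file) as f:  # file is the path of the module being scanned
    tree = ast.parse(f.read())

for node in ast.walk(tree):
    if isinstance(node, ast.Call):
        print('call on line {}'.format(node.lineno))
        print(ast.dump(node))
        func = node.func
        if isinstance(func, ast.Attribute):
            # e.g. module_x.test_func(...): func.value is the Name module_x
            print(func.value)
            print(func.attr)
        print()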
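To see why call nodes alone are not enough, here is a small sketch (assuming Python 3.9+ for ast.unparse) that lists each call together with the source text of its callee. call_sites is a hypothetical helper, and module_x and other_module are only parsed, never imported.

```python
import ast


def call_sites(source):
    """Return (callee_text, lineno) pairs for every call in source, in line order."""
    calls = [
        (ast.unparse(node.func), node.lineno)  # ast.unparse needs Python 3.9+
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
    ]
    return sorted(calls, key=lambda call: call[1])


example = '''\
import module_x
from other_module import test_func

module_x.test_func()
test_func()
'''
print(call_sites(example))  # [('module_x.test_func', 4), ('test_func', 5)]
```

Both calls come out as plain text; nothing on the Call node itself records which module test_func came from, so the import statements have to be tracked as well.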

So, to sum up, my question is: how can I explore a file or a module and find all the usages, with line numbers, of the functions and methods defined in module_x? Thank you ;)

Answer by Martijn Pieters

You only need to care about names that were actually imported into the module you are currently inspecting. Note that there are a few complications here:

  • Names imported into a module can in turn be imported from it by other modules: import foo in the module bar makes bar.foo available from the outside. So from bar import foo gives you the very same object as import foo.
  • Any object can be stored in a list, a tuple, become an attribute on another object, be stored in a dictionary, assigned to an alternative name, and can be referenced dynamically. E.g. an imported attribute stored in a list, referenced by index:

    import foo
    spam = [foo.bar]
    spam[0]()
    

    calls the foo.bar object. Tracking some of these uses through AST analysis can be done, but Python is a highly dynamic language and you'll soon run into limitations. You can't know what spam[0] = random.choice([foo.bar, foo.baz]) will produce with any certainty, for example.

  • With the use of the global and nonlocal statements, nested function scopes can alter the names in parent scopes. So a contrived function like:

    def bar():
        global foo
        import foo
    

    would import the module foo and add it to the global namespace, but only when bar() is called. Tracking this is difficult, as you need to track when bar() is actually called. This could even happen outside of the current module (import weirdmodule; weirdmodule.bar()).
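Two of these points can be checked directly. The sketch below uses the standard library os module as a stand-in for bar (in CPython, os binds the name path to posixpath or ntpath while it is imported), and only parses, without executing, the spam example to show the kind of node a name-based scan would have to see through.

```python
import ast
import os.path
from os import path  # os binds `path` itself while being imported

# Bullet 1: the name re-exported by os is the very same module object.
assert path is os.path

# Bullet 2: once foo.bar is stored in a list, the call's callee is a Subscript,
# not a Name or Attribute, so a name-based scan cannot link it to foo.bar.
tree = ast.parse('import foo\nspam = [foo.bar]\nspam[0]()\n')
call = next(node for node in ast.walk(tree) if isinstance(node, ast.Call))
callee_type = type(call.func).__name__
print(callee_type)  # Subscript
```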

If you ignore those complications, and focus only on the use of the names used in import statements, then you need to track Import and ImportFrom nodes, and track scopes (so you know if a local name masks a global, or if an imported name was imported into a local scope). You then look for Name(..., Load) nodes that reference the imported names.
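For reference, this is roughly what those nodes look like. This is a minimal sketch that parses a two-line snippet; foo, bar and baz are placeholder names that are never imported.

```python
import ast

tree = ast.parse('from foo import bar as baz\nbaz()\n')

imp = tree.body[0]  # the ImportFrom node
imported = [(alias.name, alias.asname) for alias in imp.names]
print(imp.module, imp.level, imported)  # foo 0 [('bar', 'baz')]

use = tree.body[1].value.func  # Expr -> Call -> Name
print(use.id, type(use.ctx).__name__, use.lineno)  # baz Load 2
```

The asname is what the module binds locally, so it is the name to look for in later Name(..., Load) nodes, while name tells you what was imported.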

I've covered tracking scopes before, see Getting all the nodes from Python AST that correspond to a particular variable with a given name. For this operation we can simplify this to a stack of dictionaries (encapsulated in a collections.ChainMap() instance), and add imports:

import ast
from collections import ChainMap
from types import MappingProxyType as readonlydict


class ModuleUseCollector(ast.NodeVisitor):
    def __init__(self, modulename, package=''):
        self.modulename = modulename
        # used to resolve from ... import ... references
        self.package = package
        self.modulepackage, _, self.modulestem = modulename.rpartition('.')
        # track scope namespaces, with a mapping of imported names (bound name to original)
        # If a name references None it is used for a different purpose in that scope
        # and so masks a name in the global namespace.
        self.scopes = ChainMap()
        self.used_at = []  # list of (name, alias, line) entries

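    def visit_FunctionDef(self, node):
        self.scopes = self.scopes.new_child()
        self.generic_visit(node)
        self.scopes = self.scopes.parents

    # async functions create a new scope in exactly the same way
    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Lambda(self, node):
        # lambdas are just functions, albeit with no statements
        self.visit_FunctionDef(node)

    def visit_arg(self, node):
        # function and lambda parameters are local names that mask imports
        self.scopes[node.arg] = None
        self.generic_visit(node)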

    def visit_ClassDef(self, node):
        # class scope is a special local scope that is re-purposed to form
        # the class attributes. By using a read-only dict proxy here, any
        # attempt to bind a name in a class body raises TypeError: an import
        # of the target module there fails loudly, while visit_Name simply
        # ignores class-level assignments, which can't mask anything.
        self.scopes = self.scopes.new_child(readonlydict({}))
        self.generic_visit(node)
        self.scopes = self.scopes.parents

    def visit_Import(self, node):
        self.scopes.update({
            a.asname or a.name: a.name
            for a in node.names
            if a.name == self.modulename
        })

    def visit_ImportFrom(self, node):
        # resolve relative imports; from . import <name>, from ..<name> import <name>
        source = node.module  # can be None
        if node.level:
            package = self.package
            if node.level > 1:
                # go up levels as needed
                package = '.'.join(self.package.split('.')[:-(node.level - 1)])
            source = f'{package}.{source}' if source else package
        if self.modulename == source:
            # names imported from our target module
            self.scopes.update({
                a.asname or a.name: f'{self.modulename}.{a.name}'
                for a in node.names
            })
        elif self.modulepackage and self.modulepackage == source:
            # from package import module import, where package.module is what we want
            self.scopes.update({
                a.asname or a.name: self.modulename
                for a in node.names
                if a.name == self.modulestem
            })

    def visit_Name(self, node):
        if not isinstance(node.ctx, ast.Load):
            # store or del operation, meaning the name is masked in the current scope
            try:
                self.scopes[node.id] = None
            except TypeError:
                # class scope, which we made read-only. These names can't mask
                # anything so just ignore these.
                pass
            return
        # find scope this name was defined in, starting at the current scope
        imported_name = self.scopes.get(node.id)
        if imported_name is None:
            return
        self.used_at.append((imported_name, node.id, node.lineno))
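The scope handling in the collector leans on three ChainMap operations: new_child() to push a scope, .parents to pop it, and lookups that stop at the innermost scope defining a name. A small standalone sketch of that behaviour, using placeholder names:

```python
from collections import ChainMap
from types import MappingProxyType

scopes = ChainMap()               # module (global) scope
scopes['modalias1'] = 'foo.bar'   # as if: from foo import bar as modalias1

scopes = scopes.new_child()       # entering a function body
scopes['modalias1'] = None        # a local assignment masks the import
print(scopes.get('modalias1'))    # None

scopes = scopes.parents           # leaving the function again
print(scopes.get('modalias1'))    # foo.bar

# A read-only mapping on top rejects any binding, as used for class bodies.
class_scope = scopes.new_child(MappingProxyType({}))
try:
    class_scope['attr'] = None
except TypeError:
    print('class scope is read-only')
```

Writes to a ChainMap always go to its first mapping, which is why the masking None only lives as long as the inner scope.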

Now, given a module name foo.bar and the following source code file from a module in the foo package:

from .bar import name1 as namealias1
from foo import bar as modalias1

def loremipsum(dolor):
    return namealias1(dolor)

def sitamet():
    from foo.bar import consectetur

    modalias1 = 'something else'
    consectetur(modalias1)

class Adipiscing:
    def elit_nam(self):
        return modalias1.name2(self)

you can parse the above (held in a string named source) and extract all foo.bar references with:

>>> collector = ModuleUseCollector('foo.bar', 'foo')
>>> collector.visit(ast.parse(source))
>>> for name, alias, line in collector.used_at:
...     print(f'{name} ({alias}) used on line {line}')
...
foo.bar.name1 (namealias1) used on line 5
foo.bar.consectetur (consectetur) used on line 11
foo.bar (modalias1) used on line 15

Note that the modalias1 name in the sitamet scope is not seen as an actual reference to the imported module, as it is being used as a local name instead.
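To cover the original question of scanning a whole package, one option is to walk every .py file and work out which package each one belongs to, since that is what the collector's package argument needs for resolving relative imports. This is a sketch under assumptions: iter_package_sources is a hypothetical helper, the root directory's name is taken to be the top-level package, and namespace packages or src layouts are not handled.

```python
import ast
import tempfile
from pathlib import Path


def iter_package_sources(root):
    """Yield (path, package, tree) for every .py file below the package directory root.

    package is the dotted name of the package the file belongs to, which is what
    relative imports in that file are resolved against. Assumes the name of the
    root directory itself is the top-level package name.
    """
    root = Path(root)
    for path in sorted(root.rglob('*.py')):
        # foo/sub/baz.py and foo/sub/__init__.py both resolve relative imports
        # against foo.sub, so the package is simply the dotted directory path.
        package = '.'.join(path.relative_to(root.parent).parent.parts)
        tree = ast.parse(path.read_text(encoding='utf-8'), filename=str(path))
        yield path, package, tree


# Demo on a throwaway package layout: foo/ and foo/sub/
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, 'foo')
    (pkg / 'sub').mkdir(parents=True)
    for rel in ('__init__.py', 'bar.py', 'sub/__init__.py', 'sub/baz.py'):
        (pkg / rel).write_text('from . import bar\n', encoding='utf-8')
    found = {
        path.relative_to(tmp).as_posix(): package
        for path, package, _tree in iter_package_sources(pkg)
    }

print(found)
```

Each yielded tree could then be fed to the collector: create ModuleUseCollector('foo.bar', package), call its visit(tree), and report every (name, alias, line) entry in used_at together with path.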