Determining if a given Python module is a built-in module


I am doing some parsing and introspection of various modules, but I don't want to parse built-in modules. Now, there is no special type for built-in modules like there is a types.BuiltinFunctionType, so how do I do this?

>>> import CornedBeef
>>> CornedBeef
<module 'CornedBeef' from '/meatish/CornedBeef.pyc'>
>>> CornedBeef.__file__
'/meatish/CornedBeef.pyc'
>>> del CornedBeef.__file__
>>> CornedBeef
<module 'CornedBeef' (built-in)>

According to Python, a module is apparently built-in if it doesn't have a __file__ attribute. Does this mean that hasattr(SomeModule, '__file__') is the way to check if a module is built in? Surely, it isn't exactly common to del SomeModule.__file__, but is there a more solid way to determine if a module is built-in?


There are 4 answers

jfs (accepted answer)

sys.builtin_module_names

A tuple of strings giving the names of all modules that are compiled into this Python interpreter. (This information is not available in any other way — modules.keys() only lists the imported modules.)
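A minimal sketch of this approach applied to a module object. The helper name `is_builtin_module` is hypothetical. Because it compares the module's name against `sys.builtin_module_names`, it is not fooled by a deleted `__file__`, although a non-built-in module that reuses a built-in's name would be misreported.

```python
import json
import sys
import types


def is_builtin_module(module):
    """Hypothetical helper: True if the module's name is compiled into this interpreter.

    Checks sys.builtin_module_names instead of looking for a __file__
    attribute, which can be missing or deleted.
    """
    return module.__name__ in sys.builtin_module_names


fake = types.ModuleType("CornedBeef")  # a module object with no __file__
print(is_builtin_module(sys))   # True
print(is_builtin_module(json))  # False
print(is_builtin_module(fake))  # False, even though it lacks __file__
```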

Ned Batchelder

When you say, "built-in," do you mean, written in C, or do you mean, part of the standard library? If you mean the first, then looking for __file__ is the right thing to do. As you can see, even the Python interpreter uses the presence of __file__ as an indicator of built-in-ness.

If you mean "part of the standard library," then it is very hard to determine.

shang

You can use imp.is_builtin to see if a module name matches a built-in module, but I can't think of any way to actually introspect a module object reliably.

You might also try the following:

>>> import imp
>>> f, path, desc = imp.find_module("sys")
>>> desc
('', '', 6)
>>> desc[2] == imp.C_BUILTIN
True
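The `imp` module used above has been deprecated since Python 3.4 and was removed in Python 3.12. A rough Python 3 equivalent, sketched with `importlib.util.find_spec`: modules compiled into the interpreter are found by the built-in importer, whose specs carry the origin string `'built-in'`. Both helper names are hypothetical. The second helper reads the `__spec__` that an imported module carries, which is closer to introspecting the module object itself; `__spec__` can be `None` (for example on a module created directly with `types.ModuleType`), in which case it reports `False`.

```python
import importlib.util
import json
import sys


def is_builtin_by_name(name):
    """Hypothetical helper: True if `name` resolves to a compiled-in module."""
    spec = importlib.util.find_spec(name)  # None if no finder can locate it
    return spec is not None and spec.origin == "built-in"


def is_builtin_by_spec(module):
    """Hypothetical helper: inspect the spec attached to an imported module."""
    spec = getattr(module, "__spec__", None)
    return spec is not None and spec.origin == "built-in"


print(is_builtin_by_name("sys"))   # True
print(is_builtin_by_name("json"))  # False
print(is_builtin_by_spec(sys))     # True
print(is_builtin_by_spec(json))    # False
```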

mateor

If you take the question strictly as asked (built-ins only), then the accepted answer is obviously correct.

In my case, I was looking for the standard library as well, by which I mean a list of all importable modules shipped with a given Python distribution. Questions about this have been asked several times but I couldn't find an answer that included everything I was looking for.

My use case was bucketing an arbitrary x in a Python import x statement as either:

  • included in the Python stdlib + built-ins
  • installed as a third party module
  • neither

This will work for virtualenvs or a global install. It queries the distribution of whatever Python binary is running the script. The final chunk does reach out of a virtualenv, but I consider that the desired behavior.

# You may need to use setuptools.distutils depending on Python distribution (from setuptools import distutils)
import distutils.sysconfig  # import the submodule explicitly; `import distutils` alone is not guaranteed to load it
import glob
import os
import pkgutil
import sys

def get_python_library():

    # Get the top-level source modules (not packages) importable from sys.path.
    modules = {
        module
        for _, module, is_package in pkgutil.iter_modules()
        if not is_package
    }

    # Glob all the 'top_level.txt' files installed under site-packages.
    site_packages = glob.iglob(os.path.join(
        os.path.dirname(os.__file__), 'site-packages', '*-info', 'top_level.txt'))

    # Read the files for the import names and remove them from the modules set.
    # A top_level.txt can list several import names, one per line.
    third_party = set()
    for txt in site_packages:
        with open(txt) as f:
            third_party.update(line.strip() for line in f if line.strip())
    modules -= third_party

    # Get the modules compiled into the interpreter (sys, gc, errno, ...).
    system_modules = set(sys.builtin_module_names)

    # Get just the top-level packages from the Python install.
    python_root = distutils.sysconfig.get_python_lib(standard_lib=True)
    _, top_level_libs, _ = next(os.walk(python_root))

    return sorted(top_level_libs + list(modules | system_modules))

Returns

A sorted list of imports: [..., 'imaplib', 'imghdr', 'imp', 'importlib', 'imputil', 'inspect', 'io', ...]
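One way to finish the three-way bucketing described earlier, as a sketch: a set of stdlib and built-in names (for example `set(get_python_library())`, or `sys.stdlib_module_names` on 3.10+) is assumed to be passed in, and anything else that `importlib.util.find_spec` can locate on the current `sys.path` is treated as third party. `bucket_import` is a hypothetical name. A local module that happens to be on `sys.path` would also land in the "third party" bucket.

```python
import importlib.util
import sys


def bucket_import(name, stdlib_names):
    """Hypothetical helper: classify the `x` in `import x`.

    stdlib_names is assumed to be a set of stdlib + built-in names.
    """
    top_level = name.partition(".")[0]  # `import a.b` depends on package `a`
    if top_level in stdlib_names or top_level in sys.builtin_module_names:
        return "stdlib"
    # find_spec locates a top-level module on sys.path without importing it
    if importlib.util.find_spec(top_level) is not None:
        return "third party"
    return "neither"


stdlib = {"json", "os"}  # stand-in for the real stdlib list
for name in ("os.path", "sys", "no_such_module_xyz"):
    print(name, bucket_import(name, stdlib))
```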

Explanation:

I broke it up into chunks so the reason each group is needed can be clear.

  • modules

    • The pkgutil.iter_modules call scans every top-level module importable from sys.path and returns a generator of (module_loader, name, ispkg) tuples.
    • I turn it into a set and filter out packages, since here we care only about the source modules.
  • site_packages

    • Get a list of all installed packages under the conventional site-packages directory and remove them from the modules list. This roughly corresponds to the third party deps.
    • This was the hardest part to get right. Many things almost worked, like pip.get_installed_distributions or site. But pip returns the module names as they are on PyPI, not as they are when imported into a source file. Certain pathological packages would slip through the cracks, like:
      • requests-futures which is imported as requests_futures.
      • colors, which is actually ansicolors on PyPI and thus confounds any reasonable heuristic.
    • I am sure that there are certain low-usage modules that do not include a top_level.txt in their package. But this covered 100% of my use cases and seems to work on everything that is correctly configured.
  • system_modules

    • If you don't explicitly ask for them, you won't get these system modules, like sys, gc, errno and some other optional modules.
  • top_level_libs

    • The distutils.sysconfig.get_python_lib(standard_lib=True) call returns the top-level directory of the platform-independent standard library.
    • These are easy to miss because they might not live under the same python path as the other modules. If you are on OSX and running a virtualenv, these modules will actually be imported from the system install. These modules include email, logging, xml and a few more.
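On current Python 3 releases `distutils` is gone (it was removed in Python 3.12), so the last chunk needs another source for the stdlib path. A sketch under that assumption: the stdlib `sysconfig` module's `get_path("stdlib")` stands in for `distutils.sysconfig.get_python_lib(standard_lib=True)`, and the `top_level.txt` step reads every line, since one distribution can declare several import names. Both helper names are hypothetical, and the directory filter is a heuristic: it keeps valid identifiers and skips dunder directories, which drops entries such as `site-packages`, `lib-dynload` and `__pycache__`.

```python
import os
import sysconfig


def top_level_import_names(site_dir):
    """Hypothetical helper: import names declared in */top_level.txt files.

    A top_level.txt may list several names, one per line.
    """
    names = set()
    for entry in os.listdir(site_dir):
        if not entry.endswith("-info"):  # *.dist-info and *.egg-info dirs
            continue
        path = os.path.join(site_dir, entry, "top_level.txt")
        if os.path.isfile(path):
            with open(path) as fh:
                names.update(line.strip() for line in fh if line.strip())
    return names


def stdlib_top_level_dirs():
    """Hypothetical helper: package directories directly under the stdlib path."""
    stdlib = sysconfig.get_path("stdlib")
    _, dirs, _ = next(os.walk(stdlib))  # only the first level of the tree
    # Heuristic filter: skip site-packages, lib-dynload, __pycache__, etc.
    return sorted(d for d in dirs if d.isidentifier() and not d.startswith("__"))
```

Usage would mirror the original glob, for example `top_level_import_names(os.path.join(os.path.dirname(os.__file__), "site-packages"))`.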

Conclusion

On my 2013 MacBook Pro, I found 403 modules for the Python 2.7 install.

   >>> print(sys.version)
   2.7.10 (default, Jul 13 2015, 12:05:58)
   [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]
   >>> print(sys.hexversion)
   34015984
   >>> python_stdlib = get_python_library()
   >>> len(python_stdlib)
   403

I put up a gist of the code and output. If you think I am missing a class or have included a bogus module, I would like to hear about it.

Alternatives

  • In writing this post I dug around the pip and setuptools APIs. It may be possible to get this information through a single module, but you would really need to know your way around that API.

  • Before I started this, I was told that six has a function specifically for this problem. It makes sense that one might exist, but I couldn't find it myself.