What is the correct or most robust way to tell from Python if an imported module comes from a C extension as opposed to a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.
One idea is to examine the file extension of module.__file__, but I'm not sure of all the file extensions one should check for, or whether this approach is necessarily the most reliable.
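For example, something like this (using _pickle, CPython's C accelerator behind pickle, just as a test case):

```python
import _pickle  # C accelerator module behind pickle on CPython

# The naive idea: look at the extension of the module's __file__.
print(_pickle.__file__)
# e.g. '.../lib-dynload/_pickle.cpython-312-x86_64-linux-gnu.so'
print(_pickle.__file__.endswith(('.so', '.pyd')))  # True for a C extension?
```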
First, I don't think this is at all useful. It's very common for modules to be pure-Python wrappers around a C extension module—or, in some cases, pure-Python wrappers around a C extension module if it's available, or a pure Python implementation if not.
For some popular third-party examples:
numpy is pure Python, even though everything important is implemented in C; bintrees is pure Python, even though its classes may all be implemented either in C or in Python depending on how you build it; etc. And this is true in most of the stdlib from 3.2 on.
For example, if you just import pickle, the implementation classes will be built in C (what you used to get from cPickle in 2.7) in CPython, while they'll be pure-Python versions in PyPy, but either way pickle itself is pure Python.
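A minimal check, assuming CPython 3.x:

```python
import pickle

# pickle itself is a plain .py module...
print(pickle.__file__)             # e.g. '.../lib/python3.x/pickle.py'

# ...but on CPython its implementation classes come from the C accelerator
# _pickle, so their __module__ is '_pickle' rather than 'pickle'.
# On PyPy (no accelerator) this would be 'pickle'.
print(pickle.Pickler.__module__)
```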
But if you do want to do this, you actually need to distinguish three things:
- a module built into the interpreter, like sys;
- a C extension module, like cPickle;
- a pure Python module, like pickle.
And that's assuming you only care about CPython; if your code runs in, say, Jython or IronPython, the implementation could be JVM or .NET rather than native code.
You can't distinguish perfectly based on __file__, for a number of reasons:
- Built-in modules don't have __file__ at all. (This is documented in a few places—e.g., the Types and members table in the inspect docs.) Note that if you're using something like py2app or cx_freeze, what counts as "built-in" may be different from a standalone installation.
- A module installed as part of a single-file egg (which is common with easy_install, less so with pip) will have either a blank or useless __file__.
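A quick illustration of the first point (CPython; details vary by platform and packaging):

```python
import sys
import json

# Built-in modules have no __file__ at all...
print(hasattr(sys, '__file__'))    # False: sys is compiled into the interpreter

# ...while modules loaded from disk do, although the path alone doesn't tell
# you whether the code behind it is Python or C, or whether it lives inside
# a zip/egg rather than a real directory.
print(json.__file__)               # e.g. '.../lib/python3.x/json/__init__.py'
```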
In 3.1+, the import process has been massively cleaned up, mostly rewritten in Python, and mostly exposed to the Python layer.
So, you can use the importlib module to see the chain of loaders used to load a module, and ultimately you'll get to BuiltinImporter (builtins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).
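For example, on CPython 3.3+ the loader that created a module is exposed as module.__loader__ (and module.__spec__.loader); a rough sketch:

```python
import importlib.machinery
import sys
import pickle
import _pickle

print(sys.__loader__)      # BuiltinImporter: compiled into the interpreter
print(_pickle.__loader__)  # ExtensionFileLoader: a .so/.pyd C extension
print(pickle.__loader__)   # SourceFileLoader: ordinary .py source

# So one (incomplete) test for "is this a C extension?" is:
print(isinstance(_pickle.__loader__,
                 importlib.machinery.ExtensionFileLoader))  # True
```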
You can also see the suffixes assigned to each of the four, on the current target platform, as constants in importlib.machinery. So, you could check whether a module's pathname ends with one of those suffixes (e.g., any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES)), but that won't actually help in, e.g., the egg/zip case unless you've already traveled up the chain anyway.
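A sketch of that suffix check (constant names as in importlib.machinery on 3.3+; has_extension_suffix is just a hypothetical helper):

```python
import importlib.machinery

def has_extension_suffix(pathname):
    # Guess "C extension" purely from the filename; see the caveats above.
    return any(pathname.endswith(suffix)
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)

import _pickle, pickle
print(importlib.machinery.EXTENSION_SUFFIXES)   # e.g. ['.cpython-312-x86_64-linux-gnu.so', '.abi3.so', '.so']
print(has_extension_suffix(_pickle.__file__))   # True
print(has_extension_suffix(pickle.__file__))    # False
```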
The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.
The best choice will be one or more of getsource, getsourcefile, and getfile; which is best depends on which heuristics you want.
A built-in module will raise a TypeError for any of them.
An extension module ought to return an empty string for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should be returning an empty string or raising an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that…)
Pure Python modules may return an empty string for getsourcefile if they're in an egg/zip/etc. They should always return a non-empty string for getsource if source is available, even inside an egg/zip/etc., but if they're sourceless bytecode (.pyc/etc.) they will return an empty string or raise an IOError.
The best bet is to experiment with the version you care about on the platform(s) you care about in the distribution/setup(s) you care about.
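As a starting point for such an experiment, here's a rough classifier built on those heuristics (module_kind is a hypothetical helper, not a library function; it treats None and the empty string the same, since which one you get varies by version):

```python
import inspect

def module_kind(module):
    # Sketch of the inspect-based heuristics described above.
    try:
        source = inspect.getsourcefile(module)
    except TypeError:
        # Built-in modules raise TypeError for getfile/getsourcefile/getsource.
        return 'built-in'
    if not source:
        # Extension modules come back empty/None here, but so can pure-Python
        # code shipped only as bytecode or inside an egg/zip.
        return 'extension (or sourceless/zipped pure Python)'
    return 'pure Python'

import sys, pickle, _pickle
print(module_kind(sys))      # 'built-in'
print(module_kind(_pickle))  # 'extension (or sourceless/zipped pure Python)'
print(module_kind(pickle))   # 'pure Python'
```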