I have a shared Python code base, and I'm responsible for code that others depend on. I need to move modules from one subpackage to another directory/package to reorganize it. How do I do that in the safest way?
If I just move the code, I've got to worry about others who use it who may not have redirected their imports of it. If it's moved and the users of the code don't change their imports, their code will unexpectedly fail when the imports fail.
How can I ensure a seamless transition? Do I just copy the code and leave the old code in place until the imports have been changed? Are there any caveats to be aware of? What if I use import *
in conjunction with an __all__
? In what case would I have to support imports from the old location indefinitely?
I was recently asked to move code from one subpackage to another at work, and the approach I used didn't seem immediately obvious to other developers, so I'm documenting it here for others.
I don't recommend leaving a copy at the old place. If you have two copies of the same script, one is likely to change without the other. Instead, I recommend the following multi-step process.
The first step involves two parts that can be implemented simultaneously if you control both locations for the code.
Step One: Implement the Move
First, move the file from the old place to the new place under version control. I use a simplified interface to CVS so it was a version control copy. In most other version control systems (like mercurial, subversion, and git), you should use
mv
to move the file, e.g. with git:Important:
Don't forget to move the unittests too, and also move
__init__.py
's if there's code in old ones that need to be kept. Otherwise, just make sure to commit__init__.py
's there if they're not already in place orNext, in the place of the old code, import all the names from the new location,
so in
/location/old/script.py
:and leave a comment explaining why this is needed, and commit the change to version control. If you moved the
__init__.py
, just make sure to commit a new empty__init__.py
.There's a major caveat here.
import *
is affected by__all__
. If you have an__all__
declared, you've got two approaches to providing the missing names. You can import them all explicitly:or you can delete the file and instead import the module in the
__init__.py
, and add the path to sys.modules like this:This code will initialize the package and add the module to
sys.modules
just in time for it to be looked up there by the importer. This is the same way thatos.path
is created in the Python source. Most people would shy away from modifyingsys.modules
, however. In fact, I hesitate to leave this suggestion here, and I would not if it were not in the Python standard library.These two parts can together be pushed into production, and the move has been seamlessly implemented. If you have no control over users of your code, this may need to remain in place indefinitely for backwards compatibility.
Optional: I would then delete the old script at head (just at head, don't push it yet!) so that other developers can see the change coming and address the change in a timely fashion.
Step Two: Implement the Rereferencing
If you can do a regular expression search of all code that depends on your code, I recommend searching the code for the following regular expression:
If you are on Unix (or have Cygwin) you can do a regex search for it:
Or most IDEs have regex search.
If you do have control over the code that uses it, or you have a view on others that use it, it's fairly straightforward to change the imports from the old to the new, e.g. from :
to
and from
to
And so on.
Important:
All of these changes need to be implemented and released to production. If any production installations remain without these done, if you delete the old location, they will fail.
Step Three: delete the old script in production
This is the dangerous step. If you've missed any users/importers, their code will fail until they get their import fixes into production. You may choose to postpone this step indefinitely, but my preference is to get it done in a timely fashion if I can prove all of the changes have been pushed into production.
If you deleted it at head immediately after making the change so that others could see the change coming in development, you may have less to worry about.
Still, do not delete this until you can prove that no other users are referencing the old package location in production. If you can't prove it, don't delete it.