How to make the symlink behaviour of Python's os.walk deterministic across different filesystem types


I have a weird problem with os.walk in Python 3. To illustrate, take a directory fs with the following structure:

fs
├── data1.txt
├── data2.txt
├── data3.txt
├── dir
│   ├── file1.txt
│   ├── file2.txt
│   └── file3.txt
└── sym -> dir

Test 1: Directory on local volume

Say, this directory sits under macOS on a local disk (APFS filesystem). Now I run the following Python code:

import os

for dirpath, subdirs, filenames in os.walk('/path/to/fs'):
    print(f"{dirpath=}\n{subdirs=}\n{filenames=}\n")

This returns for me:

dirpath='/path/to/fs'
subdirs=['sym', 'dir']
filenames=['data1.txt', 'data3.txt', 'data2.txt']

dirpath='/path/to/fs/dir'
subdirs=[]
filenames=['file2.txt', 'file3.txt', 'file1.txt']

Please note, sym is listed as a directory in the list subdirs.

Test 2: Directory on a mounted Samba share

Now I copy the directory fs to a Samba share (in my case a mounted volume on a Synology NAS). I run the same Python code:

for dirpath, subdirs, filenames in os.walk('/path/on/SMB/share/to/fs'):
    print(f"{dirpath=}\n{subdirs=}\n{filenames=}\n") 

and now I get:

dirpath='/path/on/SMB/share/to/fs'
subdirs=['dir']
filenames=['sym', 'data1.txt', 'data3.txt', 'data2.txt']

dirpath='/path/on/SMB/share/to/fs/dir'
subdirs=[]
filenames=['file2.txt', 'file3.txt', 'file1.txt']

Please note, sym is listed as a file in the list filenames.

How can I make os.walk behave deterministically, i.e. the same on both filesystem types? Do I need to go through all elements in the subdirs and filenames lists, check whether each entry is a symlink, and then move it myself, so that the lists are the same on the two filesystem types?

Note: I do not want to set followlinks=True in the os.walk call, because of the danger of infinite recursion.
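
For illustration, the manual normalisation I have in mind could look roughly like the sketch below. The helper name walk_symlinks_as_files is made up for this post, and the sketch assumes that os.path.islink reports sym as a link on both volumes (the ls output suggests it does, but I have not verified that this holds for every SMB mount). It moves every symlink into filenames and sorts both lists, so the result no longer depends on how the filesystem classified the link or on the listing order:

```python
import os


def walk_symlinks_as_files(top):
    """Like os.walk(top), but always report symlinks in filenames.

    Hypothetical helper for illustration; not part of the standard library.
    """
    for dirpath, subdirs, filenames in os.walk(top, followlinks=False):
        # Collect entries that os.walk classified as directories
        # but that are actually symlinks.
        links = {d for d in subdirs if os.path.islink(os.path.join(dirpath, d))}

        # Modify subdirs in place so that os.walk itself sees the change.
        subdirs[:] = sorted(d for d in subdirs if d not in links)

        # Symlinks that the filesystem already reported as files stay where
        # they are; the ones removed from subdirs are appended here.
        filenames.extend(links)
        filenames.sort()

        yield dirpath, subdirs, filenames
```

Sorting subdirs in place also fixes the order in which the subdirectories are visited, since os.walk walks top-down by default and honours in-place changes to that list.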

Additional Details

When I list the directories via ls I get the following result:

$ ls -l /path/to/fs
total 24
-rw-r--r--@ 1 user  group    8 Feb  7 11:26 data1.txt
-rw-r--r--@ 1 user  group    8 Feb  7 11:26 data2.txt
-rw-r--r--@ 1 user  group    8 Feb  7 11:26 data3.txt
drwxr-xr-x@ 5 user  group  160 Feb  7 11:28 dir
lrwxr-xr-x@ 1 user  group    3 Feb  7 11:29 sym -> dir

and

$ ls -l /path/on/SMB/share/to/fs
total 64
-rwx------@ 1 user  group      8 Feb  7 11:26 data1.txt
-rwx------@ 1 user  group      8 Feb  7 11:26 data2.txt
-rwx------@ 1 user  group      8 Feb  7 11:26 data3.txt
drwx------@ 1 user  group  16384 Feb  7 11:28 dir
lrwx------@ 1 user  group   1067 Feb  7 11:31 sym -> dir

So both filesystems identify sym as a link. Why does Python treat them differently?
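
To narrow this down, a small diagnostic along the following lines might help. As far as I understand, os.walk has been built on os.scandir since Python 3.5 and sorts entries into subdirs or filenames based on DirEntry.is_dir(), which follows symlinks. The function name describe_entries is made up for this post. It only reports what scandir sees for each entry and does not explain why the two volumes would differ:

```python
import os


def describe_entries(path):
    """Return one tuple per directory entry:
    (name, is_symlink, is_dir following links, is_dir not following links).

    Hypothetical diagnostic helper for illustration.
    """
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            rows.append((
                entry.name,
                entry.is_symlink(),
                entry.is_dir(),                       # follows symlinks
                entry.is_dir(follow_symlinks=False),  # looks at the link itself
            ))
    return sorted(rows)
```

Printing the result of describe_entries('/path/to/fs') next to that of describe_entries('/path/on/SMB/share/to/fs') should show whether the third column for sym differs between the two volumes.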
