I have a weird problem with os.walk in Python 3. To illustrate, consider a directory fs with the following structure:
fs
├── data1.txt
├── data2.txt
├── data3.txt
├── dir
│ ├── file1.txt
│ ├── file2.txt
│ └── file3.txt
└── sym -> dir
Test 1: Directory on local volume
Say, this directory sits under macOS on a local disk (APFS filesystem). Now I run the following Python code:
import os
for dirpath, subdirs, filenames in os.walk('/path/to/fs'):
    print(f"{dirpath=}\n{subdirs=}\n{filenames=}\n")
This prints:
dirpath='/path/to/fs'
subdirs=['sym', 'dir']
filenames=['data1.txt', 'data3.txt', 'data2.txt']
dirpath='/path/to/fs/dir'
subdirs=[]
filenames=['file2.txt', 'file3.txt', 'file1.txt']
Please note, sym is listed as a directory in the list subdirs.
Test 2: Directory on a mounted Samba share
Now I copy the directory fs to a Samba share (in my case a mounted volume on a Synology NAS). I run the same Python code
for dirpath, subdirs, filenames in os.walk('/path/on/SMB/share/to/fs'):
    print(f"{dirpath=}\n{subdirs=}\n{filenames=}\n")
and now I get:
dirpath='/path/on/SMB/share/to/fs'
subdirs=['dir']
filenames=['sym', 'data1.txt', 'data3.txt', 'data2.txt']
dirpath='/path/on/SMB/share/to/fs/dir'
subdirs=[]
filenames=['file2.txt', 'file3.txt', 'file1.txt']
Please note, sym is listed as a file in the list filenames.
How can I make os.walk behave deterministically the same on both filesystem types? Do I need to go through all elements in the subdirs and filenames lists, check whether each entry is a symlink, and then move it myself so that the lists are the same on both filesystem types?
Note: I do not want to set followlinks=True in the os.walk call, because of the danger of infinite recursion.
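To make the manual approach concrete, here is a minimal sketch of what I have in mind. The helper name walk_consistent and its policy (every symlink is reported in filenames, and both lists are sorted) are my own assumptions for illustration, not something os.walk provides. It also assumes that os.path.islink reports sym as a link on both volumes, the way ls -l does.

```python
import os


def walk_consistent(top):
    """Wrap os.walk so that symlinks always end up in filenames.

    Hypothetical helper: the name and the "symlinks count as files"
    policy are assumptions chosen for this illustration.
    """
    for dirpath, subdirs, filenames in os.walk(top, followlinks=False):
        # Symlinks that os.walk classified as directories on this filesystem.
        links = [d for d in subdirs if os.path.islink(os.path.join(dirpath, d))]
        # Update both lists in place; sorting removes the dependence on
        # the order in which the filesystem returns the entries.
        subdirs[:] = sorted(d for d in subdirs if d not in links)
        filenames[:] = sorted(filenames + links)
        yield dirpath, subdirs, filenames
```

With this wrapper, sym would be listed in filenames regardless of whether os.walk originally put it in subdirs or in filenames, provided the assumption about os.path.islink holds.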
Additional Details
When I list the directories via ls I get the following result:
$ ls -l /path/to/fs
total 24
-rw-r--r--@ 1 user group 8 Feb 7 11:26 data1.txt
-rw-r--r--@ 1 user group 8 Feb 7 11:26 data2.txt
-rw-r--r--@ 1 user group 8 Feb 7 11:26 data3.txt
drwxr-xr-x@ 5 user group 160 Feb 7 11:28 dir
lrwxr-xr-x@ 1 user group 3 Feb 7 11:29 sym -> dir
and
$ ls -l /path/on/SMB/share/to/fs
total 64
-rwx------@ 1 user group 8 Feb 7 11:26 data1.txt
-rwx------@ 1 user group 8 Feb 7 11:26 data2.txt
-rwx------@ 1 user group 8 Feb 7 11:26 data3.txt
drwx------@ 1 user group 16384 Feb 7 11:28 dir
lrwx------@ 1 user group 1067 Feb 7 11:31 sym -> dir
So both filesystems identify sym as a link. Why does Python treat them differently?
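To narrow this down, a small diagnostic along the following lines could be run on both volumes. As far as I understand, os.walk is built on os.scandir and sorts an entry into subdirs when its is_dir() check (which follows symlinks) returns True, so the sketch compares that check with what os.lstat reports. The function name describe_entries and the dictionary keys are hypothetical, and I am not claiming what it prints on the SMB share.

```python
import os
import stat


def describe_entries(path):
    """Return one dict per directory entry, sorted by name.

    Hypothetical diagnostic helper: compares the type checks offered by
    os.scandir entries with the result of an explicit os.lstat call.
    """
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            rows.append({
                "name": entry.name,
                "is_symlink": entry.is_symlink(),
                # is_dir() follows symlinks by default.
                "is_dir_following_links": entry.is_dir(),
                "is_dir_not_following": entry.is_dir(follow_symlinks=False),
                # lstat never follows the link itself.
                "lstat_is_link": stat.S_ISLNK(os.lstat(entry.path).st_mode),
            })
    return sorted(rows, key=lambda row: row["name"])

# Example usage: for row in describe_entries('/path/to/fs'): print(row)
```

If the row for sym differs between the two volumes, that difference would show which of these checks disagrees with ls -l.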