I have 3 files in /some/dir:
$ ls /some/dir
fiot_csv2apex_nomuratest.xml fiot_csv2apex_nomurauat.xml fiot_csv2apex_nomura.xml
I want my script to extract only the file that does NOT contain substrings "uat" or "test" in its filename.
To start off simply, I'm only trying to exclude the "uat" substring but my attempts fail.
Here is the entire script that does NOT try to exclude any of those 3 files:
#!/usr/bin/env python
import xml.etree.ElementTree as ET, sys, os, re, fnmatch
param = sys.argv[1]
client = param.split('_')[0]
market = param.split('_')[1]
suffix = param.split('_')[2]
toapex_pattern = market + '*2apex*' + client + '*' + '.xml'
files_dir = '/some/dir'
config_files = os.listdir(files_dir)
for f in config_files:
    if fnmatch.fnmatch(f, toapex_pattern):
        print(f)
The above script outputs all 3 files in /some/dir, as expected. The script is being run like this:
python /test/scripts/regex.py nomura_fiot_b
I attempted to exclude "uat" by modifying toapex_pattern variable like this:
toapex_pattern = market + '*2apex*' + client + '(?!uat)' + '*' + '.xml'
However, after that the script did not produce any output.
I also tried this:
toapex_pattern = re.compile(market + '*2apex*' + client + '(?!uat)' + '*' + '.xml')
But this resulted in a type error:
TypeError: object of type '_sre.SRE_Pattern' has no len()
And if I try this:
toapex_pattern = market + '*2apex*' + client + '[^uat]' + '*' + '.xml'
the output is:
fiot_csv2apex_nomuratest.xml
fiot_csv2apex_nomurauat.xml
The desired output is:
fiot_csv2apex_nomura.xml
How should I modify the toapex_pattern variable to achieve the desired output?
An fnmatch pattern is not a regular expression. Things like (?!...) won't work.
Generally, exclusive patterns will not work well with fnmatch. You can use negated character classes (written [!...] in fnmatch) to match any three letters that are not "uat", but that would still mean you'd implicitly require at least 3 letters, and you could not control any further which ones.
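To make that limitation concrete, here is a small sketch run against the three filenames from the question. The pattern with [!u][!a][!t] is an assumption chosen for illustration; it is one way to spell "three characters that are not u, a, t" in fnmatch, not necessarily the exact pattern meant above.

```python
import fnmatch

files = [
    'fiot_csv2apex_nomuratest.xml',
    'fiot_csv2apex_nomurauat.xml',
    'fiot_csv2apex_nomura.xml',
]

# [!u][!a][!t]: one character that is not 'u', then one that is not 'a',
# then one that is not 't'. fnmatch negates with '!', not '^'.
pattern = 'fiot*2apex*nomura[!u][!a][!t]*.xml'

matches = [f for f in files if fnmatch.fnmatch(f, pattern)]
print(matches)  # ['fiot_csv2apex_nomuratest.xml']
```

The "uat" file is rejected, but so is the wanted file (there are not three extra characters before ".xml"), and the "test" file still slips through.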
Spare yourself the hassle: use fnmatch to get into the general ballpark, and then use a second step to exclude things you don't want.
Alternatively, use regex from the start.
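Both options could look roughly like the sketch below. The function names select_configs and select_configs_regex, and the excluded tuple, are made up for this example; the filenames and the fnmatch pattern come from the question.

```python
import fnmatch
import re

files = [
    'fiot_csv2apex_nomuratest.xml',
    'fiot_csv2apex_nomurauat.xml',
    'fiot_csv2apex_nomura.xml',
]

def select_configs(filenames, pattern, excluded=('uat', 'test')):
    # Step 1: coarse match with fnmatch.
    candidates = fnmatch.filter(filenames, pattern)
    # Step 2: drop every candidate whose name contains an excluded substring.
    return [f for f in candidates if not any(s in f for s in excluded)]

def select_configs_regex(filenames, market, client):
    # Regex-only variant: the negative lookahead rejects names that have
    # 'uat' or 'test' anywhere after the client part.
    rx = re.compile(
        re.escape(market) + r'.*2apex.*' + re.escape(client)
        + r'(?!.*(?:uat|test)).*\.xml$'
    )
    return [f for f in filenames if rx.match(f)]

print(select_configs(files, 'fiot*2apex*nomura*.xml'))
print(select_configs_regex(files, 'fiot', 'nomura'))
```

Note that the two-step version checks the whole filename for the excluded substrings, so it would also drop a file whose market or client name happens to contain "uat" or "test".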
Just throwing it in: you could call the script as
python /test/scripts/regex.py nomura fiot b
and use sys.argv[1], sys.argv[2] and sys.argv[3] directly, without having to split anything yourself first.
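A minimal sketch of that, assuming the argument order client, market, suffix from the question. The helper build_pattern is a hypothetical name, and the argument list is hard-coded here so the example runs on its own; in the real script you would pass sys.argv.

```python
def build_pattern(argv):
    # argv[0] is the script path; the next three items are the arguments.
    if len(argv) != 4:
        raise ValueError('expected: CLIENT MARKET SUFFIX')
    client, market, suffix = argv[1:4]  # suffix is unpacked but unused here
    return market + '*2apex*' + client + '*.xml'

# Stand-in for sys.argv when run as:
#   python /test/scripts/regex.py nomura fiot b
example_argv = ['/test/scripts/regex.py', 'nomura', 'fiot', 'b']
print(build_pattern(example_argv))  # fiot*2apex*nomura*.xml
```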