I am confused about this bash script line

I am trying to convert a bash script to python for an intern project; basically, the script parses a table, and prints the information as an HTML document.

This line is confusing me. TMP is a temporary document that is the output of lsload, which outputs a table containing server host info.

# Force header text to lowercase
tr '[:upper:]' '[:lower:]' <${TMP} |head --lines=+1 |sed -e 's/[ \t]\+/ /g' >${H_TMP}

Okay, well the first tr command is converting the header text from uppercase to lowercase. I'm not really sure what the head command is doing. And I am confused as to what the sed is doing as well. Could anyone clarify what is going on in this line?

As a bonus, does anyone have ideas as to how I can convert this to Python?

EDIT: Okay, I seem to understand what sed is doing; it is converting any amount of spaces or tabs to just a single space. Just confused about head now.

Answer by tripleee (best answer)

You should be able to find the documentation for any Unix command easily by searching for its man page.

http://man7.org/linux/man-pages/man1/head.1.html

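Any basic introduction to the Unix command line will also reveal that head reads the first n lines of a text file, and tail correspondingly reads the last n lines of a text file. Here, head --lines=+1 keeps only the first line of its input, which is the header row of the table.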
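As a rough illustration of what head and tail do, here is a minimal Python sketch. The helper names head_lines and tail_lines are made up for this example, and the sample text is invented, not real lsload output.

```python
import io
from collections import deque
from itertools import islice


def head_lines(stream, n):
    """Return the first n lines of a text stream, like `head --lines=n`."""
    return list(islice(stream, n))


def tail_lines(stream, n):
    """Return the last n lines of a text stream, like `tail --lines=n`."""
    # A bounded deque keeps only the most recent n items it has seen
    return list(deque(stream, maxlen=n))


# Invented sample input standing in for a small text file
sample = "HEADER\nrow one\nrow two\nrow three\n"

first = head_lines(io.StringIO(sample), 1)
last = tail_lines(io.StringIO(sample), 2)
print(first)  # ['HEADER\n']
print(last)   # ['row two\n', 'row three\n']
```

Both helpers read the stream lazily, so head_lines stops after n lines instead of loading the whole file.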

The entire snippet corresponds to

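import os
import re

# Assumes TMP and H_TMP are exported environment variables
with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    for line in inputfile:
        # sed 's/[ \t]\+/ /g' is re.sub(...)
        # tr ... is lower()
        # Strip the newline first so \s+ does not turn it into a space
        line = re.sub(r'\s+', ' ', line.rstrip('\n')).lower()
        outputfile.write(line + '\n')
        # head --lines=+1 -- quit after a single line
        break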

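The regex escape \s matches more whitespace characters than the character class [ \t]: even for plain ASCII input it also matches newline, carriage return, form feed, and vertical tab, and for Unicode strings it matches further whitespace characters as well. Because it matches the newline, a line that still carries its trailing newline ends up with a trailing space instead. If you need to match strictly spaces and tabs, as the sed expression does, use [ \t]+ in the Python regex too.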
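To make the difference between \s and [ \t] concrete, here is a small sketch that applies both patterns to the same string. The header text is invented for the example and is not taken from real lsload output.

```python
import re

# Invented header line: tabs, runs of spaces, and a trailing newline
line = "HOST_NAME\t\tstatus   r15s\n"

with_s = re.sub(r'\s+', ' ', line)
with_class = re.sub(r'[ \t]+', ' ', line)

print(repr(with_s))      # 'HOST_NAME status r15s '
print(repr(with_class))  # 'HOST_NAME status r15s\n'
```

The \s+ version swallows the newline and leaves a trailing space, while [ \t]+ leaves the line ending alone, which is closer to what sed does.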

For maximum compactness, you could reduce this down to

with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    # Note: an empty input file yields a lone newline here
    outputfile.write(re.sub(r'\s+', ' ', inputfile.readline().rstrip('\n')).lower() + '\n')

If you want to read a fixed number of lines where that number is not 1, maybe look at enumerate():

with open(os.environ['TMP']) as inputfile, open(os.environ['H_TMP'], 'w') as outputfile:
    for lineno, line in enumerate(inputfile, 1):
        line = re.sub(r'\s+', ' ', line.rstrip('\n')).lower()
        outputfile.write(line + '\n')
        if lineno == 234:
            break
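An alternative sketch for the fixed-number-of-lines case uses itertools.islice instead of counting with enumerate(). The function name normalize_head and its parameters are hypothetical, the sample data is invented, and the function takes open file objects rather than reading the TMP and H_TMP environment variables.

```python
import io
import re
from itertools import islice


def normalize_head(infile, outfile, count=1):
    """Copy the first `count` lines, lowercased, with space/tab runs collapsed."""
    for line in islice(infile, count):
        # Strip the newline first so it is written back unchanged
        body = line.rstrip('\n')
        outfile.write(re.sub(r'[ \t]+', ' ', body).lower() + '\n')


# Invented sample data in place of the real lsload output
source = io.StringIO("HOST_NAME   STATUS\tR15S\nhostA   ok\t0.1\nhostB   ok\t0.3\n")
result = io.StringIO()
normalize_head(source, result, count=2)
print(result.getvalue())
```

Passing file objects keeps the function easy to test with io.StringIO; with real files you would call it inside the same with open(...) block as above.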