Output Bash pipes to Python-compatible format

I'm working on text tokenization and lemmatization using UDPipe models. I can complete the task itself by using !echo commands or printing into a file, but I would like to generate a Python data structure to further process the output.

What works

Here is my working command:

!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model'

Out:

Loading UDPipe model: done.
# newdoc
# newpar
# sent_id = 1
# text = прывітанне, сусвет
1   прывітанне  прывітанне  NOUN    NN  Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing   _   _   _   SpaceAfter=No
2   ,   ,   PUNCT   PUNCT   _   _   _   _   _
3   сусвет  сусвет  NOUN    NN  Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing   _   _   _   SpacesAfter=\n

This works for printing the output into a file:

!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model' >> filename.txt

./udpipe is the cloned repository of the package

What I tried (without success)

os.system()

import os
text = "the text I'm processing"
cmd = 'echo "{}" | ./udpipe --tokenize --tag "./path/to/my/model"'.format(text)
os.system(cmd)

Out: 0

subprocess.getoutput()

import subprocess
cmd = "'the text I'm processing' | ./udpipe --tokenize --tag './path/to/my/model'"
output = subprocess.getoutput(cmd, stdout=subprocess.PIPE, shell=True)
print(output)

TypeError: getoutput() got an unexpected keyword argument 'stdout'

Answer by Olvin Roght (best answer)

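You've already found the subprocess module, which is the standard way to run external processes from Python. Your two attempts failed for simple reasons: os.system() returns only the command's exit status (hence the 0), and subprocess.getoutput() accepts the command string but no stdout or shell arguments. If you want shell features such as a pipe, pass shell=True to whichever function actually starts the process, e.g. subprocess.Popen(), the basic one. With shell=True the command should be a single string, and any text interpolated into it should be escaped with shlex.quote().

from shlex import quote
from subprocess import Popen, PIPE

text = "the text I'm processing"
# With shell=True, pass ONE string; quote() escapes the apostrophe for the shell.
cmd = f"echo {quote(text)} | ./udpipe --tokenize --tag ./path/to/my/model"
proc = Popen(cmd, stdout=PIPE, stderr=PIPE, text=True, shell=True)
output, _ = proc.communicate()
print(output)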
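
When shell=True is used, the command is normally given as a single string, so text containing an apostrophe has to be escaped before it reaches the shell. The sketch below shows this without needing UDPipe installed. It assumes a POSIX shell and uses tr purely as a hypothetical stand-in for ./udpipe:

```python
import shlex
import subprocess

text = "the text I'm processing"

# shlex.quote() wraps the text so the shell treats it as one literal word,
# even though it contains an apostrophe.
quoted = shlex.quote(text)

# `tr` is only a stand-in for ./udpipe (an assumption for illustration),
# so the pipeline can run on any machine with a POSIX shell.
cmd = f"echo {quoted} | tr 'a-z' 'A-Z'"

result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
print(result.stdout)
```

Without quote(), the apostrophe in the text would open a shell string that is never closed and the command would fail.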

In your example you also used >> to append the output to a file. In that case nothing is written to the process's stdout, so you can just wait for the process to finish:

from shlex import quote
from subprocess import Popen

text = "the text I'm processing"
# The redirection is part of the single shell command string.
cmd = f"echo {quote(text)} | ./udpipe --tokenize --tag ./path/to/my/model >> filename.txt"
proc = Popen(cmd, shell=True)
proc.wait()

Or you can use the higher-level function subprocess.call(), which waits for the command and returns its exit code:

from shlex import quote
from subprocess import call

text = "the text I'm processing"
cmd = f"echo {quote(text)} | ./udpipe --tokenize --tag ./path/to/my/model >> filename.txt"
call(cmd, shell=True)

If you want the process output in your code, you can use another higher-level function, subprocess.check_output():

from shlex import quote
from subprocess import check_output

text = "the text I'm processing"
cmd = f"echo {quote(text)} | ./udpipe --tokenize --tag ./path/to/my/model"
output = check_output(cmd, text=True, shell=True)
print(output)
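
For completeness, the getoutput() call from the question fails only because of the extra keyword arguments. subprocess.getoutput() takes the command string, always runs it through the shell, and returns stdout and stderr combined as one string with the trailing newline removed. A minimal sketch, assuming a POSIX shell and using tr as a hypothetical stand-in for ./udpipe:

```python
import shlex
import subprocess

text = "the text I'm processing"

# `tr` stands in for ./udpipe here (an assumption for illustration only).
cmd = f"echo {shlex.quote(text)} | tr 'a-z' 'A-Z'"

# No stdout= or shell= arguments: getoutput() always uses the shell
# and returns the captured text directly.
output = subprocess.getoutput(cmd)
print(output)
```

Because stderr is mixed into the result, status messages such as the model-loading line would end up in the same string as the parsed tokens.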

But you can also let Python do the work instead of the shell. For example, with Popen() you can pass the input to the process directly and, if needed, redirect its output straight to a file:

from subprocess import Popen, PIPE

text = "the text I'm processing"
cmd = "./udpipe", "--tokenize", "--tag", "./path/to/my/model"
proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True)
output, _ = proc.communicate(input=text)
print(output)
# OR write to file directly
with open("filename.txt", "a+") as out:
    proc = Popen(cmd, stdin=PIPE, stdout=out, stderr=out, text=True)
    proc.communicate(input=text)

Same with higher-level check_output():

from subprocess import check_output, STDOUT

text = "the text I'm processing"
cmd = "./udpipe", "--tokenize", "--tag", "./path/to/my/model"
output = check_output(cmd, input=text, stderr=STDOUT, text=True)
print(output)

The last option is what I'd use, but pick whichever you like most.
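
Since the goal in the question is a Python data structure, here is a rough sketch of one way to get there. It assumes UDPipe emits standard tab-separated CoNLL-U with ten columns per token line. The names run_udpipe, parse_conllu and FIELDS are made up for this example, and subprocess.run() with capture_output needs Python 3.7 or later.

```python
import subprocess

# Column names defined by the CoNLL-U format (10 tab-separated fields).
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def run_udpipe(text, model_path="./path/to/my/model"):
    """Hypothetical helper: run UDPipe with the question's flags, return stdout."""
    cmd = ["./udpipe", "--tokenize", "--tag", model_path]
    result = subprocess.run(cmd, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def parse_conllu(output):
    """Turn CoNLL-U text into a list of sentences (each a list of token dicts)."""
    sentences, current = [], []
    for line in output.splitlines():
        columns = line.split("\t")
        if len(columns) == len(FIELDS):
            current.append(dict(zip(FIELDS, columns)))
        elif not line.strip() and current:
            # A blank line ends the current sentence.
            sentences.append(current)
            current = []
        # Anything else (comments, status messages) is ignored.
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample shaped like the output in the question (features shortened).
sample = (
    "# sent_id = 1\n"
    "1\tпрывітанне\tпрывітанне\tNOUN\tNN\t_\t_\t_\t_\tSpaceAfter=No\n"
    "2\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n"
    "\n"
)
sentences = parse_conllu(sample)
print([(tok["form"], tok["lemma"], tok["upos"]) for tok in sentences[0]])
```

With UDPipe available, parse_conllu(run_udpipe(text)) should give the same structure for real input. Keeping stderr separate, as capture_output does, means the model-loading message never reaches the parser.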