I'm working on text tokenization and lemmatization using UDPipe models. I can complete the task itself by using !echo
commands or printing into a file, but I would like to generate a Python data structure to further process the output.
What works
Here is my working command:
!echo 'the text I'm processing' | ./udpipe --tokenize --tag './path/to/my/model'
Out:
Loading UDPipe model: done.
# newdoc
# newpar
# sent_id = 1
# text = прывітанне, сусвет
1 прывітанне прывітанне NOUN NN Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing _ _ _ SpaceAfter=No
2 , , PUNCT PUNCT _ _ _ _ _
3 сусвет сусвет NOUN NN Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing _ _ _ SpacesAfter=\n
This works for printing the output into a file:
!echo 'the text I'm processing' | ./udpipe --tokenize --tag './path/to/my/model' >> filename.txt
./udpipe is the cloned repository of the package.
What I tried (without success)
os.system()
import os
text = 'the text I'm processing'
cmd = "echo '{}' | ./udpipe --tokenize --tag './path/to/my/model'".format(text)
os.system(cmd)
Out: 0
subprocess.getoutput()
import subprocess
cmd = "'the text I'm processing' | ./udpipe --tokenize --tag './path/to/my/model'"
output = subprocess.getoutput(cmd, stdout=subprocess.PIPE, shell=True)
print(output)
TypeError: getoutput() got an unexpected keyword argument 'stdout'
You've done some research and found the subprocess module, which is the most common way to call processes from Python. If you want to use shell functionality (such as a pipe), you need to pass the argument shell=True to whichever function actually starts the process, e.g. subprocess.Popen(), the basic one.
In your example you've also used >> to append the output to a file, so no output will be produced and you can just wait for the process to end. Or you can apply the higher-level function subprocess.call().
If you want to get the process output in code, you can use another higher-level function, subprocess.check_output().
BUT! You can also use Python functionality instead of the shell. For example, using Popen() you can pass the input to the process yourself and (if needed) redirect its output directly to a file. The same works with the higher-level check_output().
The last option is what I'd use, but you can apply the one you like most.
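As a rough sketch of the options described above (not tested against UDPipe itself), the shell-based and the pure-Python variants might look like this. The command list reuses the placeholder paths from the question, the function names are made up for illustration, and UTF-8 input and output is an assumption.

```python
import shlex
import subprocess

# Placeholder command from the question; the paths are assumptions, adjust them.
UDPIPE_CMD = ["./udpipe", "--tokenize", "--tag", "./path/to/my/model"]


def run_through_shell(text, cmd=UDPIPE_CMD):
    """Let the shell build the pipe (needs shell=True and a POSIX shell)."""
    # shlex.quote() escapes each piece, so an apostrophe in the text
    # (as in "I'm") cannot break the command line.
    quoted_cmd = " ".join(shlex.quote(part) for part in cmd)
    shell_cmd = "printf '%s' {} | {}".format(shlex.quote(text), quoted_cmd)
    return subprocess.check_output(shell_cmd, shell=True, encoding="utf-8")


def run_with_stdin(text, cmd=UDPIPE_CMD):
    """No shell: Python writes the text to the process's standard input."""
    return subprocess.check_output(cmd, input=text, encoding="utf-8")


def append_to_file(text, path, cmd=UDPIPE_CMD):
    """Python replacement for `>> filename.txt`: append stdout to a file."""
    with open(path, "a", encoding="utf-8") as handle:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, input=text, encoding="utf-8",
                       stdout=handle, check=True)
```

With these helpers, output = run_with_stdin("the text I'm processing") should give you whatever the process writes to standard output as one Python string, without any shell quoting to worry about.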
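Since the goal in the question is a Python data structure, the captured string could then be split into tokens. The sketch below assumes the output is CoNLL-U, as in the sample in the question (ten tab-separated columns per token, comment lines starting with #, a blank line between sentences). The function name and the dictionary keys are my own choices, not part of UDPipe.

```python
# Column order follows the CoNLL-U format; the lowercase keys are my choice.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu(output):
    """Turn CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in output.split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif line.startswith("#"):
            continue  # comment lines such as "# sent_id = 1"
        else:
            columns = line.split("\t")
            # Lines without exactly ten columns (e.g. status messages) are skipped.
            if len(columns) == len(CONLLU_FIELDS):
                tokens.append(dict(zip(CONLLU_FIELDS, columns)))
    if tokens:  # the output may not end with a blank line
        sentences.append(tokens)
    return sentences


# Shortened sample in the shape of the output shown in the question.
sample = (
    "# sent_id = 1\n"
    "1\tпрывітанне\tпрывітанне\tNOUN\tNN\tCase=Nom\t_\t_\t_\tSpaceAfter=No\n"
    "2\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n"
    "\n"
)
lemmas = [token["lemma"] for token in parse_conllu(sample)[0]]
print(lemmas)
```

Each token ends up as a plain dict, so something like token["upos"] or token["feats"] can be used for further processing. If you need multiword tokens or parsed feature sets, a dedicated CoNLL-U library is probably a better fit than this sketch.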