I want to put some constants in one Python file and import it into another. I created two files, one with constants and one that imports it, and everything runs fine locally:
constants.py:
CONST = "hi guy"
test_constants.py:
from constants import CONST
import sys
# Echo CONST once for every input row streamed in on stdin
for line in sys.stdin:
    print(CONST)
Local test:
$ echo "dummy" | python test_constants.py
hi guy
Test using Hive (beeline):
hive> add file hdfs://path/.../test_constants.py;
No rows affected (0.191 seconds)
hive> add file hdfs://path/.../constants.py;
No rows affected (0.049 seconds)
hive> list files;
resource
/tmp/bb09f878-7e36-4aa2-8566-a30950072bcb_resources/test_constants.py
/tmp/bb09f878-7e36-4aa2-8566-a30950072bcb_resources/constants.py
2 rows selected (0.179 seconds)
hive> with t as (select 1 as dummy)
select transform (dummy)
using 'python test_constants.py'
as dummy_out
from t;
Error: org.apache.hive.service.cli.HiveSQLException:
Error while processing statement: FAILED:
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1535407036047_170618_1_00, diagnostics=[Task failed, taskId=task_1535407036047_170618_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1535407036047_170618_1_00_000000_0:
java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
The logs look like this:
Log Type: stderr
Log Upload Time: Mon Oct 29 15:50:42 -0700 2018
Log Length: 251
2018-10-29 15:45:16 Starting to run new task attempt: attempt_1535407036047_170618_1_00_000000_3
Traceback (most recent call last):
File "test_constants.py", line 1, in <module>
from constants import CONST
ImportError: No module named constants
According to list files, both scripts sit in the same resources folder, so the import seems like it should work, but on the cluster it doesn't.
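One way to see what the interpreter actually sees inside the Tez container is to add a small diagnostic to the top of test_constants.py; anything written to stderr shows up in the task logs. A minimal sketch (not part of the original script):

import sys, os

# Dump the working directory and module search path to the task's stderr log
sys.stderr.write("cwd: %s\n" % os.getcwd())
sys.stderr.write("sys.path:\n%s\n" % "\n".join(sys.path))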
Added 2018-10-30:
The answer by @serge_k works; however, I initially had trouble, since the HDFS path where I kept my Python UDFs was not available to Hive. After moving all of the files into /tmp on HDFS (e.g., with hdfs dfs -put), everything worked as expected.
hive> add file hdfs://dev/tmp/transforms;
No rows affected (0.108 seconds)
hive> list files;
resource
/tmp/61ecb363-ead6-4679-8f58-3611db9487b2_resources/transforms
1 row selected (0.202 seconds)
hive> select transform (col) using 'python transforms/test_constants.py' as dummy_out from dummy.test;
dummy_out
hi guy
hi guy
hi guy
hi guy
hi guy
hi guy
hi guy
hi guy
hi guy
hi guy
10 rows selected (63.734 seconds)
Place your Python scripts in one folder, e.g. files, add the whole folder to the distributed cache, and call the script as python files/script_name.py. This works because Hive ships the folder to each task's working directory, and Python adds the script's own directory (files/) to sys.path, so sibling modules such as constants can be imported.
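A minimal sketch of the resulting session, reusing the query from the question (the hdfs path is a placeholder):

hive> add file hdfs://path/to/files;
hive> with t as (select 1 as dummy)
select transform (dummy)
using 'python files/test_constants.py'
as dummy_out
from t;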