I have one task, `SeqrMTToESTask`, that depends on another one called `SeqrVCFToMTTask`. You can see the full code here:

Now, I ran the first task separately in the terminal and it generated its output file, `sample.mt`. When I launch the second task, `SeqrMTToESTask`, I would expect it to check for the output of the first task, `sample.mt`, and, if it is present, pick up the file and proceed. But that is not what happens: instead, I get an error indicating that certain parameters of the first task are missing, e.g.:
luigi.parameter.MissingParameterException: SeqrVCFToMTTask[args=(), kwargs={}]: requires the 'source_paths' parameter to be set
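The dependency is wired up roughly like this (a trimmed sketch, not the pipeline's actual code; the real tasks define more parameters and write to HDFS rather than local targets):

import luigi


class SeqrVCFToMTTask(luigi.Task):
    # Required parameter: the one named in the MissingParameterException.
    source_paths = luigi.Parameter()
    dest_path = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(self.dest_path)


class SeqrMTToESTask(luigi.Task):
    source_path = luigi.Parameter()
    dest_file = luigi.Parameter()

    def requires(self):
        # Luigi instantiates the upstream task here just to build the
        # dependency graph and call its complete()/output() methods.
        # SeqrVCFToMTTask() is constructed with no arguments, so its
        # required parameters must come from the command line or a
        # config file, even when sample.mt already exists.
        return SeqrVCFToMTTask()

In other words, the exception is raised while constructing `SeqrVCFToMTTask` for the dependency graph, before Luigi ever gets to checking whether its output already exists.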
The full command that I use to run the second task is:
python -u gcloud_dataproc/submit.py --cpu-limit 4 --num-executors 1 --hail-version 0.2 \
    --run-locally luigi_pipeline/seqr_loading.py SeqrMTToESTask --local-scheduler \
    --dest-file hdfs://.../seqr-loading-test/_SUCCESS_TO_ES \
    --source-path hdfs://.../seqr-loading-test/sample.mt \
    --spark-home $SPARK_HOME --es-host cp-nodedev1 --es-port 7890 --es-index sample_luigi
So, my question is the following: how should I run a Luigi task with Spark (`gcloud_dataproc/submit.py` just constructs the `spark-submit` command) when that task depends on another task with its own required parameters?
Apparently the right way to go was to just use a Luigi config file (in my case `seqr-loading-local-GRCh37.cfg`) in which all of the parameters for all of the tasks are specified, as sketched below.
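The config gives each task its own section named after the task class, with one entry per parameter. A rough sketch of what mine looks like (the `source_paths` name comes from the error above and the `SeqrMTToESTask` entries mirror my command-line flags; the `dest_path` name and the input VCF path are illustrative assumptions):

# seqr-loading-local-GRCh37.cfg
[SeqrVCFToMTTask]
# The parameter the MissingParameterException asked for. If it is a
# ListParameter in the real code, it needs JSON list syntax instead.
source_paths = hdfs://.../seqr-loading-test/sample.vcf.gz
# Illustrative assumption: the first task's output path.
dest_path = hdfs://.../seqr-loading-test/sample.mt

[SeqrMTToESTask]
source_path = hdfs://.../seqr-loading-test/sample.mt
dest_file = hdfs://.../seqr-loading-test/_SUCCESS_TO_ES
es_host = cp-nodedev1
es_port = 7890
es_index = sample_luigi

Luigi picks the file up when the LUIGI_CONFIG_PATH environment variable points to it (by default it also looks for a luigi.cfg in the working directory).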
So, after specifying all of the parameters for the tasks, I was able to run it in the following way: