How to fine-tune Apache Sqoop in Python to run heavy loads?


I am looking for ways to fine-tune Sqoop when used from Python so it can handle heavy loads.

So far I have considered num_mappers, split_by, batch, and increasing the memory per map task with something like mapreduce.map.memory.mb.
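For reference, this is how I understand the first three map onto SqoopOperator arguments (the split column is a placeholder from my test table, and as far as I can tell batch only affects exports):

# How the first three options map onto SqoopOperator arguments, as I
# understand them (ShipMethodID is a placeholder split column):
tuning_kwargs = dict(
    num_mappers=8,            # number of parallel map tasks / JDBC connections
    split_by="ShipMethodID",  # column Sqoop uses to partition rows across mappers
    batch=True,               # batch JDBC statements; relevant for exports
)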

While the first three are straightforward to define, I cannot find any documentation or examples on how to increase the memory of a map task. My code looks like the one below. Could you advise whether this is the correct way to do it?

from airflow.models import DAG
from airflow.contrib.operators.sqoop_operator import SqoopOperator
from airflow.utils.dates import days_ago


Dag_Sqoop_Import = DAG(dag_id="SqoopImport",
                       schedule_interval="* * * * *",
                       start_date=days_ago(2))

sqoop_mysql_import = SqoopOperator(conn_id="sqoop_local",
                                   table="shipmethod",
                                   cmd_type="import",
                                   target_dir="/airflow_sqoopImport",
                                   driver="com.mysql.jdbc.Driver",
                                   num_mappers=8,
                                   task_id="SQOOP_Import",
                                   # Hadoop properties passed through to the job
                                   properties={"mapreduce.map.memory.mb": 1000},
                                   dag=Dag_Sqoop_Import)

sqoop_mysql_import
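My assumption is that the hook turns each entry in properties into a Hadoop -D generic option, so the task above would build a command roughly like this (the JDBC connection string is a placeholder, not my real value):

# The command I expect the operator to build, assuming each entry in
# `properties` becomes a Hadoop -D generic option (JDBC URL is a placeholder):
expected_cmd = [
    "sqoop", "import",
    "-D", "mapreduce.map.memory.mb=1000",        # from the properties dict
    "--connect", "jdbc:mysql://localhost/mydb",  # placeholder connection string
    "--driver", "com.mysql.jdbc.Driver",
    "--table", "shipmethod",
    "--target-dir", "/airflow_sqoopImport",
    "--num-mappers", "8",
]

Is this understanding correct, and is the properties dict the right place to set mapreduce.map.memory.mb?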
