I am using sqoop to import data using free form query. My sqoop import looks like below:
sqoop
--connect jdbc:mysql://mysql/employees
--username root
--password root
--target-dir "/user/Sqoop/employees/"
--delete-target-dir
--query "select e.* from employees e join dept_emp d on e.emp_no = d.emp_no and d.dept_no ='d001' where \$CONDITIONS limit 25"
--split-by emp_no
Sqoop gives me ambiguous column name error as the emp_no
column specified in --split-by
is present in both the tables.
If I change the --split-by
column to e.emp_no
, Sqoop's query to find the boundary values will fail. The sql to find boundary values is:
SELECT MIN(emp_no), MAX(emp_no) FROM (select e.* from employees e join dept_emp d on e.emp_no = d.emp_no and d.dept_no ='d001' where (1 = 1) limit 25) AS t1
I applied a workaround looking at this SQL and changed my import as below:
sqoop
--connect jdbc:mysql://mysql/employees
--username root
--password root
--target-dir "/user/Sqoop/employees/"
--delete-target-dir
--query "select t1.* from employees t1 join dept_emp d on t1.emp_no = d.emp_no and d.dept_no ='d001' where \$CONDITIONS limit 25"
--split-by t1.emp_no
This worked fine. Is there another way to achieve this OR is this a known limitation with Sqoop?
Nope. Prefixing the duplicated column is the only way of accomplishing this.