Ambigious column name in Sqoop

921 views Asked by At

I am using sqoop to import data using free form query. My sqoop import looks like below:

sqoop 
--connect jdbc:mysql://mysql/employees
--username root
--password root
--target-dir "/user/Sqoop/employees/"
--delete-target-dir
--query "select e.* from employees e join dept_emp d on e.emp_no = d.emp_no and d.dept_no ='d001' where \$CONDITIONS limit 25"
--split-by emp_no

Sqoop gives me ambiguous column name error as the emp_no column specified in --split-by is present in both the tables.

If I change the --split-by column to e.emp_no, Sqoop's query to find the boundary values will fail. The sql to find boundary values is:

 SELECT MIN(emp_no), MAX(emp_no) FROM (select e.* from employees e join dept_emp d on e.emp_no = d.emp_no and d.dept_no ='d001' where  (1 = 1)  limit 25) AS t1

I applied a workaround looking at this SQL and changed my import as below:

sqoop 
--connect jdbc:mysql://mysql/employees
--username root
--password root
--target-dir "/user/Sqoop/employees/"
--delete-target-dir
--query "select t1.* from employees t1 join dept_emp d on t1.emp_no = d.emp_no and d.dept_no ='d001' where \$CONDITIONS limit 25"
--split-by t1.emp_no

This worked fine. Is there another way to achieve this OR is this a known limitation with Sqoop?

1

There are 1 answers

0
Keegan On BEST ANSWER

Nope. Prefixing the duplicated column is the only way of accomplishing this.