Removing Duplicate Rows From Advantage Database Server 10.1 Table

620 views Asked by At

I have a large DBF table in ADS that may contain duplicate records. A duplicate record is identified by any record where field2, field3, field5, field4, field8 and field7 match another record. I need to identify duplicate records and delete them. To accomplish this I'm trying to write a set of records to a second table. Even though a duplicate record is identified by six fields, I need to write the entire record, 30 fields, to the second table. I have tried the following code:

insert into table2 select * from table1 where (
  field2, field3, field5, field4, field8, field7
) in (
  select field2, field3, field5, field4, field8, field7
  from table1 where field3 not like '%FOO%' and field3 not like '%BOO%'
  group by field2, field3, field5, field4, field8, field7 having count(*) > 1
)

ADS error message: ERROR IN SCRIPT: poQuery: Error 7200: AQE Error: State = 42000; NativeError = 2115; [iAnywhere Solutions][Advantage SQL Engine]Expected lexical element not found:) There was a problem parsing the WHERE clause in your SELECT statement.

Any and all help is appreciated.

2

There are 2 answers

0
DRapp On

As long as you already have the alternate table available with those columns, you should be good with...

insert into table2 
  ( field2, field3, field5, field4, field8, field7 ) 
select 
      field2, field3, field5, field4, field8, field7
   from 
      table1 
   where 
          field3 not like '%FOO%' 
      and field3 not like '%BOO%'
   group by 
      field2, 
      field3, 
      field5, 
      field4, 
      field8, 
      field7 
   having 
      count(*) > 1  )
0
reinaldo Crespo On

I think the simplest answer would be to just delete all duplicate records while keeping just the first one. Here is what I mean:

DECLARE tbl CURSOR;

//1st find duplicate records
OPEN tbl AS SELECT DISTINCT a.field1, a.field2, a.field3...
              FROM (SELECT d1.field1, d1.field2, d1.field3... 
                      FROM table1 d1 
                    HAVING COUNT(*) > 1 ) a ;

//now traverse cursor containing duplicate records
//deleting all duplicates from original table except 
//1st one

WHILE FETCH tbl DO 

    DELETE FROM table1 
     WHERE RowId > ( 
           SELECT MIN( d3.RowId )
             FROM table1 d3
            WHERE d3.field1 = tbl.field1
              AND d3.field2 = tbl.field2
              AND d3.field3 = tbl.field3 ... ) 
       AND table1.field1 = tbl.field1 
       AND table1.field2 = tbl.field2
       AND table1.field3 = tbl.field3 ;

END WHILE ;
CLOSE tbl ;

If table1 is indexed on some of these fields (field1, field2, field3...) and there aren't complicated "On Delete" triggers declared on table1 then the operation should perform pretty fast.