Remove duplicates SQL while ignoring key and selecting max of specified column

Question

Asked by JDE876 on 08 December 2014 at 22:49

I have the following sample data:

| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|   1    | jim   |   23    |   098   |
|   2    | joe   |   24    |   098   |
|   3    | john  |   25    |   098   |
|   4    | jack  |   26    |   098   |
|   5    | jim   |   23    |   091   |
|   6    | jim   |   23    |   090   |

I have tried this query:

INSERT INTO temp_table
SELECT DISTINCT
    @key_id,
    name,
    name_id,
    @data_id
FROM table1;

I am trying to dedupe a table on all fields in a row, ignoring the key.

My desired output:

| key_id | name  | name_id | data_id |
+--------+-------+---------+---------+
|   1    | jim   |   23    |   098   |
|   2    | joe   |   24    |   098   |
|   3    | john  |   25    |   098   |
|   4    | jack  |   26    |   098   |

What I'm actually getting:

| key_id | name  | name_id | data_id  |
+--------+-------+---------+----------+
|   1    | jim   |   23    |   NULL   |
|   2    | joe   |   24    |   NULL   |
|   3    | john  |   25    |   NULL   |
|   4    | jack  |   26    |   NULL   |

I am able to dedupe the table, but attempting to override the 'data_id' field with '@' sets its value to NULL.

Is there any way to select distinct on all fields while keeping the value for 'data_id'? I would take the highest (MAX) data_id if possible.

There are 2 answers

AdamMc331 On 09 December 2014 at 15:00

If you only want one row returned for a specific value (in this case, name), one option you have is to group by that value. This seems like a good approach because you also said you wanted the largest data_id for each name, so I would suggest grouping and using the MAX() aggregate function like this:

SELECT name, name_id, MAX(data_id) AS data_id
FROM myTable
GROUP BY name, name_id;

The only thing you should be aware of is the possibility that a name occurs multiple times under different name_ids. If that is possible in your table, you could group by the name_id too, which is what I did.

Since you stated you're not interested in the key_id but only the name, I just excluded it from the query altogether to get this:

| name  | name_id | data_id |
+-------+---------+---------+
| jim   |   23    |   098   |
| joe   |   24    |   098   |
| john  |   25    |   098   |
| jack  |   26    |   098   |

Here is the SQL Fiddle example.
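
For anyone who wants to check the grouping locally, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here, and storing data_id as text (to keep the leading zeros from the sample data) is an assumption, not something the question confirms.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (key_id INTEGER PRIMARY KEY, name TEXT, "
    "name_id INTEGER, data_id TEXT)"  # TEXT keeps the leading zeros (assumed type)
)

# Sample rows copied from the question.
rows = [
    (1, "jim", 23, "098"),
    (2, "joe", 24, "098"),
    (3, "john", 25, "098"),
    (4, "jack", 26, "098"),
    (5, "jim", 23, "091"),
    (6, "jim", 23, "090"),
]
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

# One row per (name, name_id), keeping the largest data_id in each group.
deduped = conn.execute(
    "SELECT name, name_id, MAX(data_id) AS data_id "
    "FROM myTable GROUP BY name, name_id ORDER BY name_id"
).fetchall()

for row in deduped:
    print(row)
```

The three jim rows collapse into one, and the other names pass through unchanged.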

JDE876 On 09 December 2014 at 16:34 BEST ANSWER

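-- Swap the tables; myTable2 is presumably an empty table with the same structure.
RENAME TABLE myTable TO Old_myTable,
             myTable2 TO myTable;

-- Relies on MySQL's non-standard GROUP BY (ONLY_FULL_GROUP_BY disabled):
-- key_id and data_id are taken from an arbitrary row in each group.
INSERT INTO myTable
SELECT *
FROM Old_myTable
GROUP BY name, name_id;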

This groups my table by the values I want to dedupe while still keeping the structure and ignoring the 'data_id' column.
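
With SELECT * and GROUP BY, it is left to MySQL which row's key_id and data_id end up in each group. A possible variation, sketched below with Python's built-in sqlite3 module, names the aggregates explicitly: MIN(key_id) keeps the lowest key and MAX(data_id) keeps the highest data_id. SQLite standing in for MySQL, the text type for data_id, and the choice of MIN for key_id are all assumptions made for illustration. Note that the two aggregates can come from different source rows; in the sample data they happen to coincide.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
schema = "(key_id INTEGER PRIMARY KEY, name TEXT, name_id INTEGER, data_id TEXT)"
conn.execute("CREATE TABLE Old_myTable " + schema)  # original data with duplicates
conn.execute("CREATE TABLE myTable " + schema)      # empty table, same structure

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO Old_myTable VALUES (?, ?, ?, ?)",
    [
        (1, "jim", 23, "098"),
        (2, "joe", 24, "098"),
        (3, "john", 25, "098"),
        (4, "jack", 26, "098"),
        (5, "jim", 23, "091"),
        (6, "jim", 23, "090"),
    ],
)

# Explicit aggregates: lowest key_id and highest data_id per (name, name_id).
conn.execute(
    "INSERT INTO myTable (key_id, name, name_id, data_id) "
    "SELECT MIN(key_id), name, name_id, MAX(data_id) "
    "FROM Old_myTable GROUP BY name, name_id"
)

result = conn.execute(
    "SELECT key_id, name, name_id, data_id FROM myTable ORDER BY key_id"
).fetchall()

for row in result:
    print(row)
```

On the sample data this reproduces the desired output from the question, including the key_id column.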
