My question is: what is the problem in my thinking?

Why is the left join not working as I thought?

A working answer was given (without explanation), here.

More info way down.

I have this sql fiddle displaying the problem I am facing when using left outer join

Structure

CREATE TABLE IF NOT EXISTS `mychanges` (
  `object_id` int(11) unsigned NOT NULL,
  `version_stamp` datetime DEFAULT NULL,
  `object_name` varchar(255) DEFAULT NULL,
  `project` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`object_id` ,`version_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    INSERT INTO mychanges VALUES (1266, "2014-09-24 09:55:40", "bbbb", 2 );
    INSERT INTO mychanges VALUES (1236, "2014-09-24 09:55:40", "aaa", 2 );
    INSERT INTO mychanges VALUES (1226, "2014-09-24 09:55:40", "zzz", 5 );
    INSERT INTO mychanges VALUES (1216, "2014-09-24 09:55:40", "xxxx", 8 );

    INSERT INTO mychanges VALUES (1256, "2014-09-24 09:51:40", "name1", 10 );
    INSERT INTO mychanges VALUES (1256, "2014-09-24 09:52:40", "name2", 10 );
    INSERT INTO mychanges VALUES (1256, "2014-09-24 09:53:40", "name3", 10 );
    INSERT INTO mychanges VALUES (1256, "2014-09-24 09:54:40", "name4", null );
    INSERT INTO mychanges VALUES (1256, "2014-09-24 09:56:40", "name5", null );

SELECT

SELECT  mychanges.object_id AS objectid1, mychanges.object_name AS objectname1, mychanges.version_stamp AS version_stamp1, change2.project as project2, change2.version_stamp as version_stamp2, change2.object_name as objectname2 FROM mychanges
LEFT JOIN (SELECT * from mychanges AS x WHERE x.project IS NULL) AS change2
ON change2.object_id = mychanges.object_id 
WHERE  mychanges.project = 10

What I want to get:

The values inserted by

INSERT INTO mychanges VALUES (1256, "2014-09-24 09:51:40", "name1", 10 );
INSERT INTO mychanges VALUES (1256, "2014-09-24 09:52:40", "name2", 10 );
INSERT INTO mychanges VALUES (1256, "2014-09-24 09:53:40", "name3", 10 );
INSERT INTO mychanges VALUES (1256, "2014-09-24 09:54:40", "name4", null );
INSERT INTO mychanges VALUES (1256, "2014-09-24 09:56:40", "name5", null );

So in written language: all records that have as project a 10 and all records that have the same object id with a record that has the project id of 10.

Why?

This is a changeset table. A user is changing an object name and hits enter. This generates entries in form of a change. These changes, can, sadly, have a project id of null. The object id is unique though.

What am I thinking?

I thought: right we have all entries with project_id NULL. Left we have all records with project_id with 10. If we join them using object_id, we get both, without duplicates. But instead we get duplicates of name4 and name5 and not name1, name2, and name3.

Why not?

1

There are 1 answers

6
hashbrown On BEST ANSWER

I thought: right we have all entries with project_id NULL. Left we have all records with project_id with 10. If we join them, we get both, without duplicates

No. You can't join that because the values NULL is not comparable with any other value, not even with NULL. If you have null, you need to handle it explicitly by using ifNULL() function.

EDIT

On second look, I realize that you are trying to join with the object_id which is of course not null. So let us see what is happening in your query. If I may break your query in 2 parts, the first part is:

SELECT  
mychanges.object_id AS objectid1, 
mychanges.object_name AS objectname1, 
mychanges.version_stamp AS version_stamp1
FROM mychanges
WHERE  mychanges.project = 10

This produces 3 records with object_id = 1256

And the 2nd part is,

SELECT * from mychanges AS x WHERE x.project IS NULL

This produces 2 records with object_id = 1256.

When you join these 2 parts using condition change2.object_id = mychanges.object_id it produces 3 X 2 = 6 records in the result. The duplicate originates as for each record in left side, there are 2 records in the right side.

This is exactly what the SQL is supposed to behave.