find elements with same charateristics

82 views Asked by At

Supposing I have a table where a material has asignments of different characteristics. A material can have one or more charateristics. Then I would like to find to a certain material similar materials, that means at least 2 characteristics should match. In this example I should find material C when I compare with A and D should find B. Is there any solution in SQL?

material    | character
----------------------  
A           | 2
A           | 5
B           | 1
B           | 3
B           | 4
C           | 2
C           | 5
D           | 3
D           | 1
4

There are 4 answers

0
talegna On

You could use something like the following grouped table to determine all items with more than 2 similar characteristics

SELECT
    material = t1.material
    , similarMaterial = t2.material
FROM
    tableName t1
    INNER JOIN tableName t2 ON t1.character = t2.character AND NOT(t1.material = t2.material)
GROUP BY material
HAVING 
    COUNT(*) >= 2
0
Filipe Silva On

This would give you the results based on a material input:

SELECT b.material
FROM table1 a
INNER JOIN table1 b 
  ON a.character = b.character AND a.material <> b.material
WHERE a.material = 'A'  -- Your input
GROUP BY b.material
HAVING COUNT(*) > 1;

sqlfiddle demo

Or do this to give you the pairs:

SELECT a.material as LEFT_MATERIAL ,b.material AS RIGHT_MATERIAL
FROM table1 a
INNER JOIN table1 b ON a.character = b.character AND a.material <> b.material
GROUP BY a.material,b.material
HAVING COUNT(*) > 1;

sqlfiddle demo

0
max On

Yes, you can find all paired of similar materials with SQL similar to this:

SELECT c1.material, c2.material, COUNT(*) as characterCount
FROM charateristics c1
    CROSS JOIN charateristics c2
WHERE c1.material > c2.material AND c1.character = c2.character
GROUP BY c1.material, c2.material
HAVING characterCount >= 2;
4
MatBailie On

This is an Entity-Attribute-Value table, and it notoriously painful to search. (In this case, the value is implied as being TRUE for has this attribute.)

It involves comparing everything against everything, grouping the results, and checking if the groups match. Virtually no use of indexes or intelligence of any kind.

SELECT
  material_a.material   AS material_a,
  material_b.material   AS material_b
FROM
  material    AS material_a
LEFT JOIN
  material    AS material_b
    ON  material_a.character  = material_b.character
    AND material_a.material  <> material_b.material
GROUP BY
  material_a.material,
  material_b.material
HAVING
  0 = MAX(CASE WHEN material_b.character IS NULL THEN 1 ELSE 0 END)

This gives every material_b that has all of the characteristics that material_a has.
- The HAVING clause will check that every 0 of material a's characteristics are missing from material b.

Changing to an INNER JOIN and changing the HAVING CLAUSE will get the share at least two materials.

SELECT
  material_a.material   AS material_a,
  material_b.material   AS material_b
FROM
  material    AS material_a
INNER JOIN
  material    AS material_b
    ON  material_a.character  = material_b.character
    AND material_a.material  <> material_b.material
GROUP BY
  material_a.material,
  material_b.material
HAVING
  COUNT(*) >= 2

Either way, you still are joining the whole table against the whole table, then filtering out the failures. With 100 materials, that's 9,900 material-material comparison. Imagine when you have 1000 materials and have 999,000 comparisons. Or 1million materials...