How to combine the initial state of a row with the latest state in MySQL?

194 views Asked by At

I need to get the initial state as well as the latest state from a MySQL database. This is over two tables:

customer

id name surname dob email telephone
10 Steve Bobbly 01-01-1970 [email protected] 0123456789
15 James Bond 01-01-1950 [email protected] 0101010999

audit_log

id entity_id property old_value new_value
1 10 name John Steve
2 10 email [email protected] [email protected]
3 10 telephone 0123456789

What I expect is output like this:

id name surname dob email telephone
10 Steve Bobbly 01-01-1970 [email protected] 0123456789
10_1 John Bobbly 01-01-1970 [email protected]
15 James Bond 01-01-1950 [email protected] 0101010999

I initially had a PHP script that runs through all the customer rows, and then matches them to the audit_log rows and generate output from there, but the speed is EXTREMELY slow and resource intensive.

Would something like this be possible directly in MySQL, and how would I do it?

EDIT

I've added additional rows to the customer and the output tables. The output table needs to contain all rows in customer, as well as a copy of the initial row, built from audit_log.

4

There are 4 answers

0
ahmed On BEST ANSWER

Try the following:

SET @rn=0;
SET @cust=null;
SET @pr=null;

SELECT id, name, surname, dob, email, telephone
FROM customer 
  UNION ALL
(
  SELECT CONCAT(T.id, '_', D.rownum) id,
        COALESCE(MAX(CASE D.property WHEN 'name' THEN D.old_value END), MAX(T.name)) name,
        COALESCE(MAX(CASE D.property WHEN 'surname' THEN D.old_value END), MAX(T.surname)) surname,
        COALESCE(MAX(CASE D.property WHEN 'dob' THEN D.old_value END), MAX(T.dob)) dob,
        COALESCE(MAX(CASE D.property WHEN 'email' THEN D.old_value END), MAX(T.email)) email,
        COALESCE(MAX(CASE D.property WHEN 'telephone' THEN D.old_value END), MAX(T.telephone)) telephone
  FROM customer T
  JOIN 
  (
    SELECT id, entity_id, property, old_value, new_value, 
    IF(@cust <> entity_id OR @pr <> property, @rn:=1, @rn:=@rn+1) rownum,
    @cust:=entity_id, @pr:=property
    FROM audit_log 
    ORDER BY entity_id, property, id
  ) D
  ON T.ID = D.entity_id
  WHERE D.rownum=1
  GROUP BY T.id, D.rownum
)
ORDER BY id

See a demo.

This query simulates ROW_NUMBER() OVER (PARTITION BY entity_id, property ORDER BY id) for the audit_log table to get the initial value for each customer/ property (where rownum = 1) .

The COALESCE is used to get the value of a property from the customers table, if this property is not changed, i.e. the name is changed but the email is not, then for email get the lastest value (in this case the latest=initial, which is T.email in this query).

3
Barbaros Özhan On

Considering there's at most one update per each entity-property pairs per each entity_id value, you can use this SQL Select statement

SELECT a.entity_id, a.name, c.surname, c.dob, a.email, a.telephone
  FROM customer AS c
 CROSS JOIN ( SELECT entity_id, 
                     MAX(CASE WHEN property = 'name' THEN new_value END) AS name,
                     MAX(CASE WHEN property = 'email' THEN new_value END) AS email,
                     MAX(CASE WHEN property = 'telephone' THEN new_value END) AS telephone
                FROM audit_log 
               GROUP BY entity_id
               UNION ALL
              SELECT CONCAT(entity_id,'_1'), 
                     MAX(CASE WHEN property = 'name' THEN old_value END) AS name,
                     MAX(CASE WHEN property = 'email' THEN old_value END) AS email,
                     MAX(CASE WHEN property = 'telephone' THEN old_value END) AS telephone
                FROM audit_log 
               GROUP BY entity_id ) AS a
  WHERE a.entity_id = c.id

where changing properties which take place in audit_log table with old vs. new values are unpivoted, and the others(non-changing ones) taken from customer table

Demo

0
Steve Chambers On

Nice question! :-) Here is one way if you don't mind some repetition:

SELECT *
FROM customer
UNION ALL
SELECT
  CONCAT(id, '_1') AS id,
  COALESCE(
    (SELECT old_value
     FROM audit_log
     WHERE id = (
       SELECT MIN(id)
       FROM audit_log WHERE entity_id = c.id AND property = 'name')
    ),
    c.name)
  AS name,
  COALESCE(
    (SELECT old_value
     FROM audit_log
     WHERE id = (
       SELECT MIN(id)
       FROM audit_log WHERE entity_id = c.id AND property = 'surname')
    ),
    c.surname)
  AS surname,
  COALESCE(
    (SELECT old_value
     FROM audit_log
     WHERE id = (
       SELECT MIN(id)
       FROM audit_log WHERE entity_id = c.id AND property = 'dob')
    ),
    c.dob)
  AS dob,
  COALESCE(
    (SELECT old_value
     FROM audit_log
     WHERE id = (
       SELECT MIN(id)
       FROM audit_log WHERE entity_id = c.id AND property = 'email')
    ),
    c.email)
  AS email,
  COALESCE(
    (SELECT old_value
     FROM audit_log
     WHERE id = (
       SELECT MIN(id)
       FROM audit_log WHERE entity_id = c.id AND property = 'telephone')
    ),
    c.telephone)
  AS telephone
FROM customer c
WHERE EXISTS (SELECT * FROM audit_log WHERE entity_id = c.id)

Demo (shamelessly borrowing from the one set up by Barbaros Özhan)

Note: The above will include the _1 entries whenever there are audit entries. But just the presence of audit entries doesn't guarantee anything has changed - e.g. if the surname was changed from "Smith" to "Jones" and then back to "Smith". If this is important to you I may be able to modify it, at the expense of more complexity.

0
Rick James On

Sometimes a tough programming task is best handled by stepping back and rethinking the framework.

I would re-think the schema design. Instead of this complex query, I would have 3 tables, making the query 'trivial':

  • Original -- The values when the person is first put into the database.
  • Audit -- The blow-by-blow -- a historical record of all the changes. (Optionally like you have now. Or possibly a copy of the Current row when the change occurred.)
  • Current - The latest values.

Then the query is essentially a UNION of Original and Audit.