Avoid duplicates by associating to inserted records with CakePHP saveMany


I am trying to take advantage of CakePHP's saveMany feature (with its support for associated data), but I am creating duplicate records. I think this is because the find() query does not find the new authors, as the transaction has not yet been committed to the database.

This means that if there are two authors with the same username, for example, in the spreadsheet, then CakePHP will not associate the second with the first, but rather create two. I have made up some code for this post:

/**
 * Foobar user (not in database) entered twice, whereas Existing user
 * (in database) is associated
 */
$spreadsheet_rows = array(
    array(
        'title' => 'New post',
        'author_username' => 'foobar',
        'content' => 'New post'
    ),
    array(
        'title' => 'Another new post',
        'author_username' => 'foobar',
        'content' => 'Another new post'
    ),
    array(
        'title' => 'Third post',
        'author_username' => 'Existing user',
        'content' => 'Third post'
    ),
    array(
        'title' => 'Fourth post', // author_id in this case would be NULL
        'content' => 'Fourth post'
    )
);


$posts = array();

foreach ($spreadsheet_rows as $row) {
    /**
     * This query doesn't pick up the authors
     * entered automatically (see comment 2.)
     * within the db transaction by CakePHP,
     * so creates duplicate author names
     */
    $author = $this->Author->find('first', array('conditions' => array('Author.username' => $row['author_username'])));

    $post = array(
        'title' => $row['title'],
        'content' => $row['content'],
    );

    /**
     * Associate post to existing author
     */
    if (!empty($author)) {
        $post['author_id'] = $author['Author']['id'];
    } else {
        /**
         * 2. CakePHP creates and automatically
         * associates new author record if author_username is not blank
         * (author_id is NULL in db if blank)
         */
        if (!empty($row['author_username'])) {
            $post['Author']['username'] = $row['author_username'];
        }
    }

    $posts[] = $post;
}

$this->Post->saveMany($posts, array('deep' => true));

Is there any way that this can be achieved, while also keeping transactions?


There are 3 answers



Your new requirement to also save posts that have no associated author changes the situation a lot. As mentioned in the comments, CakePHP's model save methods are not meant to save data from different models at once unless those models are associated. If you need to do this in a single transaction, you'll have to handle it manually.

Save authors and their posts instead of posts and their authors

I would suggest saving the data the other way around, that is, saving authors and their associated posts. That way you can easily take care of duplicate users by grouping their data by username.

Saved this way, CakePHP will create new authors only when necessary, and add the appropriate foreign keys to the posts automatically.

The data should then be formatted like this:

Array
(
    [0] => Array
        (
            [username] => foobar
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => New post
                        )
                    [1] => Array
                        (
                            [title] => Another new post
                        )
                )
        )
    [1] => Array
        (
            [id] => 1
            [Post] => Array
                (
                    [0] => Array
                        (
                            [title] => Third post
                        )
                )
        )
)

And you would save via the Author model:

$this->Author->saveMany($data, array('deep' => true));
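
The grouping step itself needs nothing from CakePHP, so it can be sketched in plain PHP. The helper below is hypothetical (the name groupRowsByAuthor and the $knownIds map are assumptions made for illustration; in a real controller the map would come from Author lookups). It returns the authors in the shape shown above, plus the rows that have no author at all:

```php
<?php
// Hypothetical helper, not part of CakePHP.
// $knownIds maps usernames that already exist to their ids; it stands in
// for the result of looking each username up in the database.
function groupRowsByAuthor(array $rows, array $knownIds = array())
{
    $authors = array();
    $orphans = array();

    foreach ($rows as $row) {
        $post = array('title' => $row['title'], 'content' => $row['content']);

        // rows without a username cannot be grouped under an author
        if (empty($row['author_username'])) {
            $orphans[] = $post;
            continue;
        }

        $username = $row['author_username'];

        // create the author entry only once per username
        if (!isset($authors[$username])) {
            $authors[$username] = isset($knownIds[$username])
                ? array('id' => $knownIds[$username])
                : array('username' => $username);
            $authors[$username]['Post'] = array();
        }

        $authors[$username]['Post'][] = $post;
    }

    // drop the username keys so that saveMany() receives a numeric list
    return array(array_values($authors), $orphans);
}
```

Because each username is used as an array key while grouping, a username that appears in several rows can only ever produce one author entry.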

Store non associated posts separately and make use of transactions manually

There is no way around this if you want to use the CakePHP ORM; just imagine what the raw SQL query would have to look like in order to handle all that logic.

So just split this into two saves, and use DboSource::begin()/commit()/rollback() manually to wrap it all up.
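
As a rough sketch of that wrapping, the begin/commit/rollback handling can be pulled into a small function. runInTransaction is a hypothetical name, not a CakePHP API; the only assumption about $ds is that it exposes begin(), commit() and rollback(), as the data source returned by ConnectionManager::getDataSource() does:

```php
<?php
// Hypothetical helper, not part of CakePHP.
// $ds:   any object with begin(), commit() and rollback()
// $work: a callable that performs the saves and returns true on success
function runInTransaction($ds, $work)
{
    $ds->begin();

    try {
        $ok = (bool)call_user_func($work);

        // commit only if every save succeeded
        if ($ok && $ds->commit() !== false) {
            return true;
        }

        $ds->rollback();
        return false;
    } catch (Exception $e) {
        // undo everything, then let the caller see the error
        $ds->rollback();
        throw $e;
    }
}

// Possible usage inside a controller (assumes both models are loaded and
// use the same connection):
//
// $saved = runInTransaction($ds, function () use ($authors, $posts) {
//     return $this->Author->saveMany($authors, array('deep' => true))
//         && $this->Post->saveMany($posts);
// });
```

Both saves then either succeed together or leave no rows behind.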

An example

Here's a simple example based on your data, updated for your new requirements:

$spreadsheet_rows = array(
    array(
        'title' => 'New post',
        'author_username' => 'foobar',
        'content' => 'New post'
    ),
    array(
        'title' => 'Another new post',
        'author_username' => 'foobar',
        'content' => 'Another new post'
    ),
    array(
        'title' => 'Third post',
        'author_username' => 'Existing user',
        'content' => 'Third post'
    ),
    array(
        'title' => 'Fourth post',
        'content' => 'Fourth post'
    ),
    array(
        'title' => 'Fifth post',
        'content' => 'Fifth post'
    )
);

$authors = array();
$posts = array();
foreach ($spreadsheet_rows as $row) {
    // store non-author associated posts separately
    if (!isset($row['author_username'])) {
        $posts[] = $row;
    } else {
        $username = $row['author_username'];

        // prepare an author only once per username
        if (!isset($authors[$username])) {
            $author = $this->Author->find('first', array(
                'conditions' => array(
                    'Author.username' => $row['author_username']
                )
            ));

            // if the author already exists use its id, otherwise
            // use the username so that a new author is being created
            if (!empty($author)) {
                $authors[$username] = array(
                    'id' => $author['Author']['id']
                );
            } else {
                $authors[$username] = array(
                    'username' => $username
                );
            }
            $authors[$username]['Post'] = array();
        }

        // group posts under their respective authors
        $authors[$username]['Post'][] = array(
            'title' => $row['title'],
            'content' => $row['content'],
        );
    }
}

// convert the string (username) indices into numeric ones
$authors = Hash::extract($authors, '{s}');

// manually wrap both saves in a transaction.
// might require additional table locking as
// CakePHP issues SELECT queries in between.
// also this example requires both tables to use
// the default connection
$ds = ConnectionManager::getDataSource('default');
$ds->begin();

try {
    $result =
        $this->Author->saveMany($authors, array('deep' => true)) &&
        $this->Post->saveMany($posts);

    if ($result && $ds->commit() !== false) {
        // success, yay
    } else {
        // failure, buhu
        $ds->rollback();
    }
} catch (Exception $e) {
    // failed hard, ouch
    $ds->rollback();
    throw $e;
}

sepelin On

You need to use saveAll, which is a mix between saveMany and saveAssociated (you will need to do both of them here). Plus, you need to change the structure of each post.

Here is an example of the structures you will need to create inside the loop.

  $posts = array();

  //This is a post for a row with a new author
  $post = array (
    'Post' => array ('title' => 'My Title', 'content' => 'This is the content'),
    'Author' => array ('username' => 'new_author')
  );
  $posts[] = $post;

  //This is a post for a row with an existing author
  $post = array (
    'Post' => array ('title' => 'My Second Title', 'content' => 'This is another content'),
    'Author' => array ('id' => 1)
  );
  $posts[] = $post;

  //This is a post for a row with no author
  $post = array (
    'Post' => array ('title' => 'My Third Title', 'content' => 'This is one more content')
  );
  $posts[] = $post;

  $this->Post->saveAll($posts, array ('deep' => true));
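
Inside the loop, those three shapes could be produced by one small function. This is only a sketch: buildPostEntry is a hypothetical name, and $authorId is assumed to be the id found by an earlier Author lookup, or null when the lookup found nothing:

```php
<?php
// Hypothetical helper, not part of CakePHP: builds one saveAll() entry
// from a spreadsheet row.
function buildPostEntry(array $row, $authorId = null)
{
    $entry = array(
        'Post' => array('title' => $row['title'], 'content' => $row['content']),
    );

    if ($authorId !== null) {
        // existing author: reference it by id
        $entry['Author'] = array('id' => $authorId);
    } elseif (!empty($row['author_username'])) {
        // unknown author: pass the username so that a new record is created
        $entry['Author'] = array('username' => $row['author_username']);
    }
    // no username at all: the entry contains only the post

    return $entry;
}
```

Note that this shape on its own still hands CakePHP the same new username twice if it appears in two rows, so it does not address the duplicate problem from the question.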

AudioBubble On

Following the "use transactions manually" bit suggested by ndm, this piece of code (written in a unit test!) seemed to do the trick:

public function testAdd() {
    $this->generate('Articles', array());


    $csv_data = array(
        array(
            'Article' => array(
                'title' => 'title'
            )
        ),
        array(
            'Article' => array(
                'title' => 'title'
            ),
            'Author' => array(
                'name' => 'foobar'
            )
        ),
        array(
            'Article' => array(
                'title' => 'title2'
            ),
            'Author' => array(
                'name' => 'foobar'
            )
        ),
        /* array( */
        /*     'Article' => array( */
        /*         'title' => '' */
        /*     ), */
        /*     'Author' => array( */
        /*         'name' => '' // this breaks our validation */
        /*     ) */
        /* ), */
    );

    $db = $this->controller->Article->getDataSource();
    $db->begin();

    /**
     * We want to inform the user of _all_ validation messages, not one at a time
     */
    $validation_errors = array();

    /**
     * Do this by row count, so that user can look through their CSV file
     */
    $row_count = 1;

    foreach ($csv_data as &$row) {
        /**
         * If author already exists, don't create new record, but associate to existing
         */
        if (!empty($row['Author'])) {
            $author = $this->controller->Author->find('first', array(
                'conditions' => array(
                    'name' => $row['Author']['name']
                )
            ));

            if (!empty($author)) {
                $row['Author']['id'] = $author['Author']['id'];
            }
        }

        $this->controller->Article->saveAssociated($row, array('validate' => true));

        if (!empty($this->controller->Article->validationErrors)) {
            $validation_errors[$row_count] = $this->controller->Article->validationErrors;
        }

        $row_count++;
    }
    unset($row); // break the reference left over from the foreach

    // commit only if every row validated, otherwise undo all inserts
    if (empty($validation_errors)) {
        $db->commit();
    } else {
        $db->rollback();
    }
}

