Working with file pointers on a csv

701 views Asked by At

I was wondering how to change the code below to read x number of lines only process the sql insert statement then continue to read the file by x number and process until end of file. I am new to the idea of file pointers but i understand it should be possible using fgets.

I'm hoping to change the below code into a function where I can pass the filename and the number of lines I want read and processed.

I currently have : (from here)

$handle = fopen(dirname(__FILE__)."/files/workorderstest.csv" , "r");

$batch++;

if ($handle) {
    $counter = 0;

    //instead of executing query one by one,
    //let us prepare 1 SQL query that will insert all values from the batch

    $sql ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";

    while (($line = fgets($handle)) !== false) {
       $sql .= "($line),";
       $counter++;
    }

    $sql = substr($sql, 0, strlen($sql) - 1);

    var_dump($sql);

    if ($conn->query($sql) === TRUE) {

    } else {

    }

    fclose($handle);
}

I want to keep the memory footprint to a minimum. I'm thinking this should just be a matter of keeping track of the pointer -> repeat until lines reached -> process sql -> start at pointer -> repeat until eof.

  1. Is fgets() the best to use for this?
  2. Do I need to incorperate a callback or some such to defer the sql processing until all lines are read?
  3. I'm a bit lost on where to start as I am still learning PHP.

**** Updated Answered script below if it helps someone else ...

date_default_timezone_set('Australia/Brisbane');
$date = date('m/d/Y h:i:s a', time());
$timezone = date_default_timezone_get();
$time_start = microtime(true);

$batch_size = 500; // Lines to be read per batch
$batch = 0;
$counter = 0;
$lines = 0;

$conn = new mysqli($servername, $username, $password, $dbname);

if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Remove Existing Data from table
$sql = "TRUNCATE TABLE  `workorderstest`";
$conn->query($sql);

$handle = fopen(dirname(__FILE__)."/files/workorders.csv" , "r");

//instead of executing query one by one,
//let us prepare 1 SQL query that will insert all values from the batch

$sql_prefix ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
$values = "";

while (($line = fgets($handle)) !== false) {
    $values .= "($line),";
    $counter++;
    $lines++;
    if ($counter == $batch_size) {
        $values = substr($values, 0, strlen($values) - 1);
        $conn->query($sql_prefix . $values) or die($conn->error);
        $counter = 0;
        $values ="";
        $batch++;
    }
}
if ($counter > 0) { // Execute the last batch
    $values = substr($values, 0, strlen($values) - 1);
    $conn->query($sql_prefix . $values) or die($conn->error);
}

// Output results
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "Importing Script running at: $date <br/>";
echo "Timezone: $timezone <br/>";
echo "<br/>";
echo "Script Summary:";
echo "Time running script: " . round($time,3) . " seconds <br/>";
echo "Memory: ".memory_get_usage() . " bytes <br/>";
echo "Records Updated: $lines <br/>";
echo "Batches run: $batch <br/>";

?>
2

There are 2 answers

1
Barmar On BEST ANSWER
  1. Is fgets() the best to use for this? It's a fine way to do it. Another option is to read the entire file into an array with file(), then loop over the array with foreach().

  2. Do I need to incorperate a callback? No. Just perform the query after reading every batch of lines from the file.

  3. Where to start? When the counter reaches the batch size, perform the query. Then set the counter back to 0 and set the query string back to the initial value. Finally, at the end of the loop you'll need to perform the query with the remaining values (unless the file size was an exact multiple of the batch size, in which case there won't be anything remaining).

$batch_size = 100;
$counter = 0;

//instead of executing query one by one,
//let us prepare 1 SQL query that will insert all values from the batch

$sql_prefix ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
$values = "";

while (($line = fgets($handle)) !== false) {
    $values .= "($line),";
    $counter++;
    if ($counter == $batch_size) {
        $values = substr($values, 0, strlen($values) - 1);
        $conn->query($sql_prefix . $values) or die($conn->error);
        $counter = 0;
        $values ="";
    }
}
if ($counter > 0) { // Execute the last batch
    $values = substr($values, 0, strlen($values) - 1);
    $conn->query($sql_prefix . $values) or die($conn->error);
}
2
e4c5 On

This is a bit of a reinvention of he wheel. mysql has a very fast and efficient system to load CSV data into tables. That's LOAD DATA INFILE if you have the right permissions on your user account, you can invoke the code from right inside PHP. And LOAD DATA has built in support for skipping N lines.

$path = dirname(__FILE__)."/files/workorderstest.csv";
$q = "LOAD DATA INFILE ? INTO TABLE workorderstest IGNORE ? LINES";
$stmt = $dbh->prepare($q);
$stmt->bindParam(1,"$dirname");
$stmt->bindParam(2,"$n");
$stmt->execute();

That's precious few lines of code isn't it?

Note that this code uses the IGNORE LINES keywords to skip lines in the CSV. you can also use the IGNORE keyword eg.g

LOAD DATA INFILE ? IGNORE INTO TABLE ....