I have a dataset in CSV with this particular format:

key=value,key2=value,key3=value
key=value,key2=value,key4=value,key3=value

I want to convert it to:

key,key2,key3,key4
value,value,value,null
value,value,value,value

But the columns in my output CSV are mismatched: the values do not line up with the headers.

I need to sort the header columns, make sure each row's values end up under the right column, and use the string null wherever a row has no value for a column.

Here is my attempt:

if (($handle = fopen("sheet4.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $cols = explode(",", $data[0]);
        $num = count($cols);
        for ($c = 0; $c < $num; $c++) {
            $colData = explode("=", $cols[$c]);
            $outputHeaders[$row][$colData[0]] = $colData[0];
            $output[$row][$colData[0]] = $colData[1];
        }
        $csvOutput[$row] = array_combine($outputHeaders[$row], $output[$row]);
        $row++;
    }
    fclose($handle);
}

foreach ($csvOutput as $row => $rowData) {
    $extractHeaders[] = array_keys($rowData);
}

$mergedHeaders = array_unique(call_user_func_array('array_merge', $extractHeaders));

$fp = fopen('sheet4out.csv', 'wa');
fputcsv($fp, $mergedHeaders);
foreach ($csvOutput as $key => $fields) {
    $rowKeys = array_keys($fields);
    if (count($mergedHeaders) == count($rowKeys)) {
    } else {
        $differntKeys = array_diff($mergedHeaders, $rowKeys);
        $fields = array_merge($fields, array_fill_keys($differntKeys, 'banana'));
    }
    fputcsv($fp, $fields);
}
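Setting the parsing step aside and assuming each row has already become a key => value array, the output step alone can misalign columns. The second sample row already has all four keys, so the count check passes and nothing is merged, yet its keys are in the order key, key2, key4, key3. fputcsv() writes values in the array's own order, so that row's values land under the wrong headers. A minimal sketch of just that step (the values a1, b2 and so on are made up so the columns can be told apart):

```php
<?php
// The question's two sample rows, assumed already parsed into key => value arrays
// (values are made up so that each column can be told apart)
$csvOutput = [
    ['key' => 'a1', 'key2' => 'b1', 'key3' => 'c1'],
    ['key' => 'a2', 'key2' => 'b2', 'key4' => 'd2', 'key3' => 'c2'],
];

// Header list built the way the attempt builds it: merge all keys, then de-duplicate
$mergedHeaders = array_unique(call_user_func_array('array_merge', array_map('array_keys', $csvOutput)));

// Row 2 has as many keys as there are headers, so the attempt writes it unchanged,
// and fputcsv() would emit its values in the row's own key order
$written = array_values($csvOutput[1]);

echo implode(',', $mergedHeaders), "\n"; // key,key2,key3,key4
echo implode(',', $written), "\n";       // a2,b2,d2,c2
```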

Answer by Barmar:

You shouldn't call explode(',', $data[0]). fgetcsv() already exploded the line, so $data[0] is just the first field.
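The difference is easy to see with str_getcsv(), which parses a CSV string the same way fgetcsv() parses a line read from a file. A minimal sketch on the question's second sample line (the explicit enclosure and escape arguments only spell out the defaults):

```php
<?php
// One line of the input file, as fgetcsv() would read it
$line = 'key=value,key2=value,key4=value,key3=value';

// str_getcsv() splits a CSV string into fields, like fgetcsv() does for a file line
$data = str_getcsv($line, ',', '"', '\\');

// Each element is already a single "key=value" pair...
echo $data[0], "\n";          // key=value

// ...so exploding it on "," again just returns that one field
$cols = explode(',', $data[0]);
echo count($cols), "\n";      // 1
```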

While you're reading the input CSV, just add each field name to the headers array. You can keep this small by calling array_unique() after each row.
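One detail of this accumulation: array_unique() keeps the original keys of the values it retains, so the header array can have gaps in its numbering until it is sorted. array_merge() renumbers integer keys on the next row, so the gaps do not pile up. A minimal sketch using the key lists of the question's two sample rows:

```php
<?php
$headers = [];

// Key lists of the two sample rows from the question
foreach ([['key', 'key2', 'key3'], ['key', 'key2', 'key4', 'key3']] as $rowKeys) {
    // merge in this row's keys, then drop duplicates right away
    $headers = array_unique(array_merge($headers, $rowKeys));
}

// array_unique() kept the positions of the first occurrences
$keysBeforeSort = array_keys($headers);
echo implode(',', $keysBeforeSort), "\n"; // 0,1,2,5

// sort() orders the names and renumbers the keys from 0
sort($headers);
echo implode(',', $headers), "\n";        // key,key2,key3,key4
```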

When creating the output row, you can use the null coalescing operator ?? to supply the string 'null' as the default for any header key missing from that row.
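A minimal sketch of that default in isolation, assuming an already sorted header list and a parsed row from the sample data that has no key4:

```php
<?php
// Sorted header list and one parsed row that has no "key4"
$headers = ['key', 'key2', 'key3', 'key4'];
$row = ['key' => 'value', 'key2' => 'value', 'key3' => 'value'];

// ?? yields the right-hand side when the key is missing (or its value is null),
// without raising an "undefined array key" warning
$outrow = array_map(fn($field) => $row[$field] ?? 'null', $headers);

echo implode(',', $outrow), "\n"; // value,value,value,null
```

Mapping over $headers rather than over $row is what puts every value in header order.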

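$headers = [];
$rows = [];
if ($handle = fopen("sheet4.csv", "r")) {
    // fgetcsv() has already split the line on commas: each $field is one "key=value" pair
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $row = [];
        foreach ($data as $field) {
            // a limit of 2 keeps any "=" inside the value intact
            [$key, $value] = explode('=', $field, 2);
            $row[$key] = $value;
        }
        // collect every key seen so far, without duplicates
        $headers = array_unique(array_merge($headers, array_keys($row)));
        $rows[] = $row;
    }
    fclose($handle);
}

// sort() orders the header names and reindexes the array
sort($headers);

if ($fp = fopen('sheet4out.csv', 'w')) {
    fputcsv($fp, $headers);
    foreach ($rows as $row) {
        // emit values in header order; missing keys become the string 'null'
        $outrow = array_map(fn($field) => $row[$field] ?? 'null', $headers);
        fputcsv($fp, $outrow);
    }
    fclose($fp);
}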
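To try this logic without creating files, it can be run against php://memory streams. A minimal sketch, assuming the question's two sample lines as input; the in-memory streams stand in for sheet4.csv and sheet4out.csv, and the explicit enclosure and escape arguments to fgetcsv() and fputcsv() only spell out their defaults:

```php
<?php
// In-memory stand-in for sheet4.csv, holding the question's sample lines
$in = fopen('php://memory', 'w+');
fwrite($in, "key=value,key2=value,key3=value\nkey=value,key2=value,key4=value,key3=value\n");
rewind($in);

$headers = [];
$rows = [];
while (($data = fgetcsv($in, 1000, ',', '"', '\\')) !== false) {
    $row = [];
    foreach ($data as $field) {
        [$key, $value] = explode('=', $field, 2);
        $row[$key] = $value;
    }
    $headers = array_unique(array_merge($headers, array_keys($row)));
    $rows[] = $row;
}
fclose($in);
sort($headers);

// In-memory stand-in for sheet4out.csv
$out = fopen('php://memory', 'w+');
fputcsv($out, $headers, ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, array_map(fn($field) => $row[$field] ?? 'null', $headers), ',', '"', '\\');
}
rewind($out);
$result = stream_get_contents($out);
fclose($out);

echo $result;
// key,key2,key3,key4
// value,value,value,null
// value,value,value,value
```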

Answer by mickmackusa:
  • Open the data source file in "read" mode.
  • Parse each line, building an indexed array of associative arrays from the delimited key-value pairs, and also a flat associative array of every key encountered; keying that array by the key name ensures uniqueness.
  • Close the source file because it is no longer needed.
  • Open a new file in "write" mode. "w" overwrites the whole file if it already exists; "a" appends new data to the end of an existing file. Combining them as "wa" doesn't make sense; use "w".
  • Sort the headers (in whatever fashion you desire)
  • Declare an associative array of 'null' defaults keyed by the sorted headers. Replacing its values with each row's data both puts the row's columns in header order and guarantees that every column has a value.
  • Populate the new values while iterating, then close the file.
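The difference between the "a" and "w" modes mentioned in the steps can be checked on a throwaway file. A minimal sketch, assuming a temporary file created with tempnam() as a stand-in for sheet4out.csv:

```php
<?php
// A throwaway file in the system temp directory (stand-in for sheet4out.csv)
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "old\n");

// "a" keeps the existing content and appends after it
$fp = fopen($path, 'a');
fwrite($fp, "new\n");
fclose($fp);
$afterAppend = file_get_contents($path); // "old\nnew\n"

// "w" truncates the file to zero length before writing
$fp = fopen($path, 'w');
fwrite($fp, "new\n");
fclose($fp);
$afterWrite = file_get_contents($path);  // "new\n"

unlink($path);
echo json_encode([$afterAppend, $afterWrite]), "\n";
```

Opening the output with "a" would therefore stack a new header and rows under any output left from a previous run.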

Code:

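$header = [];
$rows = [];
$fileHandle = fopen("sheet4.csv", "r");
while (($line = fgetcsv($fileHandle)) !== false) {
    $row = [];
    foreach ($line as $pair) {
        // split only on the first "=" so values may contain "="
        [$key, $value] = explode('=', $pair, 2);
        $row[$key] = $value;
        // keying by name keeps the header list unique
        $header[$key] = $key;
    }
    $rows[] = $row;
}
fclose($fileHandle);

sort($header);
// 'null' placeholders, in sorted header order
$defaults = array_fill_keys($header, 'null');

$fileHandle = fopen('sheet4out.csv', 'w');
fputcsv($fileHandle, $header);
foreach ($rows as $row) {
    // array_replace() keeps $defaults' key order and fills in this row's values
    fputcsv($fileHandle, array_replace($defaults, $row));
}
fclose($fileHandle);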
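The ordering trick rests on array_replace() keeping the key order of its first argument. A minimal sketch, assuming a made-up row whose keys arrive out of order and lack key4:

```php
<?php
// Sorted headers and the 'null' defaults built from them
$header = ['key', 'key2', 'key3', 'key4'];
$defaults = array_fill_keys($header, 'null');

// A hypothetical parsed row: keys out of order, "key4" missing
$row = ['key3' => 'c', 'key' => 'a', 'key2' => 'b'];

// array_replace() keeps the key order of its first argument and
// overwrites matching keys with the values from the later argument
$ordered = array_replace($defaults, $row);

echo implode(',', $ordered), "\n"; // a,b,c,null
```

array_replace() is also the safer choice over array_merge() here, because array_merge() renumbers integer keys, which PHP would produce for a purely numeric header name such as 123.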