I have an analytics script that processes batches of data with a similar structure but different column names. I need to preserve the column names for later ETL scripts, but we also want to do some processing, e.g.:

library(matrixStats);  # provides colSds()

results <- data.frame();
for (name in names(data[[1]])) {   
    # Start by combining each column into a single matrix
    working <- lapply(data, function(item)item[[name]]);
    working <- matrix(unlist(working), ncol = 50, byrow = TRUE);

    # Dump the data for the archive
    write.csv(working, file = paste(PATH, prefix, name, '.csv', sep = ''), row.names = FALSE);

    # Calculate the mean and SD for each year, bind to the results
    df <- data.frame(colMeans(working), colSds(working));
    names(df) <- c(paste(name, '.mean', sep = ''), paste(name, '.sd', sep = ''));

    # Combine the working df with the processing one
}

Per the last comment in the example, how can I combine the data frames? I've tried rbind and rbind.fill, but neither works, and there may be tens to hundreds of different column names in the data files.

1 Answer

rjzii On

This may have been mostly a matter of searching for the right keyword, but cbind turned out to be the way to go, starting from a placeholder matrix:

# Allocate for the number of rows needed
results <- matrix(nrow = rows)

for (name in names(data[[1]])) {   
    # Data processing

    # Append the results to the working data
    results <- cbind(results, df)
}

# Drop the first placeholder column created upon allocation
results <- results[, -1];

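Obviously the catch is that every set of columns needs to have the same number of rows, but otherwise it is just a matter of appending the columns with cbind. Note that once a data frame is bound in, cbind returns a data frame rather than a matrix, so the column names set on df are preserved.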
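
For what it's worth, here is a minimal end-to-end sketch of the same idea on made-up data. It is a variant rather than the exact code above: the per-name data frames are collected in a list and bound with a single cbind call, which avoids the placeholder column. The toy `data` list, `n_years`, and the names `alpha` and `beta` are assumptions for illustration only, and `apply(working, 2, sd)` stands in for `colSds` so that the example needs nothing beyond base R.

```r
# Hypothetical toy input: 3 batches, each holding two named series of length 4
set.seed(1)
n_years <- 4
data <- lapply(1:3, function(i) list(alpha = rnorm(n_years), beta = rnorm(n_years)))

pieces <- list()
for (name in names(data[[1]])) {
    # One row per batch, one column per year
    working <- matrix(unlist(lapply(data, function(item) item[[name]])),
                      ncol = n_years, byrow = TRUE)

    # Per-year mean and SD; apply(..., 2, sd) is a base-R stand-in for colSds()
    df <- data.frame(colMeans(working), apply(working, 2, sd))
    names(df) <- paste0(name, c(".mean", ".sd"))

    pieces[[name]] <- df
}

# Bind every piece column-wise in one call. unname() keeps cbind from
# prefixing the column names with the list names.
results <- do.call(cbind, unname(pieces))
print(names(results))
```

Collecting the pieces first and binding once also avoids regrowing `results` on every iteration, which may matter when there are hundreds of column names.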