Looping over files in blocks, where for every 10 files I have to do something


I am using google colab, and I have the files in my drive like this:

M0000.csv
M0001.csv
M0002.csv
.
.
.
M0099.csv

I need to loop over 100 files, and for every 10 files I have to do something: I need to save all the text from each block of 10 files in one list, so that:

all_text[0] = list of text in files 1 to 10
.
.
all_text[9] = list of text in files 91 to 100

Here is my code for looping over all the files (without the per-10 grouping, which I don't know how to do):

import glob
import pandas as pd

dir = 'drive/My Drive/Tri/'

pd.options.display.max_colwidth = 5000

# Loop over all files
for file in sorted(glob.glob(dir + "*.csv")):
    print(f"File: {file}")
    # read_fwf with a newline delimiter loads each line of the file as one row
    df = pd.read_fwf(file, header=None, on_bad_lines='skip', delimiter="\n")

    # Loop inside each file
    for i in range(len(df)):  # loop over the rows
        pass  # code to do

    print("All Text:", all_text)

There are 2 answers

John Gordon (BEST ANSWER)

Keep track of how many files you have processed. If that number is divisible by ten, do your extra thing.

filenumber = 0
for file in sorted(glob.glob(dir + "*.csv")):
    filenumber += 1
    if filenumber % 10 == 0:
        pass  # do your extra thing
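Applied to your all_text goal, a minimal sketch might look like this; enumerate replaces the manual counter, and df[0].astype(str).tolist() is an assumption that each file parses into a single text column:

import glob
import pandas as pd

dir = 'drive/My Drive/Tri/'

all_text = []  # all_text[k] will hold the text of files 10*k+1 .. 10*k+10
current = []   # text collected from the current block of 10 files

for filenumber, file in enumerate(sorted(glob.glob(dir + "*.csv")), start=1):
    df = pd.read_fwf(file, header=None, on_bad_lines='skip', delimiter="\n")
    current.extend(df[0].astype(str).tolist())  # assumes one text column
    if filenumber % 10 == 0:
        all_text.append(current)  # every 10th file closes a block
        current = []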
TasmanGC

Overview

If I've read your question correctly, you're trying to get all the text from 10 files into one list, and you're doing this for 100 files, meaning you'll end up with 10 lists.

Grouping Files

My first step would be to group the files into sets of ten, something like this:

import numpy as np
import pandas as pd

# number of files per group
n_file = 10

# dummy file name list; in practice this is your sorted glob result
fn_list = [f'dummy_file_{str(i).zfill(3)}.csv' for i in range(100)]

# put the names in a numpy array to quickly group into sets of n
fn_group_of_n = np.array(fn_list).reshape(-1, n_file)
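
Note that reshape only works when the number of files is an exact multiple of n_file; with, say, 95 files it would raise an error. A plain-Python alternative (a sketch, not from the original answer) that tolerates a short final group is list slicing:

# groups of up to n_file names; the last group may be shorter
fn_group_of_n = [fn_list[i:i + n_file] for i in range(0, len(fn_list), n_file)]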

Loading and Append Behaviour

Then you have sets of 10 files that you can load and append to a list. This requires reworking how the append behaviour works:

# place to store the text list for each selection of n files
text_by_n = []

# for each set of n files
for file_selection in fn_group_of_n:

    list_of_text_in_file_selection = []

    for file in file_selection:
        # your code
        df = pd.read_fwf(file, header=None, on_bad_lines='skip', delimiter="\n")

        # you'll need to adapt this to your df format; iterrows yields
        # (index, row) pairs, so unpack and keep only the row
        for _, row in df.iterrows():
            list_of_text_in_file_selection.append(row)

    text_by_n.append(list_of_text_in_file_selection)

    ## optional flattening (use instead of the append above)
    # flat = [x for xs in list_of_text_in_file_selection for x in xs]
    # text_by_n.append(flat)
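
As a quick sanity check (assuming the file names actually exist on disk, which the dummy names above do not), the result should have one sublist per group:

print(len(text_by_n))     # 10 -> one sublist per group of n_file files
print(len(text_by_n[0]))  # number of rows collected from the first 10 files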

Limitations

It's hard to specify the append behaviour without an example file, but this will result in a list of sublists. You may wish to flatten these lists of lists, such as with this approach here. I can't confirm the iterrows behaviour without knowing the structure of your data frames, but you may want row.values rather than the row itself.
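
For example, swapping in row.values (a sketch of that suggestion) stores each row as a plain array of cell values rather than a pandas Series:

for _, row in df.iterrows():
    list_of_text_in_file_selection.append(row.values)  # ndarray of the row's cells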