I have a .txt file that contains 100 rows and 500 columns. Each row has a single integer value for each column (e.g. 1 0 1 2 0 2 1 0...).

For each row, I want to compare each column to every other column in that row and generate lists of tuples for each comparison. This has been accomplished with the code below.

However, I now need to eliminate certain comparisons. Specifically, each consecutive block of 100 columns represents a set, and I need to exclude comparisons within a set. For instance, I need to exclude tuples for (column1, column2), (column1, column3), etc., but not comparisons such as (column1, column101).
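To make the rule concrete, here is a minimal sketch. It assumes the sets are consecutive blocks of 100 columns (columns 1-100, 101-200, and so on) and that columns are indexed from 0 in code. The names `SET_SIZE` and `same_set` are hypothetical and not part of my existing code.

```python
SET_SIZE = 100  # assumption: each set is a consecutive block of 100 columns


def same_set(i, j, set_size=SET_SIZE):
    """Return True if 0-based column indices i and j fall in the same block."""
    return i // set_size == j // set_size


print(same_set(0, 1))    # column1 vs column2   -> True (exclude)
print(same_set(0, 100))  # column1 vs column101 -> False (keep)
```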

The current code works by comparing the first column to every other column, then the second column to every other column excluding the first, then the third column to every other column excluding the first and second, and so on.
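The ordering described above is the one `itertools.combinations` produces. A small illustration on a four-element row (the values here are made up for the example):

```python
import itertools as it

row = ["a", "b", "c", "d"]
pairs = list(it.combinations(row, 2))
print(pairs)
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```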

I could get to my answer by working out which tuples to remove by index (I know, for example, that the first 99 tuples would need to be removed because they compare the first column to the other 99 columns in the first set), but this is tedious and I suspect there is a better way.

Is there an easy solution to this problem maintaining the structure I have currently?

import itertools as it
import csv
import sys

csv.register_dialect('tab_delim', delimiter="\t", quoting=csv.QUOTE_NONE)

file_name = sys.argv[1] 
#number_sequenced is the number of rows to include
number_sequenced = int(sys.argv[2]) #e.g. 100

# function to enumerate rows

def read_lines(csv_reader, row_list):
    for row_number, row in enumerate(csv_reader):
        if row_number in row_list:
            yield row_number, row

# Read in specified number of rows     

with open(file_name, 'r') as File:
    reader = csv.reader(File, dialect='tab_delim')
    r = list(range(0, number_sequenced))

    # Generate tuples of all pairwise window combinations and add to master list

    comparisons = []
    for row_number, row in read_lines(reader, r):
        row_tuples = list(it.combinations(row, 2))
        comparisons.append(row_tuples)
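One possible way to skip within-set pairs while keeping the same structure is to pair up `(index, value)` items instead of bare values, and keep a pair only when the two indices fall in different blocks. This is only a sketch: it assumes sets are consecutive blocks of 100 columns, and `SET_SIZE` and `cross_set_pairs` are hypothetical names.

```python
import itertools as it

SET_SIZE = 100  # assumption: each set is a consecutive block of 100 columns


def cross_set_pairs(row, set_size=SET_SIZE):
    """Return value pairs in combinations order, skipping pairs from the same set."""
    return [
        (a, b)
        for (i, a), (j, b) in it.combinations(enumerate(row), 2)
        if i // set_size != j // set_size  # keep only pairs that span two sets
    ]


# Small demo: 6 columns split into 3 sets of 2 columns each
demo_row = ["1", "0", "1", "2", "0", "2"]
print(cross_set_pairs(demo_row, set_size=2))
```

In the loop above, `row_tuples = cross_set_pairs(row)` would take the place of the `it.combinations(row, 2)` line. For a 500-column row this keeps 100,000 of the 124,750 possible pairs.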
