Python: list of strings, change color of character if found (using xlsxwriter)

I have several lists that I am writing to different columns/rows of an Excel spreadsheet using xlsxwriter in Python 2.7. For one list of strings (DNA sequences), I want to find certain characters in each string ('a', 't', 'c', 'g'), change their individual colors, and then write the complete list of strings (multicolored, per character) to one column in the spreadsheet.

So far, the code I have written is:

row = 1
col = 1
for i in (seqs):
    worksheet.write(row,1,i,green)
    for char in i:
        if i.__contains__("A") or i.__contains__("T") :
            worksheet.write(row,1,i[char],red)
row += 1

Here seqs is my list of sequences. I want A/T to be red and G/C to be green, with the full sequence written to the spreadsheet. I'm not getting any errors, but I end up with either the entire sequence written per row in green or one character per row in red. Is there any way to do this, or to get this code to work?

jmcnamara (best answer)

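You can do this with XlsxWriter's write_rich_string() method, which takes alternating format and string arguments; each format applies to the string fragment that follows it.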

Here is a small working example:

from xlsxwriter.workbook import Workbook

workbook = Workbook('sequences.xlsx')
worksheet = workbook.add_worksheet()

red = workbook.add_format({'color': 'red'})
green = workbook.add_format({'color': 'green'})

sequences = [
    'ACAAGATG',
    'CCATTGTC',
    'CCCCGGCC',
    'CCTGCTGC',
    'GCTGCTCT',
    'CGGGGCCA',
    'GGCCACCG',
]

worksheet.set_column('A:A', 40)

for row_num, sequence in enumerate(sequences):

    format_pairs = []

    # Get each DNA base character from the sequence.
    for base in sequence.upper():

        # Prefix each base with a format.
        if base == 'A' or base == 'T':
            format_pairs.extend((red, base))

        elif base == 'G' or base == 'C':
            format_pairs.extend((green, base))

        else:
            # Non base characters are unformatted.
            format_pairs.append(base)

    worksheet.write_rich_string(row_num, 0, *format_pairs)

workbook.close()

Output:

[Screenshot of the resulting worksheet]
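
As a possible refinement, adjacent bases that share a color can be merged into a single run, so that fewer format/string pairs are passed to write_rich_string(). The following is only a sketch under some assumptions: base_color(), color_runs(), rich_string_args() and write_sequence() are hypothetical helper names (not part of XlsxWriter), and formats is assumed to be a dict mapping 'red' and 'green' to the Format objects created in the example above. One caveat, as far as I know: write_rich_string() expects more than two format/string tokens, so a sequence that collapses to a single run (such as 'CCCCGGCC') is written with the ordinary write() method instead.

```python
from itertools import groupby


def base_color(base):
    """Return the color name for one base, or None for non-base characters."""
    if base in ('A', 'T'):
        return 'red'
    if base in ('G', 'C'):
        return 'green'
    return None


def color_runs(sequence):
    """Group consecutive characters that share a color into (color, text) runs."""
    return [(color, ''.join(chars))
            for color, chars in groupby(sequence.upper(), key=base_color)]


def rich_string_args(runs, formats):
    """Flatten runs into the format/string argument list for write_rich_string()."""
    args = []
    for color, text in runs:
        if color is not None:
            args.append(formats[color])  # a format applies to the next string
        args.append(text)
    return args


def write_sequence(worksheet, row, col, sequence, formats):
    """Write one sequence, using a rich string only when there are several runs."""
    runs = color_runs(sequence)
    if len(runs) > 1:
        worksheet.write_rich_string(row, col, *rich_string_args(runs, formats))
    elif runs:
        # A single run needs no rich string; write it with one (or no) format.
        color, text = runs[0]
        worksheet.write(row, col, text, formats.get(color))
```

With the example above, the body of the loop could then be reduced to write_sequence(worksheet, row_num, 0, sequence, {'red': red, 'green': green}).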