Using ord() to convert letters to ints (Very basic)

11.5k views Asked by At

Python beginner here. Trying to learn by reading code here and there. Came across this in a program designed to open Excel files in python. This function does a simple job--converts an Excel column letter label ('Z', or 'BB', or 'CCC') to an int using ord(). I was understanding just fine until I saw this portion of the conversion code:

if clen == 1:
    return ord(column[0]) - 64
elif clen == 2:
    return ((1 + (ord(column[0]) - 65)) * 26) + (ord(column[1]) - 64)

What is the purpose of (1 + (ord(column[0]) - 65) versus just using (ord(column[0]) - 64) again. The "1 +" seems redundant. Does this have a purpose?

This is the full function:

def column_index_from_string(column, fast=False):
    """Convert a column letter into a column number (e.g. B -> 2)"""

    column = column.upper()

    clen = len(column)

    if not fast and not all('A' <= char <= 'Z' for char in column):
        msg = 'Column string must contain only characters A-Z: got %s' % column
        raise ColumnStringIndexException(msg)

    if clen == 1:
        return ord(column[0]) - 64
    elif clen == 2:
        return ((1 + (ord(column[0]) - 65)) * 26) + (ord(column[1]) - 64)
    elif clen == 3:
        return ((1 + (ord(column[0]) - 65)) * 676) + ((1 + (ord(column[1]) - 65)) * 26) + (ord(column[2]) - 64)
2

There are 2 answers

0
Emil Vikström On

No, it does not have a purpose. 1+x-65 = x-64 even in Python :-)

It may be that the original developer thought it was easier to understand what 65 means than 64. Both are magic numbers, though, and you are better off giving names to the the numbers by assigning them to variables.

0
Zeugma On

The purpose of the -65 +1 is mostly the result of a bad optimization try from the original developer. I usually use the following function to convert Excel columns into an integer value:

return reduce(lambda x,y: x*26+ord(y)-ord('A')+1, column.upper(), 0)

The interesting part is ord(y)-ord('A')+1 that gives you the key of your question. Assuming the column variable contains a valid A-Z only Excel column string, the column number is actually the shift of a given char from A char plus one. ord('A') will give you 65 as a result. The developer replaces ord('A') by its final value.

That said, yes it seems to be here an optimization to avoid calling ord, but it is really obfuscating the code and removing readability for I think few time gained. If this function is indeed a critical function that is called millions of times in the program, then this is not this code that has to be written to optimize the code - you would instead create a precalculated dictionary with all Excel column name entries in it mapped to their integer value, or something like this that would be very efficient.

Here, what has been done is a bad choice in the trade-off of performance against readability and code maintenance ; at least you would have expected a comment explaining # 65 = ord('A'), and you wouldn't have asked a question about it here.

Key point: keep the code logic, simple, readable, and easy to maintain and don't change it for bad wannabe optimizations.