How can I change the case of abbreviated ordinals to lower while keeping the rest of the string in title case?

I'm working on a Python script to convert fully uppercase addresses to title case. The issue I'm facing is that when I apply .title() to a string like SOUTH 16TH STREET, I get South 16Th Street. The desired result is South 16th Street, where the ordinal suffix is lowercase.

What is a simple way to accomplish this in Python? I was thinking about using some kind of regex.

There are 4 answers

0
Alex Riley On BEST ANSWER

It might be easiest to split the string into a list of separate words, capitalize each word and then join them back together:

>>> address = "SOUTH 16TH STREET"
>>> " ".join([word.capitalize() for word in address.split()])
'South 16th Street'

The capitalize() method converts the first character of a string to uppercase and the remaining characters to lowercase. Since digits have no upper- or lowercase forms, "16TH" and similar tokens come out as required.
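To make the difference concrete, here is a small comparison of the two methods on a single token. The variable names are just for illustration.

```python
token = "16TH"

# title() uppercases any letter that follows a non-letter, so the "T" after "16" stays uppercase.
titled = token.title()

# capitalize() only touches the first character and lowercases everything after it.
capitalized = token.capitalize()

print(titled)       # 16Th
print(capitalized)  # 16th
```

One trade-off to keep in mind: because capitalize() lowercases everything after the first character, a word like "O'BRIEN" becomes "O'brien", whereas title() would give "O'Brien".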

1
pascalhein On

Use this Regex-based solution:

import re
# Lowercase tokens that are ordinals (digits plus ST/ND/RD/TH); title-case everything else.
convert = lambda s: " ".join([x.lower() if re.match(r"^\d+(ST|ND|RD|TH)$", x) is not None else x.title() for x in s.split()])

In short, I split the string into words and check whether each word is an ordinal. Ordinals are lowercased; every other word is title-cased.
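A variation on the same idea, in case it is useful: instead of testing each word, you could title-case the whole string first and then repair only the ordinal suffixes with re.sub. This is just a sketch; the function name title_case_address is made up for illustration, and the pattern assumes the only suffixes you care about are st, nd, rd and th.

```python
import re

# Matches the form that .title() produces for ordinals, e.g. "16Th" or "22Nd".
_ORDINAL = re.compile(r"\b(\d+)(St|Nd|Rd|Th)\b")

def title_case_address(address):
    """Title-case an address, then lowercase the suffix of each ordinal."""
    titled = address.title()
    # Keep the digits as they are and lowercase only the captured suffix.
    return _ORDINAL.sub(lambda m: m.group(1) + m.group(2).lower(), titled)

print(title_case_address("SOUTH 16TH STREET"))  # South 16th Street
```

Unlike the split/join version, this keeps the original spacing between words intact, since nothing is split.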

0
Irshad Bhat On

>>> str_ = 'SOUTH 16TH STREET'
>>> ' '.join([i.title() if i.isalpha() else i.lower() for i in str_.split()])
'South 16th Street'

0
Nick Seigal On

To solve your stated problem narrowly, I think you may find string.capwords() useful. It wraps the split -> capitalize -> join sequence in a single function call.

>>> from string import capwords
>>> address = "SOUTH 16TH STREET"
>>> capwords(address)
'South 16th Street'

See the Python 3.4 documentation for more on this function:

https://docs.python.org/3.4/library/string.html#string-functions

It also exists in earlier versions of Python.
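As a quick check of the split -> capitalize -> join equivalence, here is a small sketch. It also shows one behavior worth knowing: with the default separator, capwords() collapses runs of whitespace, while passing an explicit sep keeps them. The sample string is made up for illustration.

```python
from string import capwords

address = "SOUTH   16TH  STREET"  # note the irregular spacing

# The manual version of what capwords() does with its default separator.
manual = " ".join(word.capitalize() for word in address.split())

collapsed = capwords(address)        # default sep: runs of whitespace become one space
preserved = capwords(address, " ")   # explicit sep: original spacing is kept

print(collapsed)  # South 16th Street
print(preserved)
```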

However, broadening your question to address formatting generally, you may run into trouble with this simplistic approach. More complex (e.g. regex-based) approaches may be required. Using an example from my locale:

>>> address = "Highway 99N"  # Wanting 'Highway 99N'
>>> capwords(address)
'Highway 99n'

Address parsing (and formatting) is a wicked problem due to the amount of variation in legitimate addresses as well as the different ways people will write them (abbreviations, etc.).
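For what it's worth, here is a rough sketch of what a slightly more careful token-based approach could look like. It is not a general address formatter. The function names are hypothetical, and the rule that any other token containing a digit (like "99N") should be uppercased is purely my assumption for this example.

```python
import re

# Digits followed by an ordinal suffix, in any case: 1st, 22ND, 3rd, 16TH, ...
ORDINAL = re.compile(r"\d+(?:ST|ND|RD|TH)", re.IGNORECASE)

def format_token(token):
    if ORDINAL.fullmatch(token):
        return token.lower()       # "16TH" -> "16th"
    if any(ch.isdigit() for ch in token):
        return token.upper()       # assumption: "99N" is a route number, keep it uppercase
    return token.capitalize()      # ordinary word

def format_address(address):
    return " ".join(format_token(t) for t in address.split())

print(format_address("SOUTH 16TH STREET"))  # South 16th Street
print(format_address("Highway 99N"))        # Highway 99N
```

Every rule like this will have exceptions somewhere, which is rather the point of the paragraph above.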

The pyparsing module might also be a way to go if you don't like the regex approach.