Normalizing street addresses in Django/Python


I have a Django form where one of the fields is a TextInput for a street address.

I want to normalize the data. For example:

>> normalize('420 East 24th St.')
'420 E. 24th Street'

>> normalize('221 Amsterdam Av')
'221 Amsterdam Ave.'

>> normalize('221 Amsterdam Avenue')
'221 Amsterdam Ave.'

Or something like that. I'm already using geopy for geocoding. Perhaps this might help?

Also: Where should I normalize? In the database model or in the clean function of the form field?


There are 4 answers

2
Michael Whatcott On BEST ANSWER

The most reliable way to do this is to use a bona fide address verification service. Not only will it standardize (normalize) the address components according to USPS standards (see Publication 28), but you will also know that the address is real.

Full disclosure: I work for SmartyStreets, which provides just such a service. Here's some simple Python sample code that shows how to use our service via an HTTP GET request:

https://github.com/smartystreets/LiveAddressSamples/blob/master/python/street-address.py

1
Belmin Fernandez On

This is how I ended up addressing this (no pun intended):

### models.py ###

import re
import string

from django.db import models


def normalize_address_for_display(address):
    # Capitalize each word: '221 amsterdam avenue' -> '221 Amsterdam Avenue'
    display_address = string.capwords(address)

    # Normalize Avenue. The optional period is matched after the word
    # boundary, so 'Ave.' at the end of the string is caught as well.
    display_address = re.sub(r'\b(?:Avenue|Ave)\b\.?', 'Ave', display_address)

    # Normalize Street
    display_address = re.sub(r'\b(?:Street|St)\b\.?', 'St', display_address)

    # ...and other rules...

    return display_address


class Store(models.Model):

    name = models.CharField(max_length=32)
    address = models.CharField(max_length=64)
    city = models.CharField(max_length=32)
    state = models.CharField(max_length=2)
    zipcode = models.CharField(max_length=5)

    @property
    def display_address(self):
        # Computed on access; the stored address is left untouched
        return normalize_address_for_display(self.address)

I then use the display_address property of a Store instance in templates. This lets me keep the original user-submitted data in the database without modification and use display_address only when I want a normalized display version.
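The "...and other rules..." part gets easier to maintain if the patterns live in a table. Here is a minimal sketch that runs without Django. The names `RULES` and `normalize_for_display` are hypothetical and not from the code above, and the `Av` and `East` rules are assumptions added to cover the examples in the question:

```python
import re
import string

# Hypothetical rule table: (compiled pattern, replacement).
# The optional trailing period is matched after the word boundary,
# so both "Ave." and "Ave" are caught.
RULES = [
    (re.compile(r'\b(?:Avenue|Ave|Av)\b\.?'), 'Ave'),
    (re.compile(r'\b(?:Street|St)\b\.?'), 'St'),
    (re.compile(r'\b(?:East|E)\b\.?'), 'E'),
]


def normalize_for_display(address):
    """Capitalize each word, then apply every rule in order."""
    display_address = string.capwords(address)
    for pattern, replacement in RULES:
        display_address = pattern.sub(replacement, display_address)
    return display_address


print(normalize_for_display('420 East 24th St.'))     # 420 E 24th St
print(normalize_for_display('221 amsterdam avenue'))  # 221 Amsterdam Ave
```

Adding a rule is then a one-line change, and the order of the table makes rule precedence explicit.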

Open for comments/suggestions.

1
Dolan Antenucci On

One option would be to use geopy to look up the address with a provider such as Yahoo or Google Maps, which will then return the full address of whatever it matched. You may have to watch for apartment numbers being dropped from the returned address (e.g. "221 Amsterdam Av #330" becoming "221 AMSTERDAM AVENUE"). You will also get the city/state/country information, which the user may have abbreviated or misspelled.
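One way to guard against the dropped apartment number is to pull the unit out of the user's input before geocoding and reattach it afterwards. This is a rough sketch, assuming units appear at the end of the line as `#330`, `Apt 4B`, `Unit 12`, `Suite 200`, or `Ste 5`. The function names and the pattern are illustrative and not part of geopy:

```python
import re

# Assumed unit formats at the end of the line:
# "#330", "Apt 4B", "Apt. 4B", "Unit 12", "Suite 200", "Ste 5".
UNIT_RE = re.compile(
    r'[\s,]*(?P<unit>(?:#\s*|\b(?:Apt|Unit|Suite|Ste)\.?\s+)\w+)\s*$',
    re.IGNORECASE,
)


def split_unit(address):
    """Return (street_part, unit); unit is None when no unit is found."""
    match = UNIT_RE.search(address)
    if match is None:
        return address.strip(), None
    return address[:match.start()].strip(), match.group('unit')


def reattach_unit(geocoded_street, unit):
    """Append the user's unit to the street line returned by the geocoder."""
    return f'{geocoded_street} {unit}' if unit else geocoded_street


street, unit = split_unit('221 Amsterdam Av #330')
print(street, '|', unit)                             # 221 Amsterdam Av | #330
print(reattach_unit('221 AMSTERDAM AVENUE', unit))   # 221 AMSTERDAM AVENUE #330
```

Only the street part would be sent to the geocoder; the unit is kept exactly as the user typed it.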

If there are multiple matches, you could ask the user which one is their address. If there are no matches, you could let the user know and possibly allow the address to be saved anyway, depending on how important a valid address is and how much you trust the address-lookup provider's data.

Regarding doing this normalization in the form vs. model, I don't know what the preferred Django-way of doing things is, but my preference is in the form, for example:

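def clean(self):
    cleaned_data = super().clean()
    # check address via some self-defined helper function; the field names
    # are assumed, and .get() is used because fields that failed their own
    # validation are missing from cleaned_data
    matches = my_helper_address_matcher(
        cleaned_data.get('address'), cleaned_data.get('city'),
        cleaned_data.get('state'), cleaned_data.get('zip'))
    if not matches:
        raise forms.ValidationError("Your address couldn't be found...")
    elif len(matches) > 1:
        # add javascript into error so the user can select
        # the address that matches? maybe there is a cleaner way to do this
        raise forms.ValidationError('Did you mean...')
    return cleaned_data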

You could put this lookup function in the model (or in some helpers.py file) in case you want to reuse it in other areas.
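A possible shape for that helper, as a sketch only. `my_helper_address_matcher` is the hypothetical name used in the form above; here it takes the geocoding callable as an extra `geocode` argument, which is an assumption made so the function can be exercised without network access. With geopy you would typically pass a geocoder's `geocode` method, which accepts `exactly_one=False` and returns a list of locations with an `.address` attribute, or `None` when nothing matches:

```python
def my_helper_address_matcher(address, city, state, zip_code, geocode):
    """Return a list of full address strings matching the input.

    `geocode` is any callable that follows geopy's calling convention:
    geocode(query, exactly_one=False) -> list of results, or None.
    """
    # Skip empty parts so a missing field does not leave a stray comma
    query = ', '.join(part for part in (address, city, state, zip_code) if part)
    results = geocode(query, exactly_one=False)
    if not results:
        # No match: normalize None to an empty list for the caller
        return []
    return [result.address for result in results]
```

In the form, the `geocode` argument could be bound once with `functools.partial` so that the call in `clean` keeps its four address arguments.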

0
png On

I recently created a street-address Python module; its StreetAddressFormatter can be used to normalize your address.