Introduction
Have you ever come across a dataset that contains street addresses? How do you handle the address column? If the dataset is large and you try to encode the address column directly, you will run into a high-cardinality problem.
As part of feature engineering, you can handle this in several ways. For example, you can extract the country, city, and area from the given addresses and use them as features. Another way is to geocode the addresses into geographic coordinates (latitude and longitude) and use those as features.
Some of the popular packages that are used for geocoding and reverse geocoding in Python are geopy, geocoder, opencage, etc.
In this article, you will first learn what geocoding and reverse geocoding are, and then use the geopy package to convert addresses into latitude and longitude and vice versa. Finally, you will see how to calculate the distance between two addresses.
Geocoding
Geocoding is the process of converting addresses into geographic coordinates (i.e. latitude and longitude).
Reverse Geocoding
Reverse Geocoding is the process of converting geographic coordinates (latitude & longitude) into a human-readable address.
Geocoding and reverse geocoding are offered by various service providers such as OpenStreetMap, Bing, Google, AzureMaps, etc. These services expose APIs that anyone can use. However, each comes with its own limitations, and many of them charge for usage.
Geopy
The geopy package is not a geocoding service provider. It simply provides a single interface for connecting to several geocoding services.
geopy implements clients for many such services. You can use any of them, but keep in mind that each service comes with its own terms and conditions, pricing, API keys, etc. The OpenStreetMap service is free, so we’ll be using its Nominatim geocoder in this article.
If you don’t want to use geopy, you can call the API provided by one of these services directly. For example, you can use the Google Geocoding API directly instead of going through geopy.
Installation
pip install geopy
Syntax
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="<>")
location = geolocator.geocode("<>")
Geocoding (forward geocoding) using geopy
Let’s look at an example for the address of the Georgia Aquarium in Atlanta, USA. First, we need to create a Nominatim geolocator object called geolocator. As mentioned earlier, you can use any other geolocator of your choice, but we will stick to Nominatim throughout this post because it is free to use and needs no API key.
Next, pass the address for which you want the latitude and longitude to the geocode() method. The result is stored in the location object, from which you can read the matched address, the latitude and longitude, and the raw response, as shown below.
geolocator = Nominatim(user_agent="myapp")
location = geolocator.geocode("225 Baker St NW, Atlanta, GA 30313, USA")
print(location.address)
Georgia Aquarium, 225, Baker Street Northwest, Atlanta, Fulton County, Georgia, 30313, United States
print(location.latitude, location.longitude)
33.76326745 -84.39511726814364
print(location.raw)
{
"boundingbox":[
"33.7623777",
"33.7643007",
"-84.3960032",
"-84.3939931"
],
"class":"tourism",
"display_name":"Georgia Aquarium, 225, Baker Street Northwest, Atlanta, Fulton County, Georgia, 30313, United States",
"importance":0.9273629297966992,
"lat":"33.76326745",
"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
"lon":"-84.39511726814364",
"osm_id":28912103,
"osm_type":"way",
"place_id":97669427,
"type":"aquarium"
}
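When geocoding a whole address column rather than a single address, two practical points come up: geocode() returns None when the service finds no match, and the public Nominatim service asks clients to stay within about one request per second. The sketch below is one possible way to handle both. The helper name geocode_column and its parameters are hypothetical and not part of geopy; the geocoding function is passed in as an argument, so geolocator.geocode or any stand-in can be used.

```python
import time


def geocode_column(addresses, geocode, delay_seconds=1.0):
    """Geocode each distinct address once.

    Returns a dict mapping address -> (latitude, longitude), or None
    when the service found no match.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found
    (for example, geolocator.geocode from the snippet above).
    """
    results = {}
    for address in addresses:
        if address in results:
            continue  # already looked up; skip the duplicate request
        if results:
            time.sleep(delay_seconds)  # pause between requests to the service
        location = geocode(address)
        if location is None:
            results[address] = None  # no match for this address
        else:
            results[address] = (location.latitude, location.longitude)
    return results


# Assumed usage with the geolocator created earlier:
# coords = geocode_column(
#     ["225 Baker St NW, Atlanta, GA 30313, USA"], geolocator.geocode
# )
```

Looking up each distinct address only once keeps the number of requests down when the column contains repeats. geopy also ships a RateLimiter class in geopy.extra.rate_limiter that can wrap geolocator.geocode to enforce a minimum delay between calls.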
Reverse Geocoding using geopy
In the above example, from the address, we got latitude and longitude. Let’s now apply reverse geocoding to get the address from the given geographic coordinates (latitude and longitude).
For reverse geocoding, call the reverse() method on the geolocator object and pass it the latitude and longitude (here as a single "latitude, longitude" string). You can then read the address and many other details from the result, as shown below.
location = geolocator.reverse("33.76326745, -84.39511726814364")
print(location.address)
Georgia Aquarium, 225, Baker Street Northwest, Atlanta, Fulton County, Georgia, 30313, United States
Note that other geolocators, such as Bing or Google, may require additional parameters (an API key, for example), so check the documentation of the geolocator you choose.
The geocode() method also accepts other parameters such as timeout, limit, language, geometry, etc. You can explore these additional parameters to control the geocoder output.
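One of these extra parameters ties back to the feature-engineering idea from the introduction: passing addressdetails=True to Nominatim's geocode() asks the service to include an address dictionary in location.raw, with the address split into components. The sketch below shows one way to pull city, state, and country out of such a dictionary. The helper name address_features is hypothetical, and the key names it reads (city, town, village, state, country, postcode) are assumptions about typical Nominatim responses; they vary from place to place.

```python
def address_features(raw):
    """Pull a few coarse location features out of a Nominatim-style raw dict."""
    address = raw.get("address", {})
    # The settlement may be labelled differently depending on its size,
    # so try a few common keys in order (assumed list, not exhaustive).
    city = next(
        (address[key] for key in ("city", "town", "village") if key in address),
        None,
    )
    return {
        "city": city,
        "state": address.get("state"),
        "country": address.get("country"),
        "postcode": address.get("postcode"),
    }


# Hand-written sample shaped like a Nominatim response, built from the
# address shown earlier; it is not actual service output.
sample_raw = {
    "address": {
        "road": "Baker Street Northwest",
        "city": "Atlanta",
        "county": "Fulton County",
        "state": "Georgia",
        "postcode": "30313",
        "country": "United States",
    }
}
print(address_features(sample_raw))
```

With a real geolocator, the input would come from something like geolocator.geocode("225 Baker St NW, Atlanta, GA 30313, USA", addressdetails=True).raw. Missing components come back as None, so the resulting features should be checked before they are used in a model.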
Distance between two coordinates
Sometimes you may want to calculate the distance between two addresses from their latitude and longitude. For this, geopy provides two methods: geodesic distance, which models the Earth as an ellipsoid and is what distance.distance uses by default, and great-circle distance, which models the Earth as a sphere.
Let’s consider the two coordinates below for calculating the distance between them. Coordinate 1 refers to the Georgia Aquarium and coordinate 2 refers to Stone Mountain Park.
Coordinate 1: 33.76326745, -84.39511726814364 (225 Baker St NW, Atlanta, GA 30313, USA — Georgia Aquarium)
Coordinate 2: 33.804504, -84.1587461 (1000 Robert E Lee Blvd, Stone Mountain, GA 30083, United States — Stone Mountain Park)
Distance between coordinates using the geodesic method:
from geopy import distance
georgia_aquarium = (33.76326745, -84.39511726814364)
stone_mountain = (33.804504, -84.1587461)
print(distance.distance(georgia_aquarium, stone_mountain).miles)
print(distance.distance(georgia_aquarium, stone_mountain).km)
13.896931316085105
22.36494303195367
Distance between coordinates using the great-circle method:
from geopy import distance
georgia_aquarium = (33.76326745, -84.39511726814364)
stone_mountain = (33.804504, -84.1587461)
print(distance.great_circle(georgia_aquarium, stone_mountain).miles)
print(distance.great_circle(georgia_aquarium, stone_mountain).km)
13.869732545588302
22.321170853847264
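To make the great-circle figure less of a black box, here is a minimal sketch of the haversine formula, a common way to compute great-circle distance on a sphere, written with the standard library only. It is not geopy's implementation; the function name haversine_km and the mean Earth radius of 6371.009 km are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.009  # mean Earth radius; assumed value for this sketch


def haversine_km(point1, point2):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, point1)
    lat2, lon2 = map(radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


georgia_aquarium = (33.76326745, -84.39511726814364)
stone_mountain = (33.804504, -84.1587461)
print(haversine_km(georgia_aquarium, stone_mountain))
```

The result should land close to the great-circle value of about 22.32 km shown above. The geodesic figure is slightly larger (about 22.36 km) because it accounts for the Earth's ellipsoidal shape, which makes it the more accurate of the two.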
Conclusion
In this article, you learned what geocoding and reverse geocoding are, how to use the geopy package for both, and how to calculate the distance between two coordinates. I hope you found this article useful.