Getting geographic coordinates with Python and Google


Have you ever wondered how to get geographic coordinates using Python and Google? In this post, I’ll show you how I did it using only the names of Brazilian cities and states.
GEOLOCATION
DATA SCIENCE
Author

Naomi Lago

Published

September 6, 2023

   Recently, during the Exploratory Data Analysis (EDA) step of a project, I decided to plot a choropleth map showing the population distribution across Brazilian states. To do so, I first had to get the coordinates and then choose the libraries for the plot.

   With those two main steps in mind, this post describes the first one: how I got the data using a Google API from the Google Maps Platform.


Table of Contents


  1. Environment preparation
  2. API integration
  3. Coordinates collection
  4. Data export
  5. Data handling
  6. Conclusion
  7. References


Environment preparation


   For this post, I’ll be using a generic dataset with two columns, UF and MUNICIPIO, which refer to state and city respectively.

The shape is (82391, 2).


Sample with 5 entries
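
A minimal sketch of how such a dataset can be inspected with pandas. The rows here are illustrative stand-ins, not the actual data; in practice the dataframe would be loaded from a file, for example with `pd.read_csv`.

```python
import pandas as pd

# Illustrative rows only; the real dataset has the same two columns
# but 82,391 rows.
df = pd.DataFrame(
    {
        "UF": ["SP", "RJ", "BA", "MG", "PR", "AM"],
        "MUNICIPIO": [
            "São Paulo",
            "Rio de Janeiro",
            "Salvador",
            "Belo Horizonte",
            "Curitiba",
            "Manaus",
        ],
    }
)

print(df.shape)                      # (rows, columns)
print(df.sample(5, random_state=0))  # five random entries
```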


   We don’t want the queries sent to the API to contain null values, so I’ll check how many values are missing and delete those rows if the number is small.


Handling null values
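
One way this check could look in pandas, sketched on a small illustrative dataframe with one missing value per column:

```python
import pandas as pd

# Illustrative dataframe with one missing value in each column.
df = pd.DataFrame(
    {
        "UF": ["SP", None, "BA", "MG"],
        "MUNICIPIO": ["São Paulo", "Rio de Janeiro", None, "Belo Horizonte"],
    }
)

# Count missing values per column before deciding whether to drop them.
null_counts = df.isna().sum()
print(null_counts)

# Drop rows missing either field, since both are needed to build a query.
df = df.dropna(subset=["UF", "MUNICIPIO"]).reset_index(drop=True)
print(df.shape)
```

Resetting the index after dropping rows keeps the positions contiguous, which matters later if other data is attached to the dataframe by position.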


   There was only one null value in each column, and both were removed. Now, let’s set up the API integration.


API integration


   Before starting, it’s important to make sure that the googlemaps library, which will be responsible for the integration, is installed. To do that, I can run the following snippet in a notebook cell:


%pip install -Uqq googlemaps
Note: you may need to restart the kernel to use updated packages.


   After configuring your cloud environment at Google, grab your API key. I recommend storing it in a .json file instead of inserting it directly into your code. Another option is to use environment variables on your machine. For this example, I stored it at /credentials/api_keys.json in the following format:


JSON format for storing the API key
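
As a rough sketch, the file could hold a single field, for example `{"api_key": "YOUR_KEY"}`. The field name `api_key` and the environment variable name `GOOGLE_MAPS_API_KEY` used here are assumptions made for illustration, not necessarily the names used in the original file.

```python
import json
import os


def load_api_key(path="/credentials/api_keys.json", field="api_key"):
    """Read the key from a JSON file shaped like {"api_key": "..."}.

    The field name "api_key" is an assumption made for this sketch.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)[field]


def load_api_key_from_env(variable="GOOGLE_MAPS_API_KEY"):
    """Alternative: read the key from an environment variable.

    The variable name is hypothetical; use whatever name you exported.
    """
    return os.environ[variable]
```

Either way, the key stays out of the source code and out of version control.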


   With that in place, we can make our first request. Initially, I’ll send a single, statically defined query for ‘São Paulo, SP’ and store the results in the latitude and longitude variables.


Sample request
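
A minimal sketch of such a request, assuming the key file is shaped like `{"api_key": "..."}` (the field name is an assumption). The helper names `geocode_query` and `main` are made up for this example; `googlemaps.Client` and its `geocode` method come from the googlemaps library.

```python
import json


def geocode_query(client, query):
    """Return (latitude, longitude) for the first match of `query`.

    `client` is expected to be a `googlemaps.Client`. Its `geocode` method
    returns a list of results, each holding a geometry/location entry.
    """
    result = client.geocode(query)
    location = result[0]["geometry"]["location"]
    return location["lat"], location["lng"]


def main():
    import googlemaps  # third-party: pip install googlemaps

    # Assumed file layout: {"api_key": "..."}
    with open("/credentials/api_keys.json", encoding="utf-8") as handle:
        api_key = json.load(handle)["api_key"]

    gmaps = googlemaps.Client(key=api_key)
    latitude, longitude = geocode_query(gmaps, "São Paulo, SP")
    print(latitude, longitude)
```

Calling `main()` performs the actual request, so it needs a valid key and network access.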


   In the first lines, after importing the googlemaps library, I read the file where the key is stored and authenticate through the client. I also define the query and wait for the response, storing it in a variable called result. To get the actual coordinates, I keep only the latitude and longitude attributes from the response.

   Now that the API has been tested and integrated, which is the core of our task, we can move on to collecting coordinates for all the entries in the dataset.


Coordinates collection


   The idea now is to go through each row of the dataset and use its city and state to build the query. To do that, I defined two empty lists to store the latitude and longitude from the responses and put the same logic as before in a loop, which is responsible for getting the city and state of each row, creating the query, making the request and returning the data we need.


Running through every entry and storing the results
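
The loop might be sketched along these lines. The function name `collect_coordinates` is hypothetical, and `client` is assumed to be an authenticated `googlemaps.Client`.

```python
def collect_coordinates(client, rows):
    """Geocode each (city, state) pair; failed lookups are stored as None."""
    latitudes, longitudes = [], []
    for city, state in rows:
        query = f"{city}, {state}"
        try:
            location = client.geocode(query)[0]["geometry"]["location"]
            lat, lng = location["lat"], location["lng"]
        except Exception:
            # An empty result list raises IndexError; API or network
            # problems raise other exceptions. Either way, record a null.
            lat, lng = None, None
        # Append exactly once per row so both lists stay aligned with the data.
        latitudes.append(lat)
        longitudes.append(lng)
    return latitudes, longitudes
```

With the dataframe from before, it could be called as `collect_coordinates(gmaps, zip(df["MUNICIPIO"], df["UF"]))`, where `gmaps` is the client. Keep in mind that this makes one API request per row.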


Note that I added a try/except so that, if a request for a specific row doesn’t find the coordinates, the lists are filled with a null value and the run doesn’t stop. Otherwise, the code would crash.


Data export


   In the end, we’ll have the coordinates stored in the two lists we defined earlier. To keep this data from being lost and to be able to reuse it elsewhere later, I’ll export the results to a .json file.


Saving the results
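
A short sketch of the export step using the standard json module. The function name and the default file name `coordinates.json` are placeholders chosen for this example.

```python
import json


def save_coordinates(latitudes, longitudes, path="coordinates.json"):
    """Write the two lists to a JSON file and return the saved dictionary."""
    data = {"latitude": latitudes, "longitude": longitudes}
    with open(path, "w", encoding="utf-8") as handle:
        # None values are written as JSON null.
        json.dump(data, handle)
    return data
```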


   So, a dictionary containing the two lists was defined and saved. We can now move on to handling our data to finish the task.


Data handling


   Now that we have a JSON file with the coordinates, we can import it as a dataframe and concatenate it with our main one:


Joining the results to the main dataframe
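
A possible sketch of the join, assuming the JSON file holds a `latitude` list and a `longitude` list in the same row order as the main dataframe. The function name `join_coordinates` is made up for this example.

```python
import json

import pandas as pd


def join_coordinates(df, path):
    """Attach the saved latitude/longitude columns to `df` by position."""
    with open(path, encoding="utf-8") as handle:
        coords = pd.DataFrame(json.load(handle))
    # Reset the index first: rows dropped earlier leave gaps in the index,
    # and pd.concat with axis=1 aligns on index labels, not on position.
    return pd.concat([df.reset_index(drop=True), coords], axis=1)
```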


   Now, let’s check whether there are any null values and remove them if so. These values appear when a request didn’t return any coordinates, fell into the exception and was filled with None, as we saw in the Coordinates collection section.


Handling null values from the API
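
One way to check how much data would be lost before removing it, sketched with a hypothetical helper that assumes the columns are named `latitude` and `longitude`:

```python
import pandas as pd


def drop_missing_coordinates(df):
    """Report and remove rows where the API returned no coordinates."""
    missing = df[["latitude", "longitude"]].isna().any(axis=1)
    # The mean of a boolean mask is the share of rows that are missing.
    print(f"{missing.sum()} rows without coordinates ({missing.mean():.2%} of the data)")
    return df.loc[~missing].reset_index(drop=True)
```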


   As we can see, there were 192 cases where the request didn’t return any result. They were removed, since that is a small proportion (about 0.2%) of the dataframe.


Conclusion


   In the end, we were able to complete the task, and we now have two new columns: latitude and longitude. We can use them for further geographic analyses and for plotting the coordinates on maps.


Final dataframe with results


References


   Python: A programming language that lets you work quickly and integrate systems effectively.

   Pandas: A fast, powerful, flexible and easy-to-use open source tool for data analysis and data wrangling, built on top of Python.

   NumPy: A Python library that offers a multidimensional array object, various derived objects and a variety of routines for array operations.

   Google Maps Platform: A Cloud Computing platform by Google that offers mapping services, including geolocation APIs.


   Thanks for reading, I’ll see you in the next one ⭐