Use Python to automate data collection from google sheets

Bhushan Ikhar
4 min readAug 16, 2020

How many times we use google sheets for collaboration, and how many times we need to follow up with the person from whom we are collecting data. The conversation goes like this

Corporate Employee, to field person: Hey when can I expect data from you? Can you please enter that in the google sheet I shared asap?

Field person: But I have already entered it, can you please check?

In short, it is a manual task that can be automated

What we will accomplish? at the end of this article, you will be able to write a python code that will connect to the target google sheet and ingest that data to the panda's data frame. Once it is in the data frame you can write that to SQL table or can update the local excel file.

Prerequisite: Anaconda python framework, Google API key, some batch file programming knowledge

Here we go

Step 1: We need to enable google sheet API. from this API key local python file will be able to connect to google sheet using your account credentials.

step by step instruction is available on the below link. This is a one-time task and the same key will be used in other codes.

https://developers.google.com/sheets/api/quickstart/python

After completing the setup you should get a file ‘credentials.json’. Save this file to your working directory. Not that you need to run pip install on Anaconda command prompt and not Python default editor (i.e Idle). Anaconda command prompt is explained in my previous articles.

Step 2: Create a google sheet. I have created a sheet named ‘Sales’.

Assume that multiple persons will be filling this sheet simultaneously. Our code will simply run every hour and pull data from this file and write it to an excel file. We need google sheet id (which is a unique identification of a sheet, our code will recognize the target sheet from this identifier ). To get the sheet id click on the share button and click on the ‘Copy Link’ button

You should get a link like below. Bold part is the sheet id, note it down somewhere

https://docs.google.com/spreadsheets/d/ 16YPX8xMWlOuKJ2n8eH6ZzqZk12SNOF-Bw4/edit?usp=sharing

Step 3: Let the coding begin! Import all the libraries and declare your sheet id and other variables

import pandas as pd from googleapiclient.discovery import build from google_auth_oauthlib.flow import InstalledAppFlow from google.auth.transport.requests import Request import os import pickle SCOPES = ['https://www.googleapis.com/auth/spreadsheets'] # here enter the id of your google sheet SAMPLE_SPREADSHEET_ID_input = '1sZwN6hr-16YPX8xMWlOuKJ2n8eH6ZzqZk12SNOF-Bw4' SAMPLE_RANGE_NAME = 'A1:AA1000'

Now we need to define a function that will contain the authentication mechanism and behavior of the code. This is a standard function and nothing much should change. The reason being all the variables are already declared above. Also, notice the ‘credentials.json’ file is being used here which we got in Step 1.

def main(): global values_input, service creds = None if os.path.exists('token.pickle'): with open('token.pickle', 'rb') as token: creds = pickle.load(token) if not creds or not creds.valid: if creds and creds.expired and creds.refresh_token: creds.refresh(Request()) else: flow = InstalledAppFlow.from_client_secrets_file( 'credentials.json', SCOPES) # here enter the name of your downloaded JSON file creds = flow.run_local_server(port=8080) with open('token.pickle', 'wb') as token: pickle.dump(creds, token) service = build('sheets', 'v4', credentials=creds) # Call the Sheets API sheet = service.spreadsheets() result_input = sheet.values().get(spreadsheetId=SAMPLE_SPREADSHEET_ID_input, range=SAMPLE_RANGE_NAME).execute() values_input = result_input.get('values', [])

The final part is to run the function and saving the output in a data frame and consequently in an excel file

#call the function main() #save the output in dataframe. Append or update method can be defined here df=pd.DataFrame(values_input[1:], columns=values_input[0]) #write the output to an excel file. Use any path and file name df.to_csv("C:\\Users\\Desktop\\gsheet.csv")

If you run this code from within the Spyder, you should get the output excel file on the location you specified in the block above. But we want to automate this task. This is where batch programming comes into the picture.

Step 4: Here a bit of trial and error is needed since this is unknown territory for me too! But not to worry we will get through this and one more thing Mac users, you need to come up with your own mechanism!

So open notepad and type below code. Code syntax is

cmd cd C:\Users\Lenovo cmd /c C:\Users\Lenovo\Anaconda3\condabin\conda.bat run "C:\Users\Lenovo\Anaconda3\python.exe" "C:\Users\Lenovo\Documents\Python Scripts\gsheet.py" pause

save this file somewhere with .bat file extension.

Step 5: Now we need to execute this batch file on a specified interval. We will use Windows Task Scheduler (Finally I am using this utility).

Step by step process is explained in details here

https://datatofish.com/python-script-windows-scheduler/

I know this is a long article but just imagine what you achieved here. The same code can be reused in all the projects. But I guess python and google can make this easier, till then we have to follow all these steps. Let me know where you are planning to use this

Originally published at https://instinctb.com on August 16, 2020.

--

--