How to load data from Google Drive to Pandas running in Google Colaboratory

I like Google Colaboratory for multiple reasons.

First of all, the code runs on someone else’s machine, so I can do something else on my laptop while the code is running, and the laptop does not overheat ;)

The second reason is, of course, effortless code sharing. Just click the share button, copy the link, and send it to someone else.

There is only one little problem: loading data into Colaboratory. Fortunately, you can store your dataset in Google Drive and import it quite easily.


Most of the setup is described in the predefined code snippet that lists files in Google Drive. We can copy-paste this part:

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import os
import pandas as pd

# Ask the current Colab user to authorize access, then create a PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

What does it do? It imports the libraries that give us access to Google Drive and lets the Google Cloud SDK access the Google Drive of the currently logged-in user. As a result, you can access your files from Python code running in Colaboratory.

Google Drive id

Unfortunately, I could not find a way to open a file using its full path, as we usually do. If I store a file called test.csv in the directory data/test_dataset, I cannot use the path /data/test_dataset/test.csv to access it.
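If you need path-style access anyway, one workaround is to walk the path one folder at a time and look up each component by title. The sketch below is not part of PyDrive: `resolve_path` and the `list_children` callable are hypothetical names, and the example feeds it an in-memory folder tree instead of a real Drive. Using `'root'` as the starting folder id is an assumption based on the Drive query syntax.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical signature: given a folder id, return its children as dict-like
# items exposing 'title' and 'id' (PyDrive file objects support this access).
ListChildren = Callable[[str], Iterable[Mapping[str, str]]]


def resolve_path(path: str, list_children: ListChildren, root_id: str = "root") -> str:
    """Follow a slash-separated path from root_id and return the last item's id."""
    current_id = root_id
    for part in (p for p in path.split("/") if p):
        matches = [item["id"] for item in list_children(current_id) if item["title"] == part]
        if not matches:
            raise FileNotFoundError(f"{part!r} not found in folder {current_id!r}")
        if len(matches) > 1:
            # Drive allows several items with the same title in one folder.
            raise ValueError(f"{part!r} is ambiguous in folder {current_id!r}")
        current_id = matches[0]
    return current_id


# Usage with an in-memory tree standing in for Drive:
fake_tree = {
    "root": [{"title": "data", "id": "dir-1"}],
    "dir-1": [{"title": "test_dataset", "id": "dir-2"}],
    "dir-2": [{"title": "test.csv", "id": "file-1"}],
}
file_id = resolve_path("/data/test_dataset/test.csv", lambda fid: fake_tree.get(fid, []))

# In Colab, list_children might wrap PyDrive (untested assumption):
# list_children = lambda folder_id: drive.ListFile(
#     {'q': f"'{folder_id}' in parents and trashed = false"}).GetList()
```

Each path component costs one listing call, so this is slower than using an id you copied once.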

Google Drive uses file and directory ids to identify locations. Hence, to find the id of the directory, I have to open data/test_dataset in my browser and copy the identifier from the URL.
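Copying the id by hand works, but the step can also be scripted. The helper below is a hypothetical sketch that pulls the id out of the URL forms Drive commonly shows (`.../folders/<id>`, `.../file/d/<id>/...`, or `...?id=<id>`); those URL shapes are an assumption, not a documented contract.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the id segment after /folders/ or /d/ in a Drive URL path.
_ID_IN_PATH = re.compile(r"/(?:folders|d)/([A-Za-z0-9_-]+)")


def drive_id_from_url(url: str) -> str:
    """Return the file or folder id embedded in a Google Drive URL."""
    parsed = urlparse(url)
    match = _ID_IN_PATH.search(parsed.path)
    if match:
        return match.group(1)
    # Some share links carry the id as a query parameter instead.
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no Drive id found in {url!r}")


folder_id = drive_id_from_url(
    "https://drive.google.com/drive/folders/1ANnCDVS281y486EVBqm_MDadxjkelxZM?usp=sharing"
)
```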

As far as I know, it is not so easy to find the identifier of a file. To find it, we have to list the files in the directory:

# Replace the folder id with the id of your directory.
listed = drive.ListFile({'q': "title contains 'test.csv' and '1ANnCDVS281y486EVBqm_MDadxjkelxZM' in parents"}).GetList()
for file in listed:
  print('title {}, id {}'.format(file['title'], file['id']))

The code prints the titles and identifiers of the files in that directory whose title contains test.csv. Copy the identifier of the file you want to open; you are going to need it.
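Two small helpers can make this step less error-prone. `build_query` (hypothetical) produces the same kind of query as above from a title and a folder id, escaping single quotes with a backslash as the Drive query syntax expects and, as an addition, skipping trashed files. `pick_file_id` (also hypothetical) keeps only exact title matches, because Drive's `contains` does prefix matching on titles and would also match a name such as `test.csv.bak`.

```python
from typing import Iterable, Mapping


def build_query(title: str, parent_id: str) -> str:
    """Build a Drive search query for files in parent_id whose title contains title."""
    # Single quotes inside a quoted Drive query value are escaped with a backslash.
    escaped = title.replace("'", "\\'")
    return f"title contains '{escaped}' and '{parent_id}' in parents and trashed = false"


def pick_file_id(files: Iterable[Mapping[str, str]], title: str) -> str:
    """Return the id of the one listed file whose title equals title exactly."""
    ids = [f["id"] for f in files if f["title"] == title]
    if len(ids) != 1:
        raise LookupError(f"expected exactly one {title!r}, found {len(ids)}")
    return ids[0]


# In Colab (untested assumption), where folder_id is your directory's id:
# listed = drive.ListFile({'q': build_query('test.csv', folder_id)}).GetList()
# file_id = pick_file_id(listed, 'test.csv')
listed_example = [{"title": "test.csv", "id": "abc"}, {"title": "test.csv.bak", "id": "def"}]
chosen_id = pick_file_id(listed_example, "test.csv")
```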

Now you have everything you need to load data from Google Drive into Pandas.

Copy data from Google Drive to Colaboratory

First of all, let’s create a local directory to store a copy of the file:

download_path = os.path.expanduser('~/data')
os.makedirs(download_path)

There is one little problem with this code. If you rerun the notebook cell that contains it, the code fails because the directory already exists. If you want to ignore that error, the code should look like this:

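download_path = os.path.expanduser('~/data')
try:
  os.makedirs(download_path)
except FileExistsError:
  # The directory is already there from a previous run; nothing to do.
  pass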
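Since Python 3.2, `os.makedirs` also accepts `exist_ok=True`, which makes the call safe to rerun without a try/except. A minimal sketch, using a temporary directory instead of `~/data` so it can run anywhere:

```python
import os
import tempfile

# A temporary base directory stands in for the home directory used above.
download_path = os.path.join(tempfile.mkdtemp(), "data")

# exist_ok=True ignores an existing directory, so rerunning the cell is harmless.
os.makedirs(download_path, exist_ok=True)
os.makedirs(download_path, exist_ok=True)  # second call does not raise
```

It still raises `FileExistsError` if the path exists but is a regular file, which is usually the error you want to see.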

Now we have the file id and the output directory. We can copy the file from Google Drive:

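output_file = os.path.join(download_path, 'test.csv')
# Replace 'the_file_id' with the identifier you copied earlier.
temp_file = drive.CreateFile({'id': 'the_file_id'})
# Download the file's content to the local path.
temp_file.GetContentFile(output_file)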
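Rerunning the notebook also downloads the file again. If the file in Drive rarely changes, a small existence check avoids that. `download_if_missing` is a hypothetical helper; in Colab you could pass it the `GetContentFile` method of a PyDrive file object, as in the commented line (an untested assumption). Locally, a fake download function stands in for Drive.

```python
import os
import tempfile
from typing import Callable


def download_if_missing(output_file: str, download: Callable[[str], None]) -> bool:
    """Call download(output_file) only if no local copy exists.

    Returns True when a download happened, False when the existing copy was reused.
    """
    if os.path.exists(output_file):
        return False
    download(output_file)
    return True


# In Colab (untested assumption):
# download_if_missing(output_file, drive.CreateFile({'id': 'the_file_id'}).GetContentFile)

# Local demo with a fake download function that writes a tiny CSV.
downloads = []


def fake_download(path: str) -> None:
    downloads.append(path)
    with open(path, "w") as fh:
        fh.write("a,b\n1,2\n")


demo_file = os.path.join(tempfile.mkdtemp(), "test.csv")
first = download_if_missing(demo_file, fake_download)   # downloads
second = download_if_missing(demo_file, fake_download)  # reuses the copy
```

The trade-off is staleness: a cached copy does not pick up later edits made in Drive, so delete it when the source changes.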

Load the file in Pandas

Now comes the part that looks familiar. Just load the file into a Pandas DataFrame:

data = pd.read_csv(output_file)
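If you do not need a local copy at all, PyDrive's `GetContentString()` returns the file content as a string, and `pd.read_csv` can read it through `io.StringIO`. In the sketch below a literal string stands in for that content so it runs outside Colab. Because the whole file is held in memory as a string before parsing, writing to disk as above is probably the better choice for large files.

```python
import io

import pandas as pd

# Stand-in for: drive.CreateFile({'id': 'the_file_id'}).GetContentString()
csv_text = "id,value\n1,10\n2,20\n"

# read_csv accepts any file-like object, so an in-memory buffer works too.
data = pd.read_csv(io.StringIO(csv_text))
```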

Bartosz Mikulski
Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group
