
I have a URL to a public Google Doc which is published (it says "Published using Google Docs" at the top). It has a URL of the form https://docs.google.com/document/d/e/<Some long random string, I think the ID of the document>/pub

Please note that this is not a spreadsheet (Google sheet), but a doc. This doc contains some explanatory text at the beginning and then a table I need to read. How do I accomplish this using Python and only the URL? I don't have much knowledge of Google APIs, etc. I don't want the text at the beginning, but only the table data in some popular format like a Pandas dataframe, etc. The table data could also contain Unicode characters.

I tried following the steps in the Docs API quickstart guide (https://developers.google.com/docs/api/quickstart/python). After I followed the instructions, the given code (copy-pasted as is) worked, although it involved creating a new Google Cloud project, enabling the API, configuring the OAuth consent screen, and authorizing credentials for a desktop application. However, when I replaced the example document ID (the string inside the quotes in the line below)

DOCUMENT_ID = "195j9eDD3ccgjQRttHhJPymLJUCOUjs-jmwTrekvdjFE")

with the ID of the document I need to access, I got this error:

<HttpError 404 when requesting https://docs.googleapis.com/v1/documents/<MY_GIVEN_DOCUMENT_ID>?alt=json returned "Requested entity was not found.". Details: "Requested entity was not found.">

I just want a simple solution which uses only the published doc's URL, since the doc is already public, and which doesn't require any authentication steps. I also need that if I send the code to someone else, they can run the same code and get the same results without any authentication issues. Please help me with this.

  • Please edit your question and include your code. (Commented Aug 24, 2024 at 16:09)

1 Answer


I was faced with this same exact problem. I'm going to guess you and I were probably doing the same application challenge!

Using requests, I was able to pull down the raw HTML of the published page, and then, using BeautifulSoup, I turned it into a workable, parseable object:

import requests
from bs4 import BeautifulSoup

# URL of the published doc (placeholder; substitute your own /pub link)
url = "https://docs.google.com/document/d/e/<YOUR_DOCUMENT_ID>/pub"

# Make the request
html_response = requests.get(url=url)

# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(html_response.text, 'html.parser')

# Collect the first table (assuming the first table is the one you want)
table = soup.find('table')

From there, you can parse the table more precisely to pull out the data you want, for example by iterating over the table's rows and cells and loading the result into a pandas DataFrame (see the sketch below).
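As a rough illustration (not the answerer's exact solution), and assuming the first row of the table is a header row, one way to turn that <table> into a pandas DataFrame might look like this:

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Published doc URL (placeholder; substitute your own /pub link)
url = "https://docs.google.com/document/d/e/<YOUR_DOCUMENT_ID>/pub"

soup = BeautifulSoup(requests.get(url).text, 'html.parser')
table = soup.find('table')

# Extract the text of every cell, row by row; get_text() returns str, so Unicode characters are preserved
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])]
    for tr in table.find_all('tr')
]

# Treat the first row as the header and the rest as data
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.head())

If you have lxml or html5lib installed, pandas.read_html can also pull HTML tables from a page straight into DataFrames, though it gives you less control over which elements you keep.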

I'm refraining from copy-pasting my exact solution because I know others will use this to fill out the same job application challenge, but this gets you everything you need as long as you have a Python foundation.


5 Comments

The same question was also asked here: stackoverflow.com/questions/78832288/… and someone else had a solution you could explore too.
When you were working on solving this problem, did you find that the input data / table was wrong and/or missing some characters? Because unless I've had a stroke, I'm pretty darn sure mine is. : (
@ScottFraley no, actually, when I ran the script with the problem data I got a working answer. But that doesn't mean yours isn't messed up!
The "test" data was definitely borked, but when I ran my script against the "final/actual Url," it worked great! :D
Nice! Glad to hear it.
