
How to Read and Write Data from/to the Quip Spreadsheet using Quip Python APIs

As analysts, we are often asked to build solutions that let end users provide inputs, which can then be used as overrides or additional context in the final analytical solution.


Let's take the example of an e-commerce shopping app. The system has logic that blacklists a supplier once they receive 100 negative customer ratings. However, there may be scenarios where the negative ratings were caused by the in-app experience or the delivery/return experience rather than by the supplier. So, to maintain fairness, the supplier is given the option to contest the blacklisting once every six months. For the sake of this article, let's assume the approval/rejection is recorded in an offline spreadsheet.

All new appeals to retract blacklistings created in a week are exported to a spreadsheet and sent to the team for review. The team reviews the data, approves or rejects each appeal, and then sends the data back to be updated in the system. This is a weekly exercise.

Now, this manual intervention data needs to be added back to the system. There are various ways to do it. Individual users could upload their data to an S3 bucket, which could then be read into the database on a schedule. Or we can use Quip, so that everyone updates the same spreadsheet in real time and the data is loaded into the database on a fixed cadence.

Quip is collaborative software that lets multiple people edit documents and spreadsheets on the go, from whichever client they prefer (desktop or mobile).

In this article, I am going to show you how I automated a Quip spreadsheet to read the user-entered data, upload it to a database table, and then write new data back to the same spreadsheet. I am going to use Redshift as the database for this exercise.

This task can be split into two parts. First, read the data from the Quip spreadsheet and store it in a table in the database. Second, perform some data manipulation or checks on it, join it with pre-existing data in the database, and write the result to an already existing Quip spreadsheet. We will look at these two cases individually, so if you only want to read or only want to write, this article will still help you. Let's look at the first part.

Part 1 — Reading the data from Quip Spreadsheet and writing to a table in the database.

Step 1: Getting an access token to connect to Quip using the Quip APIs.

We need to generate an access token that provides API access to our personal Quip account. To generate a personal access token, visit this page: https://quip.com/dev/token. If you have a corporate SSO-enabled Quip account, the URL will differ slightly, for example https://quip-corporate.com/dev/token

Once you click the Get Personal Access Token button on that page, you will get a token, which we will use in later sections to access the Quip spreadsheet via the APIs.
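Rather than hard-coding the token in your script, you can keep it in an environment variable. A minimal sketch, assuming the variable name QUIP_ACCESS_TOKEN (the name is just an example):

import os

# Read the Quip access token from an environment variable instead of hard-coding it
access_token = os.environ.get("QUIP_ACCESS_TOKEN")
if access_token is None:
    raise RuntimeError("Set the QUIP_ACCESS_TOKEN environment variable before running this script.")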

Step 2: Import the libraries

Let's import the required libraries first. For this part, we are primarily going to be using the quipclient and pandas_redshift libraries.

import quipclient as quip
import pandas as pd
import numpy as np
import pandas_redshift as pr
import boto3
from datetime import datetime as dt
import psycopg2
import warnings
warnings.filterwarnings('ignore')
import socket

Step 3: Connecting to Quip using the access token

The QuipClient API needs the base URL, thread ID, and access token to access any file. The base URL is the URL of the Quip server you are trying to read from (or write to). For corporate accounts, this will generally include the corporate name in the URL. The thread ID is the unique identifier of a file on the Quip server. It is the alphanumeric value after the base URL of the target file, in this case a spreadsheet.

If the URL for a file looks like — https://platform.quip-XXXXXXXXX.com/abcdefgh1234/, then the base URL will be — https://platform.quip-XXXXXXXXX.com and the thread_id will be — abcdefgh1234.
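To avoid copy-paste mistakes, you can extract both values programmatically. A minimal sketch using the standard library (split_quip_url is just an illustrative helper, not part of the quipclient library):

from urllib.parse import urlparse

def split_quip_url(file_url):
    # Split a Quip file URL into the base URL and the thread ID
    parsed = urlparse(file_url)
    base_url = parsed.scheme + "://" + parsed.netloc
    thread_id = parsed.path.strip("/").split("/")[0]
    return base_url, thread_id

# split_quip_url("https://platform.quip-XXXXXXXXX.com/abcdefgh1234/")
# returns ("https://platform.quip-XXXXXXXXX.com", "abcdefgh1234")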

The access token is the one we just generated in Step 1.

Now, using the QuipClient API, we connect to the URL using the access token and thread_id.

#####################################
# declaring Quip variables
#####################################
baseurl = 'https://platform.quip-XXXXXXXXX.com'
access_token = "************************************************************************"
thread_id = 'abcdefgh1234'

##########################################
# connecting to Quip
##########################################
client = quip.QuipClient(access_token = access_token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)

Step 4: Reading the data from Quip into a dataframe

The rawdictionary output from Step 3 is a dictionary that contains, among other things, the HTML of the document. The pandas function read_html parses every table in that HTML, so dfs is a list of dataframes, one per tab in the Quip spreadsheet. In this example, we only need the data from the last tab, hence the index -1 is used to fetch the last dataframe into raw_df.

##########################################
# cleaning the data and creating a dataframe
##########################################
dfs = pd.read_html(rawdictionary['html'])
raw_df = dfs[-1]
raw_df.columns = raw_df.iloc[0]             # Make the first row the column header
raw_df = raw_df.iloc[1:, 1:]                # Drop the now-duplicated first row and the index column
raw_df = raw_df.replace('\u200b', np.nan)   # Replace zero-width-space placeholders in empty cells with NaN
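If you are not sure which tab holds the data you need, you can quickly inspect what read_html returned before picking one:

# Print the index and shape of each tab's dataframe before choosing one
for i, tab_df in enumerate(dfs):
    print(i, tab_df.shape)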

Step 5: Connecting to the database and writing the data to a table

To access a Redshift instance, we need the Redshift endpoint URL, which looks something like this:

datawarehouse.some_chars_here.region_name.redshift.amazonaws.com.

We connect to the database and write the dataframe (created in Step 4) to a new or existing table. The pandas_to_redshift function allows you to append the data to an existing table or overwrite it altogether. Note that if you select append = False, the table will be dropped and re-created every time this operation runs. If you want to preserve the data types or character lengths of certain columns, or user permissions, while overwriting the data, it is better to truncate the table before running this operation. You can truncate by issuing a direct TRUNCATE command; SQLAlchemy and psycopg2 are both easy ways to do this. After truncating the table, run the operation with append = True. I generally use append = True for type 2 tables where I need to maintain historical data.
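If you prefer the SQLAlchemy route mentioned above, a minimal sketch of the truncate might look like this (the connection string and table name are placeholders; the psycopg2 version I actually used follows below):

from sqlalchemy import create_engine, text

# Placeholder connection string: postgresql+psycopg2://<user>:<password>@<host>:<port>/<dbname>
engine = create_engine('postgresql+psycopg2://user:password@test_host.com:1234/test_db')

# Run the TRUNCATE inside a transaction so it is committed automatically
with engine.begin() as conn:
    conn.execute(text('TRUNCATE TABLE test_table'))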

#### Truncating the Table ####

##########################################
# Connecting to DataBase
##########################################
user1 = "user"
pswd1 = "password"
connection_db = psycopg2.connect(dbname='test_db',
                                 host='test_host.com',
                                 port='1234',
                                 user=user1,
                                 password=pswd1)

##########################################
# Connection Established
##########################################

cur = connection_db.cursor()
cur.execute('TRUNCATE TABLE test_table')
connection_db.commit()  # Commit so the TRUNCATE takes effect (psycopg2 does not autocommit by default)

#### Writing to the table ####

##########################################
# connecting to redshift and s3
##########################################

pr.connect_to_redshift(dbname='db',
                       host='server.com',
                       port=1234,
                       user='user',
                       password='password')

pr.connect_to_s3(aws_access_key_id='*************',
                 aws_secret_access_key='*************************',
                 bucket='test',
                 subdirectory='subtest')

##########################################
# Write the DataFrame to S3 and then to redshift
##########################################
pr.pandas_to_redshift(data_frame=raw_df,
                      redshift_table_name='test_table',
                      append=True,
                      region='xxxxxxx')

This completes the first part, where you read the data from a Quip spreadsheet and write it to a Redshift table. Now, let's look at the second part.

Part 2: Writing data to an existing Quip spreadsheet.

For this part, the first three steps remain the same as in Part 1, so please follow Steps 1, 2, and 3 from above. We will start from Step 4 here.

Step 4: Connecting to the database to read data

We will use psycopg2 here to connect to the Redshift instance and read the data from the Redshift table that needs to be written to the Quip spreadsheet. I then convert the dataframe to markdown to get a clean table, since that is the format we will pass to the QuipClient edit_document call in the next step.

##########################################
# Connecting to DataBase
##########################################
user1 = "user"
pswd1 = "password"
connection_db = psycopg2.connect(dbname='test_db',
                                 host='test_host.com',
                                 port='1234',
                                 user=user1,
                                 password=pswd1)

##########################################
# Connection Established
##########################################

df = pd.read_sql_query("""select distinct *
                          from test_table
                          order by 1 asc
                          """, connection_db)

result = df.to_markdown(index=False)

Step 5: Writing the data to the Quip file

To write the data to the Quip spreadsheet, you can use the edit_document function from the QuipClient library. This function takes several parameters. The format can be either HTML or markdown; the default is HTML, and since we converted the dataframe to markdown in Step 4, we pass format='markdown'. You can also specify section_id and the operation to control where the data goes: append, prepend, or after/before a particular section. For this scenario, I simply wanted to append the data as a new tab in the existing spreadsheet (a sketch for targeting a specific section follows the example below). You can read more about it here.

Sometimes the edit succeeds on the Quip side, but the script still fails because the API response is delayed. The try/except block below catches such timeout errors.

##########################################
# Inserting the data to Quip
##########################################

try:
    client.edit_document(thread_id=thread_id,
                         content=result,
                         format='markdown',
                         operation=client.APPEND)
    print("Test DB is updated.")
except socket.timeout:
    print("Error: Quip API request timed out.")
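If you instead need to place the data relative to a specific section of the document, edit_document also accepts a section_id together with operations such as AFTER_SECTION or BEFORE_SECTION. A minimal sketch, assuming you already know the section ID you want to anchor on (the value below is a placeholder):

# Hypothetical section ID copied from the target document; replace with a real one
target_section_id = 'SECTIONID'

try:
    client.edit_document(thread_id=thread_id,
                         content=result,
                         format='markdown',
                         operation=client.AFTER_SECTION,
                         section_id=target_section_id)
    print("Data inserted after the target section.")
except socket.timeout:
    print("Error: Quip API request timed out.")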

And we are done!

I hope you find this article helpful. Please don’t hesitate to reach out in case you have any additional questions.

Thanks for reading!

Until next time…
