
I am querying an API that lets you request n items in a single API call. So I am breaking the list of items I am querying into "sublists" of n items each, passing each one to a function that returns the API data, and then concatenating the data into a DataFrame.

But when I loop through the "sublists", the final DataFrame contains only the last "sublist", rather than every "sublist". So instead of:

       netIncome sharesOutstanding
BRK.B         20                40
V             50                60
MSFT          30                10
ORCL          12                24
AMZN          33                55
GOOGL         66                88

I get:

       netIncome sharesOutstanding
AMZN          33                55
GOOGL         66                88

Here is the full code. Can someone tell me what I'm doing wrong?

import os
from iexfinance.stocks import Stock
import pandas as pd

# Set IEX Finance API Token (Public Sandbox Version)
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'XXXXXX'

def fetch_company_info(group):
    """Function to query API data"""
    batch = Stock(group, output_format='pandas')

    # Get income from last 4 quarters, sum it, and store to temp Dataframe
    df_income = batch.get_income_statement(period="quarter", last='4')
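    # sum(level=0) was removed in pandas 2.0; groupby(level=0) is the replacement
    df_income = df_income.T.groupby(level=0, sort=False).sum()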
    income_ttm = df_income.loc[:, ['netIncome']]

    # Get number of shares, and store to temp Dataframe
    df_shares = batch.get_key_stats(period="quarter")
    shares_outstanding = df_shares.loc['sharesOutstanding']

    return income_ttm, shares_outstanding

# Full list to query via API
tickers = ['BRK.B', 'V', 'MSFT', 'ORCL', 'AMZN', 'GOOGL']

# Chunk ticker list into n# of lists
n = 2
batch_tickers = [tickers[i * n:(i + 1) * n] for i in range((len(tickers) + n - 1) // n)]

# Loop through each chunk of tickers
for group in batch_tickers:
    company_info = fetch_company_info(group)
    output_df = pd.concat(company_info, axis=1, sort=True)

print(output_df)
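The symptom can be reproduced offline. In this minimal sketch, `SAMPLE` and `fake_fetch_company_info` are hypothetical stand-ins for the API: the helper returns the same (DataFrame, Series) pair as `fetch_company_info`, filled with the sample numbers from the expected output above.

```python
import pandas as pd

# Sample numbers copied from the expected output in the question; no API is called.
SAMPLE = {
    "BRK.B": (20, 40), "V": (50, 60), "MSFT": (30, 10),
    "ORCL": (12, 24), "AMZN": (33, 55), "GOOGL": (66, 88),
}


def fake_fetch_company_info(group):
    """Hypothetical stand-in for fetch_company_info: returns (DataFrame, Series)."""
    income_ttm = pd.DataFrame({"netIncome": [SAMPLE[t][0] for t in group]}, index=group)
    shares_outstanding = pd.Series(
        [SAMPLE[t][1] for t in group], index=group, name="sharesOutstanding"
    )
    return income_ttm, shares_outstanding


tickers = ["BRK.B", "V", "MSFT", "ORCL", "AMZN", "GOOGL"]
n = 2
# (len + n - 1) // n is ceiling division, so a short final chunk is kept too
batch_tickers = [tickers[i * n:(i + 1) * n] for i in range((len(tickers) + n - 1) // n)]
print(batch_tickers)  # [['BRK.B', 'V'], ['MSFT', 'ORCL'], ['AMZN', 'GOOGL']]

for group in batch_tickers:
    company_info = fake_fetch_company_info(group)
    # Rebinds output_df on every pass, discarding the previous chunk's result
    output_df = pd.concat(company_info, axis=1)

print(output_df)  # only the AMZN and GOOGL rows remain
```

Each iteration replaces `output_df` rather than adding to it, which matches the output described above.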
  • Are you sure you understand how concat works? If you concatenate company_info with nothing on every loop iteration, your dataframe will only contain the results from the last iteration. You should start with an empty DataFrame and then append to it. Commented Sep 13, 2019 at 14:17
  • I thought best practice was to avoid append because it's inefficient. Since the real ticker list is going to contain thousands of items, I'd like to make it as performant as possible. Commented Sep 13, 2019 at 14:23
  • You're right. So Lauzloo's solution should work! Commented Sep 13, 2019 at 14:25
    You return 2 dataframes and store them in one variable, company_info. How's that even possible? Commented Sep 13, 2019 at 14:50
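On the last comment's question: `return income_ttm, shares_outstanding` returns a single tuple, which is why one variable can hold both objects. A small sketch, using a hypothetical `two_frames` function in place of `fetch_company_info`:

```python
import pandas as pd


def two_frames():
    """Hypothetical function that, like fetch_company_info, returns two objects."""
    income = pd.DataFrame({"netIncome": [20, 50]}, index=["BRK.B", "V"])
    shares = pd.Series([40, 60], index=["BRK.B", "V"], name="sharesOutstanding")
    return income, shares  # Python packs the two values into one tuple


company_info = two_frames()
print(type(company_info))  # <class 'tuple'>

# The tuple can be unpacked into two names...
income_ttm, shares_outstanding = company_info

# ...or passed directly to pd.concat, which accepts a sequence of Series/DataFrames
combined = pd.concat(company_info, axis=1)
print(combined.columns.tolist())  # ['netIncome', 'sharesOutstanding']

# A list whose elements are tuples is rejected with a TypeError
raised = False
try:
    pd.concat([company_info, company_info])
except TypeError:
    raised = True
print(raised)  # True
```

That last case is the same TypeError reported in the comments on the answers.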

3 Answers


You need to do another pd.concat. The first one concatenates the income_ttm and shares_outstanding columns, but you then need to use pd.concat in the row direction to add new rows to output_df.

First create output_df, whose first rows are the first sublist. Then concat each new sublist to output_df. Also, it should be axis=0 instead of axis=1, because you want to concatenate in the row direction, not the column direction.

Try something like this at the end of your code:

# Loop through each chunk of tickers
for i, group in enumerate(batch_tickers):
    company_info = fetch_company_info(group)
    # concat income and shares outstanding side by side (column direction)
    company_df = pd.concat(company_info, axis=1, sort=True)
    # first chunk: initialise output_df with company_df
    if i == 0:
        output_df = company_df
    # later chunks: stack company_df under output_df (row direction)
    else:
        output_df = pd.concat([output_df, company_df], axis=0)
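A quick offline check of this loop, assuming the API is replaced by `fake_fetch_company_info`, a hypothetical helper that returns the same (DataFrame, Series) pair built from the question's sample numbers:

```python
import pandas as pd

# Hypothetical offline data; the numbers are the question's sample output
SAMPLE = {
    "BRK.B": (20, 40), "V": (50, 60), "MSFT": (30, 10),
    "ORCL": (12, 24), "AMZN": (33, 55), "GOOGL": (66, 88),
}


def fake_fetch_company_info(group):
    """Hypothetical stand-in for fetch_company_info: returns (DataFrame, Series)."""
    income_ttm = pd.DataFrame({"netIncome": [SAMPLE[t][0] for t in group]}, index=group)
    shares_outstanding = pd.Series(
        [SAMPLE[t][1] for t in group], index=group, name="sharesOutstanding"
    )
    return income_ttm, shares_outstanding


tickers = ["BRK.B", "V", "MSFT", "ORCL", "AMZN", "GOOGL"]
n = 2
batch_tickers = [tickers[i * n:(i + 1) * n] for i in range((len(tickers) + n - 1) // n)]

for i, group in enumerate(batch_tickers):
    # Join income and shares side by side (column direction)
    company_df = pd.concat(fake_fetch_company_info(group), axis=1)
    if i == 0:
        output_df = company_df  # the first chunk starts the result
    else:
        # Later chunks are stacked underneath (row direction)
        output_df = pd.concat([output_df, company_df], axis=0)

print(output_df)  # all six tickers
```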

2 Comments

I get TypeError: cannot concatenate object of type "<class 'tuple'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Try again; I just edited the post. It should be pd.concat([output_df, company_df], axis=0), not pd.concat([output_df, company_info], axis=0).

Try a list comprehension first and concatenate afterward:

company_info = [fetch_company_info(group) for group in batch_tickers]

output_df = pd.concat(company_info, axis=1, sort=True)
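As the comments below point out, `company_info` here is a list of tuples, and `pd.concat` accepts only Series and DataFrame objects. A sketch of the same collect-then-concatenate idea that works around this, assuming a hypothetical offline `fake_fetch_company_info` in place of the API: first join each (income, shares) pair along `axis=1`, then stack all chunk frames with one `pd.concat` along `axis=0`.

```python
import pandas as pd

# Hypothetical offline data; the numbers are the question's sample output
SAMPLE = {
    "BRK.B": (20, 40), "V": (50, 60), "MSFT": (30, 10),
    "ORCL": (12, 24), "AMZN": (33, 55), "GOOGL": (66, 88),
}


def fake_fetch_company_info(group):
    """Hypothetical stand-in for fetch_company_info: returns (DataFrame, Series)."""
    income_ttm = pd.DataFrame({"netIncome": [SAMPLE[t][0] for t in group]}, index=group)
    shares_outstanding = pd.Series(
        [SAMPLE[t][1] for t in group], index=group, name="sharesOutstanding"
    )
    return income_ttm, shares_outstanding


tickers = ["BRK.B", "V", "MSFT", "ORCL", "AMZN", "GOOGL"]
n = 2
batch_tickers = [tickers[i * n:(i + 1) * n] for i in range((len(tickers) + n - 1) // n)]

# Step 1: one DataFrame per chunk, income and shares joined along the columns
chunk_frames = [pd.concat(fake_fetch_company_info(group), axis=1) for group in batch_tickers]

# Step 2: a single concat stacks every chunk along the rows
output_df = pd.concat(chunk_frames, axis=0)
print(output_df)
```

Building the list first and concatenating once avoids copying a growing result on every iteration, which is the performance concern raised in the question's comments.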

3 Comments

I get TypeError: cannot concatenate object of type "<class 'tuple'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
What does the fetch_company_info() function return?
I just added the full code which might be easier. Note that you can use the token as it's meant to be public, and only returns sandbox data.
def fetch_company_info(group):
    """Function to query API data"""
    batch = Stock(group, output_format='pandas')

    # Get income from last 4 quarters, sum it, and store to temp Dataframe
    df_income = batch.get_income_statement(period="quarter", last='4')
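    # sum(level=0) was removed in pandas 2.0; groupby(level=0) is the replacement
    df_income = df_income.T.groupby(level=0, sort=False).sum()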
    income_ttm = df_income.loc[:, ['netIncome']]

    # Get number of shares, and store to temp Dataframe
    df_shares = batch.get_key_stats(period="quarter")
    shares_outstanding = df_shares.loc['sharesOutstanding']

    df = pd.concat([income_ttm, shares_outstanding], ignore_index=True, axis=1)

    return df

.......

from functools import reduce  # reduce is not a builtin in Python 3

# Loop through each chunk of tickers
dataframes = []
for group in batch_tickers:
    company_info = fetch_company_info(group)
    dataframes.append(company_info)

df = reduce(lambda top, bottom: pd.concat([top, bottom], sort=False), dataframes)
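The `append` in this answer is Python's `list.append`, which is cheap; it is not the `DataFrame.append` method (deprecated in pandas 1.4 and removed in 2.0). The `reduce` step, however, concatenates pairwise, so each step builds a new frame holding all rows gathered so far. A minimal sketch, assuming per-chunk frames like the ones produced above (the sample `dataframes` list is hypothetical), showing that a single `pd.concat` call gives the same result:

```python
from functools import reduce

import pandas as pd

# Hypothetical per-chunk results; values are the question's sample numbers
dataframes = [
    pd.DataFrame({"netIncome": [20, 50], "sharesOutstanding": [40, 60]}, index=["BRK.B", "V"]),
    pd.DataFrame({"netIncome": [30, 12], "sharesOutstanding": [10, 24]}, index=["MSFT", "ORCL"]),
    pd.DataFrame({"netIncome": [33, 66], "sharesOutstanding": [55, 88]}, index=["AMZN", "GOOGL"]),
]

# Pairwise: every step builds a new frame containing all rows so far
stacked_reduce = reduce(lambda top, bottom: pd.concat([top, bottom], sort=False), dataframes)

# Single call: the result is assembled once from all pieces
stacked_once = pd.concat(dataframes, sort=False)

# Raises AssertionError if values, index, columns, or dtypes differ
pd.testing.assert_frame_equal(stacked_reduce, stacked_once)
print(stacked_once.index.tolist())  # ['BRK.B', 'V', 'MSFT', 'ORCL', 'AMZN', 'GOOGL']
```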

1 Comment

Is there a way to do it without append? I've been advised against using it with dataframes as it tends to be resource heavy.
