How to refactor code to fix deprecated list '.append' in 159 lines of Python code?

Question

We are preparing a quote for a client who wants to migrate data to HubSpot, and we are now working through the data modeling and database issues we need to plan for.

While planning the migration we RTM for HubSpot data, and a member of my team found this chunk of code, which helps in a few areas. However, (EDIT) list .append is used, which means we have to change how this is written, depending on how the data is structured.

Since the error said 'DataFrame' object has no attribute 'append', I assumed it was pandas. Sorry for the confusion; I thought all DataFrames were pandas DataFrames.

Given how large this chunk of code is, I have a few questions. The whole code is listed here: Hubspot Community Data

All questions are welcome, and hopefully this isn't a stupid question; I'm stumped on how to solve this.
1. How do you look at the whole 152 lines of code and deconstruct it into smaller chunks? Preferably each function does one or two things and no more, which is what I'm hoping to achieve.
2. How do I go about refactoring or adjusting the chunk of code below so the data dictionary works now that .append is no longer available? Since _append is unlikely to be the most efficient choice, I'm unsure how or where to start.

EDIT 1. OK, so I'm updating the shared code. The error "return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?" comes from line 90, which is in the first for loop of the generate_data function, after the data dictionary.

        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )

Hubspot Data Faker

# This code has been created by Michael Kupermann ([email protected] or [email protected])
# The purpose of this code is to generate dummy data that simulates a realistic dataset for HubSpot CRM.
# This data can then be used for demonstrations, testing, or other purposes that require a representative dataset.
# You need to amend the HubSpot Sales and Service Pipelines before you import the data.
#
# Required Packages:
# 1. Faker: This package is used to generate the fake data for our dataset.
# 2. Pandas: This package is used to handle the data in a tabular format and to write the data to an Excel file.
# 3. openpyxl: Pandas needs this package to write .xlsx files.
# 4. datetime: This standard-library module is used to generate realistic date data for the 'close_date' field.
#
# To install the necessary third-party packages, you can use pip, the Python package installer.
# Open your terminal (or command prompt on Windows), and enter the following commands:
# pip install faker
# pip install pandas
# pip install openpyxl
# (datetime ships with Python, so it does not need to be installed.)
#
# If you're using a Jupyter notebook, you can prefix these commands with an exclamation mark:
# !pip install faker
# !pip install pandas
# !pip install openpyxl

from faker import Faker
import pandas as pd
from datetime import datetime, timedelta


#  Function to generate data for a given country. Here 100 companies with 10 contacts, deals, tickets for each company
def generate_data(
    country,
    company_rows=100,
    contacts_per_company=10,
    deals_per_company=10,
    products_per_deal=10,
):
    # Set the locale for Faker based on the country
    if country == "Germany":
        faker = Faker("de_DE")
    elif country == "United States":
        faker = Faker("en_US")
    elif country == "France":
        faker = Faker("fr_FR")
    elif country == "Italy":
        faker = Faker("it_IT")
    elif country == "Japan":
        faker = Faker("ja_JP")
    elif country == "United Kingdom":
        faker = Faker("en_GB")
    elif country == "Canada":
        faker = Faker("en_CA")
    elif country == "Austria":
        faker = Faker("de_AT")
    elif country == "Switzerland":
        faker = Faker("de_CH")

    # Create a dictionary to hold the data
    data = {
        "company_name": [],
        "company_domain": [],
        "company_industry": [],
        "company_address": [],
        "company_country": [],
        "contact_firstname": [],
        "contact_lastname": [],
        "contact_email": [],
        "contact_phone": [],
        "contact_address": [],
        "contact_country": [],
        "contact_function": [],
        "contact_department": [],
        "deal_name": [],
        "deal_stage": [],
        "deal_amount": [],
        "deal_type": [],
        "deal_source": [],
        "close_date": [],
        "ticket_title": [],
        "ticket_status": [],
        "ticket_priority": [],
        "product_name": [],
        "product_price": [],
        "product_description": [],
        "product_sku": [],
        "product_quantity": [],
    }

    # Loop to generate data for each company
    for _ in range(company_rows):
        company_name = faker.company()
        company_domain = faker.domain_name()
        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )
        company_address = faker.address().replace("\n", ", ")
        company_country = country

        # Loop to generate data for each contact
        for _ in range(contacts_per_company):
            contact_firstname = faker.first_name()
            contact_lastname = faker.last_name()
            contact_email = faker.email()
            contact_phone = faker.phone_number()
            contact_address = faker.address().replace("\n", ", ")
            contact_country = country
            contact_function = faker.job()
            contact_department = faker.random_element(
                ["Sales", "Marketing", "Human Resources", "Engineering"]
            )

            # Append generated company and contact data to the lists in the dictionary
            data["company_name"].append(company_name)
            data["company_domain"].append(company_domain)
            data["company_industry"].append(company_industry)
            data["company_address"].append(company_address)
            data["company_country"].append(company_country)

            data["contact_firstname"].append(contact_firstname)
            data["contact_lastname"].append(contact_lastname)
            data["contact_email"].append(contact_email)
            data["contact_phone"].append(contact_phone)
            data["contact_address"].append(contact_address)
            data["contact_country"].append(contact_country)
            data["contact_function"].append(contact_function)
            data["contact_department"].append(contact_department)

            # Generate deal and product data
            data["deal_name"].append(f"Deal-{faker.uuid4()}")
            data["deal_stage"].append(
                faker.random_element(
                    [
                        "Appointment Scheduled",
                        "Qualified To Buy",
                        "Presentation Scheduled",
                        "Decision Maker Brought-In",
                    ]
                )
            )
            data["deal_amount"].append(faker.random_int(min=1000, max=50000))
            data["deal_type"].append(
                faker.random_element(["New Business", "Existing Business"])
            )
            data["deal_source"].append(
                faker.random_element(
                    ["Direct Traffic", "Organic Search", "Paid Search", "Social Media"]
                )
            )
            data["close_date"].append(
                (
                    datetime.today() + timedelta(days=faker.random_int(min=1, max=90))
                ).date()
            )

            # Generate product data
            data["product_name"].append(f"Product-{faker.uuid4()}")
            data["product_price"].append(faker.random_int(min=10, max=1000))
            data["product_description"].append(faker.catch_phrase())
            data["product_sku"].append(faker.random_int(min=10000, max=99999))
            data["product_quantity"].append(faker.random_int(min=1, max=100))

            # Generate ticket data
            data["ticket_title"].append(f"Ticket-{faker.uuid4()}")
            data["ticket_status"].append(
                faker.random_element(
                    ["New", "Waiting on contact", "Waiting on us", "Closed"]
                )
            )
            data["ticket_priority"].append(
                faker.random_element(["Low", "Medium", "High"])
            )

    # Convert the data dictionary to a pandas DataFrame
    df = pd.DataFrame(data)
    return df


# Define the list of countries for which we want to generate data
g7_countries = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
    "Austria",
    "Switzerland",
]

# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)

# Write the generated data to an Excel file
result.to_excel(r"C:\~\~\~\hubspot_dummy_data.xlsx", index=False)

Nowhere in that code, as far as I can tell, is pd.DataFrame.append used. You didn't provide a minimal reproducible example, but data seems to be a dict with list objects in it (from looking through the code in your link). Nothing in the code you've shown needs to change because of the deprecation of pd.DataFrame.append, since it isn't used.
– juanpa.arrivillaga, Commented Mar 17, 2024 at 18:46

e-motta · Accepted Answer · 2024-03-17 22:25:05Z


This addresses the question in your title about the deprecated method pandas.DataFrame.append.

As already mentioned in the comments, the append in the first part of the code you shared is list.append, which works fine, runs in amortized constant time, and is not deprecated.
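To make the distinction concrete, here is a minimal sketch using toy data rather than the question's full dictionary. As far as I know, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so what the last line prints depends on the installed version.

```python
import pandas as pd

# A dict of plain Python lists, like the `data` dict in generate_data
data = {"company_name": []}

# list.append: mutates the list in place and is not deprecated
data["company_name"].append("Acme GmbH")

df = pd.DataFrame(data)

# DataFrame.append is a different method on a different class.
# On pandas 2.x this prints False, because the method was removed.
print(hasattr(df, "append"))
```

Both calls are spelled .append, but only the DataFrame one went away.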

The problem is in the following part, which uses pandas.DataFrame.append:

# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)

To get rid of the AttributeError, you can instead collect the per-country DataFrames in a list and combine them with a single call to concat:

result_list = []

for country in g7_countries:
    df = generate_data(country)
    result_list.append(df)

result = pd.concat(result_list)
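One detail worth knowing: by default pd.concat keeps each frame's own index, so the combined frame repeats 0, 1, 2, ... once per country. That is harmless here because the result is written with index=False, but if you want a fresh running index you can pass ignore_index=True. A small sketch, where generate_data is a hypothetical stand-in that returns a tiny frame instead of the Faker-based function from the question:

```python
import pandas as pd


def generate_data(country, rows=2):
    # Hypothetical stand-in for the question's generate_data:
    # returns a tiny DataFrame so the example runs without Faker.
    return pd.DataFrame(
        {
            "company_country": [country] * rows,
            "deal_amount": list(range(rows)),
        }
    )


countries = ["Canada", "France"]

# Build all frames first, then concatenate once with a fresh 0..n-1 index
result = pd.concat(
    [generate_data(country) for country in countries],
    ignore_index=True,
)
print(result)
```

Concatenating once at the end also avoids copying the growing result on every loop iteration, which is what repeated DataFrame.append calls used to do.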

This code generates fake data, which you presumably want only for tests, so I wouldn't bother optimizing it.
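If you still want to split generate_data into smaller pieces, as the question asks, one low-risk first step might be to pull the if/elif locale chain out into a lookup. This is only a sketch: the names LOCALES and locale_for are hypothetical and not part of the original script. It also turns an unknown country into an explicit error; in the original, a country that matches no branch leaves faker unassigned and fails later with an UnboundLocalError.

```python
# Hypothetical helper: maps the country names used in the script
# to the Faker locale codes from the original if/elif chain.
LOCALES = {
    "Germany": "de_DE",
    "United States": "en_US",
    "France": "fr_FR",
    "Italy": "it_IT",
    "Japan": "ja_JP",
    "United Kingdom": "en_GB",
    "Canada": "en_CA",
    "Austria": "de_AT",
    "Switzerland": "de_CH",
}


def locale_for(country):
    """Return the Faker locale code for a country, or raise ValueError."""
    try:
        return LOCALES[country]
    except KeyError:
        raise ValueError(f"No Faker locale configured for {country!r}") from None


# Inside generate_data, the whole chain would then reduce to:
#     faker = Faker(locale_for(country))
print(locale_for("Japan"))
```

The same approach, one small function per concern, could then be applied to the company, contact, deal, product, and ticket sections of the loop.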

edited Mar 17, 2024 at 22:25

answered Mar 17, 2024 at 19:14


3 Comments

Seattle Python Noobie Over a year ago

Thank you, I've updated the question to show the full code and removed the references to pandas. Sorry, I thought dataframe = pandas; more to learn.

e-motta Over a year ago

@SeattlePythonNoobie Maybe my answer was unclear. DataFrame is a class in the Pandas library, you are right about that. The issue is you're confusing two methods which have the same name (append) from two different objects. I've edited my answer with links to the documentation of each. In any case, to solve the AttributeError you can just replace the code in the first block in my answer with the code in the second block, that uses concat instead of append. Let us know in case it doesn't work.

Seattle Python Noobie Over a year ago

Thank you for rewriting that. I like how you used result_list.append, as the well-named variable makes it clear we are appending to a list. I'll try this for my tests and will let you know.
