0

This is a problem that took me a long time to solve, and I wanted to share my solution. Here's the problem.

We have 2 pandas DataFrames that need to be outer joined on a very complex condition. Here was mine:

condition_statement = """
    ON (
        A.var0 = B.var0
        OR (
            A.var1 = B.var1
            AND (
                A.var2 = B.var2
                OR A.var3 = B.var3
                OR A.var4 = B.var4
                OR A.var5 = B.var5
                OR A.var6 = B.var6
                OR A.var7 = B.var7
                OR (
                    A.var8 = B.var8
                    AND A.var9 = B.var9
                )
            )
        )
    )
"""

Doing this in pandas would be a nightmare.

I like to do most of my DataFrame massaging with the pandasql package. It lets you run SQL queries on top of the DataFrames in your local environment.

The problem with pandasql is it runs on a SQLite engine, so you can't do RIGHT or FULL OUTER joins.

So how do you approach this problem?

1 Answer 1

0

Well you can achieve a FULL OUTER join with two LEFT joins, a condition, and a UNION.

First, declare a snippet with the columns you want to retrieve:

select_statement = """
    SELECT 
        A.var0
        , B.var1
        , COALESCE(A.var2, B.var2) as var2
"""

Next, build a condition that represents all values in A being NULL. I built mine using the columns in my DataFrame:

where_a_is_null_statement = f"""
    WHERE 
    {" AND ".join(["A." + col + " is NULL" for col in A.columns])}
"""

Now, do the 2-LEFT-JOIN-with-a-UNION trick using all of these snippets:

sqldf(f"""
    {select_statement}
    FROM A
    LEFT JOIN B
    {condition_statement}
    UNION
    {select_statement}
    FROM B
    LEFT JOIN A
    {condition_statement}
    {where_a_is_null_statement}
""")
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.