8

I am trying to connect to snowflake using the python pandas connector.

I use the anaconda distribution on Windows, but uninstalled the existing connector and pyarrow and reinstalled using instructions on this page: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html

I have the following versions

pandas 1.0.4 py37h47e9c7a_0

pip 20.1.1 py37_1

pyarrow 0.17.1 pypi_0 pypi

python 3.7.7 h81c818b_4

snowflake-connector-python 2.2.7 pypi_0 pypi

When running step 2 of this document: https://docs.snowflake.com/en/user-guide/python-connector-install.html, I get: 4.21.2

On attempting to use fetch_pandas_all() I get an error: NotSupportedError: Unknown error

enter image description here

The code I am using is as follows:

import snowflake.connector
import pandas as pd

SNOWFLAKE_DATA_SOURCE = '<DB>.<Schema>.<VIEW>'

query = '''
select * 
from table(%s)
LIMIT 10;
'''
def create_snowflake_connection():
    conn = snowflake.connector.connect(
            user='MYUSERNAME',
            account='MYACCOUNT',
            authenticator = 'externalbrowser',
            warehouse='<WH>',
            database='<DB>',
            role='<ROLE>',
            schema='<SCHEMA>'
    )
    
    return conn

con = create_snowflake_connection()

cur = con.cursor()
temp = cur.execute(query, (SNOWFLAKE_DATA_SOURCE)).fetch_pandas_all()
cur.close()

I am wondering what else I need to install/upgrade/check in order to get fetch_pandas_all() to work?

Edit: After posting an answer below, I have realised that the issue is with the SSO (single sign on) with authenticator='externalbrowser'. When using a stand-alone account I can fetch.

4
  • Did you look at the example in the Snowflake documentation? docs.snowflake.com/en/user-guide/… This might just be an issue of format of the query string. Commented Jun 17, 2020 at 23:21
  • Yes, same error. Commented Jun 17, 2020 at 23:34
  • Running show parameters like '%python_connector_query_result_format%' returns value Arrow, and ROWS_PER_RESULTSET is 0. Commented Jun 20, 2020 at 6:54
  • To rule out packages and environments influenced by Anaconda, have you tried to use the libraries directly installed (outside of anaconda) via pip3? Commented Jun 27, 2020 at 10:01

5 Answers 5

15

I found a workaround that avoids the SSO error by relying on fetchall() instead of fetch_all_pandas():

try: 
    cur.execute(sql)
    all_rows = cur.fetchall()
    num_fields = len(cur.description)
    field_names = [i[0] for i in cur.description]
finally:
    cur.close()

con.close()

df = pd.DataFrame(all_rows)
df.columns = field_names
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, I always prefer to avoid another extra dependency if I can help it. This is similar to "df.columns = cur.keys()" used with other SQL connectors.
Worked perfectly for me. Seems the SSO-bug when fetching with to pandas, is still in effect as of this post.
7

The reason is snowflake-connector-python does not install "pyarrow" which you need to play with pandas.

Either you could install and Import Pyarrow or

Do :

pip install "snowflake-connector-python[pandas]"

and try it .

2 Comments

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
Did the command you mention i am still getting the same error
1

What happens when you run this code?

from snowflake import connector
import time

import logging
for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('test.log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

from snowflake.connector.cursor import CAN_USE_ARROW_RESULT

import  pyarrow
import pandas as pd

print('CAN_USE_ARROW_RESULT', CAN_USE_ARROW_RESULT)

This will output whether CAN_USE_ARROW_RESULT is true and if it's not true, then pandas won't work. When you did the pip install, which of these did you run?

pip install snowflake-connector-python pip install snowflake-connector-python[pandas]

Also, what OS are you running on?

2 Comments

Thank-you for your help. CAN_USE_ARROW_RESULT returns True. I used the following to install: pip install snowflake-connector-python[pandas]. The OS is: Windows 10 Pro.
I've tested the same on windows with the same problem. Apparently it won't work on Windows.
1

I have this working now, but am not sure which part helps - the following steps were taken:

  1. Based on comment by @Kirby, I tried pip3 install --upgrade snowflake-connector-python .. this is based on a historic screenshot .. I should have have [pandas] in brackets, i.e. pip3 install --upgrade snowflake-connector-python[pandas], but regardless, I got the following error message:

Error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads

I therefore downloaded (exact filename: vs_buildtools__121011638.1587963829.exe) and installed VS Build Tools.

  1. This is the tricky part .. I subsequently got admin access to my machine (so hoping it is the visual studio build tools that helped, and not admin access)

  2. I then followed the Snowflake Documentation Python Connector API instructions originally referred to:

    a. Anaconda Prompt (opened as admin): pip install snowflake-connector-python[pandas]

    b. Python:

import snowflake.connector
import pandas as pd

ctx = snowflake.connector.connect(
          user=user,
          account=account,
          password= 'password',
          warehouse=warehouse,
          database=database,
          role = role,
          schema=schema)

# Create a cursor object.
cur = ctx.cursor()

# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)

# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
df = cur.fetch_pandas_all()

Edit I have since realised that I still have the error when executing df = cur.fetch_pandas_all() when using my Okta (single sign on) account, i.e. when I use my username and authenticator = 'externalbrowser'. When I use a different account, I no longer get the error (with password).
NOTE: That I am still able to connect with externalbrowser (and I see the query has executed successfully in Snowflake history); I am just not able to fetch.

2 Comments

as per edit here and in question; the issue remains when using SSO. With SSO, I am able to execute the query successfully but not fetch to Python. With a stand-alone account, I am successfully able to pull the data to Python using fetch_pandas_all()
Any solution to the above? with SSO fetching data
0

Using ..

python -m pip install "snowflake-connector-python[pandas]"

..as in the docs did not fetch the correct version of pyarrow for me (docs says you need 3.0.x).

With my conda (using python 3.8) I had to manually update pyarrow to the specifiv version:

python -m pip install pyarrow=6.0

2 Comments

Why if the docs say you need 3.0.x did you install 6.0? Are the docs wrong?
I dont remember if version "6.0" was trial and error from my side or if I read it somewhere

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.