Issue with pandas: read_sql with multiple statements in one query returned with no rows

Question

I'm trying to get the top 10 prescriptions per practice from the database into a DataFrame using pd.read_sql(sql, uri), but it raised the following error:

~\AppData\Local\Continuum\anaconda3\envs\GISProjects\lib\site-packages\sqlalchemy\engine\result.py in _non_result(self, default)
   1168         if self._metadata is None:
   1169             raise exc.ResourceClosedError(
-> 1170                 "This result object does not return rows. "
   1171                 "It has been closed automatically."
   1172             )

ResourceClosedError: This result object does not return rows. It has been closed automatically.

My query uses user-defined variables to track the ranking so that it returns the top 10 prescriptions per practice. It works when I run it in MySQL Workbench, but not when I use pd.read_sql():

sql = """
SET @current_practice = 0;
SET @practice_rank = 0;
select practice, bnf_code_9, total_items, practice_rank
FROM (select a.practice,
             a.bnf_code_9,
             a.total_items,
             @practice_rank := IF(@current_practice = a.practice, @practice_rank + 1, 1) AS practice_rank,
             @current_practice := a.practice
      FROM (select rp.practice, rp.bnf_code_9, sum(rp.items) as total_items
            from rx_prescribed rp
            where ignore_flag = '0'
            group by practice, bnf_code_9) a
      order by a.practice, a.total_items desc) ranked
where practice_rank <= 10;
"""
df = pd.read_sql(sql, uri)

I expected it to return the data as a pandas DataFrame, but it raised the error above instead. I assume the error comes from the first statement, which sets a variable. The first two statements are necessary for the query to return the top 10.

Without the first two statements the query runs, but the practice_rank column is 1 in every row instead of the expected 1, 2, 3, and so on.

Is there a way I can run multiple statements and return the results from the last statement executed?

zan (commented Jul 16, 2019 at 21:14): The error seems to come from SQLAlchemy. What is the underlying database? Can you post the URI?

shuki25 (commented Jul 17, 2019 at 0:16): There is no connection error. As I stated above, the query works fine without the first two statements, but then the data it returns isn't in the expected format; the first two statements fix that. I'm using MySQL 5.7.

zan · Accepted Answer · 2019-07-20 18:31:07Z

Short answer

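The stack of programs called by pandas.read_sql() is: pandas > SQLAlchemy > MySQLdb or pymysql > MySQL database. The database drivers mysqlclient (MySQLdb) and pymysql don't handle multiple SQL statements in a single execute() call the way MySQL Workbench does. In this case the result that SQLAlchemy sees belongs to the first statement, SET @current_practice = 0, which returns no rows, hence the ResourceClosedError. Split the statements into separate execute() calls on the same connection: user-defined variables such as @practice_rank exist only in the session that set them.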
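The same idea can be wrapped in a small helper that answers the original question directly: run every statement except the last, then return the last one's result. This is a minimal sketch, not a pandas or SQLAlchemy API; read_sql_last_statement is a hypothetical name, and splitting on ';' is a naive assumption that breaks if a semicolon appears inside a string literal or comment. Everything runs on the connection passed in, so variables set by the earlier statements are still visible to the final query.

```python
import pandas as pd
from sqlalchemy import text


def read_sql_last_statement(script, con):
    """Execute every statement in `script` except the last on `con`,
    then return the result of the last statement as a DataFrame.

    Hypothetical helper. Statements are split naively on ';', so a
    semicolon inside a string literal or comment will break it.
    """
    statements = [s.strip() for s in script.split(";") if s.strip()]
    if not statements:
        raise ValueError("script contains no SQL statements")
    *setup, final = statements
    for stmt in setup:
        # Same connection throughout, so session state such as
        # MySQL user-defined variables (@practice_rank) carries over.
        con.execute(text(stmt))
    # Only the final statement is expected to return rows.
    return pd.read_sql(final, con)
```

Usage, assuming an engine created as in the solution and the question's original multi-statement sql string: open a connection with engine.connect() as con, then call df = read_sql_last_statement(sql, con) inside that block.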

Solution

import pandas as pd
from sqlalchemy import create_engine, text

# mysqldb is the default, use mysql+pymysql to use the pymysql driver
# URI format: mysql<+driver>://<user:password@>localhost/database
engine = create_engine('mysql://localhost/test')

# First two lines starting with SET removed
sql = '''
SELECT practice, bnf_code_9, total_items, practice_rank
FROM (
    SELECT
        a.practice,
        a.bnf_code_9,
        a.total_items,
        @practice_rank := IF(@current_practice = a.practice, @practice_rank + 1, 1) AS practice_rank,
        @current_practice := a.practice
    FROM (
        SELECT
            rp.practice, rp.bnf_code_9, sum(rp.items) AS total_items
        FROM rx_prescribed rp
        WHERE ignore_flag = '0'
        GROUP BY practice, bnf_code_9
    ) a
    ORDER BY a.practice, a.total_items DESC
) ranked
WHERE practice_rank <= 10;
'''

# User-defined variables belong to the MySQL session, so the SET statements
# and the query must run on the same connection.
with engine.connect() as con:
    # Raw SQL goes through text(); SQLAlchemy 2.0 no longer accepts
    # plain strings in Connection.execute().
    con.execute(text('SET @current_practice = 0'))
    con.execute(text('SET @practice_rank = 0'))

    df = pd.read_sql(sql, con)

print(df)

Results in:

   practice  bnf_code_9  total_items  practice_rank
0         2           3          6.0              1
1         6           1          9.0              1
2         6           2          4.0              2
3         6           4          3.0              3
4        17           1          0.0              1
5        42          42         42.0              1
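As an alternative that avoids session variables altogether, the ranking can be done in pandas after fetching only the aggregated rows. This is a sketch of a different technique from the one in this answer, with top_n_per_practice as a hypothetical name and the column names taken from the question's query: sort by practice and by total_items descending, then number the rows within each practice with groupby().cumcount(). As with the variable-based query, ties within a practice receive consecutive ranks in no particular order.

```python
import pandas as pd


def top_n_per_practice(df, n=10):
    """Rank rows within each practice by total_items (descending) and
    keep the top n.

    Hypothetical helper. Expects the columns produced by the inner
    aggregation query: practice, bnf_code_9, total_items.
    """
    ranked = df.sort_values(["practice", "total_items"], ascending=[True, False])
    # cumcount() numbers rows 0, 1, 2, ... within each practice, following
    # the sorted order, so adding 1 gives the rank.
    ranked["practice_rank"] = ranked.groupby("practice").cumcount() + 1
    return ranked[ranked["practice_rank"] <= n].reset_index(drop=True)
```

For example, df = top_n_per_practice(pd.read_sql(agg_sql, engine)), where agg_sql is assumed to be the innermost SELECT ... GROUP BY practice, bnf_code_9 query from the question. On the test data below, the aggregated rows get the same practice_rank values as in the table above.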

I used the following code to create a test database for your problem.

DROP TABLE IF EXISTS rx_prescribed;
CREATE TABLE rx_prescribed (
    id INT AUTO_INCREMENT PRIMARY KEY,
    practice INT,
    bnf_code_9 INT,
    items INT,
    ignore_flag INT
);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (2, 3, 4, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (2, 3, 2, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 1, 9, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 2, 4, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 4, 3, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (9, 11, 1, 1);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (17, 1, 0, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (42, 42, 42, 0);

Tested on MariaDB 10.3.
