1

I have a schema which contains a list of addresses for companies. For a report I need to do some filtering for those companies and includes a list of all regions they are in. The relevant core part of the query I am trying to write is:

SELECT name, ARRAY(SELECT DISTINCT state
                   FROM location
                   WHERE location.foo_id=foo.id)
FROM foo;

The base code I am using to test this is pretty simple (note that I am using PostgreSQL as my database):

import sqlalchemy as sa
from sqlalchemy.sql import distinct, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker


Base = declarative_base()


class Company(Base):
    __tablename__ = 'company'
    id = sa.Column(sa.Integer, sa.Sequence('company_id_seq'), primary_key=True)
    name = sa.Column(sa.Unicode)


class Location(Base):
    __tablename__ = 'location'
    id = sa.Column(sa.Integer, sa.Sequence('location_id_seq'), primary_key=True)
    company_id = sa.Column(sa.Integer, sa.ForeignKey(Company.id))
    company = relationship(Company, backref='locations')
    state = sa.Column(sa.Unicode, nullable=False)

engine = sa.create_engine('postgresql:///pytest', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
for (name, states) in [
        ('Acme', ['Alabama', 'Georgia']),
        ('Example Inc', ['Florida', 'Florida']),
        ('Empty', [])]:
    session.add(Company(
        name=name,
        locations=[Location(state=s) for s in states]))
session.flush()

# Desired result:
#
#  [('Acme', ['Alabama', 'Georgia']),
#   ('Example Inc', ['Florida']),
#   ('Empty', [])]

states_per_foo = session.query(distinct(Location.state))\
    .filter(Location.company_id == Company.id)
search_query = session.query(Company, func.array(states_per_foo))
print(search_query.all())

All permutations I have tried so far fail with various SQLAlchemy errors or invalid SQL being generated.

1
  • updated my answer with the version that produces exactly the results you are looking for Commented Dec 7, 2016 at 5:42

2 Answers 2

3

Below code:

sq = (
    session
    .query(Location.state.distinct())
    .filter(Location.company_id == Company.id)
    .correlate(Company)
    .as_scalar()
)
q = session .query(Company.name, func.array(sq).label("states"))

produces exactly (ignoring some extra parenthesis) the SQL query you are trying to write:

SELECT company.name,
       array(
               (SELECT DISTINCT location.state AS anon_1
                FROM location
                WHERE location.company_id = company.id)) AS states
FROM company

and the result is:

('Acme', ['Georgia', 'Alabama'])
('Example Inc', ['Florida'])
('Empty', [])

Original version: Slight modification: added .distinct() in order to remove the duplicates.

Not the SQL query you are looking for, but the result of the below query:

q = (
    session
    .query(Company.name, func.array_agg(Location.state.distinct()))
    .outerjoin(Location)
    .group_by(Company.name)
)
for x in q.all():
    print(x)

is (sorted by Company.name because of grouping):

('Acme', ['Alabama', 'Georgia'])
('Empty', [None])
('Example Inc', ['Florida'])

Note the [None] instead of [] for Empty. This can be handled in a special way as well.

Sign up to request clarification or add additional context in comments.

Comments

0
import sqlalchemy as sa
engine = sa.create_engine('postgresql://...')
result = engine.execute('''SELECT name, 
                               ARRAY(SELECT DISTINCT state
                                     FROM location
                                     WHERE location.foo_id=foo.id)
                           FROM foo''')
rows = result.fetchall()
for row in rows:
   print(type(row[0]))  # <class 'str'>
   print(row[0])        # Acme
   print(type(row[1]))  # <class 'list'>
   print(row[1])        # ['Alabama', 'Georgia']

2 Comments

I don't consider this to be a valid solution, since it essentially bypasses all SQLAlchemy logic and removes my ability to build the query piece-wise.
It depends on your final goal. If you wish to make reports on company and regions it could be the best choice, because it doesn't use ORM and thus minimises overhead on big tables. If you need some extra actions, it would be not so handy.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.