12

I'm using flask, sqlalchemy and flask-sqlalchemy. I want to create a full test search index in postgres with gin and to_tsvector. At the moment, I'm trying the following. I think its the closest I've got to what I'm trying to express, but doesn't work.

from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.schema import Index
from sqlalchemy.sql.expression import func

from app import db


class Post(db.Model):

    id = db.Column(db.Integer, primary_key=True)
    added = db.Column(db.DateTime, nullable=False)
    pub_date = db.Column(db.DateTime, nullable=True)
    content = db.Column(db.Text)

    @declared_attr
    def __table_args__(cls):
        return (Index('idx_content', func.to_tsvector("english", "content"), postgresql_using="gin"), )

This throws the following error...

Traceback (most recent call last):
  File "./manage.py", line 5, in <module>
    from app import app, db
  File "/vagrant/app/__init__.py", line 36, in <module>
    from pep.models import *
  File "/vagrant/pep/models.py", line 8, in <module>
    class Post(db.Model):
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/flask_sqlalchemy.py", line 477, in __init__
    DeclarativeMeta.__init__(self, name, bases, d)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/api.py", line 48, in __init__
    _as_declarative(cls, classname, cls.__dict__)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 222, in _as_declarative
    **table_kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 326, in __new__
    table._init(name, metadata, *args, **kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 393, in _init
    self._init_items(*args)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 63, in _init_items
    item._set_parent_with_dispatch(self)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/events.py", line 235, in _set_parent_with_dispatch
    self._set_parent(parent)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 2321, in _set_parent
    ColumnCollectionMixin._set_parent(self, table)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 1978, in _set_parent
    self.columns.add(col)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2391, in add
    self[column.key] = column
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2211, in __getattr__
    key)
AttributeError: Neither 'Function' object nor 'Comparator' object has an attribute 'key'

I've also tried

return (Index('idx_content', "content", postgresql_using="gin"), )

However, it doesn't work as postgres (9.1 at least, as that's what I run) expects to_tsvector to be called. This line creates the SQL;

CREATE INDEX content_index ON post USING gin (content)

rather than what I want;

CREATE INDEX content_index ON post USING gin(to_tsvector('english', content))

I opened a ticket as I think this may be a bug/limitation. http://www.sqlalchemy.org/trac/ticket/2605

4
  • 1
    What SQLAlchemy version is that? Commented Nov 13, 2012 at 13:42
  • 0.8.0 beta. However, I also tried the 0.7.2 (I think - the latest stable one). Commented Nov 14, 2012 at 8:38
  • 1
    There was a fix related to column.key vs column.name usage that was applied post-0.8.0beta Commented Nov 14, 2012 at 16:00
  • That's good to know thanks - I'll keep a look out for next beta/final release for it. Commented Nov 15, 2012 at 9:34

4 Answers 4

4

For now I've added the following lines to do it manually, but I'd much rather the 'correct' SQLAlchemy approach if there is one.

create_index = DDL("CREATE INDEX idx_content ON pep USING gin(to_tsvector('english', content));")
event.listen(Pep.__table__, 'after_create', create_index.execute_if(dialect='postgresql'))

There was some interesting discussion on the SQLAlchemy bug tracker. It looks like this is a limitation of the current indexing definition. Basically, my requirement is to allow indexes to be expressions rather than just column names but that isn't currently supported. This ticket is tracking this feature request: http://www.sqlalchemy.org/trac/ticket/695 . However, this is waiting for a developer to take forward and do the work (and has been for a while).

Sign up to request clarification or add additional context in comments.

1 Comment

Closed/fixed in 2013.
3

Ran across this old question as I was working on creating some single and multicolumn tsvector GIN indexes. For anyone that is looking for a simple way to create these indexes using string representations of the column names, here is one approach using the SQLAlchemy text() construct.

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
            ),
        )

Comments

3

Simplified, using ORM approach & TSVectorType helper from sqlalchemy-utils (https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):

Defining a TSVECTOR column (TSVectorType) in your ORM model (declarative) populated automatically from the source text field(s)

import sqlalchemy as sa
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)
    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

How to query on the TSVector column? (and use ts_rank to return results ordered by TS rank)


tsquery = sa.func.plainto_tsquery("english", "multiple words query here")

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
).order_by(
    sa.func.ts_rank(MyModel.content_tsv, tsquery).desc()
).limit(5)

t, = db.session.execute(stmt).first()

Note important difference when querying with SQLAlchemy 1.4 vs SQLAlchemy 2.0:

If you try in SQLAlchemy 1.4:

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")
)

(as shown here for 1.4), instead of:

tsquery = sa.func.plainto_tsquery("english", "multiple words query here")
#                 ^^^^^^^^^^^^^^^
stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
)

..it will translate to to_tsquery("english", "multiple words query here"), which only accepts a single term, and thus raise SyntaxError: ProgrammingError: (psycopg2.errors.SyntaxError) syntax error in tsquery: "multiple words query here".

This behavior is changed in SQLAlchemy 2.0, as mentioned here: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#full-text-search.

Starting with SQLAlchemy 2.0, TSVColumn.match(...) uses plainto_tsquery(...):

MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")

will translate to:

plainto_tsquery("english", "multiple words query here")
^^^^^^^^^^^^^^^
# instead of
# to_tsquery("english", "multiple words query here")
# which is NOT the same

2 Comments

this is great, but for some reason I am getting an empty response when I use the vector column and match, whereas if I run the text column with ilike it returns the row, any ideas where there could be an issue? using SQLAlchemy 2
Could not edit above comment:I was not getting results because match does not handle partial matching, for that you gotta use something like python search_query = "parti:*" # partial session.execute( select(Model) .where(func.to_tsquery("english", search_query).op('@@')(Model. content_tsv)) ).scalars().all()
1

So in sqlalchemy 0.9 and up this works:

class Content(Base, ):
    __tablename__ = 'content'

    id = sa.Column(sa.Integer, primary_key=True)

    description = sa.Column(sa.UnicodeText, nullable=False, server_default='')
    @declared_attr
    def __table_args__(cls):
        return (sa.Index('idx_content',
                     sa.sql.func.to_tsvector("english", cls.description),
                     postgresql_using="gin"), )

Notably, the difference from the first example is a direct reference to the column name, as opposed the the column name being provided in quotes, as that did not work.

1 Comment

I'm trying this on 0.9.8 and it doesn't. Breaks with: sqlalchemy.exc.ArgumentError: Can't add unnamed column to column collection

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.