
I am trying to run a full-text query in PostgreSQL that can cater for partial matches using wildcards.

It seems easy enough to have a postfix wildcard after the search term, however I cannot figure out how to specify a prefix wildcard.

For example, I can search with a trailing wildcard easily enough using something like:

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', 'don:*') )

This matches cities containing a word that begins with "don", so it does not find "London".

However, I can't seem to put the wildcard in front of the term, like...

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don') )

Ideally I'd like to have a wildcard prefixed to the front and end of the search term, something like...

SELECT "t1".* 
FROM "t1" 
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don:*') )

I can use a LIKE condition; however, I was hoping to benefit from the performance of the full-text search features in Postgres.


3 Answers

24

Full text search is good for finding words, not substrings.
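
To illustrate the point, here is a small sketch that uses only literals, so it does not depend on any table. It shows that the `:*` operator matches from the start of a lexeme and never from the middle:

```sql
-- 'don:*' matches lexemes that START with "don", so "London" is not a hit.
SELECT to_tsvector('simple', 'London') @@ to_tsquery('simple', 'don:*');     -- false
SELECT to_tsvector('simple', 'London') @@ to_tsquery('simple', 'lon:*');     -- true
SELECT to_tsvector('simple', 'Doncaster') @@ to_tsquery('simple', 'don:*');  -- true
```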

For substring searches, you are better off using LIKE '%don%' together with the pg_trgm extension, available from PostgreSQL 9.1, and an index created using gin (column_name gin_trgm_ops) or using gist (column_name gist_trgm_ops). Be aware that the index can be very big (even several times bigger than the table) and that write performance will suffer.

There's a very good example of using pg_trgm for substring search on the "select * from depesz;" blog.
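
As a minimal sketch of the trigram approach, assuming the question's table t1 with a text column city (the index name is made up for illustration):

```sql
-- Enable the trigram extension (shipped in contrib).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on the column; t1_city_trgm_idx is an illustrative name.
CREATE INDEX t1_city_trgm_idx ON t1 USING gin (city gin_trgm_ops);

-- Case-insensitive substring match that the planner can serve from the trigram index.
SELECT t1.*
FROM t1
WHERE city ILIKE '%don%';
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking with EXPLAIN on real data.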


2 Comments

Thanks for the answer. We've already implemented something similar in terms of the query, so with the addition of the trigrams this should hopefully give us the performance gain we require. Thanks again.
How would you do this using gist (column_name gist_trgm_ops) on 2 columns instead of one?
11

One wild and crazy way of doing it would be to create a tsvector index of all your documents, reversed, and to reverse your queries for postfix search too.

This is essentially what Solr does with its ReversedWildcardFilterFactory.

SELECT
  reverse('brown fox')::tsvector @@ (reverse('rown') || ':*')::tsquery; -- true
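
Applied to the question's table, the same idea could look roughly like the sketch below. It assumes the table t1 with a text column city from the question; the index name is hypothetical.

```sql
-- Expression index over the reversed text; t1_city_rev_fts_idx is an illustrative name.
CREATE INDEX t1_city_rev_fts_idx
    ON t1 USING gin (to_tsvector('simple', reverse(city)));

-- Suffix search for "don": reverse the term, then prefix-match it.
-- The WHERE expression repeats the indexed expression so the index can be used.
SELECT t1.*
FROM t1
WHERE to_tsvector('simple', reverse(city))
      @@ to_tsquery('simple', reverse('don') || ':*');
```

This finds words that end in "don", such as "London", but still not words that merely contain it.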

3 Comments

Unfortunately, if you query row instead of rown, it will not return results. The reason is that it matches from the end towards the start, but still only from the first (last, in this situation) letter, and never from the middle.
@BernardPotocki not in the spec ;) Full text search is hard enough without substrings. If you want to search row and match brown, then this is a good use case for regexp.
Worth mentioning that you'd also want to generate a reversed index of course. I gave a full example at: stackoverflow.com/a/79317399/895245 with a generated column.
1

Speeding up suffix search with reverse on the generated column

This answer covers the best suffix search method for modern PostgreSQL. It does not cover the "find any match inside words" case, only exact suffixes.

Given a table:

CREATE TABLE mytable (mycol TEXT);

the currently recommended way to implement GIN seems to be to have a column generated from the text column:

ALTER TABLE mytable ADD COLUMN mycol_ts tsvector
  GENERATED ALWAYS AS (to_tsvector('english', mycol)) STORED;

and then index that generated column:

CREATE INDEX mycol_ts_gin_idx ON mytable USING GIN (mycol_ts);

so for suffix search, you could just add a new generated column that first reverses the text to be indexed:

ALTER TABLE mytable ADD COLUMN mycol_ts_rev tsvector
  GENERATED ALWAYS AS (to_tsvector('english', reverse(mycol))) STORED;

and then you can query as:

SELECT * FROM mytable WHERE mycol_ts_rev @@ to_tsquery('english', 'nod:*');

to find london previously inserted with:

INSERT INTO mytable VALUES ('i like london');

Tested on PostgreSQL 16.6, Ubuntu 24.10.
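
For completeness, here is a self-contained sketch of the whole flow, including a GIN index on the reversed column and reversing the search term in SQL rather than by hand. It is not part of the tested snippet above. All names (suffix_demo, body, body_rev_ts, suffix_demo_rev_gin_idx) are hypothetical, and it uses the 'simple' configuration instead of 'english', on the assumption that you do not want reversed words passed through the English stemmer and stop-word list. Stored generated columns need PostgreSQL 12 or later.

```sql
-- Hypothetical demo table with a stored, reversed tsvector column.
CREATE TABLE suffix_demo (
    body        text,
    body_rev_ts tsvector
        GENERATED ALWAYS AS (to_tsvector('simple', reverse(body))) STORED
);

-- GIN index on the reversed tsvector so suffix queries do not scan the table.
CREATE INDEX suffix_demo_rev_gin_idx ON suffix_demo USING gin (body_rev_ts);

INSERT INTO suffix_demo (body) VALUES ('i like london'), ('doncaster is north');

-- Reverse the search term in SQL instead of typing 'nod:*' by hand.
-- Only the row containing a word that ENDS in "don" matches.
SELECT body
FROM suffix_demo
WHERE body_rev_ts @@ to_tsquery('simple', reverse('don') || ':*');
```

The query returns the 'i like london' row and skips 'doncaster is north', because "doncaster" starts with "don" rather than ending with it.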
