
I am not sure I have worded this question correctly, but how can I search for a string in PostgreSQL so that the following result is returned?

String to search:

Google Pvt Ltd

Data in table

symbol, company name
GOOG, Google Ltd
FACEBOOK, Facebook Corp
APPLE, Apple Inc
DELL, Dell Ltd

How do I return the search result

GOOG, Google Ltd

with the logic being that it returns results based on the maximum number of words matched?

I am looking into the full text search option in PostgreSQL, and I understand the tokenization done by to_tsvector, but I'm not sure how to proceed after that. Is this type of search possible?

2 Answers


You can use the pg_trgm extension.

create extension if not exists pg_trgm;

with my_table(symbol, company_name) as (
values
    ('GOOG', 'Google Ltd'),
    ('FACEBOOK', 'Facebook Corp'),
    ('APPLE', 'Apple Inc'),
    ('DELL', 'Dell Ltd')
)

select *, similarity(company_name, 'Google Pvt Ltd')
from my_table
order by similarity desc;

  symbol  | company_name  | similarity 
----------+---------------+------------
 GOOG     | Google Ltd    |   0.733333
 DELL     | Dell Ltd      |        0.2
 APPLE    | Apple Inc     |  0.0416667
 FACEBOOK | Facebook Corp |          0
(4 rows)

You can set the similarity threshold for the current session and then simply use the % operator, e.g.:

select set_limit(0.6);

select *
from my_table
where company_name % 'Google Pvt Ltd';

 symbol | company_name 
--------+--------------
 GOOG   | Google Ltd
(1 row) 

4 Comments

This fits my requirement. Thanks!
@Sammy . . . This is -- no doubt -- the best solution to your problem. But your actual question is "the maximum words matched", and this doesn't do maximum words; it measures similarity.
Conceptually I feel it's the same. Maybe I didn't word it properly, but it gives me the results. Can we set the limit per query instead of globally?
You can use the function and operator interchangeably, e.g. where similarity(company_name, 'Google Pvt Ltd') > 0.6 instead of where company_name % 'Google Pvt Ltd'
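
The per-query form described in the comment above could look roughly like the sketch below. It reuses the sample rows and the 0.6 cutoff from the answer; the column alias score is only illustrative. The second statement group assumes PostgreSQL 9.6 or later, where pg_trgm exposes the pg_trgm.similarity_threshold setting, and uses set local so the threshold applies to a single transaction only.

```sql
-- Option 1: put the cutoff in the query itself (0.6 is the example value used earlier).
with my_table(symbol, company_name) as (
values
    ('GOOG', 'Google Ltd'),
    ('FACEBOOK', 'Facebook Corp'),
    ('APPLE', 'Apple Inc'),
    ('DELL', 'Dell Ltd')
)
select symbol, company_name,
       similarity(company_name, 'Google Pvt Ltd') as score
from my_table
where similarity(company_name, 'Google Pvt Ltd') > 0.6
order by score desc;

-- Option 2 (assumes PostgreSQL 9.6+): scope the threshold to one transaction
-- so the % operator can still be used.
begin;
set local pg_trgm.similarity_threshold = 0.6;
select v.*
from (values ('GOOG', 'Google Ltd'), ('DELL', 'Dell Ltd')) v(symbol, company_name)
where v.company_name % 'Google Pvt Ltd';
commit;
```

One trade-off to keep in mind: on a real table, a trigram index can generally serve the % operator, but not a comparison on the result of similarity(), so Option 2 is likely the better fit for large tables.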

I am not sure if you need full text search for this -- that depends on performance. There are other methods, such as breaking the columns and the input into words and matching directly on them.
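
For comparison, the full text search route mentioned in the question might be sketched as follows. This is only an illustration, not a tuned solution: it assumes the 'simple' text search configuration (lowercasing only, no stemming or stop words) and a hand-written OR query. Building the query from arbitrary user input would need escaping that is not shown here.

```sql
-- Rank rows by how well they match any of the search words (OR query).
select v.symbol, v.company,
       ts_rank(to_tsvector('simple', v.company),
               to_tsquery('simple', 'Google | Pvt | Ltd')) as rank
from (values ('GOOG', 'Google Ltd'),
             ('FACEBOOK', 'Facebook Corp'),
             ('APPLE', 'Apple Inc'),
             ('DELL', 'Dell Ltd')
    ) v(symbol, company)
-- Keep only rows that contain at least one of the words.
where to_tsvector('simple', v.company) @@ to_tsquery('simple', 'Google | Pvt | Ltd')
order by rank desc;
```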

Here is one approach that uses regexp_matches():

select v.*,
       (select count(*) from regexp_matches(symbol || ' ' || company, replace('Google Pvt Ltd', ' ', '|'), 'g')) as matches
from (values ('GOOG', 'Google Ltd'),
             ('FACEBOOK', 'Facebook Corp'),
             ('APPLE', 'Apple Inc'),
             ('DELL', 'Dell Ltd')
    ) v(symbol, company)
order by matches desc
fetch first 1 row only;
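
The pattern above matches substrings and is case-sensitive, and any regex metacharacters in the search string are interpreted as part of the pattern, which may explain some surprising results. A variant that counts distinct whole words in common, ignoring case, is sketched below. It is an assumption-laden illustration: it matches on the company name only (not the symbol), splits on whitespace, and uses the hypothetical alias t(word).

```sql
select v.symbol, v.company,
       (select count(*)
        from (
            -- distinct lower-cased words of the company name
            select lower(word)
            from regexp_split_to_table(v.company, '\s+') as t(word)
            intersect
            -- distinct lower-cased words of the search string
            select lower(word)
            from regexp_split_to_table('Google Pvt Ltd', '\s+') as t(word)
        ) common) as matches
from (values ('GOOG', 'Google Ltd'),
             ('FACEBOOK', 'Facebook Corp'),
             ('APPLE', 'Apple Inc'),
             ('DELL', 'Dell Ltd')
    ) v(symbol, company)
order by matches desc
fetch first 1 row only;
```

With the sample data, 'Google Ltd' shares two words with the search string and 'Dell Ltd' shares one. Ties are not broken in any particular order, so a secondary sort key would be needed if that matters.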

1 Comment

This works but in a couple of cases it gives me no results or unexpected ones.
