I'm currently trying to implement a function in Postgres that calculates the cosine distance between two real arrays (real[]). The arrays/vectors are 2000 elements long.
I'll use that function to do 1-to-n searches against 500,000 vectors (for now).
I'm trying to get the best performance without throwing more hardware/CPUs at the server.
I already have a working solution outside of Postgres: I cache the data in memory and there I can do the cosine search in under 1 s (using .NET Core). But getting that production-ready would require a lot of development time, so before going down that path I want to be sure I have exhausted all the Postgres options (Postgres is already used in a lot of our microservices).
Below are the options I tested and my results:
1) plpgsql function (Postgres 10.3)
It was a big failure: it took 5 minutes to search 500,000 rows, even with parallelization (2 workers).
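For reference, the plpgsql version looked roughly like this (a reconstructed sketch rather than the exact function I benchmarked; the name cosine_distance_plpgsql is just a placeholder). The per-element loop is interpreted and runs once per candidate row, which is why it is so slow compared to C:

CREATE OR REPLACE FUNCTION cosine_distance_plpgsql(a real[], b real[])
RETURNS real AS $$
DECLARE
    dot    real := 0;
    norm_a real := 0;
    norm_b real := 0;
BEGIN
    -- interpreted loop over all 2000 elements, executed once per row
    FOR i IN 1 .. array_length(a, 1) LOOP
        dot    := dot    + a[i] * b[i];
        norm_a := norm_a + a[i] * a[i];
        norm_b := norm_b + b[i] * b[i];
    END LOOP;
    RETURN dot / (sqrt(norm_a) * sqrt(norm_b));
END;
$$ LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE;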
2) C function with Postgres 10.3
Huge improvement: it took 10 seconds, including 2-worker parallelization.
Source
#include "postgres.h"
#include "fmgr.h"
#include "math.h"
#include <utils/array.h>
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
PG_FUNCTION_INFO_V1(cosine_distance_vector);
Datum cosine_distance_vector(PG_FUNCTION_ARGS)
{
ArrayType *input1ValuesArray, *input2ValuesArray;
float4 *input1Values, *input2Values;
float4 result;
float4 dot = 0.0;
float4 denom_a = 0.0;
float4 denom_b = 0.0;
input1ValuesArray = PG_GETARG_ARRAYTYPE_P(0);
input2ValuesArray = PG_GETARG_ARRAYTYPE_P(1);
input1Values = (float4 *) ARR_DATA_PTR(input1ValuesArray);
input2Values = (float4 *) ARR_DATA_PTR(input2ValuesArray);
for(unsigned int i = 0u; i < sizeof(input1Values); ++i) {
dot += input1Values[i] * input2Values[i] ;
denom_a += input1Values[i] * input1Values[i] ;
denom_b += input2Values[i] * input2Values[i] ;
}
result = dot / (sqrt(denom_a) * sqrt(denom_b));
PG_RETURN_FLOAT4(result);
}
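For completeness, the C function is exposed to SQL with a declaration along these lines (the library path is a placeholder for wherever the compiled .so ends up; marking it PARALLEL SAFE should be needed for the planner to use the 2-worker parallel plan):

CREATE FUNCTION cosine_distance_vector(real[], real[]) RETURNS real
    AS '/path/to/cosine_distance_vector', 'cosine_distance_vector'
    LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE;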
3) C function with Postgres 11.1
Another improvement: 9 seconds, including 2-worker parallelization.
My observations about the C function
As far as I can see, about 90% of the time is spent in the PG_GETARG_ARRAYTYPE_P calls.
I tested this by comparing two implementations.
Implementation 1 (took 9 seconds to complete the search):
Datum
cosine_distance_vector(PG_FUNCTION_ARGS)
{
    ArrayType  *input1ValuesArray, *input2ValuesArray;
    float4      result = 0.0;

    /* fetch (and detoast) the arguments, but do nothing with them */
    input1ValuesArray = PG_GETARG_ARRAYTYPE_P(0);
    input2ValuesArray = PG_GETARG_ARRAYTYPE_P(1);

    PG_RETURN_FLOAT4(result);
}
Implementation 2 (took 1.5 seconds):
Datum
cosine_distance_vector(PG_FUNCTION_ARGS)
{
    float4      result = 0.0;

    PG_RETURN_FLOAT4(result);
}
Is there a faster or more specific way to get the float4 arrays/pointers into the function than the generic PG_GETARG_ARRAYTYPE_P macro?
I also tried to implement the function using the version-0 calling convention on Postgres 9.6 (versions 10 and 11 no longer support it), since it seemed more efficient (lower level). But I was not able to get it working; even the samples in the Postgres docs were throwing segmentation faults.
I'll use a separate Dockerized Postgres installation for this search functionality, so I'm open to any Postgres version and any sort of config trick.
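On the config side, these are the kinds of settings I'd experiment with (the values are only illustrative starting points, not tuned recommendations; jit requires v11 built with LLVM support and is off by default there):

SET max_parallel_workers_per_gather = 2;   -- matches the 2-worker parallelization above
SET parallel_setup_cost = 100;             -- make the planner more eager to go parallel
SET parallel_tuple_cost = 0.01;
SET jit = on;                              -- PostgreSQL 11+: JIT-compile expression evaluation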
Some additional info based on @LaurenzAlbe's comments.
This is the SQL query I use to find the best result:
SELECT *
FROM documents t
ORDER BY cosine_distance_vector(
             t.vector,
             ARRAY[1,1,1,....]::real[]) DESC
LIMIT 1;
The array is huge, so I didn't paste it fully.
Here is the EXPLAIN (ANALYZE, BUFFERS, VERBOSE) result:
2019-01-23 Progress
I dug a little deeper into the Postgres source code and focused on why the cosine function runs slower when PG_GETARG_ARRAYTYPE_P is called.
So, I came across this function being called at some point in fmgr.c:
struct varlena *
pg_detoast_datum(struct varlena *datum)
{
    if (VARATT_IS_EXTENDED(datum))
        return heap_tuple_untoast_attr(datum);
    else
        return datum;
}
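For context, PG_GETARG_ARRAYTYPE_P reaches pg_detoast_datum through a short macro chain; paraphrased from fmgr.h and utils/array.h (the exact definitions may differ slightly between versions), it is roughly:

#define PG_GETARG_ARRAYTYPE_P(n)  DatumGetArrayTypeP(PG_GETARG_DATUM(n))
#define DatumGetArrayTypeP(X)     ((ArrayType *) PG_DETOAST_DATUM(X))
#define PG_DETOAST_DATUM(datum) \
    pg_detoast_datum((struct varlena *) DatumGetPointer(datum))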
If your column's storage type is EXTENDED, that detoasting adds significant overhead.
The total row size of the vector table was more than 8192 bytes, which is Postgres's default block size. That's why the storage type of the vector column was automatically set to EXTENDED. I converted it to PLAIN, which worked without any error, and the query then ran in 500 ms!
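For anyone trying the same thing, these are the statements involved (using the documents table and vector column from the query above; in pg_attribute, attstorage 'x' means EXTENDED and 'p' means PLAIN):

-- check the current storage mode of the column
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'documents'::regclass
  AND attname = 'vector';

-- switch to PLAIN storage (no compression, no out-of-line TOAST)
ALTER TABLE documents ALTER COLUMN vector SET STORAGE PLAIN;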
But because my row size was now more than 8192 bytes (even though the storage type had been successfully converted to PLAIN), I couldn't add new rows to the table anymore; INSERTs started complaining that the row size was too big.
As the next step, I compiled Postgres with a 16 KB block size (which took me some time). In the end I was able to create the perfect table, with PLAIN vector storage and working INSERTs.
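The rebuild itself was just the standard source build with a non-default block size (version and install paths omitted; --with-blocksize takes the size in kilobytes):

./configure --with-blocksize=16
make
make install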
I tested the query in 100K-row increments. With the first 100K rows it took 50 ms to run; at 200K rows it took 4 seconds! Now I think that, because of the 16 KB block size, I need to find the perfect balance of .conf settings; I can't optimise the function itself any further.
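The knobs I plan to balance first are along these lines (the values are placeholders to show which settings I mean, not recommendations; the right numbers depend on the available RAM and on the 16 KB block size):

shared_buffers = 4GB              # keep the PLAIN-stored vector table cached
effective_cache_size = 12GB
max_worker_processes = 8
max_parallel_workers = 8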
