I'm designing a storage architecture for petabyte-scale geospatial data, starting from scratch. I'm creating a MinIO cluster to store the objects in S3 buckets. To store the metadata, I’m considering the Apache Parquet format managed by PostgreSQL, extended by PostGIS. Using Parquet implies resorting to a PostgreSQL FDW (foreign data wrapper), ParquetS3. I have no doubts regarding Parquet’s main advantages when compared to "pure" PostgreSQL: higher query speed, higher compression rate, and the ability to store in S3, which for us is a significant advantage. However, I cannot find documentation or a use case of Parquet with geospatial data.

Should I worry about losing geospatial query features (the PostGIS support) due to the use of Parquet or its wrapper for PostgreSQL, ParquetS3?

According to PostgreSQL’s wiki, the wrapper ParquetS3 is valid for PostgreSQL, but I see no mention of the PostGIS extension. On this same PostgreSQL FDW list, there are some geo data wrappers, but these aren’t meant to use S3/MinIO. According to this official PostGIS post, PostGIS ships with two FDWs (Oracle FDW and OGR FDW), but this is a post from 2014.

The code repository for OGR FDW reads:

OGR is the vector half of the GDAL spatial data access library. It allows access to a large number of GIS data formats using a simple C API for data reading and writing. Since OGR exposes a simple table structure and PostgreSQL foreign data wrappers allow access to table structures, the fit seems pretty perfect.

This is followed by a list of limitations of the implementation.

Does the OGR FDW work in combination with the ParquetS3 one, despite the limitations listed? Does anyone know of an FDW for geo data (PostGIS) that also allows sourcing from MinIO S3 buckets, with Parquet or an equivalent format? Or has anyone tested the configuration above?

I always have the alternative of keeping the PostgreSQL/PostGIS data store outside of S3, but I’d rather have it in the MinIO cluster.

Thanks.

  • You'll have to try it. Note that neither the PostgreSQL Wiki nor Paul Ramsey's article are documentation. All those FDWs are individual projects, and you'll have to assess the quality of the code and whether they support your version of PostgreSQL or not. Commented May 16, 2022 at 6:24
  • @LaurenzAlbe. Noted re docs. With MinIO, you do not store the files (e.g. rasters) in PostgreSQL, but instead links to the objects, which are stored in a bucket. So I won't have PostGIS functionality (e.g. ST_Within) over objects that are not stored in Postgres (via raster2pgsql), correct? Commented May 17, 2022 at 17:18
  • I'm out of my depth with regard to that specific software, but if you can query it in PostgreSQL, you will be able to use PostGIS. Commented May 17, 2022 at 22:04
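
To illustrate the last comment: one possible way to keep vector geometries queryable through a Parquet-backed foreign table is to store them as WKB (well-known binary) in a binary column, which PostGIS can parse with ST_GeomFromWKB. The sketch below only shows the WKB encoding of a 2D point using the Python standard library. It assumes (untested here) that the wrapper exposes a Parquet binary column as bytea; the helper names are hypothetical and not part of ParquetS3 or PostGIS.

```python
import struct
from typing import Tuple

# WKB layout for a 2D point (little-endian):
#   1 byte   byte order (1 = little-endian)
#   4 bytes  geometry type (1 = Point)
#   8 bytes  x as IEEE 754 double
#   8 bytes  y as IEEE 754 double
_POINT_FORMAT = "<BIdd"


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    return struct.pack(_POINT_FORMAT, 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> Tuple[float, float]:
    """Decode a little-endian WKB point back to (x, y)."""
    byte_order, geom_type, x, y = struct.unpack(_POINT_FORMAT, wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("not a little-endian WKB point")
    return x, y


if __name__ == "__main__":
    wkb = point_to_wkb(-9.14, 38.72)
    print(len(wkb), wkb.hex())
    print(wkb_to_point(wkb))
```

If such bytes were written to a Parquet column (called geom_wkb here purely for illustration) and the foreign table exposed it as bytea, a filter like ST_Within(ST_GeomFromWKB(geom_wkb, 4326), ...) should be expressible in PostgreSQL. Whether any of the filtering is pushed down to the wrapper, and how that performs at this scale, would need to be tested.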
