Database

AWS S3

AWS S3 is an object storage service offering industry-leading scalability, data availability, security, and performance.

The S3 Wrapper is read-only and allows you to read data in the following formats from S3 within your Postgres database:

  1. CSV - with or without header line
  2. JSON Lines
  3. Parquet

The S3 Wrapper also supports the following compression algorithms:

  1. gzip
  2. bzip2
  3. xz
  4. zlib

Note for CSV and JSONL files: currently, all columns in the S3 file must be defined in the foreign table, and their types must be text.

Note for Parquet files: the whole Parquet file will be loaded into local memory if it is compressed, so keep the file size as small as possible.

Supported Data Types For Parquet Files

The S3 Wrapper uses Parquet data types from arrow_array::types; their mappings to Postgres data types are below.

Postgres Type        Parquet Type
boolean              BooleanType
char                 Int8Type
smallint             Int16Type
real                 Float32Type
integer              Int32Type
double precision     Float64Type
bigint               Int64Type
numeric              Float64Type
text                 ByteArrayType
date                 Date64Type
timestamp            TimestampNanosecondType
timestamptz          TimestampNanosecondType
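
For illustration, a foreign table covering several of these mappings might be declared as below. This is a hypothetical sketch: the table name, column names, and URI are placeholders, and it assumes the s3 schema and s3_server foreign server created as described in the sections that follow.

create foreign table s3.table_types_demo (
  bool_col boolean,              -- BooleanType
  smallint_col smallint,         -- Int16Type
  int_col integer,               -- Int32Type
  double_col double precision,   -- Float64Type
  text_col text,                 -- ByteArrayType
  date_col date,                 -- Date64Type
  ts_col timestamptz             -- TimestampNanosecondType
)
server s3_server
options (
  uri 's3://bucket/types_demo.parquet',
  format 'parquet'
);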

Preparation

Before you can query S3, you need to enable the Wrappers extension and store your credentials in Postgres.

Enable Wrappers

Make sure the wrappers extension is installed on your database:


create extension if not exists wrappers with schema extensions;

Enable the S3 Wrapper

Enable the s3_wrapper FDW:


create foreign data wrapper s3_wrapper
  handler s3_fdw_handler
  validator s3_fdw_validator;

Store your credentials (optional)

By default, Postgres stores FDW credentials inside pg_catalog.pg_foreign_server in plain text. Anyone with access to this table will be able to view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.


-- Save your AWS credential in Vault and retrieve the `key_id`
insert into vault.secrets (name, secret)
values (
  'vault_access_key_id',
  '<access key id>'
)
returning key_id;

insert into vault.secrets (name, secret)
values (
  'vault_secret_access_key',
  '<secret access key>'
)
returning key_id;

Connecting to S3

We need to provide Postgres with the credentials to connect to S3, and any additional options. We can do this using the create server command:


create server s3_server
  foreign data wrapper s3_wrapper
  options (
    vault_access_key_id '<your vault_access_key_id from above>',
    vault_secret_access_key '<your vault_secret_access_key from above>',
    aws_region 'us-east-1'
  );

The full list of options is below:

  • aws_access_key_id (required) - Your access key
  • aws_secret_access_key (required) - Your secret key
  • aws_region (required) - The region of your bucket (if providing an endpoint URL with a region in it, make sure that they are the same)
  • endpoint_url (optional) - An optional URL to allow connection to S3-compliant providers (e.g. Wasabi, Cloudflare R2, Backblaze B2, DigitalOcean Spaces)
  • path_style_url (optional) - Whether to use path-style URL access. This is required by some S3-compliant providers. true or false, default is false.
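
For reference, here is what the same create server call looks like with plain credentials instead of Vault key IDs. This is a sketch with placeholder values; storing keys this way leaves them readable in pg_catalog.pg_foreign_server, as noted above.

create server s3_server
  foreign data wrapper s3_wrapper
  options (
    aws_access_key_id '<your access key id>',
    aws_secret_access_key '<your secret access key>',
    aws_region 'us-east-1'
  );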

Required S3 permissions

The following S3 permissions are needed:

  • s3:GetObject
  • s3:GetObjectAttributes

If the bucket is versioned, we also need:

  • s3:GetObjectVersion
  • s3:GetObjectVersionAttributes

Connecting to S3-compliant Providers - Supabase Storage


create server s3_server
  foreign data wrapper s3_wrapper
  options (
    aws_access_key_id '<supabase_storage_access_key>',
    aws_secret_access_key '<supabase_storage_secret_key>',
    aws_region 'eu-central-1',
    endpoint_url 'https://<project_ref>.supabase.co/storage/v1/s3',
    path_style_url 'true'
  );

Connecting to S3-compliant Providers - Wasabi


create server s3_server
  foreign data wrapper s3_wrapper
  options (
    aws_access_key_id '<your_wasabi_access_key>',
    aws_secret_access_key '<your_wasabi_secret_key>',
    aws_region 'eu-central-1',
    endpoint_url 'https://s3.eu-central-1.wasabisys.com'
  );

Create a schema

We recommend creating a schema to hold all the foreign tables:


create schema if not exists s3;

Options

The following options are available when creating S3 foreign tables:

  • uri - S3 URI, required. For example, s3://bucket/s3_table.csv
  • format - File format, required. csv, jsonl, or parquet
  • has_header - Whether the CSV file has a header line, optional. true or false, default is false
  • compress - Compression algorithm, optional. One of gzip, bzip2, xz, zlib, default is no compression

Entities

CSV Files

This is an object representing CSV files in S3.

Ref: AWS S3 docs

Operations

Object    Select    Insert    Update    Delete    Truncate
CSV       ✅        ❌        ❌        ❌        ❌

Usage


create foreign table s3.table_csv (
  name text,
  sex text,
  age text,
  height text,
  weight text
)
server s3_server
options (
  uri 's3://bucket/s3_table.csv',
  format 'csv',
  has_header 'true'
);

Notes

  • All columns must be defined in the foreign table
  • All column types must be text (see the casting sketch after this list)
  • Optional header support via has_header option
  • Supports compression (gzip, bzip2, xz, zlib)
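
Because every column is declared as text, cast values in the query when other types are needed. A minimal sketch against s3.table_csv defined above (the cast targets and filter value are illustrative):

select
  name,
  age::int        as age,
  height::numeric as height
from s3.table_csv
where age::int >= 18;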

JSON Lines Files

This is an object representing JSONL files in S3.

Ref: JSONL docs

Operations

Object    Select    Insert    Update    Delete    Truncate
JSONL     ✅        ❌        ❌        ❌        ❌

Usage


create foreign table s3.table_jsonl (
  name text,
  sex text,
  age text,
  height text,
  weight text
)
server s3_server
options (
  uri 's3://bucket/s3_table.jsonl',
  format 'jsonl'
);

Notes

  • All columns must be defined in the foreign table
  • All column types must be text
  • Each line must be a valid JSON object
  • Supports compression (gzip, bzip2, xz, zlib)

Parquet Files

This is an object representing Parquet files in S3.

Ref: Parquet docs

Operations

Object    Select    Insert    Update    Delete    Truncate
Parquet   ✅        ❌        ❌        ❌        ❌

Usage


create foreign table s3.table_parquet (
  id integer,
  bool_col boolean,
  bigint_col bigint,
  float_col real,
  date_string_col text,
  timestamp_col timestamp
)
server s3_server
options (
  uri 's3://bucket/s3_table.parquet',
  format 'parquet'
);

Notes

  • Not all columns need to be defined in the foreign table (see the sketch after this list)
  • Column names must match between Parquet file and foreign table
  • Supports various Postgres data types (see Data Types section)
  • Compressed files are loaded entirely into memory
  • Supports compression (gzip, bzip2, xz, zlib)
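
For example, since only a subset of columns needs to be declared, a narrower foreign table can be defined over the same file. A sketch reusing the Parquet file above (the table name is illustrative; unmapped columns are simply ignored):

create foreign table s3.table_parquet_slim (
  id integer,
  bool_col boolean
)
server s3_server
options (
  uri 's3://bucket/s3_table.parquet',
  format 'parquet'
);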

Query Pushdown Support

This FDW doesn't support query pushdown.
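
In practice this means predicates in your SQL are evaluated by Postgres after the object has been fetched and parsed; they do not reduce what is read from S3. For example, using the Parquet table above:

-- The whole file is still fetched; the filter is applied locally by Postgres.
select id, timestamp_col
from s3.table_parquet
where id > 100;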

Examples

Basic Example

This will create some foreign tables inside your Postgres database that can read data from S3:


-- CSV file, no compression
create foreign table s3.table_csv (
  name text,
  sex text,
  age text,
  height text,
  weight text
)
server s3_server
options (
  uri 's3://bucket/s3_table.csv',
  format 'csv',
  has_header 'true'
);

-- JSON line file, no compression
create foreign table s3.table_jsonl (
  name text,
  sex text,
  age text,
  height text,
  weight text
)
server s3_server
options (
  uri 's3://bucket/s3_table.jsonl',
  format 'jsonl'
);

-- GZIP compressed CSV file
create foreign table s3.table_csv_gzip (
  name text,
  sex text,
  age text,
  height text,
  weight text
)
server s3_server
options (
  uri 's3://bucket/s3_table.csv.gz',
  format 'csv',
  has_header 'true',
  compress 'gzip'
);

-- Parquet file, no compression
create foreign table s3.table_parquet (
  id integer,
  bool_col boolean,
  bigint_col bigint,
  float_col real,
  date_string_col text,
  timestamp_col timestamp
)
server s3_server
options (
  uri 's3://bucket/s3_table.parquet',
  format 'parquet'
);

-- GZIP compressed Parquet file
create foreign table s3.table_parquet_gz (
  id integer,
  bool_col boolean,
  bigint_col bigint,
  float_col real,
  date_string_col text,
  timestamp_col timestamp
)
server s3_server
options (
  uri 's3://bucket/s3_table.parquet.gz',
  format 'parquet',
  compress 'gzip'
);

Read from Supabase Storage

This example will read a CSV file stored on Supabase Storage. The access information can be found on the Supabase Storage settings page.


create server s3_server
  foreign data wrapper s3_wrapper
  options (
    aws_access_key_id '<access key ID>',
    aws_secret_access_key '<secret access key>',
    aws_region '<region>',
    endpoint_url 'https://<project_ref>.supabase.co/storage/v1/s3',
    path_style_url 'true'
  );

create foreign table s3.supabase_table_csv (
  id text,
  name text
)
server s3_server
options (
  uri 's3://mybucket/myfile.csv',
  format 'csv',
  has_header 'true'
);
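
Once created, the foreign table can be queried like a regular table (assuming mybucket and myfile.csv exist in your Storage project):

select id, name
from s3.supabase_table_csv;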