AI & Vectors

Structured and Unstructured

Supabase is flexible enough to associate structured and unstructured metadata with embeddings.


Most vector stores treat metadata associated with embeddings like NoSQL, unstructured data. Supabase is flexible enough to store unstructured and structured metadata.

Structured


_11
create table docs (
_11
id uuid primary key,
_11
embedding vector(3),
_11
content text,
_11
url text
_11
);
_11
_11
insert into docs
_11
(id, embedding, content, url)
_11
values
_11
('79409372-7556-4ccc-ab8f-5786a6cfa4f7', array[0.1, 0.2, 0.3], 'Hello world', '/hello-world');

Notice that we've associated two pieces of metadata, content and url, with the embedding. Those fields can be filtered, constrained, indexed, and generally operated on using the full power of SQL. Structured metadata fits naturally with a traditional Supabase application, and can be managed via database migrations.

Unstructured


_14
create table docs (
_14
id uuid primary key,
_14
embedding vector(3),
_14
meta jsonb
_14
);
_14
_14
insert into docs
_14
(id, embedding, meta)
_14
values
_14
(
_14
'79409372-7556-4ccc-ab8f-5786a6cfa4f7',
_14
array[0.1, 0.2, 0.3],
_14
'{"content": "Hello world", "url": "/hello-world"}'
_14
);

An unstructured approach does not specify the metadata fields that are expected. It stores all metadata in a flexible json/jsonb column. The tradeoff is that the querying/filtering capabilities of a schemaless data type are less flexible than when each field has a dedicated column. It also pushes the burden of metadata data integrity onto application code, which is more error prone than enforcing constraints in the database.

The unstructured approach is recommended:

  • for ephemeral/interactive workloads e.g. data science or scientific research
  • when metadata fields are user-defined or unknown
  • during rapid prototyping

Client libraries like python's vecs use this structure. For example, running:


_14
#!/usr/bin/env python3
_14
import vecs
_14
_14
# In practice, do not hard-code your password. Use environment variables.
_14
DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"
_14
_14
# create vector store client
_14
vx = vecs.create_client(DB_CONNECTION)
_14
_14
docs = vx.get_or_create_collection(name="docs", dimension=1536)
_14
_14
docs.upsert(vectors=[
_14
('79409372-7556-4ccc-ab8f-5786a6cfa4f7', [100, 200, 300], { url: '/hello-world' })
_14
])

automatically creates the unstructured SQL table during the call to get_or_create_collection.

Note that when working with client libraries that emit SQL DDL, like create table ..., you should add that SQL to your migrations when moving to production to maintain a single source of truth for your database's schema.

Hybrid

The structured metadata style is recommended when the fields being tracked are known in advance. If you have a combination of known and unknown metadata fields, you can accommodate the unknown fields by adding a json/jsonb column to the table. In that situation, known fields should continue to use dedicated columns for best query performance and throughput.


_18
create table docs (
_18
id uuid primary key,
_18
embedding vector(3),
_18
content text,
_18
url string,
_18
meta jsonb
_18
);
_18
_18
insert into docs
_18
(id, embedding, content, url, meta)
_18
values
_18
(
_18
'79409372-7556-4ccc-ab8f-5786a6cfa4f7',
_18
array[0.1, 0.2, 0.3],
_18
'Hello world',
_18
'/hello-world',
_18
'{"key": "value"}'
_18
);

Choosing the right model

Both approaches create a table where you can store your embeddings and some metadata. You should choose the best approach for your use-case. In summary:

  • Structured metadata is best when fields are known in advance or query patterns are predictable e.g. a production Supabase application
  • Unstructured metadata is best when fields are unknown/user-defined or when working with data interactively e.g. exploratory research

Both approaches are valid, and the one you should choose depends on your use-case.