pgcollection is a memory optimized data type for PostgreSQL. The primary usage is a high performance data structure inside of plpglsql functions. Like other PostgreSQL data types, a collection can be a column of a table, but there are no operators.
A collection is a set of key-value pairs. Each key is a unique string of type
text
. Entries are stored in creation order. A collection can hold an
unlimited number of elements, constrained by the memory available to the
database. A collection is stored as a PostgreSQL varlena
limiting the
maximum size to 1GB if the structure was persisted to a column in a table.
The value of an element can be any PostgreSQL type including composite types
with a default of type text
. All elements in a collection must be of the
same type.
NOTE: pgcollection is currently in beta and not recommended for production use. Please test and report any issues or concerns so they can be addressed.
DO
$$
DECLARE
t_capital collection;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
t_capital := first(t_capital);
WHILE NOT isnull(t_capital) LOOP
RAISE NOTICE 'The capital of % is %', key(t_capital), value(t_capital);
t_capital := next(t_capital);
END LOOP;
END
$$;
Compile and install the extension (supports PostgreSQL 14+)
cd /tmp
git clone https://github.com/aws/pgcollection.git
cd pgcollection
make
make install
Enable the extension (do this once in each database where you want to use it)
CREATE EXTENSION collection;
Function Name | Return Type | Description |
---|---|---|
add(collection, text, text) | collection | Adds an text item to a collection |
add(collection, text, anyelement) | collection | Adds an anyelement item to a collection |
count(collection) | int4 | Returns the number of items in a collection |
delete(collection, text) | collection | Deletes an item from a collection |
find(collection, text) | text | Returns a text item from a collection if it exists |
find(collection, text, anyelement) | anyelement | Returns an anyelement item from a collection if it exists |
first(collection) | collection | Sets the collection iterator to the first item |
next(collection) | collection | Sets the collection iterator to the next item |
prev(collection) | collection | Stes the collection iterator to the previous item |
copy(collection) | collection | Returns a copy of a collection without a context switch |
sort(collection) | collection | Sorts a collection by the keys in the default collation order and points to the first item |
isnull(collection) | bool | Returns true if the current location of the iterator is null |
key(collection) | text | Returns the key of the item the collection is pointed at |
value(collection) | text | Returns the value as text of the item the collection is pointed at |
value(collection, anyelement) | anyelement | Returns the value as anyelement of the item the collection is pointed at |
keys_to_table(collection) | SETOF text | Returns all of the keys in the collection |
values_to_table(collection) | SETOF text | Returns all of the values as text in the collection |
values_to_table(collection, anyelement) | SETOF anyelement | Returns all of the values as anyelement in the collection |
to_table(collection) | TABLE(text, text) | Returns all of the keys and values as text in the collection |
to_table(collection, anyelement) | TABLE(text, anyelement) | Returns all of the keys and values as anyelement in the collection |
In addition to these functions, collections can be subscripted, allowing them
to act like associative arrays. Only a single subscript of type text
can be
specified. It is interpreted as a key and the corresponding value is fetched
or assigned.
DO
$$
DECLARE
t_capital collection;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
RAISE NOTICE 'The capital of USA is %', t_capital['USA'];
END
$$;
If the subscript is null
, the current element will be fetched.
DO
$$
DECLARE
t_capital collection;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
RAISE NOTICE 'The current capital is %', t_capital[null];
END
$$;
The default type of a collection's element is text
, however it can contain
any valid PostgreSQL type. The type can be set in two ways. The first is by
explicitly setting it as the type modifier when declaring the collection. If
no type modifier is defined, the type of the first element added to the
collection will define the type of the collection.
DO
$$
DECLARE
c1 collection('date');
BEGIN
c1['k1'] := '1999-12-31';
RAISE NOTICE 'The next value of c1 is %', c1['k1'] + 1;
END
$$;
In addition to finding specific elements in a collection, a collection can be a traversed by iterating over it. When a collection is initially defined, the position of the implicit iterator is pointing at the first element added to the collection.
The key
and value
functions act on the current position of the iterator and
will return the current key and value respectively. Using a null
subscript will
have the same results of calling the value
function.
DO
$$
DECLARE
t_capital collection;
r record;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;
Before starting to iterate over a collection, it is good practice the ensure
that the iterator is positioned at the start of the collection. The first
function will return a reference to the collection at the first element. If
there is a need to iterate over a collection sorted by the value of the keys
instead of the order the elements were added, the sort
function will sort
the collection and return a reference to the first sorted element.
DO
$$
DECLARE
t_capital collection;
r record;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
t_capital := sort(t_capital);
RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;
When iterating over a collection, control may be needed on how to move over
the collection. The next
function will move the reference to the collection
to the next element in either added order or sorted order depending on if
the sort
function was called. The prev
function will move the reference
to the previous element in the collection.
DO
$$
DECLARE
t_capital collection;
r record;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
t_capital := next(t_capital);
RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;
While iterating over a collection, an indicator is needed for when the end of
the collection is reached. The isnull
function will return a true value once
the iterator reaches the end of the collection.
DO
$$
DECLARE
t_capital collection;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
t_capital := sort(t_capital);
WHILE NOT isnull(t_capital) LOOP
RAISE NOTICE 'The current element is %', key(t_capital);
t_capital := next(t_capital);
END LOOP;
END
$$;
In addition to the iterator functions, the entirety of the collection can be leveraged through the use of set returning functions. There are three variations that will return a list of keys, values or both in a result set.
DO
$$
DECLARE
t_capital collection;
r record;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
FOR r IN SELECT * FROM keys_to_table(t_capital) AS k
LOOP
RAISE NOTICE 'The current element is %', r.k;
END LOOP;
END
$$;
A common practice is to load data into a collection at the start of a function
so it can be repeatedly used without needing to access the source tables
again. For the best performance, use the plpgsql FOR
construct with the
query
defined in the loop. This will create an implicit cursor and utilize prefetching
on the query results.
DO
$$
DECLARE
r pg_tablespace%ROWTYPE;
c collection('pg_tablespace');
BEGIN
FOR r IN SELECT pg_tablespace.*
FROM pg_tablespace
LOOP
c[r.spcname] = r;
END LOOP;
RAISE NOTICE 'The owner of pg_default is %', c['pg_default'].spcowner::regrole;
END
$$;
Data contained in a collection can be used to perform DML operations for each element in the collection. While the collection can be iterated and the DML performed inside of a loop, that pattern can be a performance bottleneck for large collections. A more efficient method is to use the set returning functions as part of the DML. The eliminates the context switching between the function context and the SQL engine beyond the initial statement call.
DO
$$
DECLARE
t_capital collection;
r record;
BEGIN
t_capital['USA'] := 'Washington, D.C.';
t_capital['United Kingdom'] := 'London';
t_capital['Japan'] := 'Tokyo';
UPDATE countries
SET capital = col.value
FROM to_table(t_capital) AS col
WHERE countries.name = col.key;
END
$$;
pgcollection
is a performance feature so having observaility into how things
are operating can assist in finding bottlenecks or wider performance problems.
PostgreSQL 17 introduced custom wait events enabling extensions to define
events specific to the extension. pgcollection
introduces a number of
custom wait events allowing detailed monitoring. Prior to PostgreSQL 17,
all of the wait events are rolled up into the common Extension:extension
wait event.
Wait Event | Description |
---|---|
CollectionCalculatingFlatSize | Calculating the size of the flat collection format before a context switch |
CollectionFlatten | Converting a collection to the flat format to move to a new context |
CollectionExpand | Expanding a flat collection to an optimized expanded format |
CollectionCast | Casting a collection to or from a typemod specified collection |
CollectionAdd | Adding an item to a collection |
CollectionCount | Returning the number of items in a collection |
CollectionFind | Finding an item in a collection |
CollectionDelete | Deleting an item from a collection |
CollectionSort | Sorting a collection |
CollectionCopy | Copying a collection |
CollectionValue | Returning the value at the current location of a collection |
CollectionToTable | Converting a collection to a table format |
CollectionFetch | Fetching an item in a collection using a subscript |
CollectionAssign | Assigning a new item to a collection using a subscript |
CollectionInput | Converting a collection from a string format to an optimized expanded format |
CollectionOutput | Converting a collection to a string format for output |
The collection_stats
view provides the following fields for a session:
fieldname | description |
---|---|
add | The number of adds or assigns to a collection for a session |
context_switch | The number of times a collection switches to a different memory context |
delete | The number of deletes from a collection for a session |
find | The number of finds or fetches from a collection for a session |
sort | The number of collection sorts for a session |
The collection_stats_reset()
function removes all stored statistics for the session
We welcome and encourage contributions to pgcollection
!
See our contribution guide for more information on how to report issues, set up a development environment, and submit code.
We adhere to the Amazon Open Source Code of Conduct.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.
pgcollection makes use of the following open source project: