GitXplorerGitXplorer
a

pgcollection

public
1 stars
0 forks
0 issues

Commits

List of commits on branch main.
Unverified
54f5eeca8ab4b05ab7bc80d124c037d0fa14cc53

Add an exludes file for a pgindent run to account for the change in error handling in pgindent

jjim-mlodgenski committed 8 days ago
Unverified
c18a0bdbef1267f9121431317061c6274164133d

Clean up after running pgindent

jjim-mlodgenski committed 2 months ago
Unverified
70c6c523f404c5b65b44bf7cf29fa0ab52e86a73

Fix path issue in the test workflow

jjim-mlodgenski committed 2 months ago
Unverified
78371851a032d5a2809484fea2b284bf6db4635b

Add a test workflow

jjim-mlodgenski committed 2 months ago
Verified
19e94c6e0b83308ec9f2aacd755fa1fef2748c42

Merge pull request #3 from sharmay/MiscChanges

jjim-mlodgenski committed 2 months ago
Verified
bbe5a0ac7a99a4330969fcbf72f322eef785213c

Merge pull request #2 from sharmay/FixGcc14Compile

jjim-mlodgenski committed 2 months ago

README

The README file for this repository.

pgcollection

pgcollection is a memory optimized data type for PostgreSQL. The primary usage is a high performance data structure inside of plpglsql functions. Like other PostgreSQL data types, a collection can be a column of a table, but there are no operators.

A collection is a set of key-value pairs. Each key is a unique string of type text. Entries are stored in creation order. A collection can hold an unlimited number of elements, constrained by the memory available to the database. A collection is stored as a PostgreSQL varlena limiting the maximum size to 1GB if the structure was persisted to a column in a table.

The value of an element can be any PostgreSQL type including composite types with a default of type text. All elements in a collection must be of the same type.

NOTE: pgcollection is currently in beta and not recommended for production use. Please test and report any issues or concerns so they can be addressed.

DO
$$
DECLARE
  t_capital  collection;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  t_capital := first(t_capital);
  WHILE NOT isnull(t_capital) LOOP
    RAISE NOTICE 'The capital of % is %', key(t_capital), value(t_capital);
    t_capital := next(t_capital);
  END LOOP;
END
$$;

Installation

Linux and Mac

Compile and install the extension (supports PostgreSQL 14+)

cd /tmp
git clone https://github.com/aws/pgcollection.git
cd pgcollection
make
make install

Getting Started

Enable the extension (do this once in each database where you want to use it)

CREATE EXTENSION collection;

List of Functions

Function Name Return Type Description
add(collection, text, text) collection Adds an text item to a collection
add(collection, text, anyelement) collection Adds an anyelement item to a collection
count(collection) int4 Returns the number of items in a collection
delete(collection, text) collection Deletes an item from a collection
find(collection, text) text Returns a text item from a collection if it exists
find(collection, text, anyelement) anyelement Returns an anyelement item from a collection if it exists
first(collection) collection Sets the collection iterator to the first item
next(collection) collection Sets the collection iterator to the next item
prev(collection) collection Stes the collection iterator to the previous item
copy(collection) collection Returns a copy of a collection without a context switch
sort(collection) collection Sorts a collection by the keys in the default collation order and points to the first item
isnull(collection) bool Returns true if the current location of the iterator is null
key(collection) text Returns the key of the item the collection is pointed at
value(collection) text Returns the value as text of the item the collection is pointed at
value(collection, anyelement) anyelement Returns the value as anyelement of the item the collection is pointed at
keys_to_table(collection) SETOF text Returns all of the keys in the collection
values_to_table(collection) SETOF text Returns all of the values as text in the collection
values_to_table(collection, anyelement) SETOF anyelement Returns all of the values as anyelement in the collection
to_table(collection) TABLE(text, text) Returns all of the keys and values as text in the collection
to_table(collection, anyelement) TABLE(text, anyelement) Returns all of the keys and values as anyelement in the collection

Using Subscripts

In addition to these functions, collections can be subscripted, allowing them to act like associative arrays. Only a single subscript of type text can be specified. It is interpreted as a key and the corresponding value is fetched or assigned.

DO
$$
DECLARE
  t_capital  collection;
BEGIN
  t_capital['USA'] := 'Washington, D.C.';

  RAISE NOTICE 'The capital of USA is %', t_capital['USA'];
END
$$;

If the subscript is null, the current element will be fetched.

DO
$$
DECLARE
  t_capital  collection;
BEGIN
  t_capital['USA'] := 'Washington, D.C.';

  RAISE NOTICE 'The current capital is %', t_capital[null];
END
$$;

Setting The Element Type

The default type of a collection's element is text, however it can contain any valid PostgreSQL type. The type can be set in two ways. The first is by explicitly setting it as the type modifier when declaring the collection. If no type modifier is defined, the type of the first element added to the collection will define the type of the collection.

DO 
$$
DECLARE
  c1   collection('date');
BEGIN
  c1['k1'] := '1999-12-31';

  RAISE NOTICE 'The next value of c1 is %', c1['k1'] + 1;
END
$$;

Iterating Over a Collection

In addition to finding specific elements in a collection, a collection can be a traversed by iterating over it. When a collection is initially defined, the position of the implicit iterator is pointing at the first element added to the collection.

Key and Value Functions

The key and value functions act on the current position of the iterator and will return the current key and value respectively. Using a null subscript will have the same results of calling the value function.

DO
$$
DECLARE
  t_capital  collection;
  r          record;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;

First and Sort Functions

Before starting to iterate over a collection, it is good practice the ensure that the iterator is positioned at the start of the collection. The first function will return a reference to the collection at the first element. If there is a need to iterate over a collection sorted by the value of the keys instead of the order the elements were added, the sort function will sort the collection and return a reference to the first sorted element.

DO
$$
DECLARE
  t_capital  collection;
  r          record;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  t_capital := sort(t_capital);
  RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;

Next and Prev Functions

When iterating over a collection, control may be needed on how to move over the collection. The next function will move the reference to the collection to the next element in either added order or sorted order depending on if the sort function was called. The prev function will move the reference to the previous element in the collection.

DO
$$
DECLARE
  t_capital  collection;
  r          record;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  t_capital := next(t_capital);
  RAISE NOTICE 'The current element is %', key(t_capital);
END
$$;

Isnull Function

While iterating over a collection, an indicator is needed for when the end of the collection is reached. The isnull function will return a true value once the iterator reaches the end of the collection.

DO
$$
DECLARE
  t_capital  collection;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  t_capital := sort(t_capital);
  WHILE NOT isnull(t_capital) LOOP
    RAISE NOTICE 'The current element is %', key(t_capital);
    t_capital := next(t_capital);
  END LOOP;
END
$$;

Set Returning Functions

In addition to the iterator functions, the entirety of the collection can be leveraged through the use of set returning functions. There are three variations that will return a list of keys, values or both in a result set.

DO
$$
DECLARE
  t_capital  collection;
  r          record;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  FOR r IN SELECT * FROM keys_to_table(t_capital) AS k 
  LOOP
    RAISE NOTICE 'The current element is %', r.k;
  END LOOP;
END
$$;

Bulk Loading Collections

A common practice is to load data into a collection at the start of a function so it can be repeatedly used without needing to access the source tables again. For the best performance, use the plpgsql FOR construct with the query defined in the loop. This will create an implicit cursor and utilize prefetching on the query results.

DO
$$
DECLARE
  r       pg_tablespace%ROWTYPE;
  c       collection('pg_tablespace');
BEGIN
  FOR r IN SELECT pg_tablespace.* 
             FROM pg_tablespace 
  LOOP
    c[r.spcname] = r;
  END LOOP;

  RAISE NOTICE 'The owner of pg_default is %', c['pg_default'].spcowner::regrole;
END
$$;

Bulk DML Operations

Data contained in a collection can be used to perform DML operations for each element in the collection. While the collection can be iterated and the DML performed inside of a loop, that pattern can be a performance bottleneck for large collections. A more efficient method is to use the set returning functions as part of the DML. The eliminates the context switching between the function context and the SQL engine beyond the initial statement call.

DO
$$
DECLARE
  t_capital  collection;
  r          record;
BEGIN
  t_capital['USA']            := 'Washington, D.C.';
  t_capital['United Kingdom'] := 'London';
  t_capital['Japan']          := 'Tokyo';

  UPDATE countries
     SET capital = col.value
    FROM to_table(t_capital) AS col
   WHERE countries.name = col.key;
END
$$;

Observability

pgcollection is a performance feature so having observaility into how things are operating can assist in finding bottlenecks or wider performance problems.

Wait Events

PostgreSQL 17 introduced custom wait events enabling extensions to define events specific to the extension. pgcollection introduces a number of custom wait events allowing detailed monitoring. Prior to PostgreSQL 17, all of the wait events are rolled up into the common Extension:extension wait event.

Wait Event Description
CollectionCalculatingFlatSize Calculating the size of the flat collection format before a context switch
CollectionFlatten Converting a collection to the flat format to move to a new context
CollectionExpand Expanding a flat collection to an optimized expanded format
CollectionCast Casting a collection to or from a typemod specified collection
CollectionAdd Adding an item to a collection
CollectionCount Returning the number of items in a collection
CollectionFind Finding an item in a collection
CollectionDelete Deleting an item from a collection
CollectionSort Sorting a collection
CollectionCopy Copying a collection
CollectionValue Returning the value at the current location of a collection
CollectionToTable Converting a collection to a table format
CollectionFetch Fetching an item in a collection using a subscript
CollectionAssign Assigning a new item to a collection using a subscript
CollectionInput Converting a collection from a string format to an optimized expanded format
CollectionOutput Converting a collection to a string format for output

Usage Statistics

The collection_stats view provides the following fields for a session:

fieldname description
add The number of adds or assigns to a collection for a session
context_switch The number of times a collection switches to a different memory context
delete The number of deletes from a collection for a session
find The number of finds or fetches from a collection for a session
sort The number of collection sorts for a session

The collection_stats_reset() function removes all stored statistics for the session

Contributing

We welcome and encourage contributions to pgcollection!

See our contribution guide for more information on how to report issues, set up a development environment, and submit code.

We adhere to the Amazon Open Source Code of Conduct.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgements

pgcollection makes use of the following open source project: