GitXplorerGitXplorer
e

dbml-to-fides

public
7 stars
0 forks
0 issues

Commits

List of commits on branch main.
Unverified
54586785681b52c2982a9925c9e61c78a0062f42

make the key ordering with `--include-fides-keys` more friendly

eewdurbin committed 2 years ago
Unverified
ba6c4b729d948dd02ce0d6e9a890cc0256215f76

update README

eewdurbin committed 2 years ago
Unverified
93d0ee660d21576713ba49513e2d78d8d11e7aed

temporarily guard from https://github.com/pydantic/pydantic/issues/545

eewdurbin committed 2 years ago
Unverified
214e5868f67957b3785cd2d91ae871001e00b7df

include support for column notes

eewdurbin committed 2 years ago
Unverified
0a9e74f6ebbf8e3583b0f108849744bd93aee1d4

add option to include the full fides dataset keyspace

eewdurbin committed 2 years ago
Unverified
d728567ebbc84452fa9d5a8491a286d4a0fd115f

tidy up documentation

eewdurbin committed 2 years ago

README

The README file for this repository.

dbml-to-fides

This tool converts DBML to Fides dataset manifests.

It optionally has support for merging the result from DBML into an existing Fides dataset manifest.

Combined, this can be used in automation to ensure that datasets are kept up-to-date with the latest schema changes in continuous integration.

Usage

Basic

Given a sample DBML in sample.dbml:

Table users {
  id integer [primary key]
  username varchar
  role varchar
  created_at timestamp
}

Table posts {
  id integer [primary key]
  title varchar
  body text [note: 'Content of the post']
  user_id integer
  status post_status
  created_at timestamp
}

Enum post_status {
  draft
  published
  private [note: 'visible via URL only']
}

Ref: posts.user_id > users.id // many-to-one

dbml-to-fides will output what it can infer from the DBML file as a Fides dataset:

$ dbml-to-fides sample.dbml
dataset:
- name: public
  collections:
  - name: users
    description: Users
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: username
    - name: role
    - name: created_at
  - name: posts
    description: All the content you crave
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: title
    - name: body
      description: Content of the post
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
    - name: created_at

Merging with existing Fides dataset

If you have an existing Fides dataset in .fides/sample_dataset.yml:

dataset:
- fides_key: sample_dataset
  organization_fides_key: default_organization
  name: public
  description: Sample dataset for my system
  meta: null
  data_categories: []
  data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  retention: 30 days after account deletion
  collections:
  - name: users
    description: User information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: User's unique ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: username
      description: User's username
      data_categories:
      - user.name
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
      retention: Account termination
    - name: role
      description: User's system level role/privilege
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: User's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  - name: posts
    description: Post information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: Post's unique ID
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: title
      description: Post's title
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: body
      description: Post's body
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
      description: Post creator's unique User ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: status
      description: User's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: Post's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified

dbml-to-fides can be used with the --base-dataset option to merge the results together. But, in this case there are no differences:

$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
$

If we introduce a change to the DBML:

@@ -3,6 +3,7 @@ Table users {
   username varchar
   role varchar
   created_at timestamp
+  social_security_number varchar
 }
 
 Table posts {

Then running our diff again will add the field to our Fides dataset:

$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
--- .fides/sample_dataset.yml	2023-05-22 15:39:24
+++ /dev/fd/63	2023-05-22 15:40:07
@@ -34,6 +34,7 @@
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:

File output

If we wanted to write the output to a file, we would add the --output-file flag:

$ dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml --output-file .fides/sample_dataset.yml
$ git diff
diff --git a/.fides/sample_dataset.yml b/.fides/sample_dataset.yml
index 594cee4..edc3141 100644
--- a/.fides/sample_dataset.yml
+++ b/.fides/sample_dataset.yml
@@ -34,6 +34,7 @@ dataset:
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:

Initial generation

If you do not have an existing Fides dataset, the --include-fides-keys flag will create a more "fleshed out" version of a Fides dataset including all keys. See the docs for what each field can/should be populated with.

$ dbml-to-fides sample.dbml --include-fides-keys
dataset:
- fides_key: null
  name: public
  description: null
  organization_fides_key: null
  meta: {}
  third_country_transfers: []
  joint_controller: []
  retention: null
  data_categories: []
  data_qualifiers: []
  collections:
  - name: users
    description: Users
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: username
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: role
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: social
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
  - name: posts
    description: All the content you crave
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: title
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: body
      description: Content of the post
      data_categories: []
      data_qualifier: null
      retention: null
    - name: user_id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null