GitXplorerGitXplorer
e

dbml-to-fides

public
7 stars
0 forks
0 issues

Commits

List of commits on branch main.
Verified
f63a31ea536f2b75e2bfc5d0a69fa2d209801c96

versioneer -> setuptools_scm

eewdurbin committed 2 years ago
Verified
a862fed507a73bf27d832d6a599e0d34c9de8bc4

unpin typing_extensions

eewdurbin committed 2 years ago
Verified
d41348ed9aaaddaba6a6825fb1f193027ee4577e

Signed commit

eewdurbin committed 2 years ago
Unverified
a7a844c139845020932337d0a0b64ec498f10e04

test coverage for table notes

eewdurbin committed 2 years ago
Unverified
1990f52a0f48eb14e953050cb91bde5e63a5dab2

format with black

eewdurbin committed 2 years ago
Unverified
00225bc911d664acbf5534f6073c8c52ee80a1f6

set version of pydbml

eewdurbin committed 2 years ago

README

The README file for this repository.

dbml-to-fides

This tool converts DBML to Fides dataset manifests.

It optionally has support for merging the result from DBML into an existing Fides dataset manifest.

Combined, this can be used in automation to ensure that datasets are kept up-to-date with the latest schema changes in continuous integration.

Usage

Basic

Given a sample DBML in sample.dbml:

Table users {
  id integer [primary key]
  username varchar
  role varchar
  created_at timestamp
}

Table posts {
  id integer [primary key]
  title varchar
  body text [note: 'Content of the post']
  user_id integer
  status post_status
  created_at timestamp
}

Enum post_status {
  draft
  published
  private [note: 'visible via URL only']
}

Ref: posts.user_id > users.id // many-to-one

dbml-to-fides will output what it can infer from the DBML file as a Fides dataset:

$ dbml-to-fides sample.dbml
dataset:
- name: public
  collections:
  - name: users
    description: Users
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: username
    - name: role
    - name: created_at
  - name: posts
    description: All the content you crave
    fields:
    - name: id
      fides_meta:
        primary_key: true
    - name: title
    - name: body
      description: Content of the post
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
    - name: created_at

Merging with existing Fides dataset

If you have an existing Fides dataset in .fides/sample_dataset.yml:

dataset:
- fides_key: sample_dataset
  organization_fides_key: default_organization
  name: public
  description: Sample dataset for my system
  meta: null
  data_categories: []
  data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  retention: 30 days after account deletion
  collections:
  - name: users
    description: User information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: User's unique ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: username
      description: User's username
      data_categories:
      - user.name
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
      retention: Account termination
    - name: role
      description: User's system level role/privilege
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: User's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
  - name: posts
    description: Post information
    fields:
    - name: id
      fides_meta:
        primary_key: true
      description: Post's unique ID
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: title
      description: Post's title
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: body
      description: Post's body
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: user_id
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
      description: Post creator's unique User ID
      data_categories:
      - user.unique_id
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: status
      description: User's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
    - name: created_at
      description: Post's creation timestamp
      data_categories:
      - system.operations
      data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified

dbml-to-fides can be used with the --base-dataset option to merge the results together. But, in this case there are no differences:

$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
$

If we introduce a change to the DBML:

@@ -3,6 +3,7 @@ Table users {
   username varchar
   role varchar
   created_at timestamp
+  social_security_number varchar
 }
 
 Table posts {

Then running our diff again will add the field to our Fides dataset:

$ diff -u .fides/sample_dataset.yml <(dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml)
--- .fides/sample_dataset.yml	2023-05-22 15:39:24
+++ /dev/fd/63	2023-05-22 15:40:07
@@ -34,6 +34,7 @@
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:

File output

If we wanted to write the output to a file, we would add the --output-file flag:

$ dbml-to-fides sample.dbml --base-dataset .fides/sample_dataset.yml --output-file .fides/sample_dataset.yml
$ git diff
diff --git a/.fides/sample_dataset.yml b/.fides/sample_dataset.yml
index 594cee4..edc3141 100644
--- a/.fides/sample_dataset.yml
+++ b/.fides/sample_dataset.yml
@@ -34,6 +34,7 @@ dataset:
       data_categories:
       - system.operations
       data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
+    - name: social_security_number
   - name: posts
     description: Post information
     fields:

Initial generation

If you do not have an existing Fides dataset, the --include-fides-keys flag will create a more "fleshed out" version of a Fides dataset including all keys. See the docs for what each field can/should be populated with.

$ dbml-to-fides sample.dbml --include-fides-keys
dataset:
- fides_key: null
  name: public
  description: null
  organization_fides_key: null
  meta: {}
  third_country_transfers: []
  joint_controller: []
  retention: null
  data_categories: []
  data_qualifiers: []
  collections:
  - name: users
    description: Users
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: username
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: role
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: social
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
  - name: posts
    description: All the content you crave
    data_categories: []
    data_qualifiers: []
    retention: null
    fields:
    - name: id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        primary_key: true
    - name: title
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: body
      description: Content of the post
      data_categories: []
      data_qualifier: null
      retention: null
    - name: user_id
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
      fides_meta:
        references:
        - dataset: public
          field: users.id
          direction: to
    - name: status
      description: null
      data_categories: []
      data_qualifier: null
      retention: null
    - name: created_at
      description: null
      data_categories: []
      data_qualifier: null
      retention: null