GitXplorer

databrickslabs

Labs projects to accelerate use cases on the Databricks Unified Analytics Platform

36 repositories
1059 followers

Repositories

public

dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Python
10809
1153
5
Updated 10 hours ago
public

dbx

🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.

Python
439
120
96
Updated 2 days ago
public

dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.

Python
311
59
30
Updated 11 hours ago
public

tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation

Jupyter Notebook
306
51
30
Updated 7 days ago
public

mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.

Jupyter Notebook
269
66
73
Updated a month ago
public

overwatch

Capture deep metrics on one or all assets within a Databricks workspace

Scala
226
64
160
Updated 7 days ago
public

ucx

Automated migrations to Unity Catalog

Python
219
75
132
Updated 8 hours ago
public

cicd-templates

Manage your Databricks deployments and CI with code.

Python
202
100
4
Updated 16 days ago
public

automl-toolkit

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

HTML
191
43
10
Updated a month ago
public

migrate

Old scripts for one-off ST-to-E2 migrations. Use "terraform exporter" linked in the readme.

Python
184
127
51
Updated a day ago
public

dlt-meta

Metadata-driven Databricks Delta Live Tables framework for bronze/silver pipelines

Python
143
60
14
Updated a day ago
public

dataframe-rules-engine

Extensible rules engine for custom DataFrame / Dataset validation

Scala
134
30
12
Updated 3 months ago