databrickslabs
Labs projects to accelerate use cases on the Databricks Unified Analytics Platform
Repositories
dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
dbx
🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.
dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for tests, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
mosaic
An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
overwatch
Capture deep metrics on one or all assets within a Databricks workspace
ucx
Automated migrations to Unity Catalog
cicd-templates
Manage your Databricks deployments and CI with code.
automl-toolkit
Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
migrate
Old scripts for one-off ST-to-E2 migrations. Use "terraform exporter" linked in the readme.
dlt-meta
Metadata-driven Databricks Delta Live Tables framework for bronze/silver pipelines
dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation