Orion

Commits

List of commits on branch main.
Verified
55e2984ef39990eb8ea0abc1983d0c7728a71506

Handling for console and Neo4j visitors

mmanojkumarvohra committed 3 years ago
Verified
8386450ad4b40d95474749668d7f0fd5488453b8

initial release

mmanojkumarvohra committed 3 years ago
Verified
d1dd6fa9478f5a356ee046ff197e34d4884ea479

Update .gitignore

mmanojkumarvohra committed 3 years ago
Verified
05efa2a635abdd3108824178c31fb0fa80b44d6e

Added Neo4j lineage configuration

mmanojkumarvohra committed 3 years ago
Verified
24d714514dca7db3e04bb843ea066f002bf54bad

Icon Image

mmanojkumarvohra committed 3 years ago
Unverified
c7da0fd290e3acf40228e398486c6e1b4ad6c3ac

First release

mmanojkumarvohra committed 3 years ago

README

Orion

Configurable data lineage solution for Apache Spark

About Project

The project aims to provide a pluggable solution for Apache Spark applications that captures the lineage of the data assets sourced and created in the application. It supports both the DataFrame API and raw queries submitted through the Spark SQL API.

Event Handling

The captured lineage events can be forwarded to a variety of supported backends.

The supported backends are:

  • Console: Logs the captured events; intended primarily for debugging.
  • Neo4j: Pushes the lineage events to Neo4j and stores the lineage as a Neo4j graph.
  • HTTP: Pushes the events to an API server.
  • Kafka: Pushes the events to a Kafka topic.

Configuration

Orion can be plugged into any Spark application with the configuration below, which can be set either at the cluster level or in an individual application's Spark conf:

spark.sql.queryExecutionListeners=com.mkv.ds.orion.core.listeners.OrionQueryInterceptor

  • The Console backend is selected with the Spark configuration: spark.orion.lineage.backend.handler=Console
  • The Neo4j backend is selected with the Spark configuration: spark.orion.lineage.backend.handler=Neo4J
  • The HTTP backend is selected with the Spark configuration: spark.orion.lineage.backend.handler=HTTP
  • The Kafka backend is selected with the Spark configuration: spark.orion.lineage.backend.handler=KAFKA

[Kafka & HTTP backend implementations to be added in a future release]
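
As an illustration of the per-application route, the following minimal sketch assembles a spark-submit command line that carries the listener and handler settings listed above. The property names and the listener class come from this README; the helper name orion_submit_args, the jar name orion.jar, and the application file my_job.py are hypothetical placeholders, not part of Orion.

```python
# Sketch: build spark-submit arguments that enable Orion for one application.
# Property names and the listener class are taken from the README;
# the helper name, jar name and application file are hypothetical.

ORION_LISTENER = "com.mkv.ds.orion.core.listeners.OrionQueryInterceptor"


def orion_submit_args(handler, app, jars=None):
    """Return a spark-submit argument list with the Orion settings attached."""
    conf = {
        "spark.sql.queryExecutionListeners": ORION_LISTENER,
        "spark.orion.lineage.backend.handler": handler,
    }
    args = ["spark-submit"]
    if jars:
        # --jars takes a comma-separated list of jar paths.
        args += ["--jars", ",".join(jars)]
    for key, value in conf.items():
        # Each property is passed as its own --conf key=value pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    print(" ".join(orion_submit_args("Console", "my_job.py", jars=["orion.jar"])))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting. The same two properties could instead be placed in the cluster's default Spark configuration to apply them to every application.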

Neo4j Configuration

To push events to a Neo4j database, the following configurations have to be set:

spark.orion.lineage.backend.handler="NEO4J"
spark.orion.lineage.neo4j.backend.uri="bolt://localhost:7687"
spark.orion.lineage.neo4j.backend.username="neo4j"
spark.orion.lineage.neo4j.backend.password="password"
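
A minimal sketch of how these four properties might be assembled in code, reading the connection details from the environment so the password stays out of source control. The property names are the ones listed above; the function name neo4j_lineage_conf and the ORION_NEO4J_* environment variable names are assumptions made for illustration. The handler value follows the "NEO4J" spelling used in this section; the README also spells it "Neo4J", and whether the value is case-sensitive is not stated.

```python
import os

# Sketch: collect the Orion Neo4j settings as a dict of Spark properties.
# Property names come from the README; the function name and the
# ORION_NEO4J_* environment variables are hypothetical.


def neo4j_lineage_conf(uri, username, password):
    """Return the Orion Neo4j backend settings keyed by Spark property name."""
    return {
        "spark.orion.lineage.backend.handler": "NEO4J",
        "spark.orion.lineage.neo4j.backend.uri": uri,
        "spark.orion.lineage.neo4j.backend.username": username,
        "spark.orion.lineage.neo4j.backend.password": password,
    }


if __name__ == "__main__":
    conf = neo4j_lineage_conf(
        uri=os.environ.get("ORION_NEO4J_URI", "bolt://localhost:7687"),
        username=os.environ.get("ORION_NEO4J_USER", "neo4j"),
        password=os.environ.get("ORION_NEO4J_PASSWORD", "password"),
    )
    for key, value in conf.items():
        # Mask the password when echoing the settings.
        shown = "***" if key.endswith(".password") else value
        print(f"{key}={shown}")
```

In a PySpark application, the resulting pairs could be handed to SparkConf().setAll(conf.items()) before the session is created, together with the spark.sql.queryExecutionListeners setting from the Configuration section.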

Running the OrionTestDriver class with the Neo4j backend captures the lineage below in Neo4j: