
rodimus


Commits

List of commits on branch develop.
a2f77b240cea3ba979b089b715fe8542e2304211

Merge pull request #5 from nevern02/updates

oopenmailbox committed 9 years ago
5037d21185c4dc2d14843f1301f2a02c0235e6e6

Version bump to 1.3.1

oopenmailbox committed 9 years ago
ffdc53ea80f158ccaf575f13df46674b4cb03b3c

Bump development gems to new versions.

oopenmailbox committed 9 years ago
213e58426d2848bdf7f005fe4c6c0db902922299

Update dev ruby version and CI ruby versions.

oopenmailbox committed 9 years ago
5fb9a7ec9eab2bad9b0f8cb3be3efaaf81e275d4

Update readme.

oopenmailbox committed 10 years ago
69ab9283fa97d03e90f71b559323aefda3c2f771

Version bump to 1.3

oopenmailbox committed 10 years ago

README

The README file for this repository.

Rodimus


ETL stands for Extract-Transform-Load. Sometimes, you have data in Source A that needs to be moved to Destination B, and it needs to be manipulated along the way. This is a common scenario when working with a data warehouse. There are lots of ETL solutions in the wild, but very few of them are open source, and none of them (that I know of) are written in Ruby. So, I started hacking on one for my own use.

Why the name? Rodimus Prime is one of the leaders of the Autobots, and he has a cool name. Naming a data transformation library after a Transformer increases the coolness factor. It's science.

Installation

Add this line to your application's Gemfile:

gem 'rodimus'

And then execute:

$ bundle

Or install it yourself as:

$ gem install rodimus

Usage

tl;dr: See the examples directory for the quickest path to success.

require 'rodimus'
require 'csv'
require 'json'

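# First step: reads rows from a CSV file and passes each one on as JSON.
class CsvInput < Rodimus::Step
  # Runs before the step starts: use the CSV file as this step's input.
  def before_run_set_incoming
    @incoming = CSV.open('examples/worldbank-sample.csv')
    @incoming.readline # skip the headers
  end

  # Serialize each parsed CSV row (an array) so it can be written downstream.
  def process_row(row)
    row.to_json
  end
end

# Last step: formats each row as a sentence and writes it to standard out.
class FormattedText < Rodimus::Step
  # Runs before the step starts: send this step's output to standard out.
  def before_run_set_stdout
    @outgoing = STDOUT.dup
  end

  def process_row(row)
    data = JSON.parse(row)
    "In #{data.first} during #{data[1]}, CO2 emissions were #{data[2]} metric tons per capita."
  end
end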

t = Rodimus::Transformation.new
s1 = CsvInput.new
s2 = FormattedText.new
t.steps << s1
t.steps << s2
t.run
puts "Transformation complete!"

A transformation is an operation that consists of many steps. Each step may manipulate the data in some way. Typically, the first step is reserved for reading from your data source, and the last step is used to write to the new destination.

In Rodimus, you create a transformation object and add one or more steps to its array of steps. You typically create steps by writing your own classes that inherit from Rodimus::Step. When the transformation is run, a new process is forked for each step; on platforms that support native threads (JRuby, Rubinius), threads are used instead. Adjacent steps are connected by pipes, so only the first and last steps (the source and destination) have an end that is not a pipe. Each step consumes rows of data from its incoming pipe, performs some operation on each row, and writes the result to its outgoing pipe.
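The process-per-step model described above can be sketched in plain Ruby. This is only an illustration of the idea and does not use Rodimus or reflect its internals: run_pipeline, upcase, and exclaim are hypothetical names, rows are assumed to be single-line strings, and the sketch assumes a platform where fork is available (for example MRI on Linux or macOS).

```ruby
# Plain-Ruby sketch of the process-per-step model; it does not use Rodimus.
# run_pipeline and its arguments are hypothetical names for illustration.
# Assumes each row is a single-line string and that fork is available.
def run_pipeline(source_rows, stages)
  reader, writer = IO.pipe
  pids = []

  # Source process: writes each row into the first pipe.
  pids << fork do
    reader.close
    source_rows.each { |row| writer.puts(row) }
    writer.close
    exit!(0) # leave without running at_exit handlers inherited from the parent
  end
  writer.close

  # One process per stage: read from the incoming pipe, write to a new one.
  stages.each do |stage|
    next_reader, next_writer = IO.pipe
    incoming = reader
    pids << fork do
      next_reader.close
      incoming.each_line { |line| next_writer.puts(stage.call(line.chomp)) }
      next_writer.close
      exit!(0)
    end
    # The parent no longer needs these ends; closing them lets EOF propagate.
    incoming.close
    next_writer.close
    reader = next_reader
  end

  # The parent plays the destination and collects whatever reaches the end.
  output = reader.each_line.map(&:chomp)
  reader.close
  pids.each { |pid| Process.wait(pid) }
  output
end

upcase = ->(row) { row.upcase }
exclaim = ->(row) { "#{row}!" }
result = run_pipeline(%w[alpha beta], [upcase, exclaim])
puts result
```

Closing the unused pipe ends in the parent matters here: a reader only sees end-of-file once every copy of the matching write end has been closed.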

There are several methods on the Rodimus::Step class that can be overridden for custom processing behavior before, during, or after each row is handled. If those aren't enough, you're also free to manipulate the input/output objects (e.g. to redirect to standard out).
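Hook names such as before_run_set_incoming in the usage example suggest a naming convention in which a prefix decides when a method is called. The following plain-Ruby sketch shows one way such a convention could work. SketchStep, UpcaseStep, and the exact prefixes are assumptions made for illustration; this is not Rodimus::Step or its actual implementation.

```ruby
# Hypothetical mini step class; this is NOT Rodimus::Step. It only illustrates
# how methods could be picked up as hooks based on a name prefix.
class SketchStep
  attr_reader :log

  def initialize
    @log = []
  end

  # Runs the hooks around the whole run and around each row.
  def run(rows)
    call_hooks(:before_run)
    results = rows.map do |row|
      call_hooks(:before_row)
      processed = process_row(row)
      call_hooks(:after_row)
      processed
    end
    call_hooks(:after_run)
    results
  end

  # Default behavior: pass the row through unchanged.
  def process_row(row)
    row
  end

  private

  # Calls every public method whose name starts with the given prefix.
  def call_hooks(prefix)
    methods.grep(/\A#{prefix}_/).sort.each { |name| send(name) }
  end
end

# A subclass customizes behavior by overriding process_row and defining hooks.
class UpcaseStep < SketchStep
  def before_run_note_start
    log << "started"
  end

  def after_row_count
    @count = @count.to_i + 1
  end

  def after_run_note_finish
    log << "finished after #{@count.to_i} rows"
  end

  def process_row(row)
    row.upcase
  end
end

step = UpcaseStep.new
processed = step.run(%w[a b])
puts processed.inspect
puts step.log.inspect
```

The appeal of a convention like this is that a subclass adds behavior by defining methods, without registering callbacks anywhere.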

The Rodimus approach is to provide a minimal, flexible framework upon which custom ETL solutions can be built. ETL is complex, and there tend to be many subtle differences between projects which can make things like establishing conventions and encouraging code reuse difficult. Rodimus is an attempt to codify those things which are probably useful to a majority of ETL projects with as little overhead as possible.

If you'd like to know the thought process behind Rodimus, check out this blog post.

Contributing

  1. Fork it ( http://github.com/nevern02/rodimus/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request