word-to-markdown

Word to Markdown converter

A Ruby gem to liberate content from the jail that is Word documents

The problem

Our default content publishing workflow is terribly broken. We've all been trained to make paper, yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s.

I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to Markdown, the lingua franca of the internet, but as my recent foray into building just such a converter proves, it's not that simple.

Markdown isn't just an alternative format. Markdown forces you to write for the web.

Just want to convert a Microsoft Word (or Google) document to Markdown?

You can use the hosted service at word2md.com (or check out its source, Word-to-markdown-server).

Install

You'll need to install LibreOffice. Then:

gem install word-to-markdown

Usage

require "word-to-markdown"

file = WordToMarkdown.new("/path/to/document.docx")
=> <WordToMarkdown path="/path/to/document.docx">

file.to_s
=> "# Test\n\n This is a test"

file.document.tree
=> <Nokogiri Document>
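
As a minimal sketch of how the calls above might be combined in a script, the helper below converts a document and writes the Markdown next to it. `convert_to_file` and its `converter:` keyword are hypothetical names introduced for illustration; only `WordToMarkdown.new` and `to_s` come from the usage shown above, and the require name is assumed to match the gem name.

```ruby
# Hypothetical helper, not part of the gem: converts a Word document and
# writes the resulting Markdown next to it with a .md extension.
def convert_to_file(docx_path, converter: nil)
  # Load the gem lazily so an alternative converter can be injected.
  # Assumption: the gem is loaded with `require "word-to-markdown"`.
  converter ||= begin
    require "word-to-markdown"
    WordToMarkdown
  end

  markdown = converter.new(docx_path).to_s

  # Build the output path: same directory, same base name, .md extension.
  dir = File.dirname(docx_path)
  base = File.basename(docx_path, ".*")
  md_path = File.join(dir, "#{base}.md")

  File.write(md_path, markdown)
  md_path
end
```

Calling `convert_to_file("/path/to/document.docx")` would write `/path/to/document.md` and return that path. The injectable `converter:` is a design choice that keeps the helper usable in tests where LibreOffice is not installed.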

Command line usage

Once you've installed the gem, it's just:

$ w2m path/to/document.docx

The resulting Markdown is written to stdout.
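
Because the executable writes to stdout, it can also be driven from another Ruby program. The following is a rough sketch using the standard library's `Open3.capture3`; `markdown_from_cli` and its `command:` keyword are hypothetical names, and the sketch assumes `w2m` is on the PATH and exits with a non-zero status on failure.

```ruby
require "open3"

# Hypothetical wrapper, not part of the gem: runs the w2m executable on a
# document and returns whatever it printed to stdout.
# `command` is an array so the executable can be swapped out (e.g. in tests).
def markdown_from_cli(docx_path, command: ["w2m"])
  stdout, stderr, status = Open3.capture3(*command, docx_path)

  # Assumption: a failed conversion is signalled by a non-zero exit status.
  raise "#{command.first} failed: #{stderr.strip}" unless status.success?

  stdout
end
```

For example, `File.write("document.md", markdown_from_cli("path/to/document.docx"))` would save the converted output to a file.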

Supports

  • Paragraphs
  • Numbered lists
  • Unnumbered lists
  • Nested lists
  • Italic
  • Bold
  • Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
  • Implicit headings (e.g., text with a larger font size relative to paragraph text)
  • Images
  • Tables
  • Hyperlinks

Requirements and configuration

Word-to-markdown requires soffice, a command-line interface to LibreOffice that works on Linux, Mac, and Windows. To install soffice, see the LibreOffice documentation.
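
To check this requirement before converting anything, a script could look for soffice on the search path. The sketch below is one way to do that; `find_executable` is a hypothetical helper, and it only inspects PATH, so it may miss installations that live elsewhere (the gem itself may locate soffice differently).

```ruby
# Hypothetical pre-flight check, not part of the gem: returns the full path
# of the first matching executable on the search path, or nil if none exists.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  # On Windows, executables carry one of the extensions listed in PATHEXT.
  exts = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]

  path_env.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end

  nil
end
```

A caller might run `find_executable("soffice")` and print an installation hint when it returns `nil`.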

Testing

script/cibuild

Docker

First, create the Gemfile.lock by installing the dependencies:

bundle install

Then build the image and run the executable locally:

docker-compose build
docker-compose run --rm app bundle exec w2m --help
docker-compose run --rm app bundle exec w2m test/fixtures/em.docx

Hosted service

Word-to-markdown-server contains a lightweight server for converting Word documents as a service. A live version runs at word2md.com.