
Welcome to youtube-audiobook-chapter-identifier 🎧📖

This bot's goal is to identify the chapters in an audiobook hosted on YouTube 🔎🕵🏻‍♂️📋


Result Example

__________________
Result for The Animal Farm
https://www.youtube.com/watch?v=iosHzNmVYbA

Chapter 1        0:00:07         Duration: 0:00:07
Chapter 2        0:16:51         Duration: 0:16:43
Chapter 3        0:33:05         Duration: 0:16:14
Chapter 4        0:47:13         Duration: 0:14:07
Chapter 5        0:57:48         Duration: 0:10:34
Chapter 6        1:17:14         Duration: 0:19:26
Chapter 7        1:34:49         Duration: 0:17:35
Chapter 8        1:57:52         Duration: 0:23:03
Chapter 9        2:22:48         Duration: 0:24:55
Chapter 10       2:45:23         Duration: 0:22:34


by https://github.com/ThisIsDjonathan/youtube-audiobook-chapter-identifier
__________________
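In the sample above, each Duration appears to be the time elapsed since the previous chapter marker (for Chapter 1, since the start of the video), give or take a second of rounding. The sketch below shows one way such lines could be produced from chapter start times in seconds. The function names are hypothetical and this is not the code in Audiobook.py; the column widths are only chosen to resemble the sample.

```python
from datetime import timedelta


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as H:MM:SS, dropping the fractional part."""
    return str(timedelta(seconds=int(seconds)))


def format_report(chapter_starts: list[float]) -> list[str]:
    """Build one line per chapter: label, start time, and time since the previous marker."""
    lines = []
    previous = 0.0  # the first "duration" is measured from the start of the video
    for number, start in enumerate(chapter_starts, start=1):
        label = f"Chapter {number}"
        elapsed = format_timestamp(start - previous)
        lines.append(f"{label:<17}{format_timestamp(start):<16}Duration: {elapsed}")
        previous = start
    return lines
```

For example, `format_report([7.0, 1011.0])` yields two lines, the second starting at `0:16:51` with `Duration: 0:16:44`. Truncating fractional seconds independently for the start time and for the difference is one plausible source of the one-second mismatches visible in the sample.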

How I Built This

This is done in 3 steps:

  1. YoutubeVideoHelper.py downloads the YouTube content as an .mp4 file;
  2. AudioToTextHelper.py then uses OpenAI Whisper to transcribe the audio to text;
  3. Finally, Audiobook.py finds where each chapter starts based on the transcription from the previous step.

The script will create a folder inside the ./audiobooks/ directory for each audiobook.

This is the file structure:

📦 youtube-audiobook-chapter-identifier
┣ 📂 audiobooks
┃ ┗ 📂 Audio Book 1
┃     ┣ 🎧 youtube-content.mp4
┃     ┗ 📋 audio-to-text.json
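As a minimal sketch of the layout above, the per-audiobook folder and its two file paths could be prepared with `pathlib`. This assumes the folder is named after the audiobook title (as "Audio Book 1" suggests); the function name `prepare_audiobook_folder` is hypothetical. Titles containing characters that are invalid in file names (such as `/`) would need sanitizing first.

```python
from pathlib import Path


def prepare_audiobook_folder(title: str, root: str = "audiobooks") -> dict[str, Path]:
    """Create ./audiobooks/<title>/ if needed and return the paths used by each step."""
    folder = Path(root) / title
    folder.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return {
        "folder": folder,
        "audio": folder / "youtube-content.mp4",        # written by the download step
        "transcript": folder / "audio-to-text.json",    # written by the transcription step
    }
```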

How to use it

First, install the Python dependencies:

pip install -r requirements.txt

Then edit main.py to set the audiobook title and its YouTube URL:

def main():
    audiobook_title = 'The Animal Farm'
    youtube_url = 'https://www.youtube.com/watch?v=iosHzNmVYbA'

Finally, run the script:

python main.py

How it Works

YoutubeVideoHandler 🎧📖

We use the pytube library to download the YouTube data. Only the audio is downloaded, and it is saved as youtube-content.mp4.
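A minimal sketch of an audio-only download with pytube, assuming its `YouTube(url).streams.get_audio_only()` and `Stream.download()` API. The function name `download_audio` is hypothetical and this is not the code in the repository. pytube tends to break when YouTube changes its site, so treat this as illustrative.

```python
from pathlib import Path


def download_audio(youtube_url: str, folder: Path) -> Path:
    """Download only the audio stream of a YouTube video as youtube-content.mp4."""
    # Imported lazily so this module can be loaded even where pytube is not installed.
    from pytube import YouTube

    video = YouTube(youtube_url)
    # get_audio_only() returns the highest-bitrate mp4 audio stream, or None if there is none.
    stream = video.streams.get_audio_only()
    if stream is None:
        raise RuntimeError(f"No audio-only stream found for {youtube_url}")
    saved_path = stream.download(output_path=str(folder), filename="youtube-content.mp4")
    return Path(saved_path)
```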

Speech to Text 🗣️👂✍🏻

After downloading the audio file from YouTube, we use OpenAI Whisper to transcribe the audio to text. The result of this process is saved as a JSON file, audio-to-text.json.
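A minimal sketch of this step using the openai-whisper package's `whisper.load_model()` and `model.transcribe()`. The `"base"` model size and the function name `transcribe_to_json` are assumptions, not necessarily what AudioToTextHelper.py does. The returned dictionary holds the full `"text"`, the detected `"language"`, and a list of `"segments"`, each with `"start"`/`"end"` times in seconds and its `"text"`. Whisper also needs ffmpeg available on the PATH.

```python
import json
from pathlib import Path


def transcribe_to_json(audio_path: Path, output_path: Path, model_name: str = "base") -> dict:
    """Transcribe an audio file with Whisper and save the full result as JSON."""
    # Imported lazily: openai-whisper pulls in PyTorch, which is a heavy dependency.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path))
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return result
```

Larger Whisper models are generally more accurate but much slower, which matters for multi-hour audiobooks.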

Chapter Finder 🕵🏻‍♂️📋

The chapter finder (Audiobook.find_chapters()) loops through each segment of the Whisper transcription and looks for the word "chapter". This could be done in a better way, since it currently relies on a simple, naive if statement 😅
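A minimal sketch of the naive rule described above: keep the start time of every segment whose text mentions "chapter". It assumes audio-to-text.json holds Whisper's result dictionary with a `"segments"` list. The case-insensitive check and the helper names are assumptions, not the actual Audiobook.find_chapters() code.

```python
import json
from pathlib import Path


def load_segments(transcript_path: Path) -> list[dict]:
    """Read the Whisper segments (each with "start", "end" and "text") from the JSON file."""
    with open(transcript_path, encoding="utf-8") as f:
        return json.load(f)["segments"]


def find_chapters(segments: list[dict]) -> list[float]:
    """Return the start time, in seconds, of every segment that mentions "chapter"."""
    starts = []
    for segment in segments:
        # Plain case-insensitive substring check, mirroring the simple if statement.
        if "chapter" in segment["text"].lower():
            starts.append(segment["start"])
    return starts
```

Because it is a bare substring test, narration such as "in the previous chapter" would also be reported as a chapter start, which is one reason a smarter matcher is worth exploring.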

Contributing

Check the open issues 😁

Author

👤 Djonathan Krause

Show your support

Please ⭐️ this repository if this project helped you!

Buy Me A Coffee