This bot's goal is to identify the chapters in an audiobook hosted on YouTube 🔎🕵🏻♂️📋
__________________
Result for The Animal Farm
https://www.youtube.com/watch?v=iosHzNmVYbA
Chapter 1 0:00:07 Duration: 0:00:07
Chapter 2 0:16:51 Duration: 0:16:43
Chapter 3 0:33:05 Duration: 0:16:14
Chapter 4 0:47:13 Duration: 0:14:07
Chapter 5 0:57:48 Duration: 0:10:34
Chapter 6 1:17:14 Duration: 0:19:26
Chapter 7 1:34:49 Duration: 0:17:35
Chapter 8 1:57:52 Duration: 0:23:03
Chapter 9 2:22:48 Duration: 0:24:55
Chapter 10 2:45:23 Duration: 0:22:34
by https://github.com/ThisIsDjonathan/youtube-audiobook-chapter-identifier
__________________
This is done in 3 steps:
- YoutubeVideoHelper.py downloads the YouTube content as an .mp4 file;
- AudioToTextHelper.py uses OpenAI Whisper to transcribe the audio to text;
- Audiobook.py finds where each chapter starts, based on the transcription produced in the previous step.
The script will create a folder inside the ./audiobooks/ directory for each audiobook.
This is the file structure:
📦 youtube-audiobook-chapter-identifier
 ┣ 📂 audiobooks
 ┃ ┗ 📂 Audio Book 1
 ┃   ┣ 🎧 youtube-content.mp4
 ┃   ┗ 📋 audio-to-text.json
First, install the Python dependencies:
pip install -r requirements.txt
Then update main.py, setting the audiobook title and its YouTube URL:
def main():
    audiobook_title = 'The Animal Farm'
    youtube_url = 'https://www.youtube.com/watch?v=iosHzNmVYbA'
And finally run the script: python main.py
We use the pytube library to download the YouTube data.
We download only the audio and save the file as youtube-content.mp4.
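As a rough illustration of the download step, here is a minimal sketch using pytube. The function names audio_path_for and download_audio are hypothetical and are not taken from YoutubeVideoHelper.py; the sketch only assumes the ./audiobooks/ layout and the youtube-content.mp4 file name described in this README.

```python
from pathlib import Path

AUDIOBOOKS_DIR = Path("audiobooks")  # assumed root, matching ./audiobooks/


def audio_path_for(title: str, root: Path = AUDIOBOOKS_DIR) -> Path:
    """Return where the audio for `title` is stored (hypothetical helper)."""
    return root / title / "youtube-content.mp4"


def download_audio(url: str, title: str) -> Path:
    """Download the audio-only stream of `url` into the audiobook's folder."""
    # Imported lazily so the path helper above works without pytube installed.
    from pytube import YouTube

    target = audio_path_for(title)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Pick the first audio-only stream that uses an mp4 container.
    stream = YouTube(url).streams.filter(only_audio=True, file_extension="mp4").first()
    if stream is None:
        raise RuntimeError(f"No audio-only mp4 stream found for {url}")

    stream.download(output_path=str(target.parent), filename=target.name)
    return target
```

Calling download_audio('https://www.youtube.com/watch?v=iosHzNmVYbA', 'The Animal Farm') would then be expected to write ./audiobooks/The Animal Farm/youtube-content.mp4, assuming the video is reachable.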
After downloading the audio file from YouTube, we use OpenAI Whisper to transcribe the audio to text.
The result of this process is a JSON file saved as audio-to-text.json.
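The transcription step could look roughly like the sketch below. It assumes the openai-whisper package and its "base" model; the function names transcribe and save_transcription are hypothetical and may differ from what AudioToTextHelper.py actually does.

```python
import json
from pathlib import Path


def transcribe(audio_path: Path, model_name: str = "base") -> dict:
    """Run Whisper on the audio file and return its result dictionary."""
    # Imported lazily so save_transcription works without whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # "base" is an assumed model choice
    return model.transcribe(str(audio_path))


def save_transcription(result: dict, json_path: Path) -> None:
    """Write the Whisper result (text plus timed segments) as JSON."""
    json_path.parent.mkdir(parents=True, exist_ok=True)
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
```

The result dictionary returned by Whisper contains a "segments" list whose entries carry "start", "end" and "text" fields, which is what the chapter finder needs.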
The chapter finder (Audiobook.find_chapters()) loops through each segment of the Whisper transcription and looks for the word "chapter". This should be done in a better way, since currently I'm using a simple and dumb if statement to do so 😅
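One possible small improvement over a plain substring check is sketched below. It is not the current Audiobook.find_chapters() implementation: the names find_chapter_starts and format_timestamp are hypothetical, and the sketch assumes each segment is a dictionary with "start" (seconds) and "text" keys, as in Whisper's output.

```python
import re
from datetime import timedelta

# Match "chapter" as a whole word, ignoring case, so that "Chapter 2" matches
# but "chapters" does not.
CHAPTER_RE = re.compile(r"\bchapter\b", re.IGNORECASE)


def find_chapter_starts(segments):
    """Return the start time in seconds of every segment mentioning 'chapter'."""
    return [seg["start"] for seg in segments if CHAPTER_RE.search(seg["text"])]


def format_timestamp(seconds):
    """Format seconds as H:MM:SS, dropping fractions of a second."""
    return str(timedelta(seconds=int(seconds)))


if __name__ == "__main__":
    # Made-up segments for illustration only.
    sample = [
        {"start": 7.2, "text": " Chapter 1."},
        {"start": 20.0, "text": " Mr. Jones locked the hen houses."},
        {"start": 1011.4, "text": " Chapter 2."},
    ]
    for number, start in enumerate(find_chapter_starts(sample), start=1):
        print(f"Chapter {number} {format_timestamp(start)}")
```

This still produces false positives when the narrator says "chapter" in the middle of the story, so it is only a starting point.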
Check the open issues 😁
👤 Djonathan Krause
- Website: djonathan.com
- Github: @ThisIsDjonathan