This project highlights the capabilities of Haystack in solving real-world problems through a set of ten unique challenges. Developed by deepset, Haystack is a robust framework for building pipelines that integrate retrieval-augmented generation (RAG), metadata processing, and other advanced NLP techniques. The event takes place during December 2024, blending technical problem-solving with the festive theme of Christmas.
The ten challenges are designed to build skills in using the Haystack framework for advanced pipelines. Each challenge presents a scenario requiring the design, implementation, and optimization of a pipeline using Haystack's tools and capabilities. The tasks focus on applying Haystack components to practical problems.
Beyond showcasing Haystack's features, the challenges apply it in realistic environments alongside technologies such as Weaviate, AssemblyAI, NVIDIA, Arize, and MongoDB. They demonstrate how Haystack integrates with these systems to create end-to-end solutions for information retrieval and natural language processing tasks.
[!IMPORTANT]
Please note that this project uses the `haystack-ai` package, version 2.8.0. Functionality may have changed in later releases.
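A quick way to see whether a local environment matches the version named above is to query the installed package metadata. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `version_message` are illustrative and not part of this project.

```python
from importlib import metadata
from typing import Optional

# Version this project was written against, taken from the note above.
EXPECTED_VERSION = "2.8.0"


def installed_version(package: str = "haystack-ai") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_message(found: Optional[str], expected: str = EXPECTED_VERSION) -> str:
    """Build a short note comparing the found version with the expected one."""
    if found is None:
        return "haystack-ai is not installed"
    if found == expected:
        return f"haystack-ai {found} matches the tested version"
    return f"haystack-ai {found} differs from the tested version {expected}"
```

Calling `print(version_message(installed_version()))` in a notebook cell reports whether the environment matches the version the challenges were written against.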
The first challenge involves building a pipeline that fetches content from URLs, processes it for relevance, and answers questions about it. The objective is to configure the pipeline to identify the ten most relevant chunks of the fetched content and to handle queries efficiently.
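The "ten most relevant chunks" idea can be illustrated without any library: split the text into chunks, score each chunk against the query, and keep the best ten. This is only a sketch. In the challenge itself Haystack components do this work, and the word-overlap score below is a deliberately simple stand-in for a real ranker. All function names are hypothetical.

```python
import re
from typing import List


def split_into_chunks(text: str, words_per_chunk: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def overlap_score(query: str, chunk: str) -> int:
    """Count distinct query terms found in the chunk (stand-in for a real ranker)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    chunk_terms = set(re.findall(r"\w+", chunk.lower()))
    return len(query_terms & chunk_terms)


def top_k_chunks(query: str, text: str, k: int = 10, words_per_chunk: int = 50) -> List[str]:
    """Return the k chunks with the highest overlap score, best first."""
    chunks = split_into_chunks(text, words_per_chunk)
    ranked = sorted(chunks, key=lambda chunk: overlap_score(query, chunk), reverse=True)
    return ranked[:k]
```

Only the top-ranked chunks would then be passed to the language model as context, which keeps the prompt short and focused on the query.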
This challenge focuses on using the Weaviate integration to solve a fictional mystery. Weaviate is a vector database optimized for semantic search. The goal is to design and implement a pipeline with Haystack and Weaviate that efficiently retrieves the information in a dataset needed to answer the key question of the mystery.
This task focuses on creating a Retrieval-Augmented Generation (RAG) pipeline that integrates multi-query retrieval techniques. The goal was to improve recall by retrieving highly relevant answers from external data sources, such as news feeds. Custom components were implemented to enhance the pipeline's performance.
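One common way to realize multi-query retrieval is to run several rephrasings of the same question and merge the results, so a document missed by one phrasing can still be found by another. The sketch below assumes the query variants are already available (in practice an LLM often generates them) and merges by keeping each document's best score. The merge rule and all names are assumptions, not the custom components used in the challenge.

```python
from typing import Callable, Dict, List, Tuple

# A retriever maps a query string to (document_id, score) pairs.
Retriever = Callable[[str], List[Tuple[str, float]]]


def multi_query_retrieve(
    queries: List[str], retrieve: Retriever, top_k: int = 5
) -> List[Tuple[str, float]]:
    """Run every query variant, keep each document's best score, return the top_k."""
    best: Dict[str, float] = {}
    for query in queries:
        for doc_id, score in retrieve(query):
            # Deduplicate: a document found by several variants keeps its highest score.
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Because the results are a union over all variants, recall can only stay the same or improve compared with a single query, at the cost of extra retrieval calls.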
This challenge concentrates on combining the capabilities of Haystack with AssemblyAI to process audio data and transform it into meaningful text outputs. The goal is to build a pipeline that can handle the following tasks: transcribing an audio file into text, summarizing the content for simplicity, and rewriting it in a creative style tailored for a specific audience.
This project involves leveraging deepset Studio, a user-friendly platform for building and managing Haystack pipelines, to streamline the development of a Retrieval-Augmented Generation (RAG) pipeline. The task involves utilizing the platform's features, such as its drag-and-drop interface, pipeline templates, and deployment tools, to create an efficient indexing and query pipeline.
This challenge focuses on leveraging NVIDIA Inference Microservices (NIMs) with Haystack to build two key functionalities: task delegation optimization and multilingual document organization. It demonstrates the practical application of NVIDIA's AI models through microservices for efficient workflow management.
This challenge involves building an end-to-end system that automates matching and evaluation processes using advanced NLP techniques. It integrates Haystack for retrieval and generation, an LLM-based judge for evaluation, and Arize Phoenix for monitoring and tracing.
This task demonstrates the implementation of an intelligent agent system using Haystack's experimental components, specifically designed for automated inventory management. Along with custom-built tools, the system enables real-time inventory tracking, automated price comparisons, and natural language interactions. The solution showcases how Haystack's tool-calling capabilities can be leveraged to create sophisticated AI agents that handle complex operational tasks while maintaining user-friendly interfaces.
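At its core, tool calling means the model emits a tool name plus arguments and the application dispatches that request to an ordinary function. The sketch below shows that dispatch step in plain Python. The inventory data, the tool names `check_stock` and `cheapest_supplier`, and the call format are all hypothetical and do not reproduce Haystack's experimental components.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory data standing in for a real inventory and supplier feeds.
INVENTORY = {"widget": 12, "gadget": 0}
SUPPLIER_PRICES = {"widget": {"north": 4.50, "south": 3.95}}


def check_stock(item: str) -> int:
    """Return the number of units in stock (0 for unknown items)."""
    return INVENTORY.get(item, 0)


def cheapest_supplier(item: str) -> str:
    """Return the name of the supplier offering the lowest price for the item."""
    prices = SUPPLIER_PRICES[item]
    return min(prices, key=prices.get)


# Registry mapping the tool names a model may request to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {
    "check_stock": check_stock,
    "cheapest_supplier": cheapest_supplier,
}


def run_tool_call(call: Dict[str, Any]) -> Any:
    """Dispatch a tool call of the form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

In an agent loop, the result of `run_tool_call` would be sent back to the model so it can phrase a natural-language answer or request another tool.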
This project demonstrates the implementation of a self-reflecting AI agent using Haystack's RAG capabilities integrated with MongoDB Atlas vector search. The system combines advanced retrieval techniques with recursive self-evaluation to optimize recommendations based on multiple criteria. By leveraging MongoDB's vector search capabilities and Haystack's RAG pipeline architecture, the solution enables semantic matching while incorporating budget constraints, age appropriateness, and preference optimization. The implementation showcases how Haystack can be used to create sophisticated recommendation systems that continuously improve their suggestions through self-reflection mechanisms.
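The self-reflection idea can be pictured as a propose, critique, retry loop. The sketch below is a deliberately small, library-free illustration: candidates carry a precomputed preference score (in the real system this would come from vector search), and the critique step checks only budget and age. The field names and functions are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set

Item = Dict[str, Any]


def propose(candidates: List[Item], rejected: Set[str]) -> Optional[Item]:
    """Pick the highest-scoring candidate that has not been rejected yet."""
    remaining = [c for c in candidates if c["name"] not in rejected]
    return max(remaining, key=lambda c: c["score"]) if remaining else None


def critique(item: Item, budget: float, age: int) -> List[str]:
    """Return the list of problems with a proposal; an empty list means accepted."""
    problems = []
    if item["price"] > budget:
        problems.append("over budget")
    if item["min_age"] > age:
        problems.append("not age appropriate")
    return problems


def recommend(candidates: List[Item], budget: float, age: int, max_rounds: int = 5) -> Optional[Item]:
    """Propose, self-evaluate, and retry until a proposal passes or rounds run out."""
    rejected: Set[str] = set()
    for _ in range(max_rounds):
        item = propose(candidates, rejected)
        if item is None:
            return None
        if not critique(item, budget, age):
            return item
        rejected.add(item["name"])
    return None
```

The `max_rounds` cap matters in practice: without it, a reflection loop driven by an LLM could keep revising indefinitely.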
This challenge demonstrates the implementation of systematic evaluation methodologies for RAG pipelines using Haystack's EvaluationHarness. By integrating specialized evaluators for faithfulness, context relevance, and overall performance metrics, the system enables automated assessment and comparison of different pipeline configurations. The implementation showcases how Haystack's evaluation framework can be used to measure and optimize RAG pipeline performance in production environments.
Follow these steps to set up the project environment with Python and necessary tools:
- Download the Anaconda installer from the official Anaconda website.
- Install Anaconda following the graphical or command-line instructions for your operating system and add it to the PATH.
- Verify the installation by running the following command in the terminal (Linux/macOS) or Anaconda Prompt (Windows): `python --version`
- Create a new Conda environment named `haystack` with Python (verified for version 3.11): `conda create -n haystack python=3.11`
- Activate the environment: `conda activate haystack`
- Ensure you have a CUDA-compatible device if using GPU acceleration.
- Install PyTorch with CUDA support (verified for CUDA 12.1) using pip: `pip install torch --index-url https://download.pytorch.org/whl/cu121`
- Install the necessary packages to work with Jupyter notebooks in Visual Studio Code: `pip install notebook jupyterlab ipykernel`
- If you haven't already, install the Jupyter extension in Visual Studio Code from the Extensions Marketplace.
- Install the additional dependencies listed in the `requirements.txt` file by running: `pip install -r requirements.txt`
- Installing from `requirements.txt` downloads the latest versions of the listed packages.
- To use OpenAI's API, you need to save your secret key securely in a configuration file.
- Create a `config.json` file in your project directory and add the following content: `"sk-proj-...."`
- Replace `"sk-proj-...."` with your actual OpenAI secret key, keeping it enclosed in the double quotes `"..."`.
- Ensure this file is not shared or pushed to version control systems like Git by adding it to `.gitignore`.
- Open Visual Studio Code, select your Conda environment as the interpreter (Python: Select Interpreter from the Command Palette).
- Open the Jupyter notebook corresponding to the specific project.
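Once `config.json` exists, a notebook needs to read the key from it. The sketch below assumes the file contains a single JSON string exactly as shown in the setup steps, and that the key should be exposed through the `OPENAI_API_KEY` environment variable, which is where the OpenAI client looks by default. The notebooks in this project may load the file differently, and the helper names are illustrative.

```python
import json
import os
from pathlib import Path


def load_openai_key(config_path: str = "config.json") -> str:
    """Read the key from a file that holds a single JSON string, e.g. "sk-proj-....". """
    value = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{config_path} must contain a non-empty JSON string")
    return value


def export_openai_key(config_path: str = "config.json") -> None:
    """Expose the key to libraries that read the OPENAI_API_KEY environment variable."""
    os.environ["OPENAI_API_KEY"] = load_openai_key(config_path)
```

Calling `export_openai_key()` in the first notebook cell keeps the key out of the notebook itself, so it is not saved in cell outputs or committed by accident.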