# Build a RAG pipeline to fetch live Hacker News posts and summarize them with a local LLM endpoint

Welcome to this step-by-step tutorial where we’ll build a simple Retrieval-Augmented Generation (RAG) pipeline using Haystack and OPEA. We’ll fetch the newest Hacker News posts, feed them to a lightweight LLM endpoint (OPEAGenerator), and generate concise one-sentence summaries (based on this notebook). Let’s dive in! 🎉

## Why OPEA?

In modern GenAI applications, having a flexible, performant, and scalable platform is essential. OPEA (Open Platform for Enterprise AI) is an open, model-agnostic framework for building and operating composable GenAI solutions. In this demo, we’ll use an OPEA LLM endpoint inside a Haystack pipeline.

We’ll build a simple RAG pipeline that fetches the newest Hacker News posts, sends them to a local OPEA endpoint running a Qwen/Qwen2.5-7B-Instruct demo model, and produces concise one-sentence summaries. Of course, you can replace our example model with any other OPEA-served model, making this pattern both lightweight for prototyping and powerful for real-world deployments. Let’s get started! 🚀

## Prerequisites

Make sure you have:

- Docker and Docker Compose, to run the OPEA LLM service
- A Python environment with Haystack and the OPEA integration installed
- Hardware with enough resources to serve the Qwen/Qwen2.5-7B-Instruct demo model

## Start an OPEA LLM service

NOTE: As a reference, here is a Docker Compose recipe to get you started. The OPEA LLM service can be configured to use a variety of model-serving backends, such as TGI, vLLM, Ollama, and OVMS, and offers validated runtime settings for good performance on a range of hardware, including Intel Gaudi. In this example, it creates an OPEA LLM service with a TGI backend. See the documentation for LLM Generation; the code is based on the OPEA LLM example and the OPEA TGI example.
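The validated recipe lives in the OPEA examples linked above; the snippet below is only a rough sketch of that setup. The image tags, ports, and environment variable names are assumptions, so adapt them to the OPEA recipe for your hardware.

```yaml
# Rough sketch, not the validated OPEA recipe: image tags, ports, and
# environment variables are assumptions -- see the OPEA LLM / TGI examples.
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:latest
    ports:
      - "8008:80"
    volumes:
      - ./data:/data
    environment:
      HF_TOKEN: ${HF_TOKEN}
    command: --model-id ${LLM_MODEL_ID}

  llm:
    image: opea/llm-textgen:latest         # OPEA LLM microservice (assumed tag)
    ports:
      - "9000:9000"
    environment:
      LLM_ENDPOINT: http://tgi-service:80  # assumed variable name
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      HF_TOKEN: ${HF_TOKEN}
    depends_on:
      - tgi-service
```

To run, call `LLM_MODEL_ID=Qwen/Qwen2.5-7B-Instruct docker compose up`.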
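## Step 1: Fetch the newest Hacker News posts

We’ll create a custom Haystack component, HackernewsNewestFetcher, that asks the Hacker News API for the IDs of the newest posts, retrieves each item, and wraps the results in Haystack Documents. A minimal sketch is shown below; it keeps only each post’s title and URL as Document content (the original notebook may also download and parse the linked articles), so treat it as an approximation.

```python
from typing import List

import requests
from haystack import Document, component


@component
class HackernewsNewestFetcher:
    """Fetch the newest Hacker News posts and return them as Haystack Documents."""

    @component.output_types(articles=List[Document])
    def run(self, last_k: int = 5):
        # IDs of the newest stories, newest first.
        newest_ids = requests.get(
            "https://hacker-news.firebaseio.com/v0/newstories.json"
        ).json()

        articles = []
        for story_id in newest_ids[:last_k]:
            item = requests.get(
                f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json"
            ).json()
            if not item:
                continue
            # Keep the post title as content and the URL (if any) as metadata.
            articles.append(
                Document(content=item.get("title", ""), meta={"url": item.get("url", "")})
            )

        return {"articles": articles}
```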
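## Step 2: Set up the OPEAGenerator

We use the OPEAGenerator to call our LLM over HTTP. Here, we point to a local endpoint serving the Qwen/Qwen2.5-7B-Instruct model. The sketch below assumes the integration is installed as `haystack-opea` and that the generator takes the service URL plus optional model arguments; the package name and constructor signature are assumptions, so check the OPEA integration documentation for the exact API.

```python
# Assumed import path and constructor arguments -- consult the OPEA
# integration docs for the exact API (e.g. `pip install haystack-opea`).
from haystack_opea import OPEAGenerator

generator = OPEAGenerator(
    "http://localhost:9000",                 # local OPEA LLM service from the compose file
    model_arguments={                        # assumed parameter name
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "max_new_tokens": 100,
    },
)
```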
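## Step 3: Define the prompt

Using PromptBuilder, we define a Jinja-style template that loops over the fetched posts and asks the model for a one-sentence summary of each. The wording below is a representative version, not necessarily the exact template from the original notebook.

```python
from haystack.components.builders import PromptBuilder

prompt_template = """
You will be provided with one or more Hacker News posts, followed by their URLs.
For each post, write a single concise sentence summarizing it.

Posts:
{% for article in articles %}
  {{ article.content }}
  URL: {{ article.meta["url"] }}
{% endfor %}
"""

prompt_builder = PromptBuilder(template=prompt_template)
```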
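## Step 4: Assemble the pipeline

We wire up the components in a Pipeline. The connection names below assume the OPEAGenerator follows the usual Haystack generator interface, with a `prompt` input:

```python
from haystack import Pipeline

pipeline = Pipeline()
pipeline.add_component("fetcher", HackernewsNewestFetcher())
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)

# Fetched Documents feed the prompt template; the rendered prompt feeds the LLM.
pipeline.connect("fetcher.articles", "prompt_builder.articles")
pipeline.connect("prompt_builder.prompt", "llm.prompt")
```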
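## Step 5: Run it

Fetch and summarize the top 2 newest Hacker News posts. Assuming the generator, like other Haystack generators, returns its output under `replies`:

```python
result = pipeline.run(data={"fetcher": {"last_k": 2}})

# Haystack generators conventionally return generated text under "replies"
# (assumed to hold for OPEAGenerator as well).
for reply in result["llm"]["replies"]:
    print(reply)
```

Beautiful, concise summaries in seconds! ✨

## Wrapping up

In this tutorial, we built a full RAG pipeline: a custom component that fetches live Hacker News posts, a prompt template that asks for one-sentence summaries, and an OPEA-served LLM endpoint that generates them.

Feel free to extend this setup with more advanced retrieval, caching, or different LLM backends. Happy coding! 🛠️🔥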