Building AI-Powered Web Applications with RAG

Retrieval-Augmented Generation (RAG) combines the power of large language models (LLMs) with external knowledge bases, enabling more accurate and grounded answers to user queries. In this article, we’ll explore how to build an end-to-end RAG-based web application, covering everything from data ingestion to deploying a functional AI-powered interface.


---

What Is RAG?

RAG is a technique where LLMs are augmented with external sources of information. Instead of relying solely on the knowledge encoded in the model’s parameters, RAG retrieves relevant data chunks from a knowledge base and passes them to the model during query processing.
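
At a high level, every RAG pipeline runs the same retrieve-augment-generate loop. The sketch below is purely conceptual; vector_store_search and llm_complete are hypothetical placeholders for the retriever and LLM we build in the following steps:

# Illustrative only: vector_store_search and llm_complete are hypothetical
# placeholders, not real library calls.

def rag_answer(query: str) -> str:
    # 1. Retrieve the chunks most similar to the query
    chunks = vector_store_search(query, top_k=3)

    # 2. Augment: inject the retrieved context into the prompt
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    # 3. Generate a grounded answer with the LLM
    return llm_complete(prompt)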

Why Use RAG?

Improved Accuracy: Provides grounded responses based on up-to-date information.

Reduced Hallucinations: Limits the LLM’s tendency to generate false but confident-sounding answers.

Customization: Tailor responses using domain-specific knowledge bases (e.g., company policies, product manuals).



---

Prerequisites

Before starting, ensure you have the following:

1. Python 3.8+ installed.

2. Familiarity with Flask and basic web development.

3. Access to OpenAI’s API or a local LLM (e.g., Llama 2).

---

Step 1: Setting Up Your Environment

Create a virtual environment and install dependencies:

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate

# Install dependencies
pip install flask langchain openai faiss-cpu python-dotenv

Prepare a requirements.txt file for easier replication:

flask==2.2.5
langchain==0.0.305
openai==0.27.8
faiss-cpu==1.7.4
python-dotenv==1.0.0
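
The app will read your OpenAI key from the environment via python-dotenv, so create a .env file in the project root (the value below is a placeholder, not a real key):

OPENAI_API_KEY=your-openai-api-key-here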


---

Step 2: Preparing the Knowledge Base

Your RAG system can ingest various types of data, such as text files, PDFs, or website content. For this example, we’ll start with a simple .txt file.

Loading and Chunking Text

Use langchain.text_splitter.RecursiveCharacterTextSplitter to split large text into manageable chunks for retrieval.

Create a file called data_utils.py:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_text(file_path: str) -> str:
    """Load raw text from a file."""
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

def chunk_text(text: str, chunk_size=500, overlap=50) -> list:
    """Split text into overlapping chunks for better indexing."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return splitter.split_text(text)

Place your text file (e.g., knowledge_base.txt) in a data/ folder, and load and chunk it using the functions above.
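
For example, assuming knowledge_base.txt sits in data/:

from data_utils import load_text, chunk_text

text = load_text("data/knowledge_base.txt")
chunks = chunk_text(text)
print(f"Split the document into {len(chunks)} chunks")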


---

Step 3: Creating a Vector Store

Use FAISS to embed and store the text chunks for efficient similarity search.

Embedding and Storing Data

Create a file called embeddings_utils.py:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

def build_faiss_index(chunks, api_key, save_path="faiss_index"):
    """Embed text chunks and store them in a FAISS index on disk."""
    embeddings = OpenAIEmbeddings(openai_api_key=api_key)
    vectorstore = FAISS.from_texts(chunks, embeddings)
    vectorstore.save_local(save_path)  # writes a folder, not a single file
    return vectorstore

def load_faiss_index(api_key, save_path="faiss_index"):
    """Load an existing FAISS index from disk."""
    embeddings = OpenAIEmbeddings(openai_api_key=api_key)
    return FAISS.load_local(save_path, embeddings)
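
A one-off indexing run might look like this (it embeds every chunk, so it calls the OpenAI API; the file layout matches the earlier steps):

import os
from dotenv import load_dotenv
from data_utils import load_text, chunk_text
from embeddings_utils import build_faiss_index

load_dotenv()
chunks = chunk_text(load_text("data/knowledge_base.txt"))
build_faiss_index(chunks, api_key=os.getenv("OPENAI_API_KEY"))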


---

Step 4: Integrating RAG with LangChain

LangChain’s RetrievalQA chain makes it easy to query the vector store and retrieve relevant chunks for generating answers.

Setting Up the RAG Chain

Create a file called rag_pipeline.py:

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

def create_rag_chain(vectorstore, api_key):
    """Create a RAG pipeline using a vectorstore and OpenAI."""
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # fetch top 3 chunks
    chain = RetrievalQA.from_chain_type(
        llm=OpenAI(openai_api_key=api_key),
        retriever=retriever,
        chain_type="stuff",  # "stuff" packs all retrieved chunks into a single prompt
    )
    return chain
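
Before wiring the chain into Flask, you can sanity-check it from a Python shell; the question below is just an example:

import os
from dotenv import load_dotenv
from embeddings_utils import load_faiss_index
from rag_pipeline import create_rag_chain

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
chain = create_rag_chain(load_faiss_index(api_key), api_key)
print(chain.run("What does the document say about refund policies?"))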


---

Step 5: Building the Flask Backend

Now, integrate everything into a Flask web application.

Setting Up Flask

In app.py:

from flask import Flask, request, jsonify, render_template
import os
from data_utils import load_text, chunk_text
from embeddings_utils import build_faiss_index, load_faiss_index
from rag_pipeline import create_rag_chain

# Load environment variables
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

app = Flask(__name__)

# Initialize components
vectorstore = None
rag_chain = None

# Note: before_first_request works with the pinned Flask 2.2.5 but was
# removed in Flask 2.3; call setup() at startup if you upgrade.
@app.before_first_request
def setup():
    """Initialize the vector store and RAG chain."""
    global vectorstore, rag_chain

    # Load an existing FAISS index, or build one from the raw text
    if os.path.exists("faiss_index"):
        vectorstore = load_faiss_index(OPENAI_API_KEY)
    else:
        text = load_text("data/knowledge_base.txt")
        chunks = chunk_text(text)
        vectorstore = build_faiss_index(chunks, OPENAI_API_KEY)

    # Create the RAG chain
    rag_chain = create_rag_chain(vectorstore, OPENAI_API_KEY)

@app.route("/")
def index():
    """Serve the front-end page."""
    return render_template("index.html")

@app.route("/ask", methods=["POST"])
def ask():
    """Handle user queries."""
    query = request.json.get("query", "")
    if not query:
        return jsonify({"error": "No query provided"}), 400

    answer = rag_chain.run(query)
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)


---

Step 6: Adding the Front-End

HTML Interface

Create a simple front-end in templates/index.html:

<!DOCTYPE html>
<html>
<head>
  <title>RAG System</title>
</head>
<body>
  <h1>Ask Our Knowledge Base</h1>
  <textarea id="query" placeholder="Type your question..."></textarea>
  <button onclick="ask()">Ask</button>
  <p id="answer"></p>

  <script>
    function ask() {
      const query = document.getElementById("query").value;
      fetch("/ask", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query }),
      })
        .then((res) => res.json())
        .then((data) => {
          document.getElementById("answer").innerText = data.answer;
        });
    }
  </script>
</body>
</html>


---

Step 7: Testing the Application

1. Run the Flask app:

python app.py

2. Visit http://localhost:5000 in your browser.

3. Ask questions about your knowledge base, such as: "What does the document say about refund policies?" (You can also query the endpoint directly; see the curl example below.)



---

Step 8: Scaling and Deployment

Containerize the app with Docker for deployment (see the sketch after this list).

Use Gunicorn with Nginx for production scalability.

For large-scale deployments, consider cloud-hosted vector databases like Pinecone or Weaviate.
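
As a starting point, here is a minimal Dockerfile sketch; it assumes you add gunicorn to requirements.txt:

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Assumes gunicorn has been added to requirements.txt
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]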



---

Conclusion

This guide walked through the core steps of creating a retrieval-augmented generation (RAG) system, from data ingestion and embedding to serving user queries via a web interface. With this foundation, you can expand the system to include multi-source knowledge bases, advanced front-end designs, and more scalable deployments.

Stay tuned for more in-depth articles on advanced RAG topics, including integrating enterprise data lakes and multi-language support.

