5. Define MCP Tools
Our MCP server is now instantiated, but it’s like a skilled worker with an empty toolbox. To make it useful, we need to give it tools. In the MCP world, a tool is simply a Python function that we “register” with the server, making it a capability the agent can choose to use. This registration is done with a simple decorator: @mcp_server.tool().
Crucially, the agent relies on the function’s docstring to understand what the tool does and when it should be used. A well-written docstring is not just for developers; it’s a direct instruction to the AI, guiding its decision-making process.
Let’s start by defining our first tool: the one that connects to our private vector database. This tool will be responsible for searching our internal Python FAQ knowledge base.
Add the following code to your mcp_server.py file.
@mcp_server.tool()
def python_faq_retrieval_tool(query: str) -> str:
    """
    Retrieve the most relevant documents from the Python FAQ collection.
    Use this tool when the user asks about general Python programming concepts.

    Args:
        query (str): The user query to retrieve the most relevant documents.

    Returns:
        str: The most relevant documents retrieved from the vector DB.
    """
    if not isinstance(query, str):
        raise TypeError("Query must be a string.")

    # Use the single, pre-initialized faq_engine instance for efficiency
    return faq_engine.answer_question(query)
Let’s break down this powerful piece of code:
- @mcp_server.tool(): This is the magic decorator. It tells our mcp_server instance, “Hey, here is a new capability you can use!” The server automatically inspects the function’s name, its parameters, and, most importantly, its docstring to learn about this new skill (a sketch of the resulting tool description follows this list).
- The Docstring: This is the most critical part for the agentic behavior. When the agent receives a user query, it will read this description to decide if this tool is the right one for the job. We’ve explicitly told it to use this tool when the user asks about general Python programming concepts. This clear instruction is what allows the agent to distinguish between needing to search our internal FAQ versus, say, searching the web.
- python_faq_retrieval_tool(query: str) -> str: This is a standard Python function with type hints. The MCP framework uses these hints to validate inputs and understand the data types the tool works with.
- faq_engine.answer_question(query): This is the line that executes the tool’s core logic. It calls a method on an faq_engine object (which we will build in the “Building RAG Pipeline” section) to perform the actual vector search in our Qdrant database and return the results as a string.
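To make those points concrete, here is a rough sketch of the tool description the agent ends up receiving, assembled from the function name, docstring, and type hints. The exact field names depend on the MCP framework version, so treat this as an illustration rather than the exact wire format:

# Illustrative only: roughly the metadata the agent sees when it lists the server's tools.
# Exact field names vary by MCP framework version.
python_faq_tool_listing = {
    "name": "python_faq_retrieval_tool",
    "description": (
        "Retrieve the most relevant documents from the Python FAQ collection. "
        "Use this tool when the user asks about general Python programming concepts."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}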
With this single function, we’ve given our agent its first sense — the ability to look up information in its own specialized memory. Next, we’ll give it the ability to look outward to the web.
Now, what if the user asks about a very recent Python library, a current event, or a topic not covered in our internal knowledge base? For that, our agent needs to be able to browse the internet.
We’ll equip our agent with a second tool that uses the FireCrawl API to perform a live web search. This gives our agent a way to find real-time, public information, making it vastly more versatile.
Add the final tool to your mcp_server.py:
@mcp_server.tool()
def firecrawl_web_search_tool(query: str) -> List[str]:
    """
    Search for information on a given topic using Firecrawl.
    Use this tool when the user asks a specific question not related to the Python FAQ.

    Args:
        query (str): The user query to search for information.

    Returns:
        List[str]: A list of the most relevant web search results.
    """
    if not isinstance(query, str):
        raise TypeError("Query must be a string.")

    url = "https://api.firecrawl.dev/v1/search"
    api_key = os.getenv('FIRECRAWL_API_KEY')
    if not api_key:
        return ["Error: FIRECRAWL_API_KEY environment variable is not set."]

    payload = {"query": query, "timeout": 60000}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    try:
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
        # Assuming the API returns JSON with a key like "data" or "results"
        # Adjust .get("data", ...) if the key is different
        return response.json().get("data", ["No results found from web search."])
    except requests.exceptions.RequestException as e:
        return [f"Error connecting to Firecrawl API: {e}"]
This firecrawl_web_search_tool function is another powerful capability for our agent. Registered with the same @mcp_server.tool() decorator, its docstring provides the critical instruction for when it should be used: for any question not related to the Python FAQ.
This creates a clear distinction from our first tool, enabling the agent to make an intelligent choice. The function’s logic is straightforward: it securely retrieves the Firecrawl API key from our environment variables, constructs a POST request containing the user’s query, and sends it to the Firecrawl API.
To ensure our application is resilient and doesn’t crash on network failures, the entire API call is wrapped in a try…except block, allowing it to handle connection issues gracefully. Upon a successful request, it parses the JSON response and returns a list of search results, effectively giving our agent access to the vast information of the internet.
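The web search tool also depends on a few imports (os, requests, and List from typing) and on the FIRECRAWL_API_KEY environment variable. If these were not already set up in an earlier step, a minimal setup at the top of mcp_server.py could look like the sketch below; the key value in the .env file is a placeholder, and python-dotenv is an assumed dependency:

# .env (project root) -- placeholder value, replace with your real key:
#   FIRECRAWL_API_KEY=fc-your-api-key-here

import os
from typing import List

import requests
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # makes os.getenv("FIRECRAWL_API_KEY") pick up the value from .env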
With both tools now defined, our agent has a choice: look inward into its private knowledge base or look outward to the public web. Next, we will build the engine that powers our internal knowledge search.
6. Building RAG Pipeline
Our python_faq_retrieval_tool is currently just a promise; it relies on an faq_engine object to do the heavy lifting of vector search, but we haven’t built this engine yet. This component is the heart of the “Retrieval” part of our RAG system. It will be responsible for two critical tasks:
- Indexing: Taking our plain text data, converting it into numerical representations (embeddings), and storing them efficiently in our Qdrant vector database.
- Searching: Providing a method that can take a user’s query, convert it into an embedding, and search the database for the most relevant pieces of stored information.
To accomplish this, we’ll orchestrate several powerful libraries. We’ll use llama-index for its robust embedding capabilities, qdrant-client to communicate with our database, and a few standard Python utilities to handle the data processing efficiently.
Let’s begin by creating the rag_app.py file and adding the necessary imports that will form the backbone of our pipeline.
import uuid
from itertools import islice
from typing import List, Dict, Any, Generator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from tqdm import tqdm
from qdrant_client import models, QdrantClient
The first step in building our pipeline is to define the knowledge base itself. This is the proprietary data our agent will search through to answer questions related to Python.
In our rag_app.py file, add our source text just below the imports:
PYTHON_FAQ_TEXT = """
Question: What is the difference between a list and a tuple in Python?
Answer: Lists are mutable, meaning their elements can be changed, while tuples are immutable. Lists use square brackets `[]` and tuples use parentheses `()`.
Question: What are Python decorators?
Answer: Decorators are a design pattern in Python that allows a user to add new functionality to an existing object without modifying its structure. They are often used for logging, timing, and access control.
Question: How does Python's Global Interpreter Lock (GIL) work?
Answer: The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at the same time. This means that even on a multi-core processor, only one thread can execute Python code at once.
Question: What is the difference between `==` and `is` in Python?
Answer: `==` checks for equality of value (do two objects have the same content?), while `is` checks for identity (do two variables point to the same object in memory?).
Question: Explain list comprehensions and their benefits.
Answer: List comprehensions provide a concise way to create lists. They are often more readable and faster than using traditional `for` loops and `.append()` calls.
Question: What is `*args` and `**kwargs` in function definitions?
Answer: `*args` allows you to pass a variable number of non-keyword arguments to a function, which are received as a tuple. `**kwargs` allows you to pass a variable number of keyword arguments, received as a dictionary.
Question: What are Python's magic methods (e.g., `__init__`, `__str__`)?
Answer: Magic methods, or dunder methods, are special methods that you can define to add "magic" to your classes. They are invoked by Python for built-in operations, like `__init__` for object creation or `__add__` for the `+` operator.
Question: How does error handling work in Python?
Answer: Python uses `try...except` blocks to handle exceptions. Code that might raise an error is placed in the `try` block, and the code to handle the exception is placed in the `except` block.
Question: What is the purpose of the `if __name__ == "__main__":` block?
Answer: This block ensures that the code inside it only runs when the script is executed directly, not when it is imported as a module into another script.
Question: What are generators in Python?
Answer: Generators are a simple way to create iterators. They are functions that use the `yield` keyword to return a sequence of values one at a time, saving memory for large datasets.
"""
For this guide, we’re embedding our knowledge directly into the script as a multi-line string. In a production application, you would typically load this data from a collection of documents, like text files, PDFs, or a database, but the principle remains the same.
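As a sketch of what that production path might look like, here is one way to assemble the same kind of FAQ string from a folder of plain-text files. The faq_docs/ directory name and the one-Q&A-pair-per-file layout are assumptions for illustration only:

from pathlib import Path

def load_faq_from_files(directory: str = "faq_docs") -> str:
    """Concatenate every .txt file in a folder into one FAQ string,
    keeping a blank line between files so each one becomes its own chunk."""
    chunks = [
        path.read_text(encoding="utf-8").strip()
        for path in sorted(Path(directory).glob("*.txt"))
    ]
    return "\n\n".join(chunks)

# PYTHON_FAQ_TEXT = load_faq_from_files()  # swap in once you have real documents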
This PYTHON_FAQ_TEXT variable contains a curated list of common Python questions and answers. Each Q&A pair is a self-contained chunk of information, making it an ideal format for our vector database. When we index this data, each chunk will become a searchable unit that our agent can retrieve.
With our data defined, we now need to build the engine that can process, index, and query it. This is where the FAQEngine class comes in. It’s the central component of our RAG pipeline, encapsulating all the logic for interacting with both the embedding model and the Qdrant vector database.
First, we’ll add a small helper function to our rag_app.py. This function will help us process our data in manageable chunks, which is crucial for memory efficiency and performance when dealing with large datasets.
# Helper function for batching
def batch_generator(data: List[Any], batch_size: int) -> Generator[List[Any], None, None]:
    """Yields successive batch_size-sized chunks from a list."""
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]
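For instance, with a small list and a batch size of 2, the helper yields chunks of at most two items, with the final batch holding whatever is left over:

# Quick sanity check of the batching helper
sample = ["a", "b", "c", "d", "e"]
print(list(batch_generator(sample, batch_size=2)))
# [['a', 'b'], ['c', 'd'], ['e']]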
Now, let’s define the main class that puts everything together.
class FAQEngine:
    """
    An engine for setting up and querying a FAQ database using Qdrant and HuggingFace embeddings.
    """
    def __init__(self,
                 qdrant_url: str = "http://localhost:6333",
                 collection_name: str = "python-faq",
                 embed_model_name: str = "nomic-ai/nomic-embed-text-v1.5"):
        self.collection_name = collection_name

        # Initialize the embedding model
        print("Loading embedding model...")
        self.embed_model = HuggingFaceEmbedding(
            model_name=embed_model_name,
            trust_remote_code=True
        )

        # Dynamically get the vector dimension from the model
        self.vector_dim = len(self.embed_model.get_text_embedding("test"))
        print(f"Embedding model loaded. Vector dimension: {self.vector_dim}")

        # Initialize the Qdrant client
        self.client = QdrantClient(url=qdrant_url, prefer_grpc=True)
        print("Connected to Qdrant.")

    @staticmethod
    def parse_faq(text: str) -> List[str]:
        """Parses the raw FAQ text into a list of Q&A strings."""
        return [
            qa.replace("\n", " ").strip()
            for qa in text.strip().split("\n\n")
        ]
    def setup_collection(self, faq_contexts: List[str], batch_size: int = 64):
        """
        Creates a Qdrant collection (if it doesn't exist) and ingests the FAQ data.
        """
        # Check if collection exists; skip re-indexing if it does
        try:
            self.client.get_collection(collection_name=self.collection_name)
            print(f"Collection '{self.collection_name}' already exists. Skipping indexing.")
            return
        except Exception:
            print(f"Creating collection '{self.collection_name}'...")
            self.client.create_collection(
                collection_name=self.collection_name,
                vectors_config=models.VectorParams(
                    size=self.vector_dim,
                    distance=models.Distance.DOT
                )
            )

        print(f"Embedding and ingesting {len(faq_contexts)} documents...")

        # Process data in batches
        num_batches = (len(faq_contexts) + batch_size - 1) // batch_size
        for batch in tqdm(batch_generator(faq_contexts, batch_size),
                          total=num_batches,
                          desc="Ingesting FAQ data"):
            # 1. Get embeddings for the batch
            embeddings = self.embed_model.get_text_embedding_batch(batch, show_progress=False)

            # 2. Create Qdrant points with unique IDs and payloads
            points = [
                models.PointStruct(
                    id=str(uuid.uuid4()),  # Generate a unique ID for each point
                    vector=vector,
                    payload={"context": context}
                )
                for context, vector in zip(batch, embeddings)
            ]

            # 3. Upload points to the collection
            self.client.upload_points(
                collection_name=self.collection_name,
                points=points,
                wait=False  # Asynchronous upload for speed
            )

        print("Data ingestion complete.")
        print("Updating collection indexing threshold...")
        self.client.update_collection(
            collection_name=self.collection_name,
            optimizer_config=models.OptimizersConfigDiff(indexing_threshold=20000)
        )
        print("Collection setup is finished.")
    def answer_question(self, query: str, top_k: int = 3) -> str:
        """
        Searches the vector database for a given query and returns the most relevant contexts.
        """
        # 1. Create an embedding for the user's query
        query_embedding = self.embed_model.get_query_embedding(query)

        # 2. Search Qdrant for the most similar vectors
        search_result = self.client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=top_k,
            score_threshold=0.5  # Optional: filter out less relevant results
        )

        # 3. Format the results into a single string
        if not search_result:
            return "I couldn't find a relevant answer in my knowledge base."

        relevant_contexts = [hit.payload["context"] for hit in search_result]

        # Combine the contexts into a final, readable output
        formatted_output = (
            "Here are the most relevant pieces of information I found:\n\n"
            + "\n\n---\n\n".join(relevant_contexts)
        )
        return formatted_output
The constructor, __init__, is responsible for setting up all the necessary components. It initializes the HuggingFaceEmbedding model from llama-index, which will handle the conversion of text to vectors.
A key detail here is that we dynamically determine the vector_dim by embedding a test string; this makes our code robust and adaptable to different embedding models. Finally, it establishes a connection to our Qdrant database. The print statements provide helpful feedback to the user when the script is run.
Before we can index our data, we need to clean it up. The parse_faq static method is a simple utility for this. It takes our raw PYTHON_FAQ_TEXT and splits it into a clean list of individual Question/Answer strings, which is the perfect format for our database.
The setup_collection method is where the indexing magic happens. It first checks if a Qdrant collection with our specified name already exists, preventing us from wastefully re-indexing data every time we start the application. If it doesn’t exist, it creates a new one, configuring it with the correct vector size and distance metric.
Then, it begins the main ingestion loop, using our batch_generator helper to process the FAQ data in efficient chunks. Inside this loop, a three-step process occurs for each batch:
- Embed: It converts the batch of text chunks into a list of numerical vectors.
- Structure: It packages each vector into a Qdrant PointStruct, which includes a unique ID and a payload containing the original text. This payload is crucial, as it allows us to retrieve the human-readable context after a search.
- Upload: It sends these points to the Qdrant collection.
Finally, the answer_question method is what our agent’s tool will call. This is the “retrieval” step. It takes a user’s query, converts it into a vector using the same embedding model, and then uses the client.search function to find the most similar vectors in our database.
It then extracts the original text from the payload of the top results and formats it into a single, clean string. This formatted text is the “context” that will ultimately be used to generate a high-quality answer.
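Before wiring the engine into the MCP server, it is worth a quick standalone sanity check. A minimal smoke test at the bottom of rag_app.py might look like the following; it assumes the Qdrant container from earlier is running on localhost:6333:

# Optional smoke test: run with `python rag_app.py`
if __name__ == "__main__":
    engine = FAQEngine()
    engine.setup_collection(FAQEngine.parse_faq(PYTHON_FAQ_TEXT))
    print(engine.answer_question("What is the difference between a list and a tuple?"))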
7. Putting Everything Together and Testing the System
We have all the individual components: the server, the tools, and the RAG engine. Now, let’s wire them together in mcp_server.py and launch our application.
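If the RAG components are not already imported at the top of mcp_server.py, add that import first (this assumes the file from the previous section is named rag_app.py):

# At the top of mcp_server.py
from rag_app import FAQEngine, PYTHON_FAQ_TEXT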
# This should be at the end of the mcp_server.py file
if __name__ == "__main__":
    # 1. Initialize our RAG Engine
    print("Initializing FAQ Engine and setting up Qdrant collection...")
    faq_engine = FAQEngine(qdrant_url=QDRANT_URL, collection_name=COLLECTION_NAME)

    # 2. Ingest our data into the vector database
    # setup_collection expects a list of Q&A chunks, so parse the raw text first
    faq_engine.setup_collection(FAQEngine.parse_faq(PYTHON_FAQ_TEXT))

    # 3. Start the MCP server
    print(f"Starting MCP server at http://{HOST}:{PORT}")
    mcp_server.run()
This final block does three things in order:
- Creates an instance of our FAQEngine.
- Calls setup_collection() to process our text and load it into Qdrant.
- Starts the mcp_server, which will now listen for incoming requests.
It is important to make sure your Docker container for Qdrant is still running. Then, open your terminal in the project directory and run the server:
python mcp_server.py
You should see output similar to the following as the FAQ Engine is set up and the server starts (the exact lines depend on your configuration):
Initializing FAQ Engine and setting up Qdrant collection...
Loading embedding model...
Embedding model loaded. Vector dimension: 768
Connected to Qdrant.
Creating collection 'python-faq'...
Embedding and ingesting 10 documents...
Ingesting FAQ data: 100%|██████████| 1/1
Data ingestion complete.
Updating collection indexing threshold...
Collection setup is finished.
Starting MCP server at http://127.0.0.1:8080
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
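If you want to confirm that the ingestion step actually populated the database, a quick count query against Qdrant from a separate Python shell should report one point per Q&A chunk, ten for the FAQ text above (this assumes the collection is named python-faq, as in the defaults):

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.count(collection_name="python-faq", exact=True))
# Expected: count=10, one point per ingested Q&A pair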
The server is live! Now, let’s act as a client and send it requests. Open a new terminal window and use curl to interact with our agent.
Test Case 1: Querying the Python FAQ
Let’s ask a question that should be answered by our internal knowledge base.
curl -X POST http://127.0.0.1:8080/mcp \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "Can you explain list comprehensions in Python?"
}
]
}'
The agent should correctly choose the python_faq_retrieval_tool. The tool will query Qdrant, and you will get a response containing the relevant context from your PYTHON_FAQ_TEXT. The response from the server will be a JSON object that includes the tool call and the result:
{
  "messages": [
    //... original message ...
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "python_faq_retrieval_tool",
            "arguments": "{\"query\": \"list comprehensions in Python\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "name": "python_faq_retrieval_tool",
      "content": "Here are the most relevant pieces of information I found:\n\nQuestion: Explain list comprehensions and their benefits. Answer: List comprehensions provide a concise way to create lists. They are often more readable and faster than using traditional `for` loops and `.append()` calls."
    }
  ]
}
Test Case 2: Querying the Web
Now, let’s ask something our FAQ doesn’t know. Make sure you have your FIRECRAWL_API_KEY in the .env file.
curl -X POST http://127.0.0.1:8080/mcp \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "What is the new Polars DataFrame library?"
}
]
}'
This time, the agent will recognize that “Polars” falls outside the general Python concepts described in the FAQ tool’s docstring. It will instead choose the firecrawl_web_search_tool. The response will look similar but will contain the web search results returned by Firecrawl.
Congratulations! You have successfully built a sophisticated agentic RAG application from the ground up. By leveraging the Model Context Protocol, you’ve created a system that is modular, intelligent, and resourceful.
We’ve seen how MCP allows us to define distinct tools and how the agent uses simple docstrings to intelligently decide which capability to use for a given query. This powerful pattern allows you to create applications that can reason about their own abilities and dynamically pull information from private knowledge bases (via RAG) and the public internet, leading to far more accurate and helpful responses.
From here, you could expand the system by adding more tools (e.g., a calculator, a database query tool), ingesting more complex documents into your RAG pipeline, or connecting it to a real user interface. The MCP-powered foundation you’ve built is ready for whatever you want to build next.