Thursday, July 24, 2025

Filament / Laravel on AwardSpace.com


1. Create the app, then add the following .htaccess in the web root:

# To prevent access to .env and other dotfiles
<Files .*>
    # Apache 2.2
    <IfModule !mod_authz_core.c>
        Order deny,allow
        Deny from all
    </IfModule>
    # Apache 2.4
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
</Files>

<IfModule mod_alias.c>
    RedirectMatch 301 ^/$ http://chet7.atwebpages.com/public
</IfModule>

2. Copy vendor\livewire\livewire\dist\*.* to /livewire so Livewire's front-end assets are publicly reachable.
3. In routes/web.php, add:

use Illuminate\Support\Facades\Route;
use Livewire\Livewire;

Livewire::setUpdateRoute(function ($handle) {
    return Route::post('/public/livewire/update', $handle)
        ->middleware('web');
});

4. Edit vendor/filament/filament/resources/views/components/theme-switcher/index.blade.php (hard-code the theme values):

@php
    $theme = 'system';
    $icon = 'heroicon-m-computer-desktop';
    $label = __("filament-panels::layout.actions.theme_switcher.{$theme}.label");
@endphp

5. Set permissions on /storage to drwxrwxrwx (777).

Tuesday, July 22, 2025

Building MCP-Powered Agentic RAG Application: Step-by-Step Guide (2/2)

Credit: https://levelup.gitconnected.com/building-mcp-powered-agentic-rag-application-step-by-step-guide-2-2-2afd5254ab4e

5. Define MCP Tools

Our MCP server is now instantiated, but it’s like a skilled worker with an empty toolbox. To make it useful, we need to give it tools. In the MCP world, a tool is simply a Python function that we “register” with the server, making it a capability the agent can choose to use. This registration is done with a simple decorator: @mcp_server.tool().

Crucially, the agent relies on the function’s docstring to understand what the tool does and when it should be used. A well-written docstring is not just for developers; it’s a direct instruction to the AI, guiding its decision-making process.

Let’s start by defining our first tool: the one that connects to our private vector database. This tool will be responsible for searching our internal Python FAQ knowledge base.

Add the following code to your mcp_server.py file.

@mcp_server.tool()
def python_faq_retrieval_tool(query: str) -> str:
    """
    Retrieve the most relevant documents from the Python FAQ collection.
    Use this tool when the user asks about general Python programming concepts.

    Args:
        query (str): The user query to retrieve the most relevant documents.

    Returns:
        str: The most relevant documents retrieved from the vector DB.
    """
    if not isinstance(query, str):
        raise TypeError("Query must be a string.")

    # Use the single, pre-initialized faq_engine instance for efficiency
    return faq_engine.answer_question(query)

Let’s break down this powerful piece of code:

  • @mcp_server.tool(): This is the magic decorator. It tells our mcp_server instance, “Hey, here is a new capability you can use!” The server automatically inspects the function’s name, its parameters, and — most importantly — its docstring to learn about this new skill.
  • The Docstring: This is the most critical part for the agentic behavior. When the agent receives a user query, it will read this description to decide if this tool is the right one for the job. We’ve explicitly told it to use this tool when the user asks about general Python programming concepts. This clear instruction is what allows the agent to distinguish between needing to search our internal FAQ versus, say, searching the web.
  • python_faq_retrieval_tool(query: str) -> str: This is a standard Python function with type hints. The MCP framework uses these hints to validate inputs and understand the data types the tool works with.
  • faq_engine.answer_question(query): This is the line that executes the tool’s core logic. It calls a method on an faq_engine object (which we will build in the “Building RAG Pipeline” section) to perform the actual vector search in our Qdrant database and return the results as a string.

With this single function, we’ve given our agent its first sense — the ability to look up information in its own specialized memory. Next, we’ll give it the ability to look outward to the web.

Now, what if the user asks about a very recent Python library, a current event, or a topic not covered in our internal knowledge base? For that, our agent needs to be able to browse the internet.

We’ll equip our agent with a second tool that uses the FireCrawl API to perform a live web search. This gives our agent a way to find real-time, public information, making it vastly more versatile.

Add the final tool to your mcp_server.py:

@mcp_server.tool()
def firecrawl_web_search_tool(query: str) -> List[str]:
    """
    Search for information on a given topic using Firecrawl.
    Use this tool when the user asks a specific question not related to the Python FAQ.

    Args:
        query (str): The user query to search for information.

    Returns:
        List[str]: A list of the most relevant web search results.
    """
    if not isinstance(query, str):
        raise TypeError("Query must be a string.")

    url = "https://api.firecrawl.dev/v1/search"
    api_key = os.getenv('FIRECRAWL_API_KEY')

    if not api_key:
        return ["Error: FIRECRAWL_API_KEY environment variable is not set."]

    payload = {"query": query, "timeout": 60000}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    try:
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
        # Assuming the API returns JSON with a key like "data" or "results"
        # Adjust .get("data", ...) if the key is different
        return response.json().get("data", ["No results found from web search."])
    except requests.exceptions.RequestException as e:
        return [f"Error connecting to Firecrawl API: {e}"]

This firecrawl_web_search_tool function is another powerful capability for our agent. Registered with the same @mcp_server.tool() decorator, its docstring provides the critical instruction for when it should be used: for any question not related to the Python FAQ.

This creates a clear distinction from our first tool, enabling the agent to make an intelligent choice. The function’s logic is straightforward: it securely retrieves the Firecrawl API key from our environment variables, constructs a POST request containing the user’s query, and sends it to the Firecrawl API.

To ensure our application is resilient and doesn’t crash on network failures, the entire API call is wrapped in a try…except block, allowing it to handle connection issues gracefully. Upon a successful request, it parses the JSON response and returns a list of search results, effectively giving our agent access to the vast information of the internet.

With both tools now defined, our agent has a choice: look inward into its private knowledge base or look outward to the public web. Next, we will build the engine that powers our internal knowledge search.


6. Building RAG Pipeline

Our python_faq_retrieval_tool is currently just a promise; it relies on an faq_engine object to do the heavy lifting of vector search, but we haven’t built this engine yet. This component is the heart of the “Retrieval” part of our RAG system. It will be responsible for two critical tasks:

  1. Indexing: Taking our plain text data, converting it into numerical representations (embeddings), and storing them efficiently in our Qdrant vector database.
  2. Searching: Providing a method that can take a user’s query, convert it into an embedding, and search the database for the most relevant pieces of stored information.

To accomplish this, we’ll orchestrate several powerful libraries. We’ll use llama-index for its robust embedding capabilities, qdrant-client to communicate with our database, and a few standard Python utilities to handle the data processing efficiently.

Let’s begin by creating the rag_app.py file and adding the necessary imports that will form the backbone of our pipeline.

import uuid
from itertools import islice
from typing import List, Dict, Any, Generator

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from tqdm import tqdm
from qdrant_client import models, QdrantClient

The first step in building our pipeline is to define the knowledge base itself. This is the proprietary data our agent will search through to answer questions related to Python.

In our rag_app.py file, add our source text just below the imports:

PYTHON_FAQ_TEXT = """
Question: What is the difference between a list and a tuple in Python?
Answer: Lists are mutable, meaning their elements can be changed, while tuples are immutable. Lists use square brackets `[]` and tuples use parentheses `()`.

Question: What are Python decorators?
Answer: Decorators are a design pattern in Python that allows a user to add new functionality to an existing object without modifying its structure. They are often used for logging, timing, and access control.

Question: How does Python's Global Interpreter Lock (GIL) work?
Answer: The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at the same time. This means that even on a multi-core processor, only one thread can execute Python code at once.

Question: What is the difference between `==` and `is` in Python?
Answer: `==` checks for equality of value (do two objects have the same content?), while `is` checks for identity (do two variables point to the same object in memory?).

Question: Explain list comprehensions and their benefits.
Answer: List comprehensions provide a concise way to create lists. They are often more readable and faster than using traditional `for` loops and `.append()` calls.

Question: What is `*args` and `**kwargs` in function definitions?
Answer: `*args` allows you to pass a variable number of non-keyword arguments to a function, which are received as a tuple. `**kwargs` allows you to pass a variable number of keyword arguments, received as a dictionary.

Question: What are Python's magic methods (e.g., `__init__`, `__str__`)?
Answer: Magic methods, or dunder methods, are special methods that you can define to add "magic" to your classes. They are invoked by Python for built-in operations, like `__init__` for object creation or `__add__` for the `+` operator.

Question: How does error handling work in Python?
Answer: Python uses `try...except` blocks to handle exceptions. Code that might raise an error is placed in the `try` block, and the code to handle the exception is placed in the `except` block.

Question: What is the purpose of the `if __name__ == "__main__":` block?
Answer: This block ensures that the code inside it only runs when the script is executed directly, not when it is imported as a module into another script.

Question: What are generators in Python?
Answer: Generators are a simple way to create iterators. They are functions that use the `yield` keyword to return a sequence of values one at a time, saving memory for large datasets.
"""

For this guide, we’re embedding our knowledge directly into the script as a multi-line string. In a production application, you would typically load this data from a collection of documents, like text files, PDFs, or a database, but the principle remains the same.

This PYTHON_FAQ_TEXT variable contains a curated list of common Python questions and answers. Each Q&A pair is a self-contained chunk of information, making it an ideal format for our vector database. When we index this data, each chunk will become a searchable unit that our agent can retrieve.
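As a rough sketch of that production path (assuming a hypothetical faqs/ directory of .txt files formatted the same way as PYTHON_FAQ_TEXT, with each Q&A pair separated by a blank line), loading and chunking could look like this:

from pathlib import Path
from typing import List


def load_faq_chunks(directory: str = "faqs") -> List[str]:
    """Read every .txt file in `directory` and split it into Q&A chunks
    separated by blank lines, mirroring the PYTHON_FAQ_TEXT layout."""
    chunks: List[str] = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        chunks.extend(
            qa.replace("\n", " ").strip()
            for qa in text.strip().split("\n\n")
            if qa.strip()
        )
    return chunks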

With our data defined, we now need to build the engine that can process, index, and query it. This is where the FAQEngine class comes in. It’s the central component of our RAG pipeline, encapsulating all the logic for interacting with both the embedding model and the Qdrant vector database.

First, we’ll add a small helper function to our rag_app.py. This function will help us process our data in manageable chunks, which is crucial for memory efficiency and performance when dealing with large datasets.

# Helper function for batching
def batch_generator(data: List[Any], batch_size: int) -> Generator[List[Any], None, None]:
    """Yields successive n-sized chunks from a list."""
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]
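
For example, batching a five-element list with batch_size=2 yields three chunks, the last one smaller than the rest:

list(batch_generator([1, 2, 3, 4, 5], batch_size=2))
# [[1, 2], [3, 4], [5]]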

Now, let’s define the main class that puts everything together.


class FAQEngine:
    """
    An engine for setting up and querying a FAQ database using Qdrant and HuggingFace embeddings.
    """

    def __init__(self,
                 qdrant_url: str = "http://localhost:6333",
                 collection_name: str = "python-faq",
                 embed_model_name: str = "nomic-ai/nomic-embed-text-v1.5"):

        self.collection_name = collection_name

        # Initialize the embedding model
        print("Loading embedding model...")
        self.embed_model = HuggingFaceEmbedding(
            model_name=embed_model_name,
            trust_remote_code=True
        )

        # Dynamically get the vector dimension from the model
        self.vector_dim = len(self.embed_model.get_text_embedding("test"))
        print(f"Embedding model loaded. Vector dimension: {self.vector_dim}")

        # Initialize the Qdrant client
        self.client = QdrantClient(url=qdrant_url, prefer_grpc=True)
        print("Connected to Qdrant.")

    @staticmethod
    def parse_faq(text: str) -> List[str]:
        """Parses the raw FAQ text into a list of Q&A strings."""
        return [
            qa.replace("\n", " ").strip()
            for qa in text.strip().split("\n\n")
        ]

    def setup_collection(self, faq_contexts: List[str], batch_size: int = 64):
        """
        Creates a Qdrant collection (if it doesn't exist) and ingests the FAQ data.
        """

        # Check if collection exists, create if not
        try:
            self.client.get_collection(collection_name=self.collection_name)
            print(f"Collection '{self.collection_name}' already exists. Skipping creation.")
        except Exception:
            print(f"Creating collection '{self.collection_name}'...")
            self.client.create_collection(
                collection_name=self.collection_name,
                vectors_config=models.VectorParams(
                    size=self.vector_dim,
                    distance=models.Distance.DOT
                )
            )

        print(f"Embedding and ingesting {len(faq_contexts)} documents...")

        # Process data in batches
        for batch in tqdm(batch_generator(faq_contexts, batch_size),
                          total=(len(faq_contexts) // batch_size) + 1,
                          desc="Ingesting FAQ data"):

            # 1. Get embeddings for the batch
            embeddings = self.embed_model.get_text_embedding_batch(batch, show_progress_bar=False)

            # 2. Create Qdrant points with unique IDs and payloads
            points = [
                models.PointStruct(
                    id=str(uuid.uuid4()),  # Generate a unique ID for each point
                    vector=vector,
                    payload={"context": context}
                )
                for context, vector in zip(batch, embeddings)
            ]

            # 3. Upload points to the collection
            self.client.upload_points(
                collection_name=self.collection_name,
                points=points,
                wait=False  # Asynchronous upload for speed
            )

        print("Data ingestion complete.")
        print("Updating collection indexing threshold...")
        self.client.update_collection(
            collection_name=self.collection_name,
            optimizer_config=models.OptimizersConfigDiff(indexing_threshold=20000)
        )
        print("Collection setup is finished.")

    def answer_question(self, query: str, top_k: int = 3) -> str:
        """
        Searches the vector database for a given query and returns the most relevant contexts.
        """

        # 1. Create an embedding for the user's query
        query_embedding = self.embed_model.get_query_embedding(query)

        # 2. Search Qdrant for the most similar vectors
        search_result = self.client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=top_k,
            score_threshold=0.5  # Optional: filter out less relevant results
        )

        # 3. Format the results into a single string
        if not search_result:
            return "I couldn't find a relevant answer in my knowledge base."

        relevant_contexts = [
            hit.payload["context"] for hit in search_result
        ]

        # Combine the contexts into a final, readable output
        header = "Here are the most relevant pieces of information I found:\n\n"
        formatted_output = header + "\n\n---\n\n".join(relevant_contexts)
        return formatted_output

The constructor, __init__, is responsible for setting up all the necessary components. It initializes the HuggingFaceEmbedding model from llama-index, which will handle the conversion of text to vectors.

A key detail here is that we dynamically determine the vector_dim by embedding a test string; this makes our code robust and adaptable to different embedding models. Finally, it establishes a connection to our Qdrant database. The print statements provide helpful feedback to the user when the script is run.

Before we can index our data, we need to clean it up. The parse_faq static method is a simple utility for this. It takes our raw PYTHON_FAQ_TEXT and splits it into a clean list of individual Question/Answer strings, which is the perfect format for our database.
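
For example, the first Q&A pair in PYTHON_FAQ_TEXT comes back as a single one-line string:

chunks = FAQEngine.parse_faq(PYTHON_FAQ_TEXT)
print(chunks[0])
# Question: What is the difference between a list and a tuple in Python? Answer: Lists are mutable, ...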

The setup_collection method is where the indexing magic happens. It first checks if a Qdrant collection with our specified name already exists, preventing us from wastefully re-indexing data every time we start the application. If it doesn’t exist, it creates a new one, configuring it with the correct vector size and distance metric.

Then, it begins the main ingestion loop, using our batch_generator helper to process the FAQ data in efficient chunks. Inside this loop, a three-step process occurs for each batch:

  1. Embed: It converts the batch of text chunks into a list of numerical vectors.
  2. Structure: It packages each vector into a Qdrant PointStruct, which includes a unique ID and a payload containing the original text. This payload is crucial, as it allows us to retrieve the human-readable context after a search.
  3. Upload: It sends these points to the Qdrant collection.

Finally, the answer_question method is what our agent’s tool will call. This is the “retrieval” step. It takes a user’s query, converts it into a vector using the same embedding model, and then uses the client.search function to find the most similar vectors in our database.

It then extracts the original text from the payload of the top results and formats it into a single, clean string. This formatted text is the “context” that will ultimately be used to generate a high-quality answer.
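
The tutorial leaves the final generation step to the MCP client, but as a rough sketch of the augmentation step described in the architecture overview (generate_answer here is a hypothetical stand-in for whatever LLM client you use), the retrieved string could be folded into a prompt like this:

def build_augmented_prompt(query: str, retrieved_context: str) -> str:
    """Combine the user's query with the context returned by answer_question()."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{retrieved_context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# context = faq_engine.answer_question("Explain list comprehensions.")
# prompt = build_augmented_prompt("Explain list comprehensions.", context)
# response = generate_answer(prompt)  # hypothetical LLM call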


7. Putting Everything Together and Testing the System

We have all the individual components: the server, the tools, and the RAG engine. Now, let’s wire them together in mcp_server.py and launch our application.

# This should be at the end of the mcp_server.py file

if __name__ == "__main__":
    # 1. Initialize our RAG Engine
    print("Initializing FAQ Engine and setting up Qdrant collection...")
    faq_engine = FAQEngine(qdrant_url=QDRANT_URL, collection_name=COLLECTION_NAME)

    # 2. Ingest our data into the vector database
    # parse_faq() splits the raw text into individual Q&A chunks,
    # which setup_collection() then embeds and stores
    faq_contexts = FAQEngine.parse_faq(PYTHON_FAQ_TEXT)
    faq_engine.setup_collection(faq_contexts)

    # 3. Start the MCP server
    print(f"Starting MCP server at http://{HOST}:{PORT}")
    mcp_server.run()

This final block does three things in order:

  1. Creates an instance of our FAQEngine.
  2. Parses the FAQ text into chunks with parse_faq() and calls setup_collection() to embed and load them into Qdrant.
  3. Starts the mcp_server, which will now listen for incoming requests.

It is important to make sure your Docker container for Qdrant is still running. Then, open your terminal in the project directory and run the server:

python mcp_server.py

You should see output indicating that the FAQ Engine is being set up and that the server is running:

Initializing FAQ Engine and setting up Qdrant collection...
Collection not found. Creating and indexing...
Indexing complete.
Starting MCP server at http://127.0.0.1:8080
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)

The server is live! Now, let’s act as a client and send it requests. Open a new terminal window and use curl to interact with our agent.

Test Case 1: Querying the Python FAQ

Let’s ask a question that should be answered by our internal knowledge base.

curl -X POST http://127.0.0.1:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Can you explain list comprehensions in Python?"
      }
    ]
  }'

The agent should correctly choose the python_faq_retrieval_tool. The tool will query Qdrant, and you will get a response containing the relevant context from your PYTHON_FAQ_TEXT. The response from the server will be a JSON object that includes the tool call and the result:

{
  "messages": [
    // ... original message ...
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "python_faq_retrieval_tool",
            "arguments": "{\"query\": \"list comprehensions in Python\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "name": "python_faq_retrieval_tool",
      "content": "Retrieved Context from FAQ:\nWhat is a list comprehension in Python?\nA: A list comprehension is a concise way to create lists. It consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The expressions can be anything, meaning you can put in all kinds of objects in lists."
    }
  ]
}

Test Case 2: Querying the Web

Now, let’s ask something our FAQ doesn’t know. Make sure you have your FIRECRAWL_API_KEY in the .env file.

curl -X POST http://127.0.0.1:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the new Polars DataFrame library?"
      }
    ]
  }'

This time, the agent will see that “Polars” isn’t covered by the FAQ tool’s docstring. It will instead choose the firecrawl_web_search_tool. The response will look similar but will contain the markdown content fetched by FireCrawl.

Congratulations! You have successfully built a sophisticated agentic RAG application from the ground up. By leveraging the Model Context Protocol, you’ve created a system that is modular, intelligent, and resourceful.

We’ve seen how MCP allows us to define distinct tools and how the agent uses simple docstrings to intelligently decide which capability to use for a given query. This powerful pattern allows you to create applications that can reason about their own abilities and dynamically pull information from private knowledge bases (via RAG) and the public internet, leading to far more accurate and helpful responses.

From here, you could expand the system by adding more tools (e.g., a calculator, a database query tool), ingesting more complex documents into your RAG pipeline, or connecting it to a real user interface. The MCP-powered foundation you’ve built is ready for whatever you want to build next.
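
For example, a calculator tool could be registered with the same decorator pattern. This is only a sketch (not part of the original tutorial), but it shows how the docstring again tells the agent when to pick the new capability:

import ast
import operator

@mcp_server.tool()
def calculator_tool(expression: str) -> str:
    """
    Evaluate a simple arithmetic expression such as "2 * (3 + 4)".
    Use this tool when the user asks for a numeric calculation.
    """
    # Only plain numbers and basic arithmetic operators are allowed
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression.")

    return str(_eval(ast.parse(expression, mode="eval").body))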

Building MCP-Powered Agentic RAG Application: Step-by-Step Guide (1/2)

Credit: https://levelup.gitconnected.com/building-mcp-powered-agentic-rag-application-step-by-step-guide-1-2-efea9fb6f250


A Step-by-Step Guide to Building an Agentic RAG Application with MCP

Jul 14, 2025 · 10 min read


While many tutorials show how to build Retrieval-Augmented Generation (RAG) systems by connecting a language model to a single database, this approach is often too limited.

This guide takes you to the next level with Agentic RAG, where you won’t just build a system that retrieves information — you’ll build one that intelligently decides where to look. It’s the difference between an AI that can only read one assigned book and an AI that knows exactly which book to pull from the library.

To achieve this, we will use the Model Context Protocol (MCP), a modern framework designed specifically for creating powerful, multi-tool AI agents. In this step-by-step tutorial, we will walk you through the entire process:

  • Setting up a complete MCP server.
  • Defining custom tools for both private data and live web searches.
  • Constructing a full RAG pipeline that puts the agent in control.

By the end, you’ll have a fully functional and intelligent application capable of fielding complex queries by dynamically sourcing the best possible context for a truly accurate answer.


Table of Contents:

  1. Introduction to Model Context Protocol (MCP)
  2. MCP-Powered Agentic RAG Architecture Overview
  3. Setting Up the Working Environment
  4. Set up an MCP server instance
  5. Define MCP Tools [Part 2]
  6. Building RAG Pipeline [Part 2]
  7. Putting Everything Together and Testing the System [Part 2]

You can find all the code in this GitHub repository.

1. Introduction to Model Context Protocol (MCP)


MCP, or Model Context Protocol, is an open standard that enables large language models (LLMs) to securely access and utilize external data, tools, and services.

Think of it as a universal connector, like USB-C for AI, allowing different AI applications to interact with various data sources and functionalities. It essentially standardizes how AI models receive and use context, such as files, prompts, or callable functions, to perform tasks more effectively.

  • Standardized Connection: MCP provides a consistent way for AI models to interact with external systems, regardless of the specific AI model or the underlying technology.
  • Secure Access: MCP ensures secure access to resources by defining clear boundaries and permissions for how AI models can interact with external data and tools.
  • Two-Way Communication: MCP facilitates both sending information to the AI model (context) and receiving results or actions back from the model, enabling dynamic and interactive AI applications.
  • Ecosystem of Tools: MCP is designed to foster a vibrant ecosystem of “MCP servers,” which are applications or services that expose their functionality or data through the MCP standard, and “MCP clients,” which are the AI applications that connect to these servers.

2. MCP-Powered Agentic RAG Architecture Overview

Before we roll up our sleeves and dive into the code, it’s crucial to understand the high-level architecture of our MCP-powered Agentic RAG application. This system is designed to be intelligent, resourceful, and responsive.

The diagram below illustrates the complete workflow, from the user’s initial question to the final, context-enriched answer.

[Diagram: MCP-powered Agentic RAG workflow]

The architecture is composed of several key components that work in concert:

  1. The User: The starting point of the entire process. A user has a question and interacts with the system via a front-end application (e.g., a web interface on their laptop).
  2. MCP Client: This is the user-facing part of our application. It’s responsible for receiving the user’s Query, communicating with the backend server, and ultimately presenting the final Response back to the user.
  3. MCP Server (The Agent): This is the core of our system — the “brain” of the operation. When the MCP Server receives a request, it doesn’t just try to answer from its own pre-trained knowledge. Instead, it acts as an agent, intelligently deciding if it needs more information to provide a high-quality answer.
  4. Tools: These are external resources the MCP Server can use to find information. Our architecture features two powerful tools:
  • Vector DB search MCP tool: This tool connects to a specialized vector database. This database is ideal for storing and searching through your private documents, internal knowledge bases, or any custom text corpus. It excels at finding semantically similar information (the “Retrieval” in RAG).
  • Web search MCP tool: When the answer isn’t in our private data, this tool allows the agent to search the live internet for the most current and relevant public information.

Now, let’s trace the path of a request through the system:

  1. Query Submission: The user types a question (“Query”) into the application.
  2. Client to Server Communication: The MCP Client receives the query and forwards it to the MCP Server for processing.
  3. The Agentic Decision (Tool Call): This is where the magic happens. The MCP Server analyzes the query and determines the best course of action. It makes a “Tool call,” deciding whether it should:
  • Query the Vector DB for internal, proprietary knowledge.
  • Perform a Web search for real-time, public information.
  • Use a combination of both, or none at all if the question is general enough.

  4. Information Retrieval (The “R” in RAG): The selected tool(s) execute the search and retrieve relevant documents, articles, or data snippets. This retrieved information is the “context.”
  5. Context Augmentation: The retrieved context is sent back and used to augment the original request. The system now has both the user’s original query and a rich set of relevant information.
  6. Response Generation: The MCP Client, now armed with this powerful context, generates a comprehensive and accurate answer. This process is known as “Retrieval-Augmented Generation” (RAG) because the final response is generated based on the retrieved information.
  7. Final Answer: The final, context-aware Response is sent back to the user, providing an answer that is not just plausible, but well-supported by real data.

By orchestrating these components, we create a system that can answer complex questions by intelligently seeking out and synthesizing information from multiple sources. Now that we understand the “what” and “why,” let’s move on to the “how.”

3. Setting Up the Working Environment

With the architecture mapped out, it’s time to get our hands dirty and set up the foundation for our application. In this section, we’ll configure our project, install the necessary libraries, and launch the external services our agent will rely on.

Before you begin, make sure you have the following installed on your system:

  • Python 3.8+ and pip
  • Docker (for running our vector database)

Let’s start by organizing our files. A clean structure makes the project easier to manage. Create a root folder for your project (e.g., mcp-agentic-rag) and create the following files inside it. We’ll populate them as we go.

/mcp-agentic-rag
|-- .env
|-- mcp_server.py # Our main application and server logic
|-- rag_app.py # Our RAG-specific code and data

Now, open your mcp_server.py file and add the following code. This script will handle imports, load environment variables, and define the core configuration constants for our service.

import os
from typing import List
import requests

from dotenv import load_dotenv
from mcp.server.fastmcp import FastMCP

# Updated import: Assumes FAQEngine and the FAQ text are in 'rag_app.py'
from rag_app import FAQEngine, PYTHON_FAQ_TEXT


# Load environment variables from .env file
load_dotenv()

# Configuration constants
QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "python_faq_collection" # Using a new collection for the Python data
HOST = "127.0.0.1"
PORT = 8080

We import standard libraries like os and requests, but the key imports are:

  • load_dotenv from python-dotenv: A handy utility to load configuration (like API keys) from a separate .env file, keeping our code clean and secure.
  • FastMCP from mcp.server.fastmcp: The core class from the MCP framework that we’ll use to build our server.
  • FAQEngine and PYTHON_FAQ_TEXT from rag_app: These are custom components we’ll build shortly. PYTHON_FAQ_TEXT will hold our raw data, and FAQEngine will be the class that handles embedding and storing it.

We will also define easily accessible variables for our Qdrant connection and server address. Using constants makes the code more readable and easier to modify later.

Dependencies and External Services

Our agent can’t work in a vacuum. It needs its tools and libraries.

  1. Install Python Libraries

Open your terminal in the project directory and install the required Python packages using pip:

pip install mcp-server qdrant-client python-dotenv requests

  • mcp-server: The core library for our MCP agent.
  • qdrant-client: The official Python client to interact with the Qdrant vector database.
  • python-dotenv: For loading our .env file.
  • requests: A standard library for making HTTP requests, which our web search tool will use.

2. Create the .env File

Create a file named .env in your root directory. For now, it can be empty. This file is where you would store sensitive information like API keys if your tools required them (e.g., an OPENAI_API_KEY or a FIRECRAWL_API_KEY).

# .env file
# Add any API keys or secrets here later
OPENAI_API_KEY="sk-..."
FIRECRAWL_API_KEY="your-firecrawl-api-key"

3. Launch the Vector Database (Qdrant)

Our Vector DB search MCP tool needs a database to talk to. We’ll use Docker to quickly and easily run an instance of Qdrant. Execute the following command in your terminal:

docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant

This command does three things:

  • Downloads the official Qdrant image if you don’t have it.
  • Starts a container and maps two ports: 6333 for the REST API and web dashboard (which our app connects to first) and 6334 for the gRPC interface.
  • Mounts a local directory (qdrant_storage) to persist the database data on your host machine.

You can verify it’s running by opening http://localhost:6333/dashboard in your browser. You should see the Qdrant dashboard.

Great! Our foundation is laid. We have our project structure, initial configuration, all necessary libraries installed, and a running vector database. In the next section, we’ll set up the MCP server instance.


4. Set up an MCP server instance

With our environment configured, the next logical step is to create the central component of our application: the MCP server instance. This object, an instance of the FastMCP class, will act as the orchestrator for our agent. It’s the “brain” that will manage incoming requests, know which tools are available, and handle the overall lifecycle of a query.

# Create an MCP server instance
mcp_server = FastMCP("MCP-RAG-app",
                     host=HOST,
                     port=PORT,
                     timeout=30)

Let’s quickly break down the parameters we passed to the FastMCP constructor:

  • “MCP-RAG-app”: This is a unique name for our application. It’s used internally by the MCP framework for identification and logging, making it easier to distinguish between different MCP services if you’re running multiple.
  • host=HOST: This parameter tells the server which network address to listen on. We’re using the HOST constant we defined earlier (“127.0.0.1”), which means the server will only be accessible from the local machine (localhost). This is a secure default for development.
  • port=PORT: This specifies the network port for the server. We’ve set it to 8080. Once running, our application will be available at http://127.0.0.1:8080.
  • timeout=30: This is an important setting for reliability. It sets a global timeout of 30 seconds for requests. If a tool call takes longer than this, the server will stop waiting and return a timeout error, preventing the application from hanging indefinitely.

Our server is now instantiated in the code, but it’s like a brain without senses. It’s ready to run, but it doesn’t know how to do anything useful yet. In the next steps, we’ll give it its “senses” by defining the tools it can use to retrieve information from the vector database and the web.