In this article I will show how to create a Model Context Protocol (MCP) server and integrate it with a local LLM to implement a multi-step tool calling workflow.

MCP

MCP has been gaining popularity in the LLM space as a way to standardize the discovery of tools that are made available to LLMs. One key use case is publishing tools (e.g. remote APIs) that LLMs can use to bring in external data and enrich their understanding of the world.

I think it’s important to call out that MCPs don’t give LLMs new capabilities since you can achieve the same behavior through standard tool calling. However, I still think there is value in standardizing the integration of tool functions through automatic discovery. Also, if you add new tool functions to your MCP server, the LLM will instantly gain access to the new tools without code changes.
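To make the discovery point concrete, here is a minimal sketch of how a client can enumerate whatever tools an MCP server currently exposes at runtime. It uses the same LlamaIndex helpers and server URL that appear later in this article, and assumes the MCP server (built in the next section) is running:

import asyncio

from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def list_tools():
    # Connect to the MCP server (built later in this article) and discover its tools at runtime
    mcp_client = BasicMCPClient("http://mcp:3000/sse")
    tool_spec = McpToolSpec(client=mcp_client)
    tools = await tool_spec.to_tool_list_async()
    for tool in tools:
        # Each discovered tool carries the name and description published by the server
        print(tool.metadata.name, "-", tool.metadata.description)


asyncio.run(list_tools())

Any tool you add on the server side will show up in this list on the next run, without touching the client code.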

Implementing an MCP server

In the following section I will show how I built an MCP server using FastMCP. I will also show how to use the MCP server to implement a multi-step tool calling workflow integrated with a local qwen 2.5 (7B) model. In short, I am building a simple POC for a GraphQL-esque response based on the tools that you opt into through your prompt.

As a quick implementation detail, I am running qwen in Docker through Ollama. The initialization code can be found below:

from llama_index.llms.ollama import Ollama

model_llm_tools = None


def init_llm():
    global model_llm_tools
    if model_llm_tools is None:
        model_llm_tools = Ollama(
            model="qwen2.5",
            base_url="http://llm:11434",
            request_timeout=1000.0,
        )
    return model_llm_tools

The first step is to create the entry point for the FastMCP server. Looking at the code, it’s similar to how you would create REST APIs, but with some syntax variations. Instead of using get/post decorators, we decorate the functions with the “tool” decorator to expose them as callable tools.

See the sample code below:

from mcp.server.fastmcp import FastMCP
import argparse

from services.client_service import ClientService
from services.contract_service import ContractService
from services.product_service import ProductService

mcp = FastMCP("ClientAPI", port=3000)


@mcp.tool()
def get_client_info(name: str):
    """Gets client information given a client name.

    Args:
        name: The name of the client to retrieve.

    Returns:
        dict: Document with information about the client.
    """
    client_service = ClientService()
    return client_service.get_client(client_name=name)


@mcp.tool()
def get_contract_info(client_id: int):
    """Gets a contract definition based on a client id.

    Args:
        client_id: The id of the client to retrieve a contract for.

    Returns:
        dict: Document with information about the contract belonging to the client.
    """
    contract_service = ContractService()
    return contract_service.get_contract_by_client_id(client_id=client_id)


@mcp.tool()
def get_product_info(contract_id: int):
    """Gets a product definition from a contract given a contract id.

    Args:
        contract_id: The id of the contract to retrieve a product for.

    Returns:
        dict: Document with information about the product that is part of the contract.
    """
    product_service = ProductService()
    return product_service.get_product_by_contract_id(contract_id=contract_id)


@mcp.tool()
def get_product_price_info(product_id: int):
    """Gets the price of a product given a product id.

    Args:
        product_id: The id of the product to retrieve a price for.

    Returns:
        dict: Document with the product id and its price.
    """
    # Simple hard-coded pricing used for the demo
    price = 1000
    if product_id < 2000:
        price = 500
    return {"product_id": product_id, "price": price}


parser = argparse.ArgumentParser()
parser.add_argument("--server_type", type=str, default="sse", choices=["sse", "stdio"])
args = parser.parse_args()

mcp.run(args.server_type)

In this simple example I have created an MCP server that exposes a series of tools that can be used to load client data along with its associated contracts, products and prices. The idea is that the user can ask the LLM to return a nested dataset of client data enriched with contract, product and price data.
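The backing services are not shown in this article, but to give an idea of the shape of the data, here is a hypothetical sketch of what a service like ClientService could look like. The Furniture King record mirrors the sample output shown later; the real implementation in the repo may look different (e.g. it could query a database):

# Hypothetical sketch of a backing service; not the repo's actual implementation.
class ClientService:
    _clients = [
        {"client_id": 2, "client_name": "Furniture King", "client_location": "Sweden"},
    ]

    def get_client(self, client_name: str):
        # Return the first client whose name matches, or an empty dict if none is found
        for client in self._clients:
            if client["client_name"] == client_name:
                return client
        return {}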

Creating an Agent

Now that we have a running MCP server, the next step is to create an agent that can wire up the integration between the tools and the LLM. In my example I am using LlamaIndex to facilitate the automated discovery of the exposed tools. In the following sample code, pay attention to the get_agent method where I load all available tools and hook them up through a LlamaIndex object called FunctionAgent.

Based on the prompt the LLM will infer which tool(s) to call to get the data requested by the user. I should point out that the LLM doesn’t actually call the tools directly. Instead, it will generate a schema response that represents the call(s) with appropriate arguments. The schema response can then be parsed by the agent and translated into actual tool calls.

Most of the work for the LLM is figuring out which tool(s) to call based on the intention of the user prompt. One of the key benefits of using an LLM for this is that it handles nuances of semantic meaning really well, which is key to mapping tools reliably for prompts that use different wording but have the same meaning.

I have included the agent code below:

from model import init_llm
from llama_index.tools.mcp import McpToolSpec, BasicMCPClient
from llama_index.core.agent.workflow import FunctionAgent, ToolCallResult, ToolCall
from llama_index.core.workflow import Context
from helpers.tool_call_helper import extract_tool_cals
import json

mcp_url = "http://mcp:3000/sse"

SYSTEM_PROMPT = """\
You are an AI assistant used for Tool Calling.

Make sure to use tools to load client information, contract information and product information
"""


async def get_agent(tools: McpToolSpec):
    llm = init_llm()
    tools = await tools.to_tool_list_async()
    agent = FunctionAgent(
        name="ClientAgent",
        description="An agent that can load client, contract and product data",
        tools=tools,
        llm=llm,
        system_prompt=SYSTEM_PROMPT,
    )
    return agent


async def get_agent_context():
    mcp_client = BasicMCPClient(mcp_url)
    mcp_tool = McpToolSpec(client=mcp_client)
    agent = await get_agent(mcp_tool)
    agent_context = Context(agent)
    return (agent, agent_context, mcp_client)


async def execute_tools(mcp_client, tool_calls_from_llm):
    prev_result = None
    nested_result = {}
    current_level = nested_result
    for index, tool in enumerate(tool_calls_from_llm, start=0):
        if index > 0:
            # Replace the placeholder argument with the matching value
            # from the previous tool call's result
            key = next(iter(tool["arguments"]))
            value = prev_result[key]
            tool["arguments"][key] = value
        result = await mcp_client.call_tool(tool["name"], tool["arguments"])
        result = json.loads(result.content[0].text)
        prev_result = result
        # Nest each tool's result inside the previous tool's result
        current_level[tool["name"]] = result
        current_level = current_level[tool["name"]]
    return nested_result


async def invoke_agent(msg: str):
    agent, agent_context, mcp_client = await get_agent_context()
    handler = agent.run(msg, ctx=agent_context)
    response = await handler
    tool_calls_from_llm = extract_tool_cals(str(response))
    return await execute_tools(mcp_client, tool_calls_from_llm)

As I mentioned previously, the LLM does not invoke the functions directly, but it should give us a schema with the correct sequence of calls.

Let’s look at the following sample prompt: “Load the client information, contract information, product information and product price for a client named Furniture King”.

Considering that this prompt is asking for data that can be mapped to all four MCP tools, we would expect the LLM to generate a schema with correct calls to all four. I have included a sample of what the generated schema looks like in qwen 2.5 below:

<tool_call> {"name": "get_client_info", "arguments": {"name": "Furniture King"}} </tool_call> <tool_call> {"name": "get_contract_info", "arguments": {"client_id": 1}} </tool_call> <tool_call> {"name": "get_product_info", "arguments": {"contract_id": 1}} </tool_call> <tool_call> {"name": "get_product_price_info", "arguments": {"product_id": 1}} </tool_call>

Notice how the first call in the sequence uses input directly from the prompt (Furniture King). The remaining calls are dependent on the prior call, so the LLM can only provide placeholder values. However, the naming and structure of the arguments are syntactically correct, which is key when executing the calls dynamically.
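The extract_tool_cals helper from the repo is not shown in this article. Based on the <tool_call> format above, a minimal parser could look something like the sketch below; this is my own approximation, not necessarily the repo’s implementation:

import json
import re


def extract_tool_calls(response_text: str):
    # Pull out every JSON object wrapped in <tool_call>...</tool_call> tags
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    matches = re.findall(pattern, response_text, re.DOTALL)
    # Each match is expected to be a JSON object with "name" and "arguments" keys
    return [json.loads(match) for match in matches]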

In the following code sample, I show how I made the dynamic calls. One key convention in the tool functions is that the argument of a tool can be found in the response object of the prior tool call, except for the first call, where the argument from the LLM can be used directly.

async def execute_tools(mcp_client, tool_calls_from_llm):
    prev_result = None
    nested_result = {}
    current_level = nested_result
    for index, tool in enumerate(tool_calls_from_llm, start=0):
        if index > 0:
            # Replace the placeholder argument with the matching value
            # from the previous tool call's result
            key = next(iter(tool["arguments"]))
            value = prev_result[key]
            tool["arguments"][key] = value
        result = await mcp_client.call_tool(tool["name"], tool["arguments"])
        result = json.loads(result.content[0].text)
        prev_result = result
        # Nest each tool's result inside the previous tool's result
        current_level[tool["name"]] = result
        current_level = current_level[tool["name"]]
    return nested_result

The result of executing all four tools identified by the LLM is a nested object of client, contract, product and price data as seen below:

{ "get_client_info": { "client_id": 2, "client_name": "Furniture King", "client_location": "Sweden", "get_contract_info": { "contract_id": 500, "contract_name": "Furniture Contract", "contract_region": "Stockholm", "get_product_info": { "contract_id": 500, "product_name": "Twin Beds", "product_id": 2000, "get_product_price_info": { "product_id": 2000, "price": 1000 } } } } }

Should you want a simpler response, just limit the prompt to what you need (e.g. “Load the product info and product price info for contract_id 400”).
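To tie it all together, here is a minimal sketch of how invoke_agent could be driven, assuming the agent code above lives in a module named agent (the module layout in the repo may differ):

import asyncio
import json

# Assumes the agent code above is saved as agent.py; adjust the import to match the repo layout.
from agent import invoke_agent


async def main():
    result = await invoke_agent(
        "Load the client information, contract information, product information "
        "and product price for a client named Furniture King"
    )
    print(json.dumps(result, indent=2))


asyncio.run(main())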

One cool thing about this behavior is that you can kind of think of it as a GraphQL-esque response, except it’s simpler for non-technical users since the input is just an English sentence.

I have included a GitHub repo here in case you are interested.

Observations

I found tool calling to be generally reliable in qwen, but with these smaller models there are occasional cases where calls are missing. My recommended workaround is to tweak the prompt.

References:
https://medium.com/@pedroazevedo6/build-llamaindex-agents-with-mcp-connector-69df32d95508
https://docs.llamaindex.ai/en/stable/api_reference/tools/mcp/