In this article I will show how to create a Model Context Protocol (MCP) server and integrate it with a local LLM to implement a multi-step tool calling workflow.
MCP
MCP has been gaining popularity in the LLM space as a way to standardize the discovery of tools that are made available to LLMs. One key use case is publishing tools (e.g. remote APIs) that LLMs can use to bring in external data to enrich their understanding of the world.
I think it’s important to call out that MCPs don’t give LLMs new capabilities, since you can achieve the same behavior through standard tool calling. However, I still think there is value in standardizing the integration of tool functions through automatic discovery. Also, if you add new tool functions to your MCP server, the LLM instantly gains access to the new tools without code changes.
Implementing an MCP server
In the following section I will show how I built an MCP server using FastMCP. I will also show how to use the MCP server to implement a multi-step tool calling workflow integrated with a local Qwen 2.5 (7B) model. In short, I am building a simple POC for a GraphQL-esque response based on the tools that you opt in to through your prompt.
As a quick implementation detail, I am running Qwen in Docker through Ollama.
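Below is a minimal sketch of that initialization, assuming the standard Ollama Docker image and LlamaIndex’s Ollama integration; the model tag, endpoint, and timeout values are illustrative:

```python
# Assumed setup (shell):
#   docker run -d -p 11434:11434 --name ollama ollama/ollama
#   docker exec -it ollama ollama pull qwen2.5:7b
from llama_index.llms.ollama import Ollama

# Point LlamaIndex at the Ollama endpoint exposed by the container
llm = Ollama(
    model="qwen2.5:7b",
    base_url="http://localhost:11434",
    request_timeout=120.0,
)
```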
The first step of the MCP server is to create the entry point for the FastMCP server. Looking at the code, it’s sort of like how you would create REST APIs, but with some syntax variations. Instead of using get/post decorators, we decorate the methods with a “tool” decorator to expose them as callable tools.
See the sample code below.
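It’s a minimal sketch using FastMCP’s Python API with in-memory sample data; the tool names, return fields, and IDs are illustrative stand-ins for a real data layer:

```python
from fastmcp import FastMCP

mcp = FastMCP("client-data")

@mcp.tool()
def get_client_info(client_name: str) -> dict:
    """Load client information for a client by name."""
    # Stand-in for a real database or API lookup
    return {"client_id": 100, "client_name": client_name}

@mcp.tool()
def get_contract_info(client_id: int) -> dict:
    """Load contract information for a client."""
    return {"contract_id": 400, "client_id": client_id, "status": "active"}

@mcp.tool()
def get_product_info(contract_id: int) -> dict:
    """Load product information for a contract."""
    return {"product_id": 700, "contract_id": contract_id, "product_name": "Executive Desk"}

@mcp.tool()
def get_product_price_info(product_id: int) -> dict:
    """Load price information for a product."""
    return {"product_id": product_id, "price": 899.00, "currency": "USD"}

if __name__ == "__main__":
    # Expose the tools over SSE so the agent can connect over HTTP
    mcp.run(transport="sse", host="127.0.0.1", port=8000)
```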
In this simple example I have created an MCP server that exposes a series of tools that can be used to load client data along with its associated contracts, products and prices. The idea is that the user can ask the LLM to return a nested dataset of client data enriched with contract, product and price data.
Creating an Agent
Now that we have a running MCP server, the next step is to create an agent that can wire up the integration between the tools and the LLM. In my example I am using LlamaIndex to facilitate the automated discovery of the exposed tools. In the following sample code, pay attention to the get_agent method where I load all available tools and hook them up through a LlamaIndex object called FunctionAgent.
Based on the prompt the LLM will infer which tool(s) to call to get the data requested by the user. I should point out that the LLM doesn’t actually call the tools directly. Instead, it will generate a schema response that represents the call(s) with appropriate arguments. The schema response can then be parsed by the agent and translated into actual tool calls.
Most of the work for the LLM is figuring out which tool(s) to call based on the intention of the user prompt. One of the key benefits of using an LLM for this is that it handles nuances of semantic meaning really well, which is key to reliably mapping prompts with different wording, but the same meaning, to the right tools.
I have included the agent code below.
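The sketch assumes the llama-index-tools-mcp package for tool discovery over SSE; the endpoint URL, agent name, and description are illustrative:

```python
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

llm = Ollama(model="qwen2.5:7b", request_timeout=120.0)

async def get_agent() -> FunctionAgent:
    # Discover every tool exposed by the MCP server automatically
    mcp_client = BasicMCPClient("http://127.0.0.1:8000/sse")
    tool_spec = McpToolSpec(client=mcp_client)
    tools = await tool_spec.to_tool_list_async()

    # Hook the discovered tools up to the local LLM
    return FunctionAgent(
        name="client-data-agent",
        description="Loads client, contract, product and price data.",
        tools=tools,
        llm=llm,
    )

async def main():
    agent = await get_agent()
    response = await agent.run(
        "Load the client information, contract information, product "
        "information and product price for a client named Furniture King"
    )
    print(response)

if __name__ == "__main__":
    asyncio.run(main())
```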
As I mentioned previously, the LLM does not invoke the functions directly, but it should give us a schema with the correct sequence of calls.
Let’s look at the following sample prompt: “Load the client information, contract information, product information and product price for a client named Furniture King”.
Considering that this prompt is asking for data that can be mapped to all four MCP tools, we would expect the LLM to generate a schema with correct calls to all four. I have included a sample of what the generated schema looks like from Qwen 2.5 below.
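The exact format depends on how you capture the tool calls; with the illustrative tool names from the server sketch above, the planned sequence might look something like this:

```json
[
  { "tool": "get_client_info",        "arguments": { "client_name": "Furniture King" } },
  { "tool": "get_contract_info",      "arguments": { "client_id": "<client_id>" } },
  { "tool": "get_product_info",       "arguments": { "contract_id": "<contract_id>" } },
  { "tool": "get_product_price_info", "arguments": { "product_id": "<product_id>" } }
]
```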
Notice how the first call in the sequence uses input directly from the prompt (Furniture King). The remaining calls are dependent on the prior call, so the LLM can only provide placeholder values. However, the naming and structure of the arguments are syntactically correct, which is key when executing the calls dynamically.
In the following code sample, I show how I made the dynamic calls. One key convention in the tool functions is that the argument of a tool can be found in the response object of the prior tool call, except for the first call, where I can use the argument from the LLM directly.
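Here is a sketch of that execution loop, assuming the tool list discovered by the agent above, the hypothetical schema format shown earlier, and tools that return JSON objects; the nesting labels at the end are an illustrative convention:

```python
import json

async def execute_tool_calls(tools, tool_calls):
    """Run the planned calls in order, chaining arguments between responses."""
    tools_by_name = {tool.metadata.name: tool for tool in tools}

    responses = []
    previous = {}
    for i, call in enumerate(tool_calls):
        tool = tools_by_name[call["tool"]]
        arg_name = next(iter(call["arguments"]))
        # First call: use the argument from the LLM directly. Later calls:
        # look the argument up, by name, in the previous tool's response.
        arg_value = call["arguments"][arg_name] if i == 0 else previous[arg_name]

        output = await tool.acall(**{arg_name: arg_value})
        previous = json.loads(output.content)  # each tool returns a JSON object
        responses.append(previous)

    # Fold the responses into one nested object: client > contract > product > price
    nested, node = responses[0], responses[0]
    for label, response in zip(["contract", "product", "price"], responses[1:]):
        node[label] = response
        node = response
    return nested
```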
The result of executing all four tools identified by the LLM is a nested object of client, contract, product and price data, as seen below.
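With the illustrative sample data from the server sketch above, the nested response might look like this:

```json
{
  "client_id": 100,
  "client_name": "Furniture King",
  "contract": {
    "contract_id": 400,
    "client_id": 100,
    "status": "active",
    "product": {
      "product_id": 700,
      "contract_id": 400,
      "product_name": "Executive Desk",
      "price": {
        "product_id": 700,
        "price": 899.0,
        "currency": "USD"
      }
    }
  }
}
```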
Should you want a simpler response, just limit the prompt to what you need (e.g. “Load the product info and product price info for contract_id 400”).
One cool thing about this behavior is that you can kind of think of it as a GraphQL-esque response, except simpler for non-technical users since the input is based on an English sentence.
In case you are interested, I have included a GitHub repo here.
Observations
I found tool calling to be generally reliable in Qwen, but with these smaller models there are occasional cases where calls are missing from the generated schema. My recommended workaround is to tweak the prompt.