Build Faster Agentic LLM Workflows with Asynchronous Python Calls

Large language models can be difficult to run reliably in production because they may introduce inaccurate answers, inconsistent behavior, or noticeable response delays. As models become more advanced, it can be tempting to add more context and more detailed instructions to every prompt and expect the model to handle everything in one step. However, when a single system prompt is sent to a relatively large model, response time can often range from 3 to 15 seconds. For time-sensitive agents, such as voice-based phone assistants, that pause is too long.

A faster approach is to split the overall task into several smaller tasks and allow your LLM workflow to process those tasks in parallel. This makes it easier for smaller models to complete each task accurately. It can also significantly reduce latency and, in many cases, lower your operating costs.

There are several ways to run agentic workflows in parallel. In many agentic systems, however, the main bottleneck is the actual call to the LLM. You can reduce this bottleneck by sending multiple LLM API requests at the same time with Python’s built-in asyncio module and the third-party aiohttp library.

In this tutorial, you will use Python to create agentic workflows that send multiple LLM requests asynchronously. This makes it possible to build faster and more accurate LLM applications that can be deployed on a virtual server, GPU server, or serverless environment as an API endpoint.

You will create a customer service workflow for a real estate agency’s phone system. The workflow will answer questions about property listings, schedule appointments, and connect callers to human representatives. It will do this by sending several asynchronous requests to different LLM endpoints, reducing response time and giving you more flexibility when choosing the right model for each task.

Key Takeaways

Using asynchronous requests to small, fast models can help you build effective, low-latency agentic workflows for applications where response speed matters.
Multi-model workflows let you customize each prompt for a specific model. This can reduce overall compute requirements and lower the cost of both self-hosted LLM deployments and third-party API usage.

Prerequisites

To follow this tutorial, you will need:

A local development environment for Python 3.
Access to an LLM API. In this tutorial, you will use the Mistral-Small-3.2-24B model deployed on a GPU server using vLLM. You can set up a GPU-based server with vLLM to serve a model by following a suitable guide for running vLLM with your preferred GPU environment.
Familiarity with Python async and await syntax. This is optional, but helpful.

Step 1 — Setting Up Your Environment

First, prepare your environment and install the required dependencies. In this tutorial, requests will be sent to a Mistral-Small-3.2-24B model deployed on an H200 GPU server. You can either send requests to your own deployed model or use a third-party LLM API. This tutorial only sends requests to one LLM, but you can use different models from different providers for each prompt if needed.

Create a virtual environment and install the necessary dependency:

python3 -m venv venv
source venv/bin/activate
pip install aiohttp

Because the model requests will be made asynchronously, you need to define asynchronous functions that can manage those calls. You will use Python’s built-in asyncio module together with the aiohttp library you just installed. If you use another LLM API, install the provider’s client library if it is required to communicate with the endpoint.

Create a new file named agentic_workflows.py. Add the following code to import the required dependencies and define the API endpoint.

Replace your_server_ip with the IP address of your GPU server deployment, or define the client for the LLM API you are using:

agentic_workflows.py

import asyncio
import aiohttp

# Your vLLM server URL
VLLM_SERVER_URL = "http://your_server_ip:8000/v1/chat/completions"

Step 2 — Writing the Asynchronous Call Logic

Your asynchronous call logic should accept a list of dictionaries. Each dictionary contains the model ID and the prompt that should be sent to the LLM. The function should send all requests without waiting for each individual response before starting the next one. Each prompt will include the conversation history and the system prompt for that specific request. You will also add a synchronous call_models function that can be imported and used in your main script.

Add the following asynchronous functions to agentic_workflows.py:

agentic_workflows.py

async def _call_single_model(call_spec):
    """Make an async call to the GPU server with a given model and messages"""
    model_id = call_spec["model_id"]
    messages = call_spec["messages"]
    
    payload = {
        "model": model_id,
        "messages": messages,
        "max_tokens": 100, 
        "temperature": 0.1 
    }
    
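    async with aiohttp.ClientSession() as session:
        async with session.post(VLLM_SERVER_URL, json=payload) as response:
            # Raise on HTTP errors (4xx/5xx) instead of failing later with a KeyError
            response.raise_for_status()
            result = await response.json()

            # Extract the assistant's reply from the chat completions response
            message = result["choices"][0]["message"]["content"]
            return message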

async def _call_models_async(call_list):
    """Call multiple models asynchronously and return responses in the same order"""
    # Create tasks for all model calls
    tasks = [_call_single_model(call_spec) for call_spec in call_list]
    
    # Run all tasks concurrently and return responses in order
    responses = await asyncio.gather(*tasks)
    return responses

def call_models(call_list):
    return asyncio.run(_call_models_async(call_list))

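Calling aiohttp.ClientSession() creates an asynchronous client session that sends POST requests to the API endpoint and handles HTTP responses without blocking the rest of the program. This allows several LLM API calls to be in flight at the same time instead of running one after another. This example opens a new session for each call to keep the code short; the aiohttp documentation recommends reusing a single session across requests in longer-running applications.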

The _call_models_async() function collects the request details you provide and starts all API calls without waiting for each response before starting the next call. When the responses return, asyncio.gather() places them in a list in the same order in which the requests were passed in, regardless of which response arrived first. The call_models() function is synchronous, making the asynchronous workflow easier to call from other code. Because it uses asyncio.run(), call it only from code that is not already running inside an event loop.
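
The ordering and timing behavior of asyncio.gather() can be checked without a live model. The following is a minimal sketch in which fake_llm_call is a hypothetical stand-in that only sleeps; the delays are illustrative assumptions, not measurements from the tutorial's server.

```python
import asyncio
import time


async def fake_llm_call(name, delay):
    """Hypothetical stand-in for an LLM request: waits, then returns a label."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_concurrently():
    # The slowest call is listed first to show that gather() keeps input order.
    calls = [("pricing", 0.3), ("scheduling", 0.1), ("listing", 0.2)]
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_llm_call(n, d) for n, d in calls))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(run_concurrently())
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

The results come back in the order the calls were listed, and the total time is roughly the duration of the slowest call (about 0.3 seconds here) rather than the 0.6-second sum of all three delays.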

Step 3 — Writing Your Prompts

Now that the basic asynchronous API request logic is in place, you need to define the prompts. This workflow uses three separate system prompts. One prompt handles pricing questions, one handles scheduling requests, and one identifies other listing-related questions that should be forwarded to a human representative. You can add more prompts later, including prompts with more autonomous decision-making, but these three are enough for this example.

agentic_workflows.py

SYSTEM_PROMPT_CONFIGURATIONS = {
    "pricing_prompt": {
        "model_id": "mistralai/Mistral-Small-3.2-24B-Instruct-2506", 
        "prompt": "You are a customer service agent. Determine if the user's most recent request is asking about the price of a listing. If they are asking about the price of a listing AND if they have included the listing_id, return only the listing_id of the item they are asking about in the following format: 'listing_id: XXXXXX'. \nIf they are asking about the pricing of a listing AND did NOT mention the specific listing_id number, ask them for the listing id number. If they are requesting something other than the price of a listing: return only the word 'false'."
    },
    "scheduling_prompt": {
        "model_id": "mistralai/Mistral-Small-3.2-24B-Instruct-2506", 
        "prompt": "You are a customer service agent. Determine if the user's most recent request is asking to schedule a call with a real estate agent. If they are trying to schedule a call, return the date and time they would like to schedule the call in the following format: 'date: YYYY-MM-DD, time: HH:MM'. If they are trying to schedule a call but they did not mention the specific date and time they are available, ask them what day and time they are available. Do not provide any additional information, such as the agent's availability. If they are not asking to schedule a call, return only the word 'false'." 
    }, 
    "listing_prompt": {
        "model_id": "mistralai/Mistral-Small-3.2-24B-Instruct-2506", 
        "prompt": "You are a customer service agent. Determine if the user is asking a question about a listing. If they are asking a question about a listing, return only the word 'true'. Otherwise, return only the word 'false'."
    }
}

These system prompt configurations include the prompt name for later reference, the model ID that should receive the prompt, and the prompt text that will be combined with the conversation history.

Step 4 — Processing User Input Through the Models

Next, create a function that accepts the user’s conversation, adds each system prompt to that conversation, and sends the asynchronous requests to the correct model for each prompt using the call_models() function created earlier. This example assumes that the conversation history is stored in the frontend in the same format accepted by the API and is sent together with the user’s latest message.

Add the following function to agentic_workflows.py:

agentic_workflows.py

def run_agentic_workflow(conversation_history):
    model_calls_list = []
    prompt_names = []  # Keep track of prompt order for response mapping
    
    for prompt_name, config in SYSTEM_PROMPT_CONFIGURATIONS.items():
        model_id = config["model_id"]
        system_prompt = config["prompt"]
        
        # Construct the full message prompt: system prompt + conversation history
        full_messages = [{"role": "system", "content": system_prompt}] + conversation_history
        
        model_calls_list.append({
            "model_id": model_id,
            "messages": full_messages
        })
        prompt_names.append(prompt_name) 
    
    
    prompt_responses = call_models(model_calls_list)

This function builds a list that contains one dictionary per system prompt. Each dictionary contains the model ID and the full message set, which includes the system prompt plus the conversation history. It then uses the call_models() function to send the requests for all system prompts concurrently.

After the models are called, the responses are stored in the prompt_responses variable in the same order in which the requests were sent. You then need to map those responses to specific variables.

Continue extending the run_agentic_workflow function:

agentic_workflows.py

    # Map responses to their respective prompts
    pricing_response = prompt_responses[prompt_names.index("pricing_prompt")]
    scheduling_response = prompt_responses[prompt_names.index("scheduling_prompt")]
    listing_response = prompt_responses[prompt_names.index("listing_prompt")]
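
Because the two lists are aligned by position, the same mapping can also be expressed as a dictionary. This is a sketch of an alternative, not the tutorial's own code; the sample values below are placeholders standing in for real model replies.

```python
# Placeholder values standing in for the real prompt names and model replies.
prompt_names = ["pricing_prompt", "scheduling_prompt", "listing_prompt"]
prompt_responses = ["listing_id: 123456", "false", "true"]

# zip() pairs each prompt name with the response at the same position.
responses_by_name = dict(zip(prompt_names, prompt_responses))

pricing_response = responses_by_name["pricing_prompt"]
scheduling_response = responses_by_name["scheduling_prompt"]
listing_response = responses_by_name["listing_prompt"]
print(pricing_response)
```

A dictionary avoids a separate index lookup for each prompt and raises a KeyError with the prompt name if a configuration entry is missing.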

Step 5 — Defining the Workflow Logic

Finally, add a logic layer to this function. This layer reviews the responses and decides what should happen next. It can look up a listing price in a database, route the call to a human representative, schedule an appointment with a real estate agent, or ask the user a follow-up question for more information.

Continue adding to the run_agentic_workflow function in agentic_workflows.py:

agentic_workflows.py

    # Route 1: Handle pricing inquiries
    if pricing_response.lower() != "false":
        if pricing_response.startswith("listing_id:"):
            # Extract listing ID from response
            listing_id = pricing_response.split("listing_id: ")[1].strip()

            # Simulate database/API lookup
            # You can add database querying logic here to look for a target listing ID. For our example, we will use a simple Python dictionary
            example_price_database = {
                "123456": "$350,000",
                "654321": "$450,000",
                "112233": "$550,000"
            }
            found_price = example_price_database.get(listing_id)
            if found_price:
                final_response = f"The price for listing {listing_id} is {found_price}."
            else:
                final_response = "We are unable to find that listing ID in our records. Are you sure you have the correct listing ID?"
        else:
            final_response = pricing_response  # Response asking for listing ID

    # Route 2: Handle scheduling requests
    elif scheduling_response.lower() != "false":
        if scheduling_response.startswith("date:"):
            final_response = f"Perfect! I've scheduled a call for you on that date and time. A sales representative will reach out to you at that time."
            # In production: Add logic to actually book the appointment, and consider customizing the message to confirm the date and time selected.
        else:
            final_response = scheduling_response  # Response asking for specific time

    # Route 3: Handle general listing questions
    elif listing_response.lower() != "false":
        final_response = "Please hold while I transfer you to a specialist for further assistance."
        # In production: Add logic to transfer chat to human representative
        # You could alternatively add logic to access listing details and answer the user's specific question

    else:
        final_response = "I apologize, I'm not sure how I can help with that. Let me transfer you to a human representative who can better assist you."
        # In production: Add logic to transfer chat to human representative

    return final_response
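
Model output does not always match the requested format exactly: a reply may carry extra whitespace, different capitalization, or a trailing period. The routing checks could be made more tolerant with small parsing helpers. The sketch below is one possible approach; the helper names (is_false, parse_listing_id, parse_schedule) and the regular expressions are assumptions for illustration and are not part of the tutorial's code.

```python
import re

# Patterns for the two structured formats requested in the system prompts.
LISTING_ID_PATTERN = re.compile(r"listing_id:\s*(\w+)", re.IGNORECASE)
SCHEDULE_PATTERN = re.compile(
    r"date:\s*(\d{4}-\d{2}-\d{2}),\s*time:\s*(\d{1,2}:\d{2})", re.IGNORECASE
)


def is_false(response):
    """Return True for 'false', ignoring case, whitespace, quotes, and periods."""
    return response.strip().strip("'\".").lower() == "false"


def parse_listing_id(response):
    """Return the listing ID if the reply contains 'listing_id: ...', else None."""
    match = LISTING_ID_PATTERN.search(response)
    return match.group(1) if match else None


def parse_schedule(response):
    """Return (date, time) strings if the reply contains both, else None."""
    match = SCHEDULE_PATTERN.search(response)
    return (match.group(1), match.group(2)) if match else None


print(is_false(" False.\n"))
print(parse_listing_id("Listing_ID:123456"))
print(parse_schedule("date: 2025-03-14, time: 09:30"))
```

With helpers like these, a route can branch on whether a parser returned a value instead of relying on exact string prefixes.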

Step 6 — Testing Your Workflow

Now that the code for a basic agentic workflow is complete, you can test it. The workflow handles three categories of customer questions; the tests in this step cover the pricing route. First, test a conversation about the price of a listing. In this example, the user does not provide the listing ID in the first request, so the model asks a follow-up question. The user then provides the ID, and the workflow returns the price from the example database.

Create a new file named test_workflow.py and add the following test code:

test_workflow.py

from agentic_workflows import run_agentic_workflow

conversation_history = [
    {"role": "user", "content": "Hi, can you tell me the price of one of your listings?"},
]
final_response = run_agentic_workflow(conversation_history)
print(f"Response: {final_response}")

Run the test script:

python3 test_workflow.py

You will see output similar to this:

Output

Response: Could you please provide the listing_id of the item you're asking about?

Next, replace the conversation_history in test_workflow.py with a complete conversation in which the user provides the listing ID:

test_workflow.py

conversation_history = [
    {"role": "user", "content": "Hi, can you tell me the price of one of your listings?"},
    {"role": "assistant", "content": "Could you please provide the listing_id of the item you're asking about?"},
    {"role": "user", "content": "Yes, the listing ID is 123456"},
]
final_response = run_agentic_workflow(conversation_history)
print(f"Response: {final_response}")

Run the script again:

python3 test_workflow.py

You will see the pricing response:

Output

Response: The price for listing 123456 is $350,000.

Conclusion

In this tutorial, you created a parallel agentic workflow from the ground up with Python. You wrote asynchronous functions that send multiple LLM requests at the same time, configured several system prompts for different tasks, and added routing logic to process user requests.

To make this framework ready for production, you will need to test it, rewrite prompts, change models, and test it again with different conversations until the accuracy and speed match your requirements. With this type of workflow, each prompt should be tested individually and improved continuously. In this example, the Mistral-Small-3.2-24B model was selected because it provides low latency, with a median response time below 0.5 seconds, follows the specific instructions well, and can be deployed on a single GPU. GPT-OSS-120b, GPT-OSS-20b, and Llama 3.1 8b were also tested, but the Mistral model performed best for this particular use case. Selecting the right model for each prompt or task is an iterative process.

If you want to use agents with more autonomy, you can give individual prompts additional context or access to more tools. You can also chain follow-up LLM calls together for reasoning or other purposes. However, when a prompt requires several API calls in sequence, overall latency increases because the workflow waits for each step in the chain rather than for a single group of concurrent model calls.

The Mistral model in this tutorial was deployed with vLLM, which serves models through an OpenAI-compatible chat completions endpoint, so the request format did not need to change between models. However, OpenAI, Claude, and open-source models served through other APIs may require different request formats. If you use different model providers for each call, you may need to write Python functions that convert prompts into the format required by the model being called. For example, you might need a function that converts the message list used in this tutorial into a Claude-compatible format.
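
As an illustration of such a conversion, the sketch below moves system messages out of the message list and into a top-level system field, which is how Anthropic's Messages API expects the system prompt. The function name and the model ID are hypothetical, and the code only builds the request body; check the provider's current API reference before sending it anywhere.

```python
def to_anthropic_style_payload(model_id, messages, max_tokens=100):
    """Convert an OpenAI-style message list into an Anthropic-style request body."""
    # System messages become a top-level "system" string instead of list entries.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat_messages = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    payload = {"model": model_id, "max_tokens": max_tokens, "messages": chat_messages}
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    return payload


openai_style_messages = [
    {"role": "system", "content": "You are a customer service agent."},
    {"role": "user", "content": "What is the price of listing 123456?"},
]
# "example-claude-model" is a placeholder, not a real model ID.
payload = to_anthropic_style_payload("example-claude-model", openai_style_messages)
print(payload)
```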

Some models may respond much more slowly than others, which means the workflow may have to wait for the slowest response before all results are available. With Python’s asyncio library, you can process model responses as soon as they arrive by using asyncio.as_completed() instead of asyncio.gather(). Then you can check the earliest responses to determine whether they contain the route you want to continue with and proceed without waiting for every remaining response. This would also require updates to the logic layer.
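
A minimal sketch of that idea follows. The fake_prompt_call coroutine is a hypothetical stand-in that sleeps instead of calling a model, and the delays and replies are made up for illustration. The loop returns the first reply that is not 'false' and cancels the calls that are still pending.

```python
import asyncio


async def fake_prompt_call(prompt_name, delay, reply):
    """Hypothetical stand-in for one LLM request with a fixed latency."""
    await asyncio.sleep(delay)
    return prompt_name, reply


async def first_matching_route():
    tasks = [
        asyncio.create_task(fake_prompt_call("pricing_prompt", 0.05, "listing_id: 123456")),
        asyncio.create_task(fake_prompt_call("scheduling_prompt", 0.10, "false")),
        asyncio.create_task(fake_prompt_call("listing_prompt", 0.50, "true")),
    ]
    try:
        # as_completed() yields awaitables in the order the calls finish.
        for finished in asyncio.as_completed(tasks):
            prompt_name, reply = await finished
            if reply.strip().lower() != "false":
                return prompt_name, reply
        return None, None
    finally:
        # Cancel anything still running; this has no effect on finished tasks.
        for task in tasks:
            task.cancel()


result = asyncio.run(first_matching_route())
print(result)
```

Note that this sketch accepts whichever route answers first, whereas the logic layer in Step 5 checks the routes in a fixed priority order. A production version would need to decide how to handle a lower-priority route that responds before a higher-priority one.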

When you add more system prompts, the main bottleneck you are likely to encounter is rate limiting from the LLM API or processing limits on your GPU. You can address this by hosting your own model on a GPU server, increasing API rate limit quotas, distributing requests across multiple APIs, and adding retry logic, timeout handling, and error handling for situations where rate limits occur.
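
Retry and timeout handling can be sketched with the standard library alone. In the example below, RateLimitError is a hypothetical exception standing in for an HTTP 429 response, flaky_call simulates an endpoint that fails twice before succeeding, and the attempt count and delays are illustrative defaults rather than recommended values.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


async def call_with_retries(make_call, max_attempts=3, timeout_s=5.0, base_delay_s=0.1):
    """Await make_call() with a per-attempt timeout and exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (RateLimitError, asyncio.TimeoutError):
            if attempt == max_attempts:
                raise  # Give up after the final attempt
            # Double the wait after each failure and add jitter to spread retries.
            delay = base_delay_s * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


state = {"attempts": 0}


async def flaky_call():
    """Simulated endpoint that is rate limited on the first two attempts."""
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise RateLimitError("simulated rate limit")
    return "ok"


result = asyncio.run(call_with_retries(flaky_call, base_delay_s=0.01))
print(result, state["attempts"])
```

In the workflow from this tutorial, a wrapper like this could surround each individual model call so that one rate-limited request does not fail the whole group.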
