How to Add Web Grounding to Large Language Model Responses with Python

When you ask a large language model about recent or upcoming events, the response may be inaccurate because the model can rely only on information that was available when it was trained. For instance, if you ask an LLM that was trained before the most recent presidential election who the current president is, and the model has no web access, it may name the previous president or generate an incorrect answer. The example below shows a query sent to Llama 3.3 70B, which has a knowledge cutoff of December 2023.

example_llm.py

response = call_llm("Who is the current president?")
print(f"Response: {response}")

Output

Response: As of my knowledge cutoff in 2023, Joe Biden is the President of the United States. However, please note that my information may not be up-to-date, and I recommend checking with a reliable news source for the most current information.

Many LLM APIs handle this limitation by grounding the request with information from a web search. Web grounding is a form of retrieval augmented generation. The workflow checks whether the user’s input requires up-to-date web information, runs a search if needed, adds the search results to the final prompt as additional context, and then returns the model’s answer to the user.

Web Grounding Workflow

If you decide to work with an open source model instead of an LLM API that already includes web grounding, you may need to build this workflow yourself. In this tutorial, you will create a Python-based workflow that uses web grounding to improve the accuracy of model responses.

Key Takeaways

LLMs only know information up to the point at which they were trained. Web grounding uses live web searches to give LLM prompts access to current information.
Web grounding is a form of retrieval augmented generation. The workflow identifies which additional context is needed to answer a prompt, searches the web for that context, and includes the search results in the prompt sent to the LLM.
Many proprietary models from providers such as OpenAI and Google include web grounding features. For other models, you may need to build your own web grounding process with Python logic and access to a web search API.

Step 1 — Getting Your API Keys

You need a tool that can search the web and return search results. Several search providers, including Google and Bing, have moved away from traditional search APIs and now promote their own proprietary web grounding or agent-based services. In this tutorial, you will use DuckDuckGo searches through a limited free plan with Serp API. Any search API can be used, provided that it gives you access to the title, snippet, and date of the web search results.

To create a free Serp API account and obtain an API key, visit the provider’s website and register for the free plan.

You also need access to an LLM. This tutorial uses Llama 3.3 70B through a generic serverless inference endpoint. You will need a model access key from your chosen inference provider to call the model.

Step 2 — Setting Up Your Environment

If Python is not already installed on your system, download it from the official Python website and install it.

Create a Python file named web_grounding_tutorial.py. In this script, import the requests library (if it is not installed, run pip install requests). Then define a SERP_API_KEY variable and replace the placeholder value your_api_key with the API key from your Serp API account. You can usually find the key in the API key section of your Serp API dashboard after logging in.

Next, add the inference URL and request headers for your LLM provider. Replace the placeholder value your_model_access_key with the model access key from your inference provider’s console.

web_grounding_tutorial.py

import requests

SERP_API_KEY = "your_api_key"

llm_url = "https://your-inference-provider.example.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer your_model_access_key"
}

The request headers define the JSON content type and provide the model access key for authorization. Next, create a function that calls the LLM. This function sends an HTTP request to your serverless inference endpoint.

web_grounding_tutorial.py

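def call_llm(prompt):
    # Build a chat-completions request body with a single user message.
    data = {
        "model": "llama3.3-70b-instruct",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": 500
    }

    # Send the request; fail fast on timeouts and HTTP error statuses.
    response = requests.post(llm_url, headers=headers, json=data, timeout=60)
    response.raise_for_status()

    # Extract the generated text from the first choice.
    message = response.json()['choices'][0]['message']['content']
    return message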

The call_llm function receives a prompt, sends it to the model through an HTTP request, and returns the response text.

Step 3 — Identifying Whether a Prompt Requires Web Grounding

Not every request needs to be grounded with a web search. For example, if you ask for a soup recipe, you may prefer the LLM to answer directly without searching the web. Your search API may also have usage limits or costs, so reducing unnecessary searches is useful. Web searches can also increase latency, which means it is better to search only when the prompt actually requires current information.

To decide whether a web search is required, create a function that asks the LLM whether the user’s input should be enhanced with search results.

web_grounding_tutorial.py

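def needs_web_grounding(query):
    # Ask the model to classify the query; it should reply with YES or NO only.
    prompt = f"Does the following prompt need information more recent than December 2023 to answer correctly? Answer only YES or NO.\nQuestion: {query}\nAnswer:"

    judgement = call_llm(prompt)

    # Treat any reply that starts with YES (in any letter case) as True.
    return judgement.strip().upper().startswith("YES")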

The needs_web_grounding function asks the model whether the user’s prompt requires information newer than the December 2023 knowledge cutoff of the Llama 3.3 70B model used in this tutorial. If the model’s reply starts with YES, the function returns True. Otherwise, it returns False. This result determines whether the workflow should perform a web search.

If you call needs_web_grounding with the query Who is the current president?, it should return True. If you call it with Why is the sky blue?, it should return False.

The prompt inside the needs_web_grounding function will not perfectly detect every case where a web search is required. For example, if a country changes its capital city, the model may assume that capital cities rarely change and return NO. Many similar cases make this detection problem more complex than it may appear. The simple prompt shown above is sufficient for this tutorial. In a production environment, you should refine the prompt based on the level of error your application can tolerate.
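
One small refinement is to parse the model’s verdict more tolerantly and to choose explicitly what happens when the reply is neither YES nor NO. The following is a minimal sketch and not part of the tutorial script: parse_verdict and its default parameter are hypothetical names, and treating an unclear reply as "search" is an assumption that trades extra search calls for fewer stale answers.

```python
import re


def parse_verdict(reply, default=True):
    """Map a free-form YES/NO reply to a boolean.

    Returns `default` when the reply contains neither word, so the caller
    decides whether an unclear verdict should trigger a search.
    """
    # Find the first standalone YES or NO, ignoring case and punctuation.
    match = re.search(r"\b(yes|no)\b", reply, flags=re.IGNORECASE)
    if match is None:
        return default
    return match.group(1).lower() == "yes"
```

Inside needs_web_grounding, the last line could then become return parse_verdict(judgement). If search calls are expensive for your application, pass default=False instead.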

Step 4 — Implementing the Web Search

Next, create the function that performs the web search. It will accept a search query and return the first three search results.

web_grounding_tutorial.py

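def search_web(search_query):
    # Query parameters for a DuckDuckGo search through Serp API.
    params = {
        'engine': 'duckduckgo_light',
        'q': search_query,
        'api_key': SERP_API_KEY,
    }

    # Run the search; fail fast on timeouts and HTTP error statuses.
    response = requests.get('https://serpapi.com/search', params=params, timeout=30)
    response.raise_for_status()
    data = response.json()

    # Return at most the top three results (an empty list if there are none).
    return data.get("organic_results", [])[:3]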

The search_web function takes the user’s input and uses the SERP_API_KEY defined earlier to run a DuckDuckGo search through Serp API. It then returns the details of the top three search results. These results will be added to the user’s prompt before the LLM generates its final answer.

Step 5 — Writing the Workflow Logic

Now create a function that combines all of the functions you have written so far. It must take the user’s input, call needs_web_grounding to decide whether a search is necessary, and then either search the web and add the results to the prompt or call the LLM directly without extra context.

web_grounding_tutorial.py

def answer_with_web_grounding(user_input):
    if needs_web_grounding(user_input):
        # Ground the prompt with the top search results.
        print("Web search needed; fetching info...")
        search_results = search_web(user_input)
        prompt = f"Please respond to the user's query: {user_input}\n\nYou may use the following web search results as context in your answer: {search_results}"
    else:
        # No fresh information required; send the input unchanged.
        print("Web search NOT needed; answering from LLM knowledge.")
        prompt = user_input

    answer = call_llm(prompt)
    return answer

If needs_web_grounding returns True, the workflow builds a prompt that instructs the LLM to use the web search results as context. If it returns False, the workflow sends only the user’s input to the model.

Now run the following code:

web_grounding_tutorial.py

print(answer_with_web_grounding("Who is the current president?"))

You should receive a response similar to the following.

Output

The current president of the United States is Donald J. Trump. He was sworn in as the 47th president on January 20, 2025, and his current term is set to end on January 20, 2029.

These results are not only current, but also include details such as the inauguration date and the expected end date of the term. This level of detail might not have been returned correctly if the model had answered without web grounding.

Improving Your Web Grounding Workflow

To improve this LLM workflow for your own application, the next step is to adjust the method that determines whether a prompt needs a web search. This should depend on the requirements of your application. If your application is designed only to find upcoming concert information, it may need a web search for every request. If the goal is only to reduce a portion of outdated answers, you can continue using the workflow built in this tutorial. The prompt in the needs_web_grounding function can be adapted for that purpose.
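
The choice between always searching, never searching, and letting the model decide can be expressed as a small policy function. This is only a sketch: should_search, its mode values, and the classifier parameter are hypothetical names introduced here, with classifier standing in for a callable such as needs_web_grounding.

```python
def should_search(query, mode="auto", classifier=None):
    """Decide whether to run a web search for `query`.

    mode="always": search on every request (for example, a concert-listing app).
    mode="never":  skip searching entirely.
    mode="auto":   defer to `classifier`, a callable that returns True or False.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "auto":
        if classifier is None:
            raise ValueError("mode='auto' requires a classifier callable")
        return bool(classifier(query))
    raise ValueError(f"Unknown mode: {mode!r}")
```

In answer_with_web_grounding, the condition could then read if should_search(user_input, mode="auto", classifier=needs_web_grounding), with the mode taken from your application’s configuration.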

Next, you can clean up the search result output to reduce token usage and improve response accuracy. At the moment, the search_web function returns a list of result objects that, shown as JSON, looks similar to this:

[
  {
    "position": 1,
    "title": "President Donald J. Trump - The White House",
    "link": "https://www.whitehouse.gov/administration/donald-j-trump/",
    "displayed_link": "www.whitehouse.gov/administration/donald-j-trump/",
    "snippet": "President Trump built on his success in private life when he entered into politics and public service. He remarkably won the Presidency in his first ever run for any political office.",
    "favicon": "https://external-content.duckduckgo.com/ip3/www.whitehouse.gov.ico"
  },
  {
    "position": 2,
    "title": "Donald Trump Sworn In as 47th President in Capitol Ceremony ... - Yahoo",
    ...
  }
]

Several of these fields, such as position, link, and displayed_link, may not be necessary for the model’s answer. Review the format returned by your search API and create a function that removes unnecessary fields. This reduces the amount of information added to the prompt context. It may also be useful to encode the results in TOON (Token-Oriented Object Notation), a compact alternative to JSON, to reduce token usage further.
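
A field-trimming step might look like the sketch below. The names trim_results and format_results are hypothetical, the set of fields to keep is an assumption based on the sample output above (a date field is kept only when the search API returns one), and the compact text layout is a plain illustration rather than TOON.

```python
def trim_results(results, keep=("title", "snippet", "date")):
    """Return copies of the search results containing only the `keep` fields."""
    trimmed = []
    for result in results:
        # Skip fields that are missing or empty instead of emitting blanks.
        trimmed.append({key: result[key] for key in keep if result.get(key)})
    return trimmed


def format_results(results):
    """Render trimmed results as compact plain text, one result per line."""
    lines = []
    for index, result in enumerate(trim_results(results), start=1):
        parts = [result.get("title", ""), result.get("snippet", "")]
        if "date" in result:
            parts.append(f"({result['date']})")
        lines.append(f"{index}. " + " | ".join(part for part in parts if part))
    return "\n".join(lines)
```

In answer_with_web_grounding, format_results(search_results) could replace the raw search_results value that is interpolated into the prompt.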

You may also notice that many answers are not fully available in the snippets returned by search results. For example, if you ask for the time of the next Seahawks game, the search snippet may include the day but not the exact time unless your workflow can access the content of the linked page.

web_grounding_tutorial.py

print(answer_with_web_grounding("What time is the next Seahawks game?"))

Output

The next Seahawks game is today, Sunday, December 28, 2025, against the Panthers. However, the exact time of the game is not specified in the provided search results. I recommend checking the official Seattle Seahawks website...

One way to solve this is to add another step in which the workflow fetches each linked page, uses a Python library such as Beautiful Soup to extract its text, and includes that text in the prompt context. After adding this functionality, the context from the official Seahawks schedule page is included, and the model can determine that today’s game has already ended and that the exact date and time of the next game have not yet been scheduled.

Output

The next Seahawks game is scheduled for Week 18, but the date, time, and location are still "TBD" (to be determined) as it is against the 49ers at Levi's Stadium.
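
The page-fetching step described above can be sketched as follows. To stay dependency-free, this version swaps in the standard library’s html.parser and urllib in place of Beautiful Soup and requests; the names html_to_text and fetch_page_text, the 4000-character cap, and the User-Agent string are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html, max_chars=4000):
    """Reduce an HTML document to space-joined text, capped at max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]


def fetch_page_text(url, max_chars=4000, timeout=10):
    """Download `url` and return its visible text (requires network access)."""
    request = Request(url, headers={"User-Agent": "web-grounding-tutorial"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html, max_chars)
```

The workflow could call fetch_page_text(result["link"]) for the top search result and append the returned text to the prompt. The character cap keeps a long page from overwhelming the context window, and pages that render their content with JavaScript will need a different approach.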

For better security, remove hardcoded API keys and URLs from your source code. One option is to store them as environment variables and load them into your scripts.
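
As a minimal sketch of that option, the settings can be read with os.environ. The environment variable names SERP_API_KEY, LLM_URL, and MODEL_ACCESS_KEY and the helper names require_env and load_settings are assumptions chosen for this example; use whatever names fit your deployment.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_settings():
    """Read the tutorial's three settings from the environment."""
    return {
        "serp_api_key": require_env("SERP_API_KEY"),
        "llm_url": require_env("LLM_URL"),
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {require_env('MODEL_ACCESS_KEY')}",
        },
    }
```

Failing at startup with a clear message is usually easier to debug than an authorization error returned later by the search or inference API.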

Finally, every LLM workflow should include enough error handling to account for the non-deterministic and sometimes unpredictable behavior of AI applications.
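
One common building block is a retry wrapper with exponential backoff for transient network failures. The sketch below is an illustration only: call_with_retries and its parameters are hypothetical names, and the retry count and delays are arbitrary defaults.

```python
import time


def call_with_retries(func, *args, attempts=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(*args), retrying failed attempts with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

For example, call_with_retries(search_web, user_input, retry_on=(requests.exceptions.RequestException,)) retries only network-level errors from the requests library. Narrowing retry_on this way avoids retrying programming errors that will never succeed.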

FAQ

Can web grounding work across a conversation instead of only one prompt?

Yes. If you want to support a conversational format where users can ask follow-up questions, you can include the grounding context in the conversation itself. If this adds too many tokens to the conversation context, consider removing older conversation entries to reduce token usage, as long as your application allows it.
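
A minimal sketch of this idea keeps the conversation as a list of chat messages and trims it to a fixed length. The function name add_grounded_turn and the max_messages limit are assumptions for illustration; the message format matches the one used in call_llm.

```python
def add_grounded_turn(messages, user_input, search_results=None, max_messages=10):
    """Append a user turn (optionally with search context) and trim old turns.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    Returns a new list; the input list is not modified.
    """
    content = user_input
    if search_results:
        content += f"\n\nWeb search results for context: {search_results}"
    updated = messages + [{"role": "user", "content": content}]
    # Keep only the most recent entries to bound token usage.
    return updated[-max_messages:]
```

To use it, call_llm would need to accept the whole message list instead of a single prompt, and each model reply would be appended with the assistant role. If your conversation starts with a system message, keep it outside the trimmed window so it is not dropped.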

Web grounding increases latency because the web search takes time. How can it be made faster?

You can run LLM calls asynchronously. Send the user’s prompt to the LLM without search results while also sending a separate request that asks whether the prompt needs web grounding. After both responses are available, run the web search only if it is required, and then call the LLM again with the web search context. This can reduce latency for prompts that do not require a web search. However, it also increases token usage because the first LLM response is discarded whenever a web search is needed.
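
The approach above can be sketched with threads, which suit the blocking requests calls used in this tutorial. The function answer_concurrently is a hypothetical name, and its three callable parameters stand in for call_llm, needs_web_grounding, and search_web so the sketch stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_concurrently(user_input, answer_fn, classify_fn, search_fn):
    """Run the ungrounded answer and the grounding check in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        draft_future = pool.submit(answer_fn, user_input)
        verdict_future = pool.submit(classify_fn, user_input)
        needs_search = verdict_future.result()
        draft = draft_future.result()

    if not needs_search:
        return draft  # The speculative answer can be returned as-is.

    # The draft is discarded; answer again with search results as context.
    results = search_fn(user_input)
    prompt = (
        f"Please respond to the user's query: {user_input}\n\n"
        f"You may use the following web search results as context in your answer: {results}"
    )
    return answer_fn(prompt)
```

A call such as answer_concurrently(user_input, call_llm, needs_web_grounding, search_web) would wire in the tutorial’s functions. When no search is needed, the total wait is roughly the slower of the two parallel calls instead of their sum.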

Web searching uses too many tokens. How can token usage be reduced?

One option is to use a smaller model inside the needs_web_grounding function. You may not need a 70B-parameter model to decide whether a prompt requires web search context. A smaller model may be cheaper and faster for this classification step.
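
One way to arrange this is to make the model name a parameter of the request body, as in the sketch below. The value of CLASSIFIER_MODEL is a placeholder, not a real model identifier, and build_payload and build_classifier_payload are hypothetical helpers; substitute a smaller model that your inference provider actually offers.

```python
DEFAULT_MODEL = "llama3.3-70b-instruct"   # model used throughout the tutorial
CLASSIFIER_MODEL = "your-small-model-id"  # placeholder: a smaller model from your provider


def build_payload(prompt, model=DEFAULT_MODEL, max_tokens=500):
    """Build a chat-completions request body for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def build_classifier_payload(query):
    """Build the YES/NO grounding check for the smaller model."""
    prompt = (
        "Does the following prompt need information more recent than "
        "December 2023 to answer correctly? Answer only YES or NO.\n"
        f"Question: {query}\nAnswer:"
    )
    # The verdict is a single word, so a small token budget is enough.
    return build_payload(prompt, model=CLASSIFIER_MODEL, max_tokens=5)
```

The classifier request is then sent the same way as in call_llm, with json=build_classifier_payload(query). The date in the prompt should remain the knowledge cutoff of the model that writes the final answer, since that is the model whose knowledge gap the search is meant to fill.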

Conclusion

Web grounding is a powerful approach for improving the accuracy of AI applications. It can be integrated into any retrieval augmented generation workflow you already use. With web grounding, older or smaller models that were trained on less recent data can still return accurate and current answers.

In this tutorial, you used a Serp API endpoint to run DuckDuckGo searches and enhanced the user prompt with context from the returned search results. The next step is to adapt the implementation to the needs of your own application.
