Async Python with asyncio and httpx (Series Part 3)

Part 3 is where speed comes from. LLM and API calls spend almost all their time waiting on the network, which is exactly the case async Python is built for. This part explains the event loop without hand-waving, shows how to fan out many calls at once with httpx and asyncio, and covers the mistakes that quietly make async code run serially.

★

What you will learn

What async and await actually do, in plain terms
Running many network calls concurrently with asyncio.gather
A reusable async httpx client and why you keep it open
Bounding concurrency so you do not overwhelm an API

1. Async in one idea

A coroutine is a function that can pause at an await point and let the event loop run something else while it waits. For work that is bound by waiting on I/O, that means one thread can keep dozens of calls in flight at once. It does not make CPU-bound code faster.


Three calls that each wait 0.3 seconds finish in about 0.3 seconds total, not 0.9, because they wait at the same time. Change gather to a plain loop with await inside and you would see them run one after another.
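A minimal sketch of that comparison, using asyncio.sleep as a stand-in for the network wait. The function names here (fake_call, run_concurrently, run_serially) are illustrative, not part of any library.

```python
import asyncio
import time

async def fake_call(name: str, delay: float = 0.3) -> str:
    # asyncio.sleep stands in for a network wait; it yields to the event loop
    await asyncio.sleep(delay)
    return name

async def run_concurrently() -> float:
    start = time.perf_counter()
    # all three coroutines are scheduled together and wait at the same time
    await asyncio.gather(fake_call("a"), fake_call("b"), fake_call("c"))
    return time.perf_counter() - start

async def run_serially() -> float:
    start = time.perf_counter()
    for name in ("a", "b", "c"):
        await fake_call(name)   # each await finishes before the next call starts
    return time.perf_counter() - start

async def main() -> None:
    print(f"gather: {await run_concurrently():.2f}s")
    print(f"loop:   {await run_serially():.2f}s")

asyncio.run(main())
```

The gather version should land close to 0.3 seconds and the loop version close to 0.9, with small variations from scheduling overhead.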

2. One client, many requests

httpx is an HTTP client with a first-class async API. Create one AsyncClient and reuse it, because it pools connections. Creating a new client per request throws that pooling away and is a common performance bug.

import asyncio
import httpx

async def get_json(client: httpx.AsyncClient, url: str) -> dict:
    resp = await client.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.json()

async def main() -> None:
    urls = [f"https://httpbin.org/anything/{i}" for i in range(5)]
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(get_json(client, u) for u in urls))
    print(f"fetched {len(results)} responses")

asyncio.run(main())
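One thing to keep in mind with this pattern: by default gather raises the first exception it sees, so a single failing URL means the successful results are never returned to you. When partial results are acceptable, passing return_exceptions=True is one option. The sketch below uses a hypothetical flaky coroutine in place of get_json so it runs without a network.

```python
import asyncio

async def flaky(i: int) -> int:
    # stand-in for get_json: odd inputs fail, even inputs succeed
    await asyncio.sleep(0.01)
    if i % 2:
        raise ValueError(f"request {i} failed")
    return i * 10

async def fetch_all(n: int) -> tuple[list[int], list[BaseException]]:
    # exceptions come back as items in the result list instead of being raised
    results = await asyncio.gather(
        *(flaky(i) for i in range(n)), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed

async def main() -> None:
    ok, failed = await fetch_all(5)
    print(f"{len(ok)} succeeded, {len(failed)} failed")

asyncio.run(main())
```

Results stay in submission order, so you can still pair each entry with the URL that produced it.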

3. Bounding concurrency

Firing a thousand requests at once is a good way to get rate limited or run out of sockets. A semaphore caps how many run at the same time while still keeping the pipeline full.

import asyncio
import httpx

async def limited_get(sem: asyncio.Semaphore, client: httpx.AsyncClient, url: str) -> int:
    async with sem:                     # at most N in flight at once
        resp = await client.get(url, timeout=10.0)
        return resp.status_code

async def main() -> None:
    sem = asyncio.Semaphore(8)
    async with httpx.AsyncClient() as client:
        urls = [f"https://httpbin.org/status/200?i={i}" for i in range(50)]
        codes = await asyncio.gather(*(limited_get(sem, client, u) for u in urls))
    print("all ok:", all(c == 200 for c in codes))

asyncio.run(main())
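To see the cap at work without touching the network, the sketch below replaces the HTTP call with a short sleep and records the highest number of jobs that were ever inside the semaphore at once. The InFlight class and run_jobs helper are illustrative names.

```python
import asyncio

class InFlight:
    """Counts how many jobs are running now and the highest count seen."""
    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

async def job(sem: asyncio.Semaphore, stats: InFlight) -> None:
    async with sem:
        stats.current += 1
        stats.peak = max(stats.peak, stats.current)
        await asyncio.sleep(0.01)       # stand-in for the HTTP request
        stats.current -= 1

async def run_jobs(total: int, limit: int) -> int:
    sem = asyncio.Semaphore(limit)
    stats = InFlight()
    await asyncio.gather(*(job(sem, stats) for _ in range(total)))
    return stats.peak

async def main() -> None:
    print("peak in flight:", await run_jobs(total=50, limit=8))

asyncio.run(main())
```

All 50 jobs are created up front, but the peak never exceeds the semaphore's limit; the rest wait at the async with line until a slot frees up.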

✓ Do

Reuse one AsyncClient across requests
Use asyncio.gather to run independent calls together
Set a timeout on every network call

✕ Pitfalls

Awaiting calls one by one in a loop runs them serially
Calling a blocking library inside async stalls the loop
Unbounded concurrency triggers rate limits and socket errors

Checkpoint

Why does async speed up LLM and API calls but not heavy math?

4. Timeouts and cancellation

A network call that never returns can hang a request indefinitely. Always set a timeout, both on the client and, for whole operations, with asyncio.timeout (available from Python 3.11). When the deadline passes, the code inside the block is cancelled and a TimeoutError is raised, so you handle the error instead of leaking a stuck task. This matters even more for model calls, which can be slow, because a hung call ties up a worker that could be serving someone else.

import asyncio
import httpx

async def fetch_with_deadline(client: httpx.AsyncClient, url: str) -> int:
    try:
        async with asyncio.timeout(5.0):        # whole-operation deadline
            resp = await client.get(url)
            return resp.status_code
    except TimeoutError:
        return 504   # treat a slow upstream as a gateway timeout

async def main() -> None:
    async with httpx.AsyncClient() as client:
        print(await fetch_with_deadline(client, "https://httpbin.org/delay/1"))

asyncio.run(main())
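On Python versions before 3.11, where asyncio.timeout is not available, asyncio.wait_for gives a similar whole-operation deadline. The sketch below swaps in wait_for and a sleep-based stand-in for the slow call, and uses a finally block to show that the cancelled coroutine still gets to clean up. The names slow_upstream and call_with_deadline are illustrative.

```python
import asyncio

async def slow_upstream(log: list[str]) -> str:
    try:
        await asyncio.sleep(1.0)        # stand-in for a call that takes too long
        return "done"
    finally:
        log.append("cleanup ran")       # runs even when the call is cancelled

async def call_with_deadline(seconds: float, log: list[str]) -> str:
    try:
        # wait_for cancels the inner coroutine once the deadline passes
        return await asyncio.wait_for(slow_upstream(log), timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"

async def main() -> None:
    log: list[str] = []
    print(await call_with_deadline(0.05, log), log)

asyncio.run(main())
```

Because cancellation arrives as an exception at the await point, try/finally and async with blocks inside the cancelled coroutine are the place to release connections or locks.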

5. Reacting as results arrive

asyncio.gather waits for everything before giving you any result. Sometimes you want to handle each result the moment it is ready, for example to stream progress to a user or to stop early once you have enough. asyncio.as_completed hands back awaitables in completion order, not submission order, so the first one you await is whichever call finished first.

import asyncio

async def work(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def main() -> None:
    tasks = [work("slow", 0.3), work("fast", 0.1), work("mid", 0.2)]
    for coro in asyncio.as_completed(tasks):
        result = await coro
        print("ready:", result)   # prints fast, mid, slow as they finish

asyncio.run(main())
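The "stop early once you have enough" case needs one extra step: the calls that have not finished yet should be cancelled rather than left running. A sketch of one way to do that, assuming the work is wrapped in real tasks with asyncio.create_task so there is something to cancel (first_n is an illustrative name):

```python
import asyncio

async def work(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def first_n(n: int) -> list[str]:
    # real Task objects so the unfinished ones can be cancelled afterwards
    tasks = [
        asyncio.create_task(work("slow", 0.3)),
        asyncio.create_task(work("fast", 0.05)),
        asyncio.create_task(work("mid", 0.1)),
    ]
    winners: list[str] = []
    for next_done in asyncio.as_completed(tasks):
        winners.append(await next_done)
        if len(winners) == n:
            break                       # we have enough results
    for t in tasks:
        t.cancel()                      # no-op for tasks that already finished
    # wait for the cancellations to settle so nothing is left dangling
    await asyncio.gather(*tasks, return_exceptions=True)
    return winners

async def main() -> None:
    print(await first_n(2))

asyncio.run(main())
```

Here the slow task is cancelled after the first two results arrive, so the whole call takes roughly as long as the second-fastest task.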

6. Do not block the event loop

This is the mistake that quietly ruins async performance. The event loop runs on one thread, so any blocking call inside a coroutine (a synchronous library, a heavy computation, time.sleep) freezes every other task while it runs. The fix for a blocking library is asyncio.to_thread, which pushes the work onto a worker thread so the loop stays free to handle other requests. Heavy pure-Python computation is a different case: on a standard CPython build a thread still competes for the GIL, so a process pool is usually the better home for it.

import asyncio

def parse_large_file(path: str) -> int:
    # a blocking, mostly disk-bound function from a sync library
    with open(path) as f:
        return sum(1 for _ in f)

async def handler(path: str) -> int:
    # run the blocking call off the event loop so the server stays responsive
    return await asyncio.to_thread(parse_large_file, path)

⚠

Warning

One blocking call stalls everything

time.sleep, requests.get, and synchronous database drivers all block the loop. Inside async code use asyncio.sleep, an async client like httpx, and an async driver, or wrap the blocking call in asyncio.to_thread.
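The stall is easy to measure. In the sketch below a heartbeat task tries to tick every 10 ms while a 0.2 second time.sleep runs, once directly on the loop and once through asyncio.to_thread. The helper names are illustrative.

```python
import asyncio
import time

async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # records a timestamp every 10 ms for as long as the loop is free to run it
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def count_ticks(blocking: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)              # let the heartbeat record its first tick
    if blocking:
        time.sleep(0.2)                 # blocks the loop: no ticks during this
    else:
        await asyncio.to_thread(time.sleep, 0.2)  # loop keeps ticking meanwhile
    stop.set()
    await beat
    return len(ticks)

async def main() -> None:
    print("ticks while blocked:  ", await count_ticks(blocking=True))
    print("ticks with to_thread: ", await count_ticks(blocking=False))

asyncio.run(main())
```

With the direct call the heartbeat manages only its initial tick; with to_thread it keeps ticking for the whole 0.2 seconds. In a server, every one of those missed ticks is another request waiting.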

7. Structured concurrency with TaskGroup

Modern asyncio (Python 3.11 and later) favors TaskGroup over loose gather calls. A task group scopes a set of tasks: if any task fails, the others are cancelled and the failure propagates out of the async with block as an ExceptionGroup, so you never leave orphaned tasks running in the background. It is the safer default for launching several calls that should succeed or fail as a unit.

import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            a = tg.create_task(client.get("https://httpbin.org/get"))
            b = tg.create_task(client.get("https://httpbin.org/uuid"))
        # both are done here; a failure in either cancels the other
        print(a.result().status_code, b.result().status_code)

asyncio.run(main())

8. A reusable async client with retry and backoff

In a real app you do not scatter raw httpx calls everywhere. You wrap them in a small client so timeout, retry, and error handling live in one place that the rest of the code and your tests can rely on. Transient failures, a brief network blip or a rate limit, deserve a retry with exponential backoff, while a 404 should fail immediately. The class below encodes that policy once.

import asyncio
import httpx

class ApiClient:
    def __init__(self, base_url: str, retries: int = 3) -> None:
        self._client = httpx.AsyncClient(base_url=base_url, timeout=10.0)
        self._retries = retries

    async def get_json(self, path: str) -> dict:
        last_error: Exception | None = None
        for attempt in range(self._retries):
            try:
                resp = await self._client.get(path)
                resp.raise_for_status()
                return resp.json()
            except httpx.HTTPStatusError as e:
                status = e.response.status_code
                if status < 500 and status != 429:
                    raise                      # other client errors: do not retry
                last_error = e                 # 429 or 5xx: transient, retry
            except httpx.TransportError as e:
                last_error = e                 # network blip or timeout: retry
            if attempt < self._retries - 1:
                await asyncio.sleep(2 ** attempt)   # 1s, then 2s, between tries
        # keep the last failure attached so the cause shows up in the traceback
        raise RuntimeError(f"giving up on {path} after {self._retries} tries") from last_error

    async def aclose(self) -> None:
        await self._client.aclose()

Two details make this production friendly. It retries only failures that can clear up on their own (server errors, 429 rate limits, and transport failures), never another 4xx that will fail the same way every time, and it backs off between attempts so a struggling upstream is not hammered. It also skips the pointless sleep after the final attempt. The aclose method matters because the client holds a connection pool that should be closed on shutdown.
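The retry policy itself can be exercised without httpx or a network. The sketch below is a stripped-down variant, not the class above: it assumes a hypothetical TransientError to mark retryable failures and takes the base delay as a parameter so a test can run in milliseconds.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (5xx, 429, network blips)."""

async def with_retry(
    op: Callable[[], Awaitable[T]], retries: int = 3, base_delay: float = 1.0
) -> T:
    for attempt in range(retries):
        try:
            return await op()
        except TransientError:
            if attempt == retries - 1:
                raise                   # out of attempts: surface the error
            # exponential backoff: base, 2x base, 4x base, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")

def make_flaky(fail_times: int) -> Callable[[], Awaitable[str]]:
    # builds an operation that fails a fixed number of times, then succeeds
    calls = {"n": 0}
    async def op() -> str:
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise TransientError(f"attempt {calls['n']} failed")
        return f"succeeded on attempt {calls['n']}"
    return op

async def main() -> None:
    print(await with_retry(make_flaky(2), base_delay=0.01))

asyncio.run(main())
```

Any exception other than TransientError passes straight through on the first attempt, which mirrors the "fail immediately on a 404" rule.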

9. Lifecycle: open once, close cleanly

Where do you create and destroy that client in a long-running server? Not per request, which throws away connection pooling, and not as a leaked global. FastAPI gives you a lifespan hook that runs on startup and shutdown, the natural place to open shared clients and close them. You will see this exact pattern again when you wire the model client into the app.

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ApiClient is the class from section 8
    app.state.api = ApiClient(base_url="https://api.example.com")
    try:
        yield                       # app runs here
    finally:
        await app.state.api.aclose()  # clean shutdown, even after an error

app = FastAPI(lifespan=lifespan)
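The lifespan function is an ordinary async context manager, so the open-once, close-cleanly ordering can be checked without starting a server. The sketch below assumes a hypothetical RecordingClient in place of ApiClient and a plain function, run_app, in place of FastAPI.

```python
import asyncio
from contextlib import asynccontextmanager

class RecordingClient:
    """Hypothetical stand-in for ApiClient that only records its lifecycle."""
    def __init__(self, log: list[str]) -> None:
        self._log = log
        log.append("opened")

    async def aclose(self) -> None:
        self._log.append("closed")

@asynccontextmanager
async def shared_client(log: list[str]):
    client = RecordingClient(log)
    try:
        yield client                    # the application would run here
    finally:
        await client.aclose()           # runs on normal exit and on errors

async def run_app(log: list[str], crash: bool) -> None:
    try:
        async with shared_client(log):
            log.append("serving")
            if crash:
                raise RuntimeError("boom")
    except RuntimeError:
        log.append("crashed")

async def main() -> None:
    log: list[str] = []
    await run_app(log, crash=True)
    print(log)

asyncio.run(main())
```

The client is closed before the error reaches the caller, which is the guarantee you want from a shutdown hook.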

💡

Pro tip

A good async client is boring on purpose: one place for timeouts, one retry policy, one connection pool, and a clean shutdown. Boring here means no mystery latency, no socket exhaustion, and tests that can swap the whole client for a fake.
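Swapping the client for a fake can look like the sketch below. It assumes the rest of the code depends only on a get_json method; the JsonClient protocol, FakeClient, and user_display_name are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Protocol

class JsonClient(Protocol):
    # the only method the rest of the code needs from the real client
    async def get_json(self, path: str) -> dict: ...

class FakeClient:
    """Test double: returns canned payloads and records which paths were requested."""
    def __init__(self, canned: dict[str, dict]) -> None:
        self._canned = canned
        self.requested: list[str] = []

    async def get_json(self, path: str) -> dict:
        self.requested.append(path)
        return self._canned[path]

async def user_display_name(client: JsonClient, user_id: int) -> str:
    # hypothetical application code that depends only on the JsonClient shape
    data = await client.get_json(f"/users/{user_id}")
    return data["name"].title()

async def main() -> None:
    fake = FakeClient({"/users/7": {"name": "ada lovelace"}})
    print(await user_display_name(fake, 7), fake.requested)

asyncio.run(main())
```

Because the application function takes the client as an argument, the test needs no network, no patching, and no retry delays.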

The bottom line

Async Python is the right tool when your program spends its time waiting on the network, which describes almost every LLM and API workload. Reuse one client, fan out with gather, and bound concurrency with a semaphore. FastAPI is async-first, so the next part puts these exact patterns to work inside real endpoints.

? Frequently asked questions

Should every function be async?

No. Make functions async when they perform awaitable I/O. Mixing a blocking call into an async function stalls the whole event loop.

How do I run blocking code from async?

Use asyncio.to_thread to push a blocking call onto a worker thread so the event loop stays responsive.

Up next: Part 4, FastAPI fundamentals.


Bishrul Haq
