The contextvars and the chain of asyncio tasks in Python

2024-08-10

In Python, contextvars is somewhat like thread-local storage, but designed mainly for coroutines. But is the context itself kept consistent across a chain of asyncio tasks? Let’s find out.

contextvars

Let’s start with a quick review of contextvars in Python. The contextvars module provides a way to maintain a context across a chain of function calls, and asyncio supports it natively. For example:

import asyncio
import contextvars

var = contextvars.ContextVar('var', default={})

async def sub2():
    print(f'in sub2, {var.get()=}')
    var.set('sub2 set')

async def sub1():
    print(f'in sub1, {var.get()=}')
    var.set('sub1 set')
    await sub2()

async def main():
    var.set('main set')
    await sub1()
    print(f'in main, {var.get()=}')

asyncio.run(main())

The output would be

in sub1, var.get()='main set'
in sub2, var.get()='sub1 set'
in main, var.get()='sub2 set'

A context may hold any number of contextvars.ContextVar objects, and the value of each contextvars.ContextVar can be set and read within that context.

There are many use cases for contextvars. For example,

  • Store the request/response objects as contextvars for every request in a web server, so that they can be accessed from any function in the request handling chain.
  • Store the job metadata (like the job id, the job status, etc.) as contextvars for every job in a job queue, so that the metadata can be accessed from any function in the job handling chain.
  • Store the tracing information (like the user id, the request id, etc.) as contextvars for every trace or span in a logging system, so that the logs in the same trace/span chain can be identified.
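As a rough sketch of the tracing use case, the snippet below tags log lines with a request id without passing it through every call. The names request_id, log, query_db, handle and serve are hypothetical and stand in for whatever a real server would provide.

```python
import asyncio
import contextvars

# Hypothetical variable holding the id of the request being handled.
request_id = contextvars.ContextVar('request_id', default='-')

def log(message):
    # Any function in the call chain can read the id without receiving it as an argument.
    return f'[{request_id.get()}] {message}'

async def query_db():
    await asyncio.sleep(0)  # stand-in for real I/O
    return log('query done')

async def handle(rid):
    request_id.set(rid)
    return await query_db()

async def serve():
    # gather() wraps each coroutine in its own Task, so each handler
    # works on its own copy of the context and the ids do not mix.
    return await asyncio.gather(handle('req-1'), handle('req-2'))

lines = asyncio.run(serve())
print(lines)  # ['[req-1] query done', '[req-2] query done']
```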

The consistency of contextvars

contextvars is a good way to avoid passing the same arguments to every function in a chain of (async/await) function calls. But is the context itself kept consistent across a chain of asyncio tasks?

The answer actually depends on how the asyncio tasks are chained. Let’s have a look at the following example.

import asyncio
import contextvars

var = contextvars.ContextVar('var', default={})

async def coro_func(level):
    var.set(level)
    print(f'{level=}, {var.get()=}')
    if level:
        await asyncio.create_task(coro_func(level-1))
    print(f'{level=}, {var.get()=}')

asyncio.run(coro_func(2))

The example chains the coroutines with asyncio Tasks instead of direct async/await calls. Think of a Task as a wrapper around a coroutine that gives more control over its execution, such as cancellation. But when we chain Tasks, the context is not kept consistent across them:

level=2, var.get()=2
level=1, var.get()=1
level=0, var.get()=0
level=0, var.get()=0
level=1, var.get()=1
level=2, var.get()=2

When execution leaves an inner Task, the context of the inner Task ends with it and the context of the outer Task is back in effect. The reason is that when a Task is created, it gets a new context object copied from the current context. Writes to the new context object do not affect the outer context object. So if we use Tasks to chain coroutines, the default task creation behavior prevents the context from being kept consistent across the Tasks.
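The copy happens at Task creation time, not when the Task starts running. A small sketch (the names child and main are only for illustration) shows both directions of the isolation:

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var', default=None)

async def child():
    seen = var.get()   # the value captured when the Task was created
    var.set('child')   # written to the child's own copy only
    return seen

async def main():
    var.set('before')
    task = asyncio.create_task(child())  # the current context is copied here
    var.set('after')                     # too late for the child to see
    seen_by_child = await task
    return seen_by_child, var.get()

result = asyncio.run(main())
print(result)  # ('before', 'after')
```

The child never sees 'after', and the parent never sees 'child'.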

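Since Python 3.11, create_task accepts a context parameter that passes a specific context to the newly created Task. Note that Task.get_context(), which the example below uses to obtain the current Task's context, was only added in Python 3.12. So we can modify the example above like this: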

import asyncio
import contextvars

var = contextvars.ContextVar('var', default={})

async def coro_func(level):
    var.set(level)
    print(f'{level=}, {var.get()=}')
    if level:
        context = asyncio.current_task().get_context() if asyncio.current_task() else None
        await asyncio.create_task(coro_func(level-1), context=context)
    print(f'{level=}, {var.get()=}')

asyncio.run(coro_func(2))

The output would be

level=2, var.get()=2
level=1, var.get()=1
level=0, var.get()=0
level=0, var.get()=0
level=1, var.get()=0
level=2, var.get()=0

Now the context object is the same one across the Tasks, so a modification made in the inner Task is reflected in the outer Task.

The same context parameter for passing a context object is also available in

  • loop.call_soon, loop.call_later and loop.call_at
  • asyncio.Handle
  • asyncio.Runner.run
  • TaskGroup.create_task
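As an illustration of the callback variant, here is a minimal sketch with loop.call_soon. The callback function and the seen dictionary are hypothetical helpers for this example. It relies on the behavior that, when no context is given, the callback runs in a copy of the current context.

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var', default='unset')

def callback(label, seen):
    seen[label] = var.get()    # what the callback sees on entry
    var.set(f'set by {label}')

async def main():
    seen = {}
    loop = asyncio.get_running_loop()
    var.set('main')

    # An explicit context object that we keep a reference to.
    own = contextvars.copy_context()
    loop.call_soon(callback, 'explicit', seen, context=own)
    # Without context=, the callback runs in a copy of the current context.
    loop.call_soon(callback, 'default', seen)

    await asyncio.sleep(0)     # yield so that both callbacks run
    return seen, own[var], var.get()

seen, own_value, main_value = asyncio.run(main())
print(seen)        # {'explicit': 'main', 'default': 'main'}
print(own_value)   # 'set by explicit'
print(main_value)  # 'main'
```

Only the write made in the explicitly passed context can be observed afterwards, because we hold a reference to that context object.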

This mechanism lets users decide whether to keep the context consistent across Tasks. But the details easily slip out of sight, because Task creation is usually hidden inside framework or library code.

The case of starlette middlewares and anyio

The popular Python web framework fastapi is built on the ASGI framework starlette, and fastapi simply uses the starlette middleware mechanism. The layers of starlette middlewares are executed as a chain of function calls handled by anyio: starlette uses anyio.create_task_group() to run a layer of middleware. With the asyncio backend, anyio simply runs the tasks of an anyio.TaskGroup with asyncio.create_task (without passing a context, as of anyio 4.4.0).

So the context is not kept consistent across the layers of starlette middlewares. If you set a contextvars.ContextVar in an inner layer, the change will not be visible in the outer layers. Keep this in mind when you use contextvars in starlette middlewares.
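One possible workaround, sketched here with plain asyncio rather than real starlette middlewares, is to have the outer layer store a mutable object in the variable and let inner layers mutate that object instead of calling set(). Copying a context does not copy the values it holds, so the object is shared. The names state, inner_layer and outer_layer are made up for this sketch, and whether this fits a given application is a judgment call.

```python
import asyncio
import contextvars

# Hypothetical holder: no default, so each request must create its own dict.
state = contextvars.ContextVar('state')

async def inner_layer():
    # Mutate the shared dict in place; calling state.set() here
    # would only change the inner Task's copy of the context.
    state.get()['user'] = 'alice'

async def outer_layer():
    state.set({})  # a fresh dict per request
    # The new Task gets a copied context, but the copy
    # refers to the very same dict object.
    await asyncio.create_task(inner_layer())
    return state.get()

result = asyncio.run(outer_layer())
print(result)  # {'user': 'alice'}
```

The dict should be created by the outermost layer for each request. A shared mutable default on the ContextVar would leak data between requests.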

A thought on coroutines vs. threads

When we say “contextvars is somewhat like thread-local storage, but mainly for coroutines”, we are treating coroutines as just another kind of thread. But that is not quite right. Coroutines are not threads. A thread has its own life cycle and its own stack space, so thread-local storage really belongs to a thread. A coroutine, however, is just a function that can be paused and resumed, scheduled by an event loop. So the context of a coroutine does not belong to the coroutine itself. In a chain of coroutines, like a calling chain of Tasks, we need to be aware that the consistency of the context has to be maintained manually by the user; the system does not maintain it automatically.
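The difference can be observed with a small sketch in which two Tasks run on the same thread. The names local, worker and main are only for this illustration.

```python
import asyncio
import contextvars
import threading

local = threading.local()
var = contextvars.ContextVar('var')

async def worker(name):
    local.name = name       # thread-local: shared by every Task on this thread
    var.set(name)           # context variable: private to this Task's context
    await asyncio.sleep(0)  # let the other Task run on the same thread
    return local.name, var.get()

async def main():
    return await asyncio.gather(worker('a'), worker('b'))

results = asyncio.run(main())
print(results)  # [('b', 'a'), ('b', 'b')]
```

Both Tasks read 'b' from the thread-local storage because the last write wins, while each Task still reads its own value from the context variable.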