
How to share a Faust table between multiple agents or Faust timers?

Faust is a stream processing library for Python, built on top of Kafka. It provides a straightforward approach to creating real-time streaming applications. One of the core components of Faust is its ability to use tables, which are similar to dictionaries but are fault-tolerant and can be restored after a system failure, making them highly suitable for stateful stream processing. However, when working in an environment with multiple agents or timers that need access to the same state information, sharing Faust tables can become necessary. This article explores the techniques available to share a Faust table among multiple agents or across different Faust timers.

Understanding Faust Tables

A Faust table is essentially a distributed key-value store that allows for stateful operations in a Faust application. The table's data is partitioned in the same way as the Kafka topic partitions it maps to: each Faust worker is assigned specific partitions, and those partitions define the subset of table data that the worker owns. A standard table is therefore not a single shared dictionary; a worker only sees the keys that live in its own partitions.
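The ownership idea can be illustrated without Kafka. The following is a minimal sketch in plain Python, not Faust's actual partitioner: the CRC32 hash, the partition count, and the helper names `partition_for` and `owned_keys` are all assumptions made for illustration. It only shows that each key lands in exactly one partition, so two workers with disjoint partition assignments see disjoint slices of the table.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for the table's topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is used here only for illustration; it is not the hash Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def owned_keys(keys, assigned_partitions, num_partitions=NUM_PARTITIONS):
    """Return the keys whose partition is assigned to this worker."""
    return [
        key for key in keys
        if partition_for(key, num_partitions) in assigned_partitions
    ]


keys = ["alice", "bob", "carol", "dave"]

# Two hypothetical workers that split the four partitions between them.
worker_a = owned_keys(keys, assigned_partitions={0, 1})
worker_b = owned_keys(keys, assigned_partitions={2, 3})
```

Every key ends up with exactly one of the two workers, which is why a standard table alone cannot serve as state shared by all workers.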

Techniques to Share Faust Table

Using Global Tables

Global tables are an effective way to ensure that the same data is available across different partitions and therefore to all consuming agents. Unlike standard tables, where data is partitioned, global tables synchronize their data across all nodes. Here’s a simple example:

python
import faust

app = faust.App('myapp', broker='kafka://localhost')
global_table = app.GlobalTable(name='globalusers', default=int)

@app.agent()
async def process(stream):
    async for value in stream:
        # Augmented assignment is a statement, so increment first, then read.
        global_table['users'] += 1
        user_count = global_table['users']
        print(f'Total Users: {user_count}')

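In this example, irrespective of which worker or agent accesses the globalusers table, it will eventually see the same count of users. Updates are replicated through the table's changelog topic, so a worker may briefly read a slightly stale value.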

Sharing Tables Using Views in Agents

Faust also lets you expose table data through web views. Each worker runs a small web server, so a page can return table contents over HTTP to other agents, other services, or any other part of your application. Here is an example:

python
@app.page('/user_count/')
async def get_user_count(web, request):
    return web.json({'user_count': global_table['users']})

# This endpoint can be queried over HTTP by other agents or parts of your application.
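To make the "queried by other parts of your application" step concrete, here is a minimal client sketch using only the standard library. It assumes the worker's web server is reachable at `http://localhost:6066` (Faust's default web port, which your deployment may override) and that the page returns the JSON shape shown above. The helper names `user_count_url`, `parse_user_count`, and `fetch_user_count` are hypothetical.

```python
import json
from urllib.request import urlopen

DEFAULT_BASE_URL = "http://localhost:6066"  # assumed host and port of the worker


def user_count_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the URL of the /user_count/ page defined above."""
    return base_url.rstrip("/") + "/user_count/"


def parse_user_count(raw: bytes) -> int:
    """Extract the 'user_count' field from the JSON response body."""
    return json.loads(raw.decode("utf-8"))["user_count"]


def fetch_user_count(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> int:
    """Blocking GET of the page; requires a running Faust worker."""
    with urlopen(user_count_url(base_url), timeout=timeout) as response:
        return parse_user_count(response.read())
```

`fetch_user_count` blocks while it waits for the response, so it suits scripts and separate services. Inside an agent or timer, an asynchronous HTTP client would be the better fit, since a blocking call stalls the worker's event loop.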

Inter-Agent Communication

Sometimes direct sharing through global tables or views is not flexible enough, or is simply unnecessary. In such cases, agents can communicate indirectly using topics. One agent writes its results to a topic, and another agent consumes that topic to update its own table or perform further processing.

python
result_topic = app.topic('results', value_type=int)

@app.agent()
async def processor(stream):
    async for value in stream:
        # process and send to topic
        await result_topic.send(value=value)

@app.agent(result_topic)
async def result_processor(stream):
    async for result in stream:
        # update local table or perform further action
        pass

Considerations for Timers

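Faust timers are functions that run at regular intervals; they are well suited to tasks like cleanup, reporting, or regular updates. Timers can share data using the same building blocks as agents, with one caveat: Faust only allows a table to be modified from inside a stream iteration, so a timer can read a table but should not write to it directly.

  1. Using global tables for shared state that timers read.
  2. Emitting events to topics that an agent consumes and applies to the table on the timer's behalf.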

Conclusion and Best Practices

When designing solutions that share state across multiple agents or timers in Faust, it's crucial to consider the architecture of your system and the consistency requirements of your data. Global tables are simple to use and give every worker the same view of the data, but because each worker keeps a full copy, they can increase network traffic and become a bottleneck as the table grows. In applications where eventual consistency is acceptable, using topics to communicate changes can be more scalable.

Summary Table

| Method | Use Case | Benefits | Considerations |
| --- | --- | --- | --- |
| Global Tables | Shared state across all agents/workers | Simple to use, consistent data | High network traffic, scalability concerns |
| Views & Web Endpoints | Querying state from agents | Decouples data access from processing | Requires HTTP/web handling |
| Inter-Agent Topics | Indirect sharing & handling event-driven data | Scalable, flexible | Eventual consistency, complex handling |

Each method offers different trade-offs in terms of simplicity, performance, and scalability. Understanding them helps in selecting the right approach for your specific requirements.

