amazon-dynamodb – Using DynamoDB Streams with Lambda, how do I process records in order (by logical group)?

I want to use DynamoDB Streams with AWS Lambda to process chat messages. Messages for the same conversation user_idX:user_idY (a "room") must be processed in order. Global ordering is not important.

Assuming I write to DynamoDB in the correct order (room:msg1, room:msg2, etc.), how can I guarantee that the stream feeds AWS Lambda in that same order, and that related messages (the same room) are handled within a single stream shard?

For example, given that I have 2 shards, how do I make sure each logical group goes to the same shard?
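For concreteness, below is a rough sketch of the write pattern described above, using boto3. The table name ChatMessages and the attribute names room_id / msg_seq are assumptions for illustration only, not taken from the actual setup.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ChatMessages")  # assumed table name

def save_message(room_id, seq, body):
    # The partition key (room_id, e.g. "12:12") determines which table partition
    # the item lands in, and therefore which stream shard lineage its change
    # record appears in; the sort key (msg_seq) keeps the per-room order explicit.
    table.put_item(
        Item={
            "room_id": room_id,
            "msg_seq": seq,
            "body": body,
        }
    )

# Writes for the same room are issued in order:
save_message("12:12", 1, "msg1")
save_message("12:12", 2, "msg2")
save_message("12:12", 3, "msg3")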

I need this:

Shard 1: 12:12:msg3 12:12:msg2 12:12:msg1 ==> consumer
Shard 2: 13:24:msg2 51:91:msg3 13:24:msg1 51:92:msg2 51:92:msg1 ==> consumer

and not this (the messages respect the order in which I saved them to the database, but they end up in different shards, so different sequences of the same room are wrongly processed in parallel):

Shard 1: 13:24:msg2 51:92:msg2 12:12:msg2 51:92:msg2 12:12:msg1 ==> consumer
Shard 2: 51:91:msg3 12:12:msg3 13:24:msg1 51:92:msg1 ==> consumer

This official post touches on it, but I can't find in the documentation how to achieve it:

The relative ordering of a sequence of changes made to a single
primary key will be preserved within a shard. Further, a given key
will be present in at most one of a set of sibling shards that are
active at a given point in time. As a result, your code can simply
process the stream records within a shard in order to accurately track
changes to an item.

Questions

1) How do I set the partition key in DynamoDB Streams?

2) How do I create stream shards that guarantee consistent delivery per partition key?

3) Is this actually possible at all? Since the official article says that a given key will be present in at most one of a set of sibling shards that are active at a given point in time, it seems msg1 could go to shard 1 and then msg2 to shard 2, just like in my example above? (See the shard-lineage sketch below.)
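As a side note, the shard lineage mentioned in that quote can be inspected through the DynamoDB Streams API. The sketch below (same assumed table name as above, and it assumes a stream is already enabled on the table) just prints each shard and its parent.

import boto3

streams = boto3.client("dynamodbstreams")

# Assumes a stream is enabled on the table; otherwise "Streams" will be empty.
stream_arn = streams.list_streams(TableName="ChatMessages")["Streams"][0]["StreamArn"]
description = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]

for shard in description["Shards"]:
    # A shard with a ParentShardId is the continuation of its parent shard;
    # per the quoted text, a given key only moves from a parent shard into its
    # lineage, never sideways into a sibling shard.
    print(shard["ShardId"], "parent:", shard.get("ParentShardId"))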

EDIT: In this question I found the following:

The amount of shards that your stream has, is based on the amount of
partitions the table has. So if you have a DDB table with 4
partitions, then your stream will have 4 shards. Each shard
corresponds to a specific partition, so given that all items with the
same partition key should be present in the same partition, it also
means that those items will be present in the same shard.

Does this mean I get what I need automatically? "All items with the same partition key will be present in the same shard." Does Lambda respect this?
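On the Lambda side, the function is attached to the stream through an event source mapping, and Lambda then polls every shard of that stream. A rough sketch of wiring that up with boto3 is below; the function name chat-consumer and the table name are placeholders.

import boto3

# Look up the table's stream ARN (assumes streams are enabled on the table).
stream_arn = boto3.client("dynamodb").describe_table(
    TableName="ChatMessages"
)["Table"]["LatestStreamArn"]

# One mapping per function/stream pair; "chat-consumer" is a placeholder name.
boto3.client("lambda").create_event_source_mapping(
    EventSourceArn=stream_arn,
    FunctionName="chat-consumer",
    StartingPosition="TRIM_HORIZON",  # start from the oldest available records
    BatchSize=100,
)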

EDIT 2: From the FAQ:

The ordering of records across different shards is not guaranteed, and
processing of each shard happens in parallel.

I don't care about global ordering, only the logical (per-room) one. Still, this FAQ answer doesn't make it clear whether the shards line up with those logical groups.

Best answer: Ordered processing of updates to the same key happens automatically. As described in this presentation, one Lambda function runs per active shard. Since all updates for a particular partition/sort key appear in exactly one shard lineage, they will be processed in order.
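A minimal handler sketch of the per-shard processing the answer describes is shown below, assuming the stream uses the NEW_IMAGE view type and the placeholder room_id / msg_seq attributes from earlier; process_chat_message is a stand-in for whatever downstream logic handles a message.

def process_chat_message(room_id, seq, body):
    # Stand-in for the real downstream logic.
    print(room_id, seq, body)

def handler(event, context):
    # Lambda invokes this once per batch, per shard; records in a batch arrive
    # in stream order, and all records for a given room are confined to one
    # shard lineage, so iterating front-to-back preserves per-room order.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        room_id = new_image["room_id"]["S"]    # assumed partition key attribute
        seq = int(new_image["msg_seq"]["N"])   # assumed sort key attribute
        body = new_image["body"]["S"]
        process_chat_message(room_id, seq, body)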
