Database: modeling set-based operations for SQL data manipulation

November 24, 2023

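Technologies like LINQ do a nice job of describing relational data queries, with types such as IQueryable, IGrouping and IOrderedQueryable modeling projection, selection, aggregation, sorting and so on. These concepts from relational algebra let us express a fairly arbitrary query in one language on one machine and have it executed in a different language (~sql) on another machine.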

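It would be excellent to be able to do the same for more complex multi-part queries, and even for data manipulation commands involving INSERT, UPDATE and DELETE, so that a complete operation could be described without the overhead of first retrieving/hydrating the data in the application layer, as is typical of Object-Relational-Mappers (ORMs).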

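In an application, we could describe an operation such as "delete all customers (and their orders) whose most recent order is more than 2 years old" (and suppose the relationship does not have cascade delete enabled). This is of course doable with ADO and a t-sql script, but an ORM cannot do it without selecting, transmitting, hydrating and tracking the data in the application layer, and then probably issuing individual delete commands. (Perhaps some ORMs have optimizations that handle this more efficiently in certain cases, but in general, AFAIK, they cannot.) The problem with issuing a t-sql script, of course, is that there is no type checking within the statement, nor of any parameters or returned data.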
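For concreteness, here is a minimal sketch of that set-based delete, written against SQLite through Python's `sqlite3` module rather than ADO and t-sql. The schema (`Customer`, `Orders`, `placedAt`) and both function names are assumptions made up for the illustration; nothing here is type-checked, which is exactly the complaint above.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical two-table schema; note there is no ON DELETE CASCADE.
    conn.executescript("""
        CREATE TABLE Customer (id INTEGER PRIMARY KEY, lastName TEXT);
        CREATE TABLE Orders (
            id INTEGER PRIMARY KEY,
            customerId INTEGER REFERENCES Customer(id),
            price REAL,
            placedAt TEXT  -- ISO-8601 date, so string comparison orders correctly
        );
    """)

def delete_stale_customers(conn, cutoff):
    """Delete customers whose most recent order predates cutoff, plus their orders."""
    # Scratch table for the stale ids: once the orders are deleted,
    # the set of stale customers can no longer be computed from them.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS stale (customerId INTEGER)")
    with conn:  # commits all four statements together, or rolls them back on error
        conn.execute("DELETE FROM stale")
        conn.execute(
            "INSERT INTO stale "
            "SELECT customerId FROM Orders "
            "GROUP BY customerId HAVING MAX(placedAt) < ?",
            (cutoff,),
        )
        conn.execute(
            "DELETE FROM Orders WHERE customerId IN (SELECT customerId FROM stale)"
        )
        cur = conn.execute(
            "DELETE FROM Customer WHERE id IN (SELECT customerId FROM stale)"
        )
    return cur.rowcount  # number of customers removed
```

No row ever travels to the application layer; only the cutoff date goes over and a count comes back.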

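Beyond reducing runtime processing and network chatter, an earth-shattering advantage of being able to model these arbitrary commands for remote execution is that domain-wide invariants could be encoded and registered in the application layer, and then sent along automatically with ad-hoc commands.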

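We might have a silly domain invariant A, "for all customers, the per-customer sum of order prices must not exceed $10,000,000.00 unless the whoa bit is 1", and another silly domain invariant B, "for all customers, lastName must not contain more than three underscores" (though these could probably be enforced through the database engine's native check constraints or triggers). Then, when we issue a command that updates the price of an existing order, the system could know through static analysis that the command might violate invariant A but cannot possibly violate B, so it would emit an assertion of A after the original command. The whole emitted script would be wrapped in a transaction (rolled back if the assertion fails), and the invariant could be narrowed automatically so that the rule is asserted only for that particular customer's set of orders, rather than needlessly rechecking the totals for all customers. I believe this kind of optimized, centralized, DRY encoding and enforcement of business rules is not possible in today's products.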
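A rough sketch of that idea, under loud assumptions: each invariant declares the (table, column) pairs it reads, each command declares the pairs it may write, and "static analysis" is reduced to a set intersection. The `Invariant` and `Command` classes, the schema, and the hand-supplied `:customerId` used for narrowing are all hypothetical; a real system would derive the read/write sets and the narrowing from the command itself.

```python
import sqlite3
from dataclasses import dataclass

class InvariantViolation(Exception):
    pass

@dataclass(frozen=True)
class Invariant:
    name: str
    reads: frozenset      # (table, column) pairs the rule depends on
    violations_sql: str   # returns a row if the rule is broken for :customerId

@dataclass
class Command:
    sql: str
    writes: frozenset     # (table, column) pairs the command may change
    params: dict          # must include customerId so checks can be narrowed

A = Invariant(
    "A: order total <= 10,000,000 unless whoa",
    frozenset({("Orders", "price"), ("Orders", "customerId"), ("Customer", "whoa")}),
    """SELECT c.id FROM Customer c JOIN Orders o ON o.customerId = c.id
       WHERE c.id = :customerId AND c.whoa = 0
       GROUP BY c.id HAVING SUM(o.price) > 10000000""",
)
B = Invariant(
    "B: lastName has at most three underscores",
    frozenset({("Customer", "lastName")}),
    """SELECT id FROM Customer WHERE id = :customerId
       AND LENGTH(lastName) - LENGTH(REPLACE(lastName, '_', '')) > 3""",
)

def affected(command, invariants):
    """Invariants whose inputs overlap what the command may write."""
    return [inv for inv in invariants if inv.reads & command.writes]

def run(conn, command, invariants):
    """Execute the command and the relevant assertions in one transaction."""
    checks = affected(command, invariants)
    try:
        with conn:  # rolls back if an assertion raises
            conn.execute(command.sql, command.params)
            for inv in checks:
                if conn.execute(inv.violations_sql, command.params).fetchone():
                    raise InvariantViolation(inv.name)
    except InvariantViolation:
        return False
    return True
```

A price update writes `("Orders", "price")`, so `affected` returns only A and B is never checked; if A's query finds a violating row, the update is rolled back and `run` returns `False`.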

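To realize this potential, I imagine we need an algebra (beyond the relational algebra behind SELECT) that can describe arbitrary data manipulation with INSERT, UPDATE and DELETE (collectively, DML), and even things like intermediate computed values or temp tables used in a calculation, the kind of thing expressed in t-sql as a multi-statement listing.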
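To make "a multi-statement listing as data" a little more tangible, here is a deliberately naive sketch: DML statements as immutable values that can be inspected before being rendered to a script. The class names and `render` functions are invented for the illustration, and the predicates are plain strings where a real model would need a typed expression tree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    table: str
    columns: tuple
    source: str        # a SELECT that produces the rows to insert

@dataclass(frozen=True)
class Update:
    table: str
    assignments: tuple  # (column, expression) pairs
    where: str

@dataclass(frozen=True)
class Delete:
    table: str
    where: str

def render(stmt):
    """Render one statement node as SQL text."""
    if isinstance(stmt, Insert):
        return f"INSERT INTO {stmt.table} ({', '.join(stmt.columns)}) {stmt.source};"
    if isinstance(stmt, Update):
        sets = ", ".join(f"{col} = {expr}" for col, expr in stmt.assignments)
        return f"UPDATE {stmt.table} SET {sets} WHERE {stmt.where};"
    if isinstance(stmt, Delete):
        return f"DELETE FROM {stmt.table} WHERE {stmt.where};"
    raise TypeError(f"not a DML statement: {stmt!r}")

def render_script(statements):
    """Wrap a listing of statements in a single transaction."""
    body = "\n".join(render(s) for s in statements)
    return f"BEGIN TRANSACTION;\n{body}\nCOMMIT;"

def written_tables(statements):
    """Which tables the listing may modify, known before anything runs."""
    return {s.table for s in statements}
```

Because the listing is a value rather than a string, questions like "which tables does this touch?" can be answered before execution, which is what the invariant analysis above would need.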

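Unfortunately, I have not been able to find any research on formalizing DML as an algebra, on modeling it, or on this kind of metaprogramming. Tutorial-D and jOOq seem to have something to offer the discussion, but I just don't know how to extract it. Can you shed some light?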

Some of the discussion seems valuable to me, but I would rather not leave it buried in the comments:

Are you suggesting that domain models aren’t a good fit to protect invariants and establishing transactional boundaries? The invariants you mentioned aren’t hard to protect using a proper domain model. What problem are you trying to avoid exactly?

– plalx

As I understand it, large domains in typical ddd require bounded contexts to avoid having to hydrate large subsets of the data into the application layer for validation. I am trying to avoid that overhead. Also, domain invariants must be non-trivially restated for each bounded context, which is error-prone. By modeling the operations for remote execution, we get smarter/smaller/faster/more correct code.
In some core library, the domain could be modeled and the invariants registered. Then consumers of that library, such as for a web service, could then construct type-checked descriptions of arbitrary operations without explicit consideration for bounded contexts or particular invariants. The domain core offers to its consumers “this is the full range of what you can do over this domain” and (perhaps) the service code offers to its clients “these are the exact features we’re offering”.

– uosɐs

I’m not sure if you understood correctly what a Bounded Context is and how they might communicate with each other. “Also, domain invariants must be non-trivially restated/maintained for each bounded context which is error-prone” There’s usually just one context that has data ownership, and that context is responsible for invariants involving its own data. For instance, imagine a company that sells goods on the Internet. They might have an Inventory context where products get maintained and a Shopping context that listens for newly available products from the Inventory.

– plalx

I’m not very much arguing against current ddd techniques, so I’m not choosing excellent examples against them. I’m more interested in this alternative arrangement which I intuit would be more natural and advanced than current ddd techniques. I’ve seen data models that are extremely intertwined and don’t offer obvious boundaries (perhaps poorly designed, OK). I expect that this way could be boundaryless AND more performant.

– uosɐs

If there were a rule that a Product name couldn’t contain the word “propaganda”, it would be enforced only in the Inventory context. If we were to duplicate the invariants of every context in every other context, it would indeed become a maintenance nightmare.

– plalx

But you plausibly might have a bounded context centered on Customers and a second bounded context centered on Orders. And maybe the $10,000,000.00 Limit I mentioned is made to be a column in Customer (and therefore variable), so this business rule can be violated in two ways: either by dropping that Limit on Customer or increasing totals in Order. So non-trivially reciprocal rules must check for violations in either bounded context depending on the change. Our system could decide to skip the assertion if Prices and Limits aren’t changed, which would be pretty slick, no? In the traditional ddd, you might also need some optimized variants for bulk manipulations (Add an Order of $1000 to every Customer) which could be automatically derived by our new system.

– uosɐs

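Best answer

Contrary to how it may seem, the one thing you don't need is something "beyond" relational algebra. This is not a problem of theory at all, but of imagination and engineering. The problem you describe spans several areas: programming languages, library support, and the DBMS. It can be done (and it should be). But first it has to be widely understood as realistic and desirable, and we are not there yet.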

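As far as the algebra goes, all that is missing is assignment. If you have read Date's Third Manifesto, you may recall that insert/update/delete are just variations on assignment: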

S += f(R)        -- insert
S += f(R) - g(S) -- update
S -= f(R)        -- delete

(Python's built-in set type demonstrates this nicely, by the way, except that you don't get operators for sets of tuples out of the box.)
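A small illustration with Python's built-in `set`, using made-up `(id, name)` tuples. Python sets have no `+=`, so union is spelled `|=`; and the update line follows one reading of the notation above, namely "remove the old tuples, then add their replacements".

```python
# A relation as a set of (id, name) tuples; the data is invented for the example.
S = {(1, "Ann"), (2, "Bob")}

# insert: S := S union f(R)
S |= {(3, "Cy")}

# update: S := (S minus the old tuples) union (their replacements)
old = {t for t in S if t[0] == 2}
new = {(key, "Robert") for (key, _) in old}
S = (S - old) | new

# delete: S := S minus f(R)
S -= {t for t in S if t[0] == 1}
```

Each step is a plain assignment of a new set value to `S`, which is the point: no separate "DML algebra" is required.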

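So this is not a problem of theory; the algebra is fine. Nor are you asking purely for syntax. It seems to me that what you want is a DBMS you can operate on functionally, without SQL and a SQL generator acting as intermediaries. Wouldn't it be nice if the tables in the database appeared as variables in your programming language, and there were a relational algebra library (for that language) supporting select, project and join?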
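As a toy version of such a library, here is a sketch over in-memory sets, assuming each row is represented as a frozenset of (attribute, value) pairs so that rows are hashable. The function names `row`, `select`, `project` and `join` are my own; this is not an existing package, and it says nothing yet about tables that live in a DBMS.

```python
def row(**attrs):
    """A row is an immutable, hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(rel, pred):
    """Restriction: keep the rows for which pred(row_as_dict) is true."""
    return {r for r in rel if pred(dict(r))}

def project(rel, *names):
    """Projection: keep only the named attributes (duplicates collapse)."""
    return {frozenset((k, v) for k, v in r if k in names) for r in rel}

def join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for a in left:
        da = dict(a)
        for b in right:
            db = dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

customers = {row(customerId=1, name="Ann"), row(customerId=2, name="Bob")}
orders = {
    row(orderId=10, customerId=1, price=250),
    row(orderId=11, customerId=2, price=40),
}

# Names of customers with an order over 100.
big_spenders = project(
    select(join(customers, orders), lambda r: r["price"] > 100), "name"
)
```

The nested-loop join is quadratic and only meant to show the operators composing as ordinary function calls on ordinary variables.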

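So why not build relational operators into a proper language? Forty years after relational theory was invented, why is its use confined to databases? That has in fact been the lament of the database community for decades. It has been done (see Datalog, for example), but the plethora of new languages we have seen in recent years is notable for inheriting the C tradition of not supporting set-theoretic operations.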

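But in fact, merely building relations and relational operators into a language is not enough. Programming languages generally want to define their variables and own them exclusively. That is practically the definition of a programming language: something that defines and manipulates chunks of memory whose lifetime is bounded by the execution of the program. Interesting data usually "starts" somewhere else, not in program memory.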

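So what you really, really want is to manipulate data "in the database" as though the tables were program variables (also known as action at a distance), plus some super-convenient, ideally transparent way of moving the results into program memory. Like, oh, assignment. To make any progress in that direction, you need the cooperation of the DBMS.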

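Today, to interact with a typical DBMS, you express your question in its language (usually SQL) and fetch the output row by row into program memory. It is an I/O model: write a string, read the results. To take the I/O out of the programming model, you need a different API, something more like RPC. If the programming language and the DBMS shared the same data model (relations), the same functions (relational algebra) and the same data types, you would have a chance of manipulating remote and local data in the same way.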

Here is the whole suite:

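> Language support for relations and relational operations
> Language recognition of local and off-machine variables
> DBMS support for exposing table definitions programmatically, so that the compiler/interpreter can "link" to them as library symbols
> DBMS support for remote invocation of relational operators, by function, not by statement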
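To give the last point some shape, here is a speculative sketch of calling relational operators "by function, not by statement". The program builds a small expression over a named remote table and ships it as a nested tuple instead of a SQL string. Every name here (`Relvar`, `Restrict`, `Minus`, `Assign`, `to_wire`, `evaluate`) is hypothetical, and `evaluate` is only a local stand-in for what a cooperating DBMS would do on its side; no existing DBMS exposes such an API.

```python
import operator
from dataclasses import dataclass

OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}

@dataclass(frozen=True)
class Relvar:
    """A named table that lives in the DBMS, not in program memory."""
    name: str
    def to_wire(self):
        return ("relvar", self.name)

@dataclass(frozen=True)
class Restrict:
    source: object
    attr: str
    op: str
    value: object
    def to_wire(self):
        return ("restrict", self.source.to_wire(), self.attr, self.op, self.value)

@dataclass(frozen=True)
class Minus:
    left: object
    right: object
    def to_wire(self):
        return ("minus", self.left.to_wire(), self.right.to_wire())

@dataclass(frozen=True)
class Assign:
    target: Relvar
    expr: object
    def to_wire(self):
        return ("assign", self.target.name, self.expr.to_wire())

def evaluate(msg, db):
    """Stand-in for the DBMS side: interpret one wire message against db."""
    kind = msg[0]
    if kind == "relvar":
        return db[msg[1]]
    if kind == "restrict":
        _, source, attr, op, value = msg
        return {r for r in evaluate(source, db) if OPS[op](dict(r)[attr], value)}
    if kind == "minus":
        return evaluate(msg[1], db) - evaluate(msg[2], db)
    if kind == "assign":
        db[msg[1]] = evaluate(msg[2], db)
        return db[msg[1]]
    raise ValueError(f"unknown operator: {kind}")

# "Delete orders under 10", phrased as an assignment to the remote variable.
orders = Relvar("Orders")
delete_cheap = Assign(orders, Minus(orders, Restrict(orders, "price", "<", 10)))
```

The delete is the assignment `Orders := Orders minus (Orders where price < 10)`, sent as one structured message; nothing is fetched row by row, and the same expression could in principle be evaluated against a local set.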

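You may have noticed that, to a reasonable approximation, no one is trying to do this. Language designers universally ignore set theory and predicate logic. DBMS vendors, and the popular free projects too, are shackled to SQL and are utterly uninterested in fixing SQL's set-theoretic defects or in exposing their systems through a logical, functional API. The last thing on anyone's mind is developing a complete set of types and operators.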

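So what do we have instead? LINQ is a fine example of a dancing bear: it cobbles SQL together from strings and primitive types, squirts it down the pipe, and presents database tables as row-at-a-time operations provided by the host language. Given the realities of today's environment, it is a pretty good show. But, as your question indicates, the novelty has worn off and the work is not getting any easier. You might want to hold on to your ticket: judging by the current speed and direction of progress, you will be in the same seat for another 40 years.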