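I collect data from several API sources with Python and add it to 2 tables in Postgres.
I then use this data to build reports, joining and grouping/filtering it. I add thousands of rows every day.
Cost, revenue and sales are always cumulative: each data point covers the period starting at t1 for that product, and t2 is the time the data was retrieved.
So the latest data pull includes all previous data going back to t1. t1 and t2 are timestamp without time zone columns in Postgres. I currently use Postgres 10.
Sample: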
id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-25-02, 2018-04-18, 12, 100, 7
6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-25-02, 2018-04-17, 9, 0-, 3
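Cost and revenue come from two tables, which I join on vendor_id, product_id and t2.
Is there a way to go through all the data, "shift" it and subtract, so that instead of cumulative data I have time-series data?
Should this be done before storing, or is it better done when building the reports?
For reference, if I currently want a report of the change between two points in time, I write two subqueries. That feels backwards compared with storing the data as a time series and simply aggregating over the intervals I need.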
with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...
Thanks a lot in advance!
JR
Best answer: You can use the window function LAG(). From the PostgreSQL documentation:
…returns value evaluated at the row that is offset rows before the
current row within the partition; if there is no such row, instead
return default (which must be of the same type as value). Both offset
and default are evaluated with respect to the current row. If omitted,
offset defaults to 1 and default to null.
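To make the quoted semantics concrete, here is a minimal Python sketch of what LAG() does over a single, already ordered partition. The function name `lag` and its signature are illustrative only; this is not how Postgres implements it.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def lag(values: List[T], offset: int = 1, default: Optional[T] = None) -> List[Optional[T]]:
    """For each position i, return values[i - offset], or default if no such row exists.

    `values` stands for one partition, already sorted by the ORDER BY column.
    """
    result = []
    for i in range(len(values)):
        j = i - offset
        # Outside the partition there is "no such row", so fall back to default.
        result.append(values[j] if 0 <= j < len(values) else default)
    return result


# Cumulative cost for one (vendor_id, product_id) pair, ordered by t2 ascending.
cumulative_cost = [16, 35, 50]
previous_cost = lag(cumulative_cost)  # [None, 16, 35]

# Subtracting the previous snapshot turns cumulative totals into per-interval values;
# the first row has no predecessor, so it keeps its own value (the COALESCE case).
per_interval = [
    cur if prev is None else cur - prev
    for cur, prev in zip(cumulative_cost, previous_cost)
]
print(per_interval)  # [16, 19, 15]
```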
with sample_data as (
select 1 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
union all
select 2 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
union all
select 3 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
union all
select 4 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
union all
select 5 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
union all
select 6 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
select sd.*
, coalesce(cost - lag(cost) over (partition by vendor_id, product_id order by t2),cost) cost_new
, coalesce(revenue - lag(revenue) over (partition by vendor_id, product_id order by t2),revenue) revenue_new
, coalesce(sales - lag(sales) over (partition by vendor_id, product_id order by t2),sales) sales_new
from sample_data sd
order by t2 desc
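
If the subtraction were done in Python before storing instead, the same logic could be sketched as below. This is only an illustration under stated assumptions: the rows are plain tuples in the column order of the sample data, `decumulate` is a hypothetical helper name, and NULL values are not handled.

```python
from collections import defaultdict
from datetime import date

# Same sample rows as the query above:
# (id, vendor_id, product_id, t1, t2, cost, revenue, sales)
ROWS = [
    (1, "a", "a", date(2018, 1, 1), date(2018, 4, 18), 50, 200, 36),
    (2, "a", "b", date(2018, 1, 1), date(2018, 4, 18), 55, 200, 34),
    (3, "a", "a", date(2018, 1, 1), date(2018, 4, 17), 35, 150, 25),
    (4, "a", "b", date(2018, 1, 1), date(2018, 4, 17), 25, 140, 23),
    (5, "a", "a", date(2018, 1, 1), date(2018, 4, 16), 16, 70, 12),
    (6, "a", "b", date(2018, 1, 1), date(2018, 4, 16), 13, 65, 11),
]


def decumulate(rows):
    """Append (cost_new, revenue_new, sales_new) to every row.

    Mirrors PARTITION BY vendor_id, product_id ORDER BY t2: each row's
    cumulative values minus those of the previous snapshot of the same product.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[1], row[2])].append(row)

    out = []
    for group in groups.values():
        group.sort(key=lambda r: r[4])  # order by t2 ascending
        prev = (0, 0, 0)  # the first snapshot keeps its own values
        for row in group:
            current = row[5:8]
            deltas = tuple(c - p for c, p in zip(current, prev))
            out.append(row + deltas)
            prev = current

    out.sort(key=lambda r: r[4], reverse=True)  # order by t2 desc, as in the query
    return out


for row in decumulate(ROWS):
    print(row[0], row[4], row[8:])
```

For the row with id 1 this yields (15, 50, 11), the same values the query produces for cost_new, revenue_new and sales_new. Note that this approach only works if every earlier snapshot for a product is available when the new one is processed.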