I have this DataFrame as an example:
import pandas as pd

# create the DataFrame
df = pd.DataFrame(
    [['DE', 'Table', 201705, 201705, 1000], ['DE', 'Table', 201705, 201704, 1000],
     ['DE', 'Table', 201705, 201702, 1000], ['DE', 'Table', 201705, 201701, 1000],
     ['AT', 'Table', 201708, 201708, 1000], ['AT', 'Table', 201708, 201706, 1000],
     ['AT', 'Table', 201708, 201705, 1000], ['AT', 'Table', 201708, 201704, 1000]],
    columns=['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings'])
print(df)
   ISO  Product  Billed Week  Created Week  Billings
0   DE    Table       201705        201705      1000
1   DE    Table       201705        201704      1000
2   DE    Table       201705        201702      1000
3   DE    Table       201705        201701      1000
4   AT    Table       201708        201708      1000
5   AT    Table       201708        201706      1000
6   AT    Table       201708        201705      1000
7   AT    Table       201708        201704      1000
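What I need to do is fill in the missing rows with Billings of 0 for each group defined by ['ISO', 'Product']. A row is missing wherever the sequence of weeks is broken, i.e. no billing was created in a given week. The complete sequence should run from the minimum 'Created Week' up to the maximum 'Billed Week', so those are the combinations that should exist with no break in the sequence.
So, for the data above, the missing records that I need to append programmatically look like this: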
   ISO  Product  Billed Week  Created Week  Billings
0   DE    Table       201705        201703         0
1   AT    Table       201708        201707         0
Best answer
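def seqfix(x):
    # 'Created Week' values present in this group
    s = x['Created Week']
    x = x.set_index('Created Week')
    # reindex over every week from the earliest to the latest; gaps become NaN rows
    x = x.reindex(range(min(s), max(s)+1))
    # the newly created weeks get zero billings
    x['Billings'] = x['Billings'].fillna(0)
    # forward-fill the remaining columns (ISO, Product, Billed Week) into the new rows
    x = x.ffill().reset_index()
    return x

df = df.groupby(['ISO', 'Billed Week']).apply(seqfix).reset_index(drop=True)
# the NaN rows turned these columns into floats, so cast them back to int
df[['Billed Week', 'Billings']] = df[['Billed Week', 'Billings']].astype(int)
df = df[['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings']]
print(df)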
   ISO  Product  Billed Week  Created Week  Billings
0   AT    Table       201708        201704      1000
1   AT    Table       201708        201705      1000
2   AT    Table       201708        201706      1000
3   AT    Table       201708        201707         0
4   AT    Table       201708        201708      1000
5   DE    Table       201705        201701      1000
6   DE    Table       201705        201702      1000
7   DE    Table       201705        201703         0
8   DE    Table       201705        201704      1000
9   DE    Table       201705        201705      1000
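
The answer above returns the whole filled-in frame, while the question asks for just the records to append. As a minimal sketch of that variant (not part of the original answer), the loop below collects only the absent weeks. The helper name `missing_rows` and the `keys` list are hypothetical, introduced here for illustration. It assumes, like the answer, that the weeks in a group are consecutive integers between the smallest and largest 'Created Week', so it would not handle a sequence that crosses a year boundary. It also avoids `groupby.apply`, whose handling of the grouping columns appears to differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    [['DE', 'Table', 201705, 201705, 1000], ['DE', 'Table', 201705, 201704, 1000],
     ['DE', 'Table', 201705, 201702, 1000], ['DE', 'Table', 201705, 201701, 1000],
     ['AT', 'Table', 201708, 201708, 1000], ['AT', 'Table', 201708, 201706, 1000],
     ['AT', 'Table', 201708, 201705, 1000], ['AT', 'Table', 201708, 201704, 1000]],
    columns=['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings'])

# Hypothetical choice of grouping columns for this sketch
keys = ['ISO', 'Product', 'Billed Week']

def missing_rows(frame):
    """Return one zero-billing row for each 'Created Week' absent from its group."""
    rows = []
    for key, grp in frame.groupby(keys, sort=False):
        present = set(grp['Created Week'])
        # Assumption: weeks are plain consecutive integers within a group
        first, last = int(grp['Created Week'].min()), int(grp['Created Week'].max())
        for week in range(first, last + 1):
            if week not in present:
                row = dict(zip(keys, key))
                row['Created Week'] = week
                row['Billings'] = 0
                rows.append(row)
    return pd.DataFrame(rows, columns=frame.columns)

missing = missing_rows(df)
print(missing)

# Append the missing records to the original data
complete = pd.concat([df, missing], ignore_index=True)
```

On the sample data this should give the two records listed in the question: DE with Created Week 201703 and AT with Created Week 201707. If the upper bound should be the 'Billed Week' rather than the largest 'Created Week', `last` could be taken from that column instead.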