Python Pandas: fill a DataFrame with missing values

April 8, 2023

I'll use this DataFrame as an example:

import pandas as pd

# Create the example DataFrame
df = pd.DataFrame(
    [['DE', 'Table', 201705, 201705, 1000], ['DE', 'Table', 201705, 201704, 1000],
     ['DE', 'Table', 201705, 201702, 1000], ['DE', 'Table', 201705, 201701, 1000],
     ['AT', 'Table', 201708, 201708, 1000], ['AT', 'Table', 201708, 201706, 1000],
     ['AT', 'Table', 201708, 201705, 1000], ['AT', 'Table', 201708, 201704, 1000]],
    columns=['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings'])
print(df)

  ISO Product  Billed Week  Created Week  Billings
0  DE   Table       201705        201705      1000
1  DE   Table       201705        201704      1000
2  DE   Table       201705        201702      1000
3  DE   Table       201705        201701      1000
4  AT   Table       201708        201708      1000
5  AT   Table       201708        201706      1000
6  AT   Table       201708        201705      1000
7  AT   Table       201708        201704      1000

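What I need to do is add rows with 0 Billings for each ['ISO', 'Product'] group wherever the sequence of weeks is broken, i.e. where no billings were created in a given week and the row is therefore missing. The range is bounded by the maximum 'Billed Week' and the minimum 'Created Week'; these are the combinations that should be complete, with no break in the sequence.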
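To make the rule concrete, here is a minimal sketch for a single group (the DE rows), using plain Python. The variable names are illustrative only, and the sketch assumes the week codes are consecutive integers within one year.

```python
# Weeks present for the DE / Table group, taken from the example data.
billed_week = 201705
created_weeks = [201705, 201704, 201702, 201701]

# Every week from the first created week up to the billed week should exist.
expected_weeks = range(min(created_weeks), billed_week + 1)

# Whatever is expected but not present is a gap in the sequence.
missing = sorted(set(expected_weeks) - set(created_weeks))
print(missing)  # [201703]
```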

So for the data above, the missing records that I need to append programmatically to the database look like this:

  ISO Product  Billed Week  Created Week  Billings
0  DE   Table       201705        201703         0
1  AT   Table       201708        201707         0
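One possible way to derive exactly these two rows is an anti-join: build every expected (group, week) combination, left-merge it against the real data, and keep what has no match. This is only a sketch under assumptions: the names `keys`, `bounds`, `expected` and `missing` are illustrative, the range runs from each group's minimum 'Created Week' to its 'Billed Week', and all weeks fall within a single year.

```python
import pandas as pd

df = pd.DataFrame(
    [['DE', 'Table', 201705, 201705, 1000], ['DE', 'Table', 201705, 201704, 1000],
     ['DE', 'Table', 201705, 201702, 1000], ['DE', 'Table', 201705, 201701, 1000],
     ['AT', 'Table', 201708, 201708, 1000], ['AT', 'Table', 201708, 201706, 1000],
     ['AT', 'Table', 201708, 201705, 1000], ['AT', 'Table', 201708, 201704, 1000]],
    columns=['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings'])

keys = ['ISO', 'Product', 'Billed Week']

# First created week per group; sort=False keeps the groups in input order.
bounds = df.groupby(keys, as_index=False, sort=False)['Created Week'].min()

# Expand each group to every week from its first created week to its billed week.
expected = pd.DataFrame(
    [(iso, product, billed, week)
     for iso, product, billed, first in bounds.itertuples(index=False, name=None)
     for week in range(first, billed + 1)],
    columns=keys + ['Created Week'])

# Left-merge against the real rows; 'left_only' marks weeks with no billing row.
merged = expected.merge(df, on=keys + ['Created Week'], how='left', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', keys + ['Created Week']].copy()
missing['Billings'] = 0
missing = missing.reset_index(drop=True)
print(missing)
```

If the complete table is wanted rather than only the gaps, `pd.concat([df, missing], ignore_index=True)` followed by a sort on the key columns should give it.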

Best answer

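def seqfix(x):
    # Weeks that actually occur in this group.
    s = x['Created Week']
    x = x.set_index('Created Week')
    # Reindex to every integer week between the first and last created week;
    # weeks that were missing show up as rows of NaN.
    x = x.reindex(range(min(s), max(s)+1))
    # The inserted rows had no billings.
    x['Billings'] = x['Billings'].fillna(0)
    # Copy ISO, Product and Billed Week down from the preceding row.
    x = x.ffill().reset_index()
    return x

# Apply the fix to each (ISO, Billed Week) group and flatten the result.
df = df.groupby(['ISO', 'Billed Week']).apply(seqfix).reset_index(drop=True)
# The NaN rows turned these columns into floats, so cast them back to int.
df[['Billed Week', 'Billings']] = df[['Billed Week', 'Billings']].astype(int)
# Restore the original column order.
df = df[['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings']]

print(df)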

  ISO Product  Billed Week  Created Week  Billings
0  AT   Table       201708        201704      1000
1  AT   Table       201708        201705      1000
2  AT   Table       201708        201706      1000
3  AT   Table       201708        201707         0
4  AT   Table       201708        201708      1000
5  DE   Table       201705        201701      1000
6  DE   Table       201705        201702      1000
7  DE   Table       201705        201703         0
8  DE   Table       201705        201704      1000
9  DE   Table       201705        201705      1000
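One caveat with the approach above: `range(min(s), max(s)+1)` treats the week codes as plain integers, so a group that crosses a year boundary (say 201752 to 201802) would be filled with codes such as 201753 through 201800 that are not real weeks. The following is a hedged sketch of a replacement range, assuming the codes are ISO year-week numbers in YYYYWW form (the source does not say which week convention is used). `week_range` is a hypothetical helper, not part of pandas, and it needs Python 3.8 or later for `date.fromisocalendar`.

```python
from datetime import date, timedelta

def week_range(first, last):
    """Yield YYYYWW codes from first to last inclusive, assuming ISO weeks."""
    # Represent each week by its Monday so that stepping is plain date arithmetic.
    current = date.fromisocalendar(first // 100, first % 100, 1)
    end = date.fromisocalendar(last // 100, last % 100, 1)
    while current <= end:
        iso = current.isocalendar()
        yield iso[0] * 100 + iso[1]  # ISO year * 100 + ISO week number
        current += timedelta(weeks=1)

print(list(week_range(201751, 201802)))  # [201751, 201752, 201801, 201802]
```

Inside `seqfix`, `x.reindex(list(week_range(min(s), max(s))))` could then stand in for the integer `range` call.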