python – Performing arithmetic on a multi-index pandas DataFrame that needs broadcasting across multiple levels

July 21, 2019

I have a DataFrame that looks like this:

   one                     two                    three                   
     1         2             1             2          1             2     
     X    Y    X    Y        X    Y        X    Y     X        Y    X    Y
a  0.3 -0.6 -0.3 -0.2  1.5e+00  0.3 -1.0e+00  1.2   0.6 -9.8e-02 -0.4  0.4
b -0.6 -0.4 -1.1  2.3 -7.4e-02  0.7 -7.4e-02 -0.5  -0.3 -6.8e-01  1.1 -0.1

How can I divide all elements of df by df["three"]?

I tried df.div(df["three"], level=[1, 2]) with no luck.

Best answer: here is a one-liner.

df / pd.concat( [ df.three ] * 3, axis=1 ).values
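The one-liner hard-codes the factor 3, which is the number of top-level column groups. As a minimal sketch (not part of the original answer), the same positional division can be written with np.tile so that the repeat count is derived from the columns. The sample frame and the names rng, n_groups and result are assumptions made for illustration, and the approach assumes every top-level group has the same sub-columns in the same order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with the same three-level column layout as the question.
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product([["one", "two", "three"], [1, 2], ["X", "Y"]])
df = pd.DataFrame(rng.normal(size=(2, 12)), index=["a", "b"], columns=cols)

# Count the top-level groups instead of hard-coding 3.
n_groups = df.columns.get_level_values(0).nunique()

# Repeat the "three" block side by side so it has the same shape as df.
# Dividing by a plain array is positional, so no label alignment happens.
result = df / np.tile(df["three"].to_numpy(), (1, n_groups))
```

Because the division is positional, it gives a wrong answer without warning if the groups are not laid out identically.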

Here is another way that is less concise but perhaps more readable.

df2 = df.copy()
for c in df.columns.levels[0]:
    df2[c] = df[c] / df['three']
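The loop above can also be sketched without in-place assignment, by dividing each top-level group separately and recombining the pieces with pd.concat. This is an illustrative variant rather than the original author's code; the sample frame mirrors the one built later in this answer, and the names parts and result are assumptions.

```python
import numpy as np
import pandas as pd

# Sample frame with two-level columns (same layout as the example later in the answer).
np.random.seed(123)
columns = pd.MultiIndex.from_product(
    [["one", "two", "three"], ["foo", "bar"]], names=["first", "second"]
)
df = pd.DataFrame(np.random.randn(3, 6), index=["A", "B", "C"], columns=columns)

# df[key] drops the top level, so each piece aligns with df["three"] on the
# remaining column labels ("foo", "bar") and on the row index.
parts = {
    key: df[key] / df["three"]
    for key in df.columns.get_level_values(0).unique()
}

# The dict keys become the top column level again; reindex restores the
# original column order and level names.
result = pd.concat(parts, axis=1).reindex(columns=df.columns)
```

Here the division is aligned by label within each group, so it does not depend on the sub-columns being in the same order across groups.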

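Finally, here is a longer solution with more explanation. I wrote it before realizing there were better ways, but I will leave it here because it says more about what happens behind the scenes in this kind of operation (although it may be overkill).

First, a multi-index does not copy and paste well, so I will create a very similar sample DataFrame.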

import numpy as np
import pandas as pd

np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

first        one                 two               three          
second       foo       bar       foo       bar       foo       bar
A      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437
B      -2.426679 -0.428913  1.265936 -0.866740 -0.678886 -0.094709
C       1.491390 -0.638902 -0.443982 -0.434351  2.205930  2.186786

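The simplest approach is to repeat the denominator 3 times so that it matches the dimensions of the full DataFrame. Alternatively you could loop over the columns, but then you have to recombine them, which may not be as easy as you would expect with a multi-index. So broadcast the 'three' columns like this.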

denom = pd.concat( [df['three']]*3, axis=1 )
denom = pd.DataFrame( denom.values, columns=df.columns, index=df.index )

first        one                 two               three          
second       foo       bar       foo       bar       foo       bar
A      -0.578600  1.651437 -0.578600  1.651437 -0.578600  1.651437
B      -0.678886 -0.094709 -0.678886 -0.094709 -0.678886 -0.094709
C       2.205930  2.186786  2.205930  2.186786  2.205930  2.186786

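The first 'denom' line simply expands the 'three' columns to the same shape as the existing DataFrame. The second 'denom' line is needed so that the row and column indexes match. Now you can write an ordinary division.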

df / denom

first        one                 two           three    
second       foo       bar       foo       bar   foo bar
A       1.876305  0.603926 -0.489074 -0.912112     1   1
B       3.574501  4.528744 -1.864725  9.151619     1   1
C       0.676082 -0.292165 -0.201267 -0.198625     1   1

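A quick note on how this longer solution compares with the one-liner. In the one-liner, the values are converted from a DataFrame to an array, which has the convenient side effect of erasing the row and column indexes. In the longer solution, by contrast, I explicitly conform the indexes. Depending on your situation, either approach could be the better one.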
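To make the difference concrete, here is a small sketch (an illustration, not part of the original answer) that divides by the same denominator with its columns deliberately reversed. The names shuffled, by_label, by_position and expected are assumptions introduced for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
columns = pd.MultiIndex.from_product(
    [["one", "two", "three"], ["foo", "bar"]], names=["first", "second"]
)
df = pd.DataFrame(np.random.randn(3, 6), index=["A", "B", "C"], columns=columns)

# Labeled denominator, built as in the longer solution.
denom = pd.DataFrame(
    pd.concat([df["three"]] * 3, axis=1).to_numpy(),
    columns=df.columns,
    index=df.index,
)
expected = df / denom

# Same data and labels, but with the column order reversed.
shuffled = denom.iloc[:, ::-1]

# DataFrame / DataFrame aligns on labels, so the column order does not matter.
by_label = (df / shuffled).reindex(columns=df.columns)

# DataFrame / ndarray is purely positional, so the reversed order changes the result.
by_position = df / shuffled.to_numpy()

print(np.allclose(by_label, expected))     # label-aligned result matches
print(np.allclose(by_position, expected))  # positional result does not
```

The array form is convenient when the layout is known to match exactly. The labeled form is safer when the column order might differ.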