python – What does the pandas sub operator do?

December 1, 2023, 204 views

This comes straight from the tutorial, and I still can't understand it even after reading the documentation.

In [14]: df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
   ....:                 'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
   ....:                 'three' : Series(randn(3), index=['b', 'c', 'd'])})
   ....: 

In [15]: df
Out[15]: 
        one     three       two
a -0.626544       NaN -0.351587
b -0.138894 -0.177289  1.136249
c  0.011617  0.462215 -0.448789
d       NaN  1.124472 -1.101558

In [16]: row = df.ix[1]

In [17]: column = df['two']

In [18]: df.sub(row, axis='columns')
Out[18]: 
        one     three       two
a -0.487650       NaN -1.487837
b  0.000000  0.000000  0.000000
c  0.150512  0.639504 -1.585038
d       NaN  1.301762 -2.237808

Why does the second row become all zeros? Is it being subtracted down to 0?

Also, when I use row = df.ix[0], the entire second column turns into NaN. Why?

Best answer
sub means subtract, so let's walk through it:

In [44]:
# create some data
import numpy as np
import pandas as pd
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                    'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                    'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
Out[44]:
        one     three       two
a -1.536737       NaN  1.537104
b  1.486947 -0.429089 -0.227643
c  0.219609 -0.178037 -1.118345
d       NaN  1.254126 -0.380208
In [45]:
# take a copy of the 2nd row
# (.ix was removed in pandas 1.0; .iloc is the positional equivalent)
row = df.iloc[1]
row
Out[45]:
one      1.486947
three   -0.429089
two     -0.227643
Name: b, dtype: float64
In [46]:
# now subtract the 2nd row row-wise
df.sub(row, axis='columns')
Out[46]:
        one     three       two
a -3.023684       NaN  1.764747
b  0.000000  0.000000  0.000000
c -1.267338  0.251052 -0.890702
d       NaN  1.683215 -0.152565

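What is probably confusing you is what happens when you specify 'columns' as the axis to operate on. We subtracted the values of the second row from every row, which explains why the second row is now all zeros: it is subtracted from itself. The data you pass in is a Series, and it is aligned on the columns, so in effect its index is matched against the column names. That is why the subtraction is performed row by row.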
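
To make the alignment concrete, here is a minimal sketch using small fixed numbers instead of the random data above (the values are made up for illustration, and `.iloc` is used because `.ix` no longer exists in pandas 1.0 and later). It contrasts axis='columns' with axis='index', using the `column` Series that the question defined but never used.

```python
import numpy as np
import pandas as pd

# Fixed, illustrative values (an assumption; not the random data from the session)
df = pd.DataFrame(
    {
        "one": [1.0, 2.0, 3.0, np.nan],
        "three": [np.nan, 20.0, 30.0, 40.0],
        "two": [100.0, 200.0, 300.0, 400.0],
    },
    index=["a", "b", "c", "d"],
)

row = df.iloc[1]      # second row: a Series whose index is the column names
column = df["two"]    # one column: a Series whose index is the row labels

# axis='columns': the Series index is matched against the column names,
# so each column has its matching value subtracted from every row.
by_row = df.sub(row, axis="columns")

# axis='index': the Series index is matched against the row labels,
# so each row has its matching value subtracted from every column.
by_column = df.sub(column, axis="index")

print(by_row)
print(by_column)
```

In `by_row`, row b is all zeros because it is subtracted from itself. In `by_column`, column two is all zeros for the same reason. With a Series operand, `df - row` should give the same result as `df.sub(row, axis="columns")`.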

In [47]:
# now take a copy of the first row
# (.iloc replaces .ix, which was removed in pandas 1.0)
row = df.iloc[0]
row
Out[47]:
one     -1.536737
three         NaN
two      1.537104
Name: a, dtype: float64
In [48]:
# perform the same op
df.sub(row, axis='columns')
Out[48]:
        one  three       two
a  0.000000    NaN  0.000000
b  3.023684    NaN -1.764747
c  1.756346    NaN -2.655449
d       NaN    NaN -1.917312

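So why do we now have a column of all NaN values? Because any arithmetic operation involving NaN yields NaN. Row a has NaN in column three, so that NaN is subtracted from every value in the column: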

In [55]:

print(1 + np.nan)
print(1 * np.nan)
print(1 / np.nan)
print(1 - np.nan)
nan
nan
nan
nan
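
If the all-NaN column is not what you want, one possible workaround is to fill the missing entries of the row before subtracting. The sketch below uses made-up values and assumes that treating a missing value as 0 makes sense for your data, which will not always be the case.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption; not the data from the session above)
df = pd.DataFrame(
    {"one": [1.0, 2.0], "three": [np.nan, 20.0]},
    index=["a", "b"],
)

row = df.iloc[0]   # 'three' is NaN in this row

# The NaN in the row propagates to every value in column 'three'
with_nan = df.sub(row, axis="columns")

# Filling the row's missing entry with 0 first leaves the column's values intact
filled = df.sub(row.fillna(0), axis="columns")

print(with_nan)
print(filled)
```

Note that the cell that was already NaN in `df` (row a, column three) stays NaN in `filled`. Only the NaN contributed by `row` is removed.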