This is taken straight from the tutorial, and I still can't understand it even after reading the documentation.
In [14]: df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
....: 'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
....: 'three' : Series(randn(3), index=['b', 'c', 'd'])})
....:
In [15]: df
Out[15]:
        one     three       two
a -0.626544       NaN -0.351587
b -0.138894 -0.177289  1.136249
c  0.011617  0.462215 -0.448789
d       NaN  1.124472 -1.101558
In [16]: row = df.iloc[1]  # the tutorial used df.ix[1]; .ix was removed in pandas 1.0
In [17]: column = df['two']
In [18]: df.sub(row, axis='columns')
Out[18]:
        one     three       two
a -0.487650       NaN -1.487837
b  0.000000  0.000000  0.000000
c  0.150512  0.639504 -1.585038
d       NaN  1.301762 -2.237808
Why does the second row become all zeros? Is it somehow being set to 0?
Also, when I use row = df.ix[0], the entire second column becomes NaN. Why?
Best answer
sub means subtract, so let's walk through it:
In [44]:
import numpy as np
import pandas as pd
# create some data
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
Out[44]:
        one     three       two
a -1.536737       NaN  1.537104
b  1.486947 -0.429089 -0.227643
c  0.219609 -0.178037 -1.118345
d       NaN  1.254126 -0.380208
In [45]:
# take a copy of the 2nd row (.iloc replaces .ix, which was removed in pandas 1.0)
row = df.iloc[1]
row
Out[45]:
one      1.486947
three   -0.429089
two     -0.227643
Name: b, dtype: float64
In [46]:
# now subtract the 2nd row row-wise
df.sub(row, axis='columns')
Out[46]:
        one     three       two
a -3.023684       NaN  1.764747
b  0.000000  0.000000  0.000000
c -1.267338  0.251052 -0.890702
d       NaN  1.683215 -0.152565
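What is probably confusing you is what it means to specify 'columns' as the axis to operate on. We subtracted the 2nd row's values from every row, which is why the 2nd row is now all zeros. The data you pass in is a Series, and it is aligned on the columns: its index is matched against the column names, which is why the subtraction is carried out row by row.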
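To make the alignment easier to see, here is a minimal sketch with small made-up numbers (not the random data above). It also shows the other direction, axis='index', using a column like the `column = df['two']` from the question. The variable names by_columns and by_index are just illustrative.

```python
import pandas as pd

# Small deterministic frame (made-up values for illustration only)
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

row = df.iloc[1]      # Series indexed by column names: one=2.0, two=20.0
column = df["two"]    # Series indexed by row labels: a=10.0, b=20.0, c=30.0

# axis='columns': the Series index is matched against the column names,
# so the same row is subtracted from every row of the frame.
by_columns = df.sub(row, axis="columns")

# axis='index': the Series index is matched against the row labels,
# so the same column is subtracted from every column of the frame.
by_index = df.sub(column, axis="index")

print(by_columns)
print(by_index)
```

In the first result row b is all zeros (the row minus itself); in the second, column two is all zeros for the same reason.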
In [47]:
# now take a copy of the first row
row = df.iloc[0]  # positional lookup; .ix no longer exists in pandas 1.0+
row
Out[47]:
one     -1.536737
three         NaN
two      1.537104
Name: a, dtype: float64
In [48]:
# perform the same op
df.sub(row, axis='columns')
Out[48]:
        one     three       two
a  0.000000       NaN  0.000000
b  3.023684       NaN -1.764747
c  1.756346       NaN -2.655449
d       NaN       NaN -1.917312
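So why do we now have a column of all NaN values? The first row has NaN in column three, and any arithmetic operation involving NaN yields NaN, so subtracting that NaN from every value in the column leaves the whole column NaN: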
In [55]:
# np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
print(1 + np.nan)
print(1 * np.nan)
print(1 / np.nan)
print(1 - np.nan)
nan
nan
nan
nan
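If the all-NaN column is not what you want, one possible workaround is to fill the missing value in the row before subtracting. This is only a sketch under the assumption that treating the missing entry as 0 makes sense for your data; the frame below uses made-up values, and the names propagated and filled are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame whose first row has a missing value in column 'three'
df = pd.DataFrame(
    {"one": [1.0, 2.0], "three": [np.nan, 5.0]},
    index=["a", "b"],
)
row = df.iloc[0]  # one=1.0, three=NaN

# The NaN in the row propagates into every result in column 'three'
propagated = df.sub(row, axis="columns")

# Assumption: a missing value in the row should count as 0
filled = df.sub(row.fillna(0), axis="columns")

print(propagated)
print(filled)
```

Cells that were already NaN in df itself (here a/three) stay NaN in both results; only the NaN coming from the subtracted row is neutralized.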