I came across a piece of code that I don't fully understand. It is basically this:
import numpy as np
import pandas as pd
array = np.ones((5, 4))*np.nan
s1 = pd.Series([1,4,0,4,5], index=[0,1,2,3,4])
I = s1 == 4
print(I)
0 False
1 True
2 False
3 True
4 False
dtype: bool
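I do understand this part: it returns a boolean pd.Series that is True at the index positions where the value is 4. The author then uses I to index the array: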
array[I,0] = 3
array[I,1] = 7
array[I,2] = 2
array[I,3] = 5
print(array)
[[ 3. 7. 2. 5.]
[ 3. 7. 2. 5.]
[ nan nan nan nan]
[ nan nan nan nan]
[ nan nan nan nan]]
The resulting array makes no sense to me. I expected to get:
[[ nan nan nan nan]
[ 3. 7. 2. 5.]
[ nan nan nan nan]
[ 3. 7. 2. 5.]
[ nan nan nan nan]]
Can someone explain what is happening here, and how to change the code above so that it returns what I need?
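Best answer: The explanation is that NumPy arrays and pandas Series handle this kind of boolean index differently. In the (older) NumPy version that produced the output above, a boolean list or Series that is not a boolean ndarray is cast to integers, so True is treated as 1 and False as 0. A pandas Series instead keeps the values where the mask is True and drops those where it is False. As a demonstration: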
import numpy as np
import pandas as pd
arr = np.array([1,2,3,4,5])
arr # this is a numpy array
array([1, 2, 3, 4, 5])
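arr[[True, False, True]]
array([2, 1, 2]) # older NumPy casts the list to integers [1, 0, 1], so it picks
# the values at positions 1, 0 and 1; current NumPy treats the list as a mask
# and raises IndexError because its length (3) does not match the array (5)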
ser = pd.Series([1,2,3,4,5])
ser # this is a pandas Series
0 1
1 2
2 3
3 4
4 5
dtype: int64
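ser[[True, False, True]] # a pandas Series keeps the values where the mask is True
# (output as reported for the version used here; recent pandas versions expect
# the mask to be as long as the Series)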
0 1
2 3
dtype: int64
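You can see that the two behave differently. Because your array is a NumPy array, indexing it directly with the Series I does not act as a mask here: its values are read as the row positions 0 and 1. To get the result you want, extract the index labels where I is True and use those on your array: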
rows = I[I].index  # index labels where I is True, here [1, 3]
# this works because the labels of s1 (0..4) coincide with the row positions
array[rows, 0] = 3
array[rows, 1] = 7
array[rows, 2] = 2
array[rows, 3] = 5
print(array)
[[ nan nan nan nan]
[ 3. 7. 2. 5.]
[ nan nan nan nan]
[ 3. 7. 2. 5.]
[ nan nan nan nan]]
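As an alternative to going through the index labels, the Series can be converted to a plain boolean ndarray first, which NumPy treats as a row mask. The following is only a minimal sketch under the assumption that `Series.to_numpy()` is available in your pandas version (`.values` plays the same role in older ones). The names `mask` and `rows` are introduced here for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

array = np.full((5, 4), np.nan)
s1 = pd.Series([1, 4, 0, 4, 5], index=[0, 1, 2, 3, 4])

# Convert the boolean Series to a boolean ndarray so that NumPy
# interprets it as a mask rather than as integer positions.
mask = (s1 == 4).to_numpy()

# Assign whole rows at once; the list on the right-hand side is
# broadcast across every row selected by the mask.
array[mask] = [3, 7, 2, 5]

# Positions of the selected rows, in case integer indices are needed.
rows = np.flatnonzero(mask)

print(array)
print(rows)
```

Because the mask is positional, this variant does not depend on the labels of `s1` matching the row numbers of `array`, which the `.index` approach does rely on.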