I can successfully convert the two columns into a matrix with the following commands.
dfb = datab.parse("a")
dfb
     Name        Product
0    Mike     Apple,pear
1    John  Orange,Banana
2     Bob         Banana
3  Connie           Pear
pd.get_dummies(dfb.Product).groupby(dfb.Name).apply(max)
        Apple,pear  Banana  Orange,Banana  Pear
Name
Bob              0       1              0     0
Connie           0       0              0     1
John             0       0              1     0
Mike             1       0              0     0
However, the matrix I actually want looks like this, with one column per individual product:
        Apple  Banana  Orange  Pear
Name
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
Best answer
1.
df = dfb.set_index('Name').Product.str.get_dummies(',')
print (df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
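Since the original data comes from an Excel sheet that readers do not have, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand, and it assumes "Pear" is capitalized consistently in the data (a lowercase "pear" would end up in a separate column of its own).

```python
import pandas as pd

# Sample data typed in by hand (assumption: mirrors the sheet from the question,
# with "Pear" capitalized the same way on every row).
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
})

# Series.str.get_dummies splits every string on the separator and returns
# one 0/1 indicator column per distinct token.
df = dfb.set_index("Name")["Product"].str.get_dummies(sep=",")
print(df)
```

Moving Name into the index first is what keeps the names as row labels in the result.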
2.
A solution using pandas.get_dummies and split to build a new DataFrame, followed by a groupby over the columns (hence axis=1 and level=0) aggregated with max:
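dfb = dfb.set_index('Name')
# The parentheses let the method chain continue onto the next line.
# Note: groupby(axis=1) is deprecated in recent pandas releases.
df = (pd.get_dummies(dfb.Product.str.split(',', expand=True), prefix='', prefix_sep='')
        .groupby(axis=1, level=0).max())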
print (df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
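For readers on a newer pandas, the same idea can be sketched step by step without groupby(axis=1), which as far as I know is deprecated in recent releases. This is only one possible variant: the sample data is typed in by hand, the intermediate names parts and dummies are made up for illustration, and dtype=int is passed because newer versions of get_dummies return booleans by default.

```python
import pandas as pd

# Hand-built sample data (assumption: same content as the question's sheet).
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
}).set_index("Name")

# One column per position in the comma-separated list; shorter lists are padded
# with missing values.
parts = dfb["Product"].str.split(",", expand=True)

# An empty prefix names each dummy column after the product itself, so the same
# product can show up once per position (duplicate column labels).
dummies = pd.get_dummies(parts, prefix="", prefix_sep="", dtype=int)

# Collapse the duplicate labels: transpose, group the row labels, take the max,
# then transpose back.
df = dummies.T.groupby(level=0).max().T
print(df)
```

The max aggregation is what turns "present in any position" into a single 1 per product.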
3.
A solution using split and MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(dfb.Product.str.split(',')),
                  columns=mlb.classes_,
                  index=dfb.Name)
print (df)
        Apple  Banana  Orange  Pear
Name
Mike        1       0       0     1
John        0       1       1     0
Bob         0       1       0     0
Connie      0       0       0     1
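If the Name column contains duplicates, group by Name and take the max so each name ends up on a single row: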
df = df.groupby('Name').max()
print (df)
        Apple  Banana  Orange  Pear
Name
Bob         0       1       0     0
Connie      0       0       0     1
John        0       1       1     0
Mike        1       0       0     1
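The sample data above has no repeated names, so the effect of this last step is easier to see on a small made-up example. The sketch below uses hypothetical data in which Mike appears on two rows, and it builds the indicator matrix with str.get_dummies rather than MultiLabelBinarizer to keep it free of extra dependencies. The names wide and merged are illustrative only.

```python
import pandas as pd

# Hypothetical data: Mike is listed twice with different products.
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Mike"],
    "Product": ["Apple", "Orange,Banana", "Pear"],
})

# One indicator row per original row, so Mike still appears twice here.
wide = dfb.set_index("Name")["Product"].str.get_dummies(",")

# Merge rows that share a name: max keeps a 1 if any of those rows has it.
merged = wide.groupby(level="Name").max()
print(merged)
```

Note that groupby sorts the group keys by default, which is why the names come out in alphabetical order after this step.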