I'm trying to use the sklearn_pandas DataFrameMapper. It takes a list of column names, each paired with the preprocessing transformer to apply to that column, like this:
import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    ('season', sklearn.preprocessing.OneHotEncoder()),
    ('holiday', None)
])
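season is an int64 column in my pandas DataFrame.
This gives me the following error: too many values to unpack.
I know that OneHotEncoder expects 2-D samples rather than 1-D samples.
How can I use OneHotEncoder with sklearn_pandas, or is it not possible?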
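The shape requirement can be seen outside sklearn_pandas. This is a minimal sketch assuming a recent scikit-learn release, where OneHotEncoder rejects a 1-D array but accepts the same values reshaped into a single column. The sample season values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample values for the int64 "season" column
season = np.array([1, 2, 3, 1, 4], dtype=np.int64)

encoder = OneHotEncoder()

# A 1-D array is rejected: scikit-learn expects shape (n_samples, n_features)
try:
    encoder.fit_transform(season)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Reshaping to a column vector of shape (n_samples, 1) works
season_2d = season.reshape(-1, 1)
onehot = encoder.fit_transform(season_2d).toarray()  # output is sparse by default

print(rejected_1d)   # True
print(onehot.shape)  # (5, 4): one column per distinct season value
```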
Best answer: the official release of sklearn-pandas has some problems handling 1-D arrays and transformers. Try the following fork:
https://github.com/dukebody/sklearn-pandas
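However, I think you can achieve what you want by using LabelBinarizer (as in the sklearn_pandas examples) instead of OneHotEncoder.
Update 2015-11-28
In sklearn-pandas >= 0.0.12, you can work around the problem like this: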
import sklearn.preprocessing
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('hour', None),
    (['season'], sklearn.preprocessing.OneHotEncoder()),
    ('holiday', None)
])
From the documentation:
The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one-dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
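The same distinction shows up with plain pandas indexing. This is a sketch of the idea, not sklearn_pandas's internal code: selecting with a string yields a Series (1-D), while selecting with a one-element list yields a DataFrame (2-D), and only the latter is valid OneHotEncoder input. The DataFrame contents are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame with the three columns from the question
df = pd.DataFrame({
    "hour": [0, 5, 13, 22],
    "season": [1, 2, 3, 1],
    "holiday": [0, 0, 1, 0],
})

as_string = df["season"].values  # 'season'   -> 1-D array
as_list = df[["season"]].values  # ['season'] -> 2-D column vector

print(as_string.shape, as_list.shape)  # (4,) (4, 1)

# The column-vector form can be fed straight to OneHotEncoder
onehot = OneHotEncoder().fit_transform(as_list).toarray()
print(onehot.shape)  # (4, 3): seasons 1, 2 and 3
```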