python – pandas group by a function applied to a column

December 4, 2023

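In the groupby documentation, I only see examples of grouping by a function applied to the axis-0 index or to the column labels. I don't see an example that discusses how to group by labels obtained by applying a function to a column. I assume this would be done with apply. Is the example below the best way to do it?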

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': np.random.choice(['a', 'b', 'c', 'd', 'e'], 20),
                   'num1': np.random.randint(low=30, high=100, size=20),
                   'num2': np.random.randint(low=-3, high=9, size=20)})

df.head()

  name  num1  num2
0    d    34     7
1    b    49     6
2    a    51    -1
3    d    79     8
4    e    72     5

def num1_greater_than_60(number_num1):
    # Because of >=, a value of exactly 60 is labelled 'greater'.
    if number_num1 >= 60:
        return 'greater'
    else:
        return 'less'

df.groupby(df['num1'].apply(num1_greater_than_60))
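
For reference, the call above only builds a GroupBy object. It needs an aggregation before it shows anything. Here is a minimal sketch of the full round trip, assuming a seeded generator (the seed and the name `rng` are arbitrary choices that are not in the question) so the frame is reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is an arbitrary assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name": rng.choice(list("abcde"), 20),
    "num1": rng.integers(30, 100, size=20),
    "num2": rng.integers(-3, 9, size=20),
})

def num1_greater_than_60(number_num1):
    # Same rule as in the question: 60 and above is 'greater'.
    return "greater" if number_num1 >= 60 else "less"

# Apply the function to the column; the resulting Series of labels is the group key.
labels = df["num1"].apply(num1_greater_than_60)

# Count the rows that fall under each label.
counts = df.groupby(labels)["name"].count()
print(counts)
```

Grouping by a Series this way leaves the original index of df untouched, which may be convenient if num1 is still needed as a regular column afterwards.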

Best answer: From the DataFrame.groupby() docs:

by : mapping, function, str, or iterable
    Used to determine the groups for the groupby.
    If ``by`` is a function, it's called on each value of the object's
    index. If a dict or Series is passed, the Series or dict VALUES
    will be used to determine the groups (the Series' values are first
    aligned; see ``.align()`` method). If an ndarray is passed, the
    values are used as-is determine the groups. A str or list of strs
    may be passed to group by the columns in ``self``

So we can do this:

In [35]: df.set_index('num1').groupby(num1_greater_than_60)[['name']].count()
Out[35]:
         name
greater    15
less        5
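
As a rough check, the index-based approach and the apply-based approach from the question should produce the same counts, since both end up with one label per row. The sketch below compares them on a seeded frame (the seed and the variable names are assumptions for illustration) and adds a vectorized variant using np.where, which is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Seeded generator so the comparison is reproducible (arbitrary seed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "name": rng.choice(list("abcde"), 20),
    "num1": rng.integers(30, 100, size=20),
})

def num1_greater_than_60(number_num1):
    return "greater" if number_num1 >= 60 else "less"

# 1) Question's approach: apply the function to the column, group by the Series.
via_series = df.groupby(df["num1"].apply(num1_greater_than_60))["name"].count()

# 2) Answer's approach: move num1 into the index; a function passed as `by`
#    is called on each index value.
via_index = df.set_index("num1").groupby(num1_greater_than_60)["name"].count()

# 3) Vectorized alternative (an addition, not from the answer): build the
#    labels as an ndarray, which groupby uses as-is.
via_where = df.groupby(np.where(df["num1"] >= 60, "greater", "less"))["name"].count()

print(via_series.to_dict())
print(via_index.to_dict())
print(via_where.to_dict())
```

Which one reads best is largely a matter of taste. The np.where form avoids a Python-level function call per row, which may matter on large frames.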