xorbits.pandas.groupby.DataFrameGroupBy.apply#

DataFrameGroupBy.apply(func, *args, output_type=None, dtypes=None, dtype=None, name=None, index=None, skip_infer=None, **kwargs)#

Apply function func group-wise and combine the results together.

The function passed to apply must take a dataframe as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. Pandas offers a wide range of method that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.

Parameters

func (callable) – A callable that takes a dataframe as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments.
args (tuple and dict) – Optional positional and keyword arguments to pass to func.
kwargs (tuple and dict) – Optional positional and keyword arguments to pass to func.

Return type

Series or DataFrame

See also

pipe: Apply function to the full GroupBy object instead of to each group.
aggregate: Apply aggregate function to the GroupBy object.
transform: Apply function column-by-column to the GroupBy object.
Series.apply: Apply a function to a Series.
DataFrame.apply: Apply a function to each row or column of a DataFrame.

Notes

Changed in version 1.3.0(pandas): The resulting dtype will reflect the return value of the passed func, see the examples below.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

>>> df = pd.DataFrame({'A': 'a a b'.split(),  
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g1 = df.groupby('A', group_keys=False)  
>>> g2 = df.groupby('A', group_keys=True)  

Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:

Example 1: below the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:

>>> g1[['B', 'C']].apply(lambda x: x / x.sum())  
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:

>>> g2[['B', 'C']].apply(lambda x: x / x.sum())  
            B    C
A
a 0  0.333333  0.4
  1  0.666667  0.6
b 2  1.000000  1.0

Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.

Changed in version 1.3.0(pandas): The resulting dtype will reflect the return value of the passed func.

>>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())  
     B    C
A
a  1.0  2.0
b  0.0  0.0

>>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())  
     B    C
A
a  1.0  2.0
b  0.0  0.0

The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.

Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:

>>> g1.apply(lambda x: x.C.max() - x.B.min())  
A
a    5
b    2
dtype: int64
Extra Parameters
----------------
output_type : {'dataframe', 'series'}, default None
    Specify type of returned object. See `Notes` for more details.
dtypes : Series, default None
    Specify dtypes of returned DataFrames. See `Notes` for more details.
dtype : numpy.dtype, default None
    Specify dtype of returned Series. See `Notes` for more details.
name : str, default None
    Specify name of returned Series. See `Notes` for more details.
index : Index, default None
    Specify index of returned object. See `Notes` for more details.
skip_infer: bool, default False
    Whether infer dtypes when dtypes or output_type is not specified.

This docstring was copied from pandas.core.groupby.generic.DataFrameGroupBy.