xorbits.pandas.crosstab

xorbits.pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins: bool = False, margins_name: Hashable = 'All', dropna: bool = True, normalize: bool = False) → DataFrame

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
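The difference between the default frequency table and an aggregated table can be sketched as follows. This is a minimal illustration that assumes plain pandas is imported as pd (the docstring is copied from pandas); the region, product, and units Series are hypothetical data made up for this example.

```python
import pandas as pd

# Hypothetical data: two factors and one value column to aggregate.
region = pd.Series(["north", "north", "south", "south", "south"], name="region")
product = pd.Series(["apple", "pear", "apple", "apple", "pear"], name="product")
units = pd.Series([10, 4, 7, 3, 5], name="units")

# Default behaviour: count how often each (region, product) pair occurs.
counts = pd.crosstab(region, product)

# With values and aggfunc: sum the units for each (region, product) pair.
totals = pd.crosstab(region, product, values=units, aggfunc="sum")

print(counts)
print(totals)
```

In this sketch the ("south", "apple") cell holds 2 in counts (two occurrences) and 10 in totals (7 + 3).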

Parameters

index (array-like, Series, or list of arrays/Series) – Values to group by in the rows.
columns (array-like, Series, or list of arrays/Series) – Values to group by in the columns.
values (array-like, optional) – Array of values to aggregate according to the factors. Requires aggfunc to be specified.
rownames (sequence, default None) – If passed, its length must match the number of row arrays passed.
colnames (sequence, default None) – If passed, its length must match the number of column arrays passed.
aggfunc (function, optional) – If specified, values must be specified as well.
margins (bool, default False) – Add row/column margins (subtotals).
margins_name (str, default 'All') – Name of the row/column that will contain the totals when margins is True.
dropna (bool, default True) – Do not include columns whose entries are all NaN.
normalize (bool, {'all', 'index', 'columns'}, or {0,1}, default False) –
Normalize by dividing all values by the sum of values.
- If passed ‘all’ or True, will normalize over all values.
- If passed ‘index’, will normalize over each row.
- If passed ‘columns’, will normalize over each column.
- If margins is True, will also normalize margin values.
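
The normalize and margins options above can be sketched as follows. This is a minimal illustration assuming plain pandas imported as pd; the rows and cols Series and the margin name "Total" are hypothetical choices for this example.

```python
import pandas as pd

# Hypothetical factors: counts are a/x=1, a/y=1, b/x=2, b/y=0.
rows = pd.Series(["a", "a", "b", "b"], name="row")
cols = pd.Series(["x", "y", "x", "x"], name="col")

# normalize="all": every cell is divided by the grand total (4).
by_all = pd.crosstab(rows, cols, normalize="all")

# normalize="index": each row is divided by its own row total.
by_row = pd.crosstab(rows, cols, normalize="index")

# normalize="columns": each column is divided by its own column total.
by_col = pd.crosstab(rows, cols, normalize="columns")

# margins=True adds a totals row and column, named by margins_name.
with_margins = pd.crosstab(rows, cols, margins=True, margins_name="Total")

print(by_all, by_row, by_col, with_margins, sep="\n\n")
```

In this sketch each row of by_row sums to 1, each column of by_col sums to 1, and the bottom-right cell of with_margins holds the total number of observations.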

Returns

Cross tabulation of the data.

Return type

DataFrame

See also

DataFrame.pivot: Reshape data based on column values.
pivot_table: Create a pivot table as a DataFrame.

Notes

Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.
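
The naming rule above can be sketched as follows, assuming plain pandas imported as pd. The Series names "letter" and "symbol" and the override names "r" and "c" are hypothetical.

```python
import pandas as pd

# Hypothetical named Series used as the row and column factors.
s1 = pd.Series(["a", "b", "a"], name="letter")
s2 = pd.Series(["x", "x", "y"], name="symbol")

# Without rownames/colnames, the Series names label the axes.
named = pd.crosstab(s1, s2)
print(named.index.name, named.columns.name)

# Explicit rownames/colnames take precedence over the Series names.
renamed = pd.crosstab(s1, s2, rownames=["r"], colnames=["c"])
print(renamed.index.name, renamed.columns.name)
```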

Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.

If the inputs have no overlapping index labels, an empty DataFrame will be returned.
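
A minimal sketch of the non-overlapping case, assuming plain pandas imported as pd. The two Series are hypothetical and deliberately share no index labels. Resetting the index is shown only as one possible way to align the inputs by position.

```python
import pandas as pd

# Hypothetical Series whose index labels do not overlap at all.
s1 = pd.Series([1, 2, 3], index=[1, 2, 3])
s2 = pd.Series([4, 5, 6], index=[4, 5, 6])

# Inputs are aligned on their index labels, so nothing lines up here.
result = pd.crosstab(s1, s2)
print(result.empty)

# Dropping the original labels aligns the two Series by position instead.
aligned = pd.crosstab(s1.reset_index(drop=True), s2.reset_index(drop=True))
print(aligned.shape)
```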

See the user guide for more examples.

Examples

>>> a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = np.array(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2

Here the categories ‘c’ and ‘f’ do not appear in the data, so they are not shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.

>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
>>> pd.crosstab(foo, bar, dropna=False)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0

Warning

This method has not been implemented yet. Xorbits will try to execute it with pandas.

This docstring was copied from pandas.