xorbits.pandas.from_dummies#
- xorbits.pandas.from_dummies(data: DataFrame, sep: None | str = None, default_category: None | Hashable | dict[str, Hashable] = None) DataFrame [source]#
Create a categorical
DataFrame
from aDataFrame
of dummy variables.Inverts the operation performed by
get_dummies()
.New in version 1.5.0(pandas).
- Parameters
data (DataFrame) – Data which contains dummy-coded variables in form of integer columns of 1’s and 0’s.
sep (str, default None) – Separator used in the column names of the dummy categories they are character indicating the separation of the categorical names from the prefixes. For example, if your column names are ‘prefix_A’ and ‘prefix_B’, you can strip the underscore by specifying sep=’_’.
default_category (None, Hashable or dict of Hashables, default None) – The default category is the implied category when a value has none of the listed categories specified with a one, i.e. if all dummies in a row are zero. Can be a single value for all variables or a dict directly mapping the default categories to a prefix of a variable.
- Returns
Categorical data decoded from the dummy input-data.
- Return type
- Raises
ValueError –
When the input
DataFrame
data
contains NA values. * When the inputDataFrame
data
contains column names with separators that do not match the separator specified withsep
. * When adict
passed todefault_category
does not include an implied category for each prefix. * When a value indata
has more than one category assigned to it. * Whendefault_category=None
and a value indata
has no category assigned to it.
TypeError –
When the input
data
is not of typeDataFrame
. * When the inputDataFrame
data
contains non-dummy data. * When the passedsep
is of a wrong data type. * When the passeddefault_category
is of a wrong data type.
See also
get_dummies()
Convert
Series
orDataFrame
to dummy codes.Categorical
Represent a categorical variable in classic.
Notes
The columns of the passed dummy data should only include 1’s and 0’s, or boolean values.
Examples
>>> df = pd.DataFrame({"a": [1, 0, 0, 1], "b": [0, 1, 0, 0], ... "c": [0, 0, 1, 0]})
>>> df a b c 0 1 0 0 1 0 1 0 2 0 0 1 3 1 0 0
>>> pd.from_dummies(df) 0 a 1 b 2 c 3 a
>>> df = pd.DataFrame({"col1_a": [1, 0, 1], "col1_b": [0, 1, 0], ... "col2_a": [0, 1, 0], "col2_b": [1, 0, 0], ... "col2_c": [0, 0, 1]})
>>> df col1_a col1_b col2_a col2_b col2_c 0 1 0 0 1 0 1 0 1 1 0 0 2 1 0 0 0 1
>>> pd.from_dummies(df, sep="_") col1 col2 0 a b 1 b a 2 a c
>>> df = pd.DataFrame({"col1_a": [1, 0, 0], "col1_b": [0, 1, 0], ... "col2_a": [0, 1, 0], "col2_b": [1, 0, 0], ... "col2_c": [0, 0, 0]})
>>> df col1_a col1_b col2_a col2_b col2_c 0 1 0 0 1 0 1 0 1 1 0 0 2 0 0 0 0 0
>>> pd.from_dummies(df, sep="_", default_category={"col1": "d", "col2": "e"}) col1 col2 0 a b 1 b a 2 d e
Warning
This method has not been implemented yet. Xorbits will try to execute it with pandas.
This docstring was copied from pandas.