xorbits.pandas.from_dummies

xorbits.pandas.from_dummies(data: DataFrame, sep: None | str = None, default_category: None | Hashable | dict[str, Hashable] = None) → DataFrame

Create a categorical DataFrame from a DataFrame of dummy variables.

Inverts the operation performed by get_dummies().
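As a rough illustration of the inversion described above, the sketch below round-trips a small frame through get_dummies() and from_dummies(). It uses plain pandas, on the assumption that xorbits.pandas behaves the same way for this function; the frame and the column name `color` are made up for the example.

```python
import pandas as pd

# Hypothetical input: one categorical-like column.
original = pd.DataFrame({"color": ["red", "green", "blue", "red"]})

# Encode: produces columns color_blue, color_green, color_red.
dummies = pd.get_dummies(original, prefix_sep="_")

# Decode: the part before "_" becomes the column name,
# the part after it becomes the category value.
restored = pd.from_dummies(dummies, sep="_")

print(restored)
```

Note that the decoded column holds plain labels rather than a categorical dtype, so the round trip recovers the values but not necessarily the original dtype.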

New in version 1.5.0 (pandas).

Parameters

data (DataFrame) – Data which contains dummy-coded variables in the form of integer columns of 1’s and 0’s.
sep (str, default None) – Separator used in the column names of the dummy categories, i.e. the character that separates the category names from the prefixes. For example, if your column names are 'prefix_A' and 'prefix_B', you can strip the underscore by specifying sep='_'.
default_category (None, Hashable or dict of Hashables, default None) – The default category is the implied category when a value has none of the listed categories specified with a one, i.e. if all dummies in a row are zero. Can be a single value applied to all variables or a dict mapping each variable prefix to its default category.
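The single-value form of default_category can be sketched as follows. This is a minimal example using plain pandas (assumed to match xorbits.pandas here); the column names and the label "unknown" are hypothetical.

```python
import pandas as pd

# Hypothetical dummy-coded data with two prefixes, "size" and "shape".
# Row 1 has no shape assigned; row 2 has no size assigned.
dummies = pd.DataFrame(
    {
        "size_S": [1, 0, 0],
        "size_M": [0, 1, 0],
        "shape_round": [1, 0, 0],
        "shape_square": [0, 0, 1],
    }
)

# A single hashable value is used as the default for every prefix.
decoded = pd.from_dummies(dummies, sep="_", default_category="unknown")

print(decoded)
```

Passing a dict instead lets each prefix have its own fallback label, at the cost of having to list every prefix.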

Returns

Categorical data decoded from the dummy input-data.

Return type

DataFrame

Raises

ValueError –
- When the input DataFrame data contains NA values.
- When the input DataFrame data contains column names with separators that do not match the separator specified with sep.
- When a dict passed to default_category does not include an implied category for each prefix.
- When a value in data has more than one category assigned to it.
- When default_category=None and a value in data has no category assigned to it.
TypeError –
- When the input data is not of type DataFrame.
- When the input DataFrame data contains non-dummy data.
- When the passed sep is of a wrong data type.
- When the passed default_category is of a wrong data type.
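A few of the error conditions listed above can be triggered with small inputs. The sketch below uses plain pandas (assumed to match xorbits.pandas for this function); `raised_error` is a hypothetical helper written for this example, not part of either library.

```python
import pandas as pd


def raised_error(data, **kwargs):
    """Hypothetical helper: name of the exception from_dummies raises, or None."""
    try:
        pd.from_dummies(data, **kwargs)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None


# Row 0 has two categories set to 1.
multi_assigned = raised_error(pd.DataFrame({"a": [1, 0], "b": [1, 1]}))

# Row 1 has no category and no default_category is given.
unassigned = raised_error(pd.DataFrame({"a": [1, 0], "b": [0, 0]}))

# The second column name does not contain the separator "_".
bad_separator = raised_error(
    pd.DataFrame({"col1_a": [1, 0], "col1-b": [0, 1]}), sep="_"
)

# The input contains a missing value.
has_na = raised_error(pd.DataFrame({"a": [1, None], "b": [0, 1]}))

# The input is a list, not a DataFrame.
not_a_frame = raised_error([1, 0])

print(multi_assigned, unassigned, bad_separator, has_na, not_a_frame)
```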

See also

get_dummies(): Convert Series or DataFrame to dummy codes.
Categorical: Represent a categorical variable in classic R / S-plus fashion.

Notes

The columns of the passed dummy data should only include 1’s and 0’s, or boolean values.
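To illustrate the note above, boolean dummy columns decode the same way as 1/0 columns. This is a minimal sketch with plain pandas (assumed to match xorbits.pandas here) and made-up column names.

```python
import pandas as pd

# Hypothetical dummy data encoded with booleans instead of 1's and 0's.
bool_dummies = pd.DataFrame(
    {
        "a": [True, False, False],
        "b": [False, True, False],
        "c": [False, False, True],
    }
)

# Without sep, the whole column name is treated as the category label
# and the result has a single column.
decoded = pd.from_dummies(bool_dummies)

labels = decoded.iloc[:, 0].tolist()
print(labels)
```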

Examples

>>> import xorbits.pandas as pd
>>> df = pd.DataFrame({"a": [1, 0, 0, 1], "b": [0, 1, 0, 0],
...                    "c": [0, 0, 1, 0]})

>>> pd.from_dummies(df)
0     a
1     b
2     c
3     a

>>> df = pd.DataFrame({"col1_a": [1, 0, 1], "col1_b": [0, 1, 0],  
...                    "col2_a": [0, 1, 0], "col2_b": [1, 0, 0],
...                    "col2_c": [0, 0, 1]})

>>> df  
   col1_a  col1_b  col2_a  col2_b  col2_c
0       1       0       0       1       0
1       0       1       1       0       0
2       1       0       0       0       1

>>> pd.from_dummies(df, sep="_")  
    col1    col2
0    a       b
1    b       a
2    a       c

>>> df = pd.DataFrame({"col1_a": [1, 0, 0], "col1_b": [0, 1, 0],  
...                    "col2_a": [0, 1, 0], "col2_b": [1, 0, 0],
...                    "col2_c": [0, 0, 0]})

>>> df  
   col1_a  col1_b  col2_a  col2_b  col2_c
0       1       0       0       1       0
1       0       1       1       0       0
2       0       0       0       0       0

>>> pd.from_dummies(df, sep="_", default_category={"col1": "d", "col2": "e"})  
    col1    col2
0    a       b
1    b       a
2    d       e

Warning

This method has not been implemented yet. Xorbits will try to execute it with pandas.

This docstring was copied from pandas.