xorbits.pandas.DataFrame

class xorbits.pandas.DataFrame(*args, **kwargs)

Two-dimensional, size-mutable, potentially heterogeneous tabular data.

The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary pandas data structure.
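The label alignment described above can be illustrated with a minimal sketch. It uses plain pandas as a stand-in, on the assumption that xorbits.pandas mirrors this behavior; the frame names `left` and `right` are hypothetical.

```python
import pandas as pd  # assumption: plain pandas stands in for xorbits.pandas

# Two frames that overlap in only one row label ("y") and one column label ("b").
left = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["x", "y"])
right = pd.DataFrame({"b": [100, 200], "c": [5, 6]}, index=["y", "z"])

# Addition matches cells by (row label, column label), not by position.
total = left + right

# The result spans the union of labels; cells missing from either operand are NaN.
print(total)
```

Only the cell at row "y", column "b" is present in both operands, so it is the only cell in the result that holds a number.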

Parameters
  • data (ndarray (structured or homogeneous), Iterable, dict, or DataFrame (Not supported yet)) –

    Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order. If a dict contains Series that have an index defined, they are aligned by that index. The same alignment occurs if data is itself a Series or a DataFrame.

    If data is a list of dicts, column order follows insertion-order.

  • index (Index or array-like (Not supported yet)) – Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.

  • columns (Index or array-like (Not supported yet)) – Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.

  • dtype (dtype, default None) – Data type to force. Only a single dtype is allowed. If None, infer.

  • copy (bool or None, default None (Not supported yet)) –

    Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2d ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False will ensure that these inputs are not copied.

    Changed in version 1.3.0 (pandas).
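The two roles of the columns parameter, and the default index, can be sketched as follows. This is an illustration in plain pandas, assumed to behave like xorbits.pandas here; note that the parameter list above marks index and columns as not yet supported in xorbits, so the sketch may not carry over directly. The variable names are hypothetical.

```python
import pandas as pd  # assumption: plain pandas stands in for xorbits.pandas

records = {"name": ["ann", "bob"], "age": [31, 45], "city": ["Oslo", "Lima"]}

# The dict already carries column labels, so `columns` selects and reorders them.
subset = pd.DataFrame(records, columns=["city", "name"])

# No index was passed and the data has none, so a RangeIndex is used.
default_index = subset.index

# A list of lists has no column labels, so `columns` assigns them instead.
named = pd.DataFrame([[1, 2], [3, 4]], columns=["left", "right"], index=["r0", "r1"])
```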

See also

DataFrame.from_records

Constructor from tuples, also record arrays.

DataFrame.from_dict

From dicts of Series, arrays, or dicts.

read_csv

Read a comma-separated values (csv) file into DataFrame.

read_table

Read general delimited file into DataFrame.

read_clipboard

Read text from clipboard into DataFrame.
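As a rough sketch of the two alternative constructors listed above, the following uses plain pandas; whether xorbits.pandas supports the same arguments is an assumption. The variable names are hypothetical.

```python
import pandas as pd  # assumption: plain pandas stands in for xorbits.pandas

# from_dict with orient="index" treats each dict key as a row label.
by_row = pd.DataFrame.from_dict(
    {"r0": [1, 2], "r1": [3, 4]}, orient="index", columns=["a", "b"]
)

# from_records builds one row per tuple; `columns` supplies the labels.
from_tuples = pd.DataFrame.from_records(
    [(1, "x"), (2, "y")], columns=["num", "char"]
)
```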

Notes

See the User Guide for more information.

Examples

Constructing DataFrame from a dictionary:

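>>> import numpy as np  # assumed imports; the copied docstring omits them
>>> import xorbits.pandas as pd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}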
>>> df = pd.DataFrame(data=d)  
>>> df  
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes  
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)  
>>> df.dtypes  
col1    int8
col2    int8
dtype: object

Constructing DataFrame from a dictionary including Series:

>>> d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}  
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])  
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),  
...                    columns=['a', 'b', 'c'])
>>> df2  
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],  
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
>>> df3  
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass  
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])  
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])  
   x  y
0  0  0
1  0  3
2  2  3

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])  
>>> df = pd.DataFrame(data=ser, index=["a", "c"])  
>>> df  
   0
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])  
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])  
>>> df2  
   x
a  1
c  3

This docstring was copied from pandas.

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

Attributes

at

Access a single value for a row/column label pair.

iat

Access a single value for a row/column pair by integer position.

iloc

Purely integer-location based indexing for selection by position.

loc

Access a group of rows and columns by label(s) or a boolean array.

data
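The four indexers listed above (at, iat, iloc, loc) can be compared side by side in a short sketch. It uses plain pandas as a stand-in; xorbits.pandas is assumed to expose the same accessors, though it may evaluate them lazily. The variable names are hypothetical.

```python
import pandas as pd  # assumption: plain pandas stands in for xorbits.pandas

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]}, index=["a", "b", "c"])

# at: one value, addressed by row label and column label.
single_by_label = df.at["b", "col2"]

# iat: the same cell, addressed by integer position.
single_by_position = df.iat[1, 1]

# iloc: position-based selection; here, the first two rows.
rows_by_position = df.iloc[0:2]

# loc: label-based selection; also accepts a boolean mask for rows.
rows_by_mask = df.loc[df["col1"] > 1, ["col2"]]
```

at and iat are meant for single values, while loc and iloc return a sub-frame when given slices, lists, or masks.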