xorbits.pandas.DataFrame.to_orc

DataFrame.to_orc(path: FilePath | WriteBuffer[bytes] | None = None, *, engine: Literal['pyarrow'] = 'pyarrow', index: bool | None = None, engine_kwargs: dict[str, Any] | None = None) → bytes | None

Write a DataFrame to the ORC format.

New in version 1.5.0 (pandas).

Parameters
  • path (str, file-like object or None, default None) – If a string, it is used as the path of the ORC file to write. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via the builtin open function). If path is None, a bytes object is returned.

  • engine ({'pyarrow'}, default 'pyarrow') – ORC library to use. Pyarrow must be >= 7.0.0.

  • index (bool, optional) – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the behavior is similar to infer: the dataframe’s index(es) will be saved. However, instead of being saved as values, a RangeIndex will be stored as a range in the metadata, which takes little space and is faster. Other indexes will be included as columns in the file output.

  • engine_kwargs (dict[str, Any] or None, default None) – Additional keyword arguments passed to pyarrow.orc.write_table().

Return type

bytes if no path argument is provided, else None

Raises
  • NotImplementedError – Dtype of one or more columns is category, unsigned integers, interval, period or sparse.

  • ValueError – engine is not pyarrow.
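
One way to avoid the NotImplementedError is to check dtypes before writing. The helper below, `orc_unsupported_columns`, is hypothetical: it mirrors the dtype list documented above rather than the check pandas performs internally, and the conversion at the end is just one possible choice.

```python
import pandas as pd


def orc_unsupported_columns(df: pd.DataFrame) -> list:
    """Return labels of columns whose dtype is in the documented unsupported list.

    Hypothetical helper: it mirrors the list in this docstring,
    not the check pandas runs internally.
    """
    extension_types = (
        pd.CategoricalDtype,
        pd.IntervalDtype,
        pd.PeriodDtype,
        pd.SparseDtype,
    )
    return [
        label
        for label, dtype in df.dtypes.items()
        if isinstance(dtype, extension_types)
        or pd.api.types.is_unsigned_integer_dtype(dtype)
    ]


df = pd.DataFrame(
    {
        "ok": [1, 2],
        "cat": pd.Categorical(["a", "b"]),
        "uint": pd.Series([1, 2], dtype="uint8"),
    }
)
flagged = orc_unsupported_columns(df)  # columns to convert before calling to_orc

# One possible conversion: categories to plain strings, unsigned to signed integers.
converted = df.astype({"cat": str, "uint": "int64"})
```

Converting to strings or signed integers loses the original dtype, so it would need to be restored manually after reading the file back.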

See also

read_orc

Read an ORC file.

DataFrame.to_parquet

Write a parquet file.

DataFrame.to_csv

Write a csv file.

DataFrame.to_sql

Write to a sql table.

DataFrame.to_hdf

Write to hdf.

Notes

  • Before using this function you should read the user guide about ORC and install optional dependencies.

  • This function requires pyarrow library.

  • For supported dtypes please refer to supported ORC features in Arrow.

  • Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.
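
Because timezones are not preserved, one option is to normalize timezone-aware columns yourself before writing. The sketch below is a possible workaround, not part of the pandas or Xorbits API contract; the column name `ts` and the sample timestamp are illustrative.

```python
import pandas as pd

# A timezone-aware column with a fixed +01:00 offset.
df = pd.DataFrame({"ts": pd.to_datetime(["2024-01-01 12:00:00+01:00"])})

# Convert to UTC, then drop the timezone so the stored wall time is unambiguous.
naive_utc = df.assign(ts=df["ts"].dt.tz_convert("UTC").dt.tz_localize(None))
```

After reading the file back, the column can be made timezone-aware again with `dt.tz_localize("UTC")`, provided the convention of storing UTC is recorded somewhere outside the ORC file.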

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})  
>>> df.to_orc('df.orc')  
>>> pd.read_orc('df.orc')  
   col1  col2
0     1     4
1     2     3

If you want a buffer holding the ORC content, omit path and wrap the returned bytes in io.BytesIO:

>>> import io  
>>> b = io.BytesIO(df.to_orc())  
>>> b.seek(0)  
0
>>> content = b.read()  

Warning

This method has not been implemented yet. Xorbits will try to execute it with pandas.

This docstring was copied from pandas.core.frame.DataFrame.