xorbits.pandas.DataFrame.to_hdf#

DataFrame.to_hdf(path_or_buf: FilePath | HDFStore, key: str, mode: Literal['a', 'w', 'r+'] = 'a', complevel: int | None = None, complib: Literal['zlib', 'lzo', 'bzip2', 'blosc'] | None = None, append: bool_t = False, format: Literal['fixed', 'table'] | None = None, index: bool_t = True, min_itemsize: int | dict[str, int] | None = None, nan_rep=None, dropna: bool_t | None = None, data_columns: Literal[True] | list[str] | None = None, errors: OpenFileErrors = 'strict', encoding: str = 'UTF-8') None#

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

In order to add another DataFrame or Series to an existing HDF file please use append mode and a different a key.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

For more information see the user guide.

Parameters
  • path_or_buf (str or pandas.HDFStore) – File path or HDFStore object.

  • key (str) – Identifier for the group in the store.

  • mode ({'a', 'w', 'r+'}, default 'a') –

    Mode to open file:

    • ’w’: write, a new file is created (an existing file with the same name would be deleted).

    • ’a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

    • ’r+’: similar to ‘a’, but the file must already exist.

  • complevel ({0-9}, default None) – Specifies a compression level for data. A value of 0 or None disables compression.

  • complib ({'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib') – Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

  • append (bool, default False) – For Table formats, append the input data to the existing.

  • format ({'fixed', 'table', None}, default 'fixed') –

    Possible values:

    • ’fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

    • ’table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

    • If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.

  • index (bool, default True) – Write DataFrame index as a column.

  • min_itemsize (dict or int, optional) – Map column names to minimum string sizes for columns.

  • nan_rep (Any, optional) – How to represent null values as str. Not allowed with append=True.

  • dropna (bool, default False, optional) – Remove missing values.

  • data_columns (list of columns or True, optional) – List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns. for more information. Applicable only to format=’table’.

  • errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

  • encoding (str, default "UTF-8") –

See also

read_hdf

Read from HDF file.

DataFrame.to_orc

Write a DataFrame to the binary orc format.

DataFrame.to_parquet

Write a DataFrame to the binary parquet format.

DataFrame.to_sql

Write to a SQL table.

DataFrame.to_feather

Write out feather-format for DataFrames.

DataFrame.to_csv

Write out to a csv file.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},  
...                   index=['a', 'b', 'c'])  
>>> df.to_hdf('data.h5', key='df', mode='w')  

We can add another object to the same file:

>>> s = pd.Series([1, 2, 3, 4])  
>>> s.to_hdf('data.h5', key='s')  

Reading from HDF file:

>>> pd.read_hdf('data.h5', 'df')  
A  B
a  1  4
b  2  5
c  3  6
>>> pd.read_hdf('data.h5', 's')  
0    1
1    2
2    3
3    4
dtype: int64

Warning

This method has not been implemented yet. Xorbits will try to execute it with pandas.

This docstring was copied from pandas.core.frame.DataFrame.