xorbits.datasets.Dataset.export#
- Dataset.export(path: Union[str, os.PathLike], storage_options: Optional[dict] = None, create_if_not_exists: Optional[bool] = True, max_chunk_rows: Optional[int] = None, column_groups: Optional[dict] = None, num_threads: Optional[int] = None, version: Optional[str] = None, overwrite: Optional[bool] = True)[source]#
Export the dataset to storage.
The storage can be local or remote, e.g. local disk or S3.
- Parameters
path (str) – The export path; can be a local path or a remote URL. Please refer to: fsspec
storage_options (dict, optional) – Key/value pairs to be passed on to the caching file-system backend, if any.
create_if_not_exists (bool) – Whether to create the path if it does not exist.
max_chunk_rows (int) – Max rows per chunk file, default is 100.
column_groups (dict) – A dict mapping a group name to a list of column indices or names.
num_threads (int) – The number of threads used per chunk.
version (str) – The version string, default is 0.0.0.
overwrite (bool) – Whether to overwrite the dataset version.
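The column_groups mapping may be the least obvious parameter: each key is a group name, and each value lists the columns in that group, referenced either by positional index or by name. A minimal sketch of such a mapping (the column names here are illustrative, not taken from any real dataset schema):

```python
# Hypothetical dataset with columns ["image", "label", "caption"].
# Columns in a group can be referenced by name or by positional index.
column_groups = {
    "media": ["image"],     # by column name
    "annotations": [1, 2],  # by column index: label, caption
}

# Usage sketch (not executed here):
# ds.export("./export_dir", column_groups=column_groups)
```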
- Returns
A dict of export info.
Examples
Export to local disk.
>>> import xorbits.datasets as xdatasets
>>> ds = xdatasets.from_huggingface("cifar10", split="train")
>>> ds.export("./export_dir")
Export to remote storage.
>>> import xorbits.datasets as xdatasets
>>> storage_options = {"key": aws_access_key_id, "secret": aws_secret_access_key}
>>> ds = xdatasets.from_huggingface("cifar10", split="train")
>>> ds.export("s3://bucket/export_dir", storage_options=storage_options)