xorbits.pandas.DataFrame.to_json#

DataFrame.to_json(path_or_buf: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None, orient: Literal['split', 'records', 'index', 'table', 'columns', 'values'] | None = None, date_format: str | None = None, double_precision: int = 10, force_ascii: bool_t = True, date_unit: TimeUnit = 'ms', default_handler: Callable[[Any], JSONSerializable] | None = None, lines: bool_t = False, compression: CompressionOptions = 'infer', index: bool_t | None = None, indent: int | None = None, storage_options: StorageOptions | None = None, mode: Literal['a', 'w'] = 'w') → str | None#

Convert the object to a JSON string.

Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters

path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.
orient (str) –
Indication of expected JSON string format.
- Series:
  - default is ‘index’
  - allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.
- DataFrame:
  - default is ‘columns’
  - allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}.
- The format of the JSON string:
  - ’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
  - ’records’ : list like [{column -> value}, … , {column -> value}]
  - ’index’ : dict like {index -> {column -> value}}
  - ’columns’ : dict like {column -> {index -> value}}
  - ’values’ : just the values array
  - ’table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
  Describing the data, where data component is like orient='records'.
date_format ({None, 'epoch', 'iso'}) – Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.
double_precision (int, default 10) – The number of decimal places to use when encoding floating point values. The possible maximal value is 15. Passing double_precision greater than 15 will raise a ValueError.
force_ascii (bool, default True) – Force encoded string to be ASCII.
date_unit (str, default 'ms' (milliseconds)) – The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
default_handler (callable, default None) – Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
lines (bool, default False) – If ‘orient’ is ‘records’ write out line-delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list-like.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

New in version 1.5.0(pandas): Added support for .tar files.

Changed in version 1.4.0: Zstandard support.(pandas)
index (bool or None, default None) – The index is only used when ‘orient’ is ‘split’, ‘index’, ‘column’, or ‘table’. Of these, ‘index’ and ‘column’ do not support index=False.
indent (int, optional) – Length of whitespace used to indent each record.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

New in version 1.2.0(pandas).
mode (str, default 'w' (writing)) – Specify the IO mode for output when supplying a path_or_buf. Accepted args are ‘w’ (writing) and ‘a’ (append) only. mode=’a’ is only supported when lines is True and orient is ‘records’.

Returns

If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.

Return type

None or str

See also

read_json: Convert a JSON string to pandas object.

Notes

The behavior of indent=0 varies from the stdlib, which does not indent the output but does insert newlines. Currently, indent=0 and the default indent=None are equivalent in pandas, though this may change in a future release.

orient='table' contains a ‘pandas_version’ field under ‘schema’. This stores the version of pandas used in the latest revision of the schema.

Examples

>>> from json import loads, dumps  
>>> df = pd.DataFrame(  
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )

>>> result = df.to_json(orient="split")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}

Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> result = df.to_json(orient="records")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]

Encoding/decoding a Dataframe using 'index' formatted JSON:

>>> result = df.to_json(orient="index")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}

Encoding/decoding a Dataframe using 'columns' formatted JSON:

>>> result = df.to_json(orient="columns")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}

Encoding/decoding a Dataframe using 'values' formatted JSON:

>>> result = df.to_json(orient="values")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]

Encoding with Table Schema:

>>> result = df.to_json(orient="table")  
>>> parsed = loads(result)  
>>> dumps(parsed, indent=4)  
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}

Warning

This method has not been implemented yet. Xorbits will try to execute it with pandas.

This docstring was copied from pandas.core.frame.DataFrame.