csv

Multi-threaded CSV writer, much faster than pandas.DataFrame.to_csv(), with full support for dask and dask distributed.

xarray_extras.csv.to_csv(x: DataArray, path: str | Path, *, nogil: bool = True, **kwargs: Any) → Any

Print DataArray to CSV.

When x has a numpy backend, this function is functionally equivalent to (but much faster than):

x.to_pandas().to_csv(path, **kwargs)

When x has a dask backend, this function returns a dask delayed object, which writes to disk only when its .compute() method is invoked.

Formatting and optional compression are parallelised across all available CPUs, using one dask task per chunk on the first dimension. Chunks on other dimensions will be merged ahead of computation.
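A minimal usage sketch for both backends, assuming xarray, dask and xarray_extras are installed. The function name write_examples, the array contents, the dimension names, the chunk size and the file names are illustrative only. The imports are kept inside the function so that defining it does not require the optional dependencies.

```python
def write_examples(outdir):
    """Write the same small 2D array with a numpy and a dask backend."""
    # Local imports: xarray, dask and xarray_extras are assumed to be installed
    from pathlib import Path

    import numpy as np
    import xarray as xr
    from xarray_extras.csv import to_csv

    outdir = Path(outdir)

    # Illustrative 2D array: float values, arbitrary coords
    x = xr.DataArray(
        np.arange(12, dtype=float).reshape(4, 3),
        dims=["row", "col"],
        coords={"row": [10, 20, 30, 40], "col": ["a", "b", "c"]},
    )

    # numpy backend: the file is written immediately
    to_csv(x, outdir / "eager.csv")

    # dask backend: chunk along the first dimension; nothing is written yet
    delayed = to_csv(x.chunk({"row": 2}), outdir / "lazy.csv")

    # The file is written only when .compute() is invoked
    delayed.compute()
```

With two chunks on the first dimension, the dask variant would format the file in two tasks, one per chunk.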

Parameters:
  • x – DataArray with one or two dimensions

  • path – Output file path

  • nogil (bool) – If True, use the accelerated C implementation; several kwargs are not processed correctly (see Limitations below). If False, use the pandas to_csv method (slow, and does not release the GIL). nogil=True supports only float and integer dtypes for the values (the coords can be anything). If the dtype is incompatible, nogil is automatically switched to False.

  • kwargs – Passed verbatim to pandas.DataFrame.to_csv() or pandas.Series.to_csv()

Limitations

  • Fancy URIs are not (yet) supported.

  • compression='zip' is not supported. All other compression methods (gzip, bz2, xz) are supported.

  • When running with nogil=True, the following parameters are ignored: columns, quoting, quotechar, doublequote, escapechar, chunksize, decimal
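Because the parameters listed above are ignored rather than rejected, it can help to check for them before calling to_csv. The sketch below is a hypothetical helper, not part of xarray_extras; the names IGNORED_WITH_NOGIL and ignored_kwargs are made up for illustration, and the set simply mirrors the list above.

```python
# Parameters that the limitations above list as ignored when nogil=True
IGNORED_WITH_NOGIL = frozenset(
    {
        "columns",
        "quoting",
        "quotechar",
        "doublequote",
        "escapechar",
        "chunksize",
        "decimal",
    }
)


def ignored_kwargs(kwargs, nogil=True):
    """Return the sorted names of kwargs that would be ignored.

    Hypothetical helper, not part of xarray_extras.
    """
    if not nogil:
        # With nogil=False all kwargs are passed to pandas
        return []
    return sorted(IGNORED_WITH_NOGIL.intersection(kwargs))


print(ignored_kwargs({"sep": ";", "decimal": ",", "quoting": 1}))
# prints ['decimal', 'quoting']
```

If the returned list is not empty, one option is to pass nogil=False explicitly and accept the slower pandas code path.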

Distributed computing

This function supports dask distributed, with the caveat that all workers must write to the same shared mountpoint and that the shared filesystem must strictly guarantee close-open coherency. This means that one must be able to call write() and then close() on a file descriptor from one host, then immediately open() the file from another host and see the output from the first host. Note that, for performance reasons, most network filesystems do not enable this feature by default.

Alternatively, one may write to local mountpoints and then manually collect and concatenate the partial outputs.
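The manual concatenation step could look roughly like the sketch below. It uses only the standard library; concatenate_parts is a hypothetical helper, not part of xarray_extras. It assumes that the partial files are uncompressed, that only the first one carries the header row, that each ends with a newline, and that the caller passes them in the intended row order. How the partial files are produced and named is left open here.

```python
import shutil
from pathlib import Path


def concatenate_parts(part_paths, out_path):
    """Append the partial files to out_path in the given order.

    Hypothetical helper, not part of xarray_extras. It assumes that the
    parts are uncompressed, that only the first part carries the header,
    and that every part ends with a newline.
    """
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream in blocks so large parts are never fully loaded in memory
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

Copying bytes rather than parsing rows keeps the step cheap and leaves the formatting produced by the workers untouched.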