Save NumPy Arrays to CSV Files

This post explains how to write NumPy arrays to CSV files.

We will look at:

  • the syntax for writing different NumPy arrays to CSV
  • the limitations of writing NumPy arrays to CSV
  • alternative ways to save NumPy arrays

Let’s get to it.

Writing NumPy Arrays to CSV

You can use the np.savetxt() function to save your NumPy array to a CSV file.

Make sure to:

  • add “.csv” to the filename destination, and
  • set the delimiter keyword to “,”

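If you skip these settings, NumPy still writes a plain-text file, but it separates values with spaces (the default delimiter) rather than commas, so the result is not a CSV. More on that later.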

CSV files can be great because they are human-readable. They also have the added benefit of being easy to load into pandas or Dask DataFrames.

Write a one-dimensional array

Let’s create a small one-dimensional array using np.array() and save it to CSV.

import numpy as np

# create 1D array
a = np.array([1,2,3])

# store in current directory
np.savetxt(
    "a.csv", 
    a, 
    delimiter=","
)

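NumPy will write a 1D array column-wise by default, one value per line, and it formats every value in scientific notation (the default fmt is "%.18e"). Let’s inspect the contents of a.csv to confirm:

1.000000000000000000e+00
2.000000000000000000e+00
3.000000000000000000e+00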
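
np.savetxt() formats every value with fmt="%.18e" unless told otherwise, so even integers end up in scientific notation. Here is a minimal sketch of writing plain integers by passing fmt="%d". The temporary directory and the file name a_int.csv are placeholders chosen for illustration.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])

# write into a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a_int.csv")  # placeholder file name

    # fmt="%d" writes each value as a plain integer
    np.savetxt(path, a, delimiter=",", fmt="%d")

    with open(path) as f:
        contents = f.read()

print(contents)
```

With this format string, the file holds 1, 2 and 3 on separate lines.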

To write the data row-wise instead, set the newline kwarg to “,” (your delimiter).

# write array row-wise
np.savetxt(
    "very.csv", 
    a, 
    delimiter=",", 
    newline=","
)

You can also write the array row-wise by converting it to a 2D array first:

np.savetxt(
    "very.csv", 
    [a], 
    delimiter=","
)

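In both cases, very.csv will contain the three values on a single comma-separated line (the newline="," variant also ends with a trailing comma instead of a line break):

1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00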
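
The two row-wise approaches are not byte-for-byte identical, which can matter if another tool parses the file strictly. The sketch below compares them, assuming fmt="%d" for readability; the file names are placeholders.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])

with tempfile.TemporaryDirectory() as tmp:
    path_newline = os.path.join(tmp, "row_newline.csv")  # placeholder names
    path_2d = os.path.join(tmp, "row_2d.csv")

    # approach 1: replace the line break with the delimiter
    np.savetxt(path_newline, a, delimiter=",", newline=",", fmt="%d")

    # approach 2: wrap the 1D array so it becomes a single 2D row
    np.savetxt(path_2d, [a], delimiter=",", fmt="%d")

    with open(path_newline) as f:
        via_newline = f.read()
    with open(path_2d) as f:
        via_2d = f.read()

print(repr(via_newline))  # '1,2,3,'
print(repr(via_2d))       # '1,2,3\n'
```

The 2D approach produces a conventional CSV row, so it is probably the safer default.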

Write a two-dimensional array

Let’s now create a two-dimensional NumPy array and save it to CSV.

# create 2D array (note the outer brackets: one list of rows)
b = np.array([[1, 2, 3], [4, 5, 6]])

# write 2D array to CSV
np.savetxt(
    "merry.csv", 
    b, 
    delimiter=","
)

2D arrays get written row-wise by default, as you would expect.

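The contents of merry.csv (values are written in the default "%.18e" format):

1.000000000000000000e+00,2.000000000000000000e+00,3.000000000000000000e+00
4.000000000000000000e+00,5.000000000000000000e+00,6.000000000000000000e+00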
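
A CSV is often more useful with a header row, and it is worth checking that the file reads back correctly. This is a minimal round-trip sketch using the header and comments arguments of np.savetxt() together with np.loadtxt(). The column names x, y, z and the file name are made up for illustration.

```python
import os
import tempfile

import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merry_with_header.csv")  # placeholder file name

    # comments="" stops savetxt from prefixing the header line with "# "
    np.savetxt(path, b, delimiter=",", fmt="%d", header="x,y,z", comments="")

    # skiprows=1 skips the header line when reading the data back
    loaded = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(loaded)
```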

Write a three-dimensional array

Finally, let’s create a 3-dimensional NumPy array and try to save it to CSV.

# create 3D array
c = np.random.rand(3,3,3)

# write 3D array to CSV
np.savetxt(
    "christmas.csv", 
    c, 
    delimiter=","
)

This doesn’t work. You will see an error message like this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/ky/bqjn_gxn1xv0cn_8q5xvp3q40000gn/T/ipykernel_46872/1804416501.py in <module>
----> 1 np.savetxt(f"{home}/Documents/numpy/bar.csv", c, delimiter=",")

<__array_function__ internals> in savetxt(*args, **kwargs)

~/mambaforge/envs/numpy-zarr/lib/python3.9/site-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
   1380         # Handle 1-dimensional arrays
   1381         if X.ndim == 0 or X.ndim > 2:
-> 1382             raise ValueError(
   1383                 "Expected 1D or 2D array, got %dD array instead" % X.ndim)
   1384         elif X.ndim == 1:

ValueError: Expected 1D or 2D array, got 3D array instead

CSV is a human-readable, tabular format. This means only 1D and 2D NumPy arrays can be written to CSV.
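
If you really need a 3D array in CSV, one common workaround (not something np.savetxt() does for you) is to flatten the array to 2D before saving and restore the shape after loading. This sketch assumes you keep track of the original shape yourself, because the CSV file does not record it.

```python
import os
import tempfile

import numpy as np

c = np.random.rand(3, 3, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "christmas_flat.csv")  # placeholder file name

    # collapse the last two axes: (3, 3, 3) -> (3, 9)
    np.savetxt(path, c.reshape(c.shape[0], -1), delimiter=",")

    # the CSV does not store the original shape, so supply it on load
    restored = np.loadtxt(path, delimiter=",").reshape(c.shape)

print(np.allclose(restored, c))
```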

Save NumPy Array with np.save()

Another way to store NumPy arrays on disk is the native np.save() function. This will store your arrays in the binary NPY file format.

This format allows you to save NumPy arrays of any dimension. Because it is binary, the files will not be human-readable.

# save 3D array to binary NPY format
np.save('christmas.npy', c)
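
Reading the file back is done with np.load(). Here is a short round-trip sketch, written to a temporary directory so it leaves no files behind, showing that the shape and dtype come back without any extra bookkeeping.

```python
import os
import tempfile

import numpy as np

c = np.random.rand(3, 3, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "christmas.npy")

    np.save(path, c)

    # the NPY header stores shape and dtype, so no reshape is needed
    loaded = np.load(path)

print(loaded.shape, loaded.dtype)
```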

Let’s see if there’s a difference in the file sizes between storing in CSV and NPY.

# create medium-sized 2D array
d = np.random.rand(100,100)

# save 2D array to CSV format
np.savetxt(
    "time.csv", 
    d, 
    delimiter=","
)

# get the size (in bytes) of the stored .csv file (macOS/BSD stat syntax)
! stat -f '%z' time.csv

>>> 250000

# save 2D array to binary NPY format
np.save('time.npy', d)

# get the size (in bytes) of the stored .npy file
! stat -f '%z' time.npy

>>> 80128

As you can see, the NPY file is much smaller: ~80 KB compared to 250 KB for the CSV.
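
The stat -f syntax used above is specific to macOS/BSD. A portable way to run the same comparison is os.path.getsize(), sketched below with a temporary directory. The CSV is larger because every value is written as 24 characters of text plus a separator, while NPY stores 8 bytes per float64 value plus a small header.

```python
import os
import tempfile

import numpy as np

d = np.random.rand(100, 100)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "time.csv")
    npy_path = os.path.join(tmp, "time.npy")

    np.savetxt(csv_path, d, delimiter=",")
    np.save(npy_path, d)

    # file sizes in bytes, independent of the operating system
    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)

print(f"CSV: {csv_size} bytes, NPY: {npy_size} bytes")
```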

Other ways to save NumPy arrays

There are also other ways to store NumPy arrays. Here’s a great blog post that shows you how to write NumPy arrays to TXT files.

You can technically also use the np.ndarray.tofile() method, but it writes only the raw bytes without recording the shape, dtype, or byte order, so the files are not portable across platforms and it is generally not recommended.
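
To see what tofile() leaves out, here is a small sketch that writes a 2D array and reads it back with np.fromfile(). The file name m.bin is a placeholder. The reader has to supply both the dtype and the shape.

```python
import os
import tempfile

import numpy as np

m = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "m.bin")  # placeholder file name

    m.tofile(path)

    # fromfile needs the dtype spelled out and always returns a flat array
    flat = np.fromfile(path, dtype=np.float64)

print(flat.shape)  # (6,)

# the original shape has to be restored by hand
restored = flat.reshape(2, 3)
```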

Parallel read/write of NumPy arrays

If you’re working with small, local data the formats mentioned above will do the job.

But many real-world datasets are the opposite of small and local: they are very large (often larger than your local memory) and cloud-based. This means you will need to read and write your NumPy arrays in parallel.

The NPY file format does not allow for reading and writing in parallel.

Write Arrays to Zarr Instead

If you need parallel read/write, writing your NumPy arrays to the Zarr file format is the way to go. Zarr is a format for the storage of chunked and compressed arrays in any dimension. This means you can read and write your arrays in parallel by processing multiple chunks at once with a parallel processing library like Dask.

This has two important benefits over all the other file formats mentioned earlier:

  1. You can read / write arrays much faster
  2. You can read / write arrays that exceed your local machine’s memory
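
To make the chunking idea concrete, here is a toy sketch that uses only NumPy and the standard library. It is not Zarr's API or on-disk layout. It simply splits an array into row chunks, writes each chunk to its own file from a thread pool, and reads the chunks back the same way. The chunk size and file names are arbitrary assumptions. For real workloads, use Zarr (typically together with Dask) rather than this hand-rolled version.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np

data = np.random.rand(400, 100)
chunk_rows = 100  # arbitrary chunk size chosen for illustration


def write_chunk(directory, index, chunk):
    """Save one chunk to its own .npy file and return the path."""
    path = os.path.join(directory, f"chunk_{index}.npy")
    np.save(path, chunk)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # split the array into independent row blocks
    chunks = [data[i:i + chunk_rows] for i in range(0, data.shape[0], chunk_rows)]

    with ThreadPoolExecutor() as pool:
        # each chunk is written by a separate task
        paths = list(pool.map(lambda args: write_chunk(tmp, *args), enumerate(chunks)))
        # chunks can be read back independently, again in parallel
        loaded = list(pool.map(np.load, paths))

    # pool.map preserves order, so the chunks line up with the original rows
    restored = np.concatenate(loaded, axis=0)

print(np.array_equal(restored, data))
```

Because no single task ever needs the whole array, the same pattern extends to data that does not fit in memory. Chunked formats like Zarr build on this property.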

Conclusion

Writing NumPy arrays to CSV is possible with the np.savetxt() function, but only if your array has two dimensions or fewer.

The binary NPY format produces smaller files than CSV, but it is not human-readable and cannot be processed in parallel.

Writing your NumPy arrays in parallel is much faster and avoids memory errors if your dataset is very large. Use the Zarr file format to read and write your NumPy arrays in parallel.
