This post explains how to write NumPy arrays to CSV files.
We will look at:
- the syntax for writing different NumPy arrays to CSV
- the limitations of writing NumPy arrays to CSV
- alternative ways to save NumPy arrays
Let’s get to it.
Writing NumPy Arrays to CSV
You can use the np.savetxt() function to save your NumPy array to a CSV file.
Make sure to:
- add “.csv” to the filename destination, and
- set the delimiter keyword to “,”
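NumPy uses the filename exactly as you give it and separates values with a single space by default, so without these two settings you end up with a space-delimited text file rather than a CSV. More on that later.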
CSV files can be great because they are human-readable. They also have the added benefit of being easy to load into pandas or Dask DataFrames.
Write one dimensional array
Let’s create a small one-dimensional array using np.array().
import numpy as np
# create 1D array
a = np.array([1,2,3])
# store in current directory
np.savetxt(
"a.csv",
a,
delimiter=","
)
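NumPy will write a 1D array column-wise by default, one value per line. Let’s inspect the contents of a.csv to confirm (values shortened for readability; the default format writes each number in scientific notation):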
1
2
3
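To control how each value is written, np.savetxt() accepts a fmt argument. The sketch below compares the default output with fmt="%d". It writes into a temporary directory, which is an assumption made here so the example leaves no files behind.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")

    # default format is "%.18e", i.e. scientific notation
    np.savetxt(path, a, delimiter=",")
    with open(path) as f:
        default_text = f.read()

    # fmt="%d" writes plain integers instead
    np.savetxt(path, a, delimiter=",", fmt="%d")
    with open(path) as f:
        int_text = f.read()

print(default_text)
print(int_text)
```

Use an integer format only when the array really holds integers; for floats, a format such as "%.6f" keeps the file readable without dropping the fractional part.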
To write the data row-wise instead, set the newline kwarg to “,” (your delimiter).
# write array row-wise
np.savetxt(
"very.csv",
a,
delimiter=",",
newline=","
)
You can also write the array row-wise by converting it to a 2D array first:
np.savetxt(
"very.csv",
[a],
delimiter=","
)
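In both cases, the content of very.csv will look roughly like this (values shortened for readability; the newline=“,” variant also leaves a trailing comma and no final line break):
1,2,3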
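The two row-wise approaches can be compared side by side. This is a minimal sketch that uses fmt="%d" and a temporary directory, both of which are choices made here for a compact illustration rather than part of the original example.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])

with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "newline_trick.csv")
    p2 = os.path.join(tmp, "two_dim.csv")

    # approach 1: replace the line terminator with the delimiter
    np.savetxt(p1, a, delimiter=",", newline=",", fmt="%d")
    with open(p1) as f:
        newline_text = f.read()

    # approach 2: turn the 1D array into a single-row 2D array
    np.savetxt(p2, a.reshape(1, -1), delimiter=",", fmt="%d")
    with open(p2) as f:
        two_dim_text = f.read()

print(repr(newline_text))
print(repr(two_dim_text))
```

The 2D route produces a regular CSV row that ends with a line break, which is usually what other tools expect when reading the file.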
Write two dimensional array
Let’s now create a two-dimensional NumPy array and save it to CSV.
# create 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
# write 2D array to CSV
np.savetxt(
"merry.csv",
b,
delimiter=","
)
2D arrays get written row-wise by default, as you would expect.
The contents of merry.csv (values shortened for readability):
1,2,3
4,5,6
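A 2D CSV can also carry a header row and be read back with np.loadtxt(). The sketch below assumes hypothetical column names x, y, z and a temporary directory. Setting comments="" stops NumPy from prefixing the header with "# ".

```python
import os
import tempfile

import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "merry.csv")

    # write integers with a header line (column names are made up)
    np.savetxt(path, b, delimiter=",", fmt="%d", header="x,y,z", comments="")
    with open(path) as f:
        text = f.read()

    # read the data back, skipping the header line
    loaded = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(text)
print(loaded)
```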
Write three dimensional array
Finally, let’s create a 3-dimensional NumPy array and try to save it to CSV.
# create 3D array
c = np.random.rand(3,3,3)
# write 3D array to CSV
np.savetxt(
"christmas.csv",
c,
delimiter=","
)
This doesn’t work. You will see an error message like this:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/ky/bqjn_gxn1xv0cn_8q5xvp3q40000gn/T/ipykernel_46872/1804416501.py in <module>
----> 1 np.savetxt(f"{home}/Documents/numpy/bar.csv", c, delimiter=",")
<__array_function__ internals> in savetxt(*args, **kwargs)
~/mambaforge/envs/numpy-zarr/lib/python3.9/site-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments, encoding)
1380 # Handle 1-dimensional arrays
1381 if X.ndim == 0 or X.ndim > 2:
-> 1382 raise ValueError(
1383 "Expected 1D or 2D array, got %dD array instead" % X.ndim)
1384 elif X.ndim == 1:
ValueError: Expected 1D or 2D array, got 3D array instead
CSV is a human-readable, tabular format. This means only 1D and 2D NumPy arrays can be written to CSV.
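If you still need a 3D array in CSV form, one possible workaround is to flatten it to 2D before writing and restore the shape after reading. This is only a sketch: the original shape is not stored in the file, so you have to keep track of it yourself.

```python
import os
import tempfile

import numpy as np

c = np.random.rand(3, 3, 3)

# collapse the last two axes: (3, 3, 3) becomes (3, 9)
flat = c.reshape(c.shape[0], -1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "christmas.csv")
    np.savetxt(path, flat, delimiter=",")

    # the CSV only knows about the 2D layout, so reshape manually
    restored = np.loadtxt(path, delimiter=",").reshape(c.shape)

print(flat.shape, restored.shape)
```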
Save NumPy Array with np.save()
Another way to store NumPy arrays on disk is the native np.save() function. It stores your array in the binary NPY file format.
This format can hold NumPy arrays of any number of dimensions. Because it is binary, the files will not be human-readable.
# save 3D array to binary NPY format
np.save('christmas.npy', c)
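Reading the file back is done with np.load(). The short round trip below uses a temporary directory (an assumption for the sake of a self-contained example) and shows that shape and dtype are preserved.

```python
import os
import tempfile

import numpy as np

c = np.random.rand(3, 3, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "christmas.npy")

    # NPY stores shape and dtype alongside the raw values
    np.save(path, c)
    loaded = np.load(path)

print(loaded.shape, loaded.dtype)
```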
Let’s see if there’s a difference in the file sizes between storing in CSV and NPY.
# create medium-sized 2D array
d = np.random.rand(100,100)
# save 2D array to CSV format
np.savetxt(
"time.csv",
d,
delimiter=","
)
# get the size (in bytes) of the stored .csv file (macOS/BSD syntax; on Linux use stat -c '%s')
! stat -f '%z' time.csv
>>> 250000
# save 2D array to binary NPY format
np.save('time.npy', d)
# get the size (in bytes) of the stored .npy file
! stat -f '%z' time.npy
>>> 80128
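As you can see, the NPY file format produces a smaller file: about 80 KB compared to 250 KB for the CSV. Each float64 value takes 8 bytes in NPY, whereas the default text format spends 24 characters plus a separator on every value.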
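The stat command used above is specific to the shell and operating system. A portable way to run the same comparison from Python is os.path.getsize(), sketched here with a temporary directory as an assumed scratch location.

```python
import os
import tempfile

import numpy as np

d = np.random.rand(100, 100)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "time.csv")
    npy_path = os.path.join(tmp, "time.npy")

    np.savetxt(csv_path, d, delimiter=",")
    np.save(npy_path, d)

    # file sizes in bytes
    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)

print(f"CSV: {csv_size} bytes, NPY: {npy_size} bytes")
```

The NPY size is the raw data (d.nbytes) plus a small header, which is why it stays close to 8 bytes per value.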
Other ways to save NumPy arrays
There are also other ways to store NumPy arrays. Here’s a great blog post that shows you how to write NumPy arrays to TXT files.
You can technically also use the np.ndarray.tofile() method, but it writes the raw bytes without recording shape, dtype, or byte order, so the resulting files are platform-dependent and the method is generally not recommended.
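To see why tofile() is fragile, consider this small sketch: reading the file back with np.fromfile() only works if you already know the dtype and shape, because neither is stored in the file. The file name is hypothetical.

```python
import os
import tempfile

import numpy as np

d = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")  # hypothetical file name

    # writes raw bytes only: no shape, dtype, or byte-order metadata
    d.tofile(path)

    # the reader must supply the dtype; the result is always flat
    raw = np.fromfile(path, dtype=np.float64)

# the shape has to be restored by hand
restored = raw.reshape(2, 3)
print(raw.shape, restored.shape)
```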
Parallel read/write of NumPy arrays
If you’re working with small, local data, the formats mentioned above will do the job.
But many real-world datasets are the opposite of small and local: they are very large (often larger than your local memory) and cloud-based. This means you will need to read and write your NumPy arrays in parallel.
The NPY file format does not allow for reading and writing in parallel.
Write Arrays to Zarr Instead
If you need parallel read/write, writing your NumPy arrays to the Zarr file format is the way to go. Zarr is a format for the storage of chunked and compressed arrays in any dimension. This means you can read and write your arrays in parallel by processing multiple chunks at once with a parallel processing library like Dask.
This has two important benefits over all the other file formats mentioned earlier:
- You can read / write arrays much faster
- You can read / write arrays that exceed your local machine’s memory
Conclusion
Writing NumPy arrays to CSV is possible with the np.savetxt() function, but only if your array has 2 dimensions or fewer.
The binary NPY format takes up less disk space than CSV, but it is not human-readable and cannot be processed in parallel.
Writing your NumPy arrays in parallel is much faster and avoids memory errors if your dataset is very large. Use the Zarr file format to read and write your NumPy arrays in parallel.