matplotlib PdfPages - storing a lossy copy of a plot with lots of data


I'm creating plots with matplotlib.pyplot and writing them to pdf. Some of these plots have a largeish number of points (up to 100,000), many of them overlapping, so certain parts of the chart are just a solid mass. (That's okay - I'm interested in what the sparser parts of the graph look like.)

When I save these plots to pdf, it takes a long time to write, and reading the pdf is even worse. Is there a way to store a "lossy" copy of the plot in the pdf? For example, if I took a screenshot of the plot and embedded it in the pdf, it would load a lot faster.

Best answer, by askewchan

I recommend plotting with the option rasterized=True, which stores the points as a bitmap image inside the pdf while the axes and text stay vector:

import numpy as np
import matplotlib.pyplot as plt

pts = np.random.rand(2, 100000)
plt.scatter(*pts, rasterized=True)  # draw the points as an embedded bitmap
plt.savefig('tmp_rast.pdf')

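For comparison, the same points without rasterization:

plt.figure()  # start a fresh figure so the rasterized points are not drawn again
plt.scatter(*pts)
plt.savefig('tmp_reg.pdf')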

And the resulting file sizes:

$ ls -lh tmp*.pdf
177K Dec  9 22:03 tmp_rast.pdf
1.5M Dec  9 22:02 tmp_reg.pdf
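
Since the question mentions PdfPages, here is a minimal sketch of the same idea through that interface. The helper name write_scatter_pdf, the output file names and the dpi value of 150 are assumptions made for illustration, not part of the original answer. The dpi argument only controls the resolution of the rasterized points, so lowering it trades sharpness for a smaller file.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages


def write_scatter_pdf(path, pts, rasterized, dpi=150):
    """Hypothetical helper: write a one-page PDF with a scatter plot of pts (shape 2 x N).

    Returns the size of the written file in bytes.
    """
    fig, ax = plt.subplots()
    ax.scatter(*pts, rasterized=rasterized)
    with PdfPages(path) as pdf:
        # dpi sets the resolution of rasterized artists; vector elements are unaffected
        pdf.savefig(fig, dpi=dpi)
    plt.close(fig)
    return os.path.getsize(path)


rng = np.random.default_rng(0)
pts = rng.random((2, 100_000))
out_dir = tempfile.mkdtemp()

raster_size = write_scatter_pdf(os.path.join(out_dir, "pages_rast.pdf"), pts, True)
vector_size = write_scatter_pdf(os.path.join(out_dir, "pages_reg.pdf"), pts, False)
print(f"rasterized: {raster_size} bytes, vector: {vector_size} bytes")
```

The exact sizes depend on the matplotlib version and the chosen dpi, so compare the two numbers on your own data rather than relying on fixed values.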