Selecting the correct image file type

Published: February 19, 2021

This is part two of our four-part series:

  • Part 1: Specifying the correct dimension in {knitr} graphic
  • Part 2: What image format should you use (this post)
  • Part 3: Including external graphics in your document
  • Part 4: Optimal {knitr} settings.

There are (at least) three file formats to choose from: JPEG, PNG and SVG.

Attribute     JPEG          PNG     SVG
Type          Raster        Raster  Vector
Transparency  No            Yes     Yes
Animation     No            No      Yes
Lossy         Yes           No      No
Recommended   Occasionally  Yes     Often

If you are reading this via a syndication site, be sure to go to the original post for updated links.

JPEG Images

As the JPEG compression algorithm significantly reduces file size, JPEG files are ubiquitous across the web. If you take a photo on your camera, it’s almost certainly using a JPEG storage format. Historically the file extension was .jpg as Microsoft Windows only handled three character file extensions (also .htm vs .html). But today both extensions are used (personally I prefer .jpeg, but I’m not very consistent if I’m totally honest).

If you did a little Googling on which file format to use for images, then the answer you would come across is that JPEGs are the default choice. But remember, figures are different from standard images! R figures have text, straight lines, lots of white space, and perhaps transparency.

However,

  1. JPEGs don’t support transparency.
  2. JPEG compression is based on the discrete cosine transform (DCT). Essentially, fine detail is discarded so that neighbouring pixels with similar values are merged. This averaging process means that the method is lossy, i.e. by storing the image as a JPEG, we are losing information. For figures, this typically shows up as fuzzy artefacts around text and sharp lines.

So in general don’t use JPEGs. The only exception might be when you have a “photograph-type” plot, such as a detailed contour or heatmap. You would only consider this, however, if the file size of the PNG was large.

Portable Network Graphics (PNGs)

A Portable Network Graphics file (PNG) is a raster file format that uses lossless compression. It was originally created as a replacement for the GIF, but unlike GIFs, PNG files don’t support animations.

While a PNG’s file size is usually a little larger than the equivalent JPEG’s, PNG is usually the better default option. For a nicer, smoother image, you should use type = "cairo-png" when creating a PNG with png().
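As a rough illustration of the type argument, the sketch below writes a small base R scatter plot to a temporary PNG. The file name and dimensions are arbitrary choices for this example, and the fallback branch is an assumption for R builds without cairo support.

```r
# Write a throwaway scatter plot to a temporary PNG file
out_file = file.path(tempdir(), "scatter.png")

if (capabilities("cairo")) {
  # Anti-aliased output via the cairo PNG backend
  png(out_file, width = 600, height = 400, type = "cairo-png")
} else {
  # Assumption: fall back to the platform default when cairo is unavailable
  png(out_file, width = 600, height = 400)
}
plot(runif(100), runif(100))
dev.off()

file.size(out_file)  # size in bytes of the file just written
```

In a {knitr} document, the same option would typically be passed through the dev.args chunk option, e.g. dev.args = list(type = "cairo-png").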

The size of a PNG can often be reduced using the optipng utility. For graphs, a reduction of around 50% isn’t unusual. Running optipng takes around one to two seconds per image, so it isn’t really suitable for dynamic (e.g. Shiny-based) applications. However, for images that will never change, e.g. those in blog posts, an extra second or two is not an issue.

If you’ve already created a bunch of images, then a simple R script can easily optimise all files:

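# Find every PNG below the current directory (pattern is a regular expression)
png_files = list.files(pattern = "\\.png$",
                       full.names = TRUE,
                       recursive = TRUE)
# Optimise each file in place; shQuote() protects paths containing spaces
for (png in png_files) system2("optipng", shQuote(png))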

The optipng utility has an optimisation-level option that lets you trade a longer run time for better PNG compression, e.g.

system2("optipng", args = c("-o1", png))

The argument -o selects the level of compression. The higher the number, the harder optipng tries to compress. To be honest, I’ve found leaving the compression level at the default to be more than sufficient. Life is too short to worry about the odd byte.
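To see how much optipng actually saves on your own figures, the call can be wrapped in a small helper that compares the file size before and after. This is only a sketch: optimise_png() is a hypothetical name, it assumes optipng is installed and on the PATH, and it returns NA when the utility can’t be found.

```r
# Hypothetical helper: optimise one PNG in place and return the
# proportion of the original file size that was saved
optimise_png = function(path, level = NULL) {
  if (!nzchar(Sys.which("optipng"))) {
    warning("optipng not found on the PATH; file left unchanged")
    return(NA_real_)
  }
  before = file.size(path)
  # Only pass -o when a level is requested; otherwise use optipng's default
  args = c(if (!is.null(level)) paste0("-o", level), shQuote(path))
  system2("optipng", args = args, stdout = FALSE, stderr = FALSE)
  1 - file.size(path) / before
}

# Example: create a small plot, then optimise it
png_file = file.path(tempdir(), "example.png")
png(png_file)
plot(runif(100), runif(100))
dev.off()
saving = optimise_png(png_file)
saving
```

A value of 0.5 would correspond to the 50% reduction mentioned above; the figure you get will depend on the plot.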

Scalable Vector Graphics (SVGs)

A Scalable Vector Graphics file (SVG) uses an XML-based format to precisely describe how the image should appear. Since the graph is described using text, an SVG can be scaled to different sizes without losing quality, i.e. we no longer worry about resolution.

This format is particularly appealing for figures, which are simply a combination of lines, text and shapes. The downside is that file sizes can get prohibitively large. For example, if you have a scatter plot with lots of points, each individual point will have its own entry in the SVG file.

To understand the trade-offs a bit more, let’s create multiple {ggplot2} scatter plots where we gradually increase the number of points:

library("ggplot2")
no_of_pts = c(1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
for (i in no_of_pts) {
  x = runif(i)
  y = runif(i)
  g = ggplot(data = data.frame(x = x, y = y)) +
    geom_point(aes(x, y))
  # Save graphic using SVG, PNG, etc
}
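The saving step is left as a comment in the loop above. One way it might be filled in is sketched below, using the base png() and svg() devices and recording the resulting file sizes. This is an assumption about the approach, not necessarily how the figures in this post were produced: file_sizes() is a hypothetical helper, the dimensions are arbitrary, svg() needs an R build with cairo support, and only a few small values of n are used to keep it quick.

```r
library("ggplot2")

# Hypothetical helper: save one scatter plot of n points as PNG and SVG,
# and return the size of each file in bytes
file_sizes = function(n, dir = tempdir()) {
  g = ggplot(data.frame(x = runif(n), y = runif(n))) +
    geom_point(aes(x, y))
  png_file = file.path(dir, paste0("scatter_", n, ".png"))
  svg_file = file.path(dir, paste0("scatter_", n, ".svg"))

  png(png_file, width = 600, height = 400)  # dimensions in pixels
  print(g)
  dev.off()

  svg(svg_file, width = 6, height = 4)      # dimensions in inches
  print(g)
  dev.off()

  data.frame(n = n,
             png = file.size(png_file),
             svg = file.size(svg_file))
}

# One row per plot: number of points, PNG bytes, SVG bytes
sizes = do.call(rbind, lapply(c(10, 100, 1000), file_sizes))
sizes
```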

The first scatter plot (when i = 1) only contains a single point, whereas the final scatter plot contains i = 100000 points, and is almost entirely black with points. For each scatter plot, we generated

  • a PNG version
  • a PNG version, optimised using optipng
  • an SVG version.

Comparison of file sizes of different file formats.

The figure shows that the file size increases with the number of points. However, at around 10,000 pts, the PNG file size starts to decrease. This is because the plots become almost entirely black, and a near-uniform image compresses very well. The file sizes for SVGs, by contrast, increase in a predictably linear fashion. When plotting 200 pts, the file size is starting to get prohibitive. At around 5,000 pts, the SVG file is over 1 MB.

As an aside, the JPEG file size is about the same as the PNG file size in this test.

Next Generation Formats

If you’ve ever used Google’s PageSpeed Insights, you’ll have seen that one of its recommendations is to serve images in next-gen formats, such as JPEG 2000, JPEG XR, and WebP. However, a little investigation suggests this is overkill for the vast majority of sites. As a significant number of browsers don’t yet support these formats, shaving off a few bytes doesn’t seem worth the effort. Obviously, if you have a top 100 site like Amazon or are serving lots of images on a page, then it may well be worth the hassle. For most sites using Hugo, shaving milliseconds off load time isn’t required.

Conclusion

Spending a little time optimising file size brings lots of benefits:

  • your web-pages load faster
  • your documents are smaller
  • your GitLab repo clones quicker and takes up less storage.

Typically, I don’t care about optimising to the nearest byte, but a combination of choosing the correct dimensions and the correct image type gives you something close to optimal with little thought.