Image Optimization, Part 4: Progressive JPEG…Hot or Not?

By YUI Team, December 5th, 2008

About the Author: Stoyan Stefanov is a Yahoo! web developer working for the Exceptional Performance team and leading the development of the YSlow performance tool. He is also an open-source contributor, conference speaker, and technical writer; his latest book is called Object-Oriented JavaScript.

This is part 4 in an ongoing series. You can read the other parts here:

In the previous article, progressive JPEGs were briefly mentioned as a possible option when optimizing JPEGs. This post digs into that option a little deeper, with the results of an optimization experiment involving over 10,000 images.

Baseline vs. progressive JPEGs

Baseline JPEGs are the “normal” JPEGs, the type that all image programs write by default. Browsers render them top-to-bottom as more of the image information comes down the wire.

Loading a baseline JPEG, click to enlarge

Progressive JPEGs are another type of JPEG; as the name suggests, they are rendered progressively. First you see a low-quality version of the whole image; then, as more of the image information arrives over the network, the quality gradually improves.

Loading a progressive JPEG, click to enlarge

From a usability perspective, progressive rendering is usually good, because the user gets feedback that something is going on. Also, if you’re on a slow connection, a progressive JPEG is preferable because you don’t have to wait for the whole image to arrive in order to tell whether it is what you wanted. If not, you can click away from the page or hit the back button without waiting for the (potentially large) high-quality image.

One argument against progressive JPEGs I’ve heard is that they look a bit old-school and that users might be underwhelmed, if not irritated, by the progressive rendering. I am not aware of a user study that focuses on this issue; please comment if you have heard of or conducted such an experiment.

There is conflicting information in blogs and books about whether progressive JPEGs are bigger or smaller than baseline JPEGs in terms of file size. So, as part of the never-ending quest for smaller file sizes and lossless optimization, here is an experiment that attempts to answer the question.

The experiment

One of the many free APIs that Yahoo! provides is the image search API. I used it to find images matching a number of queries, such as “kittens”, “puppies”, “monkeys”, “baby”, “flower”, and “sunset”: 12 queries in total. Once I had the image URLs, I downloaded all the images and cleaned out 4xx and 5xx error responses as well as non-JPEGs (it turned out that sites sometimes host PNGs or even BMPs renamed to .jpg). After the cleanup there were 10,360 images to work with: images of all different dimensions and quality, and, best of all, real-life images from live web sites.
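
Just to illustrate that cleanup step, here is a minimal sketch in Python; the directory name is hypothetical, and checking the JPEG SOI marker is my own assumption about how renamed PNGs and BMPs could be detected, not the script that was actually used.

import os

DOWNLOAD_DIR = "downloads"  # hypothetical directory holding the fetched .jpg files

def is_real_jpeg(path):
    # Real JPEG files start with the SOI marker, bytes FF D8.
    with open(path, "rb") as f:
        return f.read(2) == b"\xff\xd8"

kept = []
for name in os.listdir(DOWNLOAD_DIR):
    path = os.path.join(DOWNLOAD_DIR, name)
    # Skip empty files (failed downloads) and files that are not really JPEGs,
    # e.g. PNGs or BMPs served with a .jpg extension.
    if os.path.getsize(path) > 0 and is_real_jpeg(path):
        kept.append(path)

print("%d JPEG images to work with" % len(kept))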

With the source images in hand, I ran each of them through jpegtran twice, using the following commands:

> jpegtran -copy none -optimize source.jpg result.jpg

and

> jpegtran -copy none -progressive source.jpg result.jpg

The first one optimizes the Huffman tables in the baseline JPEGs (details discussed in the previous article). The second command converts the source JPEGs into progressive ones.
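
For the batch run itself, a small driver script is enough. This is only a sketch of how such a script might look: it mirrors the two jpegtran invocations above (some jpegtran builds expect an -outfile switch rather than a second file argument), and the directory names are made up.

import os
import subprocess

SOURCE_DIR = "downloads"          # hypothetical: the cleaned-up source images
BASELINE_DIR = "baseline"         # optimized baseline output
PROGRESSIVE_DIR = "progressive"   # progressive output

os.makedirs(BASELINE_DIR, exist_ok=True)
os.makedirs(PROGRESSIVE_DIR, exist_ok=True)

sizes = {}  # name -> (original, baseline, progressive) size in bytes
for name in os.listdir(SOURCE_DIR):
    src = os.path.join(SOURCE_DIR, name)
    base = os.path.join(BASELINE_DIR, name)
    prog = os.path.join(PROGRESSIVE_DIR, name)
    # The same two commands as above, run once per image.
    subprocess.run(["jpegtran", "-copy", "none", "-optimize", src, base], check=True)
    subprocess.run(["jpegtran", "-copy", "none", "-progressive", src, prog], check=True)
    sizes[name] = (os.path.getsize(src), os.path.getsize(base), os.path.getsize(prog))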

Let’s see what the resulting file sizes turn out to be.

Results

The Census report, like most such surveys, had cost an awful lot of money and didn’t tell anybody anything they didn’t already know — except that every single person in the Galaxy had 2.4 legs and owned a hyena. Since this was clearly not true the whole thing had eventually to be scrapped.

Douglas Adams — “So Long, and Thanks for All the Fish”

The median JPEG in this experiment was 52.07 KB, which by itself is probably not the most useful statistic. More important is that the median saving when using jpegtran to optimize the image losslessly as a baseline JPEG is 9.04% of the original (the median image becomes 47.36 KB), and when converting to a progressive JPEG it’s 11.45% (46.11 KB median).

So it looks like progressive JPEGs are smaller on average. But that’s only the average; it’s not a hard rule. In fact, in more than 15% of the cases (1,611 out of the 10,360 images) the progressive version was bigger. Since it’s difficult to predict whether an image will be smaller as progressive just by looking at it (or, for automated processing, without looking at it at all), a way to estimate how an image will fare based on its dimensions or file size would be really helpful.
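
For what it’s worth, the statistics above boil down to a few lines once the sizes are collected. A rough sketch, reusing the hypothetical sizes dictionary from the driver script earlier:

import statistics

# sizes: name -> (original, baseline, progressive) byte counts, collected earlier
baseline_savings = [1 - b / o for o, b, p in sizes.values()]
progressive_savings = [1 - p / o for o, b, p in sizes.values()]
bigger_as_progressive = sum(1 for o, b, p in sizes.values() if p > b)

print("median baseline saving:    %.2f%%" % (100 * statistics.median(baseline_savings)))
print("median progressive saving: %.2f%%" % (100 * statistics.median(progressive_savings)))
print("progressive bigger than baseline: %d of %d images"
      % (bigger_as_progressive, len(sizes)))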

Looking for a relationship, I plotted all the results on a graph where:

  • Y is the difference between the optimized baseline and progressive file sizes (baseline minus progressive), so negative numbers mean cases where the baseline version is smaller

  • X is the file size of the original image

The graph shows that the results are all over the place, but there seems to be a trend: the bigger the image, the better it is to save it as a progressive JPEG.

Progressive vs baseline JPEG

“Zooming” into the area of smaller file sizes to see where progressive JPEGs become less effective, let’s consider only the images that are 30K and under. Then, using the trendline feature of Excel, we can see where the line is drawn.

src="http://yuiblog.com/assets/2-progressive-jpeg.png"
id="progressive-jpeg-chart"
alt="Progressive vs baseline JPEG for smaller images"
/>

Summary

The take-home messages after looking at the graphs above:

  • when your JPEG image is under 10K, it’s better to save it as a baseline JPEG (an estimated 75% chance it will be smaller)
  • for files over 10K, a progressive JPEG will give you better compression (in 94% of the cases)

So if your aim is to squeeze every byte (and consistency is not an issue), the best thing to do is to try both baseline and progressive and pick the smaller one.

The other option is to save all images smaller than 10K as baseline and the rest as progressive, or simply to use baseline for thumbnails and progressive for everything else.
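
Here is a rough sketch of the “try both, keep the smaller one” approach, using the same jpegtran invocations shown earlier; the function and file names are hypothetical.

import os
import subprocess

def optimize_jpeg(src, dest):
    # Write both a baseline and a progressive version, then keep whichever is smaller.
    base, prog = dest + ".base", dest + ".prog"
    subprocess.run(["jpegtran", "-copy", "none", "-optimize", src, base], check=True)
    subprocess.run(["jpegtran", "-copy", "none", "-progressive", src, prog], check=True)
    if os.path.getsize(base) <= os.path.getsize(prog):
        winner, loser = base, prog
    else:
        winner, loser = prog, base
    os.replace(winner, dest)
    os.remove(loser)

optimize_jpeg("photo.jpg", "photo-optimized.jpg")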

IE and progressive JPEGs

“Oh, not IE again!” is probably what you’re thinking, but it’s actually not so bad. It’s just that IE doesn’t render progressive JPEGs progressively. It displays the image just fine, but only once it has arrived completely. So in IE, baseline JPEGs display more progressively (top-to-bottom is still progress) than progressive JPEGs.

A word on ImageMagick

ImageMagick is an impressive set of command-line image tools, which you can also use to optimize files. Unlike most other image software, ImageMagick writes optimized baseline JPEGs by default (as if you had used the -optimize switch in jpegtran).

ImageMagick can also strip metadata and write progressive JPEGs, so I repeated the experiment outlined above, but using ImageMagick instead of jpegtran. The commands used were:

> convert -strip source.jpg result.jpg                    // baseline JPEG
> convert -strip -interlace Plane source.jpg result.jpg   // progressive JPEG

Observations from the ImageMagick experiment:

  • The baseline vs. progressive trendline is the same: images 10K or bigger are better optimized when using progressive encoding
  • The overall compression is better: the median is 10.85% optimization for baseline JPEGs (jpegtran saved 9.04%) and 13.25% for progressive JPEGs (11.45% with jpegtran)
  • There is some quality loss: ImageMagick doesn’t perform fully lossless operations. Inspecting random images visually, I couldn’t tell any difference, but an image diff utility shows that pixel information in the images has been modified.

And the last set of stats gleaned from the experiment has to do with the speed of writing JPEGs. Here’s how jpegtran and ImageMagick performed while optimizing the 10,000+ images on my laptop (Windows XP, 2GHz dual CPU, 500MB RAM). From fastest to slowest:

  1. jpegtran baseline (11 images per second),
  2. jpegtran progressive (9 images/s),
  3. ImageMagick baseline (7 images/s),
  4. ImageMagick progressive (5.5 images/s)

10 Comments

  1. Hey Stoyan! This is good to know. I know a few people at the recent Yahoo Frontend summit said that progressive loading looked oldschool, but I myself didn’t really think so. Personally top-down loading seems more annoying in most circumstances, since sometimes I might not want to look at the full quality image. Progressive loading seems to give the user a preview while loading, almost like a fuzzy thumbnail image.

    Also noticed smushit.com already converts JPEG to progressive for you! Nice! (well, for those images over 10k!)

  2. Another excellent article!

    It’s great that you’re actually doing tests and research to get accurate data to determine whether assumptions are correct or not.

    The work you’re doing really is valuable. Think about it: these articles will be indexed, and for the next 100 years, whenever people search for things like ‘jpeg progressive comparison’ or whatnot, your article will be near the top if not the top.

    I certainly think you should do a ‘Summary’ article at the end where you list out very simple rules as to what image format to choose in what situation.

    Even better, implement your findings into YSlow. Have it automatically inspect the images being used and suggest alternatives based on that analysis.

    Once again, nice work :)

  3. Good stuff. Small thing, easy to remember and very useful to know. Thanks.

  4. File size aside, I would defer to Jakob’s Law: users spend most of their time on other sites. Most sites use baseline, so users expect baseline. I think the potential benefits of progressive aren’t significant or clear enough to warrant violating user expectations.

  5. Kick-Butt articles (all four of them)! Thanks for delving into the topic of image optimization and putting some hard numbers to things that I had felt for a long time, but was never able to put concrete evidence to.

  6. Thanks for this great post!

  7. Hi,
    I hate to harp on it again, but boasting that a command line can remove metadata from an image should be retracted as it is against the Digital Millennium Copyright Act. Programmers should either find another command line that keeps the metadata where it should remain (in the photo file), or develop another method for shedding the extra weight elsewhere.

    Thanks,
    Jackie

  8. Arran Ross-Paterson said:
    June 8, 2009 at 2:49 am

    The Digital Millennium Copyright Act would only apply if you or the organisation you were representing did not hold the copyright of the images that you were publishing, but more importantly the Digital Millennium Copyright Act is only relevant to the minority of the web that is developed in the USA.

  9. Hi Stoyan

    These are an excellent set of articles. They’re very thorough and filled with useful info. Thanks

  10. Thanks for this great article.

    I saw a comment in IRC about using progressive jpegs to save space and improve loads times but wasn’t able to query the author of the comment at the time to learn *why. After reading a couple of less than spectacular articles on the subject, I hit yours.

    Nicely done. I got the why, the when, potential hows, and a scientifically oriented proof of the reasoning. Who could ask for more? :)