This is the second in a series of articles describing experiments conducted to learn more about optimizing web page performance. You may be wondering why you’re reading a performance article on the YUI Blog. It turns out that most of web page performance is affected by front-end engineering, that is, the user interface design and development.
In an earlier post, I described What the 80/20 Rule Tells Us about Reducing HTTP Requests. Since browsers spend 80% of the time fetching external components including scripts, stylesheets and images, reducing the number of HTTP requests has the biggest impact on reducing response time. But shouldn’t everything be saved in the browser’s cache anyway?
It’s important to differentiate between end user experiences for an empty versus a full cache page view. An “empty cache” means the browser bypasses the disk cache and has to request all the components to load the page. A “full cache” means all (or at least most) of the components are found in the disk cache and the corresponding HTTP requests are avoided.
The main reason for an empty cache page view is that the user is visiting the page for the first time and the browser has to download all the components to load the page. Other reasons include:
Strategies such as combining scripts, stylesheets, or images reduce the number of HTTP requests for both an empty and a full cache page view. Configuring components to have an Expires header with a date in the future reduces the number of HTTP requests for only the full cache page view.
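As a rough illustration of what "an Expires header with a date in the future" looks like, here is a minimal Python sketch that formats such a header value. The function name and the ten-year default are assumptions made for this example, not something taken from Yahoo!'s configuration; in practice the header is usually set in the web server configuration rather than in application code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_expires(now, days=3650):
    """Return an HTTP-date string `days` after `now`.

    `now` must be a timezone-aware datetime in UTC. The default of
    roughly ten years is an arbitrary choice for this sketch.
    """
    return format_datetime(now + timedelta(days=days), usegmt=True)


if __name__ == "__main__":
    # Build the header line a server might attach to a static component.
    now = datetime.now(timezone.utc)
    print("Expires: " + far_future_expires(now))
```

A component served with a header like this can be reused from the disk cache on a full cache page view without any HTTP request, until the date passes or the cache is cleared.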
Previously, we observed where the time is spent when a user requests http://www.yahoo.com with an empty cache. When a user loads the page, the browser downloads approximately 30 components (see Figure 1). Figure 2 is a graphical view of where the time is spent loading http://www.yahoo.com with a full cache. Each bar represents a specific component requested by the browser. Since most components are already in the cache on a full cache page view, and their Expires header has a date in the future, the browser only has to download three components, including the HTML document.
Table 1 shows a summary of the total size and number of requests for each type of component to load http://www.yahoo.com. How much does a full cache benefit the user? Loading the page over my cable modem at home, it took 2.4 seconds with an empty cache and only 0.9 seconds with a full cache. The full cache page view had 90% fewer HTTP requests and 83% fewer bytes to download than the empty cache page view.

* Times were measured over cable modem (~2.5 Mbps).
The performance team at Yahoo! ran an experiment to determine the percentage of users and page views with an empty cache on some of Yahoo!’s most popular pages. We defined the experiment to measure users’ cache behavior related to a new component (an image). For this new image we measured the following statistics each day:
The new image was configured with the following HTTP headers:
Expires: Thu, 15 Apr 2004 20:00:00 GMT
Last-Modified: Wed, 28 Sep 2006 23:49:57 GMT
When the browser saves a component in its cache, it also saves the Expires and Last-Modified values. Specifying an Expires date in the past forces the browser to request the image every time the page is viewed (with a few exceptions, such as when users click the browser’s “back” button to return to a page). If the image is already in the browser’s cache and is being re-requested, the browser will pass the Last-Modified date back in the request, in the If-Modified-Since header. This is called a conditional GET request, and if the image has not been modified, the server will return a 304 Not Modified response. The requests from browsers, therefore, result in one of the following response status codes:
Since the status codes are recorded in the Apache access logs, we are able to determine the empty and full cache measurements by analyzing the logs.
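The conditional GET exchange described above can be sketched in a few lines of Python. This is a simplified illustration of how a server might choose between the two status codes, not Apache's actual implementation; the function name and its arguments are hypothetical.

```python
from email.utils import parsedate_to_datetime


def status_for_request(last_modified, if_modified_since=None):
    """Pick the response status for a request for one cached component.

    last_modified:      the component's Last-Modified value (HTTP-date string).
    if_modified_since:  the date the browser sent back, or None when the
                        browser has no cached copy (empty cache).
    """
    if if_modified_since is None:
        return 200  # nothing cached: send the full component
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a full response
    if parsedate_to_datetime(last_modified) <= since:
        return 304  # cached copy is still current: Not Modified
    return 200      # component changed since it was cached
```

In the experiment, a 200 therefore signals that the browser did not have the image cached, while a 304 signals that it did.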
The percentage of users with an empty cache is:
(# of unique users with at least one 200 response) / (total # of unique users)
The percentage of page views with an empty cache is:
(# of 200 responses) / (# of 200 responses + # of 304 responses)
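The two ratios above could be computed from parsed log entries roughly as follows. This is a minimal sketch: it assumes each entry has already been reduced to a (user identifier, status code) pair for the test image, and the post does not say how unique users were identified, so the identifier here is purely hypothetical.

```python
def empty_cache_stats(records):
    """Compute both empty cache percentages from (user_id, status) pairs.

    Returns (percent_of_users, percent_of_page_views). Statuses other
    than 200 and 304 are ignored.
    """
    all_users = set()
    users_with_200 = set()
    count_200 = 0
    count_304 = 0
    for user_id, status in records:
        if status == 200:
            count_200 += 1
            users_with_200.add(user_id)
        elif status == 304:
            count_304 += 1
        else:
            continue
        all_users.add(user_id)

    total_views = count_200 + count_304
    if total_views == 0:
        return 0.0, 0.0  # no relevant requests in the log
    pct_users = 100.0 * len(users_with_200) / len(all_users)
    pct_views = 100.0 * count_200 / total_views
    return pct_users, pct_views


if __name__ == "__main__":
    # Hypothetical sample: three users, five requests for the image.
    sample = [("a", 200), ("a", 304), ("b", 304), ("c", 200), ("b", 304)]
    print(empty_cache_stats(sample))
```

In the hypothetical sample, two of the three users received at least one 200, and two of the five requests were 200s, which is why the user percentage can sit well above the page view percentage.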
Figure 3 shows the percentage of users and page views with an empty cache plotted over each day of the experiment. On the first day of the experiment, no one had the image cached, so the empty cache percentage was 100%. As the days passed, more users had the image cached, so the percentages dropped until at some point they reached a constant steady state.
40-60% of Yahoo!’s users have an empty cache experience and ~20% of all page views are done with an empty cache. To my knowledge, there’s no other research that shows this kind of information. And I don’t know about you, but these results came to us as a big surprise. It says that even if your assets are optimized for maximum caching, there are a significant number of users that will always have an empty cache. This goes back to the earlier point that reducing the number of HTTP requests has the biggest impact on reducing response time. The percentage of users with an empty cache for different web pages may vary, especially for pages with a high number of active (daily) users. However, we found in our study that regardless of usage patterns, the percentage of page views with an empty cache is always ~20%.
Conclusion: Keep in mind the empty cache user experience. It might be more prevalent than you think!
August 6, 2007 at 12:04 am
[...] Browser Cache Usage – Exposed! [...]
August 8, 2007 at 6:53 am
nice post.. btw: did you try YSlow against the blog site ? eheh – Performance Grade: D (68)
August 13, 2007 at 11:27 am
[...] an empty cache experience and about 20% of all page views are done with an empty cache (see this article for more information on browser cache usage) This fact outlines the importance of keeping web pages [...]
August 15, 2007 at 12:54 am
[...] for improving performance for first time visitors. As described in Tenni Theurer’s blog Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]
August 16, 2007 at 12:33 am
[...] If the engineers at Yahoo are to be believed, 40% to 60% of Yahoo users have an empty cache experience, and 20% of all page views are made with no cache at all (more on this topic in this blog entry). [...]
August 17, 2007 at 2:31 am
40-60% users with an empty cache seems pretty high. I could understand that for Firefox users as they are considered more tech savvy to know how to delete private data. but I doubt that for the majority of IE users and normally they stand for the bigger piece of cake in the browser game…
August 17, 2007 at 4:38 pm
[...] Performance Research, Part 2: Browser Cache Usage – Exposed! » Yahoo! User Interface Blog [...]
September 5, 2007 at 8:06 pm
The best possible way to achieve cacheability of an object is to perform a server-side re-write of all linked content (images, scripts etc) and re-write the links to refer to a file name based off the MD5 hash of the file content.
So, your link:
http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif
is re-written:
http://us.i1.yimg.com/never-expire/A31D5F12.gif
Where A31D5F12 is the MD5 hash.
The never-expire directory is configured via the Apache .htaccess file (with mod_expires) so that all content contained expires a long time in the future.
Since the hash generates a globally unique value from a given file (ahem), the content is cacheable forever. Any referenced media can be treated in this way.
Every time any linked media is changed, the page must be re-written and the appropriate new MD5 based file names substituted.
September 5, 2007 at 8:14 pm
The cacheability checker will help anyone
http://www.mnot.net/cacheability/
Also, see
http://www.web-caching.com/
November 7, 2007 at 8:36 pm
[...] into the service with an empty browser cache (a figure derived as the result of a great deal of research on the topic by Yahoo), one can derive the following [...]
December 20, 2007 at 8:00 pm
[...] more about caching [...]
December 22, 2007 at 6:58 am
I’ve overheard quite a few times that using 304 as a response to Yahoo/Google crawlers may affect the way my website is ranked afterwards.
Is there any piece of truth behind that affirmation?
January 29, 2008 at 9:37 pm
[...] 20 percent of all views of that page are occurring with an empty browser cache, according to studies conducted at Yahoo (the logo image not being in a visitor’s browser cache, either because the have not visited [...]
February 5, 2008 at 12:51 am
[...] some might argue that testing the speed of un-cached pages would be unfair, however according to Yahoo’s research on caching, approximately 50% of users will never have the opportunity to have the page contents be cached. [...]
February 6, 2008 at 11:57 am
[...] describing experiments conducted to learn more about optimizing web page performance (Part 1, Part 2, Part 3, Part 4). You may be wondering why you’re reading a performance article on the YUI [...]
February 8, 2008 at 8:47 am
[...] of an operating system, these others have to be downloaded (if they have not been cached, which happens 50% of the time) by every user who views the site. This poses a performance problem, since [...]
February 10, 2008 at 12:03 pm
[...] who are visiting your site for the first time. As stated in Tenni Theurer’s blog: “40-60% of visitors come to the site with an empty cache”. [...]
March 6, 2008 at 10:03 pm
[...] 80% of those page views are done with a primed cache (based on Yahoo!’s browser cache statistics). We’re down to 80M page [...]
March 18, 2008 at 9:11 am
Have you done any similar research to sites using https/ssl? It seems that the browsers cache usage is very different once the content is delivered with ssl in some cases never appearing to cache any content, even small images – though using css does seem to help.
March 22, 2008 at 8:40 am
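[...] To achieve certain visual effects, UI designers at many websites use a large number of images in page modules that users visit frequently. Research shows that in such cases, for sites with high user stickiness, setting an Expires header for these objects on the web server is essential; a great deal of bandwidth is saved this way, and so is cost. By the way, for things like CAPTCHAs, add a simple rule to filter them out. [...]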
August 3, 2008 at 9:04 pm
[...] for improving performance for first time visitors. As described in Tenni Theurer’s blog post Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]
August 19, 2008 at 12:25 am
Hi,
I’ve a lot of confusion about Header Expires. I know it enables the users to load the pages fast if the components like CSS, images, scripts etc… in cache. But the thing I want to know is What is the syntax to use this, and where to use? I mean where to use this Header Expires in the html file or in CSS file?
Your help is valuable to me and greatly appreciated.
Regards,
SreeRam.
August 19, 2008 at 9:33 am
@SreeRam: The Expires header can be set in your web server configuration. For example, Apache uses optional modules to include headers, including both Expires and Cache-Control. Use the ExpiresDefault directive to set an expiration date relative to the current date. For more information on this rule, take a look here.
November 1, 2008 at 6:06 am
I am managing a website that has hundreds of thousands of new unique visitors daily. Which means that they have empty cache, how to cope up with this problem in a high javascript/css/AJAX based website?
December 25, 2008 at 6:32 pm
for the best result, how long must we set expires header?
March 29, 2009 at 6:12 am
IE’s cache has been broken since the start, it will not check for new copies of a page at appropriate times, IE does not work with dynamic websites unless you effectively disable the cache. That is part of the reason you will always see 20% or more no cache page views.
April 15, 2009 at 5:37 am
Is it possible to specify conditional caching within page loads? i.e. cache the html but do not cache the javascript that it links to, for example?
April 16, 2009 at 7:01 am
for HTTP STATUS 304, we still need time to check if the media modified or not, right? how to remove the checking time?
June 14, 2009 at 7:21 am
[...] Performance Research, Part 2: Browser Cache Usage – Exposed! [...]
August 2, 2009 at 5:43 pm
[...] When you measure your pages, you must test them using both an empty and a primed cache. The general assumption is that 20 percent to 50 percent of your incoming requests are being done with an empty cache. This supposition was proven to be true in a test that Yahoo conducted. [...]
September 26, 2009 at 10:14 am
Surprised at the number of ‘use a unique filename for every rev. of the file’ and ‘make the filename an MD5 hash of its contents’ type comments.
Have you never heard of ETags? Or do most IE flavours not support it?
April 27, 2010 at 12:14 am
[...] I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed [...]
April 27, 2010 at 8:43 am
[...] and I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed [...]