Ultimate web site optimization trick: data URIs


Are you an obsessive-compulsive web site optimization nut? Are you willing to sacrifice maintainability for one less HTTP request? Have people ever asked you to seek a mental health professional because of your OCD? If so, then you are in good company. We optimization nuts may not be very well received when we start building a project, but we become very appreciated when someone asks how to make it scale. Depending on who you ask, web optimization is either the most useless talent ever or the single most important skill on a web developer’s resume.

Allow me to broaden your optimization toolbox with the ultimate HTTP connection killer, the data URI scheme. Most web developers have never heard of data URIs but they can dramatically reduce the number of HTTP connections required to download your web site.

This article will explain what data URIs are, how to use them, and how to properly implement them.

What exactly are data URIs?

The data URI scheme (aka data: URLs) is a method to include data in-line in a URI.

That probably doesn’t make much sense yet, so let me elaborate. Simply put, data URIs allow you to include a file (or multiple files) inside of another file. The most obvious and practical use is to embed background images inside a CSS file. You do this by base64 encoding a file and embedding the contents, along with some basic metadata, into a URI.

It is important to note that while the data URI scheme was defined in 1998 (in RFC 2397), it never reached the status of an Internet Standard. It is still on the list of “Proposed Standards”. However, all of the most popular modern browsers have implemented the data URI scheme.

Due to the IE problem (explained further down) data URIs can only be fully supported in CSS files.

Data URI scheme

The format is pretty straightforward. The general form of the data URL scheme, following the RFC, is:

data:[<MIME-type>][;charset=<encoding>][;base64],<data>

Okay, let me break that down for you and provide some real world examples.

Like any URI, it starts with a scheme identifier, data. Just like http, ftp, mailto or gopher, this tells the browser how to use the information that follows.

The first piece the browser needs to know is the MIME type of the data you are including (e.g. image/png or text/html). Over normal HTTP transfers the server typically identifies the MIME type from the file extension and sends it back with the file in the response headers. Since we are not transferring the data through an HTTP connection there is no automatic MIME type information, so we have to manually identify the type of data we are giving to the browser. If you leave it out, the RFC says the browser should assume text/plain;charset=US-ASCII.

Next is the charset identifier, which is exactly what it sounds like. If you are passing any kind of text document (e.g. plain text, html, xml) then you should tell the browser what kind of character encoding you are using for that text (e.g. UTF-8 or US-ASCII).

And the last piece of possible metadata is the base64 keyword. This tells the browser that the following data is base64 encoded.

All of this metadata is optional, depending on the data you are embedding. If you are passing an image you won’t include the charset information; likewise, if you’re passing a text document then you probably won’t be base64 encoding it and will omit the base64 keyword. (Unencoded text still has to be URL-encoded, as in the JavaScript example below.)
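
To make the format concrete, here is a minimal JavaScript sketch that assembles a data URI from its parts. It assumes Node.js (for the Buffer class), and the buildDataUri function and its option names are made up for this illustration; they are not part of any standard API.

```javascript
// Hypothetical helper: assemble a data URI from its optional parts.
// Assumes Node.js, which provides the global Buffer class.
function buildDataUri({ mimeType = '', charset = '', base64 = false, data }) {
	let header = 'data:' + mimeType;
	if (charset) {
		header += ';charset=' + charset;
	}
	if (base64) {
		// Binary data (or text we choose to encode) goes in as base64.
		return header + ';base64,' + Buffer.from(data).toString('base64');
	}
	// Unencoded text still has to be percent-encoded to be a valid URI.
	return header + ',' + encodeURIComponent(data);
}

// A text document: charset given, no base64 keyword.
const textUri = buildDataUri({
	mimeType: 'text/plain',
	charset: 'utf-8',
	data: 'Hello, world!'
});

// The same text as base64: the base64 keyword replaces percent-encoding.
const encodedUri = buildDataUri({
	mimeType: 'text/plain',
	base64: true,
	data: 'Hello, world!'
});

console.log(textUri);    // data:text/plain;charset=utf-8,Hello%2C%20world!
console.log(encodedUri); // data:text/plain;base64,SGVsbG8sIHdvcmxkIQ==
```

For an image you would pass the raw bytes as data, set base64 to true, and leave charset empty.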

Useful examples

Here are a few examples of data URIs at work.

CSS background

In the real world, CSS backgrounds are about the only place where data URIs can be used (see: IE problem below). In my opinion this is the single best use for this technique anyway. We can embed all of our icons and backgrounds inside of the very style sheet that defines the rules that use those images.

#logo {
	background-image: url('[*about 2000 more characters*]AAAABJRU5ErkJggg==');
}
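
Since the rule above is abbreviated, here is a rough sketch of how such a rule could be generated from an image’s bytes. It assumes Node.js; the backgroundRule helper is hypothetical, and the sample bytes are a tiny GIF commonly used as a 1x1 transparent placeholder, included only so the sketch has something small to encode.

```javascript
// Hypothetical helper: build a background-image rule with an embedded image.
// Assumes Node.js; imageBytes is a Buffer holding the raw file contents.
function backgroundRule(selector, mimeType, imageBytes) {
	const dataUri = 'data:' + mimeType + ';base64,' + imageBytes.toString('base64');
	return selector + ' {\n\tbackground-image: url(\'' + dataUri + '\');\n}';
}

// Sample input: a very small GIF, decoded here into raw bytes.
const pixel = Buffer.from(
	'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
	'base64'
);

const logoRule = backgroundRule('#logo', 'image/gif', pixel);
console.log(logoRule);
```

In a real build step the bytes would more likely come from a file on disk, for example via fs.readFileSync.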

JavaScript

If you don’t have to worry about old versions of IE then you can use data URIs anywhere, even in JavaScript for any kind of URL destination. If you need to open a new window with a basic layout but don’t want to host an otherwise useless file then embed the layout in a data URI.

window.open('data:text/html;charset=utf-8,%3C%21DOCTYPE%20html%3E%0D%0A%3Cht' +
	'ml%20lang%3D%22en%22%3E%0D%0A%3Chead%3E%3Ctitle%3EEmbedded%20Window%3C%' +
	'2Ftitle%3E%3C%2Fhead%3E%0D%0A%3Cbody%3E%3Ch1%3EDATA%3C%2Fh1%3E%3C%2Fbod' +
	'y%3E%0A%3C%2Fhtml%3E%0A%0D%0A', '_blank', 'height=300,width=400');
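
You don’t have to write that encoded string by hand. As a sketch, the same kind of URI can be produced with the standard encodeURIComponent function; the markup and variable names below are just for illustration. Keep in mind that some newer browsers restrict opening data: URLs in a top-level window, so treat the window.open call as a demonstration of the technique.

```javascript
// Build a text/html data URI by percent-encoding the markup.
// The markup and variable names are illustrative only.
const html = [
	'<!DOCTYPE html>',
	'<html lang="en">',
	'<head><title>Embedded Window</title></head>',
	'<body><h1>DATA</h1></body>',
	'</html>'
].join('\n');

const windowUri = 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
console.log(windowUri);

// In a browser, the URI would then be passed to window.open:
// window.open(windowUri, '_blank', 'height=300,width=400');
```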

Embedded images

If you don’t need to support IE 6/7 then you can embed an image inside of the image tag itself with a data URI. This probably isn’t very useful, but it is a good demonstration of the technique.

<a href="http://www.linkedin.com/in/stevenbenner" target="_blank">
	<img src="[*about 2000 more characters*]AAAABJRU5ErkJggg==" />
</a>

Embedded links

Even Internet Explorer 8 won’t let you embed data in links because of security considerations, but it is another good demonstration of the technique.

<a href="[*about 2000 more characters*]AAAABJRU5ErkJggg==">
	Link to Twitter logo
</a>

The Internet Explorer problem

As usual, Microsoft Internet Explorer screws up everything: data URLs are not supported in any version prior to Internet Explorer 8. As much as I may want to add this to the hate list, I honestly cannot fault Microsoft for not supporting this until version 8. As I said earlier, the data URI scheme is still on the list of Proposed Standards. So the fault rests entirely on the shoulders of the IETF, which publishes the RFCs, for not advancing this one in the last 10 years.

On top of the problem of old versions, Internet Explorer 8 has some restrictions of its own.

  • data must be smaller than 32KB
  • data URIs cannot be used for JavaScript
  • only object, img, input and link HTML tags can use data URIs

However, you can use data URIs in any CSS url statements, so there is still plenty of use for data URIs in the latest version of Internet Explorer.
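
The 32KB restriction is easy to check before you embed a file, because base64 output size is predictable: every 3 bytes of input become 4 characters. The sketch below assumes the limit applies to the whole URI (header plus encoded data) and treats it as 32,768 characters; the function names are hypothetical.

```javascript
// Assumption: the IE 8 limit is 32 KB (32,768 characters) for the whole URI.
const IE8_LIMIT = 32 * 1024;

// Length of the base64 text produced from byteCount raw bytes:
// every 3 bytes become 4 characters, with the last group padded.
function base64Length(byteCount) {
	return 4 * Math.ceil(byteCount / 3);
}

// Hypothetical check: would a file of this size fit once embedded?
function fitsInIe8(byteCount, mimeType) {
	const header = 'data:' + mimeType + ';base64,';
	return header.length + base64Length(byteCount) <= IE8_LIMIT;
}

console.log(base64Length(24000));           // 32000
console.log(fitsInIe8(24000, 'image/png')); // true
console.log(fitsInIe8(25000, 'image/png')); // false
```

Under these assumptions the practical ceiling is an image of roughly 24 KB before encoding.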

Graceful degradation options for older versions of IE

There are two options for graceful degradation:

  • Make regular images available and use regular url links as a fall-back.
  • Or, use the MHTML technique.

Personally, I’m not a fan of the MHTML technique because it greatly increases the size of your CSS file (doubling the number of embedded images). In my opinion, this is just too high of a price to pay. It is far simpler to make the images available on your server and just link to them with regular CSS url statements. At least it is for me. If for some reason you would rather take the file size hit instead of publishing the image files then there is a good MHTML how-to article here.

For this article I will use CSS url fall-backs for graceful degradation.

IE 6-7 specific CSS degradation

There are two ways to implement your CSS fall-back statements to target IE 6 and 7:

  • Use Internet Explorer revision targeting with conditional comments
  • Or, use the asterisk CSS hack.

Both of these are good choices, with simple pros and cons. IE revision targeting with conditional comments requires an additional CSS file specifically for older versions of IE, but in return it allows you to craft a style sheet that validates. The asterisk hack saves you the trouble of having an extra file but creates an invalid CSS statement.

Personally, I am anal about writing valid CSS and often have a separate CSS file for IE anyway (for the IE PNG fix behavior statement), so I use IE revision targeting statements.

I’ll leave the choice up to you, so here are examples of both techniques:

Internet Explorer revision targeting

I’m sure you have seen this technique countless times before, but here it is again. Internet Explorer supports version targeting statements in HTML comment blocks. These conditional comments are quite smooth because they produce valid markup and guarantee, for all time, that your code will only be parsed by the versions of Internet Explorer that you target.

The following snippet demonstrates how you include a CSS file that will only be seen and downloaded by Internet Explorer versions 7 and older:

<link rel="stylesheet" type="text/css" href="base.css" />
<!--[if lte IE 7]>
<link rel="stylesheet" type="text/css" href="old_ie.css" />
<![endif]-->

In this example base.css is seen by all browsers but old_ie.css is only seen by IE 7 and older. Place this code in your HTML head block and the statements you place in old_ie.css will override the rules from base.css in old versions of Internet Explorer. For example:

base.css

#logo {
	background-image: url('[*about 2000 more characters*]AAAABJRU5ErkJggg==');
}

old_ie.css

#logo {
	background-image: url(/images/logo.gif);
}

Now when a modern browser visits the page it will properly use the embedded background image, but when an old version of IE visits the site it will use the rule from old_ie.css.

The asterisk hack

A less well known technique for targeting old versions of IE is the so-called Asterisk Hack. This is a nifty little trick that someone, somewhere figured out. If you prepend an asterisk to a CSS property name then that declaration will only be processed by Internet Explorer versions 6 and 7.

Here is an example of this hack:

#wrapper {

	/* Processed by all browsers */
	background-image: url('[*about 2000 more characters*]AAAABJRU5ErkJggg==');

	/* Only seen by IE 6 & 7 */
	*background-image: url('/images/graphic.png');

}

Simple! Unfortunately this is invalid CSS, which irks me, but it is a clean and efficient hack. Only IE 6 and 7 will see the second background statement, and since it comes after the default data URI statement, it will override it with a link to the image.
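
If your style sheets are generated by a build script, the pairing of data URI and fall-back can be automated. Here is a minimal sketch; the ruleWithFallback helper is hypothetical and the data URI passed in is a placeholder string, not real image data.

```javascript
// Hypothetical helper: emit a rule that embeds an image for modern browsers
// and falls back to a regular URL for IE 6 and 7 via the asterisk hack.
function ruleWithFallback(selector, dataUri, fallbackUrl) {
	return [
		selector + ' {',
		'\t/* Processed by all browsers */',
		'\tbackground-image: url(\'' + dataUri + '\');',
		'\t/* Only seen by IE 6 & 7 */',
		'\t*background-image: url(\'' + fallbackUrl + '\');',
		'}'
	].join('\n');
}

const wrapperRule = ruleWithFallback(
	'#wrapper',
	'data:image/png;base64,PLACEHOLDER', // placeholder, not a real image
	'/images/graphic.png'
);
console.log(wrapperRule);
```

The order matters: the asterisk declaration has to come after the data URI declaration so that it overrides it in IE 6 and 7.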

How to get encoded files for data URIs

I built a little tool that allows me to quickly encode files on my computer. Here is the download if you would like to use it. This isn’t my finest work. It is just a basic tool that does the job in 20 lines of code, so don’t expect a lot of bells and whistles.

Steve’s Encode64 Utility


Click here to download

How to generate base64 strings in code

If you need to dynamically generate base64 strings then you’re in luck: this task is quite simple. Here are a couple of code examples to get you on your way:

PHP

$encoded_string = base64_encode(file_get_contents('logo.png'));

C#

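// Requires: using System; using System.IO;
private string Base64Encode(string fileToEncode)
{
	byte[] encodeBuffer;

	// Open read-only so files we cannot write to can still be encoded.
	using (FileStream fs = new FileStream(fileToEncode, FileMode.Open, FileAccess.Read))
	{
		int length = Convert.ToInt32(fs.Length);
		encodeBuffer = new byte[length];

		// Read may return fewer bytes than requested, so loop until done.
		int offset = 0;
		while (offset < length)
		{
			int read = fs.Read(encodeBuffer, offset, length - offset);
			if (read == 0)
			{
				throw new EndOfStreamException();
			}
			offset += read;
		}
	}

	return Convert.ToBase64String(encodeBuffer);
}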
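
JavaScript (Node.js)

For completeness, here is a rough equivalent for server-side JavaScript. It assumes Node.js and its built-in fs module; the base64EncodeFile name is made up for this sketch.

```javascript
const fs = require('fs');

// Hypothetical helper: read a file and return its contents as base64.
function base64EncodeFile(path) {
	// readFileSync without an encoding returns a Buffer of raw bytes.
	return fs.readFileSync(path).toString('base64');
}

// Example usage, assuming logo.png exists in the working directory:
// const encoded = base64EncodeFile('logo.png');
// const dataUri = 'data:image/png;base64,' + encoded;
```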

Conclusion

I think data URIs are an awesome and powerful optimization trick. You can save many extra HTTP connections, and the associated latency and overhead, for all of your users who are running a modern browser. This will make your site much snappier, especially if you use a lot of icons.

Using these techniques you can have a fully compatible web site that will gracefully degrade to traditional external resources if you can accept the following restrictions:

  • Only images
  • Only in CSS files
  • Less than 32KB each

I can see this working well for background images and for icons. With data URIs you can embed all of your presentation resources into the same CSS file that uses them. This file will grow much larger than normal but it will cache well and save in overall bandwidth.

Updated: Jul 29th, 2012

Comments

  1. Steven,

    Very nice write-up. One other thing to mention is the importance of gzipping any stylesheets that contain data URIs. The base64 encoding will increase file size by about 33%, but once you apply compression, the file size will return to roughly the original size.

  2. Hi Rob, Thanks for the comment!

    Indeed, gzip is very important to the equation. Most commercial hosts are set to gzip CSS files by default. This will greatly reduce the download size of the files and make the data URIs very efficient.

    For people running their own Apache server who don’t know the syntax, you enable gzip compression by enabling mod_deflate and adding the following line to your config file:

    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript

    I don’t remember off hand how to do it in IIS but it is very simple, just google it.

  3. The Wro4j (Web Resource Optimizer) framework provides support for data URIs when dealing with image backgrounds in CSS. Besides that, it merges, minimizes & gzips the static resources (CSS & JavaScript): http://code.google.com/p/wro4j/wiki/Base64DataUriSupport

    Alex

  4. I just wanted to point out that you have an error in your C# code. You are missing a return and/or your return is in the wrong scope (depending on how you look at it). :)

  5. @Alex

    Interesting project! I don’t use Java but I’m sure someone reading this will find it useful. Thanks for the link.

    @Aaron

    Ahh, of course you are technically correct, the best kind of correct. The return should be outside of the using statement. I just updated the function. For the record the original version compiles and runs just fine. :)

  6. Cany0000n

    I like it. Thanks

  7. I prefer to use the CSS sprite technique to reduce the number of requests required to load my pages.
    In my opinion it caches better and doesn’t compromise the maintainability of the markup.
    However, thanks for the great tutorial and for the simple tool to calculate the base64 encoding ;)

  8. But that’s the best part of data URIs! You can still use your sprites, except even more effectively. Data URIs enhance what you’re trying to accomplish with sprites. I’ve taken to embedding my sprites into the CSS file, with a graceful degradation fallback to the standard sprite image on the server.

    So long as your server is properly configured for optimal caching then a CSS file will cache just as well as an image. With this technique there’s one less request for modern browsers.

    Maintainability is slightly reduced because you have to remember to update the CSS file whenever you update the sprite, but sprites themselves are a huge maintainability sacrifice, since you have to keep a master composite image and append new images as needed. What’s even worse, if you need to remove one image then you also have to modify the CSS background-position for every image that follows it.

    But we’ve all accepted this as an acceptable trade off for the reduced number of HTTP connections. I think data URIs are just one more little step in this direction.

  9. I am using picasa for referring to image. So, I think this may not be useful for me.

  10. Nice post Steven. Thanks for sharing.
