ApplicationCache

Up until now, users of web applications have only been able to use those applications while connected to the Internet. Offline, web-based e-mail, calendars and other online tools have been unavailable and, for the most part, still are. Users may be able to reach some pages of sites they have visited through the browser’s local cache, but that is limited and inconvenient. If a user gets bumped offline in the middle of a task, such as writing an email or filling in a form, hitting submit can lose all the data entered. HTML5 contains several features that address the challenge of building web applications that keep working offline. These include a SQL-based database API for storing data locally, the localStorage API, online/offline events and status, and an offline application cache that keeps applications available when the user is offline.

In this section we are focusing on the Application Cache.

A book could be written about each of these topics. In this series of articles, we introduce each of the HTML5 APIs and related modules (GeoLocation, for example, is not in the HTML5 specification, but is often thought to be). Each article provides a brief introduction to its topic, with links to the specifications and other relevant articles. It’s worth knowing what is coming in HTML5 so that, when you need these features and they are supported on your target devices, you’ll already know about them.

Application Cache

For a while now, we’ve been able to view a web page offline by choosing ‘Save’ in the browser, which stores the HTML file and its associated media, but this method only works for static content. With the ubiquity of web-based applications, it is more important than ever that web applications be accessible when the user is offline: whether the user is offline for the duration of a 5-hour flight, or just temporarily offline because 3G connectivity is lost while driving… um, I mean, riding as a passenger. While browsers have long been able to cache components of a website, HTML5 addresses some of the difficulties of being offline with the ApplicationCache API.

Using the Application Cache gives your web application three advantages: 1) offline browsing, 2) faster reloads and 3) reduced server load. With offline browsing, your entire site (within limits) can be navigable even when a user is offline. Cached resources are stored locally, so they load much faster, and they are re-downloaded only when the manifest changes, which reduces server load. AppCache enables the local storage of up to 5MB per website in many browsers (or limits you to 5MB, depending on your perspective); the exact quota varies from browser to browser.

The Application Cache (or AppCache) enables you to specify which files should be cached and made available offline, enabling your website to work correctly, including page reload, when your user is not online.

For AppCache to work, include the manifest attribute in the opening <html> tag. Its value is the URL of a text file listing the resources that should be cached. In your HTML file, include manifest="URL_of_manifest".

<!doctype HTML>
<html manifest="resourcelist.manifest">
<meta charset="utf-8" />
<title>....

With the manifest attribute pointing to a valid manifest file, when a user visits this page the browser caches the files listed in the manifest, the manifest file itself and the current document, and makes them available even when the user is offline.

On later visits, the browser loads the page from the application cache and then fetches the manifest to check for changes. If the manifest has changed since the page was last visited, the browser re-downloads all the listed assets and re-caches them. The page keeps using the old versions until it is reloaded, or until script calls applicationCache.swapCache().
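As a minimal sketch of that update step, the TypeScript below listens for the updateready event and switches the page to the new cache. It assumes a browser that still implements the deprecated window.applicationCache object. Because recent TypeScript DOM typings may not describe that object, the sketch declares its own minimal AppCacheLike interface; activateUpdates and the reload callback are illustrative names, not part of any API.

```typescript
// Minimal shape of the (deprecated) window.applicationCache object,
// declared locally so the sketch does not depend on DOM typings.
interface AppCacheLike {
  readonly status: number;
  addEventListener(type: string, listener: () => void): void;
  swapCache(): void;
}

// Status constant defined by the ApplicationCache interface.
const UPDATEREADY = 4;

// Once the browser has finished downloading a changed manifest's resources
// in the background, switch to the new cache and reload the page.
function activateUpdates(cache: AppCacheLike, reload: () => void): void {
  const apply = (): void => {
    if (cache.status === UPDATEREADY) {
      // swapCache() makes the newest cache current for later loads, but
      // resources this page already loaded are not replaced...
      cache.swapCache();
      // ...so reload to rebuild the page from the fresh files.
      reload();
    }
  };
  cache.addEventListener("updateready", apply);
  // The download may already have finished before this script ran.
  apply();
}
```

In a page you might call activateUpdates((window as any).applicationCache, () => location.reload()). Reloading immediately is a design choice; a gentler approach would ask the user first.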

The cache manifest file

The .manifest file is a text file that lists the resources the browser should cache. Its first line must be the string CACHE MANIFEST. That line is followed by a list of files to be cached, optionally interspersed with comments and section headers.

To create a comment, make # the first character of the line; the rest of the line is ignored. Section headers change the current section, and there are three of them: CACHE:, FALLBACK: and NETWORK:. Files listed under the CACHE: header are explicit entries, as are files listed before any header (or in a manifest with no headers at all). The FALLBACK: header switches to the fallback section, where each line holds two URLs: the first is a URL or URL prefix, and if a resource matching it cannot be fetched, the second is served instead. Secure (https://) files may be cached, but they have to be from the same origin as the manifest. The NETWORK: header starts the online whitelist: files listed there are never cached, and requests for them bypass the cache and go to the network.
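These rules are simple enough to capture in a short parser. This is a simplified TypeScript sketch, not the full parsing algorithm from the specification. It handles the CACHE MANIFEST signature, comments, the three section headers and two-URL fallback lines, but it does not resolve relative URLs, skip unknown section headers, or enforce the same-origin rules. parseManifest and ParsedManifest are hypothetical names.

```typescript
type Section = "CACHE" | "FALLBACK" | "NETWORK";

interface ParsedManifest {
  explicit: string[]; // CACHE: entries, plus entries before any header
  fallback: Array<{ prefix: string; fallback: string }>;
  network: string[]; // the online whitelist
}

// Simplified manifest parser; see the lead-in for what it leaves out.
function parseManifest(text: string): ParsedManifest {
  const lines = text.split(/\r\n|\r|\n/);
  // The first line must be "CACHE MANIFEST" (optionally followed by
  // a space or tab and further text).
  if (!/^CACHE MANIFEST(?:[ \t]|$)/.test(lines[0])) {
    throw new Error("Manifest must start with CACHE MANIFEST");
  }
  const result: ParsedManifest = { explicit: [], fallback: [], network: [] };
  let section: Section = "CACHE"; // entries before any header are explicit
  for (const raw of lines.slice(1)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    if (line === "CACHE:" || line === "FALLBACK:" || line === "NETWORK:") {
      section = line.slice(0, -1) as Section; // drop the trailing colon
      continue;
    }
    const tokens = line.split(/[ \t]+/);
    if (section === "CACHE") {
      result.explicit.push(tokens[0]);
    } else if (section === "NETWORK") {
      result.network.push(tokens[0]);
    } else if (tokens.length >= 2) {
      // FALLBACK: lines pair a URL prefix with the URL served in its place.
      result.fallback.push({ prefix: tokens[0], fallback: tokens[1] });
    }
  }
  return result;
}
```

Running it on a manifest that adds a line such as "/ offline.html" under FALLBACK: would yield one fallback entry with prefix "/" and fallback "offline.html" (offline.html being a hypothetical page).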

The manifest file is also stored in the application cache and is replaced only when the file itself is edited. It must be served with the MIME type "text/cache-manifest", so you may need to add a custom file type to your web server configuration, or to an .htaccess file if you don’t have access to the server configuration.
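As a hedged sketch of the server side, the function below returns the headers a static file server might send, mapping the .manifest extension to text/cache-manifest. headersFor and its small extension table are illustrative assumptions. The Cache-Control: no-cache header is a commonly recommended extra that keeps HTTP caches from hiding manifest edits; AppCache itself does not require it.

```typescript
// Returns response headers for a requested path. Only a few types are
// mapped; a real server would use a complete MIME table.
function headersFor(path: string): Record<string, string> {
  const types: Record<string, string> = {
    ".manifest": "text/cache-manifest", // required for AppCache manifests
    ".html": "text/html; charset=utf-8",
    ".css": "text/css",
    ".js": "text/javascript",
  };
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  const headers: Record<string, string> = {
    "Content-Type": types[ext] ?? "application/octet-stream",
  };
  if (ext === ".manifest") {
    // Assumed best practice: make HTTP caches revalidate the manifest so
    // the browser sees edits when it re-fetches it.
    headers["Cache-Control"] = "no-cache";
  }
  return headers;
}
```

With Node’s http module, for example, the result could be passed to res.writeHead(200, headersFor(path)). On Apache, the .htaccess line AddType text/cache-manifest .manifest takes care of the MIME type.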

A .manifest file lists the files that the browser should cache. The file may look something like this:

CACHE MANIFEST
#version01

#files that are explicitly cached
CACHE:
index.html
css/styles.css
scripts/application.js

#Resources requiring connectivity
NETWORK:
signin.php
dosomething.cgi

The NETWORK: section matters for resources that must always come from the server. Early HTML5 drafts included an <event-source> element (since replaced by the EventSource interface of Server-Sent Events) that lets a server continuously stream updates to the page. Its src attribute should point to a white-listed NETWORK: file: NETWORK: files are never cached, so every request for them bypasses the cache and reaches the server, allowing continuous streaming from the server.

You’ll note that there is a version number in a comment on the second line. The browser ignores comments when deciding what to cache, but it compares the manifest byte for byte when checking for updates, so changing a comment counts as a change. Editing the commented version number has become the standard way of telling the browser that the manifest file should be considered updated.
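Bumping that comment by hand is easy to forget, so a build step can do it instead. The sketch below assumes the #versionNN convention from the example manifest, with the comment on line 2. bumpVersion is a hypothetical helper; any other change to the manifest’s bytes would work just as well.

```typescript
// Increments the "#versionNN" comment on line 2 of a manifest, keeping the
// number's zero padding. The comment is rewritten in "#versionNN" form.
function bumpVersion(manifest: string): string {
  const lines = manifest.split("\n");
  const match = /^#\s*version\s*(\d+)\s*$/i.exec(lines[1] ?? "");
  if (!match) {
    throw new Error('Expected a "#versionNN" comment on line 2');
  }
  const digits = match[1];
  const next = String(Number(digits) + 1).padStart(digits.length, "0");
  lines[1] = `#version${next}`;
  return lines.join("\n");
}
```

Applied to the example manifest, #version01 becomes #version02, which is enough for the browser to re-download every listed file on the next visit.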

Updating the cache

Once an application is cached, it remains cached until the user clears their browser’s data storage for your site, the .manifest file is modified, or the application cache is updated programmatically.
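For the programmatic route, the ApplicationCache interface provides an update() method that asks the browser to re-check the manifest without waiting for the next page load. The following is a minimal sketch, assuming the deprecated window.applicationCache object is available. UpdatableCache and requestUpdate are hypothetical names, and the status values are the interface’s UNCACHED (0) and OBSOLETE (5) constants.

```typescript
// Minimal shape of the parts of window.applicationCache used here.
interface UpdatableCache {
  readonly status: number;
  update(): void;
}

// Status constants from the ApplicationCache interface.
const UNCACHED = 0;
const OBSOLETE = 5;

// Asks the browser to re-check the manifest now. update() throws when the
// page has no application cache or the cache is obsolete, so those states
// are skipped. Returns true if an update check was started.
function requestUpdate(cache: UpdatableCache): boolean {
  if (cache.status === UNCACHED || cache.status === OBSOLETE) {
    return false;
  }
  cache.update();
  return true;
}
```

A long-running page could, for instance, call requestUpdate on a timer; if the manifest has changed, the download that follows fires the same updateready event as an update found on page load.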

Updating a file listed in the manifest does not, by itself, tell the browser that the assets must be re-cached. That is why we added the commented version number: changing it alters the .manifest file itself, which prompts the browser to re-cache the assets.
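An alternative to a hand-edited version number is to derive the comment from the contents of the cached files, so that editing any asset automatically changes the manifest. This sketch uses a 32-bit FNV-1a hash over each file’s path and text only because it is short and dependency-free. buildManifest, its in-memory Map of file contents and the #version- prefix are assumptions for illustration; a real build tool would more likely hash the files on disk with a stronger hash.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units, as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Builds a manifest whose second-line comment is a hash of the cached
// files' paths and contents (files maps path -> file text).
function buildManifest(files: Map<string, string>, network: string[]): string {
  // Sort paths so the hash does not depend on insertion order.
  const paths = Array.from(files.keys()).sort();
  // Hash paths as well as contents, so renaming a file also changes it.
  const combined = paths.map((p) => `${p}\n${files.get(p) ?? ""}`).join("\n");
  return [
    "CACHE MANIFEST",
    `#version-${fnv1a(combined)}`,
    "",
    "CACHE:",
    ...paths,
    "",
    "NETWORK:",
    ...network,
    "",
  ].join("\n");
}
```

The trade-off is that the version comment is no longer human-readable, but it can no longer be forgotten either.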

If the manifest file or a resource specified in it fails to download, the entire cache update process fails and the browser will keep using the old application cache.
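The all-or-nothing rule can be modeled in a few lines. This is a simulation of the behavior, not browser code. ResourceStore, updateCache and the synchronous download callback (which throws on failure) are hypothetical, chosen to keep the sketch short.

```typescript
// In-memory stand-in for an application cache: URL -> response body.
type ResourceStore = Map<string, string>;

// Models an AppCache update: every resource is downloaded into a staging
// store, and the new store replaces the old one only if all downloads succeed.
function updateCache(
  current: ResourceStore,
  urls: string[],
  download: (url: string) => string, // hypothetical fetcher; throws on failure
): ResourceStore {
  const staged: ResourceStore = new Map();
  try {
    for (const url of urls) {
      staged.set(url, download(url));
    }
  } catch {
    // A single failed download aborts the update; the old cache stays in use.
    return current;
  }
  return staged; // every download succeeded: the new cache takes over
}
```

Staging into a separate store is what keeps a half-finished update from ever being visible to the page.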

Published by

Estelle Weyl

My name is Estelle Weyl. I am a consulting web developer, am writing some books with O'Reilly, run frontend workshops, and speak about web development, performance, and other fun stuff all over the world. If you have any recommendations on topics for me to hit, please let me know via comments. If you want