Web cache

A browser cache [bɹaʊ̯zə(ɹ) kæʃ] is a buffer store of the web browser in which previously retrieved resources (such as text or images) are kept as a copy on the user's computer (locally). If a resource is needed again later, it is available from the cache faster than if it had to be downloaded again from the World Wide Web.

Each time the content of a URL is needed to display a page, the cache is checked first to see whether that content is already present.

An advantage is that network traffic and the time to download all parts of a website can be greatly reduced. The disadvantage is that the cached data can be outdated when the web page has been updated in the meantime.
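The lookup-before-download behavior, together with both the advantage and the disadvantage just described, can be sketched in a few lines. This is only a minimal illustration: the class name SimpleCache, its methods, and the fetch callback are hypothetical and do not correspond to any real browser's implementation.

```python
from typing import Callable, Dict


class SimpleCache:
    """Maps each URL to the content last retrieved for it (illustrative only)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # hypothetical download function
        self._store: Dict[str, bytes] = {}   # URL -> last known content

    def get(self, url: str) -> bytes:
        # Check the cache first; go to the network only on a miss.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def reload(self, url: str) -> bytes:
        # Ignore the stored copy and replace it with a fresh retrieval.
        self._store[url] = self._fetch(url)
        return self._store[url]
```

In this sketch a second get() for the same URL causes no network traffic, but it also keeps returning the old copy after the origin has changed, until reload() is called.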

Areas of application

A "resource" is anything that can be retrieved via a specific URL. Different content may be present at the same URL at different times. The cache assigns to each URL the content most recently known for it.

The same concept as a browser cache on a user's PC is also used by so-called proxy servers for entire computer networks, for example for a single site or university at its connection point to the Internet, or for all customers of a (physical) telecommunications provider in a service area. They deliver frequently requested files (such as the WP globe) directly to all participants in the network, without first having to go out to the actual Internet.

In principle, caching is possible for any protocol that makes resources available. In practice, however, it is used only for HTTP/HTTPS. If a file download is requested with the browser via FTP, a fresh version is obtained at that moment; a proxy server could, however, hold copies of frequently requested files. mailto: sends data and does not retrieve any. Other protocols for resources offer no specific support for version management and are hardly in use any more.

User control

In a browser, users generally have the following options for influencing the behavior of the cache, either interactively or through configuration settings:

  • Set the maximum amount of disk space; with a value of zero, no caching takes place.
  • Force a fresh retrieval from the server (this affects the URL visible in the address bar: a single page or an image).
  • Retrieve the current page and all resources embedded in it (images, scripts, styles, etc.) anew (this usually applies only to HTML pages).
  • Empty the entire cache, possibly also selectively for all resources of a particular domain (that of the current page).
  • Empty the entire cache at the end (or, failing that, at the beginning) of each session.
  • Set a maximum age for resources.

Cache management

The following principles apply on the part of the browser or proxy server:

  • No system is obliged to observe any hints given by the origin of the resource.
  • In the interest of an attractive product, browser developers have an interest in evaluating cache management information so that pages are displayed both quickly and up to date.

Extensive guidance was given in RFC 2068 in 1997 and in RFC 2616 in 1999, but it need not be fully implemented.

Web server

A web server should supply cache information (metadata) along with each resource, in order to give users an up-to-date representation and to keep the communication overhead as low as possible for users as well as servers.

The operator of the web server benefits from not having to answer requests for unchanged resources over and over and expend computing and network capacity on them.

To measure how often pages are visited and to obtain information about readers, on the other hand, very small embedded snippets are used; their inclusion in the cache is suitably prevented, which forces a new request every time the page is displayed.

Version identification

For cache management, one of two identifiers is optionally used for each resource, provided the server supplies it:

  • Last-Modified: timestamp of the last change
  • ETag: identifier of the version

In the absence of such information, cache management knows only the time of the last successful retrieval.
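As a rough illustration of how these identifiers might be kept with a cached copy and handed back to the server later, consider the following sketch. The names CacheEntry and validation_headers are hypothetical; the request fields If-None-Match and If-Modified-Since are the HTTP counterparts of ETag and Last-Modified.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """One cached resource together with its version information."""
    url: str
    body: bytes
    fetched_at: float                    # always known: time of last successful retrieval
    etag: Optional[str] = None           # value of the ETag field, if the server sent one
    last_modified: Optional[str] = None  # value of the Last-Modified field, if sent


def validation_headers(entry: CacheEntry) -> Dict[str, str]:
    """Build the request fields that ask the server whether the copy is still valid."""
    headers: Dict[str, str] = {}
    if entry.etag is not None:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified is not None:
        headers["If-Modified-Since"] = entry.last_modified
    # An empty result means only fetched_at is known, so no conditional
    # request is possible and the resource would have to be fetched in full.
    return headers
```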

Methods

The following methods are available:

  • Keep resources indefinitely as long as the available disk space suffices; delete suitable ones only when space runs short.
  • Resources that were used recently may well be needed again soon. Resources not requested for a long time are to be deleted.
  • Resources that are requested often are to be retained. Resources that are requested rarely and have not been requested for a long time are to be deleted.
  • File size: delete very large resources that are rarely needed and have not been needed for a long time first; tend to keep the many small resources.
  • Stability: content that has changed several times is a candidate for deletion. Resources that have remained unchanged over years and across several validation checks should be kept if they are used often. Short validity periods indicate volatility, although a minimum lifetime that has just been exceeded does not necessarily mean the resource is useless.
  • POST data sent along with the URL mean in most cases that such pages cannot be retrieved repeatably, and they are usually not stored in the cache. Storing them would also be quite dangerous, because such data often change the content of the target page, and a necessary message to the server would not be sent.
  • If a query introduced by ? is detected in the URL (for example in a database query), algorithms originally refrained from storing the resource, because combining the query parameters produced many different URLs that were never reused. Increasingly, however, content management systems present all their pages statically in this URL format, so this assumption is no longer reliable.
  • Header fields for the validity period:
  • Expires: with a specific point in time, and
  • Cache-Control: max-age= in seconds as a relative indication
  • Expiry of the validity period does not necessarily cause deletion from the cache, however, but only a validity check, which can lead to an extension of the validity period without the content changing.
  • If the time given in Expires already lies in the past at the moment of the request, this version must not be placed in the cache; information on this URL should be deleted.
  • If the server provides no information on the validity period, it can be inferred from the time of the last modification and, where applicable, from the behavior recorded by the cache management, whether the resource changes frequently or is constant: if it was last changed three years ago, the resource is probably quite stable; if it is only a quarter of an hour old, or has changed twice during the last day, its freshness should be checked at short intervals. How exactly cache management deals appropriately with missing meta-information is left to the intelligence of the programmer. It would be clumsy, and would eat time as well as network capacity, to retrieve a large file from the web server every time just because information is missing.
  • In the user configuration, a maximum age of, say, two weeks might have been specified.
  • Cache-Control: no-cache
  • Cache-Control: max-age=0
  • Query the server for the short HEAD information of the resource (initially without the full content), evaluate the result, and only then, if necessary, request the content as well (GET).
  • Send the known version information (Last-Modified/ETag) to the server. The server responds either with the HTTP status code 304 Not Modified (the version is still valid) or sends a new version (200 OK); at worst, the answer is now 404 Not Found.
  • Once the page has been displayed, those files for which later reuse appears possible are incorporated into the data structure of the cache; all other files are marked as temporary and deleted in due course.
  • With particularly sensitive resources (such as financial transactions or account data), traces may remain on the hard disk, for instance because "deleting" a file only removes it from the visible file system and does not immediately mean physical overwriting. If the browser or the computer crashes, files could be left behind; even after apparent deletion, the physical hard disk might contain sensitive information that would be readable after logging in to the relevant user account.
  • Resources marked with Cache-Control: no-store should not be cached on the hard disk, but kept only volatilely in the browser's main memory.
  • However, the concept has a gap: only rarely does the operating system offer an application the ability to request main memory that is guaranteed not to be swapped out to the page file on the hard disk.
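The validity rules listed above (max-age as a relative value, Expires as an absolute time, and an estimate from the last modification when neither is present) can be combined roughly as in the following sketch. The function names are hypothetical, the header dictionary is assumed to use the canonical field spellings, and the factor of 10 % of the resource's age is an assumed heuristic chosen for illustration, not a rule stated in this article.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _http_date_to_epoch(value: str) -> Optional[float]:
    """Parse an HTTP date string; return None if it cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:            # treat dates without a zone as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def freshness_lifetime(headers: Mapping[str, str], fetched_at: float,
                       heuristic_fraction: float = 0.1) -> Optional[float]:
    """Seconds after retrieval for which the copy may be used unchecked.

    Returns None when the server sent nothing to base a decision on.
    """
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-cache" in cache_control:      # always check with the server first
        return 0.0
    # The lookbehind keeps "s-maxage" and similar names from matching.
    match = re.search(r"(?<![\w-])max-age\s*=\s*(\d+)", cache_control)
    if match:                            # relative indication in seconds
        return float(match.group(1))
    if "Expires" in headers:             # absolute point in time
        expires = _http_date_to_epoch(headers["Expires"])
        return 0.0 if expires is None else max(0.0, expires - fetched_at)
    if "Last-Modified" in headers:       # assumed heuristic: long unchanged, long valid
        modified = _http_date_to_epoch(headers["Last-Modified"])
        if modified is not None:
            return max(0.0, (fetched_at - modified) * heuristic_fraction)
    return None


def is_fresh(headers: Mapping[str, str], fetched_at: float, now: float) -> bool:
    """True if the cached copy may be shown without asking the server."""
    lifetime = freshness_lifetime(headers, fetched_at)
    return lifetime is not None and now - fetched_at < lifetime
```

A copy that is no longer fresh is not deleted in this sketch; it would merely be revalidated with the server, as described in the list above.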

Safety aspects

From a user's cache, conclusions can be drawn about which topics were accessed.

  • A user should have an individual cache on his or her computer: for each user account, a separate area protected against reading by other users. Around 2000/2005, the central cache directories customary until then, which were shared by all users of a PC and could be read by any of them, were replaced by individual caches whose access rights are limited to the logged-in operating system user and which are additionally limited to one user profile of the browser each.
  • Some browsers offer a "private mode". Among other things, an additional cache data structure is created that takes up all resources retrieved from then on. The "normal" cache may be used in parallel for reading, but no information may be written to it. When private mode ends, the extra cache is emptied.
  • HTTPS, by contrast, has no effect on caching on the part of the individual requesting user, who after all knows the URL and the decrypted content. However, some browsers offer an individual configuration setting according to which content retrieved via HTTPS should not be cached. Because HTTPS has become widespread for wireless connections, the use of this protocol hardly allows any conclusion about a special confidentiality requirement of the pages transmitted.

Proxy

Some fields are intended specifically for proxy servers, that is, for any intermediate stations between the browser and the server that is the actual origin of the data. These intermediaries can keep frequently requested resources ready in a "shared cache" for all participants of the network (or users of the computer).

  • Cache-Control: s-maxage=nSeconds (like max-age, but applies only to shared caches; "s" = "shared")
  • Cache-Control: private (do not store in a shared cache)
  • Cache-Control: public (explicitly released for storage in a shared cache)
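How a cache might apply these three directives differently depending on whether it is a shared cache or a single user's browser cache can be sketched as follows. The function names and the representation of the directives as an already parsed dictionary are assumptions made for illustration; no-store is included because it forbids storage in any cache.

```python
from typing import Mapping, Optional, Union

Directives = Mapping[str, Union[int, bool]]


def may_store(directives: Directives, shared: bool) -> bool:
    """Decide whether a cache of the given kind may keep the response."""
    if "no-store" in directives:
        return False                     # applies to every cache
    if shared and "private" in directives:
        return False                     # reserved for the individual user's cache
    return True                          # "public" or no restriction given


def lifetime_for(directives: Directives, shared: bool) -> Optional[int]:
    """Return the validity period in seconds that applies to this kind of cache."""
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])   # takes precedence in shared caches
    if "max-age" in directives:
        return int(directives["max-age"])
    return None                              # no period stated
```

For a response carrying public, max-age=3600 and s-maxage=60, this sketch lets a proxy keep the copy for 60 seconds while a browser keeps it for an hour.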

HTTP caching

In summary, the following main HTTP fields affect caching, provided they are supplied by the web server or sent along with the request:

  • Last-Modified: timestamp
  • ETag: version
  • Expires: timestamp
  • Cache-Control: max-age, must-revalidate, no-cache, no-store, private, public, s-maxage, and more
  • If-Modified-Since (request field corresponding to Last-Modified)
  • If-None-Match (request field corresponding to ETag)
  • Pragma: no-cache (bypass proxy servers)
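Since several of these directives travel together in a single Cache-Control field, a cache first has to split the field value into its parts. The following is a minimal sketch of such a parser; the function name parse_cache_control is hypothetical, and the simple split on commas is a stated simplification that would mishandle quoted arguments containing commas.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[int, str, bool]]:
    """Split a Cache-Control field value into a dictionary of directives."""
    directives: Dict[str, Union[int, str, bool]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue                          # tolerate empty list elements
        name, separator, argument = part.partition("=")
        name = name.strip().lower()           # directive names are case-insensitive
        argument = argument.strip().strip('"')
        if not separator:
            directives[name] = True           # flag such as no-store or public
        elif argument.isdigit():
            directives[name] = int(argument)  # number of seconds
        else:
            directives[name] = argument       # keep any other argument as text
    return directives
```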

The same information that a web server transmits in addition to the content can also be integrated into an HTML document (as meta elements with the http-equiv attribute) and can override the server's default data if necessary.
