Uniform Resource Locator

A Uniform Resource Locator (abbreviated URL) identifies and locates a resource, such as a website, by specifying the access method to be used (for example, a network protocol such as HTTP or FTP) and the location of the resource in computer networks. URLs were specified in RFC 1738; their generic syntax is now defined by RFC 3986. RFCs are specifications published by the Internet Engineering Task Force (IETF) that serve as industry standards.

URLs are a subset of the general identifiers known as Uniform Resource Identifiers (URIs). Because URLs were the first and are the most common type of URI, the two terms are often used interchangeably. In everyday language, URLs are also called Internet addresses or web addresses; since the Internet and the WWW are frequently equated colloquially, these terms usually refer specifically to the URLs of websites.

List of allowed characters

The following characters may, but need not, be percent-encoded:

ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 - _ . ~

In certain cases, the following characters must be encoded using percent-encoding:

! * ' ( ) ; : @ & = + $ , / ? % # [ ]
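Percent-encoding replaces a byte with "%" followed by its two hexadecimal digits. The following is a minimal Python sketch, assuming the text is first converted to UTF-8 (as RFC 3986 recommends for new schemes) and that every character outside the unreserved set is encoded. Real applications decide per URL component which reserved characters to keep as delimiters. The function name `percent_encode` is hypothetical; the standard library's `urllib.parse.quote` is used only for comparison.

```python
from urllib.parse import quote, unquote

# The unreserved characters listed above never need to be encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every UTF-8 byte that is not unreserved."""
    encoded = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in UNRESERVED:
            encoded.append(char)            # keep unreserved characters as they are
        else:
            encoded.append(f"%{byte:02X}")  # e.g. "&" -> "%26", "ä" -> "%C3%A4"
    return "".join(encoded)


if __name__ == "__main__":
    raw = "city=Aachen & more/ä"
    mine = percent_encode(raw)
    print(mine)                          # city%3DAachen%20%26%20more%2F%C3%A4
    print(mine == quote(raw, safe=""))   # True: same result as the standard library
    print(unquote(mine) == raw)          # True: decoding restores the original
```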

Further examples can be found in RFC 3986 and on the page http://www.w3.org/Addressing/URL/uri-spec.html.

Structure

The basic structure of a URL consists of a scheme name, which defines the access method, and a scheme-specific part, separated by a colon:

<scheme>:<scheme-specific-part>

The scheme name often, but not necessarily, matches the name of the underlying network protocol (this is the case for ftp or http, for example, but not for mailto or file).

Possible URL elements, using http as an example:

http://hans:geheim@example.org:80/demo/example.cgi?country=de&city=aa#history

  • http: scheme (here identical to the network protocol)
  • hans: user
  • geheim: password
  • example.org: host
  • 80: port
  • /demo/example.cgi: URL path
  • country=de&city=aa: query
  • history: fragment
  • everything after "http:": the scheme-specific part

For mailto:

mailto:[email protected]

  • mailto: scheme (in this case not a network protocol)
  • the part after the colon: e-mail address according to RFC 822

For news (in this example, neither a network protocol nor a host address is included):

news:alt.hypertext

  • news: scheme
  • alt.hypertext: name of the newsgroup

Or for file:

file:///C:/foo/bar.txt

  • file: scheme
  • /C:/foo/bar.txt: path to the file (local, i.e., in the file system of the computer that interprets the URL)

Strictly speaking, this scheme has the form file://<host>/<path>, but the host part is practically never used: since the file scheme offers no way of specifying a network protocol for accessing the file, it cannot usefully be used across a network. File URLs are used, for example, in the Java programming language to access local files in a uniform way. Depending on the browser, opening file links is often possible only after special client-side configuration or with the help of add-ons.
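The decomposition of the http example can be reproduced with Python's standard `urllib.parse.urlsplit`, which follows the generic URI syntax. This is a sketch; the mailto address below is a hypothetical placeholder. For schemes without a host part, everything after the colon ends up in `path`.

```python
from urllib.parse import urlsplit

# The http example, split into its components.
parts = urlsplit("http://hans:geheim@example.org:80/demo/example.cgi?country=de&city=aa#history")
print(parts.scheme)    # http
print(parts.username)  # hans
print(parts.password)  # geheim
print(parts.hostname)  # example.org
print(parts.port)      # 80
print(parts.path)      # /demo/example.cgi
print(parts.query)     # country=de&city=aa
print(parts.fragment)  # history

# mailto and news have no "//"; for file the host between "//" and the next "/"
# is empty. In all three cases netloc is therefore empty.
for url in ("mailto:someone@example.org",  # hypothetical address
            "news:alt.hypertext",
            "file:///C:/foo/bar.txt"):
    p = urlsplit(url)
    print(p.scheme, repr(p.netloc), p.path)
```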

Scheme

The scheme specifies the technical method by which the resource is to be addressed. It is generally, but not necessarily, identical to the network protocol through which the resource can be located. Examples are HTTP, HTTPS and FTP, but also mailto (for writing an e-mail) and file (for accessing local files).

Scheme- specific- part

Depending on the scheme, different specific information is necessary and possible. In most cases, the scheme name is followed by the string ://, but some schemes define only the colon. The following examples are based on the HTTP protocol.

User / password

If required, a user name (user) and password (password) for logging in can be transmitted. They are separated from each other by a colon and precede the host, from which they are separated by an at sign (@).

Although HTTP was chosen for this example, specifying a user name and password as part of the URL is not part of the HTTP specification. A browser is expected to accept this URL syntax but to ask the user whether they really want to log in with the specified data. Internet Explorer 6 (Windows XP SP2) and later versions deviate from this by rejecting this URL syntax outright as faulty. A registry entry can force them to behave like their predecessors up to version 5.5, which accept the credentials without asking and pass them directly to the server.

For some other protocols, such as FTP, however, specifying the user data in the form shown is entirely correct and covered by the standards.
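Because ":" separates the user from the password and "@" separates the credentials from the host, these characters must be percent-encoded when they occur inside a user name or password. The following is a minimal Python sketch for a protocol such as FTP, where this form is permitted. The helper `with_credentials` and all sample values are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def with_credentials(scheme: str, user: str, password: str, host: str, path: str = "/") -> str:
    """Hypothetical helper: build scheme://user:password@host/path with safe userinfo."""
    # safe="" also encodes ":" and "@", which would otherwise be read as delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{userinfo}@{host}{path}"


url = with_credentials("ftp", "hans", "p@ss:word", "ftp.example.org", "/pub/")
print(url)  # ftp://hans:p%40ss%3Aword@ftp.example.org/pub/

parts = urlsplit(url)
print(parts.hostname)           # ftp.example.org
print(unquote(parts.password))  # p@ss:word (urlsplit itself does not decode)
```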

Host

The host can be given as an IPv4 address in decimal notation separated by dots, as an IPv6 address in hexadecimal notation separated by colons and enclosed in square brackets, or as an FQDN (fully qualified domain name).
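This Python sketch tells the three host forms apart using the standard `ipaddress` module. `host_kind` is a hypothetical helper. A host that is not an IP literal is simply assumed to be a name; whether it is a valid FQDN is not checked here.

```python
import ipaddress
from urllib.parse import urlsplit


def host_kind(url: str) -> str:
    """Hypothetical helper: classify the host of url as IPv4, IPv6 or name."""
    host = urlsplit(url).hostname  # the square brackets around IPv6 are removed here
    if host is None:
        return "none"
    try:
        return f"IPv{ipaddress.ip_address(host).version}"
    except ValueError:
        return "name"  # assumed to be a domain name; not validated


for url in ("http://192.0.2.10/",
            "http://[2001:db8::1]:8080/index.html",
            "http://www.example.com/"):
    print(url, "->", host_kind(url))
```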

Port

Specifying the port makes it possible to address a particular TCP/IP port. If no port is specified, the default port of the protocol is used, for example 80 for HTTP, 443 for HTTPS and 21 for FTP.
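The fallback to a default port can be sketched in Python as follows. The table contains only the three defaults named above; `effective_port` is a hypothetical helper, and a real client would know many more schemes.

```python
from typing import Optional
from urllib.parse import urlsplit

# Default ports named in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> Optional[int]:
    """Return the explicit port of url, else the scheme's default (None if unknown)."""
    parts = urlsplit(url)
    if parts.port is not None:  # an explicit ":port" always takes precedence
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://example.org/"))        # 80
print(effective_port("https://example.org:8443/"))  # 8443
```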

Path

The path identifies a specific resource on the server (it may correspond to the directory structure of the target system, e.g., to a file or directory). The path can also be empty. An empty path can optionally be replaced by a slash and is equivalent to it.

The interpretation (file or directory; deliver a text file or run a script) is left to the server. A typical example of this freedom of interpretation is the behavior when a client requests the path /: depending on its configuration, the server may deliver the content of a specially designated file (such as /index.html, /README or /HEADER) without this being apparent to the requesting client. Alternatively, depending on the protocol, the server can explicitly refer the client to this resource or output a directory listing.
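For http and https, RFC 3986 (section 6.2.3) treats an empty path as equivalent to "/". The following is a minimal normalization sketch in Python; `normalize_path` is a hypothetical name. How the server then maps "/" to a file such as /index.html remains the server's own decision and is not modeled.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_path(url: str) -> str:
    """Replace an empty path with "/" for http(s) URLs; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalize_path("http://example.com"))          # http://example.com/
print(normalize_path("http://example.com?lang=de"))  # http://example.com/?lang=de
print(normalize_path("http://example.com/docs"))     # http://example.com/docs
```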

Query string

With HTTP, a query string can follow the actual resource path, separated from it by a question mark. It can carry additional information that can be processed further on the server side or on the client side.
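A query string like the one in the http example is, by a widespread convention (HTML form encoding rather than RFC 3986 itself), a sequence of name=value pairs joined by "&". Python's `urllib.parse` can build and parse this form. The parameter names come from the example; the `note` value is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build a query string; urlencode escapes "&", "=" and spaces inside values.
query = urlencode({"country": "de", "city": "aa", "note": "a&b c"})
print(query)  # country=de&city=aa&note=a%26b+c

url = "http://example.org/demo/example.cgi?" + query

# Parse it back on the receiving side. Each name maps to a list of values,
# because a name may legitimately appear more than once.
params = parse_qs(urlsplit(url).query)
print(params)  # {'country': ['de'], 'city': ['aa'], 'note': ['a&b c']}
```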

Fragment

After a hash sign (#), a part of the resource can be referenced, typically an anchor in an HTML page, to which the browser then scrolls automatically. For example, the URL "http://example.com/dokument.html#paragraph3" would display the fictitious document starting at its third paragraph.
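The fragment is separated from the rest of the URL before the resource is retrieved, so it is evaluated only on the client. Python's `urllib.parse.urldefrag` performs this separation; a short sketch using the example URL:

```python
from urllib.parse import urldefrag

# Only the part before "#" is requested; the fragment stays with the client,
# which uses it to find the target position in the retrieved document.
result = urldefrag("http://example.com/dokument.html#paragraph3")
print(result.url)       # http://example.com/dokument.html
print(result.fragment)  # paragraph3
```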

Usage

The acronym URL is generally used in its English form. In German, it is often used with the feminine article (die URL), but also with the masculine one (der URL). The choice of grammatical gender depends on whether it is derived from the German translation "Adresse" (feminine) or from the rule that words ending in "-or" (here "Locator") or "-er" (as in the German equivalents of "identifier" and "locator") are always masculine in German.

URLs in texts

RFC 3986, Appendix C, recommends delimiting URIs (and thus URLs) in running text from their context, and especially from the punctuation of the sentence, by

  • placing them on a line of their own,
  • enclosing them in double quotes, e.g. "http://example.com/", or
  • enclosing them in angle brackets, e.g. <http://example.com/>.
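The benefit of such delimiters can be seen in a small Python sketch: a simplified, hypothetical pattern that only accepts text between angle brackets cannot absorb the surrounding punctuation, whereas a naive whitespace-based pattern swallows the sentence's final period. The mailto address is a hypothetical placeholder.

```python
import re

# "<", then a scheme (letter, then letters, digits, "+", "-", "."), ":", then
# any characters except whitespace and angle brackets, then ">".
BRACKETED_URL = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^<>\s]+)>")


def extract_bracketed_urls(text: str) -> list[str]:
    """Hypothetical helper: return the URLs enclosed in angle brackets."""
    return BRACKETED_URL.findall(text)


text = "See <http://example.com/>, or write to <mailto:info@example.org>."
print(extract_bracketed_urls(text))  # ['http://example.com/', 'mailto:info@example.org']

# Without delimiters, a naive pattern includes the period that ends the sentence:
print(re.findall(r"[a-z]+://\S+", "Visit http://example.com/."))  # ['http://example.com/.']
```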

Relative URL

In addition to the "absolute" or "complete" URLs shown so far, there are also relative URLs. They are valid only within a context from which they "inherit" properties, and they lack the location information that would make them valid throughout the World Wide Web or an intranet. They are mainly possible with the http, https and ftp schemes, but also with mailto. A relative URL is comparable to a telephone number without its area codes (for the country and the local network).

Relative URLs are often used so that a group of related resources can be stored unchanged in a local file system or on different sites in different network domains while still linking to one another. Moreover, the interpretation of the identifier (the character string between host/port and #) is up to each server: although the overwhelming majority of servers and all standard software handle it as described above, characters such as /, ?, % and & can also be evaluated according to a server's own rules.

For mailto:, a relative URL would be mailto:neighbor (without @); it applies only within the local network.
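Resolving a relative URL against the URL of the document in which it appears (the context from which it "inherits" its properties) is specified in RFC 3986, section 5. Python's `urllib.parse.urljoin` implements this resolution for hierarchical schemes such as http. A sketch with a hypothetical base URL:

```python
from urllib.parse import urljoin

# URL of the document that contains the relative references (hypothetical).
base = "http://example.com/docs/a/page.html"

for ref in ("other.html",              # same directory
            "../img/logo.png",         # one directory up
            "/index.html",             # absolute path on the same host
            "#top",                    # fragment within the same document
            "//cdn.example.net/x.js"): # same scheme, different host
    print(f"{ref:25} -> {urljoin(base, ref)}")
```

The inherited parts shrink from the top of the list to the bottom: "other.html" keeps scheme, host and directory, while "//cdn.example.net/x.js" keeps only the scheme.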

History

Name and standardization

In the early days of the WWW (from late 1990), the documentation on info.cern.ch initially had no dedicated name for the addressing of web pages; the subject was described only as "W3 document address", "W3 name", "W3 address" or "Hypertext Name". The address form specified at the time (and used in the first web pages) already corresponded to the form later standardized as "URL"; although changes were considered during the standardization process, they were rejected again because of the WWW's already advanced distribution.

In the summer of 1992, Tim Berners-Lee tried at the IETF meeting in Boston to establish a working group that was to standardize access to documents on the Web. He proposed the name Universal Document Identifier (UDI), which in his view was to be defined as a general Internet standard. The name, however, was criticized as too "arrogant", mainly because of the word "universal". Instead, the group proposed the more modest term "uniform". Furthermore, "Document" was replaced by "Resource" to emphasize that the Web should be integrated with other information systems. The URI working group was eventually formed, and a further name change was decided for the standard to be defined: "Identifier" was replaced by "Locator" to emphasize that web addresses are not permanently registered addresses.

Owing to the contentious workings of the group, the first, still informal, standardization draft was RFC 1630, presented by Berners-Lee in June 1994. It uses Berners-Lee's preferred name "Universal Resource Identifiers" in its title and already defines the terms URI, URL and URN. In December 1994, the group published RFC 1738, the standard for "Uniform Resource Locators (URL)".

Components

Berners-Lee deliberately borrowed some of the individual components from existing systems, so that the new web addresses would seem immediately familiar, or at least logical, to users:

  • The path part (/foo/bar/baz.html in http://www.example.com/foo/bar/baz.html) directly adopts the path syntax of UNIX file systems.
  • The notation of the host introduced by a double slash is derived from the syntax of the network file system of Apollo's Domain/OS, in which paths on remote hosts were addressed following the pattern //example.com/foo/bar/...
  • The fragment marked with a hash sign is borrowed from the usual U.S. notation for apartment and suite numbers in postal addresses: "12 Foo Avenue #34" stands for "Foo Avenue No. 12, Apartment 34"; correspondingly, foo.html#bar means "part (section, chapter, ...) bar within the document foo.html".