Uniform Resource Identifier

A Uniform Resource Identifier (URI) is an identifier consisting of a string of characters that identifies an abstract or physical resource. URIs are used to identify resources on the Internet (such as web pages, other files, web service calls, or e-mail recipients), especially on the World Wide Web. The current version is published as RFC 3986.

Tim Berners-Lee originally introduced the term in 1994 in RFC 1630 as Universal Resource Identifier. Only later did official W3C documents settle on Uniform. For this reason, the first part of the name is occasionally still given as Universal, even in the technical literature.

URIs can be embedded as strings (encoded in a character set) in digital documents, in particular those written in HTML, or written by hand on paper. A reference from one web page to another is called a hyperlink, or simply a "link".


Concept

A URI (or, in its extension, an IRI) is the abstract principle (that is, the syntax) of a designation, laid down as a set of rules. This basic concept is then carried over to various specific applications, to which the relevant rules and terms apply.

  • Examples: "A URI cannot contain spaces." Or: "A URI begins with the name of a scheme, made up of ASCII letters and digits, optionally separated by periods and hyphen-minus characters and starting with a letter, followed by a colon."

There are basically three kinds of application:

  • Name: The content of a resource (and every copy with the same content) is given a unique identifier.
  • Example: the ISBN of a book. Any number of copies of the book may exist.
  • A resource is named by its location, that is, identified by where it can be found; its content, however, is not necessarily fixed.
  • Example: the current weather on the Internet. The location (URL) at which to find it is known; the content changes constantly.
  • Example: a book is described by the library that holds it: second room, third bookcase, fourth shelf from the top, fifth book from the left. A title from the current top 5 bestseller list could be standing there; the content may be anything.
  • The URI rules can also be applied to something that is not a classical resource at all, in order to identify it.
  • At first, a "resource" was understood in the information-technology sense, that is, in the broadest sense, electronic files that can be made available on the Internet. RFC 1630 and RFC 1738, both from 1994, are based on this understanding.
  • This concept was later broadened. In 1998, RFC 2396 (section 1.1) defined: "A resource can be anything that has identity." Human beings, organizations and printed books could thus also be regarded as resources; in short, any entity to which an identifier can be assigned.
  • Examples: an e-mail address, a mobile phone number, a passport and its legitimate holder, a social security number, a fingerprint and the person it belongs to.

In January 2005, RFC 3986 extended the concept of a resource, for the purposes of URIs, to abstract concepts as well:

"A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., 'parent' or 'employee'), or numeric values (e.g., zero, one, and infinity)."

Structure

According to the current standard, RFC 3986, a URI consists of five parts: scheme (scheme or protocol), authority (provider or server), path, query and fragment, of which only scheme and path must be present in every URI. The generic syntax is

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

Here, hier-part (hierarchical part) stands for an optional authority and the path. If an authority has to be specified in order to locate the resource, it is introduced by a double slash, and the path that follows must either be empty or begin with a slash. The standard illustrates these components with two examples:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment
      |   _____________________|__
     / \ /                        \
     urn:example:animal:ferret:nose

Scheme

The scheme (the part before the colon) defines the context and thereby the type of the URI, which determines how the following part is interpreted. Well-known schemes include the protocols http and ftp, as well as naming schemes such as urn and doi. The colon ends the first mandatory part of the URI. If no naming authority is referenced, the path used to locate the resource follows directly after this colon.
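The scheme rule can be illustrated with a short Python sketch. It assumes the scheme grammar of RFC 3986, section 3.1 (a letter followed by letters, digits, "+", "-" or "."); the helper name split_scheme is hypothetical, and lowercasing the result is a normalization choice made here because schemes are case-insensitive.

```python
import re

# Scheme grammar per RFC 3986, section 3.1:
# a letter followed by any number of letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_scheme(uri: str):
    """Return (scheme, rest), or (None, uri) if there is no valid scheme prefix."""
    head, sep, rest = uri.partition(":")
    if sep and SCHEME_RE.fullmatch(head):
        # Schemes are case-insensitive; lowercase is used as the canonical form.
        return head.lower(), rest
    return None, uri


print(split_scheme("urn:example:animal:ferret:nose"))
print(split_scheme("HTTP://example.org/"))
print(split_scheme("../images/image.png"))
```

A string without a valid scheme prefix, such as the last one, is treated here as having no scheme at all.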

Authority

Many URI schemes, such as http or ftp, have an authority part. The term authority refers to an entity that can centrally administer the names in the namespace whose interpretation is defined by the scheme. An example is the Domain Name System, which is administered by global and local registrars.

The authority consists of optional user information (followed by an "@"), the host, and an optional port (introduced by a colon). It follows two slashes ("//") and is terminated by a single slash ("/"), a question mark ("?"), a number sign ("#") or the end of the URI. The host part may be an IPv4 address, an IPv6 address (in square brackets, "[...]") or a registered name. Valid values are, for example,

Specifying a user name and password in the user information ("user:password@...") is designated as deprecated in RFC 3986 (section 3.2.1) and should no longer be used, since URIs are often transmitted and logged in plain text.
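As a rough illustration of how the authority breaks down into user information, host and port, the following sketch uses urlsplit from Python's standard library. The sample URIs and the helper name authority_parts are assumptions made for this example, not values taken from the standard.

```python
from urllib.parse import urlsplit


def authority_parts(uri: str) -> dict:
    """Return user, host and port of a URI's authority (None where absent)."""
    parts = urlsplit(uri)
    return {
        "user": parts.username,   # text before "@" (and before any ":")
        "host": parts.hostname,   # lowercased; IPv6 brackets are removed
        "port": parts.port,       # integer, or None if no port is given
    }


# Hypothetical sample URIs: a registered name, an IPv4 and an IPv6 address.
for uri in (
    "http://nobody@example.org:8080/",
    "ftp://192.0.2.16/",
    "http://[2001:db8::7]:8042/",
):
    print(uri, authority_parts(uri))
```

Note that the hostname attribute returns the IPv6 address without its square brackets, whereas the raw authority string (netloc) keeps them.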

Path

The path contains details, often hierarchically organized, which together with the query part identify a resource. If an authority as described in the previous section is specified in the URI, the path must begin with a slash ("/"); if there is no authority, the path must not begin with a double slash ("//"). This ensures that the URI can be interpreted unambiguously. The path is terminated by a question mark ("?"), a number sign ("#") or the end of the URI. Valid paths are, for example,

  • /over/there
  • example:animal:ferret:nose
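The two example paths belong to the two URIs that RFC 3986 uses to illustrate the components, one with an authority and one without. A minimal sketch with Python's urlsplit shows how the path is delimited in each case; the helper name authority_and_path is hypothetical.

```python
from urllib.parse import urlsplit


def authority_and_path(uri: str):
    """Return the (authority, path) pair of a URI."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# With an authority, the path starts at the first "/" after the authority.
print(authority_and_path("foo://example.com:8042/over/there?name=ferret#nose"))

# Without an authority, the path follows directly after the scheme's colon.
print(authority_and_path("urn:example:animal:ferret:nose"))
```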

Query

The query part (query string) contains data for identifying resources that cannot be located precisely by the path alone, but have to be retrieved, by means of this query, from the source the path identifies, for example a record from a database. It begins with a question mark ("?") and is terminated by a number sign ("#") or the end of the URI. A valid query is, for example,

  • ?title=Uniform_Resource_Identifier&action=submit

Here, "&" and "=" play roughly the same role as delimiters that "/" and "." play in the path and the authority part, respectively.
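The key=value pairs joined by "&" are a widespread convention rather than something the generic URI syntax requires. Assuming that convention, the example query can be taken apart with Python's standard library as follows; the host and path in the sample URI are hypothetical and only serve to give the query a context.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URI carrying the example query from the text.
uri = "http://example.org/index.php?title=Uniform_Resource_Identifier&action=submit"

# urlsplit isolates the query (without the leading "?");
# parse_qsl splits it at "&" and "=" into (key, value) pairs.
query = urlsplit(uri).query
pairs = parse_qsl(query)

print(query)
print(pairs)
```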

Fragment

The fragment is the optional fragment identifier and references a location within a resource. The fragment identifier always refers only to the immediately preceding part of the URI and is introduced by a number sign ("#"). An example is the HTML anchor.
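Because the fragment refers to the part of the URI that precedes it, it can be separated from that part without changing which resource is meant. A small sketch using urldefrag from Python's standard library, with a hypothetical sample URI:

```python
from urllib.parse import urldefrag

# Hypothetical URI with a fragment pointing at an anchor in the document.
uri = "http://example.org/cgi-bin/script.php?action=submit#section_2"

# urldefrag returns the URI without its fragment, and the fragment itself.
resource, fragment = urldefrag(uri)

print(resource)
print(fragment)
```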

Examples

An example with a large number of elements in one URI:

  • http://nobody:password@example.org:8080/cgi-bin/script.php?action=submit&pageid=86392001#section_2
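To see which part of this example belongs to which component, the URI (written without spaces) can be split with Python's urlsplit. This is only a sketch; the helper name components is hypothetical, and the user:password form appears solely because the example contains it.

```python
from urllib.parse import urlsplit

URI = ("http://nobody:password@example.org:8080"
       "/cgi-bin/script.php?action=submit&pageid=86392001#section_2")


def components(uri: str) -> dict:
    """Split a URI into its generic components and the authority subparts."""
    p = urlsplit(uri)
    return {
        "scheme": p.scheme,
        "authority": p.netloc,
        "user": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port,
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }


for name, value in components(URI).items():
    print(f"{name:10} {value!r}")
```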

URI references

Applications often do not use the full URI but an abbreviated syntax, for example to save space or to make it easier to move to other servers. Some URI schemes also restrict the syntax to a particular form in their definition. The term URI reference covers these different notations.

Absolute URIs

An absolute URI consists of at least a scheme and a hier-part (that is, an authority and/or a path). Examples are

Relative reference

If a URI reference does not begin with a scheme, it is taken to be a relative reference. A relative reference is resolved to an absolute URI according to standardized rules, depending on the context. A relative reference consists of a path and an optional query and fragment. Three kinds of relative reference can be distinguished:

  • If the path does not begin with a slash, it is a relative-path reference, for example image.png, ./image.png and ../images/image.png.
  • If the path begins with a single slash ("/"), it is an absolute-path reference.
  • If the path begins with a double slash ("//"), it is a network-path reference.
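The resolution of the three kinds of relative reference can be tried out with urljoin from Python's standard library. The base URI and the host cdn.example.net are hypothetical values chosen for this sketch; the relative-path references are the ones named in the list.

```python
from urllib.parse import urljoin

# Hypothetical base URI, e.g. the address of the document containing the references.
BASE = "http://example.org/a/b/page.html"

REFERENCES = [
    "image.png",                    # relative-path reference
    "./image.png",                  # relative-path reference
    "../images/image.png",          # relative-path reference, one level up
    "/images/image.png",            # absolute-path reference
    "//cdn.example.net/image.png",  # network-path reference (keeps the scheme)
]


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative reference against a base URI."""
    return urljoin(base, reference)


for ref in REFERENCES:
    print(f"{ref:30} -> {resolve(ref)}")
```

A relative-path reference keeps the scheme, the authority and the directory part of the base path, an absolute-path reference replaces the whole path, and a network-path reference keeps only the scheme.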

Reference in the same document

URI references can refer to the same document that they are part of. The most common use is the number sign ("#") followed by a fragment identifier.
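One way to test whether a reference points into the same document is to resolve it against the base URI and compare the results with their fragments removed. The following Python sketch does that; the base URI and the helper name is_same_document are assumptions made for the example.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical base URI of the document that contains the references.
BASE = "http://example.org/a/b/page.html"


def is_same_document(reference: str, base: str = BASE) -> bool:
    """True if the resolved reference equals the base URI apart from the fragment."""
    target = urljoin(base, reference)
    return urldefrag(target).url == urldefrag(base).url


print(urljoin(BASE, "#section_2"))
print(is_same_document("#section_2"))
print(is_same_document("other.html#section_2"))
```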

Suffix references

Schemes

Among other things, the following schemes are defined:

On the website of IANA, there is a full list of official schemes.

In addition, some unofficial schemes for individual applications or common protocols have been established:

Subtypes

The following subtypes of URIs are distinguished:

Originally, every URI was supposed to be assigned to one of these two classes (or to others yet to be defined). This strict division was abandoned because it is unnecessary and because some schemes (such as data, or mailto, which was formerly counted among the URLs) fit into neither of the two classes.
