Session Initiation Protocol

The Session Initiation Protocol (SIP) is a network protocol for setting up, controlling, and terminating a communication session between two or more participants. The protocol is specified in RFC 3261, among others. SIP is a commonly used protocol in IP telephony.


Unlike H.323, which comes from the ITU-T, SIP was developed by the IETF. Greatly simplified, H.323 can be described as ISDN over IP. This allowed PBX manufacturers in particular to move their communications onto IP networks relatively quickly and easily, but the strengths and weaknesses of IP networks were not sufficiently taken into account. This is particularly apparent with NAT, the translation of network addresses required in firewalls and end-user networks (e.g. DSL routers), which H.323 can handle only with considerable effort.

The design of SIP, by contrast, is modeled on the Hypertext Transfer Protocol (although it is not compatible with it) and is much better suited to IP networks. The structure of SIP makes it easy to add new extensions without requiring every device involved to understand them. SIP is also more general: while H.323 is intended only for telephony, SIP can manage sessions of any kind. The "payload" of the session, that is, the actual data streams to be transmitted, can be any stream that can be carried over a network. The main application is audio and video transmission, but some online games also rely on SIP to manage their transmissions.

An Internet phone call requires more than just SIP, because SIP serves only to agree on or negotiate the modalities of the communication; the actual communication data must be exchanged over other protocols suited to that purpose. For this, the Session Description Protocol (SDP, RFC 4566) is frequently embedded in SIP to negotiate the details of the video and/or audio transmission. The devices tell each other which methods of video and audio encoding they support (so-called codecs), which protocol they want to use for transport, and at which network address they want to send and receive.

This media negotiation is therefore not directly part of SIP but is achieved by embedding a further protocol in SIP. This separation of session negotiation and media negotiation is one of the advantages of SIP, because it allows great flexibility in the supported payload. For example, a manufacturer that wants to use SIP for a specialized application can design its own media negotiation if no protocol for it exists yet.
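The codec part of this negotiation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a complete SDP implementation: the function names `parse_audio_payloads` and `answer_payloads` are hypothetical, and the offer reuses the payload types 8 (PCMA), 3 (GSM), and 0 (PCMU) that appear in the example messages at the end of this article.

```python
def parse_audio_payloads(sdp: str) -> list[int]:
    """Return the RTP payload types listed on the first m=audio line."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # Layout: m=<media> <port> <proto> <fmt> <fmt> ...
            fields = line.split()
            return [int(pt) for pt in fields[3:]]
    return []


def answer_payloads(offered: list[int], supported: set[int]) -> list[int]:
    """Keep the offerer's preference order, drop codecs we cannot handle."""
    return [pt for pt in offered if pt in supported]


offer = "m=audio 6006 RTP/AVP 8 3 0\na=rtpmap:8 PCMA/8000\n"
offered = parse_audio_payloads(offer)
# The answering device in this sketch supports only PCMU (0) and PCMA (8).
print(answer_payloads(offered, supported={0, 8}))  # [8, 0]
```

The answering side keeps the order of the offer, so the first entry that survives is the codec both sides prefer.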

In Internet telephony, the Real-time Transport Protocol (RTP, RFC 3550) is used for media transfer. SIP negotiates the session, the embedded SDP negotiates the media details, and RTP is the protocol that ultimately carries the video and audio streams.
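To make the role of RTP more concrete, the following Python sketch decodes the 12-byte fixed header that RFC 3550 places in front of every media packet. The function name `parse_rtp_header` and the sample packet are made up for illustration; header extensions and CSRC lists are not handled.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # top two bits, 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # e.g. 8 = PCMA, 0 = PCMU
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 8, 160 bytes of dummy audio.
sample = struct.pack("!BBHII", 0x80, 8, 1, 160, 0x1234ABCD) + b"\x00" * 160
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"])  # 2 8 1
```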

Subscriber addresses are written in URI format, which is also used for e-mail and web addresses. Such an address usually follows one of three schemes:

  • Unencrypted SIP connection: sip:user@domain
  • Encrypted SIP connection: sips:user@domain
  • Telephone number: tel:number, for example tel:+49-69-1234567. This scheme is mainly used by devices that provide an interface to the "normal" telephone network. If required, it can be converted to a SIP URI, for example sip:+49-69-1234567@domain.
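The three schemes can be told apart and converted with very little code. The following Python sketch is deliberately simplified: `parse_sip_uri` and `tel_to_sip` are hypothetical helper names, `example.com` is a placeholder domain, and URI parameters, ports, and passwords are ignored.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a sip: or sips: URI into scheme, user, and host."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, _, host = rest.rpartition("@")
    return {
        "scheme": scheme,
        "secure": scheme == "sips",   # sips: requests an encrypted connection
        "user": user or None,
        "host": host,
    }


def tel_to_sip(tel_uri: str, domain: str) -> str:
    """Rewrite tel:<number> as sip:<number>@<domain>."""
    scheme, _, number = tel_uri.partition(":")
    if scheme.lower() != "tel" or not number:
        raise ValueError(f"not a tel URI: {tel_uri!r}")
    return f"sip:{number}@{domain}"


print(parse_sip_uri("sips:alice@example.com")["secure"])  # True
print(tel_to_sip("tel:+49-69-1234567", "example.com"))
# sip:+49-69-1234567@example.com
```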

Encryption and Security

Because session and media are separate, the two data streams can be encrypted independently of each other. SIP can be encrypted with the TLS protocol (also called SIPS), and the media stream (voice data) can additionally be encrypted with the SRTP protocol. Any combination of the two is possible, but for secure encryption a partial combination does not make sense.

For secure encryption, both data streams (i.e. session and media) must be encrypted. The symmetric keys for the media stream are exchanged via SDP (i.e. within SIP) and would therefore be exposed to attack over unencrypted SIP. The symmetric keys for TLS are also exchanged at the start of the session, but there the mechanisms of SSL certificates apply: the symmetric key is itself encrypted with the asymmetric key of the certificate.
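Why the media key is exposed over unencrypted SIP can be shown with a short Python sketch. It assumes the key is carried in an SDP `a=crypto` attribute as defined for SDP Security Descriptions (RFC 4568), which the text above does not name explicitly; the function names `make_crypto_line` and `sniff_key` are hypothetical.

```python
import base64
import os


def make_crypto_line(tag: int = 1) -> tuple[str, bytes]:
    """Build an a=crypto line carrying a fresh SRTP master key and salt."""
    # AES_CM_128_HMAC_SHA1_80 uses a 16-byte key plus a 14-byte salt.
    key_and_salt = os.urandom(30)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    line = f"a=crypto:{tag} AES_CM_128_HMAC_SHA1_80 inline:{inline}"
    return line, key_and_salt


def sniff_key(sdp_line: str) -> bytes:
    """What an eavesdropper can do if the SDP travels unencrypted."""
    inline = sdp_line.split("inline:", 1)[1].split("|", 1)[0]
    return base64.b64decode(inline)


line, secret = make_crypto_line()
print(line.split("inline:")[0])   # a=crypto:1 AES_CM_128_HMAC_SHA1_80
print(sniff_key(line) == secret)  # True
```

Anyone who can read the SIP message can recover the SRTP key this way, which is why the signaling has to be protected with TLS as well.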

Network elements

  • User Agent
  • Proxy Server
  • Registrar Server
  • Redirect Server
  • Gateway


SIP support is found in many devices from various manufacturers, and SIP appears to be developing into the standard protocol for Voice over IP (VoIP). SIP has also been selected by the 3rd Generation Partnership Project (3GPP) as the protocol for multimedia support in 3G (UMTS) networks. The specification of the Next Generation Network (NGN), developed at the European Telecommunications Standards Institute (ETSI) by the project group Telecommunications and Internet converged Services and Protocols for Advanced Networking (TISPAN), is likewise based on SIP.

Pros and Cons

One advantage of SIP is that it is an open standard that has found broad acceptance. Because SIP servers are decentralized, an attack affects only a single provider and not all SIP-based telephony. Another advantage of SIP is the ability to modify an already established session. To do so, a further INVITE message carrying the new session properties in SDP is simply sent to the other party within the existing session. In this way a new medium can be added, or an existing one modified or removed. Such a message is called a re-INVITE request.
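The essential point of a re-INVITE is that it reuses the identifiers of the existing dialog (Call-ID and both tags) and only increments the CSeq number. The following Python sketch illustrates this; the `Dialog` class, its field names, and all addresses are hypothetical, and mandatory header fields such as Via, Max-Forwards, and Contact are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Dialog:
    """State of an established session, reduced to what the sketch needs."""
    call_id: str
    local_uri: str
    local_tag: str
    remote_uri: str
    remote_tag: str
    cseq: int

    def build_reinvite(self, new_sdp: str) -> str:
        """Return a re-INVITE that offers new_sdp within this dialog."""
        self.cseq += 1  # every new request in the dialog gets a higher CSeq
        lines = [
            f"INVITE {self.remote_uri} SIP/2.0",
            f"From: <{self.local_uri}>;tag={self.local_tag}",
            f"To: <{self.remote_uri}>;tag={self.remote_tag}",
            f"Call-ID: {self.call_id}",
            f"CSeq: {self.cseq} INVITE",
            "Content-Type: application/sdp",
            f"Content-Length: {len(new_sdp.encode('utf-8'))}",
        ]
        return "\r\n".join(lines) + "\r\n\r\n" + new_sdp


dialog = Dialog("abc123@host.example", "sip:alice@example.com", "1111",
                "sip:bob@example.org", "2222", cseq=1)
# Offer a video stream in addition to the existing audio stream.
sdp = "m=audio 6006 RTP/AVP 8 0\r\nm=video 6008 RTP/AVP 96\r\n"
print(dialog.build_reinvite(sdp))
```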

A disadvantage of SIP is that it relies on RTP to transmit the voice data. The UDP ports used for RTP are assigned dynamically, which makes it difficult to use SIP together with firewalls or Network Address Translation (NAT, RFC 2663), because most firewalls and NAT routers cannot associate the dynamically assigned ports with the signaling connection. One remedy is STUN (Session Traversal Utilities for NAT), which detects and traverses NAT routers; another is to use different protocols such as IAX (Inter-Asterisk eXchange). With the STUN protocol, a client can determine the IP address and port under which the NAT firewall or NAT router appears to the outside (that is, on the public Internet). A much simpler way around the problem is for the proxy server or the called party to use the source IP address and port from the headers of the incoming packets directly, so that the NAT mapping works even without a STUN server. IAX combines signaling and media data on a single UDP connection. Like H.323, IAX is a binary protocol, which makes troubleshooting more difficult than with SIP. In addition, IAX is still in the standardization phase.
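The "simpler approach" of trusting the observed source address can be sketched as follows. This is a minimal Python illustration with assumed names (`is_rfc1918`, `media_target`) and documentation addresses: if the address advertised in SDP is a private one and differs from where packets actually come from, the observed address and port are used instead.

```python
import ipaddress

# Private IPv4 ranges that are not routable on the public Internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    """True if ip lies in one of the private IPv4 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def media_target(sdp_ip: str, sdp_port: int,
                 observed_ip: str, observed_port: int) -> tuple[str, int]:
    """Choose where to send media: SDP address or observed packet source."""
    if is_rfc1918(sdp_ip) and sdp_ip != observed_ip:
        # The peer sits behind NAT; reply to the NAT's public mapping.
        return observed_ip, observed_port
    return sdp_ip, sdp_port


print(media_target("192.168.1.10", 6006, "203.0.113.5", 40000))
# ('203.0.113.5', 40000)
```

The trade-off is that the observed port is known only after the first packet from the peer has arrived, so the side behind NAT has to send first.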

A newer IETF method for solving the NAT traversal problem is Interactive Connectivity Establishment (ICE), which is already supported by some SIP clients and can usually be added through a firmware upgrade.

Application Layer Gateways (ALGs) are another solution to the NAT traversal problem. These are intermediate SIP proxies that, installed on a NAT router or firewall, ensure the smooth passage of SIP signaling and media streams. For SIP calls, an ALG can automatically open the necessary ports on the firewall and mark RTP media streams with DiffServ bits. The media packets can then be transported across IP networks with higher priority, provided the network supports this. The public Internet generally offers no prioritization (see net neutrality).
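What "marking with DiffServ bits" means at packet level can be illustrated from the sending side. The Python sketch below is not an ALG, which would rewrite packets in transit; it shows how an endpoint could request the same marking on its own UDP socket. It assumes a platform whose socket API supports the `IP_TOS` option (for example Linux), and the helper names and the choice of DSCP 46 (Expedited Forwarding, commonly used for voice) are illustrative.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, an assumed choice for voice


def dscp_to_tos(dscp: int) -> int:
    """Place the 6-bit DSCP value in the upper bits of the old TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_rtp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(dscp_to_tos(EF_DSCP))  # 184
```

Whether routers along the path honor the marking is up to each network operator.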


Example header fields and SDP body of a SIP INVITE request (the request line is not shown):

From: sip:[email protected];tag=29ae1249
Max-Forwards: 70
To: sip:[email protected]
Call-ID: 48c7df2a9b4@myvoip1
CSeq: 1 INVITE
Contact: sip:[email protected]
Content-Length: 202
Supported: 100rel
Content-Type: application/sdp

o=1234567890 1234567890 IN IP4 Anonymous
s=SIGMA is the best
c=IN IP4
t=0 0
m=audio 6006 RTP/AVP 8 3 0
a=rtpmap:8 PCMA/8000
a=rtpmap:3 GSM/8000
a=rtpmap:0 PCMU/8000

A second example, this time with a To tag and a shorter codec list, as in an answer to an INVITE:

From: sip:[email protected];tag=6248550609-457625817474016
To: ;tag=2e679cbc
Call-ID: 6248550609-781762546450147
CSeq: 15 INVITE
Contact: sip:[email protected]
Content-Length: 191
Content-Type: application/sdp

o=1234567890 7894561230 IN IP4 Anonymous
s=SIGMA is the best
c=IN IP4
t=0 0
m=audio 6006 RTP/AVP 8 0
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
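Messages of this shape are easy to take apart, because SIP, like HTTP, separates the header fields from the body with an empty line. The following Python sketch shows one way to do it. The function name `split_sip_message` is hypothetical, the sample uses placeholder addresses at `example.com`, and folded (multi-line) or repeated header fields are not handled.

```python
def split_sip_message(text: str) -> tuple[dict[str, str], list[str]]:
    """Split a SIP message into a header dict and a list of SDP lines."""
    head, _, body = text.replace("\r\n", "\n").partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip the request or status line, which is not a "Name: value" pair.
        if sep and name and " " not in name:
            headers[name.lower()] = value.strip()
    sdp_lines = [line for line in body.split("\n") if line]
    return headers, sdp_lines


sample = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: sip:alice@example.com;tag=29ae1249\r\n"
    "To: sip:bob@example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "m=audio 6006 RTP/AVP 8 3 0\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
)
headers, sdp = split_sip_message(sample)
print(headers["cseq"])  # 1 INVITE
print(sdp[0])           # m=audio 6006 RTP/AVP 8 3 0
```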