Introduction

Over the past 10 years the Session Initiation Protocol (SIP) has moved from the toy of researchers and academics to the de-facto standard for telephony and multimedia services in mobile and fixed networks.

Probably one of the most emotionally fraught discussions in the context of SIP was whether Session Border Controllers (SBC) are good or evil.

SIP was designed with the vision of revolutionizing the way communication services are developed, deployed and operated. Following the end-to-end spirit of the Internet, SIP was supposed to tear down the walled gardens of PSTN networks and free communication services from the grip of large telecom operators. By moving the intelligence to the end systems, developers were supposed to be able to develop new communication services that would change the way we communicate with each other. This was to be achieved without having to wait for the approval of the various telecommunication standardization groups such as ETSI or the support of incumbent telecoms.

Session border controllers are usually implemented as SIP Back-to-Back User Agents (B2BUA) that are placed between a SIP user agent and a SIP proxy. The SBC then acts as the contact point for both the user agents and the proxy. Thereby the SBC actually breaks the end-to-end behavior of SIP, which has led various people to deem the SBC as an evil incarnation of the old telecom way of thinking. Regardless of this opposition, SBCs have become a central part of any SIP deployment.

In this paper we will first give a brief overview of how SIP works. We then describe what SBCs do and the features they support, such as NAT traversal, mediation, DoS protection and support for legal requirements.

A more detailed version of the paper is available on our web page.

A Short Introduction to SIP

By the mid-nineties the IETF, which plays the role of the standards organization of the Internet, had already produced several of the protocols needed for IP-based telephony services. The Real-Time Transport Protocol (RTP) [1] enabled the exchange of audio and video data. The Session Description Protocol (SDP) [2] enabled the negotiation and description of the multimedia data to be used in a communication session.

The Session Initiation Protocol (SIP) [3] was the attempt of the IETF community to provide a signaling protocol that not only enables phone calls but can also be used for initiating any kind of communication session. Hence, SIP can be used for VoIP just as well as for setting up a gaming session or controlling a coffee machine.

The SIP specifications describe three types of components: user agents (UA), proxies and registrar servers. The UA can be the VoIP application used by the user, e.g., the VoIP phone or software application. A VoIP gateway, which enables VoIP users to communicate with users in the public switched telephone network (PSTN), and an application server, e.g., a multi-party conferencing server or a voicemail server, are also implemented as user agents.

The registrar server maintains a location database that binds the users’ VoIP addresses to their current IP addresses.

The proxy provides the routing logic of the VoIP service. When a proxy receives a SIP request from a user agent or another proxy it also conducts service specific logic, such as checking the user’s profile and whether the user is allowed to use the requested services. The proxy then either forwards the request to another proxy or to another user agent or rejects the request by sending a negative response.

With regard to the SIP messages we distinguish between requests and responses. The INVITE request is used to initiate a dialog between two users. A BYE request is used for terminating this dialog. Responses can either be final or provisional. Final responses can indicate that a request was successfully received and processed by the destination. Alternatively, a final response can indicate that the request could not be processed by the destination or by some proxy in between or that the session could not be established for some reason. Provisional responses indicate that the session establishment is in progress, e.g. the destination phone is ringing.
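
In RFC 3261 [3] the distinction between provisional and final responses is carried by the numeric status code: 1xx codes are provisional, 2xx to 6xx codes are final, and only 2xx codes indicate success. The following Python sketch illustrates that classification; the function names are illustrative and not part of any SIP library.

```python
def classify_response(status_code: int) -> str:
    """Return 'provisional' for 1xx and 'final' for 2xx-6xx status codes."""
    if 100 <= status_code <= 199:
        return "provisional"
    if 200 <= status_code <= 699:
        return "final"
    raise ValueError(f"not a SIP status code: {status_code}")


def is_success(status_code: int) -> bool:
    """Only 2xx final responses report that the request succeeded."""
    return 200 <= status_code <= 299


# 180 Ringing is provisional; 200 OK and 486 Busy Here are both final,
# but only 200 OK is a success.
for code, reason in [(180, "Ringing"), (200, "OK"), (486, "Busy Here")]:
    print(code, reason, classify_response(code), is_success(code))
```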

In this paper we distinguish three types of SIP message exchanges, namely registrations, dialogs and out of dialog transactions.

A SIP registration enables a user agent to register its current address, IP address for example, at the registrar. This enables the registrar to establish a correlation between the user agent’s permanent address, e.g. sip:[email protected], and the user agent’s current address. In order to keep this correlation up to date the user agent will have to repeatedly refresh the registration. The registrar will then delete a registration that is not refreshed for a while.
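
The registration behavior described above can be sketched as a small location database with expiring entries. This is only a toy model under stated assumptions: the class name LocationDB, the address sip:alice@example.com and the contact address are placeholders chosen for the example, and time is passed in explicitly so the expiry logic is easy to follow.

```python
class LocationDB:
    """Toy location database: permanent address -> (current contact, expiry)."""

    def __init__(self):
        self._bindings = {}

    def register(self, address, contact, expires, now):
        # A refresh simply overwrites the binding with a new expiry time.
        self._bindings[address] = (contact, now + expires)

    def lookup(self, address, now):
        entry = self._bindings.get(address)
        if entry is None:
            return None
        contact, expiry = entry
        if now >= expiry:
            # The registration was not refreshed in time, so it is deleted.
            del self._bindings[address]
            return None
        return contact


db = LocationDB()
db.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060", expires=3600, now=0)
still_valid = db.lookup("sip:alice@example.com", now=1800)
expired = db.lookup("sip:alice@example.com", now=3600)
```

In this sketch still_valid holds the registered contact, while expired is None because the binding was not refreshed within its lifetime.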

A SIP dialog, a call for example, usually consists of a session initiation phase in which the caller generates an INVITE that is responded to with provisional and final responses. The session initiation phase is terminated with an ACK. A dialog is terminated with a BYE transaction. Depending on the call scenario the caller and callee might exchange a number of in-dialog requests such as reINVITEs or REFER.

The last type of SIP interactions is SIP transactions that are not generated as part of a dialog. These out of dialog messages can be observed when the SUBSCRIBE and NOTIFY requests are exchanged between two SIP user agents. This is the case when a SIP node wants to be informed about a certain event. In this case this node sends a SUBSCRIBE request to the server in charge of this event. Once this event occurs, the server will send a NOTIFY request to the SIP node carrying information about the event. Other out of dialog SIP requests include OPTIONS and INFO that are often used for exchanging information between SIP nodes or as an application level heartbeat.

What Do SBCs Do?

Since their introduction nearly 10 years ago, SBCs have been used to meet a growing set of requirements [4]. This section starts with a brief look at why SBCs emerged and an overview of the general behavior of SBCs, followed by a more detailed look at how an SBC provides different features such as NAT traversal or denial of service protection.

General Behavior of SBCs

SBCs come in all kinds of shapes and forms and are used by operators and enterprises to achieve different goals. Even the same SBC implementation might act differently depending on its configuration and the use case. Hence, it is not easy to describe an exact SBC behavior that would apply to all SBC implementations. However, in general we can still identify certain features that are common to most SBCs. For example, most SBCs are implemented as a “Back-to-Back User Agent” (B2BUA).

A B2BUA is a proxy-like server that splits a SIP transaction in two pieces: on the side facing the User Agent Client, it acts as a server; on the side facing the User Agent Server, it acts as a client. While a proxy usually keeps only state information related to active transactions, B2BUAs keep state information about active dialogs, e.g., calls. That is, once a proxy receives a SIP request it will save some state information. Once the transaction is over, e.g., after receiving a response, the state information will be deleted soon after. A B2BUA will maintain state information for active calls and only delete this information once the call is terminated.
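
The difference in state lifetime can be illustrated with two toy classes. This is a simplified sketch, not how any particular product works: the class and method names are invented for the example, transactions are identified by a branch string and calls by a Call-ID string, and the proxy drops its state immediately on the final response rather than "soon after".

```python
class TransactionStatefulProxy:
    """Keeps state only while a transaction is in progress."""

    def __init__(self):
        self.transactions = set()

    def on_request(self, branch):
        self.transactions.add(branch)

    def on_final_response(self, branch):
        # Transaction is over, so its state can be discarded.
        self.transactions.discard(branch)


class B2BUA:
    """Keeps state for the whole call and links the two call legs."""

    def __init__(self):
        self.dialogs = {}

    def on_invite(self, caller_call_id, callee_call_id):
        self.dialogs[caller_call_id] = callee_call_id

    def on_bye(self, caller_call_id):
        # State is removed only when the call is terminated.
        self.dialogs.pop(caller_call_id, None)


proxy, b2bua = TransactionStatefulProxy(), B2BUA()
proxy.on_request("z9hG4bK-1")
b2bua.on_invite("call-a", "call-b")
proxy.on_final_response("z9hG4bK-1")   # INVITE answered, call now active
state_during_call = (len(proxy.transactions), len(b2bua.dialogs))
b2bua.on_bye("call-a")
state_after_call = (len(proxy.transactions), len(b2bua.dialogs))
```

During the call the proxy holds no state while the B2BUA still tracks one dialog; after the BYE both are empty.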

The SBC acts as a B2BUA that behaves as a user agent server towards the caller and as a user agent client towards the callee. In this sense, the SBC actually terminates the call that was generated by the caller and starts a new call towards the callee. The INVITE message sent by the SBC no longer contains a clear reference to the caller. The INVITE sent by the SBC to the proxy includes Via and Contact headers that point to the SBC itself and not the caller. SBCs often also manipulate the dialog identification information listed in the Call-ID and From tag. Further, if the SBC is configured to also control the media traffic, then the SBC also changes the media addressing information included in the c and m lines of the SDP body. Thereby, not only all SIP messages will traverse the SBC but also all audio and video packets. As the INVITE sent by the SBC establishes a new dialog, the SBC also manipulates the message sequence number (CSeq) as well as the Max-Forwards value.
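
The header and SDP changes listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of a specific SBC: the function rewrite_invite, the host name sbc.example.net and all addresses are hypothetical, headers are modeled as a plain dictionary, and the SDP handling only touches the c and m lines of a single audio stream.

```python
import uuid


def rewrite_invite(headers, sdp, sbc_host, media_ip, media_port):
    """Build the headers and SDP of the new INVITE the SBC sends onward."""
    out = dict(headers)
    # Via and Contact now point to the SBC instead of the caller.
    out["Via"] = f"SIP/2.0/UDP {sbc_host};branch=z9hG4bK{uuid.uuid4().hex}"
    out["Contact"] = f"<sip:{sbc_host}>"
    # New dialog identifiers: Call-ID and From tag.
    out["Call-ID"] = f"{uuid.uuid4().hex}@{sbc_host}"
    from_uri = headers["From"].split(";tag=")[0]
    out["From"] = f"{from_uri};tag={uuid.uuid4().hex[:8]}"
    # A new dialog starts with its own sequence number and hop limit.
    out["CSeq"] = "1 INVITE"
    out["Max-Forwards"] = "70"

    new_sdp = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            line = f"c=IN IP4 {media_ip}"      # media address of the SBC
        elif line.startswith("m="):
            fields = line.split(" ")
            fields[1] = str(media_port)        # media port of the SBC
            line = " ".join(fields)
        new_sdp.append(line)
    return out, "\r\n".join(new_sdp) + "\r\n"


caller_headers = {
    "Via": "SIP/2.0/UDP 10.0.0.5:5060;branch=z9hG4bKabc123",
    "From": "<sip:alice@example.com>;tag=1928301774",
    "To": "<sip:bob@example.com>",
    "Call-ID": "a84b4c76e66710@10.0.0.5",
    "CSeq": "314159 INVITE",
    "Contact": "<sip:alice@10.0.0.5:5060>",
    "Max-Forwards": "68",
}
caller_sdp = "v=0\r\nc=IN IP4 10.0.0.5\r\nm=audio 49170 RTP/AVP 0\r\n"

new_headers, new_sdp = rewrite_invite(
    caller_headers, caller_sdp, "sbc.example.net", "198.51.100.7", 30000
)
```

A real SBC would additionally have to remember the mapping between the two sets of identifiers so that responses and later in-dialog requests can be translated back.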

Topology Hiding

As the result of a SIP session establishment the involved end points will know the IP addresses of where to send and receive media traffic. This means that a user using SIP for calling a PSTN number will know the IP address of the PSTN gateway that is responsible for bridging the VoIP service with the PSTN. Further, during the session establishment phase all the involved proxies will include their addresses in the Via headers.

A malicious user could use this information to either attack an operator’s proxies or even get access to the PSTN gateways directly. By having the ability to contact the PSTN gateways directly, an attacker might be able to exploit any security holes that might exist at the PSTN gateway. This allows the attacker to initiate calls to the PSTN with the costs being borne by the operator.

To hide the internal components of an operator, all messages leaving the operator’s network would traverse an SBC. The SBC replaces the addresses of internal components with its own. Hence, headers such as Contact, Via, Record-Route, Route and so on would include the SBC’s address only.

NAT-Traversal Support

Network Address Translators (NAT) are used to overcome the shortage of IPv4 addresses by hiding an enterprise or even an operator’s network behind one or a few IP addresses. The devices behind the NAT use private IP addresses that are not routable in the public Internet.

If a user agent is located behind a NAT, it will use a private IP address as its contact address in the Contact and Via headers as well as in the SDP part. This information would then be useless for anyone trying to contact this user agent from the public Internet.

There are different NAT traversal solutions such as STUN and ICE. Which solution to use depends on the behavior of the NAT and the call scenario. When using an SBC to solve the NAT traversal issues, the most common approach is for the SBC to act as the public interface of the user agents. This is achieved by replacing the user agent’s contact information with that of the SBC.

In order for a user agent to be reachable through the public interfaces of an SBC, the SBC will manipulate the registration information of the user agent. The user includes its private IP address as its contact information in the REGISTER requests. Calls to this address will fail, since it is not publicly routable. The SBC replaces the information in the Contact header with its own IP address. This is the information that is then registered at the registrar. Calls destined to the user will then be directed to the SBC. In order for the SBC to know which user agent is actually being contacted the SBC can keep a local copy of the user agent’s registration. The local copy includes the private IP address and the user’s SIP URI as well as the public IP address included in the IP header that was assigned to the SIP message by the NAT.
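
The registration handling described above can be sketched as follows. The class and field names are hypothetical, as are all addresses. The sketch also stores the source port next to the public IP address, which is an addition of this example on the assumption that the NAT translates ports as well.

```python
from dataclasses import dataclass


@dataclass
class LocalBinding:
    sip_uri: str          # the user's SIP URI
    private_contact: str  # Contact the user agent put into the REGISTER
    nat_source: tuple     # (public IP, port) taken from the IP/UDP header


class NatRegistrationHelper:
    def __init__(self, sbc_contact):
        self.sbc_contact = sbc_contact
        self.bindings = {}

    def on_register(self, sip_uri, contact, source_address):
        """Keep a local copy and return the Contact to send to the registrar."""
        self.bindings[sip_uri] = LocalBinding(sip_uri, contact, source_address)
        return self.sbc_contact

    def route_incoming(self, sip_uri):
        """Return the public NAT address through which the user is reachable."""
        binding = self.bindings.get(sip_uri)
        return None if binding is None else binding.nat_source


sbc = NatRegistrationHelper("<sip:sbc.example.net>")
registered_contact = sbc.on_register(
    "sip:alice@example.com", "<sip:alice@192.168.1.20:5060>", ("203.0.113.20", 41234)
)
next_hop = sbc.route_incoming("sip:alice@example.com")
```

The registrar only ever sees registered_contact, i.e. the SBC, while next_hop tells the SBC where to forward an incoming call for this user.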

A similar approach is used for enabling the exchange of media. Instead of sending media to the IP address and port number advertised in the SIP SDP bodies, SBCs send media for a user agent symmetrically back to where the agent has sent its own media from. This symmetric communication typically works because it is the traffic pattern NAT manufacturers were used to before the arrival of VoIP.
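
This symmetric media handling can be sketched as a media leg that learns its send target from the first packet it receives. The class name and addresses are invented for this illustration, and the sketch deliberately leaves out details such as validating the source or re-learning the address when it changes.

```python
class LatchingMediaLeg:
    """One side of a relayed media stream."""

    def __init__(self, sdp_address):
        self.sdp_address = sdp_address   # advertised in SDP, may be private
        self.latched_address = None      # learned from received media

    def on_rtp_received(self, source_address):
        # Remember where the user agent's media actually came from.
        if self.latched_address is None:
            self.latched_address = source_address

    def send_target(self):
        # Media is sent back to the learned address, not the SDP address.
        return self.latched_address


leg = LatchingMediaLeg(sdp_address=("10.0.0.5", 49170))
target_before = leg.send_target()
leg.on_rtp_received(("203.0.113.20", 62001))
target_after = leg.send_target()
```

Until the user agent has sent its first packet, the sketch has no usable target (target_before is None), which is one reason why this approach depends on the user agent sending media at all.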

It is important to know that while this mostly works, it has several limitations. First of all, it only works with clients that are built the “symmetric way”, i.e., they use the same port for sending and receiving media. Fortunately, that is nowadays true of the majority of available equipment.

The other noticeable disadvantage is “triangular routing”: an SBC must relay all VoIP traffic for a call, to make the paths caller-SBC and SBC-callee symmetric. That is in fact quite an overhead for a VoIP operator. With the most common codec, G.711, a relayed call consumes four 87.2 kbps streams: two outbound, two inbound.
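
The 87.2 kbps figure can be reproduced with a short calculation. The parameters below are assumptions that the text does not state: a packetization interval of 20 ms, IPv4, and 18 bytes of Ethernet framing (header plus frame check sequence) per packet.

```python
CODEC_RATE_BPS = 64_000      # G.711 payload bit rate
PACKET_INTERVAL_MS = 20      # assumed packetization interval

payload_bytes = CODEC_RATE_BPS * PACKET_INTERVAL_MS // 8000   # 160 bytes
packets_per_second = 1000 // PACKET_INTERVAL_MS               # 50 packets/s

RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 18       # assumed: 14-byte header + 4-byte FCS

frame_bytes = (payload_bytes + RTP_HEADER + UDP_HEADER
               + IPV4_HEADER + ETHERNET_OVERHEAD)             # 218 bytes
stream_bps = frame_bytes * 8 * packets_per_second             # one direction
relay_bps = 4 * stream_bps                                    # two in, two out

print(stream_bps / 1000, "kbps per stream")
print(relay_bps / 1000, "kbps per relayed call")
```

Under these assumptions one stream comes to 87.2 kbps and a relayed call to 348.8 kbps at the SBC.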

Denial of Service and Overload Protection

Like any other Internet-based service, VoIP servers can be the target of denial of service attacks.

Attacks can be disguised as legitimate VoIP traffic, so distinguishing between a denial of service attack and a sudden surge in traffic due to some event is not always possible. Hence, VoIP operators need to incorporate mechanisms that monitor the load and the incoming traffic, identify the overloaded resources and the cause of the overload and react in a manner that will prevent a complete service interruption.
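
One simple way to monitor incoming traffic and react to a surge is to limit the request rate per source address. The text does not prescribe a particular mechanism, so the token bucket below is only one possible illustration; the class names, rates and addresses are chosen for the example.

```python
class TokenBucket:
    def __init__(self, rate, burst, now):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill according to the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerSourceLimiter:
    """Applies an independent token bucket to every source IP address."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, source_ip, now):
        bucket = self.buckets.get(source_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[source_ip] = bucket
        return bucket.allow(now)


limiter = PerSourceLimiter(rate=1, burst=2)
# Three requests arrive from the same source at the same instant.
decisions = [limiter.allow("203.0.113.20", now=0.0) for _ in range(3)]
other_source = limiter.allow("198.51.100.7", now=0.0)
after_one_second = limiter.allow("203.0.113.20", now=1.0)
```

In this run the third request of the burst is rejected, a different source is unaffected, and the first source is accepted again one second later. Such a limiter cannot tell an attack from a legitimate surge; it only bounds the load that reaches the core servers.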

In order to keep the malicious traffic and overload away from the core servers, e.g. application servers, proxies and PSTN gateways, there might be protection mechanisms located at the SBCs. In this context one can often find SBCs offering some or all of the following features:

Regulatory Features

With the increased success of VoIP services, providers of VoIP services will have to consider an issue that the Internet has managed to successfully ignore for a long time, namely legal regulations. The traditional telecom market is one of the most regulated market segments. Current regulations describe in great detail how an emergency call must be dealt with in the network and how to intercept the call of a wrongdoer.

To be able to support lawful interception an operator requires access to both signaling and media traffic. VoIP providers that do not offer IP access have only access to the signaling information. By using an SBC for controlling both signaling and media packets the operator has an obvious node for supporting lawful interception.

Access Control and Fraud Prevention

As the name already implies, SBCs are tasked with controlling which users and what messages can cross the borders of a VoIP infrastructure and use the offered VoIP services. Most SBCs will offer most if not all of the following mechanisms:

Interoperability Mediation

There are different standardization groups working on SIP. Different developers often interpret the same specifications differently. This means that interoperability between SIP products of different vendors is unfortunately not always guaranteed.

SBCs often have the capability to overcome some of these interoperability problems by manipulating the content of SIP messages so that they better fit the expectations of the receiving side. One can distinguish between three interoperability issues, namely SIP flavors, SIP content and transport protocols.

SIP is used in both mobile and fixed networks, and also serves as a transition protocol in the 3GPP R4 release. In the ISP environment, SIP as specified by the IETF is mostly used. In the fixed environment, the TISPAN specifications are used. In the mobile network environment, the 3GPP IMS specifications are the most favored. SIP-I is proposed for trunking scenarios in which SIP is the signaling protocol used to connect SS7-based networks over an IP core network.

Besides the differences in the SIP headers, SIP-I adds another body type to the SIP message, namely an ISUP part, which is added by a PSTN gateway after generating a SIP message from an incoming SS7 message. This ISUP body is then used by the receiving PSTN gateway for reconstructing the SS7 signaling messages towards the other side of the call.

In the context of interoperability of SIP flavors, SBCs can provide the following services:

Media Transcoding

Especially on the borders between fixed and mobile networks there might be some need for transcoding the audio or video compression system from one format to the other. Different SBCs offer the possibility to integrate specialized transcoding hardware. For the case when the expected need for transcoding is low, some SBCs offer software based transcoding solutions.

Last Words

Since their first introduction over 10 years ago, SBCs have gained considerably in scope and capabilities. SBCs of the first generation were dedicated devices, often built on off-the-shelf hardware, that had the sole purpose of establishing a secure border between the subscribers and the operator’s PSTN gateways. These SBCs supported mainly NAT traversal and topology hiding. The second generation of SBCs offered a wider range of features including transcoding, support for more complex call flows as well as video communication.

Features such as DoS and overload prevention, IPsec support, monitoring and fraud prevention enhanced the security capabilities of SBCs. In addition to UNI SBCs, vendors started offering solutions for NNI and enterprise scenarios as well. To increase the performance and scalability of SBCs, vendors started using dedicated hardware and decomposing an SBC into signaling and media control components that communicate with each other using a protocol like MEGACO. This decomposition allows operators to scale the signaling and media handling capabilities of an SBC independently.

We are currently seeing the third generation of SBCs. Vendors are starting to offer SBCs no longer only as a closed box but as a virtual machine that can be installed on the operator’s hardware or in a cloud. Further, SBCs are expected to offer open interfaces to enable a smooth integration into the operator’s multimedia service infrastructure.

Even the first generations of SBCs supported multi-protocol communication by supporting both SIP and H.323. The next generation of SBCs will enhance this feature by acting as a bridge between the emerging WebRTC implementations and SIP. Furthermore, in order to offer improved support for mobile users, SBCs will support integration with Apple and Google notification systems. This would enable mobile devices to remain in sleep mode but still be reachable by the rest of the world.

Acronyms

3GPP: 3rd Generation Partnership Project

B2BUA: Back to Back User Agent

IMS: IP Multimedia Subsystem

IP: Internet Protocol

ISUP: ISDN User Part

NAT: Network Address Translator

NNI: Network-Network Interface

PBX: Private Branch Exchange

PSTN: Public Switched Telephone Network

RTP: Real-Time Transport Protocol

SBC: Session Border Controller

SAP: Session Announcement Protocol

SCTP: Stream Control Transmission Protocol

SDP: Session Description Protocol

SIP: Session Initiation Protocol

TCP: Transmission Control Protocol

TISPAN: Telecommunications and Internet converged Services and Protocols for Advanced Networking

TLS: Transport Layer Security

UAC: User Agent Client

UAS: User Agent Server

UDP: User Datagram Protocol

UNI: User-Network Interface

URI: Uniform Resource Identifier

VoIP: Voice over IP

References

  1. H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, IETF, 1996
  2. M. Handley, V. Jacobson, “SDP: Session Description Protocol”, RFC 2327, IETF, 1998
  3. J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, “SIP: Session Initiation Protocol”, RFC 3261, IETF, 2002
  4. J. Hautakorpi, G. Camarillo, R. Penfield, A. Hawrylyshen, M. Bhatia, “Requirements from Session Initiation Protocol (SIP) Session Border Control (SBC) Deployments”, RFC 5853, IETF, 2010