Over the past 10 years the Session Initiation Protocol (SIP) has moved from the toy of researchers and academics to the de-facto standard for telephony and multimedia services in mobile and fixed networks.
Probably one of the most emotionally fraught discussions in the context of SIP was whether Session Border Controllers (SBC) are good or evil.
SIP was designed with the vision of revolutionizing the way communication services are developed, deployed and operated. Following the end-to-end spirit of the Internet, SIP was supposed to tear down the walled gardens of PSTN networks and free communication services from the grip of large telecom operators. By moving the intelligence to the end systems, developers were supposed to be able to develop new communication services that would change the way we communicate with each other. This was to be achieved without having to wait for the approval of the various telecommunication standardization groups such as ETSI or the support of incumbent telecoms.
Session border controllers are usually implemented as SIP Back-to-Back User Agents (B2BUA) that are placed between a SIP user agent and a SIP proxy. The SBC then acts as the contact point for both the user agents and the proxy. Thereby the SBC actually breaks the end-to-end behavior of SIP, which has led various people to deem the SBC as an evil incarnation of the old telecom way of thinking. Regardless of this opposition, SBCs have become a central part of any SIP deployment.
In this paper we will first give a brief overview of how SIP works and then look at SBCs and the features they support, such as NAT traversal, mediation, DoS protection and support for legal requirements.
A more detailed version of the paper is available on our web page.
A Short Introduction to SIP
By the mid nineties the IETF, which plays the role of the standards organization of the Internet, had already produced different protocols needed for IP-based telephony services. The Real-Time Transport Protocol (RTP) enabled the exchange of audio and video data. The Session Description Protocol (SDP) enabled the negotiation and description of the multimedia data to be used in a communication session.
The Session Initiation Protocol (SIP) was the attempt of the IETF community to provide a signaling protocol that not only enables phone calls but can also be used for initiating any kind of communication session. Hence, SIP can be used for VoIP just as well as for setting up a gaming session or controlling a coffee machine.
The SIP specifications describe three types of components: user agents (UA), proxies and registrar servers. The UA can be the VoIP application used by the user, e.g., the VoIP phone or software application. A VoIP gateway, which enables VoIP users to communicate with users in the public switched telephone network (PSTN), and an application server, e.g., a multi-party conferencing server or a voicemail server, are also implemented as user agents.
The registrar server maintains a location database that binds the users’ VoIP addresses to their current IP addresses.
The proxy provides the routing logic of the VoIP service. When a proxy receives a SIP request from a user agent or another proxy it also conducts service specific logic, such as checking the user’s profile and whether the user is allowed to use the requested services. The proxy then either forwards the request to another proxy or to another user agent or rejects the request by sending a negative response.
With regard to the SIP messages we distinguish between requests and responses. The INVITE request is used to initiate a dialog between two users. A BYE request is used for terminating this dialog. Responses can either be final or provisional. Final responses can indicate that a request was successfully received and processed by the destination. Alternatively, a final response can indicate that the request could not be processed by the destination or by some proxy in between or that the session could not be established for some reason. Provisional responses indicate that the session establishment is in progress, e.g. the destination phone is ringing.
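The distinction between provisional and final responses follows directly from the numeric status code carried in each response. The following is a minimal sketch, assuming the status code classes of RFC 3261 (1xx provisional, 2xx success, 3xx to 6xx other final responses); the function name and the returned labels are illustrative only.

```python
def classify_response(status_code: int) -> str:
    """Classify a SIP status code as provisional or final.

    Assumes the RFC 3261 status code classes; the labels are hypothetical.
    """
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    if status_code < 200:
        return "provisional"          # e.g. 180 Ringing
    if status_code < 300:
        return "final (success)"      # e.g. 200 OK
    return "final (non-success)"      # redirection, client, server or global failure


if __name__ == "__main__":
    for code in (100, 180, 200, 486):
        print(code, classify_response(code))
```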
In this paper we distinguish three types of SIP message exchanges, namely registrations, dialogs and out of dialog transactions.
A SIP registration enables a user agent to register its current address, IP address for example, at the registrar. This enables the registrar to establish a correlation between the user agent’s permanent address, e.g. sip:[email protected], and the user agent’s current address. In order to keep this correlation up to date the user agent will have to repeatedly refresh the registration. The registrar will then delete a registration that is not refreshed for a while.
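The binding and its expiry can be pictured as a small lookup table. The sketch below is an illustration under stated assumptions, not how any particular registrar is built: the class name, the 3600 second default and the example addresses are all hypothetical.

```python
import time


class Registrar:
    """Toy location database: permanent address -> (current contact, expiry)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be exercised without waiting
        self._bindings = {}

    def register(self, address_of_record, contact, expires=3600):
        # A refresh simply overwrites the binding and pushes the expiry forward.
        self._bindings[address_of_record] = (contact, self._clock() + expires)

    def lookup(self, address_of_record):
        entry = self._bindings.get(address_of_record)
        if entry is None:
            return None
        contact, expiry = entry
        if self._clock() >= expiry:
            # The binding was not refreshed in time, so it is deleted.
            del self._bindings[address_of_record]
            return None
        return contact
```

A user agent that stops refreshing its registration therefore becomes unreachable once the expiry time has passed, which is the behavior described above.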
A SIP dialog, a call for example, usually consists of a session initiation phase in which the caller generates an INVITE that is responded to with provisional and final responses. The session initiation phase is terminated with an ACK. A dialog is terminated with a BYE transaction. Depending on the call scenario the caller and callee might exchange a number of in-dialog requests such as reINVITEs or REFER.
The last type of SIP interactions is SIP transactions that are not generated as part of a dialog. These out of dialog messages can be observed when the SUBSCRIBE and NOTIFY requests are exchanged between two SIP user agents. This is the case when a SIP node wants to be informed about a certain event. In this case this node sends a SUBSCRIBE request to the server in charge of this event. Once this event occurs, the server will send a NOTIFY request to the SIP node carrying information about the event. Other out of dialog SIP requests include OPTIONS and INFO that are often used for exchanging information between SIP nodes or as an application level heartbeat.
What Do SBCs Do?
Since their introduction nearly 10 years ago, SBCs have been used to fulfil a growing set of requirements. This section will start with a brief explanation of why SBCs emerged and an overview of the general behavior of SBCs, followed by a more detailed look at how an SBC provides different features such as NAT traversal or denial of service protection.
General Behavior of SBCs
SBCs come in all kinds of shapes and forms and are used by operators and enterprises to achieve different goals. Even the same SBC implementation might act differently depending on its configuration and the use case. Hence, it is not easy to describe an exact SBC behavior that would apply to all SBC implementations. However, in general we can still identify certain features that are common to most SBCs. For example, most SBCs are implemented as a “Back-to-Back User Agent” (B2BUA).
A B2BUA is a proxy-like server that splits a SIP transaction in two pieces: on the side facing the User Agent Client, it acts as a server; on the side facing the User Agent Server, it acts as a client. While a proxy usually keeps only state information related to active transactions, B2BUAs keep state information about active dialogs, e.g., calls. That is, once a proxy receives a SIP request it will save some state information. Once the transaction is over, e.g., after receiving a response, the state information will soon be deleted. A B2BUA will maintain state information for active calls and only delete this information once the call is terminated.
The SBC acts as a B2BUA that behaves as a user agent server towards the caller and as a user agent client towards the callee. In this sense, the SBC actually terminates the call that was generated by the caller and starts a new call towards the callee. The INVITE message sent by the SBC no longer contains a clear reference to the caller. The INVITE sent by the SBC to the proxy includes Via and Contact headers that point to the SBC itself and not the caller. SBCs often also manipulate the dialog identification information listed in the Call-Id and From tag. Further, if the SBC is configured to also control the media traffic, then the SBC also changes the media addressing information included in the c and m lines of the SDP body. Thereby, not only all SIP messages will traverse the SBC but also all audio and video packets. As the INVITE sent by the SBC establishes a new dialog, the SBC also manipulates the message sequence number (CSeq) as well as the Max-Forwards value.
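The header and SDP changes described above can be sketched as two small functions. This is a simplified illustration only: SIP headers are modeled as a plain dictionary, the function names are hypothetical, and resetting CSeq to 1 and Max-Forwards to 70 is an assumption about one possible configuration rather than the behavior of any specific SBC.

```python
import uuid


def b2bua_outbound_invite(incoming: dict, sbc_host: str, sbc_port: int = 5060) -> dict:
    """Build the headers of the new INVITE that the SBC sends towards the callee."""
    out = dict(incoming)  # the caller's message itself is left untouched
    # Via and Contact now point at the SBC instead of the caller.
    out["Via"] = f"SIP/2.0/UDP {sbc_host}:{sbc_port};branch=z9hG4bK{uuid.uuid4().hex}"
    out["Contact"] = f"<sip:{sbc_host}:{sbc_port}>"
    # New dialog identifiers: fresh Call-ID and From tag.
    out["Call-ID"] = f"{uuid.uuid4().hex}@{sbc_host}"
    from_uri = incoming["From"].split(";tag=")[0]
    out["From"] = f"{from_uri};tag={uuid.uuid4().hex[:8]}"
    # The SBC starts a new dialog, so it chooses its own sequence number and hop limit.
    out["CSeq"] = "1 INVITE"
    out["Max-Forwards"] = "70"
    return out


def rewrite_sdp(sdp: str, relay_ip: str, relay_port: int) -> str:
    """Point the c and m lines of an SDP body at the SBC's media relay."""
    lines = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = f"c=IN IP4 {relay_ip}"
        elif line.startswith("m="):
            parts = line.split(" ")
            parts[1] = str(relay_port)  # m=<media> <port> <proto> <fmt ...>
            line = " ".join(parts)
        lines.append(line)
    return "\r\n".join(lines) + "\r\n"
```

A real B2BUA would additionally keep a mapping between the identifiers of the two call legs so that responses and in-dialog requests can be translated back.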
As the result of a SIP session establishment the involved end points will know the IP addresses of where to send and receive media traffic. This means that a user using SIP for calling a PSTN number will know the IP address of the PSTN gateway that is responsible for bridging the VoIP service with the PSTN. Further, during the session establishment phase all the involved proxies will include their addresses in the Via headers.
A malicious user could use this information to either attack an operator’s proxies or even get access to the PSTN gateways directly. By having the ability to contact the PSTN gateways directly, an attacker might be able to exploit any security holes that might exist at the PSTN gateway. This allows the attacker to initiate calls to the PSTN with the costs being borne by the operator.
To hide the internal components of an operator, all messages leaving the operator’s network would traverse an SBC. The SBC replaces the addresses of internal components with its own. Hence, headers such as Contact, Via, Record-Route, Route and so on would include the SBC’s address only.
Network Address Translators (NAT) are used to overcome the shortage of IPv4 addresses by hiding an enterprise or even an operator’s network behind one or a few IP addresses. The devices behind the NAT use private IP addresses that are not routable in the public Internet.
In case a user agent is located behind a NAT then it will use a private IP address as its contact address in the Contact and Via headers as well as the SDP part. This information would then be useless for anyone trying to contact this user agent from the public Internet.
There are different NAT traversal solutions such as STUN and ICE. Which solution to use depends on the behavior of the NAT and the call scenario. When using an SBC to solve the NAT traversal issues, the most common approach is for the SBC to act as the public interface of the user agents. This is achieved by replacing the user agent’s contact information with that of the SBC.
In order for a user agent to be reachable through the public interfaces of an SBC, the SBC will manipulate the registration information of the user agent. The user includes its private IP address as its contact information in the REGISTER requests. Calls to this address will fail, since it is not publicly routable. The SBC replaces the information in the Contact header with its own IP address. This is the information that is then registered at the registrar. Calls destined to the user will then be directed to the SBC. In order for the SBC to know which user agent is actually being contacted the SBC can keep a local copy of the user agent’s registration. The local copy includes the private IP address and the user’s SIP URI as well as the public IP address included in the IP header that was assigned to the SIP message by the NAT.
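The registration handling described above can be sketched as follows. The class and method names are hypothetical, and the sketch ignores expiry and authentication; it only shows which pieces of information the SBC keeps locally and which Contact it passes on to the registrar.

```python
class NatRegistrationCache:
    """Local copy of registrations kept by an SBC for NAT traversal (illustrative)."""

    def __init__(self, sbc_contact):
        self.sbc_contact = sbc_contact
        self._local = {}

    def on_register(self, sip_uri, private_contact, source_addr):
        """Handle a REGISTER arriving from behind a NAT.

        source_addr is the (ip, port) seen in the IP/UDP headers, i.e. the
        public address assigned by the NAT. Returns the Contact value that
        is forwarded to the registrar in place of the private one.
        """
        self._local[sip_uri] = {
            "private_contact": private_contact,
            "nat_addr": source_addr,
        }
        return self.sbc_contact

    def next_hop(self, sip_uri):
        """Where to send an incoming call for sip_uri, or None if unknown."""
        entry = self._local.get(sip_uri)
        return None if entry is None else entry["nat_addr"]
```

Calls for the user reach the SBC because the registrar only knows the SBC's contact; the SBC then forwards them to the public address and port of the NAT taken from its local copy.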
A similar approach is used for enabling the exchange of media. Instead of sending media to the IP address and port number advertised in the SDP bodies, SBCs send media for a user agent symmetrically back to where the agent has sent its own media from. This symmetric communication typically works because it is the traffic pattern NAT manufacturers were used to before the arrival of VoIP.
It is important to know that while this mostly works, it has several limitations. First of all, it only works with clients that are built in a “symmetric way”, i.e., they use the same port for sending and receiving media. Nowadays that is fortunately the majority of available equipment.
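A minimal sketch of this symmetric behavior is given below, assuming a relay that simply remembers the source address of the last media packet received from a user agent. The class name is hypothetical, and a production relay would also have to validate the packets it learns addresses from.

```python
class MediaRelayLeg:
    """One side of a relayed media stream (illustrative sketch)."""

    def __init__(self, advertised_addr):
        self.advertised_addr = advertised_addr  # from the SDP, possibly a private address
        self.learned_addr = None                # public address observed on the wire

    def on_packet_from_agent(self, source_addr):
        # Remember where the agent's media actually came from (the NAT's public side).
        self.learned_addr = source_addr

    def send_target(self):
        # Prefer the observed address; fall back to the SDP until a packet arrives.
        return self.learned_addr or self.advertised_addr


leg = MediaRelayLeg(("192.168.1.20", 49170))
leg.on_packet_from_agent(("198.51.100.7", 61002))
print(leg.send_target())
```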
The other noticeable disadvantage is “triangular routing”: an SBC must relay all VoIP traffic for a call, to make the paths caller-SBC and SBC-callee symmetric. That is in fact quite an overhead for a VoIP operator. With the most common codec, G.711, a relayed call consumes four 87.2 kbps streams: two outbound, two inbound.
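The 87.2 kbps figure can be reproduced with a short calculation. The sketch below assumes a 20 ms packetization interval (50 packets per second, 160 bytes of G.711 payload per packet) and counts RTP, UDP, IPv4 and Ethernet overhead; these parameters are assumptions chosen because they yield the quoted number, and other packetization intervals or link layers give different results.

```python
def stream_bitrate_kbps(payload_bytes=160, packets_per_second=50,
                        rtp=12, udp=8, ipv4=20, ethernet=18):
    """Bit rate of one unidirectional media stream, in kbps.

    Defaults assume G.711 with 20 ms packets and Ethernet framing
    (14 byte header plus 4 byte frame check sequence).
    """
    packet_bytes = payload_bytes + rtp + udp + ipv4 + ethernet
    return packet_bytes * 8 * packets_per_second / 1000


per_stream = stream_bitrate_kbps()
relayed_call = 4 * per_stream  # two streams in, two streams out at the SBC
print(f"{per_stream:.1f} kbps per stream, {relayed_call:.1f} kbps per relayed call")
```

Under these assumptions each packet is 218 bytes on the wire, so a single relayed call costs the operator roughly 350 kbps of capacity at the SBC.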
Denial of Service and Overload Protection
Like any other Internet-based service, VoIP servers can be the target of denial of service attacks.
Attacks can be disguised as legitimate VoIP traffic, so distinguishing between a denial of service attack and a sudden surge in traffic due to some event is not always possible. Hence, VoIP operators need to incorporate mechanisms that monitor the load and the incoming traffic, identify the overloaded resources and the cause of the overload, and react in a manner that will prevent a complete service interruption.
In order to keep malicious traffic and overload away from the core servers, e.g. application servers, proxies and PSTN gateways, protection mechanisms can be located at the SBCs. In this context one can often find SBCs offering some or all of the following features:
- Traffic limitation: Operators can limit the rate of incoming calls and registrations. Once these limits are exceeded, the SBC starts rejecting messages arriving in excess of these limits.
- Dynamic blacklisting: Static blacklists are usually used to drop traffic from certain sources without having to process it first. However, not all possible malicious sources are known in advance. Therefore, SBCs often monitor the incoming traffic and, if certain characteristics are identified, dynamically add user agents to a blacklist. These characteristics can be the number of messages sent by a source over a period of time, the content of the messages or the distribution of the called destinations, e.g., a source that calls a lot of different destinations in a row is very likely to be scanning the network in search of a destination with some weakness. Once a source is blacklisted, all messages from that source are rejected or dropped.
- Content filtering: An attacker could try to get access to some protected resources by launching an SQL injection attack or try to bring a server down by sending SIP messages with malformed content. By analyzing the content of incoming SIP messages and rejecting messages that seem to include malicious content, the SBCs can protect the core components of the network.
- Caller prioritization: Customers of a VoIP service expect that their provider will still handle their calls even under overload or attack scenarios. To achieve this an SBC can identify calls generated by registered customers of the operator by keeping a local registration database. Under overload scenarios the SBC would then only accept calls originating from registered users.
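Traffic limitation and dynamic blacklisting from the list above can be combined in one small component. The following sketch uses a sliding time window per source; the class name, the thresholds and the rule "blacklist after a fixed number of rejected messages" are assumptions made for illustration, and real SBCs typically use richer criteria such as message content or the distribution of called destinations.

```python
from collections import defaultdict, deque


class SourceGuard:
    """Per-source rate limit with dynamic blacklisting (illustrative sketch)."""

    def __init__(self, max_requests, window_seconds, blacklist_after):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.blacklist_after = blacklist_after
        self._recent = defaultdict(deque)    # source -> timestamps of accepted messages
        self._violations = defaultdict(int)  # source -> number of rejected messages
        self.blacklist = set()

    def allow(self, source, now):
        """Return True if a message from source arriving at time now is accepted."""
        if source in self.blacklist:
            return False                      # dropped without further processing
        recent = self._recent[source]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()                  # forget messages outside the window
        if len(recent) >= self.max_requests:
            self._violations[source] += 1
            if self._violations[source] >= self.blacklist_after:
                self.blacklist.add(source)
            return False                      # rejected: rate limit exceeded
        recent.append(now)
        return True
```

Passing the arrival time explicitly keeps the sketch deterministic; in a running system it would come from a monotonic clock.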
With the increased success of VoIP services, providers of VoIP services will have to consider an issue that the Internet has managed to successfully ignore for a long time, namely legal regulations. The traditional telecom market is one of the most heavily regulated market segments. Current regulations describe in great detail how an emergency call must be dealt with in the network and how to intercept the call of a wrongdoer.
To be able to support lawful interception an operator requires access to both signaling and media traffic. VoIP providers that do not offer IP access have only access to the signaling information. By using an SBC for controlling both signaling and media packets the operator has an obvious node for supporting lawful interception.
Access Control and Fraud Prevention
As the name already implies, SBCs are tasked with controlling which users and what messages can cross the borders of a VoIP infrastructure and use the offered VoIP services. Most SBCs will offer most if not all of the following mechanisms:
- White/Blacklists: By maintaining lists of trusted and untrusted users and sources an SBC can easily determine whether a certain message should be accepted or rejected without further processing.
- Media control: SBCs often replace the addresses included in the SDP parts with their own. On the one hand this is needed for supporting NAT traversal. On the other hand this enables the SBC to ensure that only users that have successfully established a call, e.g., whose INVITE requests were accepted by the callee, are allowed to send media traffic. This way an SBC can prevent a malicious user from contacting a PSTN gateway or an application server directly.
- Fraud prevention: Prices for a flat rate service are determined based on a certain expected user behavior. However, operators often face the case that a user subscribes for a flat rate telephony residential service but then starts reselling telephony minutes. This kind of behavior causes financial losses to the operator and overloads the network. To suppress this fraud possibility, operators can use SBCs to limit the number of parallel calls generated by a user as well as the duration and frequency of calls.
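Limiting the number of parallel calls per user, as mentioned under fraud prevention, amounts to simple bookkeeping at the SBC. The sketch below is illustrative only; the class name and the limit are hypothetical, and limits on call duration or call frequency would need additional timers that are not shown.

```python
class CallLimiter:
    """Caps the number of simultaneous calls per user (illustrative sketch)."""

    def __init__(self, max_parallel_calls):
        self.max_parallel_calls = max_parallel_calls
        self._active = {}  # user -> set of identifiers of ongoing calls

    def start_call(self, user, call_id):
        """Return True if the call is admitted, False if the user is at the limit."""
        calls = self._active.setdefault(user, set())
        if len(calls) >= self.max_parallel_calls:
            return False
        calls.add(call_id)
        return True

    def end_call(self, user, call_id):
        """Release the slot once the call has been terminated."""
        self._active.get(user, set()).discard(call_id)


limiter = CallLimiter(max_parallel_calls=2)
print(limiter.start_call("alice", "call-1"), limiter.start_call("alice", "call-2"),
      limiter.start_call("alice", "call-3"))
```

Because the SBC keeps dialog state as a B2BUA, it already knows when each call starts and ends, which is what makes this kind of limit straightforward to enforce at the border.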
There are different standardization groups working on SIP. Different developers often interpret the same specifications differently. This means that interoperability between SIP products of different vendors is unfortunately not always guaranteed.
SBCs often have the capability to overcome some of these interoperability problems by manipulating the content of SIP messages so that they better fit the expectations of the receiving side. One can distinguish between three interoperability issues, namely SIP flavors, SIP content and transport protocols.
SIP is being used in both mobile and fixed networks as well as a transition protocol in the 3GPP R4 release. In the ISP environment, SIP as specified by the IETF is mostly used. In the fixed environment, the TISPAN specifications are used. In the mobile network environment, the 3GPP IMS specifications are the most favored. SIP-I is proposed for trunking scenarios in which SIP is used as the signaling protocol to connect SS7-based networks over an IP core network.
Besides the differences in the SIP headers, SIP-I adds another body type to the SIP message, namely an ISUP part, which is added by a PSTN gateway after generating a SIP message from an incoming SS7 message. This ISUP body is then used by the receiving PSTN gateway for reconstructing the SS7 signaling messages towards the other part of the call.
In the context of interoperability of SIP flavors, SBCs can provide the following services:
- Stateless SIP header manipulation: An SBC can be configured to remove certain headers and add others. This way, an SBC can for example delete headers that are useful in an IMS or TISPAN environment but not in an IETF SIP environment.
- Stateful message handling: Different SIP-based deployments might expect different call flows. So while an ISP might be using SIP according to the IETF RFC 3261 specification, a mobile operator might be deploying the IMS specifications. One of the major differences between the two specifications is that IMS deployments rely heavily on provisional acknowledgements (PRACK); that is, a user agent server sending a provisional response expects an acknowledgement from the user agent client that the response was correctly received. As the capability of generating PRACK requests is not widely used in IETF-based deployments, an SBC on the border between the ISP and the mobile operator could mediate between the two call flows by generating the appropriate PRACK requests.
- Message blocking: Certain SIP messages might be useful in one network as they provide a certain service. However, if this service is not provided across the interconnection points then exchanging them across the networks does not make sense. SBCs can be configured to reject certain messages such as NOTIFY if presence services are not provided across the network for example.
- SIP-I to SIP manipulation: SIP-I requests carry an ISUP part as part of the SIP body. This could cause problems for SIP components that do not understand ISUP and do not expect to see such information in a SIP message. Some SBCs can overcome this issue by removing the ISUP part when forwarding a message to the SIP side of the communication and adding the appropriate ISUP body before forwarding a message to the SIP-I part of the call. This will often require some understanding of ISUP and keeping ISUP-related state information.
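Two of the services listed above, stateless header manipulation and removal of the ISUP part, can be sketched as simple filters. The message model used here (headers as a dictionary, a multipart body as a list of content type and payload pairs) and the function names are assumptions made for illustration; the header in the usage example is only one example of an IMS-specific header, and re-adding an ISUP body in the other direction requires ISUP knowledge and state that this sketch does not cover.

```python
def strip_headers(headers, blocked):
    """Remove the configured headers, matching names case-insensitively."""
    blocked_lower = {name.lower() for name in blocked}
    return {name: value for name, value in headers.items()
            if name.lower() not in blocked_lower}


def strip_isup(body_parts):
    """Drop ISUP parts from a multipart body before forwarding to plain SIP.

    body_parts is a list of (content_type, payload) tuples.
    """
    return [(ctype, payload) for ctype, payload in body_parts
            if not ctype.lower().startswith("application/isup")]


headers = {"To": "<sip:bob@example.com>", "P-Charging-Vector": "icid-value=1234"}
parts = [("application/sdp", b"v=0\r\n"), ("application/ISUP; version=itu-t92+", b"\x01\x00")]
print(strip_headers(headers, ["p-charging-vector"]))
print(strip_isup(parts))
```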
Especially on the borders between fixed and mobile networks there might be some need for transcoding the audio or video compression system from one format to the other. Different SBCs offer the possibility to integrate specialized transcoding hardware. For the case when the expected need for transcoding is low, some SBCs offer software based transcoding solutions.
Since their first introduction over 10 years ago, SBCs have gained considerably in scope and capabilities. SBCs of the first generation were dedicated devices, often built on off-the-shelf hardware, that had the sole purpose of establishing a secure border between the subscribers and the operator’s PSTN gateways. These SBCs supported mainly NAT traversal and topology hiding. The second generation of SBCs offered a wider range of features including transcoding, support for more complex call flows as well as video communication.
Features such as DoS and overload prevention, IPsec support, monitoring and fraud prevention enhanced the security capabilities of SBCs. In addition to UNI SBCs, vendors started offering solutions for NNI and enterprise scenarios as well. To increase the performance and scalability of SBCs, vendors started using dedicated hardware and decomposing an SBC into signaling and media control components that communicate with each other using a protocol like MEGACO. This decomposition allows operators to scale the signaling and media handling capabilities of an SBC independently.
We are currently seeing the third generation of SBCs. Vendors are starting to offer SBCs no longer only as a closed box but as a virtual machine that can be installed on the operator’s hardware or in a cloud. Further, SBCs are expected to offer open interfaces to enable a smooth integration into the operator’s multimedia service infrastructure.
Already the first generations of SBCs supported multi-protocol communication by supporting both SIP and H.323. The next generation of SBCs will enhance this feature by acting as a bridge between the emerging WebRTC implementations and SIP. Furthermore, in order to offer improved support for mobile users, SBCs will support integration with Apple and Google notification systems. This would enable mobile devices to remain in sleep mode but still be reachable to the rest of the world.
3GPP: 3rd Generation Partnership Project
B2BUA: Back to Back User Agent
IMS: IP Multimedia Subsystem
IP: Internet Protocol
ISUP: ISDN User Part
NAT: Network Address Translator
NNI: Network-Network Interface
PBX: Private Branch Exchange
PSTN: Public Switched Telephone Network
RTP: Real-Time Transport Protocol
SBC: Session Border Controller
SAP: Session Announcement Protocol
SCTP: Stream Control Transmission Protocol
SDP: Session Description Protocol
SIP: Session Initiation Protocol
TCP: Transmission Control Protocol
TISPAN: Telecommunications and Internet converged Services and Protocols for Advanced Networking
TLS: Transport Layer Security
UAC: User Agent Client
UAS: User Agent Server
UDP: User Datagram Protocol
UNI: User-Network Interface
URI: Uniform Resource Identifier
VoIP: Voice over IP
- H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, IETF, 1996
- M. Handley, V. Jacobson, “SDP: Session Description Protocol”, RFC 2327, IETF, 1998
- J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, “SIP: Session Initiation Protocol”, RFC 3261, IETF, 2002
- J. Hautakorpi, G. Camarillo, R. Penfield, A. Hawrylyshen, M. Bhatia, “Requirements from Session Initiation Protocol (SIP) Session Border Control (SBC) Deployments”, RFC 5853, IETF, 2010