Voice over IP


VoIP (voice over IP) is an IP telephony term for a set of facilities used to manage the delivery of voice information over the Internet. VoIP involves sending voice information in digital form in discrete packets rather than by using the traditional circuit-committed protocols of the public switched telephone network (PSTN). A major advantage of VoIP and Internet telephony is that it avoids the tolls charged by ordinary telephone service.

VoIP derives from the VoIP Forum, an effort by major equipment providers, including Cisco, VocalTec, 3Com, and Netspeak, to promote the use of ITU-T H.323, the standard for sending voice (audio) and video using IP on the public Internet and within an intranet. The Forum also promotes the use of directory service standards, so that users can locate other users, and the use of touch-tone signals for automatic call distribution and voice mail.

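In addition to IP, VoIP uses the Real-time Transport Protocol (RTP), which adds sequence numbers and timestamps to each packet so that the receiver can detect lost or reordered packets and play the audio out with the correct timing. RTP does not by itself guarantee timely delivery, and over public networks it is currently difficult to guarantee Quality of Service (QoS). Better service is possible with private networks managed by an enterprise or by an Internet telephony service provider (ITSP).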
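To make the role of RTP concrete, here is a minimal Python sketch that parses the 12-byte fixed RTP header laid out in RFC 3550 and lists the sequence numbers skipped between two received packets, which is how a receiver notices a dropped packet. It assumes packets arrive in order; the function names are illustrative and not part of any VoIP library, and a real receiver would also handle reordering, header extensions and CSRC lists.

```python
import struct
from typing import Dict, List


def parse_rtp_header(packet: bytes) -> Dict[str, int]:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte RTP fixed header")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp and a 32-bit SSRC identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def missing_sequence_numbers(prev_seq: int, seq: int) -> List[int]:
    """Sequence numbers skipped between two consecutively received packets.

    Assumes in-order arrival. RTP sequence numbers are 16 bits wide and
    wrap around modulo 2**16.
    """
    gap = (seq - prev_seq) % 65536
    return [(prev_seq + i) % 65536 for i in range(1, gap)]


# A PCMU (payload type 0) header carrying sequence number 65535.
header = bytes([0x80, 0x00]) + struct.pack("!HII", 65535, 160, 0x1234)
print(parse_rtp_header(header)["sequence"])   # 65535
print(missing_sequence_numbers(65534, 2))     # [65535, 0, 1]
```

The modulo arithmetic matters because the 16-bit counter wraps: packets 65535, 0 and 1 are correctly reported as missing between 65534 and 2.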

One technique used by at least one equipment manufacturer, Adir Technologies (formerly Netspeak), to help ensure faster packet delivery is to contact all possible network computers that have access to the public network and choose the fastest path before establishing a Transmission Control Protocol (TCP) connection with the other end. Using VoIP, an enterprise positions a "VoIP device" at a gateway. The gateway receives packetized voice transmissions from users within the company and then routes them to other parts of its intranet (local area or wide area network) or, using a T-carrier or E-carrier interface, sends them over the public switched telephone network.

What is Voice over IP?

The Basics

What is voice over IP? How is it different from unified communications? And how can it help your small business?

The Range of Services

VoIP is offered as a wide range of services. Some basic, free VoIP services require all parties to be at their computers to make or receive calls. Others let you call from a traditional telephone handset, or even a cell phone, to any other phone.


To use VoIP, you need a broadband Internet connection plus one of the following: a traditional phone with an adapter, a VoIP-enabled phone, or VoIP software on your computer.

Security and Service Quality

Most consumer VoIP services use the Internet for phone calls. But many small businesses are using VoIP and unified communications on their private networks. That's because private networks provide stronger security and service quality than the public Internet.

Versus Unified Communications

Unified communications systems offer more features and benefits than VoIP, yet many are still priced for small businesses. Unified communications brings together all forms of communication regardless of location, time or device. Faxes, e-mail, and voicemail are all delivered to a single inbox. You can integrate your phone and customer relationship management (CRM) systems to improve your customer service, and much more.

The Benefits

Signal Processing


Voice over Internet Protocol (VoIP) systems have become a basic tool of ever-growing popularity. However, they commonly rely on an unreliable communication channel, such as the Internet, and are therefore subject to frequent data loss. Such loss usually takes the form of dropped packets carrying audio information, which in turn leaves temporal gaps in the received audio sequence, as illustrated in Fig. 1. Left untreated, these gaps create breaks in the audio (e.g. missing syllables in speech signals), and a high packet loss rate (above 20%) can often render speech unintelligible [1]. For this reason, VoIP applications regularly incorporate a packet loss concealment (PLC) mechanism, which counters the degradation in audio quality by filling in the missing audio data using various techniques.
A PLC mechanism should not impose heavy computational loads or extensive memory usage. In particular, PLC must operate in real time. Moreover, intense computation consumes more power, which is a limited resource in mobile devices.
Most existing PLC techniques have difficulty handling long audio gaps. This paper presents an approach for handling such gaps, which correspond to high packet loss rates. We suggest an example-based principle that exploits audio examples collected from past audio signals. Once an audio gap is encountered, our algorithm harnesses the audio data surrounding the gap to look for the most suitable audio example to fill it. A combination of audio features and prior knowledge of the statistical nature of the audio signal is used to find the most appropriate set of candidate examples. Our method then applies a series of steps to isolate the best-fitting example and to pre-process the exact portion of audio to be extracted from it. This portion is smoothly inlaid to fill the audio gap.
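The paragraph does not specify how the extracted portion is "smoothly inlaid". One common way to do this, shown here purely as an assumed illustration and not as the paper's method, is a linear crossfade: the patch extends a few samples past each edge of the gap and is blended with the intact received audio there, so the splice does not produce audible clicks. The function name and the `fade` parameter are hypothetical.

```python
from typing import List


def inlay_with_crossfade(received: List[float], hole_start: int, hole_end: int,
                         patch: List[float], fade: int) -> List[float]:
    """Fill received[hole_start:hole_end] with `patch`, cross-fading at both edges.

    Illustrative linear crossfade (an assumption, not the paper's procedure).
    `patch` must cover the hole plus `fade` samples on each side. In those
    margins the patch is blended with the intact received samples; inside
    the hole the patch is copied as-is.
    """
    if len(patch) != (hole_end - hole_start) + 2 * fade:
        raise ValueError("patch must cover the hole plus `fade` samples per side")
    if hole_start - fade < 0 or hole_end + fade > len(received):
        raise ValueError("fade margins fall outside the signal")
    out = list(received)
    offset = hole_start - fade  # index in `received` aligned with patch[0]
    for i, p in enumerate(patch):
        if i < fade:                       # left margin: patch weight rises
            w = (i + 1) / (fade + 1)
        elif i >= len(patch) - fade:       # right margin: patch weight falls
            w = (len(patch) - i) / (fade + 1)
        else:                              # inside the hole: patch only
            w = 1.0
        out[offset + i] = w * p + (1.0 - w) * received[offset + i]
    return out


received = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # hole at samples 4..5
patched = inlay_with_crossfade(received, 4, 6, [1.0] * 6, fade=2)
```

A longer `fade` hides mismatches between the patch and its surroundings better, at the cost of altering more of the correctly received audio.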
Inpainting is a term commonly used in the context of filling in missing pixels in images. It was borrowed by Adler et al. to describe filling short audio gaps in a signal using the intact portions surrounding each gap. Our work has a similar flavour, but it differs from theirs in several important aspects. The novelty of our work lies in a self-content-based approach that exploits a higher-level model of the audio signal. Together, these enable handling longer temporal audio gaps, which the method of Adler et al. cannot handle, as observed when experimenting with such long gaps.

Packet drop scenario. Some packets are dropped during transmission, causing the received signal s_r to have sequences of missing samples.

Figure 1

Losing network packets

The building block of VoIP is an Internet packet that encapsulates a segment of a digital audio signal. Let L_packet be the number of audio samples contained within each packet. Packets may have various sizes, corresponding to different values of L_packet; packets carrying 10, 20, 30 and 40 ms of audio have been reported. At a sampling rate of 8 kHz, these packets have L_packet = 80, 160, 240 and 320 samples, respectively. Packets frequently get dropped, often deliberately in times of network congestion, and the audio data they encapsulate is lost with them.
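The packet sizes above follow from L_packet = f_s × T_packet, where f_s is the sampling rate and T_packet is the audio duration carried by one packet. A small Python check of that arithmetic (the helper name is illustrative):

```python
def samples_per_packet(packet_ms: float, sample_rate_hz: int = 8000) -> int:
    """Number of audio samples L_packet carried by a packet of `packet_ms` milliseconds."""
    samples = sample_rate_hz * packet_ms / 1000
    if samples != int(samples):
        raise ValueError("packet duration does not map to a whole number of samples")
    return int(samples)


for ms in (10, 20, 30, 40):
    print(ms, "ms ->", samples_per_packet(ms), "samples")
# 10 ms -> 80 samples
# 20 ms -> 160 samples
# 30 ms -> 240 samples
# 40 ms -> 320 samples
```

At a higher sampling rate the same durations hold proportionally more samples; for example, a 20 ms packet at 16 kHz carries 320 samples.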
Ding and Goubran showed the dramatic influence of lost packets (at various loss rates and packet sizes) and examined different PLC techniques. PLC techniques can be roughly divided into sender-based and receiver-based methods. Sender-based methods (e.g. forward error correction, FEC) involve sending auxiliary information to allow later reconstruction. This auxiliary information is designed to maximize redundancy while consuming minimal bandwidth. Such PLC methods require modifications in both the sender and the receiver.
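FEC is only named above. As one simple illustration of a sender-based scheme, and not the specific method the text refers to, the sketch below uses an XOR parity packet, the idea behind generic RTP FEC (RFC 5109). The sender transmits one parity packet per group of k media packets, costing 1/k extra bandwidth, and the receiver can rebuild any single lost packet in that group. Function names are illustrative, and packets in a group are assumed to have equal length.

```python
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Illustrative FEC: parity packet = byte-wise XOR of equal-length packets."""
    if len({len(p) for p in packets}) != 1:
        raise ValueError("all packets in a group must have the same length")
    parity = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) from the survivors and the parity."""
    lost = [i for i, p in enumerate(received) if p is None]
    if len(lost) > 1:
        raise ValueError("a single XOR parity packet repairs only one loss per group")
    repaired = list(received)
    if lost:
        survivors = [p for p in received if p is not None] + [parity]
        # XOR of all other packets and the parity cancels out to the missing packet.
        repaired[lost[0]] = xor_parity(survivors)
    return repaired


group = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(group)                                   # sent alongside the group
print(recover_missing([group[0], None, group[2]], parity) == group)  # True
```

The trade-off mentioned in the text is visible here: a smaller group size k repairs more losses but spends more bandwidth on parity, and both ends must implement the scheme.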
Our proposed method involves only the receiving side; hence it is receiver-based. Receiver-based methods typically require lower bandwidth, or allow higher quality for a given bandwidth. There is a variety of receiver-based methods. Some substitute a missing packet, either by repeating the adjacent preceding packet or by inserting predefined audio (a noise or silence segment). Other methods include waveform linear prediction and codec-dependent spectral interpolation. All of these are reported to perform adequately for short temporal losses (up to around 20 ms), but they break down for longer gaps.
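The simplest receiver-based substitutions described above can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical names, assuming equal-length packets, lost packets marked as None, and at least one packet received. Note how a burst of losses under "repeat" simply replays the same packet, one intuition for why such methods break down on long gaps.

```python
import random
from typing import List, Optional


def conceal(packets: List[Optional[List[float]]], method: str = "repeat",
            noise_level: float = 0.01, seed: int = 0) -> List[float]:
    """Concatenate packets into one signal, substituting every lost packet (None).

    method: "silence" inserts zeros, "noise" inserts low-level uniform noise,
    "repeat" repeats the most recently received packet (zeros if none yet).
    Assumes all packets have equal length and at least one was received.
    """
    if method not in ("silence", "noise", "repeat"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    length = next(len(p) for p in packets if p is not None)
    out: List[float] = []
    last: Optional[List[float]] = None
    for p in packets:
        if p is not None:
            out.extend(p)
            last = p
        elif method == "repeat" and last is not None:
            # A burst of losses repeats the same packet over and over.
            out.extend(last)
        elif method == "noise":
            out.extend(rng.uniform(-noise_level, noise_level) for _ in range(length))
        else:
            out.extend([0.0] * length)
    return out


stream = [[0.1, 0.2], None, [0.3, 0.4]]
print(conceal(stream, "repeat"))   # [0.1, 0.2, 0.1, 0.2, 0.3, 0.4]
print(conceal(stream, "silence"))  # [0.1, 0.2, 0.0, 0.0, 0.3, 0.4]
```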
However, long gaps are common. The Gilbert model for Internet packet loss implies that packet drops tend to occur in bursts, mainly during network congestion. This model fits packet loss statistics rather accurately. Using the model with standard parameters suggests two important characteristics, which are taken into consideration in this work:

  1. Bursts of more than 5 consecutive dropped packets are highly improbable, even in a poor-quality communication channel.
  2. When dealing with larger packet sizes (corresponding to longer encapsulated audio segments), gaps longer than 40 ms are highly probable.
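The two characteristics above can be explored with a short simulation of the two-state Gilbert model: a "good" state in which packets arrive and a "bad" state in which they are dropped, with p = P(good to bad) and q = P(bad to good). Under this model a burst's length is geometric, so P(burst longer than k packets) = (1 - q)^k. The standard parameters are not given in this section, so the values of p and q below are hypothetical, as are the function names.

```python
import random
from typing import List


def gilbert_losses(n_packets: int, p: float, q: float, seed: int = 0) -> List[bool]:
    """Simulate losses with the two-state Gilbert model (True = packet dropped).

    p = P(good -> bad), q = P(bad -> good); the bad state drops every packet.
    """
    rng = random.Random(seed)
    bad = False
    lost: List[bool] = []
    for _ in range(n_packets):
        # Take one Markov transition, then record this packet's fate.
        bad = (rng.random() >= q) if bad else (rng.random() < p)
        lost.append(bad)
    return lost


def burst_lengths(lost: List[bool]) -> List[int]:
    """Lengths of runs of consecutive dropped packets."""
    runs: List[int] = []
    current = 0
    for is_lost in lost:
        if is_lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def prob_burst_longer_than(k: int, q: float) -> float:
    """P(burst length > k): each extra packet stays in the bad state with prob. 1 - q."""
    return (1 - q) ** k


# Hypothetical parameters, chosen only for illustration.
losses = gilbert_losses(100_000, p=0.05, q=0.6, seed=1)
print(sum(losses) / len(losses))          # close to p / (p + q), about 0.077
print(prob_burst_longer_than(5, q=0.6))   # 0.4 ** 5, about 0.0102
```

A burst of n packets, each carrying T ms of audio, leaves an n × T ms gap. With 40 ms packets, a single loss already produces a 40 ms gap, and under this model a burst has two or more packets with probability 1 - q.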

Algorithm sketch

The PLC process starts by continuously capturing a streaming digital audio signal. This audio signal is divided on the fly into overlapping segments of constant length. We call these segments audio blocks (ABs). Audio blocks in our system are substantially longer than a packet, for reasons that will be clarified later on. Each AB undergoes a feature extraction process, which yields a feature vector representative of this AB.
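A minimal Python sketch of the segmentation step, splitting a signal into overlapping ABs of constant length with a fixed hop. This section does not say which features are extracted, so the log energy of equal sub-frames used below is a hypothetical stand-in; the block length, hop, and function names are likewise assumptions for illustration.

```python
import math
from typing import List, Tuple


def audio_blocks(signal: List[float], block_len: int,
                 hop: int) -> List[Tuple[int, List[float]]]:
    """Split a signal into overlapping audio blocks (ABs) of constant length.

    Returns (start_index, samples) pairs; consecutive blocks overlap by
    block_len - hop samples. A trailing partial block is dropped.
    """
    if not 0 < hop <= block_len:
        raise ValueError("need 0 < hop <= block_len")
    return [(s, signal[s:s + block_len])
            for s in range(0, len(signal) - block_len + 1, hop)]


def block_features(block: List[float], n_frames: int) -> List[float]:
    """Hypothetical feature vector: log10 mean energy of n_frames equal sub-frames."""
    frame = len(block) // n_frames
    feats = []
    for f in range(n_frames):
        seg = block[f * frame:(f + 1) * frame]
        energy = sum(x * x for x in seg) / len(seg)
        feats.append(math.log10(energy + 1e-12))  # small offset avoids log(0)
    return feats


blocks = audio_blocks([float(n) for n in range(10)], block_len=4, hop=2)
print([start for start, _ in blocks])   # [0, 2, 4, 6]
```

In the described system each AB would span several packets, so block_len would be a multiple of L_packet rather than the toy value used here.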
During periods in which no packets are dropped, our system collects ABs and saves them as reference example ABs, to be used at a later stage. Once a packet is dropped, the received audio has a missing sequence of samples. This missing sequence is a hole in all the partially overlapping ABs that contain it (q_n and q_{n+1} in Fig. 2).
ABs that contain the hole constitute a set of candidate query ABs, which have the same length as the example ABs. Unlike example ABs, each query AB has a blank part corresponding to the hole. The intact portions of these queries undergo a feature extraction process similar to the one applied to example ABs. This process yields query feature vectors, which are compared to the example feature vectors.
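The step of finding query ABs can be sketched as follows, under stated assumptions: AB start positions and a common block length are known, the hole is a half-open sample range [hole_start, hole_end), and features are computed per sub-frame so that a sub-frame feature is usable only when every sample in it is intact. All names are hypothetical.

```python
from typing import List, Tuple


def query_blocks(block_starts: List[int], block_len: int,
                 hole_start: int, hole_end: int) -> List[Tuple[int, List[bool]]]:
    """Return (start, mask) for every AB overlapping the hole [hole_start, hole_end).

    mask[i] is True where sample start + i was received intact and False
    where it falls inside the hole (the blank part of the query AB).
    """
    queries = []
    for s in block_starts:
        if s < hole_end and s + block_len > hole_start:  # AB overlaps the hole
            mask = [not (hole_start <= s + i < hole_end) for i in range(block_len)]
            queries.append((s, mask))
    return queries


def valid_frames(mask: List[bool], n_frames: int) -> List[bool]:
    """A sub-frame feature is usable only if every sample in that sub-frame is intact."""
    frame = len(mask) // n_frames
    return [all(mask[f * frame:(f + 1) * frame]) for f in range(n_frames)]


# Blocks of 4 samples every 2 samples; sample 5 was lost.
qs = query_blocks([0, 2, 4, 6], block_len=4, hole_start=5, hole_end=6)
print([s for s, _ in qs])   # [2, 4]
```

Because consecutive ABs overlap, a single short hole typically falls into more than one AB, matching the q_n and q_{n+1} pair in Fig. 2.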
The remainder of this section gives a cursory description of our algorithm (see Fig. 2). Readers seeking a rigorous formulation may skip directly to Problem formulation. For each query, we pick the most suitable example to fill the hole: the example that best satisfies a weighted combination of the following requirements:

  1. Low feature-space distance: for each hole, the intact portions of the query AB should be similar to the corresponding portions of the chosen example AB.
  2. High prior probability for the resulting AB sequence: we model the AB sequence as a hidden Markov chain. The prior is then the probability of the chosen example AB appearing between the ABs that precede and succeed the query AB.
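A minimal sketch of the per-query choice, assuming the features and Markov transition probabilities are already available and that the neighbouring ABs have already been assigned example indices. The cost form (masked distance minus a weighted log prior), the `weight` parameter, and all names are illustrative assumptions rather than the paper's formulation. A full hidden-Markov treatment would typically decode the whole sequence jointly (e.g. with the Viterbi algorithm) instead of choosing greedily per query.

```python
import math
from typing import List, Optional, Sequence


def select_example(query_feats: Sequence[float], valid: Sequence[bool],
                   example_feats: List[Sequence[float]], prev_idx: int,
                   next_idx: int, transition: List[List[float]],
                   weight: float = 1.0) -> Optional[int]:
    """Return the index of the example AB with the lowest combined cost.

    cost = (mean squared distance over valid feature entries)
           - weight * log(P(example | previous AB) * P(next AB | example))
    Entries whose `valid` flag is False (computed from the hole) are ignored.
    transition[i][j] is a hypothetical probability that example j follows i.
    Returns None if every candidate has zero prior.
    """
    if not any(valid):
        raise ValueError("query has no intact feature entries to compare")
    best: Optional[int] = None
    best_cost = math.inf
    for k, ex in enumerate(example_feats):
        diffs = [(q - e) ** 2 for q, e, v in zip(query_feats, ex, valid) if v]
        distance = sum(diffs) / len(diffs)
        prior = transition[prev_idx][k] * transition[k][next_idx]
        if prior == 0.0:
            continue  # this example cannot appear between its neighbours
        cost = distance - weight * math.log(prior)
        if cost < best_cost:
            best, best_cost = k, cost
    return best


examples = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
uniform = [[1 / 3] * 3 for _ in range(3)]
# The second query feature came from the hole, so only the first one is compared.
print(select_example([1.1, 99.0], [True, False], examples, 0, 0, uniform))  # 1
```

With a uniform prior the choice reduces to the nearest example in feature space; a non-uniform transition matrix can override a close feature match when that example is unlikely to appear between the neighbouring ABs, and `weight` sets how strongly it does so.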

Algorithm sketch

Figure 2