macVoIP.com
the web home of Ted Wallingford
An excerpt of the chapter on Network Infrastructure for VoIP from the book Switching to VoIP by Ted Wallingford.

WAN Layout

Hub and spoke

If your WAN is a hub and spoke layout, you most likely have a single centralized data center (or a few) where all of your business locations connect to the rest of the network. Hub and spoke networks, like the one at lower left in figure 13-2, allow for easy central network administration and often have lower facilities costs than other layouts. In hub and spoke networks, all remote locations connect directly to the data center without any intermediate hops.

The benefit of hub and spoke layouts to VoIP is that they minimize latency because there's only one hop between the remote office and the data center (where the softPBX and primary switching equipment are). Other layouts may contribute more to latency than hub and spoke does. If there's a drawback to the hub and spoke approach, it's that there's a single point of failure on the WAN--the data center.

Figure 13-2. WAN layouts, clockwise from top-left: partially meshed; circular peered; linear peered; hub and spoke.

Peered

Sites in a peered network form a chain along which data travels. The data hops from one site to the next until it reaches its intended destination. The sites furthest from the destination use the ones that are closer to the destination as their "next hops" along the path, getting the data closer to its destination as it travels through each consecutive site.

The benefit of this technique is cost reduction. Peering keeps point-to-point connectivity costs down by decreasing the mileage of each link. In a hub and spoke layout, you might use six remote site links with an average length of 300 miles apiece. But if the distance between each site and its closest peer is lower than the distance to the data center, a peered layout can drastically lower that average distance and the associated cost, because each site connects to a nearby peer rather than to the more distant HQ. In a peered layout, only one peer needs to be connected to the HQ.

But there is a big drawback to this layout. Peered networks like the lower-right example in figure 13-2 are prone to cascading failure patterns. When an office that supplies a route to the HQ for other offices experiences an outage, so do the offices that rely upon that route. An ingenious way to eliminate this problem, if the geography is suitable, is to make a circular peer layout, like the upper-right example in figure 13-2. When dynamic routing is employed, circular peer layouts are very resilient to isolated failures.

Exclusively peered layouts impose a higher number of router hops on call-paths that traverse the WAN than a hub and spoke or meshed layout would. This might be OK for data applications, which aren't as sensitive to delays, but for VoIP, excessively peered networks can be show-stoppers. If you had a ten-hop chain of peers with T1s between them, you could easily have 50 ms of latency from end-to-end, not accounting for higher-layer sources of latency. Making a quality VoIP phone call in this scenario just wouldn't be possible.
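
The arithmetic behind that estimate can be sketched in a few lines of Python. The 5 ms per-hop figure is an assumption, chosen only because it reproduces the 50 ms total quoted above; real per-hop delay depends on link speed, queuing, and packet size, so treat this as a rough comparison rather than a measurement.

```python
# Rough WAN latency for a call path, by hop count.
# PER_HOP_MS is an assumed average (50 ms / 10 hops, matching the text);
# measure your own links before relying on it.
PER_HOP_MS = 5.0


def path_latency_ms(hops, per_hop_ms=PER_HOP_MS):
    """Return the cumulative one-way latency contributed by the WAN hops."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * per_hop_ms


if __name__ == "__main__":
    # Compare a single-hop hub and spoke path with a ten-hop peered chain.
    for layout, hops in [("hub and spoke", 1), ("ten-hop peered chain", 10)]:
        print(f"{layout}: {path_latency_ms(hops):.0f} ms")
```

Higher-layer delays (codec, packetization, jitter buffering) would be added on top of whatever this returns.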

Meshed

In meshed networks, remote sites may be connected to each other, like a peered network, and connected to the data center, like a hub and spoke network. Or, a single remote site might connect with two or more other remotes. This provides diversity for the transport to the data center, and reduces the risk of widespread outage.

Depending on how meshed the network is (how much redundancy it offers), the threat of network downtime can be nearly eliminated. The Internet is a highly meshed network in the sense that many of its backbone carriers connect to multiple other backbone carriers.

In VoIP, a meshed network provides the greatest protection against unwanted system downtime. Meshed layouts, however, are the most expensive, and aren't always practical. They might be needed in very demanding scenarios, such as a call center with offices in multiple states, or highly redundant military or intelligence applications.

Quality of Service measures must be more sophisticated when used in a highly-meshed network. Indeed, one thing RSVP is designed to do is reserve the resources a phone call needs along its path through the network, which matters most when there are multiple potential paths. This is quite different from dynamic routing alone, which chooses paths across the network based on policies that have no direct relationship with the voice application.

Layout and PBX Placement

Most networks employ a combination of distributed and client-server models for different applications, and a combination of different layout techniques for different balances of cost and resilience to failure. As long as the total number of remote locations is under ten or so, a circular peer network is a great solution to the downtime issue, because if an upstream route fails, the other arc of the circle is still up and also leads back to the data center. This functionality is handled by--you guessed it--IP routing. So it takes two points of failure on the circle to bring the voice transport down for any one site, as illustrated in figure 13-3. If a particular arc on the circle can't handle the traffic being thrown at it by the voice application, then it's a matter of adding a mesh link between that arc and the data center.

Figure 13-3. Even though a failure has occurred on one of the WAN links, all the remotes on this circular peer network can still reach the data center.
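
To see why a circular peer layout needs two failures before any site is cut off, here is a minimal Python sketch that checks which sites can still reach the data center after links fail. The site names and the five-node ring are hypothetical, and the search simply stands in for what dynamic routing would work out on a real network.

```python
from collections import deque

# Hypothetical ring: a data center (DC) and four remote sites.
SITES = ["DC", "A", "B", "C", "D"]
RING = [("DC", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "DC")]


def reachable(sites, links, start):
    """Return the set of sites reachable from start over the working links."""
    neighbors = {site: set() for site in sites}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def isolated_sites(failed_links):
    """List the sites cut off from the DC once the given links have failed."""
    failed = {frozenset(link) for link in failed_links}
    working = [link for link in RING if frozenset(link) not in failed]
    return sorted(set(SITES) - reachable(SITES, working, "DC"))


if __name__ == "__main__":
    print(isolated_sites([("A", "B")]))              # one failed link
    print(isolated_sites([("A", "B"), ("C", "D")]))  # two failed links
```

With one failed link, every site still has an arc back to the data center; with two, the sites between the failures are stranded.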

The old saying from the real estate business, "location, location, location," is also true in IP telephony. Indeed, your choices of WAN layout and computing model are always tied to the locations of your application servers and user groups. Since you can't move your users around to meet the needs of your network design, you must strategically locate VoIP resources (like PBXs and PSTN gateways) to meet their needs.

Locate to conserve network availability

It would seem that it's ideal to take the existing WAN and just pick the best locations within it for all of these elements--but that's not always the right approach. The WAN footprint may need to change shape to optimally support the VoIP network you're about to overlay upon it.

For example, the places where large amounts of traditional network traffic (say, database traffic) are transported may not always be the places where large numbers of phone calls travel. Look at figure 13-3. If a majority of voice traffic travels between the remote sites at the top of the diagram, it might make sense to put a PBX there--rather than in the existing data center at the bottom of the diagram. The last thing you want to do is decrease network availability to existing apps in order to add voice. This is exactly what you'd be doing if you unnecessarily overlaid a voice pathway onto an already-busy data pathway.

Locate to save money

There may also be geo-economic reasons to place a telephony resource at an otherwise unlikely location. Consider international call centers. Lately, it's become very fashionable for insurance, mortgage, and collection companies to house big groups of low-cost outbound telephone operators in countries such as India and Mexico. These English-speaking employees call American households on behalf of American companies.

It would be very expensive for all of these calls to traverse the international long-distance network from India to the U.S. Instead, these companies may use VoIP to trunk calls over a comparatively low-cost international WAN to a PSTN connect point in the United States. Calls to U.S. destinations are much cheaper when they originate inside the U.S. PSTN. So, depending on your line of business and the needs of your particular application, locating a PSTN connect point a great distance from your call center, or perhaps a great distance from your PBX, might be a good idea.

Locate for capabilities

The location of telephony equipment is often dictated by the equipment's purpose and interfacing capabilities. For example, PBX servers with built-in PSTN interfaces may need to be in the same building as the PSTN connect point. But a PBX server with an outboard PRI chassis could be located several hops away, and perhaps miles away, from the connect point. The PRI chassis would need to be near the connect point, but, WAN bandwidth notwithstanding, the PBX server itself could be anywhere on the private network.

Voicemail servers, like Cisco's Windows-based Unity, may need to communicate with an e-mail server if you're going to use integrated messaging. In this event, very high amounts of bandwidth between the Unity server and the e-mail server will likely be required. It's quite common for voicemail and e-mail servers to be installed in the same rack, or to be running on the same PC.

Since we're talking about layout, all of these issues must be taken into account when looking at how your VoIP network will overlay your IP network layout. Is there enough bandwidth to support the necessary loads between all endpoints? Would adding a new connection solve a capacity problem imposed by VoIP, or would it be better to place a PBX somewhere to solve the problem? Which solution would be more cost-effective? Is your vision of Voice over IP even feasible given today's load on the network? If it's a peered network, would the outlook for VoIP be better with a hub and spoke layout or a few new mesh links? How would such a change affect other network systems?
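
The first of those questions, whether a link has enough bandwidth, lends itself to a quick estimate. The sketch below is one way to do it in Python, under assumptions that are not from the text: the G.711 codec (64 kbps) with 20 ms packets, IPv4/UDP/RTP headers only (layer 2 overhead is ignored), and a T1 payload rate of 1.536 Mbps. Substitute your own codec and link figures.

```python
# Assumed codec: G.711 (64 kbps) with 20 ms packets. Layer 2 overhead is ignored.
CODEC_BPS = 64_000
PACKET_MS = 20
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers per packet
T1_BPS = 1_536_000          # usable T1 payload rate (24 x 64 kbps)


def call_bandwidth_bps(codec_bps=CODEC_BPS, packet_ms=PACKET_MS):
    """Per-direction IP bandwidth of one call, headers included."""
    packets_per_sec = 1000 / packet_ms
    payload_bytes = codec_bps / 8 / packets_per_sec
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_sec


def max_calls(link_bps, data_load_bps=0):
    """Simultaneous calls that fit once existing data traffic is set aside."""
    available = link_bps - data_load_bps
    return max(0, int(available // call_bandwidth_bps()))


if __name__ == "__main__":
    print(call_bandwidth_bps())        # bits per second for one call
    print(max_calls(T1_BPS))           # idle T1
    print(max_calls(T1_BPS, 768_000))  # T1 already half full of data
```

Under these assumptions each call needs 80 kbps in each direction, so an idle T1 carries 19 calls and a half-loaded one carries 9. If the answer falls short of your busiest hour, that is the cue to weigh a new link against relocating a PBX.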

Don't locate for convenience

The VoIP network ought to drive the IP network design. A PBX shouldn't be placed in a particular location because "that office already has a server rack and a UPS" or because "that's the office where the old phone system is." Ultimately, while these issues are an influence, they can't be deciding factors in locating VoIP resources. The VoIP network's design must not be retrofitted around the current network's pre-existing topology. If this were a good way to approach IP telephony, then VoIP-over-Internet would long ago have replaced the PSTN. Clearly, the existing layout of the Internet isn't suitable to replace the PSTN (yet).

If it makes sense to have a PSTN connect point in another country with VoIP trunking back to your PBX, build your WAN to accommodate that. The bottom line is this: the IP network you have in place today probably won't be the IP network you'll have in place when migration to VoIP is complete.

Disaster Survivability

The ability of a network to survive isolated equipment and link failures is called survivability. It's a subject that, like networking itself, is addressed at every layer of the OSI model. Backup power supplies, dynamic routing, and remote-survivable dial-plans (groups of phones that can call each other even when their access to the PBX is cut off) operate at the physical, network, and application layers, respectively.

Surviving Power Failures

The most fundamental survivability measures occur at the physical layer, starting with backup power. Without backup power, your site's primary source of electricity may go dead, taking your phone system with it. Whether you use standalone battery systems or a combination of batteries, a fossil fuel generator, and a transfer switch, backup power is a requirement in all data centers and at all crucial network connection points.

Unlike analog phones, which draw their power from the PSTN line, VoIP systems don't get their power from the PSTN. VoIP systems will fail during power outages unless they have adequate backup power.

Multi-phase power

Most small offices and residences receive their AC power in the form of a single 120/240-volt connection. This connection feeds a circuit-breaker block that distributes individually limited power circuits throughout the premises. When the power fails at the breaker block, the power fails for the entire premises.

But when power is delivered in multi-phase, it can create redundancy. Multi-phase power means that the same connection to the electric company can deliver two or three AC supplies to the subscriber's premises. The supplies are connected to sections of the breaker block, or to different breaker blocks. So, when a single phase fails, the other phases are still intact, and equipment on the failed phase can be moved to them. This won't eliminate all failures, but it can protect you against certain kinds of failures that occur within the electric company's facilities.

Uninterruptible power supplies

In order to survive a power failure, all of your network equipment must remain running--switches in phone closets, servers at the data center, and IP phones themselves. This means you either have to back up every device individually, using Uninterruptible Power Supplies (UPS), or create a centralized power distribution system. One way to do this is to place a backup switch with battery and/or generator at a central location, and then pull AC wiring from the backup system to each of your phone closets. This way, each critical phone closet has an AC source that's backed up centrally.

For IP phones, use PoE, and make sure the powered switches or injectors are backed up, too. The moral is this: it won't do any good to have your Linux PBX server on a quality backup system if your phones and switches aren't on one, too.

Surviving Network Link Failures

Redundancy is your best defense against network link failures--those that affect only an individual link like a single T1 or an Ethernet switch. If a network link is absolutely critical, there should be, if at all possible, a redundant alternate link that provides an identical logical path.

Point-to-point T1s can be made more resilient to failure by bonding them together into multilink bundles. This way, one of the T1s can fail without totally downing the networking pathway. Moreover, two T1s running through two different providers' networks are more resistant to failure than a pair that runs through only one network.
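
A little probability shows why the second link helps so much. The Python sketch below is a simplified model: it assumes the links fail independently, which is more plausible when they run through different providers' networks, and the 99.9% availability figure is purely illustrative.

```python
HOURS_PER_YEAR = 8760


def parallel_availability(*link_availabilities):
    """Availability of a path that stays up while at least one link is up.

    Assumes the links fail independently of one another.
    """
    p_all_down = 1.0
    for availability in link_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


def downtime_hours_per_year(availability):
    """Expected hours of outage per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.999  # illustrative figure, not a provider's real number
    pair = parallel_availability(single, single)
    print(f"one link:  {downtime_hours_per_year(single):.2f} hours/year down")
    print(f"two links: {downtime_hours_per_year(pair):.4f} hours/year down")
```

In this model, one link at 99.9% is down for nearly nine hours a year, while an independent pair is down for well under a minute. Two T1s in the same provider's network share failure modes, so their real combined availability will be lower than the model suggests.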

But redundancy costs money. It may be tough to justify a completely redundant network, and even tougher to manage one so that, when failures occur, it behaves as originally envisioned. A networking corollary of Parkinson's Law holds that whatever capacity you make available, your application will become dependent upon it and grow to exhaust it--even if it's placed there for backup reasons to begin with. So, even if you have double the capacity needed for every link--in the name of redundancy--you may still find yourself in a state of panic when that capacity is merely reduced.

Minimizing the Havoc of a PBX Crash

There are few disaster scenarios more frightening than the loss of a single, critical server... except, perhaps, a single critical PBX server. In the age of distributed computing and PC components, the PC chassis is becoming the new home of the private branch exchange.

The PC brings its well-known characteristics to telephony: cheapness, modularity, extensibility, and, unfortunately, instability. Better PC servers mean better stability, of course, but PC backplanes will never be constructed with the untouchable reliability goals of old-school PBX systems.

The mere fact that PC servers rely upon hard disks means that PC-based PBXs have a pretty good chance of a downtime-inducing crash. So what can be done to prevent your next-generation dial-tone from dying unexpectedly?

• Back up your dial-plan regularly and have a standby server ready to go in the event of a failure.

• Use redundant, mirrored hard drive arrays on your softPBX servers, or a central, redundant network-attached drive array to eliminate the threat of hard disk failures.

• If you use a commercial VoIP platform like Meridian or Avaya Media Server, invest in failover equipment. The biggest advantage of commercial systems over open-source ones is that they have reliable, well-tested automatic failover ability.

• If using Asterisk, you can create emergency contexts in a secondary server's dial-plan--one that matches the active dial-plan of a primary server. This way, when the primary server goes down, you can "promote" a secondary just by including the emergency context in its dial-plan. If you wanted to get fancy, you could use the Asterisk Manager API (described in chapter 17) to trigger the failover automatically, and notify an administrator by doing a Dial() to his cell phone.

• Use a distributed call-switching technology such as DUNDi (discussed later in this chapter) to minimize the effect of a single PBX server's downtime.

• Use IP-based connections to the PSTN rather than PRIs or POTS lines. If a PBX crashes, it's easier to redirect IP-based connections than it is PRIs or POTS lines.

• If you're using a PRI attached to a crashed PBX, you can automatically redirect it to a secondary PBX by way of a mechanical T1 failover switch, also called a trunk bypass switch.

• If an H.323 gatekeeper crashes, it's easy to fail over to a backup. When using multicast locate requests from IP phones, you can configure a backup gatekeeper to listen for requests on the same multicast address as the primary gatekeeper, but only enable it to respond to those requests when the primary server has failed.
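
The automatic trigger mentioned in the Asterisk item above could take many forms. Here is a minimal Python sketch of one: it counts consecutive failed health checks against the primary and promotes the secondary once a threshold is reached. Everything named here is an assumption for illustration. `tcp_probe` only tests whether a TCP port accepts connections (5038 is the default Asterisk Manager port), and `promote` is a placeholder for whatever site-specific step includes the emergency context and notifies an administrator.

```python
import socket


def tcp_probe(host, port=5038, timeout=2.0):
    """Return True if host accepts a TCP connection on port.

    5038 is the default Asterisk Manager port; adjust for your setup.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Promote the secondary after `threshold` consecutive failed probes."""

    def __init__(self, probe, promote, threshold=3):
        self.probe = probe        # callable returning True while the primary is healthy
        self.promote = promote    # hypothetical callable that activates the secondary
        self.threshold = threshold
        self.failures = 0
        self.promoted = False

    def check(self):
        """Run one probe; return True if this check triggered promotion."""
        if self.promoted:
            return False          # already failed over; nothing more to do
        if self.probe():
            self.failures = 0     # a healthy reply resets the count
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.promote()
            self.promoted = True
            return True
        return False
```

You would call `check()` on a timer, for example every few seconds, with a probe such as `lambda: tcp_probe("pbx1.example.com")` (a made-up host name). Requiring several consecutive failures keeps a single dropped packet from promoting the secondary while the primary is still alive.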

 

PSTN trunk failures

Some types of network links are easier to make redundant than others. IP links can be automatically failed over using dynamic routing at the network layer, but voice T1s and phone lines aren't so simple. A PRI, for example, may go down--and when it does, all of its DID numbers and inward signaling configuration will become unavailable to the PBX. Even if a second PRI exists that the PBX can use for outbound calls, some emergency switch at the telephone company will have to occur in order to re-route inbound calls to the second circuit.

The same is true of POTS and Centrex lines. If you have 10 POTS lines in a hunt group, and the line with the published number (the lead line) experiences a failure, you'll have to contact the phone company to get all calls to that line forwarded to the next line in the group, until the problem with the first line is resolved. Phone companies do offer high-availability solutions for these scenarios at your expense, so contact your local phone company to see what they offer.

Hot failover--instant, user-transparent switching from one telco circuit to another--is difficult to achieve. There are some trunk bypass switches that can redirect private trunks from one T1 to another, but this can create challenges for DID, caller ID signals, and call routing. Plus, it isn't exactly cheap to maintain backup PRIs merely for the sake of failover.

Here's what to do if your PRI or POTS trunk goes down:

• Have the phone company forward calls from your lead number to a backup line.

• If the failure is in a POTS hunt group, have them "busy out" the failed lines so calls will roll to the next line in the group, which is presumably still working fine.

• Some phone companies let you manage your Centrex groups by software or web interface. Make the appropriate changes yourself.

 

(C) 2003 - 2006 Ted Wallingford