Dual Location Redundancy

Dual Location Redundancy works with VoIP by writing to the local disk, queuing the data in memory, and then sending it to the remote site. If you were to write directly to the remote site, every file operation would incur the link latency, for example 25 ms. Even 5 ms is too much and would greatly affect I/O and system performance.
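To make the write path concrete, here is a minimal Python sketch of the idea: write locally, queue in memory, and ship to the remote site in the background. It is a toy model, not how PBXware or SERVERware actually implement it. The names `AsyncReplicator`, `send_remote`, `max_queued` and `max_sync_writes_per_second` are all hypothetical, and error handling and retries are left out.

```python
import queue
import threading


class AsyncReplicator:
    """Toy model: write locally, queue in memory, ship in the background."""

    def __init__(self, send_remote, max_queued=1000):
        self.local_disk = []  # stands in for the local disk
        self.pending = queue.Queue(maxsize=max_queued)  # the memory queue
        self._send_remote = send_remote  # hypothetical callable that ships one block
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, block):
        self.local_disk.append(block)  # 1. local write, no network latency
        self.pending.put(block)        # 2. queue in memory (blocks when the queue is full)

    def _drain(self):
        # 3. a background thread sends queued blocks to the remote site in order
        while True:
            block = self.pending.get()
            self._send_remote(block)
            self.pending.task_done()

    def flush(self):
        """Wait until everything queued so far has been sent."""
        self.pending.join()


def max_sync_writes_per_second(latency_ms):
    """Upper bound for one writer that waits on the remote site for every write."""
    return 1000 / latency_ms


remote_site = []
replicator = AsyncReplicator(remote_site.append)
for block in ["log line", "db row", "recording chunk"]:
    replicator.write(block)
replicator.flush()
```

The helper at the end shows why direct remote writes hurt: a single writer that waits 25 ms per write cannot do more than 40 writes per second, and even at 5 ms it is capped at 200.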

This is why we need spare bandwidth: to push queued data out quickly and keep it in the memory queue for as short a time as possible, so that data loss is minimal in case of failure. The system must be able to handle peaks without filling up the memory queue, because as soon as the queue is full it causes delays and slows everything down, affecting all calls and producing sound quality issues. Beyond that, other factors such as line noise can always reduce the available bandwidth.
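The effect of spare bandwidth on the memory queue can be illustrated with a small simulation. This is only a sketch: the function name `simulate_backlog`, the per-second figures and the queue limit are made up for illustration and use arbitrary units, not measurements from a real system.

```python
def simulate_backlog(produced_per_s, link_capacity_per_s, queue_limit):
    """Track how much unsynced data sits in the memory queue, second by second.

    Returns (history, overflowed): the backlog after each second, and whether
    the queue ever hit its limit (the point where writers would start to stall).
    """
    backlog = 0
    history = []
    overflowed = False
    for produced in produced_per_s:
        # New data arrives; the link drains at most its capacity each second.
        backlog = max(0, backlog + produced - link_capacity_per_s)
        if backlog > queue_limit:
            overflowed = True
            backlog = queue_limit  # queue is full, everything else has to wait
        history.append(backlog)
    return history, overflowed


# Hypothetical traffic with one peak in the third second.
traffic = [2, 2, 10, 2, 2]

with_headroom = simulate_backlog(traffic, link_capacity_per_s=5, queue_limit=100)
without_headroom = simulate_backlog(traffic, link_capacity_per_s=2, queue_limit=5)
```

With headroom, the peak leaves a backlog that drains within two seconds. With a link sized only for the average load, the queue fills during the peak and never empties again, so any failure afterwards loses a full queue of data.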

Also, the system is made up of hundreds of cooperating programs; consequently, it is hard to calculate the exact maximum or average resource requirements for all usage scenarios. We can only test and observe, as we usually do.

Several things are going on all the time:
– Asterisk constantly writes logs about what is happening
– PBXware and Asterisk query and write to the database during call processing
– Recordings are written to disk and possibly transcoded to another format such as mp3
– Many other utilities run constantly alongside these, together making up the system we call PBXware

In theory, 50 Mb/s should be fine; but for best quality, I ask for 100 Mb/s.

Below are measurement graphs covering a one-hour period from one of our live systems (two remote locations connected by a 40 Mb/s link). This system handles a mix of calls with and without recording, with up to 30 concurrent calls. This is what it looks like in practice:

The first image shows the number of concurrent calls in the top graph and, in the graph below it (over the same time period), the network bandwidth used to sync data.

[Image: Faruk1]

The second image shows the data flow on the hard disk and how much data is in the memory queue at each moment (data not yet synced).

[Image: Faruk2]

Ring groups are used heavily on this system, so the graphs show peaks during group ringing; the actual maximum on these graphs is 20 calls.

UPDATE: One of our customers emailed us the following:

You may be aware that we had an issue with our PBXware system this morning that left us offline for about 2 hours. Part of the issue appears to be that we are running across two data centers using SERVERware which I do recall you telling me was unsupported. At the time we decided to give it a try using asynchronous communication as it’s a gigabyte link but this clearly isn’t a viable solution. I also recall you saying about another way of doing things to provide resilience across two sites. Can you remind me?

Even with the above, it can be hard to explain why this does not work. The simplest summary we can provide is: Voice is not Data.
