
The necessity of continuity

The vulnerability of time-critical trading operations has been highlighted by recent infrastructure failures in the London and Tokyo stock exchanges. Michelle Price examines how key players are battling to ensure their business runs smoothly in unusual circumstances.

For those individuals tasked with running time-critical trading operations, unscheduled downtime is the stuff of professional nightmares. During the past six months, however, a number of incidents have underlined how vulnerable time-critical trading operations really are to even fairly minor operational disruptions.

In November 2007, for example, the London Stock Exchange – which has spent the past four years building a state-of-the-art trading environment – experienced a connectivity problem with its real-time market data system Infolect. The disruption, it was reported, resulted in some traders leaving for the day with positions unexecuted.

Next it was the turn of the embattled Tokyo Stock Exchange, which was forced, on February 8, to suspend trading on part of its new derivatives platform (also the result of a multi-million dollar investment) due to a software bug. The Banker has also learned that JPMorgan experienced a power outage at one of its London locations in late February, forcing the bank to invoke its business continuity plans and move traders to its backup facility.

Growing risk of failure

In its Financial Risk Outlook 2008 report, the UK Financial Services Authority (FSA) says that the risk of such infrastructural failures is growing in line with market developments. Increasing automation, the rise of electronic trading, continued growth in transaction volumes and greater reliance on straight-through processing – whereby manual intervention is eliminated – “increasingly expose financial markets to infrastructural failure”, because manual workarounds grow fewer and less feasible.

Dr Andrew Foster, a manager in the market infrastructure department at the FSA, says that, as the structure of markets becomes increasingly complex, with a proliferation in the number of players, the points of failure also multiply. “There’s something of a double-edged sword here: with more connectivity, there’s more complexity and, even at the statistical level, more probability that something will fail. But you’ve also got more diversity, so if one element fails, its impact on the markets is less than if you have a single entity [predominating].”

If increasing competition – much of which has been prompted by the Markets in Financial Instruments Directive (MiFID) – means trading disruptions are less likely to devastate the economy at large, they are, conversely, more likely to devastate the player affected. This prospect looms large in the minds of market participants – particularly new entrants and growing contenders.

Planning for disaster

Take, for example, Turquoise, the long-awaited bank-owned multilateral trading facility due to go live in September 2008. For Yann L’Huillier, the nascent platform’s recently appointed chief technology officer (CTO), who has been tasked with building the platform from scratch, business continuity is a basic business necessity. “If we have an outage, and we don’t have a business continuity plan, that will be a short lifecycle for our business,” he says phlegmatically. For this reason, Mr L’Huillier and many of his peers responsible for the smooth running of the trading lifecycle are developing sophisticated business continuity strategies in which resilience is built into the fabric of their operations.

Plus Markets is one such example. In 2007, the London-based equity exchange overhauled its in-house-developed trading platform in readiness for MiFID. By moving to X-Stream, a platform provided by Nordic exchange operator OMX, Brian Taylor, acting CTO of Plus Markets, has been able to reinforce the platform’s reliability by exploiting OMX’s state-of-the-art ‘active-active’ capability. In this model, all information relating to trading activity is replicated synchronously, in real time, across two or more separate hardware locations – or data centres, as they are commonly known.

Multi-site approach

“One of our goals was to ensure we had a highly scalable and fault-tolerant system with no single point of failure,” says Mr Taylor. “So we now have three [data] centres: the third site controls the other two sites and if it detects a problem it will just reroute the messaging. So in the event of an outage of power or if, for example, a primary trading system fell over, nothing happens in terms of degradation of performance. The traders just carry on in real-time, trading against the secondary site: they don’t know that anything has happened.”
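The three-site arrangement Mr Taylor describes can be sketched in miniature: two synchronously replicated trading sites, plus a third control site that watches both and reroutes order flow when one stops responding. The class and method names below are illustrative assumptions only, not the actual OMX X-Stream interfaces.

```python
class TradingSite:
    """One data centre holding a replica of the trading state."""
    def __init__(self, name: str):
        self.name = name
        self.healthy = True
        self.order_book: list[str] = []

    def apply(self, order: str) -> None:
        self.order_book.append(order)


class ControlSite:
    """Third site: monitors both trading sites and picks the active one."""
    def __init__(self, primary: TradingSite, secondary: TradingSite):
        self.primary, self.secondary = primary, secondary

    def route(self, order: str) -> TradingSite:
        # Synchronous replication: every order is applied to all healthy
        # sites, so either one can take over instantly with no data loss.
        for site in (self.primary, self.secondary):
            if site.healthy:
                site.apply(order)
        return self.primary if self.primary.healthy else self.secondary


primary, secondary = TradingSite("site-A"), TradingSite("site-B")
control = ControlSite(primary, secondary)

control.route("BUY 100 XYZ")           # both sites record the order
primary.healthy = False                # primary "falls over"
active = control.route("SELL 50 XYZ")  # messaging rerouted to the survivor
```

Because the secondary already holds every prior order, the switchover involves no recovery step at all, which is why, as Mr Taylor puts it, the traders “don’t know that anything has happened”.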

But this level of resilience is not the preserve of trading venues. According to Chris Keeling, partner at Acuity Risk Management, an independent consultancy that specialises in advising investment banks on their business continuity strategy, trading desks are also starting to build out this level of capability. The trend can also be witnessed elsewhere in the trading supply chain.

Euroclear, the world’s largest provider of domestic and cross-border settlement services for equity, bond, derivatives and fund transactions, has spent the past four years building a comprehensive, €100m business recovery programme, which will allow the organisation to resume technical operations for clients within an hour of a local disruption – such as fire or an explosion. But it has not been easy, admits John Trundle, managing director for risk management at Euroclear Group. “It required a lot of effort and a lot of work between contracting providers – which was a challenge.”

Mr Trundle’s team now regularly switches Euroclear’s live operations between its three data centres to ensure that theory meets practice. “We’ve done it in practice and we know we can do it for real,” he adds. Testing the integrity and functionality of both infrastructural and software provisions has become an increasingly critical part of the business continuity strategy – particularly when trying to discern where, in the marketplace, bottlenecks or points of failure are likely to occur.

Plus Markets’ Mr Taylor has some salient experience in this area. “In the testing process we tuned [the platform] up and we got trading customers to tune up their algorithms. We got huge volumes in and we never collapsed.” However, he continues, the downstream systems were not so robust. “Third-party providers that we feed information to when a trade occurs could not handle the problems that we could.” In such a highly connected marketplace, one player’s resilience, it seems, can prove another’s downfall.

Replication limitations

But replication and stress-testing are not “the end of the game”, says Turquoise’s Mr L’Huillier. In order to distribute risk and guard against the impact of a localised disaster, many financial services organisations are now locating their primary and back-up data centres substantial distances apart. E*Trade Financial, the online trading platform, for example, locates its technology systems in multiple sites that are at least 100 miles apart.

This model, says Mr Keeling, has also come to dominate thinking in the front office. “The investment banks are looking at building much more resilience into their IT and technology capability: so, whereas in the past they had a lot of their trading applications located in the office from which they are operating, they now see this as too much of a risk. They therefore want to locate their trading applications in another data centre, elsewhere.”

Time constraints

The model is not without its limitations, however. Chief among these is latency: the time it takes for a trading transaction to complete and for the response to appear on the trader’s screen. Because the transaction has to travel to another location and back again, the speed of the trade can be degraded. “There can actually be movements in the market price between the time at which the button is pushed and the time the final result comes back,” says Mr Keeling.

“The investment banks therefore have some concern over how far away they can locate their main production sites,” he adds. The question of how far is too far has yet to be resolved: no size, as it were, fits all. Plus Markets, for example, whose operations are more time-critical than those of providers such as E*Trade, has positioned its two primary data centres far closer, at 30 miles apart. As such, finding the optimum point between performance and resilience remains a unique challenge for each trading organisation.
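The physics behind the trade-off is straightforward: light in optical fibre travels at roughly two-thirds of its vacuum speed, so each extra mile between sites adds round-trip delay. A back-of-the-envelope calculation – assuming ~200,000 km/s in fibre and ignoring switching and serialisation overheads – compares the two separations mentioned above:

```python
C_FIBRE_KM_S = 200_000   # approx. speed of light in optical fibre (~2/3 c)
MILES_TO_KM = 1.60934

def round_trip_ms(distance_miles: float) -> float:
    """Round-trip propagation delay over fibre, in milliseconds."""
    distance_km = distance_miles * MILES_TO_KM
    return 2 * distance_km / C_FIBRE_KM_S * 1000

print(round_trip_ms(100))  # 100-mile separation (E*Trade): ~1.6 ms
print(round_trip_ms(30))   # 30-mile separation (Plus Markets): ~0.5 ms
```

Real networks add routing and equipment delays on top of this floor, but even a millisecond or two of raw propagation delay matters when market prices can move between an order being sent and the fill coming back.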

This tension – between business agility and performance on the one hand and long-term sustainability on the other – is found in other areas of the trading landscape. Ensuring business continuity across trading desks that do not operate in real time is a case in point, not least because the multiplication and increasing complexity of trading instruments is often not supported by formalised business processes.

“Desks tend to go off and do their own thing,” says Mr Keeling, adding that they are – and with good reason – more concerned by making money in the short term than with business continuity. “You might create a product for a short time and therefore, as a senior trader, take a view that you’re not going to worry about business continuity for that particular product.”

But the imperative to take advantage of short-term trading opportunities can, in some instances, leave desks exposed. “This is where business continuity people have quite a challenge,” says Mr Keeling. “They want to keep up with what the business wants to do because if you stop them they won’t make money: but if you allow them, and you don’t have recovery plans in place, you are in effect running at a risk.” In so doing, he concludes, many trading desks are at risk of violating compliance and regulation requirements.

People protection

Ensuring the resilience of infrastructural assets might be vital, but it should not eclipse the importance of what Euroclear’s Mr Trundle calls “people resilience”. Following the attack on the World Trade Center in 2001, organisations have been encouraged – under best practice guidelines – to disperse the risk to their people, as well as to their hardware. Under Euroclear’s business continuity strategy, its workforce is spread across multiple locations, while each site operates in a ‘live dual-office’ mode.

“This means the people who support the operations and clients are capable of taking over different critical functions,” says Mr Trundle, and of operating systems from different locations. Plus Markets has also divided its workforce across two sites, neither of which houses any hardware. The main London office is in effect a “shell”, says Mr Taylor. If one location were demolished in a major disaster, Plus Markets – like Euroclear – would still have adequate resources to continue trading operations unhindered – at least in the short term.

In this way, much business continuity planning aims to strategise across multiple facilities. But this approach is not always as successful in reality as it promises to be on paper – particularly when one facility is hosted by an outsourced back-up provider. Ken Emerson, head of IT for British Airways’ pension fund, knows this all too well.

In June 2006, a fatality at the fund’s central London offices – a lift engineer fell down a lift shaft at 9am – forced the traders to vacate the building for two days. Meanwhile, the fund’s trading systems were still running. Using a reserved back-up facility hosted by a major business continuity service provider, the workforce was split between three locations: the back-up facility, the fund’s second office in outer London, and several home locations from which traders were able to log into the system remotely.

Under this business continuity plan, however, it had been assumed that the system would not be running, and that an entirely new network could be created at the back-up facility. Because the system was running, however, Mr Emerson’s team was forced to build a new network across three different sites. “We had immense trouble with the kit,” says Mr Emerson, “and the users struggled”.

Hindrance not help

Furthermore, when the traders at the back-up facility returned to the central London office, they had to have all of their data files rebuilt on the existing network. “So the business continuity plan actually caused more trouble than it solved,” says Mr Emerson, who has since stopped using a recovery provider.

“The plan was that people would convene and then know what to do next depending on the circumstances of the disaster – so you have a big plan, which is interesting, but you can’t plan for every eventuality.” And planning for the worst case scenario, as Mr Emerson’s experience indicates, is not always the most appropriate or successful strategy.

Like Mr Emerson, Turquoise’s Mr L’Huillier is philosophical on this point: “Your business continuity plan is never invoked for the reasons you expect: you know how you test a business continuity plan? Have a disaster.” However, he warns it is far easier, more successful and far less costly to build resilience and continuity into trading operations up front. “If you start without and you want to implement it afterwards, it becomes much more complicated.”
