The push for transparency, real-time information and improved risk management in financial services has led to the adoption of new computing infrastructure, meaning that running a Google-style search to check credit risk exposure may be just around the corner.

A development in the way that data is stored and retrieved promises to give banks previously unheard-of insight. The finance industry is under pressure from both regulators and customers to improve transparency, with the consequences of non-compliance ranging from increased capital charges to being barred from certain lines of business. At the same time, the continuing growth in electronic transactions across both retail and wholesale businesses means that banks must run ever larger volumes of calculations at ever higher speeds.

“Over the past five years, the importance of scale-out architecture to running our business has increased substantially,” says Don Duet, global co-chief operating officer in the technology division at investment bank Goldman Sachs. “The amount of computing that is required to run a large global market making firm and the amount required to run risk management at scale, given the size of positions and complexity of our business, means that we have seen an exponential curve in the amount of computing needed to run and execute the business on a daily basis.”

Running in real time

The push for real-time information and improved risk management is challenging for relational databases, the workhorses of data analysis. The process of extracting, transforming and loading data into relational databases is slow and requires the data to be structured. That means the user must predetermine which characteristics of the subject will be of interest in the future, based on the queries that will be demanded of it, and then place numerical values on those characteristics so the information becomes quantifiable.

If circumstances change and new queries are demanded, the effectiveness of the database becomes questionable. For example, the mandated use of central counterparties for clearing over-the-counter derivatives means that variation margins must now be calculated intraday rather than on a quarterly basis. A system set up to gather data on a bank’s positions and calculate that figure every three months may well be unable to cope with the new demands.
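
As a hypothetical illustration of that constraint, consider a relational table modelled only around the quarterly figure. The sketch below uses invented table and column names; it can answer the question it was designed for, but not the new intraday one.

```python
# A hypothetical illustration of the schema problem described above: a table
# designed around quarterly margin figures has no column with which to answer
# an intraday question. Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE variation_margin (
                    counterparty TEXT,
                    quarter      TEXT,   -- e.g. '2013-Q1'
                    margin       REAL)""")
conn.execute("INSERT INTO variation_margin VALUES ('Bank A', '2013-Q1', 1.2e6)")

# The query the schema was designed for works as expected.
print(conn.execute(
    "SELECT margin FROM variation_margin WHERE quarter = '2013-Q1'").fetchall())

# The new intraday question was never modelled, so the schema cannot answer it.
try:
    conn.execute(
        "SELECT margin FROM variation_margin WHERE valuation_time > '14:00'")
except sqlite3.OperationalError as err:
    print("Schema cannot answer this:", err)   # no such column: valuation_time
```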

However, the systems used to give search engines rapid access to the content of millions of documents are being tested by banks as a solution to challenges of this sort. The online and social media community – Google and Facebook, for example – has faced a far larger challenge in storing and retrieving data that has not been structured.

“The circles we were ending up in were often the same circles as people building scalable computing architecture for search or social architectures,” says Mr Duet. “The focus from a design and management perspective was leading us to spend as much time with dot-com firms as with classic infrastructure firms.”

Goldman Sachs joined the Open Compute Project, whose stated goal is to build “one of the most efficient computing infrastructures at the lowest possible cost”.

Paul Davis, a global banking industry leader at IBM, says: “Large institutions always had the capability to do this if they wanted to spend the money; what has made this more attractive is that they are able to do this cost-effectively using commoditised hardware.”

Source code

Google operates using a system called MapReduce, which breaks information down into equal-sized chunks, stores them across multiple commoditised servers and processes them in parallel. That has two advantages. First, searches can be carried out in parallel, with all of the chunks of data processed at the same time rather than sequentially, making the process of running a check very fast. Second, because the stored data is not forced into a predefined structure, searches can remain open-ended, with more in-depth searches applied once an initial search has delivered results.
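
The pattern itself is simple to sketch. The minimal example below is purely illustrative, with an invented toy corpus rather than anything Google-specific: each chunk is ‘mapped’ in a separate process, then the partial results are ‘reduced’ into a single answer.

```python
# A minimal, illustrative sketch of the map/reduce pattern described above.
# The corpus, chunking and word counts are invented; this is not Google's code.
from concurrent.futures import ProcessPoolExecutor
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    """Map step: build word counts for one chunk of documents."""
    counts = Counter()
    for doc in chunk:
        counts.update(doc.lower().split())
    return counts

def merge_counts(left, right):
    """Reduce step: merge the partial counts produced by each chunk."""
    left.update(right)
    return left

if __name__ == "__main__":
    docs = ["credit risk exposure report", "market risk summary",
            "counterparty exposure intraday", "credit default swap position"]
    chunks = [docs[0:2], docs[2:4]]             # two equal-sized chunks, two 'servers'
    with ProcessPoolExecutor() as pool:
        partials = pool.map(map_chunk, chunks)  # chunks are processed in parallel
    totals = reduce(merge_counts, partials, Counter())
    print(totals.most_common(3))                # e.g. [('risk', 2), ('credit', 2), ...]
```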

Open-source software group Apache released its own system, Apache Hadoop, on December 27, 2011, inspired by a 2004 white paper in which Google explained the MapReduce design. It has been picked up by Yahoo! and Facebook, among others, and as an open-source platform it is allowing firms to test the potential that big data technology holds for them. Like MapReduce, it stores data on clusters of low-cost servers, delivering a cheap method of rapidly searching through unstructured data.
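
Expressed as a Hadoop job, the same word-count idea stays compact. The sketch below uses mrjob, one open-source Python wrapper around Hadoop Streaming, purely to illustrate how such jobs are written; it is not drawn from any of the banks’ systems.

```python
# An illustrative Hadoop job using the open-source mrjob library: the same
# word-count pattern as above, expressed as mapper and reducer methods.
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # Emit (word, 1) for every word in the input line.
        for word in line.lower().split():
            yield word, 1

    def reducer(self, word, counts):
        # Sum the counts emitted for each word across all mappers.
        yield word, sum(counts)

if __name__ == "__main__":
    MRWordCount.run()
```

The same script can be run locally for testing or submitted to a Hadoop cluster, where the framework distributes the work across the nodes.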

“It allows you to get this weapons-grade analytical capability at a low-cost entry point,” says Mr Duet. Few banks are beyond the proof-of-concept (PoC) stage yet; most big players are currently exploring the technology’s limits and strengths to identify where it can be used.

In-house action

Alasdair Anderson, global head of IT Architecture at HSBC, says: “The concept we are exploring with big data is the ability to query the entire enterprise in a single environment. We have been kicking the tyres. Could this technology be used productively by our offshore development teams? Or does big data require a team of Silicon Valley geniuses to make it work? The answer to the offshore question was a resounding 'yes'.”

To test it, the bank set up a 'hack-a-thon' competition; 18 teams were given 24 hours over a weekend on a big Hadoop cluster with terabytes of cleansed data sets, to see what they could do with it. At the end of the 24 hours, 17 teams had delivered solutions. The winning team created a solution that used predictive analytics to determine how customers might make choices about their investment portfolios.

Internally, the bank is now examining where the technology can best be used. Its properties mean that the first insight gleaned from a set of data is far from the last, which should allow banks to identify more and more use cases as they experiment with the systems and apply them to different challenges.

“You can continually advance the measurement and quality of information once the data sets are in a file system,” says Mr Anderson. “It is not the case that you analyse it once, then it is structured and you never change it again. You have always got capacity to work on it. The team that developed the predictive analysis were using statistical methods to gain insights; they wanted to take the results of what the customer sees on a screen, the choice they make, and then model it, in a way similar to machine learning, to predict likely customer choices in the future.”
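
In outline, that kind of model is straightforward. The sketch below is purely hypothetical, with invented features and data, and with the scikit-learn library standing in for whatever tooling the team actually used: past on-screen choices are used to estimate the probability of a customer’s next portfolio decision.

```python
# A hypothetical sketch of predicting a customer's next portfolio choice from
# past on-screen behaviour. Features, data and model choice are invented for
# illustration; this is not HSBC's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [age, current equity weighting, viewed funds page, past switches]
X = np.array([[34, 0.6, 1, 2],
              [52, 0.3, 0, 0],
              [45, 0.5, 1, 1],
              [29, 0.8, 1, 3],
              [61, 0.2, 0, 0]])
y = np.array([1, 0, 1, 1, 0])   # 1 = chose to rebalance towards equities

model = LogisticRegression().fit(X, y)

new_customer = np.array([[40, 0.4, 1, 1]])
print(model.predict_proba(new_customer))   # estimated probability of each choice
```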

Big data in action

Goldman Sachs has been applying big data approaches and technology to help its security model evolve in the face of rapidly changing threats. “How do we continuously improve data management, ensuring that the assets of the firm are protected and that we protect ourselves against external threats?” asks Mr Duet. “We have built a meaningful-sized Hadoop cluster, where we can bring in security content from web logs to firewall data to packets on the network and then run different types of analytics and queries to look for unusual behaviours.”

Known internally as the Hunter Programme, it makes use of the flexible querying that big data technology provides to help the firm understand whether it is under attack.
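
The ‘unusual behaviour’ queries Mr Duet describes can take many forms. The fragment below is a deliberately simplified, hypothetical example with an invented log format and threshold; a real pipeline would run this kind of count across a Hadoop cluster rather than in a single process.

```python
# A hypothetical, much-simplified example of the kind of query described above:
# counting web-log requests per source address and flagging unusually busy ones.
# The log format and the threshold are invented for illustration.
from collections import Counter

def flag_unusual_sources(log_lines, threshold=100):
    """Count requests per source IP and flag any source above the threshold."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return {ip: n for ip, n in counts.items() if n > threshold}

if __name__ == "__main__":
    sample = (["10.0.0.1 GET /login", "10.0.0.2 GET /positions"] * 5
              + ["203.0.113.9 GET /admin"] * 150)
    print(flag_unusual_sources(sample))   # {'203.0.113.9': 150}
```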

Credit Suisse is also applying big data models to security. Ed Dabagian-Paul, lead architect for enabling technologies at Credit Suisse, says that security is effectively a cost of doing business, and so the ability to run it on low-cost hardware and open-source software is appealing. “It is nothing the industry hasn’t been doing for years, but we are now able to do it at a lower cost point and across types of data that we weren’t able to run it across before,” he says.

The bank is using the commercial product Splunk, which has a scale-out architecture similar to Hadoop’s, to analyse security and other logs. It has also run a PoC with ETH Zurich (Swiss Federal Institute of Technology), looking at database queries that could not be parsed, to check whether they represented attempts to access restricted information.
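
The idea behind that PoC can be pictured with a toy example. The sketch below, with an invented log format, error string and threshold, treats repeated parse failures from a single client as a possible sign of probing.

```python
# A hypothetical sketch of the idea behind the PoC described above: repeated
# queries that the database cannot parse may signal probing for restricted data.
# The log format, error string and threshold are invented for illustration.
from collections import Counter

def suspicious_clients(error_log_lines, threshold=5):
    """Count parse failures per client and flag clients at or above the threshold."""
    failures = Counter()
    for line in error_log_lines:
        client, _, message = line.partition(" ")
        if "syntax error" in message.lower():
            failures[client] += 1
    return {client: n for client, n in failures.items() if n >= threshold}

if __name__ == "__main__":
    log = ["app01 syntax error near 'UNION'"] * 7 + ["app02 query completed"] * 20
    print(suspicious_clients(log))   # {'app01': 7}
```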

IBM’s Mr Davis says: “The traditional use cases that we see fall into three areas: risk, fraud and customer data. We saw the most initial interest around risk management and compliance. Banks are using this for financial risk analysis, providing insight into credit and market risk, running Monte Carlo simulations [computational algorithms that rely on repeated random sampling to compute their results; a simple illustration follows this quote]. It is also used for risk assessment on small and medium[-sized] businesses, where firms are pulling in information from news feeds to supplement structured data, and also on larger institutions that are operating as a bank’s counterparty, where a lot of information about them is released in unstructured or semi-structured formats such as press releases.

“Second, it is used for anti-fraud, anti-money laundering and market surveillance. It is amazing what people will put on Facebook or Twitter that can be correlated with other data. Once you have established something suspicious you can deep dive into the data to see if there is something to substantiate the suspicions. Finally, we have seen a lot of interest around customer and marketing insights; for example, BBVA is looking at sentiment analysis, but banks are also looking at contact-centre analytics, taking voice, e-mails, surveys and customer data that you can factor into your segmentation and next-best-action activities.”
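
As a simple illustration of the Monte Carlo simulations Mr Davis mentions, the sketch below estimates a one-day value-at-risk for a single position by repeated random sampling; the position size, return distribution and parameters are all invented for illustration.

```python
# An illustrative Monte Carlo calculation: estimating one-day 99% value-at-risk
# for a single position by repeated random sampling. The normal-return
# assumption, position size and volatility are invented for illustration.
import random

def monte_carlo_var(position_value, mu, sigma, confidence=0.99, trials=100_000):
    """Simulate many one-day P&L outcomes and read off the loss at the percentile."""
    pnl = sorted(position_value * random.gauss(mu, sigma) for _ in range(trials))
    return -pnl[int((1 - confidence) * trials)]

if __name__ == "__main__":
    # A 10m position with zero expected daily return and 2% daily volatility.
    print(round(monte_carlo_var(10_000_000, 0.0, 0.02)))   # roughly 465,000
```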

Still at the PoC stage with other banks are use cases such as the examination of derivatives contracts held in PDF format to determine a firm’s position in credit default swaps.

No substitute

This does not mean that banks are looking to replace old systems wholesale. Typically, once a query has been run across unstructured data, the banks will have quantifiable data that can be analysed using traditional databases.

“It can be hard to prove the return on investment [ROI] in some of these technologies,” says Neil Palmer, partner in the advanced technology practice at SunGard Consulting Services. “The ability to discover unseen patterns is not a [good] ROI to get you on board in the first place. The other challenge we are seeing is that you need more tools that operate at the Hadoop layer. As patterns emerge, that will drive the development of other applications to deal with them.”

Mr Dabagian-Paul at Credit Suisse says that some aspects of Hadoop are not as resilient as the capabilities a traditional database provides, which will require either Hadoop to mature or banks to change how they do business, with a bit of both most likely. The existing support for data analysis in the enterprise will also need to change if firms are to make the best use of big data technologies.

“The cost structure of storing all of this data on commodity hardware without licence fees is very interesting but most banks have enterprise-wide agreements with their database vendors, so bringing in one Hadoop cluster will either not change or even increase their overall spend slightly,” he says. “There needs to be a tipping point where enough data is running through Hadoop and not relational databases and I think we are a while off from that.”

Philosophical shift

From a hardware perspective, banks have not been geared towards the stripped-down servers used by social media and search firms to support big data technology, and Mr Anderson believes this may slow adoption.

"There is a dichotomy in the data centre,” he says. “You might have 1000 servers on the floor, then you implement virtualisation and you have 4000 servers, but that means your diversity and complexity increases and that is taking you further away from where you want to get to. Big data takes those 1000 computers and makes them one. So that is a real sticking point for everyone within the enterprise corporate world; data centres are not set up to do this. So at the moment adoption is challenging."

However, the model supports moving risk mitigation into the software layer of the technology stack, where traditionally much of it has been handled in hardware. A data centre would be built with two power supplies and a significant supporting infrastructure; a big scale-up server would have all the bells and whistles attached to make sure its risk of failure was as close to zero as possible. But that was the old world, argues Mr Duet.

"The new world is to take as much of that risk mitigation and put it into software,” he says. “Your application is smart enough to know it needs to be backing itself up persistently. Or, rather than spending $20m on building in critical power and electrical resiliency, you distribute data though your cloud so that information is distributed into multiple places by default, so failure of any one place can be easily recovered. There is a secular shift and Hadoop is a great example of that. With MapReduce you write every piece of data to multiple places so if you lost any one node, that data is replicated in enough places that you never lose the actual data."
