How Investment Banks Can Optimise Their Cost of Data

The surge in both the creation and usage of data at Investment Banks means that banks have to make substantial investments in the way they manage and maintain their data. As banks start to realise the true value of the data they hold, they are also becoming cognisant of the price they are paying for it. A look at how data cost is being managed in the industry shows that banks need to measure the cost of data and optimise it in a holistic manner, i.e. from data creation to data destruction.

Data is your most important asset, but what does it cost?

‘Data’ is widely recognised as the primary fuel that drives the economic engines of the new world. The financial services industry is increasingly realising that data sits at the very heart of its success. At the same time, it is also realising that issues with data sit at the heart of every crisis it has faced, which is the reason for new and expanded roles such as that of the CDO. Data serves as the lifeblood of Investment Banks, which rely on huge streams of market data to churn the trading floors.

Over the last 3 years, banks have
been the biggest producer and
consumer of financial data

Both the demand for and investment in data have been increasing for Investment Banks. Over the last 3 years, banks have been the biggest producers and consumers of financial data. Increased regulatory scrutiny has further raised the ‘data demands’ across banks.

While data is an asset for all firms, the key is to control the cost of creating and maintaining this asset. It will not be worth much if it becomes too expensive to mine and maintain.

This article looks at the cost of data for Investment Banks and proposes a holistic framework which IBs can use to benchmark their cost of data and to identify areas for data cost efficiencies. It draws on the authors' experience of managing various large-scale data projects across major investment banks.

Cost of data is a significant cost for IBs

The cost of data is a significant portion of an organisation's overall cost structure. Discussions with 8 G-SIBs suggest that the cost of data contributes approximately 23% to 25% of their overall cost. Especially for IBs, the cost of data is increasing at an exponential rate across the following three key areas:

Sourcing: Cost of market and external data feeds.

Infrastructure: Cost of infrastructure to ingest, store, process and distribute data.

People: Cost of people engaged in creating and modifying the data. In our experience, IB firms have been reducing 6-10 percent of their workforce involved with data management.

Cost of data is increasing at an
exponential rate

Considering the current business climate for the banking industry, a dedicated cost optimisation effort is required to contain the cost of data at investment banks. 

Who should care about the cost of data?

The short answer is: everyone. Margins for IB firms have remained flat. A look at the IB business in 2019 shows equities revenues falling sharply (down by 10%) and investment banking revenues down by 3%. Average Return on Equity stands at 9.3% for major investment banks against an aspiration of 13% to 15%. While the operating costs of the business are increasing, new players are emerging with a lower cost base (including a lower cost of data).


Revenues and margins are flat
across the industry.

Banks need to focus on measuring
and reducing their cost

With macroeconomic indicators not showing clear signs of significant improvement, the trend of flat margins and rising operating costs is likely to continue in the near future. This means that reducing the cost of data becomes a firm-wide goal rather than just a KPI for the CDO or CIO.

An approach to measure and optimise cost of data

While most cost-of-data models look at data cost in a segmented manner, we propose a comprehensive framework that analyses the cost of data across its lifecycle in a firm. We approach the cost of data using the data lifecycle of a typical IB firm and observe the costs and accompanying efficiency drivers from data creation to data destruction.

Banks need a dynamic cost
and efficiency
measurement framework
for a moving target

How We Propose to Look at Data Cost 

The framework looks at the cost measures and cost optimisation opportunities of data through the various data lifecycle phases:

Figure 1: Introduction to Assessment Framework

Key aspects of cost of data

The framework focuses on identifying the various aspects of the cost of data, which are analysed over the data lifecycle stages and dissected into cost drivers and accelerators:

Cost drivers

Major components of cost including market data and external feeds, infrastructure and people.

Cost example

Access fees, user fees (20 users), non-display fees for one category and redistribution fees for NYSE, Nasdaq and CBOE total £89,500.

Cost accelerator

Specific factors that influence cost at a data lifecycle stage, such as data volume, the extent of manual processes and the platforms used.

Key aspects of cost optimisation

The framework, using the same data lifecycle stages, identifies key categories of efficiency improvement opportunities to optimise the cost of data. It helps identify efficiency levers and provides probable benefits for each phase of the data lifecycle. For cost optimisation, the framework documents:

Efficiency levers

Major components of efficiency including outsourcing, automation and people skills.

Efficiency outcomes

Expected benefits from effective use of efficiency levers at a specific data lifecycle stage.

Efficiency domain

Identification of type of benefit, i.e., cost reduction, revenue increase or cost avoidance.
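
As an illustration of how the framework's per-stage entries fit together, below is a minimal sketch of one stage's record as a simple Python structure; the representation is our assumption, and the values are taken from the Sourcing examples used later in this article.

```python
# One stage's entry in the assessment framework (illustrative structure only;
# values are taken from the Sourcing examples in this article).
framework_entry = {
    "stage": "Sourcing",
    "cost_drivers": ["Market data", "Technology & people"],
    "cost_accelerators": ["Data volume", "Platform"],
    "efficiency_levers": ["People skills", "Automation"],
    "efficiency_outcome": "% decrease in cost",
    "efficiency_domain": "Cost reduction",
    "example_cost_gbp": 89_500,
}
```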

Application of the Framework

This framework has been applied to the data lifecycle of a typical investment banking firm. The costs and the related accelerators and efficiency measures are based on the authors' experience of working with various data teams at IBs in the UK.

Cost Assessment Framework

Below is an applied version of the cost assessment of the framework. 

Figure 2: Applied Cost Assessment Framework

Efficiency Assessment Framework

Below is an applied version of the efficiency assessment of the framework. 

Figure 3: Applied Efficiency Assessment Framework

We will now walk through each stage of the data lifecycle, looking at the cost and efficiency measures in detail.

Sourcing

What is sourcing? Data sourcing is the process of extracting and obtaining data from external or internal sources to feed into core systems. A data source may be the initial location where data is born or where physical information is first digitised; however, even the most refined data may serve as a source if another process accesses and utilises it. Data sourcing contributes c.15% of the overall cost of data across the enterprise.

Major Sourcing Activities

Market Data Feeds

Description: There are two ways in which data can be sourced: through market feeds or by creating the data internally. To source the data externally (market feeds), a data vendor must be selected. Data vendors traditionally fall into one of three categories: compiled, crowd-sourced, or self-reported. The way the data is gathered is correlated with the accuracy of the data.

Cost Driver: Market Data
Cost Accelerator: Data Volume
Efficiency Levers: People Skills
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
Example Cost: Access fees, user fees (20 users), non-display fees for one category and redistribution fees for NYSE, Nasdaq and CBOE total £89,500.

Data Validation

Description: Data validation is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data. In this step we ensure that the data sourced has a profile that meets the business needs, including checking the structure, granularity, age, frequency and availability of the data (a minimal sketch of such checks follows below).

Cost Driver: Technology & People
Cost Accelerator: Platform
Efficiency Levers: Automation
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
Example Cost: Tools can be bought for anything from £300 to £1,500 with an additional subscription fee of 20% of the purchase price.
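
To make the Data Validation step concrete, here is a minimal sketch of the profile checks described above, assuming pandas; the feed contents and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sourced feed; in practice this would be read from a vendor file.
feed = pd.DataFrame({
    "isin": ["GB0001", "GB0001", "US0002"],
    "price": [101.5, 101.5, None],
    "trade_date": pd.to_datetime(["2019-06-28", "2019-06-28", "2019-06-27"]),
})

checks = {
    # Structure: every expected column is present.
    "structure": {"isin", "price", "trade_date"}.issubset(feed.columns),
    # Granularity: one row per instrument per trading day.
    "granularity": not feed.duplicated(["isin", "trade_date"]).any(),
    # Age: the newest record is no more than two days old.
    "age": (pd.Timestamp.now() - feed["trade_date"].max()).days <= 2,
    # Availability: no missing prices.
    "availability": feed["price"].notna().all(),
}

failed = [name for name, ok in checks.items() if not ok]
print("validation passed" if not failed else f"failed checks: {failed}")
```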

 

Storage

Major cost drivers and potential cost optimisation opportunities

Below is a breakdown of the major cost drivers in the Storage stage and the corresponding cost reduction opportunities that we believe exist through the use of efficiency levers.

A potential cumulative cost saving
of 29% exists in the
Storage stage

Data storage cost saving levers

  • Normalisation of the data model
  • De-duplication of data across data sets (see the sketch after this list)
  • Migration of data from on-prem infrastructure to cloud hosted platforms
  • Virtualisation of data storage infrastructure by combining physical devices into ‘logical pools’
  • Automation of data maintenance tasks such as data retrieval services and data reconciliations between different data tables
  • Consolidation of data management applications and software
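
As a minimal sketch of the de-duplication lever, the snippet below consolidates the same reference data held in two systems into one golden copy; the data, key field and use of pandas are our assumptions.

```python
import pandas as pd

# Hypothetical example: the same counterparty reference data held in two
# systems; identifiers below are dummy values.
trading_sys = pd.DataFrame({
    "lei": ["LEI0001", "LEI0002"],
    "name": ["Alpha Fund", "Beta Bank"],
})
risk_sys = pd.DataFrame({
    "lei": ["lei0001 ", "LEI0003"],   # same entity, inconsistently keyed
    "name": ["Alpha Fund", "Gamma Asset Management"],
})

# Normalise the key so identical records compare equal across sources.
for df in (trading_sys, risk_sys):
    df["lei"] = df["lei"].str.strip().str.upper()

# Combine and keep one golden copy per identifier.
golden = (
    pd.concat([trading_sys, risk_sys], ignore_index=True)
      .drop_duplicates(subset="lei", keep="first")
)
print(golden)   # three unique entities instead of four stored rows
```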

 

Cleanse

What is Cleanse? Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting. Data cleansing contributes c.10% of the overall cost of data across the enterprise.

Major Cleanse Activities 

Data Auditing

Description: The data is audited with the use of statistical and database methods to detect anomalies and contradictions; this eventually indicates the characteristics of the anomalies and their locations. Several commercial software packages will let you specify constraints of various kinds and then generate code that checks the data for violation of these constraints (a minimal sketch of such checks follows below).

Cost Driver: Infrastructure / People
Cost Accelerator: Data Volume
Efficiency Levers: Outsourcing
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
Example Cost: Per project, organising the style, format and names can be done by a student assistant at a level 1 salary (~£15 per hour) or a data manager at a level 2 salary (~£52 per hour).

Exception Management

Description: The detection and removal of anomalies are performed by a sequence of operations on the data known as the workflow. In order to achieve a proper workflow, the causes of the anomalies and errors in the data have to be closely considered. The workflow is then executed after its specification is complete and its correctness is verified.

Cost Driver: Infrastructure
Cost Accelerator: Data Volume
Efficiency Levers: Outsourcing
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
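
Below is a minimal sketch of the constraint-style checks described under Data Auditing, with violating rows routed to an exception queue as in Exception Management; the rules and records are hypothetical and pandas is assumed.

```python
import pandas as pd

# Hypothetical record set with deliberate defects.
records = pd.DataFrame({
    "trade_id": [101, 102, 102, 104],
    "notional": [5_000_000, -1, 2_000_000, None],
    "currency": ["GBP", "GBP", "USD", "EUR"],
})

# Declarative constraints: each mask is True where the row passes the rule.
constraints = {
    "unique trade_id": ~records["trade_id"].duplicated(keep=False),
    "notional present": records["notional"].notna(),
    "notional positive": records["notional"].fillna(0) > 0,
    "known currency": records["currency"].isin({"GBP", "USD", "EUR"}),
}

# Route every violating row-rule pair to the exception queue for the workflow.
exceptions = pd.concat(
    [records[~mask].assign(rule=name) for name, mask in constraints.items()]
)
clean = records.drop(exceptions.index.unique())
print(f"{len(clean)} clean rows, {len(exceptions)} exceptions raised")
```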

 

Major cost drivers and potential cost optimisation opportunities

Below is a breakdown of the major cost drivers in the Cleanse stage and the corresponding cost reduction opportunities that we believe exist through the use of efficiency levers.

A potential cumulative cost saving
of 19% exists in the Cleanse
stage

Data cleansing cost saving levers

  • De-duplication of the data at the point of ingestion, e.g. perform a check for duplicate transaction records in the data repositories before ingesting new transaction data (see the sketch after this list)
  • Automate data cleansing and exception management activities using automated data profiling tools and workflow applications
  • Consolidate data quality and workflow application licences
  • Outsource data cleansing to a third party
  • Standardise data models to improve the quality of data
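
As a minimal sketch of the first lever, the snippet below checks for duplicate transaction records before ingesting new data; the hash-based guard and the SQLite repository are illustrative assumptions, not a prescribed design.

```python
import hashlib
import sqlite3

# Hypothetical repository; in practice this would be the transaction store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (tx_hash TEXT PRIMARY KEY, payload TEXT)")

def ingest(records: list) -> int:
    """Insert only records not already present, keyed by content hash."""
    inserted = 0
    for payload in records:
        tx_hash = hashlib.sha256(payload.encode()).hexdigest()
        seen = conn.execute(
            "SELECT 1 FROM tx WHERE tx_hash = ?", (tx_hash,)
        ).fetchone()
        if seen is None:  # ingest unseen records only
            conn.execute("INSERT INTO tx VALUES (?, ?)", (tx_hash, payload))
            inserted += 1
    conn.commit()
    return inserted

print(ingest(["trade-101", "trade-102", "trade-101"]))  # prints 2
```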

Access & Security

What is Access & Security? Data security means protecting digital data, such as that held in a database, from destructive forces and from the unwanted actions of unauthorised users, such as a cyberattack or a data breach. It involves putting in place a robust set of standards and technologies that protect data from intentional or accidental destruction, modification or disclosure. Data access and security activities contribute c.8% of the overall cost of data across the enterprise.

Major Access & Security Activities 

Encryption

Description: Data encryption applies a code to every individual piece of data and will not grant access to encrypted data without an authorised key. Masking specific areas of data can protect it from disclosure to external malicious sources, and also from internal personnel who could potentially misuse the data. For example, the first 12 digits of a credit card number may be masked within a database (a minimal masking sketch follows below).

Cost Driver: Infrastructure
Cost Accelerator: Platform
Efficiency Levers: Automation
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Avoidance
Example Cost: Existing encryption services could be used at no cost.

Security & Resilience

Description: The most effective and efficient way to protect personal data is to use only firm-approved IT hardware and software (no bring-your-own devices). Resilience is the ability of a server, network, storage system, or an entire data centre to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption.

Cost Driver: Infrastructure / People
Cost Accelerator: Platform / Manual Process
Efficiency Levers: People Skills, Automation
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
Example Cost: TTP (trusted third party), dependent on pseudonymisation type, c. £1,000 – £30,000.
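
A minimal sketch of the masking example mentioned above, hiding all but the last four digits of a card number (i.e. masking the first 12 of a 16-digit PAN); this illustrates the idea and is not a substitute for proper encryption or tokenisation.

```python
def mask_pan(pan: str, visible: int = 4) -> str:
    """Mask all but the last `visible` digits of a card number."""
    digits = pan.replace(" ", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

print(mask_pan("4111 1111 1111 1234"))  # ************1234
```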

 

Major cost drivers and potential cost optimisation opportunities

Below is a breakdown of the major cost drivers in the Access & Security stage and the corresponding cost reduction opportunities that we believe exist through the use of efficiency levers.

A potential cumulative cost saving
of 17% exists in the Access
& Security stage

Data access and security cost saving levers

  • Building failover capability on cloud hosted platforms
  • Outsourcing of data encryption and security management to a managed data security service provider
  • Automation of data encryption and data access management
  • Improving data classification across the firm to optimise the cost of data security hardware and applications
  • Consolidation of data sources to optimise the coverage of data security and resilience requirements

Data Processing

What is Data Processing? Data processing is the conversion of data into a usable and desired form. This conversion, or “processing”, is carried out using a predefined sequence of operations, either manually or automatically. Most processing is done using computers; when it runs without manual intervention, it is referred to as automatic data processing. The output, or “processed” data, can be obtained in different forms such as an image, graph, table, vector file, audio or chart, depending on the software or method of data processing used. Data processing activities contribute c.14% of the overall cost of data across the enterprise.

Major Data Processing Activities 

Merge Data

Description: Once the data has been sourced, either through external vendors or internally, the data needs to be merged. This is the feeding of raw and sieved data for processing; if the input is not done properly, the result will be adversely affected (a minimal merge-and-enrich sketch follows below).

Cost Driver: Infrastructure
Cost Accelerator: Platform
Efficiency Levers: Outsourcing
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Avoidance

Enrich Data

Description: Though there are many types of data enrichment, the two most common are demographic and geographic enrichment. For example, demographic enrichment involves obtaining data such as marital status and income level and adding it to an existing customer data set. What matters with demographic enrichment is the end purpose. Data enriched in this way can be leveraged to improve the targeting of marketing offers, which is vital in an age where personalised marketing holds sway.

Cost Driver: Infrastructure
Cost Accelerator: Platform
Efficiency Levers: Outsourcing
Efficiency Outcomes: % decrease in cost
Efficiency Domain: Cost Reduction
Example Cost: Cost varies from £300 to £1,000 per month for data enrichment tools.
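
A minimal sketch of the Merge Data and Enrich Data steps, assuming pandas; the data sets and the enrichment attributes (client domicile and segment) are hypothetical.

```python
import pandas as pd

# Hypothetical sourced trades and reference/demographic data sets.
trades = pd.DataFrame({"client_id": [1, 2, 1], "notional": [5e6, 2e6, 1e6]})
clients = pd.DataFrame({"client_id": [1, 2], "domicile": ["UK", "DE"]})
segments = pd.DataFrame({"client_id": [1, 2], "segment": ["hedge fund", "pension"]})

enriched = (
    trades.merge(clients, on="client_id", how="left")    # merge step
          .merge(segments, on="client_id", how="left")   # enrichment step
)
print(enriched)
```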

Major cost drivers and potential cost optimisation opportunities

Below is a breakdown of the major cost drivers in the Data Processing stage and the corresponding cost reduction opportunities that we believe exist through the use of efficiency levers.

A potential cumulative cost saving
of 22% exists in the Data
Processing stage

Archive & Destruction

Data archive and destruction cost saving levers

  • Transition the archiving of data to cloud hosted platforms
  • Automate data destruction and archiving activities as per pre-agreed business rules
  • Improve data classification across the architecture for automated data archive/purge capability
  • Apply data compression techniques to reduce the archive storage space (see the sketch after this list)
  • Consolidate data purge and data retrieval application licences
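
A minimal sketch of the compression lever: gzip-compressing a day's extract before moving it to archive storage and purging the original; file names and layout are illustrative.

```python
import gzip
import os
import shutil

# Illustrative extract; in practice this would be the day's trade file.
with open("trades_2019-06-28.csv", "w") as f:
    f.write("trade_id,notional\n101,5000000\n")

os.makedirs("archive", exist_ok=True)
with open("trades_2019-06-28.csv", "rb") as src, \
     gzip.open("archive/trades_2019-06-28.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)  # text extracts typically compress well

os.remove("trades_2019-06-28.csv")  # purge the uncompressed original
```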

What should Investment Banks do to reduce the cost of data?

Looking across silos: Firms need to look at the cost of data across its lifecycle and across the organisation, not just within a specific business unit. This creates a complete picture of the cost of data for the entire firm and helps executives focus on the larger cost drivers.

Deeper understanding of data cost: Firms shouldn't limit themselves to documenting cost drivers; they should go a step further and identify cost accelerators. This enables stakeholders to understand the nature and structure of data costs and helps in efficiency management.

Identification of efficiencies: While some approaches to data costs are limited to the documentation of costs only, the proposed framework extends to the identification of efficiencies and of cost optimisation opportunities.

Actions that IB firms can start taking

We recommend the following five critical actions that firms can take to optimise the cost of data:

Technology & Automation

To avoid falling behind, banks need to digitise more functions and processes. The initial investment will drive up costs, but banks can halve the number of employees in back-office and support functions using technology that is already available, such as artificial intelligence and robotics. Indeed, given the direction in which these technologies are advancing, banks could aim to have a back office with no employees and realise spectacular operational cost savings.


Outsourcing

Outsourcing data tasks, from data cleansing to data processing and data archiving, can help firms reduce fixed people and licencing costs. Scaling outsourced tasks can also help in managing sudden rises in data demand.

People & Skills

Upskilling the existing workforce is essential to improving efficiency. Inculcating best practices and better skills in the workforce will not only reduce task completion times but will also reduce the human errors that are a major source of re-work. Upskilling is also an opportunity to rationalise and right-size the workforce, leading to further cost saving benefits.

Standardised Data Models

Consolidating firm-wide data models and visualisation licences can also lead to cost saving opportunities. Firms need to invest in re-usable data models that can serve the data needs of multiple business units.

Market Data Rationalisation

By far the biggest data cost savings for firms lie in the rationalisation of market data. This can be achieved through smart contract negotiations, creating better data catalogues for business units and investing in solutions such as a data lake for market data.

As margins and revenues flatten across the industry, IBs are under pressure to keep investing in managing ever larger amounts of data in an efficient manner. One might be tempted to think the cost of data is a simple sum of infrastructure and people costs; in reality, the true cost of data as it is created and maintained in the bank is dynamic, with hidden costs that, if not carefully documented in a structured manner, may escape any efficiency exercise. In short, firms will not be able to reduce the cost of data if they cannot appropriately measure its true cost. Adding to the complexity is the fact that the cost of data is a moving target as data flows from creation through transformation to destruction.

Firms will not be able to reduce the
cost of data if they cannot
appropriately measure the true cost of data