How Investment Banks Can Optimise Their Cost of Data
The surge in both the creation and usage of data at investment banks means that banks have to make substantial investments in the way they manage and maintain their data. As banks start to realise the true value of the data they hold, they are also becoming cognisant of the price they are paying for it. A look at how data cost is being managed in the industry shows that banks need to measure the cost of data and optimise it in a holistic manner, i.e. from data creation to data destruction.
Data is your most important asset but what does it cost?
‘Data’ is widely recognised as the primary fuel that drives the economic engines of the new world. The financial services industry is increasingly realising that data sits at the very heart of its success. At the same time, firms are also realising that issues with data sit at the heart of every crisis they have faced, which is the reason for new and expanded roles such as that of the Chief Data Officer (CDO). Data is the lifeblood of investment banks, which rely on huge streams of market data to keep their trading floors churning.
Over the last 3 years, banks have been the biggest producer and consumer of financial data
Both demand for and investment in data have been increasing at investment banks. Over the last 3 years, banks have been the biggest producer and consumer of financial data. Increased regulatory scrutiny has further raised the ‘data demands’ across banks.
While data is an asset for all firms, the key is to control the cost of creating and maintaining this asset. It will not be worth much if it becomes too expensive to mine and maintain.
This article looks at the cost of data for investment banks and sets out a holistic framework that IBs can use to benchmark their cost of data and identify areas for creating data cost efficiencies. It relies on the author's experience of managing various large-scale data projects across major investment banks.
Cost of data is a significant cost for IBs
The cost of data is a significant portion of an organisation's overall cost structure. Based on discussions with 8 G-SIBs, the cost of data contributes approximately 23% to 25% of their overall cost. Especially for IBs, the cost of data is increasing at an exponential rate across the following three key areas:
Sourcing | Infrastructure | People
Cost of market and external data feeds | Cost of infrastructure to ingest, store, process and distribute data | Cost of people engaged in creating and modifying the data. In our experience, IB firms have been reducing 6-10 percent of their workforce involved with data management
Cost of data is increasing at an exponential rate
Considering the current business climate for the banking industry, a dedicated cost optimisation effort is required to contain the cost of data at investment banks.
Who should care about the cost of data?
The short answer is: everyone. Margins for IB firms have remained flat. A recent look at the IB business in 2019 shows revenue from the equities business falling sharply (down by around 10%), while investment banking revenues fell by around 3%. Average return on equity is 9.3% for major investment banks against an aspiration of 13% to 15%. While operating costs for the business are increasing, other players are emerging with a lower cost base (including a lower cost of data).
Revenues and margins are flat across the industry.
Banks need to focus on measuring and reducing their costs
With macroeconomic indicators not showing clear signs of significant improvement, flat margins and rising operating costs are likely to persist in the near future. This means that reducing the cost of data becomes a firm-wide goal rather than just a KPI for the CDO or CIO.
An approach to measure and optimise cost of data
While most cost-of-data models look at data cost in a segmented manner, we propose a comprehensive framework that analyses the cost of data across its lifecycle in a firm. We approach the cost of data using the data lifecycle of a typical IB firm and observe cost and the accompanying efficiency drivers from data creation to data destruction.
Banks need a dynamic cost and efficiency measurement framework for a moving target
How We Propose to Look at Data Cost
The framework looks at the cost measures and cost optimisation opportunities of data through the various data lifecycle phases:
Figure 1: Introduction to Assessment Framework
Key aspects of cost of data
The framework focuses on identifying various aspects of the cost of data, which is analysed over data lifecycle stages and dissected into cost drivers and accelerators:
Cost drivers | Major components of cost including market data and external feeds, infrastructure and people. |
Cost example | Access fees, user fees (20 users), non-display fees for one category and redistribution fees for NYSE, Nasdaq and CBOE total £89,500. |
Cost accelerator | Specific factors that influence cost at a data lifecycle stage, such as data volume, amount of manual processes and platforms. |
Key aspects of cost optimisation
The framework, using the same data lifecycle stages, identifies key categories of efficiency improvement opportunities to optimise the cost of data. It helps identify efficiency levers and provides probable benefits for each phase of the data lifecycle. For cost optimisation, the framework documents:
Efficiency levers | Major components of efficiency including outsourcing, automation and people skills. |
Efficiency outcomes | Expected benefits from effective use of efficiency levers at a specific data lifecycle stage. |
Efficiency domain | Identification of the type of benefit, i.e., cost reduction, revenue increase or cost avoidance. |
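To make the structure of the framework concrete, below is a minimal sketch (in Python, purely illustrative and not part of the framework itself) of how a lifecycle stage can be recorded with its cost drivers, cost accelerators, efficiency levers and saving potential, and how stage-level savings can be expressed against the firm's total cost of data. The class and field names are my own; the percentages are the indicative figures quoted later in this article.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LifecycleStage:
    """One data lifecycle stage in the cost and efficiency assessment framework."""
    name: str
    share_of_data_cost: float            # stage's share of the firm's overall cost of data (0-1)
    cost_drivers: List[str] = field(default_factory=list)
    cost_accelerators: List[str] = field(default_factory=list)
    efficiency_levers: List[str] = field(default_factory=list)
    saving_within_stage: float = 0.0     # cumulative cost saving potential within the stage (0-1)

    def saving_vs_total_data_cost(self) -> float:
        """Express the stage's saving potential against the firm's total cost of data."""
        return self.share_of_data_cost * self.saving_within_stage

# Shares and saving percentages are the indicative figures quoted later in this article.
stages = [
    LifecycleStage("Cleanse", 0.10, ["Infrastructure", "People"], ["Data volume"],
                   ["Outsourcing"], 0.19),
    LifecycleStage("Access & Security", 0.08, ["Infrastructure"], ["Platform"],
                   ["Automation", "People skills"], 0.17),
    LifecycleStage("Data Processing", 0.14, ["Infrastructure"], ["Platform"],
                   ["Outsourcing"], 0.22),
]

for stage in stages:
    print(f"{stage.name:<18} {stage.saving_vs_total_data_cost():.1%} of total data cost")
print(f"Combined: {sum(s.saving_vs_total_data_cost() for s in stages):.1%} of total data cost")
```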
Application of the Framework
This framework has been applied to the data lifecycle of a typical investment banking firm. The costs and related accelerators and efficiency measures are based on the author's experience of working with various data teams of IBs in the UK.
Cost Assessment Framework
Below is an applied version of the cost assessment of the framework.
Figure 2: Applied Cost Assessment Framework
Efficiency Assessment Framework
Below is an applied version of the efficiency assessment of the framework.
Figure 3: Applied Efficiency Assessment Framework
We will now examine each stage of the data lifecycle and its cost and efficiency measures in detail.
Sourcing
What is sourcing? Data sourcing is the process of extracting and obtaining data from external or other internal sources as input into core systems. A data source may be the initial location where data is born or where physical information is first digitised; however, even the most refined data may serve as a source if another process accesses and utilises it. Data sourcing contributes c.15% of the overall cost of data across the enterprise.
Major Sourcing Activities
Market Data Feeds | Data Validation |
Description: There are two ways in which data can be sourced: either through market feeds or by creating the data internally. To source the data externally (market feeds), a data vendor must be selected. Data vendors traditionally fall into one of three categories: compiled, crowd-sourced, or self-reported. The way the data is gathered is correlated with the accuracy of the data. | Description: Data validation is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data. In this step we ensure that the data sourced has a profile that meets the business needs. This includes checking the structure, granularity, age, frequency and availability of the data. |
Cost Driver: Market Data | Cost Driver: Technology & People |
Cost Accelerator: Data Volume | Cost Accelerator: Platform |
Efficiency Levers: People Skills | Efficiency Levers: Automation |
Efficiency Outcomes: % decrease in cost | Efficiency Outcomes: % decrease in cost |
Efficiency Domain: Cost Reduction | Efficiency Domain: Cost Reduction |
Example Cost: Access fees, user fees (20 users), non-display fees for one category and redistribution fees for NYSE, Nasdaq and CBOE total £89,500 | Example Cost: Tools can be bought for anything from £300 to £1,500 with an additional subscription fee of 20% of the purchase price
Store
Major cost drivers and potential cost optimisation opportunities
Below is a breakdown of the major cost drivers in the storage stage and corresponding cost reduction opportunities that we believe exist by use of efficiency levers.
A potential cumulative cost saving of 29% exists in the Storage stage
Data storage cost saving levers
- Normalisation of data model
- De-duplication of data across data sets
- Migration of data from on-prem infrastructure to cloud hosted platforms
- Virtualisation of data storage infrastructure by combining physical devices into ‘logical pools’
- Automation of data maintenance tasks such as data retrieval services and data reconciliations between different data tables
- Consolidation of data management applications and software
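As a minimal illustration of the de-duplication and reconciliation-automation levers above (a sketch only; the table layout and column names are hypothetical), the snippet below checks an incoming batch of transactions against an existing repository and ingests only the records that are not already present.

```python
import pandas as pd

# Hypothetical repositories; in practice these would be database tables.
existing = pd.DataFrame({"trade_id": ["T001", "T002", "T003"],
                         "notional": [1_000_000, 250_000, 500_000]})
incoming = pd.DataFrame({"trade_id": ["T003", "T004"],   # T003 already exists
                         "notional": [500_000, 750_000]})

# Reconciliation: flag incoming records whose keys are already in the repository.
is_duplicate = incoming["trade_id"].isin(existing["trade_id"])
print(f"{is_duplicate.sum()} duplicate(s) rejected, {(~is_duplicate).sum()} new record(s) ingested")

# De-duplication at the point of ingestion: store only the genuinely new records.
repository = pd.concat([existing, incoming[~is_duplicate]], ignore_index=True)
```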
Cleanse
What is Cleanse? Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting. Data cleansing contributes c.10% of the overall cost of data across the enterprise.
Major Cleanse Activities
Data Auditing | Exception Management |
Description: The data is audited with the use of statistical and database methods to detect anomalies and contradictions; this eventually indicates the characteristics of the anomalies and their locations. Several commercial software packages will let you specify constraints of various kinds and then generate code that checks the data for violations of these constraints. | Description: The detection and removal of anomalies are performed by a sequence of operations on the data known as the workflow. In order to achieve a proper workflow, the causes of the anomalies and errors in the data have to be closely considered. The workflow is then executed after its specification is complete and its correctness is verified. |
Cost Driver: Infrastructure / People | Cost Driver: Infrastructure |
Cost Accelerator: Data Volume | Cost Accelerator: Data Volume |
Efficiency Levers: Outsourcing | Efficiency Levers: Outsourcing |
Efficiency Outcomes: % decrease in cost | Efficiency Outcomes: % decrease in cost |
Efficiency Domain: Cost Reduction | Efficiency Domain: Cost Reduction |
Example Cost: Per project, organising the style, format and names can be done by a student assistant at a level 1 salary (~£15 per hour) or a data manager at a level 2 salary (~£52 per hour). |
Major cost drivers and potential cost optimisation opportunities
Below is a breakdown of the major cost drivers in the cleanse stage and corresponding cost reduction opportunities that we believe exist by use of efficiency levers.
A potential cumulative cost saving of 19% exists in the Cleanse stage
Data cleansing cost saving levers
- De-duplication of the data at the point of ingestion, e.g. perform checks for duplicate transaction records in the data repositories before ingesting new transaction data
- Automate data cleansing and exception management activities using automated data profiling tools and workflow applications
- Consolidate data quality and workflow application licenses
- Outsource data cleansing to third parties
- Standardise data models to improve quality of data
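The data auditing and automation levers above can be illustrated with a minimal constraint-checking sketch; the record layout and rules below are hypothetical, standing in for the kind of constraints a commercial data profiling tool would let you declare.

```python
import pandas as pd

# Hypothetical trade records awaiting cleansing.
trades = pd.DataFrame({
    "trade_id": ["T001", "T002", None],
    "currency": ["GBP", "XXX", "USD"],
    "notional": [1_000_000, -5_000, 250_000],
})

# Declarative constraints, each evaluating to True where the record passes.
constraints = {
    "trade_id present":    trades["trade_id"].notna(),
    "currency recognised": trades["currency"].isin(["GBP", "USD", "EUR"]),
    "notional positive":   trades["notional"] > 0,
}

# Exception management: records violating any constraint go to an exception queue.
passes_all = pd.concat(constraints, axis=1).all(axis=1)
exceptions, clean = trades[~passes_all], trades[passes_all]
print(f"{len(exceptions)} record(s) routed to the exception workflow, {len(clean)} record(s) clean")
```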
Access & Security
What is Access & Security? Data security means protecting digital data, such as data held in a database, from destructive forces and from the unwanted actions of unauthorised users, such as a cyberattack or a data breach. This involves putting in place a robust set of standards and technologies that protect data from intentional or accidental destruction, modification or disclosure. Data access and security activities contribute c.8% of the overall cost of data across the enterprise.
Major Access & Security Activities
Encryption | Security & Resilience |
Description: Data encryption applies a code to every individual piece of data and will not grant access to encrypted data without an authorised key being given. Masking specific areas of data can protect it from disclosure to external malicious sources, and also to internal personnel who could potentially misuse the data. For example, the first 12 digits of a credit card number may be masked within a database (see the masking sketch after this table). | Description: The most effective and efficient way to protect data is to use only firm-approved IT hardware and software (no bring-your-own devices). Resiliency is the ability of a server, network, storage system, or an entire data centre, to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption. |
Cost Driver: Infrastructure | Cost Driver: Infrastructure / People |
Cost Accelerator: Platform | Cost Accelerator: Platform / Manual Process |
Efficiency Levers: Automation | Efficiency Levers: People Skills, Automation |
Efficiency Outcomes: % decrease in cost | Efficiency Outcomes: % decrease in cost |
Efficiency Domain: Cost Avoidance | Efficiency Domain: Cost Reduction |
Example Cost: Existing encryption services could be used at no cost. | Example Cost: TTP (trusted third party), dependent on pseudonymisation type, c. £1,000 – £30,000. |
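As a simple illustration of the masking technique mentioned in the encryption description above (a sketch only, not a production data protection control; the function name is my own), the snippet below masks all but the last four digits of a card number before it is exposed to downstream users.

```python
def mask_card_number(card_number: str, visible_digits: int = 4) -> str:
    """Replace all but the trailing digits of a card number, preserving its length."""
    digits = card_number.replace(" ", "")
    return "*" * (len(digits) - visible_digits) + digits[-visible_digits:]

print(mask_card_number("4929 1234 5678 9012"))  # ************9012
```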
Major cost drivers and potential cost optimisation opportunities
Below is a breakdown of the major cost drivers in the Access & Security stage and corresponding cost reduction opportunities that we believe exist by use of efficiency levers.
A potential cumulative cost saving of 17% exists in the Access & Security stage
Data access and security cost saving levers
- Building failover capability on cloud hosted platforms
- Outsourcing of data encryption and security management to a managed data security service provider
- Automation of data encryption and data access management
- Improve data classification across the firm to optimise the cost of data security hardware and applications
- Consolidation of data sources to optimise the coverage of data security and resilience requirements
Data Processing
What is Data Processing? Data processing is the conversion of data into a usable and desired form. This conversion or “processing” is carried out using a predefined sequence of operations, either manually or automatically. Most of the processing is done using computers and thus done automatically; when carried out without manual intervention it is referred to as automatic data processing. The output or “processed” data can be obtained in different forms such as an image, graph, table, vector file, audio, charts or any other desired format depending on the software or method of data processing used. Data processing activities contribute c.14% of the overall cost of data across the enterprise.
Major Data Processing Activities
Merge Data | Enrich Data |
Description: Once the data has been sourced, either through external vendors or internally, the data needs to be merged. This is the feeding of raw and sieved data for processing; if the input is incomplete or incorrect, the result will be adversely affected (see the sketch after this table). | Description: Though there are many types of data enrichment, the two most common types are demographic data enrichment and geographic data enrichment. For example, demographic enrichment involves obtaining data such as marital status and income level and adding it to an existing customer data set. What matters with demographic enrichment is what your end purpose is. Data enriched in this way can be leveraged to improve the targeting of marketing offers, which is vital in an age where personalised marketing holds sway. |
Cost Driver: Infrastructure | Cost Driver: Infrastructure |
Cost Accelerator: Platform | Cost Accelerator: Platform |
Efficiency Levers: Outsourcing | Efficiency Levers: Outsourcing |
Efficiency Outcomes: % decrease in cost | Efficiency Outcomes: % decrease in cost |
Efficiency Domain: Cost Avoidance | Efficiency Domain: Cost Reduction |
Example Cost: Cost varies from £300 to £1,000 per month for data enrichment tools |
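A minimal, hypothetical sketch of the merge and enrichment activities described above; the datasets, keys and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical internally sourced customer data and an external demographic feed.
customers = pd.DataFrame({"customer_id": [101, 102, 103],
                          "segment": ["retail", "corporate", "retail"]})
demographics = pd.DataFrame({"customer_id": [101, 103],
                             "income_band": ["50-75k", "25-50k"]})

# Merge: combine the sourced datasets on a shared key.
merged = customers.merge(demographics, on="customer_id", how="left")

# Enrich: flag which customer records gained the demographic attribute.
merged["is_enriched"] = merged["income_band"].notna()
print(merged)
```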
Major cost drivers and potential cost optimisation opportunities
Below is a breakdown of the major cost drivers in the Data Processing stage and corresponding cost reduction opportunities that we believe exist by use of efficiency levers.
A potential cumulative cost saving of 22% exists in the Data Processing stage
Archive & Destroy
Data archive and destruction cost saving levers
- Transition data archives to cloud hosted platforms
- Automate data destruction and archive activities as per pre-agreed business rules
- Improve data classification across the architecture for automated data archive/purge capability
- Apply data compression techniques to reduce archive storage space (see the sketch after this list)
- Consolidate data purge and data retrieval application licenses
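As a minimal illustration of the compression lever above (a sketch with invented file names and data, not a recommendation of a specific archive format), the snippet below compares the size of a small archive payload before and after gzip compression.

```python
import gzip
import json
import os

# Hypothetical archive payload: a month of daily snapshot records.
records = [{"date": f"2019-01-{day:02d}", "positions": list(range(100))} for day in range(1, 32)]
payload = json.dumps(records).encode("utf-8")

with open("archive.json", "wb") as f:
    f.write(payload)
with gzip.open("archive.json.gz", "wb") as f:
    f.write(payload)

raw, compressed = os.path.getsize("archive.json"), os.path.getsize("archive.json.gz")
print(f"Raw: {raw} bytes, compressed: {compressed} bytes ({1 - compressed / raw:.0%} saved)")
```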
What should investment banks do to reduce the cost of data?
Looking across silos: Firms need to look at the cost of data across its full lifecycle and across the organisation, not just within a specific business unit. This creates a complete picture of the cost of data for the entire firm and helps executives focus on the larger cost drivers.
Deeper understanding of data cost: Firms should not limit themselves to documenting cost drivers; they should go a step further and identify cost accelerators. This enables stakeholders to understand the nature and structure of data costs and helps in efficiency management.
Identification of efficiencies: While some approaches to data costs are limited to the documentation of costs only, the proposed framework extends to the identification of efficiencies and cost optimisation opportunities.
Actions that IB firms can start taking
We recommend the following 5 critical actions that firms can take to optimise the cost of data
Technology & Automation
To avoid falling behind, banks need to digitise more functions and processes. The initial investment will drive up costs, but banks can halve the number of employees in back-office and support functions using technology that is already available, such as artificial intelligence and robotics. Indeed, given the direction in which these technologies are advancing, banks could aim to have a back office with no employees and realise spectacular operational cost savings.
Outsourcing
Outsourcing data tasks such as data storage, data cleansing, data processing and data archiving can help firms reduce fixed people and licensing costs. Scaling outsourced tasks can help in managing sudden rises in data demands.
People & Skills
Upskilling existing workforce is essential to improve efficiency. Inculcating best practices and better skills in the workforce will not only reduce task completion times but will also reduce human errors which are a major source of re-work. Upskilling is also an opportunity to rationalise and right-size the workforce leading to further cost saving benefits.
Standardised Data Models
Consolidating firm-wide data models and visualisation licences can also lead to cost saving opportunities. Firms need to invest in re-usable data models that can help multiple business units with their data needs.
Market Data Rationalisation
By far the biggest data cost savings for firms exist in the rationalisation of market data. This can be achieved through smart contract negotiations, creating better data catalogues for business units and investing in solutions such as the creation of a data lake for market data.
As margins and revenues flatten across the industry, IBs are under pressure to keep investing in managing larger amounts of data in an efficient manner. One might be tempted to think the cost of data is a simple addition of infrastructure and people costs; in reality, the true cost of data as it is created and maintained in the bank is dynamic, with hidden costs that, if not carefully documented in a structured manner, may escape any form of efficiency redressal exercise. In short, firms will not be able to reduce the cost of data if they cannot appropriately measure its true cost. Adding to the complexity is the fact that the cost of data is a moving target as data flows from creation through transformation to destruction.