03.11.2019

8 Steps of a Machine Learning Project

By Kristina Fan, CEO and Founder, and Roy Lowrance, Chief Scientist, 7 Chord

Previously we wrote about when to bother with machine learning. Once that bridge has been crossed, the question becomes: how much time and money should one budget for an experimental machine learning project? The real answer will drive your boss bananas, so let’s keep this between us.

Experiment. Fail. Learn. Repeat.

Building a machine learning system is a highly iterative and experimental process. While it does go through the well-known phases we describe below, what you find in one phase often bears on decisions made in earlier phases, and that means revisiting and refining those decisions.

Makes sense, right? Well, in many organizations with a traditional, linear software development mindset, the backward movement is interpreted as failure. Therefore, how you structure the program and build expectations within your organization will be critical. This cultural conflict may be the biggest reason why some organizations are more successful at adopting AI than others.

When you consider the cost of building an ML system, there are at least four main budgetary considerations: a) outright cost of development; b) computing power; c) risk of execution or project extension; d) maintenance. The risk of project extension is very hard to estimate, and it is the strongest argument for buying a machine learning application instead of building it in-house.

Method to the Madness.

If you do decide to “try machine learning at home”, here’s the actual roadmap we followed at 7 Chord, along with the effort it took us to build the commercial version of BondDroid™ 2.0, which we soft-launched in July 2018.

Overall Project Timeline: Jan 2016 – June 2018

BondDroid Beta: Jan 2016 – June 2017. Fully functional batch system.

BondDroid 1.0: Jun 2017 – Jan 2018. Fully functional real-time system, limited scope and delivery methods, basic cyber-security.

BondDroid 2.0: Jan 2018 – July 2018. Enterprise-ready cloud-based app.

BondDroid 3.0: Summer 2019. Stay tuned!

In each of the three phases, we followed the steps below, although the last three were refinements rather than brand-new decisions:

Pick the target: Determine the decisions that the modeling exercise will inform and ensure that those decisions can be implemented and will be economically viable. This is when the problem statement is agreed upon, and the accuracy objectives are set.

Pick the yardstick for success: The final product should be fit for purpose. Picking the accuracy measure that is achievable yet sufficient to add economic value is very important.

Define Data Acquisition Strategy: Identify critical data sets and analyze the ecosystem of your data providers. Can this data be sourced elsewhere? What is your competitive relationship with the key data providers? Can the dataset be replaced later and if it is unique, what are the terms of your agreement with the data provider? Does your organization have unique sources of data because of your informational advantage?

Prepare Data: This least enjoyable step is critical. Most teams reduce it to the mechanics of data cleaning, normalization, and defining a data management framework. But the designer also needs to determine to what extent the data reflects the economic reality, so that noise can be filtered out in a systematic manner. And if your system works in real time like ours, it is of paramount importance to make sure that the noise-filtering procedure can handle uncertainty about the validity of your data.

Build Model: The designer picks a machine learning method and estimates how accurate and computationally expensive it is likely to be, and how much latency it is likely to add, in the operational environment.

Test and Evaluate: Test your system in the operational environment. Because improvements to the model will be ongoing, it is important to set a clear back-testing strategy and roll-back policy.

Deploy: Move the prediction engine from the evaluation environment to the operational environment. Continue to measure accuracy. Prepare and implement contingency plans to be used as the model’s performance degrades.

Maintain: Many forget to budget for this one. Implement procedures to ensure that the model is performing as expected. In our case this includes real-time and end-of-day accuracy reports, as well as voluminous diagnostics that constantly measure the health of the system.
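The test, deploy, and maintain steps above can be sketched in a few lines of Python. This is a minimal illustration and not 7 Chord's actual system: the price series, the two toy predictors, the window sizes, the error threshold, and the names `walk_forward_mae` and `AccuracyMonitor` are all assumptions made up for the example. It shows a walk-forward back-test that compares a candidate model against the one in production, and a rolling accuracy monitor that signals when a contingency plan should kick in.

```python
from collections import deque
from statistics import mean


def walk_forward_mae(series, predict, window):
    """Back-test: predict each point from the `window` points before it
    and return the mean absolute error over all such predictions."""
    errors = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        errors.append(abs(series[t] - predict(history)))
    return mean(errors)


class AccuracyMonitor:
    """Tracks a rolling mean absolute error and flags degradation."""

    def __init__(self, window, threshold):
        self.errors = deque(maxlen=window)  # keeps only the latest errors
        self.threshold = threshold

    def record(self, actual, predicted):
        self.errors.append(abs(actual - predicted))

    def degraded(self):
        # Only raise the flag once a full window of errors is available.
        full = len(self.errors) == self.errors.maxlen
        return full and mean(self.errors) > self.threshold


def last_value(history):
    """Toy 'production' model: repeat the most recent observation."""
    return history[-1]


def window_mean(history):
    """Toy 'candidate' model: average of the recent observations."""
    return mean(history)


# Hypothetical price series, for illustration only.
prices = [100.0, 100.5, 101.0, 101.5, 102.0, 102.5, 103.0]

production_mae = walk_forward_mae(prices, last_value, window=3)
candidate_mae = walk_forward_mae(prices, window_mean, window=3)

# Roll-back policy: the candidate replaces production only if it wins
# the back-test; otherwise production stays in place.
promote_candidate = candidate_mae < production_mae

# Hypothetical live (actual, predicted) pairs after deployment.
monitor = AccuracyMonitor(window=3, threshold=1.0)
for actual, predicted in [(101.0, 100.8), (101.5, 99.0), (102.0, 99.5)]:
    monitor.record(actual, predicted)

if monitor.degraded():
    print("Accuracy degraded: trigger the contingency plan")
```

In this toy setup the candidate loses the back-test, so it is not promoted, and the monitor raises its flag once a full window of recent errors averages above the threshold. A real-time system would presumably feed the monitor from live predictions and produce end-of-day reports from the same error log.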

Our advice is to iterate around the project cycle as quickly as possible. The team with the most iterations, all other things being equal, has the best chance of producing the best model. Start with a crude prototype that will be easy to implement and move to the next level from there. Even if you are planning to outsource ML design, building a crude prototype in-house will be critical for successful engagement.
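One way to picture this advice is a loop in which every iteration is scored against the same yardstick and the current best model is kept. The sketch below is an assumption-laden illustration, not our production workflow: the data points, the two models, and the helper names (`holdout_mae`, `fit_mean`, `fit_line`) are hypothetical. The crude prototype simply predicts the training average; the next iteration fits a straight line.

```python
from statistics import mean


def holdout_mae(model, xs, ys):
    """The yardstick: mean absolute error on data the model has not seen."""
    return mean(abs(y - model(x)) for x, y in zip(xs, ys))


def fit_mean(xs, ys):
    """Crude prototype: always predict the training average."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Next iteration: ordinary least-squares straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Hypothetical training and hold-out data, for illustration only.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

# Each iteration is a (name, fitting function) pair, simplest first.
iterations = [("mean_prototype", fit_mean), ("linear_fit", fit_line)]

scores = {}
for name, fit in iterations:
    model = fit(train_x, train_y)
    scores[name] = holdout_mae(model, test_x, test_y)

best_name = min(scores, key=scores.get)
print(scores, "best:", best_name)
```

The point of the design is that the yardstick and the hold-out data stay fixed while the models change, so each pass around the cycle is cheap to compare with the previous one.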

It is important that your design team includes highly skilled software engineers from the start, as this will ensure that the project is implementable in practice. We will discuss the optimal composition of your machine learning squad in the next part of the series: Who to hire if you are building machine learning in-house?
