It's easy to burn time, money and the goodwill of stakeholders when starting with Machine Learning. In this article I give 6 tips that can help save your Machine Learning project before it has even begun!
There is a lot of hype around Machine Learning and a lot of misinformation. This increases the risk of Machine Learning projects failing due to unreasonable expectations, lack of skills and incorrect choice of use cases.
To help you avoid some of the most common errors, I've put together the following tips. They are technology-agnostic and focus on what you'll need to have in place before any code is written.
The tips are as follows:
Machine Learning is all about enabling computers to build their own rules for working with data, without being explicitly programmed. The difference between Machine Learning and Traditional Programming is summed up in the following diagram:
Traditional Programming requires a developer to manually define the Rules for processing your Data. In Machine Learning the computer creates its own Rules based on the Data and, optionally, your historical Output data.
The creation of Rules in Machine Learning is performed by the application of algorithms and applied statistics. These algorithms allow Machine Learning to find patterns in your Data, which can then be used to give insight and predict new Outputs.
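To make the contrast concrete, here is a minimal sketch in plain Python. The spam scenario, the function names and the data are all hypothetical and invented for illustration; real Machine Learning algorithms are far more sophisticated than this brute-force search, but the principle of deriving the Rule from Data and historical Output is the same.

```python
# Traditional Programming: a developer writes the Rule by hand.
def is_spam_handwritten(num_links: int) -> bool:
    return num_links > 5  # threshold picked by the developer


# Machine Learning (heavily simplified): the Rule is derived from
# the Data (link counts) and the historical Output (spam labels).
def learn_threshold(samples, labels):
    """Return the threshold that classifies the most historical examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(samples)):
        correct = sum((x > candidate) == y for x, y in zip(samples, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Hypothetical historical data: number of links per email, and whether it was spam.
link_counts = [0, 1, 2, 3, 8, 9, 12]
was_spam = [False, False, False, False, True, True, True]

threshold = learn_threshold(link_counts, was_spam)


def is_spam_learned(num_links: int) -> bool:
    return num_links > threshold  # Rule created from the data, not by the developer
```

If the historical data changes, calling `learn_threshold` again produces a new Rule without anyone touching the code, which is the practical difference between the two approaches.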
Machine Learning is often conflated with Artificial Intelligence or Deep Learning. But how do these terms actually relate to one another?
As you can see, the terms are hierarchically linked:
Why are these distinctions important to understand?
The next step is to understand what you can actually achieve with Machine Learning. To do this you should know the three basic types of Machine Learning outlined below:
To further explain:
Why is all of this important? By understanding the common applications you can begin to understand how Machine Learning can be applied to your organization! I would personally begin by looking into Supervised and Unsupervised Learning. Reinforcement Learning is pretty hardcore, and you'll need a lot of data!
As mentioned earlier in this article, Machine Learning is the application of algorithms and statistics to data. Therefore, in order for your Machine Learning initiative to be successful, you'll need to have access to the right skills. The following diagram illustrates this:
A common mistake is to underestimate the importance of Math and Statistics knowledge when working with Machine Learning. The success of your initiative will depend on choosing the correct algorithm for your solution, knowing how to massage your data and, finally, correctly tuning the hyperparameters of your chosen algorithm. These tasks all require a solid grounding in Math and Statistics.
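As a rough illustration of what tuning a hyperparameter involves, the sketch below picks the value of k for a tiny k-nearest-neighbours classifier by checking each candidate against held-out validation data. The algorithm choice, the data and every name here are my own assumptions for the example; in practice you would use an established library and a more robust scheme such as cross-validation.

```python
def knn_predict(train, x, k):
    """Classify x (label 0 or 1) by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points that are classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)


# Hypothetical one-dimensional data as (feature, label) pairs.
# The point (2.5, 1) is deliberately noisy.
train = [(1.0, 0), (2.0, 0), (2.5, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(2.4, 0), (2.6, 0), (6.5, 1), (7.5, 1)]

# k is the hyperparameter: it is not learned from the training data,
# so we try several candidates and keep the one that validates best.
candidates = (1, 3, 5)
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
```

With k = 1 the model copies the noisy training point and misclassifies its neighbours, while larger values of k smooth the noise out. Knowing why that happens, and when smoothing starts to hurt instead, is where the Math and Statistics grounding comes in.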
When working with Machine Learning I strongly recommend that you have a Data Scientist on board.
A good Data Scientist will have the combination of skills described above, and often have a different academic background to Computer Scientists. The Data Scientists at Bouvet Oslo include Astrophysicists, Mathematicians, and Statisticians, all of whom possess solid programming skills in addition to strong Math and Statistics knowledge.
The role of the Data Scientist becomes even clearer when one looks at the process for working with Machine Learning.
A Machine Learning project will normally go through the following steps:
This process is an amalgamation of the common steps that can be found in many Data Science / Machine Learning processes, such as CRISP-DM, Microsoft's Team Data Science Process and Bouvet's own Data Science Methodology.
The steps are as follows:
Some more notes on this process:
You may have noticed that this process doesn't mention deploying the model. That's because I chose to focus on the Data Scientist's job. Deployment is someone else's job - as described in the next section!
A common mistake is to focus on recruiting Data Scientists to your Machine Learning team, without thinking of the other roles that are required.
But how will you get your Machine Learning algorithm into production? Who will build your data pipelines? What about visualizing the results? How will you ensure that your team members get the support they need? And what about evangelizing the team and its results in your organization?
To address these myriad concerns, I would argue that any Machine Learning project has at least the following 4 roles:
These roles are summed up in the image below and discussed in more detail in my previous article about Data Science teams.
Of these roles, I would argue that the Data Engineer role is currently the hardest one to fill in today's market - and also the easiest way for a Computer Scientist to get involved in Machine Learning.
When starting out with Machine Learning you need to accept that it is hard and that there are no guarantees that your specific hypothesis or use case can be solved with Machine Learning.
Aim for the low-hanging fruit. Find a simple use case and try to solve it with simple algorithms such as Linear Regression, Decision Trees or K-Means Clustering. By doing so you can test your team and your process to see if everything works as planned. Make the necessary adjustments and try again.
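To show how small a first experiment can be, here is a minimal sketch of Linear Regression with a single input, written in plain Python using the standard least-squares formulas. The advertising and sales figures and all the names are made up for illustration; a real project would normally use an established library and, above all, real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: advertising spend and resulting sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)


def predict_sales(spend):
    """Predict a new Output using the learned line."""
    return slope * spend + intercept
```

A model this simple is easy to explain to stakeholders and quick to evaluate, which makes it a useful test of your team and process before you invest in anything more advanced.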
If your initial projects fail, don't be disillusioned. Turn your failures into a learning process. If the initial projects succeed, build up to more advanced use cases and Machine Learning techniques.
Finally, remember that the Machine Learning Process (see Tip 4) takes time, especially the initial steps (i.e. defining a Question, collecting Data and Exploratory Data Analysis). Work with your team to identify quick wins along the way. Such quick wins can be communicated to the rest of your organization, giving the team a positive reputation and increasing their motivation.
In summary, my 6 tips for getting started with Machine Learning were:
Addressing these early will give your Machine Learning initiative a solid start.
Do you disagree with my suggestions? Or perhaps you have something to add? Feel free to share your opinion in the comments below! Alternatively, get in touch if you would like to talk more about Machine Learning or Data Science!
Mark West leads the Data Science team at Bouvet Oslo. In his spare time, he also leads the Norwegian Java User Group.