Krzysztof Sopyla, Head of Machine Learning and Data Engineering at STX Next, offers advice on addressing the complexities of undertaking a machine learning project
The popularity of artificial intelligence (AI) and machine learning (ML) has skyrocketed in recent years. In fact, STX Next’s 2021 CTO report found that around two-thirds of businesses have now implemented ML in some form, and that three-quarters of CTOs see it as the most likely technology to come to prominence in the next two years.
According to Krzysztof Sopyla, Head of Machine Learning and Data Engineering at STX Next, maximising the potential of machine learning means being fully aware of the complexity and unique challenges that come with these types of projects, and taking the necessary steps to overcome them.
Sopyla said: “The volume of structured and unstructured data created every day has grown exponentially in recent years, in large part due to the increased usage of technology to support business. For businesses and end users, AI- and ML-powered products can assist in processing it, thus streamlining operations, eliminating human error and saving on resources.
“While the benefits of these projects are widely accepted, for developers and data engineers there are a number of crucial differences between an ML project and a regular IT project.”
Below, Sopyla outlines six challenges that can be encountered in a typical ML project and advises on how best to address them.
1. Understanding the business goal and communicating it well
Sopyla: “Before coding begins, it’s vital to have a detailed understanding of what is desired by the client and be able to communicate that to the rest of the team who might not always be present at every meeting.
“This relates to understanding a business’s purpose and budgets, and being able to recommend the best course of action. There could be a simple solution to a client’s problem that takes a project in a completely new direction. At the same time, budget constraints could mean that you need to plan to cut corners in some areas and prioritise others.”
2. Document everything
Sopyla: “Documenting properly is key to budgeting and knowing exactly where money is spent. Where projects might be funded from different sources, it’s important to keep a record of what costs have been incurred.
“Having the right documentation also makes it easier to perform maintenance on projects further down the line.”
3. Data availability, bad data and bias in data
Sopyla: “Businesses will often use a variety of technologies and software that create data in its different shapes, forms and masses. ML is then crucial to organising it, understanding it and using it in a way that can add value to a business.
“The reality is, when we first begin working with customers, the right data is often not available or is in the wrong format. Collecting and annotating data is a very expensive process, and we rely on data engineers to put this right. This is why it’s so crucial that they have a good understanding of the project itself, and have taken steps to prepare their data accordingly so that the best possible results are achieved.”
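As an illustration of the kind of early data check this point implies, the short Python sketch below counts missing values and reports how evenly labels are distributed in a small dataset. It is a minimal sketch under stated assumptions, not a method described by Sopyla or STX Next: the function name `audit_records`, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def audit_records(records, required_fields, label_field):
    """Summarise missing values and label balance for a list of dict records.

    Hypothetical helper for illustration only.
    """
    missing = Counter()
    labels = Counter()
    for record in records:
        # Count fields that are absent, None or empty strings.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Tally only records that actually carry a label.
        label = record.get(label_field)
        if label is not None and label != "":
            labels[label] += 1

    total_labelled = sum(labels.values())
    label_share = (
        {label: count / total_labelled for label, count in labels.items()}
        if total_labelled
        else {}
    )
    return {
        "rows": len(records),
        "missing": dict(missing),
        "label_share": label_share,
    }


# Made-up sample data (an assumption, not taken from the article).
records = [
    {"text": "invoice overdue", "label": "billing"},
    {"text": "", "label": "billing"},
    {"text": "cannot log in", "label": "support"},
    {"text": "refund request", "label": None},
]

report = audit_records(records, required_fields=["text", "label"], label_field="label")
print(report)
```

A heavily skewed `label_share` or a high count in `missing` is an early signal that more collection or annotation work may be needed before any model is trained.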
4. Utilise experts and share wisdom
Sopyla: “If at all possible, ensure that you have access to an expert in the field. In an ML project, the code will often replace something that was previously performed by a human. It’s important then that you have someone who can confidently tell you whether your code can perform to the expected standard.
“There is no feedback like feedback from someone who will actually use the product every day.”
5. Setting the right goals and keeping it simple
Sopyla: “Managing the expectations of your client as well as those placed on your teams is a delicate task. While it’s a good idea to set goals for yourself and the people you’re working with, be cautious because you can quickly find yourself burned out if you fall behind.
“It’s worth being aware of the limitations and the fact that for different ML tasks, the simpler the solution, the better. This will mean no big data processing, no complicated transformations of that data before it goes into the model and ultimately, a much simpler implementation.”
6. Performance management is key
Sopyla: “When deploying a model to production, performance matters. Once deployed, take time to measure and test performance and use this data to make any necessary improvements and changes.
“Work doesn’t stop after a product is deployed. Occasionally you’ll have to prioritise speed or accuracy but this can be managed according to judgement: both yours and that of your clients and relevant experts.”
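To make the speed-versus-accuracy trade-off concrete, the Python sketch below times each prediction and scores it against a known answer, so both figures can be tracked together after deployment. It is a rough sketch under stated assumptions rather than a recommended monitoring setup: `evaluate`, `toy_predict` and the example data are hypothetical stand-ins for a real model and a real held-out test set.

```python
import statistics
import time


def evaluate(predict, examples):
    """Return accuracy and latency figures for a prediction function.

    `examples` is a list of (features, expected_label) pairs.
    """
    if not examples:
        raise ValueError("examples must not be empty")

    latencies = []
    correct = 0
    for features, expected in examples:
        start = time.perf_counter()
        prediction = predict(features)
        latencies.append(time.perf_counter() - start)
        if prediction == expected:
            correct += 1

    return {
        "accuracy": correct / len(examples),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


def toy_predict(text):
    """Stand-in for a deployed model (an assumption for this example)."""
    return "billing" if "invoice" in text else "support"


# Made-up labelled examples (not taken from the article).
examples = [
    ("invoice overdue", "billing"),
    ("cannot log in", "support"),
    ("refund for invoice", "billing"),
    ("refund request", "billing"),
]

metrics = evaluate(toy_predict, examples)
print(metrics)
```

Running the same measurement before and after each change gives the team, the client and the relevant experts a shared set of numbers on which to base the judgement call between speed and accuracy.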