msg launches the Machine Learning Catalogue: understanding machine learning and developing effective solutions

Thursday, 14 March 2019

Machine learning solutions are built from a wide range of algorithms, which in turn are trained using a wide range of methods. Consultants, developers and architects need an overview of the options available to them in order to meet specific requirements. This is where msg’s Machine Learning Catalogue comes into play: an industry-neutral list of the various building blocks that explains each one and gives an overview of how they relate to one another. The Machine Learning Catalogue can also be used as a reference tool in which to look up methods encountered in articles, programs or lectures.

An interview with Richard Hudson, the creative mind behind the Machine Learning Catalogue, about the concept.

Why did msg decide to put together the Machine Learning Catalogue?

In our machine learning projects we realized that finding the right algorithm was anything but simple. Although the Internet offers a great deal of information about machine learning methods, it is hard to find the right answers quickly and effectively. Once we realized that many other developers were struggling with the same issue, we made it our goal to put together a compilation with three key features: a clear structure, explicit relationships between terms, and a view of each technique from the perspective of the person applying it.

What practical advantages does the Machine Learning Catalogue offer its users?

The catalogue is structured along clear formal lines that are documented by a meta-model. The terms we use have clear definitions that are employed consistently throughout. The lack of such consistency is often a problem when dealing with other sources: a term like “regression” can refer to a specific algorithm, to a group of algorithms or to a business function. That makes it very hard for beginners to gain a quick overview.

The catalogue also lists synonyms and subtypes: there are algorithms with as many as five different names and as many as 14 subtypes. Although the interrelationships between the various terms are the key to understanding them, most sources do not make them explicit.

In addition, the catalogue examines the techniques from a user perspective. Wikipedia articles do exist for many of the algorithms, but they mostly focus on providing a mathematical explanation of the inner workings of each algorithm: they explain how to program it. In day-to-day development work, on the other hand, readers rarely wish to implement a technique themselves. Instead, they want to know when it is helpful, what its pros and cons are and what they need to be aware of when using it. Once they have selected a technique, they usually make use of existing software libraries at least for the more mathematically complex aspects of the task.

Can the components described be used to create any conceivable machine learning solution?

That is certainly not the case, and never will be. Machine learning is a highly creative process and the best solution for any given problem often arises from an innovative modification or combination of existing techniques. This means the algorithms described should not be understood as fixed recipes, but rather as archetypes. At the same time, the fact that our archetypes have been suggested by a number of different experienced colleagues gives us confidence that we have by now covered the most important ones.

Can the Machine Learning Catalogue be used to find out which building blocks are of central importance to a specific machine learning solution such as predictive maintenance?

The catalogue is not built around that principle and is unable to offer such information because a term like predictive maintenance can cover a wide range of different use cases; the appropriate algorithms can vary considerably depending on the specific problem being addressed. The use cases listed in the catalogue are intended rather as a source of inspiration. Once a concrete problem has been identified from that inspiration, the next step is to identify the learning style and the input and output data types. By filtering on these criteria, the user can obtain an easy-to-use list of the algorithms that might be relevant for the task at hand. The comments and tips in the descriptions inform the decision as to which of the algorithms from this list are worth trying out.
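The filtering step described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CatalogueEntry` fields, the `find_candidates` helper, the type labels and the four sample entries are hypothetical and are not taken from the catalogue or its meta-model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical, simplified stand-in for one algorithm description."""
    name: str
    learning_style: str        # e.g. "supervised" or "unsupervised"
    input_types: frozenset     # kinds of data the algorithm accepts
    output_types: frozenset    # kinds of data the algorithm produces


# Illustrative entries only; the real catalogue's content and labels may differ.
ENTRIES = [
    CatalogueEntry("linear regression", "supervised",
                   frozenset({"vector"}), frozenset({"value"})),
    CatalogueEntry("logistic regression", "supervised",
                   frozenset({"vector"}), frozenset({"category"})),
    CatalogueEntry("decision tree", "supervised",
                   frozenset({"vector"}), frozenset({"category", "value"})),
    CatalogueEntry("k-means", "unsupervised",
                   frozenset({"vector"}), frozenset({"cluster"})),
]


def find_candidates(entries, learning_style, input_type, output_type):
    """Return the names of entries matching all three filter criteria."""
    return [
        entry.name
        for entry in entries
        if entry.learning_style == learning_style
        and input_type in entry.input_types
        and output_type in entry.output_types
    ]


print(find_candidates(ENTRIES, "supervised", "vector", "category"))
# → ['logistic regression', 'decision tree']
```

The result is only a shortlist. As the answer above notes, the comments and tips in each description then guide the choice of which candidates are worth trying out.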

Are there already plans to expand the content of the catalogue and what form do they take?

The version of the Machine Learning Catalogue that is currently available online is already the result of the second iteration. The first was created in 2017. The catalogue was then expanded and revised in 2018 based on input from a variety of colleagues, particularly suggestions for new components to add. The catalogue is intended to keep developing and growing; the plan is to add new components successively as we become aware of them.