
2455 Words, 12 Minutes

Become a Machine Learning Engineer in Five to Seven Steps


A couple of people slid into my Twitter DMs asking for advice on how to become an ML Engineer. I am not Karpathy, but I am a solid mid-level machine-learning engineer. I became one through deliberate effort, and my transition from hobbyist to professional happened only recently.

If you want to follow a similar path, I want to provide you with an actionable roadmap and practical resources to pivot or build the foundation for your career. I hope my advice is useful.

Before blindly following any roadmap, let us answer the question: what is an ML Engineer?

Job titles in and around machine learning are a mess. (So much so that I wrote a 34-page whitepaper about them.)
In the context of this guide, an ML Engineer is someone who works in an organization and uses ML to solve a business case. That means creating or improving a product, or making the organization work more efficiently. This is distinct from an ML researcher, who develops novel methodology as a scientific pursuit but doesn’t address an immediate business need.

In the end, both roles have considerable overlap, but since I work as an ML engineer, that’s what I am talking about here.

The necessary skills

ML Engineering is an interdisciplinary profession, and you need skills in several areas: software engineering, data science, and mathematics, plus some knowledge of your application domain.

Software Engineering

ML Engineers must be able to “code”, sure, but on top of that they must be capable software engineers. There are several reasons for this.

Firstly, since machine learning is about uncovering patterns in data, ML Engineers must be able to handle sufficient amounts of it. Usually that means more data than anyone could handle manually.

Secondly, because the performance of an ML Engineer is measured by business impact, they need to be able to deploy models and integrate them into the larger context of a product. An engineer who never serves anything to users has failed at their job.

Lastly, understanding the inner workings of the computer and being able to build tailored tools greatly improves development speed. Intuition and domain knowledge help tremendously in designing your models, but building machine learning models is always an experimental process. ML Engineers make a lot of educated guesses and test which one performs best. The faster the iterations, the better the final output will be. Software engineering skills help you automate, iterate faster, and make each individual experiment more efficient and effective.

I am not the only one who preaches this. Listen to someone with more laurels: Greg Brockman, the CTO of OpenAI.

Ultimately, machine learning is a computer science discipline, and software engineering is how computer science becomes an effective application.
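
To make the point about fast, automated iteration concrete, here is a minimal sketch of an experiment sweep. Everything in it is an assumption for illustration: the toy regression problem, the helper names (make_data, fit_ridge, run_sweep), and the choice of a regularization strength as the thing being guessed. The idea is only that a few lines of tooling turn "try a guess, look at the result" into a loop you can rerun in seconds.

```python
import random
import time


def make_data(n, seed=0):
    """Hypothetical toy problem: y = 3x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_sweep(lams):
    """Fit one model per candidate value and rank them by validation error."""
    train_x, train_y = make_data(200, seed=0)
    val_x, val_y = make_data(100, seed=1)
    results = []
    for lam in lams:
        start = time.perf_counter()
        w = fit_ridge(train_x, train_y, lam)
        results.append({
            "lam": lam,
            "w": w,
            "val_mse": mse(w, val_x, val_y),
            "seconds": time.perf_counter() - start,
        })
    # Best configuration first.
    return sorted(results, key=lambda r: r["val_mse"])


if __name__ == "__main__":
    for r in run_sweep([0.0, 0.1, 1.0, 10.0, 100.0]):
        print(f"lam={r['lam']:<6} w={r['w']:.3f} val_mse={r['val_mse']:.4f}")
```

In a real project the fit would be a training run and the results would go to an experiment tracker, but the shape of the loop stays the same.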

Data Science

Since ML is about learning patterns from data, ML Engineers need to be capable of working with data. They need to handle the messiness of real-world records, know how to collect and understand them, engineer useful features, and judge whether a model’s output is sensible.

The trickiest bugs are not out-of-memory errors. They are the ones where the training loop runs through and the model output sort of seems right but is wrong in some subtle but substantial way. Again and again, data scientists learn the hard way that the single most effective way to build great models is to spend a lot of time with your data.
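
As one hedged illustration of what "spending time with your data" can look like in code, the sketch below runs two cheap checks before any training. The specific failure modes (test rows that also appear in the training set, and a heavily skewed label distribution) are my assumed examples of subtle bugs, not an exhaustive list, and the function names are hypothetical.

```python
from collections import Counter


def find_leaked_rows(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training set.

    Overlap like this can make evaluation scores look better than they are.
    """
    seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in seen]


def label_balance(labels):
    """Return the fraction of examples per label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    train = [(1.0, "a"), (2.0, "b"), (2.5, "a")]
    test = [(2.0, "b"), (3.0, "a")]
    print("leaked rows:", find_leaked_rows(train, test))
    print("label balance:", label_balance([label for _, label in train]))
```

Neither check needs a model, which is the point: they run in milliseconds and can catch problems that would otherwise only show up as a model that is quietly wrong.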

Another part of data science that ML Engineers need is research skills. They need to be able to identify publications that are relevant to the problem at hand, replicate their methods, and apply them in their own domain.

Mathematics & Statistics

How much mathematics an ML Engineer needs is difficult to quantify. For the most part they don’t explicitly need it day-to-day. Yet knowing the proper mathematics is essential to understanding the data of the problem at hand and to choosing suitable algorithms for it. So it is always implicitly needed.

To be effective, ML Engineers need the foundations of calculus, linear algebra, and probability theory. These are the core mathematical theories used to build and train many machine learning models. When they train large models or work with large datasets, ML Engineers also benefit from knowledge of numerical methods and optimization theory. Lastly, to understand the data of a specific problem, ML Engineers need knowledge of statistics.

Application Domain

While ML is more of a general toolbox, ML Engineers greatly benefit from specific domain knowledge. On the one hand, this means knowing your use case, its users, and the data available. On the other hand, it means developing expertise in working with specific types of data and the models that suit them: for example, language models for text, CNNs for vision, or RNNs for time series.

Routes to Become an ML Engineer

Typically, there are two routes into ML Engineering:

  1. The Data Science Route. You first become proficient with math and data work, begin to use machine learning, and then learn the necessary software engineering skills.
  2. The Software Engineering Route. You become a capable software engineer and at some point pivot your career to learn math, data, and machine learning skills.

For self-learners I believe route 2) to be the better one, because you are useful to an organization even with rudimentary data and ML skills. Many business problems are relatively straightforward, and a simple deployed model can already generate value, whereas a great model that is stuck in a Jupyter notebook is just a toy (albeit a very interesting one). This is not, however, an invitation to put off the math part indefinitely. Don’t stay mediocre.

If you happen to be in university for a quantitative degree, you will more or less follow route 1) with your curriculum by default. In that case, take some time during or after your studies to learn the software engineering side of things.

Studying computer science and specializing in machine learning, while doing a bunch of internships to learn industry-level collaborative development, means taking both routes at once. It is probably the best choice if this option is available to you.

Practical Resources

The following is a collection of structured courses to follow on your path into ML engineering. They are more a suggestion to give you an idea of the relevant skills than a definitive curriculum. Any time you like, switch things up, use resources you like better, or learn the skills by jumping head first into building projects. You know best how you learn most effectively. It’s more important that you cover the content of this roadmap than how you cover it.

Nonetheless, I promised you an actionable roadmap and concrete resources, so here they are. Let’s go.

Learn to Code

The foundation for both routes to become a Machine Learning Engineer is learning to code and working with computers. Since the ML and data science ecosystem is strongest in Python and it has the most resources available, choosing Python is a safe bet. Harvard’s CS50 is a great high-level introduction to programming and software engineering that covers the fundamentals of Python. You may want to dive a little deeper though, for which Programming Fundamentals by the University of Helsinki is great. You can probably skim through the first few chapters if you have worked through CS50.

While you don’t need to understand Python’s inner workings to use it for data science and machine learning, that knowledge is incredibly helpful down the road. To pick it up, put a book like Dead Simple Python on your bedside table and read a chapter every so often.

Learn Shallow ML

Now that you can code, start learning ML. You should start with shallow learning algorithms. They are more intuitive than neural networks, and you can build your skills in working with data without the added complexity. A great resource is the Machine Learning Specialization by Andrew Ng, which has been a gateway into AI for many before you.
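
To give a feel for how intuitive a shallow algorithm can be, here is a minimal sketch of k-nearest-neighbours classification in plain Python. The choice of k-NN, the function name knn_predict, and the toy points are my assumptions for illustration and are not taken from any of the courses mentioned. The whole "model" is: find the k closest training points and let them vote.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Two well-separated toy clusters (hypothetical data).
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["blue", "blue", "blue", "red", "red", "red"]
    print(knn_predict(points, labels, (0.5, 0.5)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```

There is no training loop, no gradient, and nothing hidden, which makes algorithms like this a good place to practise data handling and evaluation before moving on to neural networks.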

Learn Deep Learning

With the basics of ML covered, move on to deep learning, the current industry standard and a very powerful toolbox. If you liked Andrew Ng’s teaching style, you can continue with the Deep Learning Specialization. For a more university-style course, I recommend Yann LeCun’s NYU lecture on deep learning. For a more practical approach, fast.ai and the accompanying book Practical Deep Learning for Coders have you covered.

These resources also cover some of the necessary maths. If you find your knowledge lacking, deeplearning.ai has a course on the Mathematics on Deep Learning. During my studies this book was great. It has a lot of digestible chapters with practical examples and works both for learning and as a reference.

Build Domain Expertise

Once you have the foundations of deep learning down, it’s time to choose a domain for a deep dive. Hugging Face has a great collection of courses which you can try if you are still unsure. These courses are not enough on their own, but they are great primers that give you the foundation, context, and vocabulary to research publications and come up with ideas for projects you can build.

Projects? Software, coding, machine learning - for all of these some theoretical knowledge is necessary, but engineering is a practice and you learn by doing. Of course, you did the exercises of the courses and built some small projects already to cement your learning. Now is the time to become more ambitious. Start exploring your interests more freely to graduate from novice to expert by building up a portfolio.

It’s generally better to have one, or a few, impressive, well-architected, and innovative projects worth a month of work than to have many basic ones. You also learn more when doing those. To become a standout applicant, it’s important to make these projects tangible. That could mean writing a blog or some tweets about your learnings. But the most impressive thing you can do is to build a frontend so others can try what you built.

Learn Software Engineering

This leads us to the software engineering side of things. The Fullstackopen course is a great entry point into web development and distributed systems. It does not touch machine learning, but it covers a lot of tools and practices that are very valuable to an ML engineer, such as architecting distributed systems, databases, and containerization. You need this knowledge to deploy your models and provide an interface to your users. The course uses JavaScript because that is the language of the web. While this may seem daunting at first, you have already come a long way, and at this point it’s worth biting the bullet and adding another language to your toolkit.

Learn MLOps

There are also ML-specific software engineering and development practices (MLOps). To learn how to manage and engineer ML products throughout their whole lifecycle, fullstackdeeplearning is a great resource that gives you an overview. Pick and choose the practices that make your life as an ML Engineer easier and apply them to your projects. The effort will be well worth it.

Closing words

That’s it. If you follow this guide, I believe that you can become a strong candidate for an entry-level ML engineering position. You will have the necessary theoretical knowledge by working through the materials above and your projects will have made you an expert in a few focus areas.

To get a job, however, having skills is only half the work. You also need to display and communicate your skills. You can do this through internships that earn you good recommendations (or offers to stay) and through portfolio projects. Quincy Larson, the founder of FreeCodeCamp, wrote a great book on his journey to become a software engineer. Although his target role was slightly different, his experiences are directly applicable to the journey ahead of you. The book is also available as an audiobook as episode 100 of the FreeCodeCamp podcast, for example on Spotify.

One word of caution: while the roadmap is simple, it’s not easy. Learning ML and software engineering is hard, but it’s not witchcraft. Others have done it before you and if you commit, you can do it (and I have another article that may guide you on effectively learning hard things).

To give you a rough idea, this is how long I would expect it to take depending on your starting point.

Learning from Zero

If you commit to this roadmap full-time, I would estimate it to take around 18 months to learn everything from scratch.

If you are in a life stage where it is possible to go to university and can afford to do so, I think that is the easiest route. A university will give you a community, guidance, a curriculum, support for internships, and calm any concerns from parents or others who want you to make something out of yourself.

If you are making a career change from an unrelated industry, be sure to leverage your prior experience. Even if you want to leave your current industry, your domain knowledge is something that sets you apart, and it can help you land a first ML role there. Once you have that role, you get paid to learn, and moving companies will be considerably easier.

Switching Careers as a Developer

If you already work as a developer, you will be valuable fast. Spend around six months after work learning shallow and deep ML plus the maths you lack. Your prior exposure to software engineering is very valuable and will be highly regarded by employers, maybe even so much that you won’t need to sacrifice any seniority. Once you switch, you get paid to learn on the job.

Getting into ML as a Data Scientist

If you are a data scientist, you will probably sooner rather than later hit a career ceiling because of your lack of software engineering ability. At least I did. For those in data science, moving to ML is more or less a natural career progression, and putting in the extra time to learn can accelerate your career. Seek out ML projects in your current role or cherry-pick resources from above, spend a couple of months working through them, and then build a portfolio to apply for another role.

Summary

You can become a strong entry-level candidate by following this roadmap:

1) Learn the basics of computer science and how to code with Python using CS50 and a dedicated Python resource
2) Learn classical (shallow) ML to build your foundations and develop an intuition for data work
2.1) Build math fundamentals in Calculus, Linear Algebra, and Probability theory (bonus points for Numerics and Optimization)
3) Learn deep learning following a specific course like Yann LeCun’s NYU lecture, fast.ai, or deeplearning.ai’s deep learning specialization
4) Learn MLOps from fullstackdeeplearning
4.1) If necessary, preface this by working through fullstackopen to learn software engineering through learning web development as well as the basics of distributed systems, DevOps, and relational databases
5) Search for a niche you want to work in and develop expertise by building your portfolio. You can find a starting point in the Hugging Face courses, then follow the rabbit hole of your interest and build some interesting projects with paper implementations.

Then go out and snatch that job.

Good Luck!