Recently, we have seen a boom in Generative AI: the fantastic responses from ChatGPT, the mind-blowing images from Adobe Firefly, Midjourney and DALL-E, and Google's announcement of "SoundStorm". Large corporations are now fighting to integrate these services into search engines and, ultimately, into day-to-day life. However, at this point, people start to think differently.
Though many of us are enthusiastic about these new tools, some fear future dangers, while others think about the changes to our livelihoods. There are countless such viewpoints. Among them, however, are a few people who wonder what goes on behind these services, and some who dream of creating even better or more innovative products.
All such enthusiasts and dreamers search for a possible starting point. I was one of them too. However, I could not move forward because of the overwhelming number of requirements and resources in the machine learning field.
I started my journey in 2020, right after the end of the first wave of COVID-19. I looked into different requirements like Math, DSA, Ops etc. Many people from different backgrounds had posted their roadmaps, shared their Trello boards and so on. I was utterly overwhelmed and postponed getting started for months.
However, it wasn't until I dived in mindlessly, out of frustration, that I recognized that these roadmaps and Trello boards are part of the journey, not a prerequisite for starting it. The entry bar was surprisingly low.
So what should someone learn exactly?
If you are just entering the field, the requirements are surprisingly low. If you are clear on essential calculus and simple linear algebra, you are good to set off. I myself only had calculus when I started, and I remembered a minimal amount of linear algebra. I had no probability and statistics knowledge at all. It wasn't until the latter half of 2022 that I seriously thought about probability and statistics.
Think of it as starting as a level 1 character. Try to understand the basic game mechanics, the environment and your allegiance. As you progress through the game, you level up, start focusing on the lore, and learn why you are doing your missions and what they affect. You will also learn more about the secret dungeons and the legendary weapons, and you might even discover some that the community does not yet know about. As you do that, you find your role in the game. No matter your role's general requirements, nobody expects a level 1, i.e. a fresher, to know how to defeat a level 100 boss.
In a field like ML, people from different domains actively switch into the role. They do not learn everything before entering the job. Recruiters, however, have a baseline: there are some things that one should know before starting a career.
And yes, just as early access players start writing beginner's guides, multiple roadmaps can help you get started. I have tried to do my share by writing the essential roadmap that I feel I would need if I had to restart learning machine learning from scratch.
I have divided the entire beginner roadmap into five sections. We will start by discussing which programming language to use and where to learn it, and end with the essential infrastructure tools and commands we use in general.
The above flow chart shows that no specific order is recommended for Math, Python and SQL. The order of these is up to you. You can first focus on Python and then pick up Math, or start with the simplest thing, SQL, and then choose what you want to do next. However, the concepts in the subsequent layers do have prerequisites.
1. The Programming Language
While numerous programming languages are useful for machine learning, most searches lead to Python and R, the two languages most commonly used by Data Scientists, Data Analysts etc. So which should you choose as your starting point, if you haven't chosen already?
I will not go into the details and will cut to the chase. Python is recommended for beginners who aim for an ML Engineer role. There are multiple reasons, but the main one is Python's versatility. Now that this choice is out of the way, let us go to the next question in the series: the "What".
You only need to know the basics: if-else conditions, for and while loops, writing functions and creating classes. If you are familiar with these, the rest can be learnt on the go.
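To give a sense of how small that list really is, here is a minimal sketch that touches each of those basics once. The names used here (`grade`, `Student`, `countdown`) are hypothetical and chosen only for illustration; they do not come from any of the resources mentioned in this article.

```python
# A small tour of the basics: conditions, loops, functions and classes.
# Every name below is made up purely for illustration.

def grade(score):
    """Return a letter grade using if/elif/else conditions."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"


class Student:
    """A minimal class that stores a name and a list of scores."""

    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

    def average(self):
        # A for loop that accumulates a running total.
        total = 0
        for score in self.scores:
            total += score
        return total / len(self.scores)


def countdown(start):
    """Collect the numbers from start down to 1 using a while loop."""
    numbers = []
    while start > 0:
        numbers.append(start)
        start -= 1
    return numbers


student = Student("Asha", [95, 80, 65])
print(student.name, student.average(), grade(student.average()))
print(countdown(3))
```

If you can read this snippet and predict what it prints, you likely know enough Python to move on and pick up the rest as you need it.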
If you want to learn those, then the book "Automate the Boring Stuff with Python" by Al Sweigart is the best starting point I can recommend. You can also take his Udemy course. It is one of the best beginner books, with a fun, project-based approach to Python.
Other than this, if you want structured and clean directions, you can try the Python programming playlist by Corey Schafer or the Python playlists by sentdex. If you are more of a text-based learner, go ahead with the Real Python website for informative articles and tutorials.
2. Mathematics.
Math for ML is a multi-faceted discussion. The way you tackle it affects your journey in the long term. If you need to work in an academic setting, I recommend not taking this part lightly. Otherwise, I recommend only one place: Khan Academy. You only need to check out three sections: Linear Algebra, Statistics and Probability, and Calculus.
On the other hand, you can also check out the recent "Mathematics for Machine Learning and Data Science" specialization from DeepLearning.AI. It has an excellent curriculum, and I enjoy listening to Luis Serrano's lectures.
Finally, if you are from an engineering background like me, get familiar with statistics by following StatQuest. This resource needs no explanation. If you only want statistics, you can follow Josh Starmer's StatQuest playlist on YouTube. The video introductions quietly signal that the content should be taken differently from regular academic lectures. So, check them out!
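To make the three areas a little more concrete, here is a rough sketch of one small idea from each, using only Python's standard library. The helper names `dot` and `derivative` and the sample data are assumptions made up for this illustration, not something taken from the courses above.

```python
import statistics


def dot(u, v):
    """Linear algebra: the dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def derivative(f, x, h=1e-5):
    """Calculus: approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Statistics: the mean and population standard deviation of a small sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(dot([1, 2, 3], [4, 5, 6]))              # 1*4 + 2*5 + 3*6 = 32
print(derivative(lambda x: x ** 2, 3.0))      # close to 6, since d/dx x^2 = 2x
print(statistics.mean(data), statistics.pstdev(data))
```

In real projects you would typically reach for a numerical library rather than hand-written helpers; the point here is only that the underlying ideas are small enough to write down in a few lines.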
3. SQL - Structured Query Language.
After covering Mathematics and Python essentials, SQL is another vital aspect to note in your journey to becoming an ML Engineer. Structured Query Language (SQL) is, simply put, the language used to speak to a database. Often in our projects, we come across massive datasets that need filtering and cleaning before we start modelling. One's understanding of SQL is a critical factor in handling data efficiently.
Luckily, SQL is straightforward to understand. Here are a few resources that I found good starting points:
- Khan Academy's Intro to SQL course: This comprehensive and beginner-friendly course can ease you into interacting with databases.
- The Complete SQL Bootcamp, Jose Portilla, Udemy: As the name suggests, this is a single stop for learning SQL.
- SQL Fundamentals skill track, DataCamp: The advantage of this skill track comes from its interactive exercises that follow each lesson, which can help you retain more of what you learn.
My one request is that you stop chasing DBMS concepts and courses. Instead, focus on SQL and how you use it in practice.
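As an illustration of what "using SQL practically" can look like, here is a minimal sketch that filters out bad rows and summarizes the rest. It uses Python's built-in `sqlite3` module with an in-memory database; the `houses` table, its columns and its values are all hypothetical and invented for this example.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (city TEXT, area_sqft REAL, price REAL)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?)",
    [
        ("Pune", 900, 75000),
        ("Pune", 1200, 98000),
        ("Pune", None, 50000),   # missing area: excluded by the WHERE clause
        ("Delhi", 1000, 120000),
        ("Delhi", 1500, -1),     # invalid price: excluded by the WHERE clause
    ],
)

# Filter out incomplete or invalid rows, then summarize per city.
query = """
    SELECT city, COUNT(*) AS n_rows, AVG(price) AS avg_price
    FROM houses
    WHERE area_sqft IS NOT NULL AND price > 0
    GROUP BY city
    ORDER BY city
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

`SELECT`, `WHERE`, `GROUP BY` and `ORDER BY` already cover a large share of the everyday filtering and cleaning described above, which is why practising queries tends to pay off sooner than studying database theory.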
4. Machine learning.
Well, I believe there is no need to discuss the importance of this section. Many excellent materials exist to serve as an entry point into machine learning. However, I recommend sentdex's Machine Learning playlist and the great Andrew Ng's Machine Learning Specialization on Coursera to start your journey. These resources provide the context in simple terms and offer a concise overview of the topics involved.
fast.ai's Introduction to Machine Learning for Coders is also recommended because it covers the practical side while bringing in relevant theory when necessary.
I recommend solidifying and refining your theoretical understanding once you're comfortable implementing machine-learning concepts. For this stage of the journey, I suggest the book Pattern Recognition and Machine Learning by Prof. Christopher M. Bishop or the Probabilistic Machine Learning books by Prof. Kevin P. Murphy.
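To show what "implementing a machine-learning concept" can mean at the smallest scale, here is a rough sketch that fits a straight line to four points with plain gradient descent. It is not taken from any of the courses or books above; the data, the learning rate and the number of steps are assumptions picked so that this toy example converges.

```python
# Fit y = w * x + b to a few points using gradient descent on the
# mean squared error. The data is made up and follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough for this data
n = len(xs)

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# The fitted values should end up close to w = 2 and b = 1.
print(round(w, 3), round(b, 3))
```

The gradient lines are where the calculus shows up, and the same loop with more parameters is, loosely speaking, the idea that larger models build on.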
4.5 Neural Networks and Deep Learning Frameworks.
While the basics of machine learning are essential, getting an idea of the different deep learning concepts is also a requirement. When I asked a few recruiters I know on LinkedIn, the most common projects and skills they mentioned were sentiment classification, landmark classification and others related to CNNs or RNNs.
The Deep Learning Specialization from DeepLearning.AI is an unparalleled source for understanding deep learning concepts. It is a one-stop solution for understanding concepts from the simple perceptron to complex transformers. Prof. Andrew Ng has taken care of every critical aspect and explained it in simple terms.
If you want to nitpick the specialization, there is only one central area where it falls short: the practical part. The majority of learning happens through working on different projects in practice. I am not going to lie: though I learned some fantastic things in the specialization, I couldn't retain them as well as the concepts I picked up through different projects.
5. Basic Infrastructure knowledge.
This part is, unsurprisingly, one of the most critical aspects. In my first few days on the job, I was utterly overwhelmed. Luckily, I had some basic knowledge of Docker, Linux commands etc.
There was even an accident I might have caused if not for the timely suggestion of my mentor. I will not go into the details, as I still work there. But here is the checklist of different parts that I recommend.
- Basic Docker knowledge - At least learn how to build an image and use a container. Deal with the advanced stuff when you need it.
- Linux commands - I recommend getting as familiar as possible with the terminal. This holds even if you don't use it often. You will feel as if you have a superpower in your hands. I recommend this fantastic playlist by Corey Schafer for the basics.
- Cloud Computing - This part should generally go without saying. Get yourself familiar with any one provider such as Azure, GCP or AWS. The providers themselves offer learning resources for these. They even offer badges or certificates.
- Git - You will be working with a team. I used to work solo, so most of my interaction with Git was to create repos, commit changes and push them. If you do the same, you might think it is not that important to learn Git. However, in a team, many people work on the same project, each identifying and working on different issues. If the main codebase changes without everyone knowing what has changed, then all hell breaks loose. Hence, make sure you are familiar with Git.
With these points, we have reached the end of the basics section of the roadmap. The following flowchart summarizes the entire roadmap.
In the meantime, I will run a few surveys on LinkedIn. These polls will ask different questions regarding the roadmap, skills and resources needed to become a machine learning engineer. So, expect the results of these surveys in part 2.
If you want to know how people have started their careers and what they generally recommend starting with, you can check out my previous article on the results of my LinkedIn polls. You can check that out here.
I hope you enjoyed reading this article. Comment about your recommendations and viewpoints about the roadmaps. You can even message me on LinkedIn regarding any issues with the article. Thanks for your patience in reading the article. Let's meet again in the following article! Till then, ✌️.