In the last article, Part 1 of this road map, we briefly discussed the starting tools and directions for machine learning. We discussed a simple plan that I believed would help beginners establish a solid foundation. After talking about this, at the end of the article, I said I was running a few polls on LinkedIn to find out what suggestions the community had for newcomers.
I asked the machine learning groups on LinkedIn a few questions: where an enthusiast should start, and which resources are best for machine learning theory, linear algebra, statistics, and so on. After taking my time to consolidate all these results, I am finally presenting them in this article. So, let's dive in without wasting any more time.
How to start the journey?
In the last roadmap, I suggested that people start with Python programming and learn maths as they go (if they already knew basic maths). But I also wanted to know what they thought would be a good place to start. I asked them about machine learning courses instead of core Python programming because I thought that would be better.
I asked the groups where their journey started as my first question. What resource did they follow? What area did they focus on? Did they start with the basic foundations like Math or dive directly into Machine learning or Deep learning? Now, after gaining some experience and insights, what do they recommend for people who want to start fresh?
Before I say any more, look at how the polls turned out.
When you look at the poll results, you can see that most people started by taking online or in-person classes. I thought that was normal and expected the second poll to look the same. But when I asked people for their suggestions, they suggested starting by doing projects.
Also, about the same number of people said that people should start with maths. After looking at this, I brainstormed a few reasons along with what the community said. Here are some reasons people might not recommend beginning with ML classes but rather with projects.
- Projects give beginners hands-on training, help them learn from their mistakes, and make it easier for people to work together, especially on sites like Kaggle.
- Though courses still give a clear place to start, most courses are considered too vague or academic and out-of-date with the current demands.
- Students who don't have a background in math need well-organized math and basic statistics to understand what is happening under the hood.
- Assessments should include multiple-choice questions and live projects.
There can be other reasons too, but I believe these points account for most of them.
Where to start learning Python?
Now let's go back to the road map that we drew. In that roadmap, you can see the horizontal importance of Python, Math and SQL. Starting with Python, we will closely look into each aspect before going to the next level of concepts.
Python is one of the most recommended starting points for any machine learning enthusiast, as it gives you a lot of flexibility to play around with. With Python, you can scrape the data, clean and preprocess it, train a model on the cleaned data, deploy it on a web app, and monitor the deployed app's status using custom dashboards. You can work on a project end-to-end with only Python and nothing else.
Besides its flexibility, Python is one of the simpler languages to understand. All you need to remember is the colon (:) and the indentation, and then you are good to go.
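To make the colon-and-indentation point concrete, here is a minimal sketch. The function name `label_scores` and the threshold of 50 are made up purely for illustration; they are not from any course mentioned here.

```python
# Hypothetical helper, used only to illustrate colons and indentation.
def label_scores(scores, threshold=50):      # a colon opens the function body
    labels = []
    for score in scores:                     # a colon opens the loop body
        if score >= threshold:               # each nested block is indented one level further
            labels.append("pass")
        else:
            labels.append("fail")
    return labels                            # dedenting back ends the loop body


print(label_scores([35, 50, 72]))
```

There are no braces or semicolons: the colon starts a block, and the indentation decides which lines belong to it.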
Now that we have discussed why Python is a good recommendation, let's look at what is recommended as a starting point.
Python for Everybody Specialization from the University of Michigan is the most recommended starting point for beginners. This is a 5-course specialization on Coursera that can be purchased or audited. I have personally taken the Django course by Prof. Charles Severance, and I can say that you need not think twice about his teaching style. You can find more details about the course here.
Along with these, people recommended their favourite YouTube playlists from different channels, like CodeWithHarry, CodeBasics, Codanics, Corey Schafer, Telusko, etc. People also recommended other sources such as DataCamp, CS50 by Harvard University, and Python books published by Packt and other publishers.
Where to learn Math?
Math is one of the wider areas to discuss. We essentially focus on three aspects: Linear Algebra, Calculus and Statistics. I asked more about Linear Algebra and Statistics for the beginner level, as high-school level calculus (especially the chain rule) is sufficient for most of the early part of your journey.
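As a quick refresher on the chain rule mentioned above, here is a small sketch that differentiates a composed function by hand and checks the result numerically. The functions `inner` and `outer` are arbitrary examples chosen for illustration, not something taken from the polls.

```python
def inner(x):
    return 3 * x + 1          # g(x) = 3x + 1, so g'(x) = 3


def outer(u):
    return u ** 2             # f(u) = u^2, so f'(u) = 2u


def chain_rule_derivative(x):
    # d/dx f(g(x)) = f'(g(x)) * g'(x) = 2 * (3x + 1) * 3
    return 2 * inner(x) * 3


def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation, used only as a sanity check.
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0
analytic = chain_rule_derivative(x)
numeric = numerical_derivative(lambda v: outer(inner(v)), x)
print(analytic, numeric)      # the two values should agree closely
```

This multiply-the-derivatives pattern is the same idea that backpropagation applies layer by layer when training neural networks.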
Most of what we do in machine learning is a combination of Linear Algebra and Statistics. Much of the work is converting high-dimensional data into interpretable lower dimensions, and we use Linear Algebra to visualize and manipulate this high-dimensional data.
The following are the recommendations from the ML enthusiasts:
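As a rough illustration of moving from higher to lower dimensions, the sketch below projects 3-dimensional points onto two directions using plain dot products. The points and directions are made up, and the directions are picked by hand; in practice a technique such as PCA would learn them from the data.

```python
def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def project(points, directions):
    # Each point gets one new coordinate per direction.
    return [[dot(p, d) for d in directions] for p in points]


# Three made-up data points with three features each.
points = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Two hand-picked, mutually perpendicular unit vectors (an assumption for this sketch).
directions = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]

low_dim = project(points, directions)
for row in low_dim:
    print(row)
```

Each 3-feature point becomes a 2-coordinate point that can be drawn on a flat plot, which is the kind of manipulation linear algebra makes routine.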
One of the major recommendations for learning Linear Algebra in depth is MIT OCW 18.06 Linear Algebra, taught by Prof. Gilbert Strang. However, the depth of the course is beyond what people who are just beginning their ML journey need. The poll results reflect this: respondents recommended the Khan Academy or DeepLearning.AI courses as a better starting point for beginners.
Along with these, one of the most common recommendations for Linear Algebra is the playlist from the 3Blue1Brown channel. If you are short on time, I recommend going with 3Blue1Brown.
Probability and Statistics
Probability and Statistics are fundamental to machine learning and to any ML professional's career in the long term. Even though the beginning stages don't require deep knowledge, in the long run, people will hit the wall of statistics (assuming you will be working in the technical side of ML rather than going into managerial roles). Hence, getting a basic foundation and understanding the statistical view of each model is one of the key things to keep in mind as you travel along the field.
So, let's take a look at what the community recommends:
Based on the community recommendations, an equal number of people voted for StatQuest and the Probability and Statistics course from DeepLearning.AI. Since we have already discussed the Coursera specialization, let's look at the StatQuest playlist.
The most important highlight of this playlist is the simplicity with which Josh Starmer teaches stuff. He breaks down topics into small bite-sized pieces and explains them in a simplified style. This makes your journey more modular and also convenient.
To know more about why people recommend StatQuest, you can look at the following Reddit discussion. Here is just a small snippet from the conversation:
Structured Query Language (SQL)
SQL is one of the more controversial topics for a data scientist. Depending on the place and the domain you work in, you either use SQL all the time or go months without touching it. To go into more detail, if you land in a team focused more on image, video or speech data, you rarely use SQL; however, in some areas such as recommendation systems and other applications, you might have to work with SQL. So, it is application dependent. (Remember that this is a roadmap focussed more on machine learning than on data science or data engineering. Hence this discussion will be short.)
However, in interviews, you will most definitely be discussing SQL. Hence, it's better to have a basic idea of it. Here are the recommendations given by the community:
The recommendations are quite straightforward, with most people recommending the SQL for Data Science course on Coursera and a minority advising people to start with DataCamp in the comments. I personally haven't tried the course as I used DataCamp for my SQL, so I can't accurately give an opinion on the course or the recommendations. If you want to know more about my recommendations, you can find them in Part 1 of this roadmap.
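For a sense of what "a basic idea" of SQL might cover, here is a small sketch that runs a typical filter-group-sort query from Python using the built-in sqlite3 module. The `ratings` table and its rows are invented for illustration, loosely in the spirit of a recommendation-system dataset.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, item TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        (1, "book", 4.0),
        (2, "book", 5.0),
        (1, "film", 3.0),
        (3, "film", 2.0),
        (2, "game", 4.5),
    ],
)

# Average rating per item, keeping only items with at least two ratings.
rows = conn.execute(
    """
    SELECT item, COUNT(*) AS n_ratings, AVG(rating) AS avg_rating
    FROM ratings
    GROUP BY item
    HAVING COUNT(*) >= 2
    ORDER BY avg_rating DESC
    """
).fetchall()
conn.close()

print(rows)
```

SELECT, GROUP BY, HAVING and ORDER BY, along with WHERE and JOIN, are the kind of basics that tend to come up when SQL is discussed in interviews.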
Finally... Machine Learning.
I don't think you would need an introduction to this section, as you probably clicked this article for the Machine learning roadmap. Hence, let us look into the recommendations for starting the Machine learning journey.
There is no need for any confusion here. Anyone who wants to get started with machine learning and doesn't want to go through the highly theoretical way with all the complex math and stats can directly start with the Machine Learning Specialization on Coursera taught by Prof. Andrew Ng. This is one of the best starting points, and almost anyone who has taken it can tell you why.
Later in the journey, if you want a more theoretical introduction, Prof. Andrew Ng has you covered with his CS229 playlist on YouTube. Hence, there is no need for you to look around. For more info, you can check the several resources out there or, once again, check Part 1 of this series.
Bonus: Job vs Higher Studies in ML?
As a recently graduated student, I had difficulty making this decision. I was completely confused about what to choose: a Master's degree in ML or Data Science, or getting a job and learning things practically. And in reality, this is a confusion many students who choose to work in ML share. Hence, I thought this would also be one of the questions that need more than one person's suggestion. (And the response to this poll was massive. Nearly 500 working professionals had voted by the end of the survey.) So, what did the professionals finally recommend?
From this, you can see that most people recommend working over pursuing higher studies. However, there is a downside to it too. Getting a job after a bachelor's is very difficult, especially since people with a Master's will also apply for the same position. (I was rejected for the same reason in some of my interviews last year.) Conversely, a person who comes to the professional space after a Master's needs more time to adjust to the production setting, as they will be used to a more academic environment. So, the decision must be taken after much consideration and evaluation of what is better for you.
With this, I think I have covered all the areas for the roadmap. It took me three weeks to conduct these polls and consolidate the results from multiple polls. If you find this article valuable, sign up to get notified when more interesting articles pop up on the world wide web.
In the meantime, I have started taking the NLP specialization and working on interesting projects. Based on what I have been learning, I plan to create an article series on NLP while returning to my UG notes and refining a few topics related to adversarial machine learning and model explainability. If any of these articles excite you or are relevant, don't forget to subscribe to the site to get them directly in your mailbox!
To get the latest on my ML and data science journey, don't forget to subscribe and follow me on LinkedIn and Instagram! Thanks for reading till the end. You are amazing! ✌️