From Linear Algebra to Deep Learning in 7 Books (Winter 2023 Update)

Seven of my Favourite Machine Learning Books

Photo by Laura Kapfer on Unsplash

In my first ever blog post for Towards Data Science in 2019, I wrote about five of my favorite machine learning books — books that cover every aspect from basic linear algebra to modern deep learning.

They were:

Linear Algebra Done Right by Sheldon Axler
Mathematical Statistics and Data Analysis by John A. Rice
Elements of Statistical Learning by Trevor Hastie et al.
Neural Networks and Deep Learning by Michael Nielsen
Deep Learning by Ian Goodfellow et al.

Linear Algebra to Deep Learning in 5 Books

I think each of those five books is great for its own reasons. Elements of Statistical Learning, for example, is what got me excited about data science in the first place, back when I was an undergraduate student. In fact, all five books have taught me so much over the years, and I wouldn’t be where I am now without them.

However, they were just a subset of the data science and ML books that I discovered and learned from, which is why I want to write about another batch of seven books that I think are just as amazing and worth adding to your (virtual) shelf. For each of them, I will tell you why I think it is so great, what it taught me, and how I would advise reading it.

In addition, many of the books have recently received an update, so I will also tell you what you can expect from the new version, for example if you already know the previous one.

Like in my last post, the books I chose vary in their level of difficulty, so I will start with those focused on fundamental concepts and will venture into the realm of advanced machine learning literature down the road.

Here we go…

Introduction to Statistical Learning by Trevor Hastie et al.

In my original blog post, I mentioned the book Elements of Statistical Learning, which is considered a classic and one of the most influential and amazing machine learning books ever written.

While being surprisingly accessible despite covering a lot of quite advanced topics, it does generally assume a solid grasp of mathematics, and especially statistics.

For this reason, the same authors also published Introduction to Statistical Learning, which is a more accessible version of the book while still covering (mostly) the exact same topics. Don’t get me wrong, the book still introduces the fundamental pieces of math and stats wherever necessary or helpful, but its main focus is building the reader’s intuition for how the statistical methods in machine learning work.

In my opinion, this is the single best book to learn data science and machine learning. It’s an absolute must-read if you’re starting out to learn data science. It is also the basis for some of the best introductory data science courses at universities around the world.

What I learned from the book: A lot of my intuition for machine learning stems from reading ESL and ISL. There is simply no other book that illustrates these concepts so well. This is especially true for many of the more advanced concepts like cross validation and support vector machines, which can be difficult to grasp initially.

Advice on how to read the book: Read this one from top to bottom if you are new to data science. The chapters build upon each other, so it’s important to pick up at least the basics from each chapter before moving on. If you feel like you need a challenge, you can read ISLR in parallel with the corresponding ESL chapters. That will take you from merely grasping a topic towards an expert-level understanding.

The book also comes with an entire R library so you can easily work through all the practice questions, which is an amazing educational resource.

What’s new in the most recent version: An updated second edition of the book was released just last year, with entirely new chapters that were not in the previous version. They mostly cover advanced topics such as deep learning, survival analysis, and multiple testing.

In addition, this year yet another version of the book was released, this time an official Python version, while the original relies on R as already mentioned.

An Introduction to Statistical Learning

R for Data Science by Hadley Wickham et al.

Speaking of R, R for Data Science is another invaluable open source book. You can learn the “whole game” of data science — from data ingestion to data cleaning to training models and communicating your results.

The book doesn’t require any prior knowledge of coding or data science; you can learn it all from this book. Compared to ISLR (the book above), the approach in R for Data Science is more practical. You will find a lot of code in every chapter.

The book also introduces and heavily relies on the tidyverse, which is a beautifully designed collection of R libraries for doing data science. So it’s also a great (maybe the best?) guide if you’re coming from Python and want to learn how to do things in R.

What I learned from the book: The book proves that R is an amazing language for doing data science, and even more so for learning it. This is a highly practical, hands-on guide from which I learned the tidyverse way of doing data science. It has a particular focus on data wrangling, which other books (such as ISLR) often don’t have. To this day, I rely on the data wrangling know-how I gained from this book.

How to read the book: You can read it front-to-back and learn an amazing amount of new knowledge and techniques. If you are new to data science, I would definitely recommend that. However, it wasn’t my first data science book, so I ended up doing things a little differently: I read the chapters in whatever order I felt like. If you already have some data science experience (for example because you come from Python), you can do the same. Simply pick a topic that you want to learn more about and deepen your knowledge by reading the corresponding chapter. Each section is reasonably self-contained, so you can easily do that.

What’s new in the new version: The second edition of the book was released this year, and it comes with a revamp of all the code examples. The tidyverse has been developing rapidly in the last few years, so this required an update to the book.

R for Data Science (2e)

(I linked the second edition of the book which should be complete by now, but you can find the original first edition here.)

Mathematics for Machine Learning by Marc Peter Deisenroth et al.

If you’re looking for a comprehensive book to study all of the math you need to get a grip on machine learning, Mathematics for Machine Learning is a great option.

Machine learning builds upon quite a few subfields of mathematics, including linear algebra, statistics, probability theory, and optimization, to name just the main ones. Therefore, in order to learn it all, you would typically have to go through several different textbooks.

That was exactly the motivation for writing this book: to unite all the math commonly required for machine learning in one place.

What I learned from the book: This is a modern mathematics textbook, so I used it as a reference when doing related courses at uni. Even though it was not the main textbook of my courses, it covered some of the same topics. Therefore, I was able to read the relevant chapters and that helped me understand them better. The book contains good explanations, always relates them to machine learning concepts (unlike generic math textbooks), and includes nice visualisations that also helped me grasp the ideas.

How to read this book: It’s probably best to treat this book for what it is — a mathematics textbook. If you are taking a course in Linear Algebra, Probability, Stats, or a math-heavy ML class, this can be a good additional resource for you. Pick the chapters that align with your course and see if the book helps you as it helped me.

Mathematics for Machine Learning

The Hundred-Page Machine Learning Book by Andriy Burkov

First of all, it’s actually not 100 pages but a few more. Anyway, no hard feelings there because the book is exactly what it promises to be, a concise yet comprehensive overview of the key concepts and methods in machine learning.

Because of that, it’s probably not a go-to resource for learning data science from scratch, nor is it very practical. Rather, its purpose is to provide concise descriptions of important concepts, and to introduce the math and theory behind them.

So it’s a great reference if you ever need to understand a concept or method in depth but quickly.

What I learned from the book: For me, the book sometimes acted as a sort of self-check. I thought that if I could understand a concept from its bare-bones description, I should have a good understanding of the topic. That’s what I feel this book is about. It doesn’t teach you data science or machine learning from scratch, but you can mentally wrap up (or refresh) a concept by reading about it in the book.

How to read this book: The book is one of the best compact summaries of machine learning concepts out there, and I’d suggest you do what I did, i.e. use this book to check your own understanding of a topic or concept. If you understand the concept, perfect. If not, you might have to go back to another book (e.g. ISLR or ESL) that explains the concept in more detail.

The Hundred-Page Machine Learning Book by Andriy Burkov

Machine Learning: A Probabilistic Perspective by Kevin Murphy

This is one of the more advanced machine learning books out there. Machine Learning: A Probabilistic Perspective takes a Bayesian (i.e. probabilistic) view of machine learning, while also contrasting it with the classical, frequentist view.

Bayesian machine learning is different in that it incorporates prior knowledge and uncertainty into the learning process. One of the great advantages is that this allows generating probabilistic predictions rather than single point estimates.
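To make the difference between a point estimate and a probabilistic one concrete, here is a minimal sketch. It is an illustration only and is not taken from the book: the data (7 successes in 10 trials), the Beta(2, 2) prior, and the helper names `beta_posterior` and `beta_mean_std` are all assumptions chosen for the example. It uses the standard Beta-Binomial conjugate update, where a Beta(a, b) prior combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior.

```python
from math import sqrt


def beta_posterior(prior_a, prior_b, successes, trials):
    """Return the (a, b) parameters of the Beta posterior after binomial data."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean_std(a, b):
    """Return the mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


# Hypothetical data for illustration: 7 successes in 10 trials.
successes, trials = 7, 10

# Frequentist point estimate (maximum likelihood): a single number.
mle = successes / trials

# Bayesian estimate: the assumed Beta(2, 2) prior encodes a mild belief
# that the success rate is somewhere around 0.5.
post_a, post_b = beta_posterior(2, 2, successes, trials)
post_mean, post_std = beta_mean_std(post_a, post_b)

print(f"MLE point estimate: {mle:.3f}")
print(f"Posterior: Beta({post_a}, {post_b}), mean={post_mean:.3f}, std={post_std:.3f}")
```

The posterior is a full distribution, so it carries a measure of uncertainty (here the standard deviation) alongside the estimate, and the prior pulls the estimate slightly towards 0.5 compared to the maximum likelihood value.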

The book covers many advanced topics that none of the previous books attempt to explain, such as Gaussian processes or variational inference. It’s a great book if you want to take your ML knowledge to the next level, and you will learn to see the field from a different angle.

What I learned from the book: At uni, I took a class in Bayesian Machine Learning, and this was one of the indicative readings. For example, I learned about the Bayesian version of linear and logistic regression. The book also has a great chapter on the Expectation-Maximisation algorithm. Still today, I find myself coming back to this book whenever I need to understand a method from a probabilistic point of view.

How to read this book: Again, this is a classical textbook, so you will most likely encounter it as part of a course. But even outside of a classroom you can use this book to gain a different view on many of the same machine learning methods that other books such as ESL teach, plus many additional methods that only really make sense in a probabilistic framework. If this is the first time you are learning about Bayesian ML, I’d suggest you read the introduction first to really understand why this is such a useful topic.

Machine Learning

What’s new in the new versions of the book: The author, Kevin Murphy, has two additional ML books in the works. The first is Probabilistic Machine Learning: An Introduction, which is focused on deep learning, and the second is a more advanced follow-up with the title Probabilistic Machine Learning: Advanced Topics.

Deep Learning with Python by François Chollet

Deep Learning with Python is a fantastic book if you want to understand (not just learn) neural networks and how to implement them using Python.

It’s written by François Chollet, who is the creator of Keras, a deep learning framework that sits on top of TensorFlow. Chollet is an exceptional author, and those are not just my words. He does a remarkable job of illustrating the mechanics behind the various concepts in deep learning and how they come together to form the basis of the field that is currently taking the world by storm (think of ChatGPT).

What I learned from the book: This one is all about how to think about deep learning, starting from first principles. I mainly used this book to learn about Keras, but I also found myself re-reading many other parts of the book (e.g. about convolutional neural networks, best practices, etc.). Even though I already knew most of the topics by the time I stumbled upon this book, I found them too good to miss out on, and so they reinforced my understanding of deep learning greatly.

How to read the book: If you are just starting out in deep learning, this is probably again a read-it-from-front-to-back kind of book, simply because François is such a great author who does an amazing job of illustrating and motivating all the concepts. If you already have some experience in deep learning, you might as well just pick a chapter to improve your understanding of a specific topic.

What is new in the second edition: A second edition of the book came out earlier this year. Deep learning is moving extremely fast, so the book required some updates. It also comes with new chapters and topics. For example, it contains new content covering the recent advancements in language models and text generation. Also, Keras has changed over the years, so the book had to adapt too.

Deep Learning with Python, Second Edition

(There is also a very similar book called Deep Learning with PyTorch which covers a lot of the same content but with a focus on PyTorch rather than TensorFlow/Keras.)

Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto

Admittedly, this one is the odd one out here, but definitely worth a final spot. Reinforcement learning (RL) is an absolutely fascinating field, which complements the kind of machine learning that the above books teach.

In case you are not familiar with it, reinforcement learning is the field of AI that is concerned with autonomous agents and decision making, with the goal of learning by interacting with the real or a simulated world.
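As a rough illustration of what "learning by interacting" means, here is a minimal sketch of an epsilon-greedy agent on a simulated multi-armed bandit, one of the simplest RL settings. It is not code from the book: the function name `run_epsilon_greedy`, the reward probabilities, the number of steps, and the value of epsilon are all assumptions picked for the example.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    Returns the agent's estimated value and pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # number of times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The simulated world returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


# Hypothetical environment: three arms with unknown success probabilities.
true_probs = [0.2, 0.5, 0.8]
estimates, counts = run_epsilon_greedy(true_probs)
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])

print("Estimated values:", [round(e, 2) for e in estimates])
print("Pull counts:", counts)
print("Arm the agent currently prefers:", best_arm)
```

The agent is never told which arm is best. It only observes the rewards of its own actions and gradually shifts its choices towards the arm that pays off most often, which is the basic loop that more advanced RL methods build on.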

This book is the standard textbook for RL classes around the world, so it is definitely one of the best if you want to learn about the field. Knowledge of machine learning is nevertheless helpful, as it plays a central role in reinforcement learning too, so think of this book as the cherry on top of the cake.

What I learned from this book: So much. This was also a book for one of my classes at uni. In fact, I did my course project on Monte-Carlo Tree Search, and my MSc thesis on Contextual Bandits. Both are explained very well in the book. They are incredibly interesting and powerful RL methods.

How to read this book: This one is probably a front-to-back book as well, since the chapters very much build on top of each other. Importantly, you should probably have a very good understanding of machine learning and even deep learning before reading this book, as a lot of the RL concepts build on them.

There is no new edition of the book yet, but if you ask me, it’s high time there was one. A lot has changed in the field over the last few years, and RL is driving some of the state-of-the-art language models (e.g. through reinforcement learning from human feedback, or RLHF). I would definitely be the first one to read that.

Reinforcement Learning: An Introduction | MIT Press eBooks | IEEE Xplore


These seven books, alongside the five from my original blog post, are some of the best in the field that you can find. I can confidently say that since I’ve spent (and still spend) a lot of time curating my (virtual) bookshelf. The best part? Most of them are available online for free.

I’m really grateful that the field is so open and enables free and easy access to knowledge. I don’t think this is said enough, and it is sometimes just taken for granted, which it shouldn’t be.

If there’s one final tip from me, it’s this: I’ve always been a fan of using multiple sources to learn the same concept. Different books (or courses, videos, and tutorials for that matter) all take a slightly different approach to teaching a topic. It’s like having multiple teachers: one of them always makes it click in your head, and once that happens, you’ll also understand what the others were saying. I think you get an incredibly profound understanding by doing that.

Thank you so much for reading. I really hope you enjoyed the post and find the books as useful as I did, no matter what stage of learning you are in.

If you have any thoughts or questions, feel free to reach out to me in the comments or on LinkedIn.

From Linear Algebra to Deep Learning in 7 Books (Winter 2023 Update) was originally published in Towards Data Science on Medium.
