Free Resources for Learning Data Science

This article is by Sydney Firmin and originally appeared on the Alteryx Data Science Blog here: https://community.alteryx.com/t5/Data-Science-Blog/Free-Resources-for-Learning-Data-Science/ba-p/354182

 

Courses in Mathematics, Machine Learning, Programming, and Data Science

Having a solid foundation in mathematics and computer science will only make you a stronger analyst and data scientist. If you need to brush up on some concepts, or even get exposed to them for the first time, many companies and universities have posted courses on mathematics and machine learning.

MIT OpenCourseWare is a massive library of courses taught at MIT. Many of the classes are useful for brushing up on mathematics and computer science. Two sections are particularly worth checking out:

  • Mathematics > Probability and Statistics
  • Engineering > Computer Science > Algorithms and Data Structures / Artificial Intelligence / Data Mining

Khan Academy is a non-profit educational organization dedicated to creating online resources for students. It has posted a number of mathematics courses that you might find helpful.

3Blue1Brown is an awesome collection of YouTube videos focusing primarily on mathematical concepts. The Neural Networks series is particularly excellent.

Learn with Google AI is another vast catalog of resources for machine learning, including tutorials, videos, documents, and courses. Google also offers a Python course designed for people with a little programming experience who want to learn Python.

Amazon Web Services (AWS) recently made all of its online training courses free to take. I recommend checking out the Machine Learning section. There are multiple curated paths under Machine Learning, including a Data Science path and a Developer path (more focused on data engineering). The curated paths can provide a template for learning the skills needed to become a data scientist.

Andrew Ng’s popular Machine Learning course offered through Coursera has two options: you can audit the course for free, or purchase the course and earn a certificate. This is widely considered to be a thorough introductory course in machine learning.

Another great open-source course is Practical Deep Learning for Coders. It is put on by fast.ai, an organization whose goal is to make AI accessible to everyone (hence the slogan “Making neural nets uncool again”). This course also has an associated forum of weekly-challenge style posts for hands-on practice.

 

Blogs 

A popular way to get into data science is through blogs (both reading and writing them). Here are a couple of blogs I’ve found useful or interesting.

KDnuggets is a massive blog aggregator. There is always something new to find here.

R-bloggers is another blog aggregator, focusing on analysis, tutorials, and examples in the R programming language.

Kaggle's No Free Hunch highlights data science news, interviews with Kaggle competition winners, and data analysis highlights posted on Kaggle (more details on Kaggle under the hands-on practice section).

Medium's Towards Data Science features articles on different aspects of data science from a large number of individual contributors. Some articles are great; some articles are less great.

 

Books (and Papers)

Books are great. Consider looking for them at your local library or searching for open-source copies online.

If you’d like to learn R, the book The Art of R Programming will give you a strong foundation.

Think Python (available for Python 2 or Python 3) is written for people with no coding background who would like to learn Python as well as how to think like a computer scientist. There is a sister book called Think Java, written for the Java programming language.

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems is an excellent book that includes hands-on exercises (in Python) as well as an overview of many popular machine learning algorithms.

For a more extensive list of data science books, check out this GitHub page with 40 contributors.

Papers are a great way to understand what areas of analytics are being actively researched. Two Minute Papers is a YouTube channel that reviews research papers in around two minutes.

If you’d rather read than listen, this curated index of technical papers might be right up your alley (complete with ratings).

 

Hands-on Practice and Datasets

For many people, the best way to learn is by doing. Step one to trying hands-on data science is getting data. The following are a few suggestions on where to get data or hands-on practice.

The UCI Machine Learning Repository is a collection of datasets that have been used for research in AI and machine learning.

Google AI Datasets is another repository of datasets used for research in a wide range of computer science disciplines.

If you haven’t heard of Kaggle, you’re missing out. Kaggle is a site for online data science competitions. A benefit of Kaggle is that, in addition to offering datasets to analyze (you don’t have to submit to a competition if you don’t want to), it lets you learn from how other users approach different problems.

Finally, there's a thread on the Alteryx Community that started in 2015 and is actively updated with good freely available data.

Another (recommended) option is to make your own dataset! You may have already heard the statistic that data collection and preparation make up about 80% of most data science work. A fabulous way to take data science learning head-on is to gather and prepare your own data for analysis. Don’t forget to post your process and code on your GitHub, and write about your results somewhere (maybe even submit it for publication here!).