Course Providers

I have had positive experiences with the following online courses and can recommend them as introductions to Data Science. At the time of writing, most courses can be accessed for free, although costs may be incurred if you choose to purchase a certificate of achievement or continued access to course materials after completing the course:

Big Data: Measuring and Predicting Human Behaviour (Future Learn) (free)

This (9 week) course was my initial introduction to Big Data and it proved a fascinating insight into the mass of available data sources and how they are being used to predict human behaviour. The course explains how to access and interrogate data from Google, Wikipedia and social media platforms (Twitter and Facebook), how changes in technology (smart cities, wearables) are yielding more data and how analysts are using information for a variety of purposes (stock market trading, predicting crime, mapping disease outbreaks and predicting epidemic spreads). The course provides a step-by-step guide to the R programming language and practice in its use with online data.

Big Data: Data Visualisation (Future Learn) (free)

A brief but packed (2 week) introduction to visual analytics covering the characteristics of good artistic and scientific visualisations, demonstrated through impressive case studies. The course gives access to visualisation tools (MATLAB, Tableau, D3.js, OpenGL) and opportunities to investigate how to use them with sample datasets.

Big Data: Mathematical Modelling (Future Learn) (free)

Using MATLAB, this (2 week) course provides an introduction to some mathematical concepts that are important to big data analysis (such as linear algebra, matrices, tensors, eigenvectors and eigenvalues) and the analytical techniques of Principal Component Analysis and Singular Value Decomposition. Through practical examples, it delves into the Big Data challenges of ranking data, clustering and data compression.
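
To give a flavour of how eigenvectors relate to Principal Component Analysis, here is a minimal sketch in Python (the course itself uses MATLAB, so this is my own illustration and not course material). It finds the first principal component of a tiny made-up dataset by applying power iteration to the covariance matrix. The function names, the sample points and the iteration count are all assumptions chosen for the example.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length data rows."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centred = [[r[j] - means[j] for j in range(dims)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]

def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalise a vector.

    Assumes the starting vector is not orthogonal to the dominant
    eigenvector; otherwise the product collapses to zero.
    """
    vector = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # The Rayleigh quotient gives the eigenvalue for a unit-length vector.
    eigenvalue = sum(p * v for p, v in zip(mat_vec(matrix, vector), vector))
    return eigenvalue, vector

# Hypothetical data lying exactly on the line y = x.
points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
eigenvalue, component = dominant_eigenpair(covariance_matrix(points))
print(eigenvalue, component)
```

Because the sample points all lie on one line, the first principal component points along that line and its eigenvalue accounts for all of the variance in the data.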

Big Data: Statistical Inference and Machine Learning (Future Learn) (free)

This (2 week) course introduces the key terms associated with machine learning (classification, clustering, regression, dimension reduction, predictions, neural networks, deep learning, reinforcement learning and principal component analysis) and shows the relevance of the subject through academic papers and real-world applications. The material provides practical experience of decision trees and statistical learning with R, RStudio and H2O, and of data mining and analysis with WEKA.

Learn to Code for Data Analysis (Future Learn) (free)

Although this (4 week) course aims to teach the fundamentals of computer programming, it also provides an excellent introduction to Jupyter Notebook, Anaconda and the Python programming language. It gives an effective explanation of variables, expressions, functions and data operations but, more interestingly, a practical guide to the Pandas module used for loading data, structuring data into dataframes, transforming and combining data, investigating correlations and using pivot tables.

Managing Big Data with R and Hadoop (Future Learn) (free)

This is a more complex and challenging (5 week) course which covers virtual machines, R, RStudio, Apache Hadoop and RHadoop. It introduces the principles of Hadoop’s distributed processing for managing large datasets (HDFS and MapReduce) and the use of AWK, and the concepts and techniques of Supervised / Unsupervised Learning, k-Means and non-hierarchical Clustering, Linear Regression and Linear Discriminant Analysis. While the lessons explain basic file operations and commands, it is beneficial to have a prior understanding of UNIX and Linux operating systems.
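
The MapReduce idea can be illustrated without a Hadoop cluster. The sketch below is my own single-machine imitation in plain Python (the course uses R and RHadoop, and none of this is Hadoop's API): a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The function names and sample sentences are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values that share a key.

    On a real cluster the framework does this between map and reduce.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data big ideas", "data beats ideas"])
print(counts)
```

In a distributed setting the map and reduce functions would run in parallel on different machines over blocks of a file; here they run one after another purely to show the flow of data.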

The Data Scientist’s Toolbox (Coursera)

This (4 week) course takes a whistle-stop tour through some of the common data science tools and concepts. It introduces Git (version control system), GitHub (web-based software hosting service) and the use of the Git Bash command line interface. The practical steps to Fork and Clone GitHub repositories and to create a GitHub account are explained. The material focuses on the R programming language, providing an overview of the RStudio Integrated Development Environment, R commands, how to obtain, install and load R packages, and graphing data. It also covers machine learning, and regression and classification models. In setting out the principles of best practice for Data Scientists and explaining the data analysis cycle (accessing, merging, and manipulating data) it is a useful introduction to the subject.

R Programming Language (Code School) (free)

There can’t be many courses that attempt to teach the rudiments of R through the theme of pirates but Code School uses this effectively to deliver fun and educational training. In a self-paced course it introduces an extensive range of R commands, providing examples of how they are used and testing understanding. The content includes R expressions, variables and functions, vectors, matrices, statistics, plotting data, data frames, importing data from files, correlating data and use of libraries.

Machine Learning, Andrew Ng (Coursera) (free)

A highly respected, comprehensive and challenging (11 week) course which explains the key terms and concepts associated with machine learning and reinforces understanding through practical code development using the Octave programming language. Topics explored include supervised learning and unsupervised learning, univariate & multivariate linear and polynomial regression, cost functions, optimisation algorithms like Gradient Descent, learning rates, feature scaling, classification problems, logistic regression, decision boundaries, regularisation, Neural Networks, error metrics, Support Vector Machines, Kernels, the K-means algorithm, Principal Component Analysis, reinforcement learning and recommender systems.
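
As a small taste of the material, here is a minimal sketch of batch Gradient Descent for univariate linear regression, written in Python and not in the Octave used on the course. It is my own illustration under stated assumptions: the function name, the learning rate, the iteration count and the sample data (generated from y = 1 + 2x) are all made up for the example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = theta0 + theta1 * x by minimising the mean squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost (1 / 2m) * sum(error ** 2).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

The learning rate controls the step size: too large and the updates overshoot and diverge, too small and convergence becomes very slow.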


Other courses that I am aware of and plan to try myself include:

Practical Deep Learning For Coders, Jeremy Howard (University of San Francisco), 2017 (Free)

Visualization in R, FlowingData

Hadoop Python MapReduce Tutorial for Beginners, Matthew Rathbone (Beekeeper Data), November 2013 (Free)

Probability and Statistics, Stanford University online, April 2017 (Free)

Statistical Learning, Stanford University online, June 2016 (Free)

Statistical Reasoning, Stanford University online, February 2016 (Free)

Data Visualization and D3.js, Udacity (Free)

Intro to Machine Learning, Udacity (Free)

Deep Learning by Google, Udacity (Free)