
**Tutorials, examples, collections, and everything else that falls into the categories of pattern classification, machine learning, and data mining.**
Table of Contents
Introduction to Machine Learning and Pattern Classification
- Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]
- Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]
- An Introduction to simple linear supervised classification using scikit-learn [IPython nb]
Pre-Processing
- Feature Extraction
    - Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
- Scaling and Normalization
    - About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
- Feature Selection
    - Sequential Feature Selection Algorithms [IPython nb]
- Dimensionality Reduction
    - Principal Component Analysis (PCA) [IPython nb]
    - PCA based on the covariance vs. correlation matrix [IPython nb]
    - Linear Discriminant Analysis (LDA) [IPython nb]
    - The effect of scaling and mean centering of variables prior to a PCA [PDF]
    - Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
- Representing Text
    - Tf-idf Walkthrough for scikit-learn [IPython nb]
Model Evaluation
- An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
- Cross-Validation
    - Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
- Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
- Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]
Parameter Estimation
- Parametric Techniques
    - Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    - How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
- Non-Parametric Techniques
    - Kernel density estimation via the Parzen-window technique [IPython nb]
    - The K-Nearest Neighbor (KNN) technique
Regression Analysis
- Linear Regression
    - Least-Squares fit [IPython nb]
- Non-Linear Regression
Machine Learning Algorithms
Bayes Classification
- Naive Bayes and Text Classification I - Introduction and Theory [View PDF] [Download PDF]
Logistic Regression
- Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]
Neural Networks
- Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]
- Activation Function Cheatsheet [IPython nb]
Ensemble Methods
- Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]
Decision Trees
- Cheatsheet for Decision Tree Classification [IPython nb]
Clustering
- Prototype-based clustering
- Hierarchical clustering
    - Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
- Density-based clustering
- Graph-based clustering
- Probability-based clustering
Collecting Data
- Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]
- Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]
- Reading MNIST into NumPy arrays [IPython nb]
Statistical Pattern Classification Examples
- Supervised Learning
    - Parametric Techniques
        - Univariate Normal Density
            - Ex1: 2-classes, equal variances, equal priors [IPython nb]
            - Ex2: 2-classes, different variances, equal priors [IPython nb]
            - Ex3: 2-classes, equal variances, different priors [IPython nb]
            - Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
            - Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distribution [IPython nb]
        - Multivariate Normal Density
            - Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
            - Ex7: 2-classes, equal variances, equal priors [IPython nb]
    - Non-Parametric Techniques
Resources
- Matplotlib examples - Visualization techniques for exploratory data analysis [IPython nb]
- Copy-and-paste ready LaTeX equations [Markdown]
- Open-source datasets [Markdown]
- Free Machine Learning eBooks [Markdown]
- Terms in data science defined in less than 50 words [Markdown]
- Useful libraries for data science in Python [Markdown]
- General Tips and Advice [Markdown]
- A matrix cheatsheet for Python, R, Julia, and MATLAB [HTML]