Machine Learning

Camera Models

Table of Contents
- Reading
- Outline
- Pinhole Model
- From World Space to Image Space
- Camera Parameters
- Estimating Camera Parameters
- Application: Camera Calibration

Reading
- Chapters 1 and 7 (Forsyth and Ponce)
- https://www.scratchapixel.com/
- https://docs.google.com/presentation/d/1RMyNQR9jGdJm64FCiuvoyJiL6r3H18jeNIyocIO9sfA/edit#slide=id.p
- http://vision.stanford.edu/teaching/cs131_fall1617/lectures/lecture8_camera_models_cs131_2016.pdf
- http://vlm1.uta.edu/~athitsos/courses/cse4310_spring2021/lectures/11_geometry.pdf

Outline
- Pinhole model
- Coordinates of a pinhole model
- Perspective projections
- Homogeneous coordinates
- Computer graphics perspective
- Lenses
- Intrinsic and extrinsic parameters
- From world to camera to image space
- Camera calibration

Pinhole Model

Imagine piercing a small hole into a plate and placing it in front of a black screen. The light that enters through the pinhole will show an inverted image against the back plane. If we place a virtual screen in front of the pinhole plate, we can project the image onto it without the inversion. This is the basic idea behind the pinhole camera model.
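To make the projection concrete, here is a minimal sketch (not from the original notes) of the perspective projection a pinhole camera performs: a camera-space point \((x, y, z)\) lands at \((f x / z, f y / z)\) on the virtual image plane at distance \(f\) in front of the pinhole. The function name and default focal length are illustrative assumptions.

```python
import numpy as np

def project_pinhole(points, f=1.0):
    """Project 3D camera-space points onto a virtual image plane at z = f.

    points: (N, 3) array of points in front of the camera (z > 0).
    Returns (N, 2) image-plane coordinates. Using a plane in front of the
    pinhole (rather than behind it) avoids the image inversion.
    """
    points = np.asarray(points, dtype=float)
    z = points[:, 2:3]
    return f * points[:, :2] / z

# A point twice as far away projects to half the image-plane offset.
print(project_pinhole([[1.0, 1.0, 2.0], [1.0, 1.0, 4.0]]))
```

Note the division by depth: doubling \(z\) halves the projected offset, which is exactly the perspective effect the later sections formalize with homogeneous coordinates.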

Hidden Markov Models

Table of Contents
- Introduction
- The Markov Assumption
- Definition
- Evaluation
- The Viterbi Algorithm
- Estimating Parameters
- Expectation Maximization

Introduction

This article is essentially a grok of a tutorial on HMMs by Rabiner (1989). It will be useful for the reader to reference the original paper. Up to this point, we have only explored “atomic” data points, where all of the information about a particular sample is encapsulated in one vector. Sequential data, in contrast, is naturally represented by graphical models. This article introduces Hidden Markov Models, a powerful probabilistic graphical model used in many applications, from gesture recognition to natural language processing.

RANdom SAmple Consensus

Table of Contents
- Introduction
- Finding the Best Fit Model

Introduction

Unless our data is perfect, outliers will prevent us from finding parameters that accurately fit the underlying model. Consider fitting the data in the figure below using a least squares method: the outliers pull the estimate away from the true line.

Figure 1: Points sampled along a line with many outliers around them. Source: Wikipedia
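The section that follows develops the method in detail; as a preview, here is a minimal sketch of RANSAC for 2D line fitting. The iteration count and inlier threshold are illustrative defaults, not values from the original notes.

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.1, rng=None):
    """Fit a line y = m*x + b with RANSAC.

    points: (N, 2) array. Repeatedly fit a model to a random minimal
    sample (2 points) and keep the model with the most inliers.
    """
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        if p1[0] == p2[0]:
            continue  # vertical sample; skip for this simple parameterization
        m = (p2[1] - p1[1]) / (p2[0] - p1[0])
        b = p1[1] - m * p1[0]
        # Count points whose vertical distance to the line is small.
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = np.sum(residuals < threshold)
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model
```

Because each candidate model is fit to only two points, a sample drawn entirely from inliers is likely within a modest number of iterations, and the outliers never contaminate the winning fit.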

Kernels

Table of Contents
- Introduction
- Dual Representation
- Relating Back to the Original Formulation
- Types of Kernels
- Constructing Kernels
- RBF maps to infinite-dimensional space

Slides for these notes can be found here.

Introduction

Notebook link: https://github.com/ajdillhoff/CSE6363/blob/main/svm/kernels.ipynb

Parametric models use training data to estimate a set of parameters that can then be used to perform inference on new data. An alternative approach uses nonparametric methods, meaning the function is estimated directly from the data instead of optimizing a set of parameters.
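As a small preview of the kernels discussed later, here is a sketch of the Gaussian (RBF) kernel, the one whose feature map is infinite-dimensional. The vectorized Gram-matrix computation and the gamma parameter are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2).

    X: (N, d) and Y: (M, d) arrays; returns an (N, M) kernel matrix.
    The squared distances are expanded as ||x||^2 + ||y||^2 - 2 x.y
    to avoid an explicit pairwise loop.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)
```

Evaluating the kernel directly like this is the heart of the nonparametric approach: we never construct the feature map itself, only inner products between mapped samples.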

Naive Bayes

Table of Contents
- Introduction
- Definition
- Maximum Likelihood Estimation
- Making a Decision
- Relation to Multinomial Logistic Regression
- MNIST Example
- Gaussian Formulation

Slides for these notes can be found here.

Introduction

To motivate naive Bayes classifiers, let’s look at slightly more complex data. The MNIST dataset was one of the standard benchmarks for computer vision classification algorithms for a long time, and it remains useful for educational purposes. The dataset consists of 60,000 training images and 10,000 testing images of size \(28 \times 28\) depicting handwritten digits. For the purposes of this section, we will work with a binary version of the images, so each data sample has 784 binary features.
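With binary features, each class-conditional distribution can be modeled as a product of independent Bernoullis. The sketch below (an illustration under that assumption, not the notes' own code) estimates the parameters by maximum likelihood with Laplace smoothing and classifies by the largest log-posterior.

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes=10, alpha=1.0):
    """Estimate Bernoulli naive Bayes parameters.

    X: (N, 784) binary features, y: (N,) integer labels.
    alpha is a Laplace smoothing count so no probability is exactly 0 or 1.
    """
    priors = np.zeros(n_classes)
    theta = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        theta[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    return priors, theta

def predict(X, priors, theta):
    # Log-posterior up to a constant: log p(c) + sum_d log p(x_d | c).
    log_post = (
        np.log(priors)
        + X @ np.log(theta).T
        + (1 - X) @ np.log(1 - theta).T
    )
    return np.argmax(log_post, axis=1)
```

The "naive" independence assumption is what lets the 784-dimensional likelihood factor into a sum of per-pixel log terms.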

Neural Networks

Table of Contents
- Resources
- Introduction
- Definition
- Forward Pass
- Activation Functions
- Multi-Class Classification
- Backpropagation
- Non-Convex Optimization

Resources
- https://playground.tensorflow.org/

Introduction

Previously, we studied the Perceptron and saw that while it makes for a simple linear classifier, it is limited to problems that are linearly separable. This limitation was resolved by introducing one or more hidden layers of perceptron units, yielding the aptly named Multi-Layer Perceptron.
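To see how a hidden layer resolves the separability limitation, here is a minimal forward-pass sketch that computes XOR, a problem no single-layer perceptron can solve. The weights are hand-picked for illustration (one hidden unit computes OR, the other NAND), not learned.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with a sigmoid nonlinearity."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # hidden activations
    return h @ W2 + b2                        # output score

# Hand-picked (illustrative) weights: hidden unit 0 ~ OR, unit 1 ~ NAND,
# and the output fires only when both are active, i.e., XOR.
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([-10.0, 30.0])
W2 = np.array([[1.0], [1.0]])
b2 = np.array([-1.5])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print((forward(X, W1, b1, W2, b2) > 0).astype(int).ravel())  # [0 1 1 0]
```

The hidden layer remaps the inputs into a space where the classes become linearly separable, which is exactly what the forward pass and backpropagation sections make precise.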

Perceptron

Table of Contents
- Introduction
- The Perceptron Learning Algorithm
- Limitations of Single-Layer Perceptrons

Introduction

A popular example of a generalized linear model for classification is the perceptron. Proposed by Frank Rosenblatt in 1962, the perceptron is defined as

\begin{equation*}
f(\mathbf{w}^T\mathbf{\phi}(\mathbf{x})),
\end{equation*}

where \(\phi\) is a basis function and \(f\) is a step function of the form

\begin{equation*}
f(a) = \begin{cases} +1, & a \geq 0 \\ -1, & a < 0. \end{cases}
\end{equation*}
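The next section develops the learning algorithm; as a preview, here is a minimal sketch. It assumes the identity basis function with an appended bias feature, and the learning rate and epoch count are illustrative defaults.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, n_epochs=100):
    """Perceptron learning rule.

    X: (N, d) samples, y: (N,) targets in {-1, +1}.
    For each misclassified sample (t * w^T x <= 0), update
    w <- w + lr * t * x.
    """
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for x, t in zip(X, y):
            if t * (w @ x) <= 0:   # misclassified (or on the boundary)
                w += lr * t * x
                errors += 1
        if errors == 0:            # converged: every sample classified
            break
    return w
```

If the data are linearly separable, this update converges in finitely many steps; if not, it cycles forever, which previews the limitations discussed in the final section.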

Principal Component Analysis

Table of Contents
- Summary
- Maximum Variance Formulation
- Motivating Example
- Noise and Redundancy
- Covariance Matrix

Summary

If we have measurements of a system but do not know its underlying dynamics, PCA can resolve this by producing a change of basis such that the dynamics are expressed along the eigenvectors of the covariance matrix.

Maximum Variance Formulation

There are several derivations of PCA; I really like the approach of projecting the data onto a lower-dimensional space in order to maximize the variance of the projected data.
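As a sketch of this change of basis (illustrative, assuming centered data and the sample covariance), PCA can be computed by an eigendecomposition of the covariance matrix, keeping the eigenvectors with the largest eigenvalues.

```python
import numpy as np

def pca(X, n_components=2):
    """Project data onto the top eigenvectors of its covariance matrix.

    X: (N, d) data. Returns the projected data and the components.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = (Xc.T @ Xc) / (len(X) - 1)        # sample covariance, (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort by variance explained
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components
```

The eigenvector with the largest eigenvalue is the direction of maximum variance, which is the quantity the maximum variance formulation below optimizes directly.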

Regularization

Table of Contents
- Introduction
- Overfitting
- Penalizing Weights
- Dataset Augmentation
- Early Stopping
- Dropout

Slides for these notes are available here.

Introduction

“Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.” - Goodfellow et al.

Regularization comes in many forms. Some techniques add an additional penalty to the loss function. Others, such as data augmentation, add artificial variation to the data. In all cases, regularization aims to improve generalization performance by preventing the model from overfitting.
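As a concrete instance of penalizing weights, here is a sketch of a sum-of-squares loss with an added L2 penalty and its closed-form minimizer (ridge regression). The penalty strength lam is an illustrative assumption.

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    """Sum-of-squares loss plus an L2 penalty on the weights.

    The penalty term 0.5 * lam * ||w||^2 discourages large weights,
    trading a little training error for better generalization.
    """
    residuals = X @ w - y
    return 0.5 * residuals @ residuals + 0.5 * lam * w @ w

def ridge_fit(X, y, lam=0.1):
    # Closed-form minimizer: (X^T X + lam * I)^{-1} X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

For brevity this sketch penalizes every weight; in practice a bias term, if present, is usually left out of the penalty.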

Support Vector Machine

Table of Contents
- Introduction
- Maximum Margin Classifier
- Formulation
- Overlapping Class Distributions
- Multiclass SVM
- Additional Resources

Introduction

Support Vector Machines are a class of supervised learning methods primarily used for classification, although they can be formulated for regression and outlier detection as well. Instead of optimizing a set of parameters that compress or summarize the training set, they use a small subset of the training data to compute the decision function.
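That small subset is the set of support vectors. As a sketch (assuming the dual coefficients alphas and the bias b have already been found by training, which the formulation section covers), the decision function sums only over the support vectors:

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, targets, b, kernel):
    """Evaluate f(x) = sum_i alpha_i * t_i * k(x_i, x) + b.

    Only support vectors appear in the sum; every other training
    sample has alpha_i = 0 and can be discarded after training.
    """
    return sum(
        a * t * kernel(sv, x)
        for a, t, sv in zip(alphas, targets, support_vectors)
    ) + b

# Example kernel: the linear kernel k(u, v) = u . v.
linear = lambda u, v: np.dot(u, v)
```

This is the sense in which an SVM does not summarize the training set into a fixed parameter vector: prediction cost scales with the number of support vectors, not the size of the training set.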

Linear Regression

Table of Contents
- Introduction
- Probabilistic Interpretation
- Solving with Normal Equations
- Another Approach to Normal Equations
- Fitting Polynomials
- Linear Basis Functions

Slides for these notes are available here.

Introduction

Given a dataset of observations \(\mathbf{X} \in \mathbb{R}^{n \times d}\), where \(n\) is the number of samples and \(d\) is the number of features per sample, and corresponding target values \(\mathbf{Y} \in \mathbb{R}^n\), the goal is to create a simple prediction model that predicts the target value \(y\) given a new observation \(\mathbf{x}\). The classic example is a linear model: a function that is a linear combination of the input features and some weights \(\mathbf{w}\).
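As a preview of the normal-equations solution derived later, here is a minimal sketch; appending a bias column to \(\mathbf{X}\) and the function names are illustrative choices.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares weights via the normal equations.

    Solves (X^T X) w = X^T y, with a constant bias column appended
    so the model can fit an intercept.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w  # linear combination of features and weights
```

Replacing the raw features with basis-function outputs, as in the later sections on polynomials and linear basis functions, changes only the construction of the design matrix, not the solution.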