Machine Learning

The Language of LLMs

How do LLMs read and process the high-dimensional landscape of text efficiently? Presented as a workshop at UTA's Datathon on April 13, 2024.

Bag of Visual Words

Bag of Words is a technique used in Natural Language Processing for document classification. It represents a document as a collection of word counts. To create a Bag of Words for a document, it is necessary to create a dictionary first. The choice of dictionary depends on many factors, including computational limitations. Next, the documents in a dataset are tokenized into words. The word counts are collected into a histogram and used as a feature vector for a machine learning model.
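As a minimal sketch of this procedure (the dictionary and document here are invented for illustration), building a Bag of Words vector looks like this:

```python
from collections import Counter

# The dictionary would normally be chosen from the dataset's
# vocabulary; this one is made up for illustration.
dictionary = ["cat", "dog", "bird", "fish"]

def bag_of_words(document):
    # Tokenize into words, then count occurrences of dictionary terms.
    counts = Counter(document.lower().split())
    return [counts[word] for word in dictionary]

print(bag_of_words("The dog chased the cat and the other dog"))
# [1, 2, 0, 0] -- a histogram of word counts used as a feature vector
```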

Pretraining Large Language Models

These notes provide an overview of pre-training large language models like GPT and Llama. Let’s start by reviewing the pre-training procedure detailed in the GPT paper (Radford et al. 2020). The Generative in Generative Pre-Training reveals much about how the network can be trained without direct supervision. It is analogous to how you might have studied definitions as a kid: create some flash cards with the term on the front and the definition on the back, then, given the term, try to recite the definition. During pre-training, a language model is given a series of tokens and tasked with generating the next token in the sequence. Since we have access to the original documents, we can easily determine whether it was correct.
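A minimal sketch of this next-token objective, assuming PyTorch and using random tensors in place of a real model and corpus:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50257, 8
tokens = torch.randint(vocab_size, (1, seq_len))   # a tokenized document
logits = torch.randn(1, seq_len, vocab_size)       # stand-in for model(tokens)

# Predict token t+1 from positions up to t: shift inputs and targets.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..n-2
    tokens[:, 1:].reshape(-1),               # ground-truth next tokens 1..n-1
)
```

Because the targets come from the document itself, no human-provided labels are needed.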

Gradient Boosting

Notes from (Friedman 2001). Many machine learning methods are parameterized functions that are optimized using numerical optimization techniques, notably steepest descent. The initial learner is a stump; subsequent learners are commonly trees whose depth is a power of 2. Numerical optimization in function space follows the expected gradient \[ g_m(\mathbf{x}) = E_y\Big[\frac{\partial L(y, F(\mathbf{x}))}{\partial F(\mathbf{x})} \Big| \mathbf{x}\Big]_{F(\mathbf{x}) = F_{m-1}(\mathbf{x})} \] The optimal step size is found by solving the line search \[ \rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, F_{m-1}(\mathbf{x}_i) - \rho\, g_m(\mathbf{x}_i)\big) \]
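A sketch of this procedure for squared-error loss, where the negative gradient is simply the residual (the data, constant initial model, and hyperparameters are invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

F = np.full_like(y, y.mean())    # F_0: a constant initial model
learners, lr = [], 0.1
for m in range(100):
    residuals = y - F                                 # -dL/dF for squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    learners.append(tree)
    F += lr * tree.predict(X)                         # step in function space
```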

An Introduction to Hidden Markov Models for Gesture Recognition

Hidden Markov Models provide a way of modeling the dynamics of sequential information. They have been used for speech recognition, part-of-speech tagging, machine translation, handwriting recognition, and, as we will see in this article, gesture recognition.
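As a taste of the machinery, here is a minimal forward-algorithm sketch for computing the likelihood of an observation sequence under a 2-state HMM (all probabilities are invented for illustration):

```python
import numpy as np

A = np.array([[0.7, 0.3],    # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # P(observation | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

obs = [0, 1, 1, 0]           # an observed symbol sequence

alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]   # propagate states, weight by emission
print(alpha.sum())           # likelihood of the sequence under the model
```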

Bias and Variance

When fitting machine learning models to data, we want them to generalize well to the distribution we have sampled from. We can measure a model’s ability to generalize by evaluating it on previously unseen data sampled from the same distribution as the training set. However, we often do not know the true underlying distribution, so we must fit models to empirical distributions derived from the observed data.
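A minimal sketch of measuring generalization with a held-out split, assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)

# Fit on one empirical sample; evaluate on unseen data drawn from
# the same distribution as a proxy for generalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```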

Transformers for Computer Vision

The original Vision Transformer (ViT) was published by Google Brain with a simple objective: apply the Transformer architecture to images with as few modifications as necessary (Dosovitskiy et al. 2021). When trained on ImageNet, as was standard practice, the performance of ViT does not match models like ResNet. However, scaling the training data up to hundreds of millions of images yields a better-performing model.
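The main modification is turning an image into a sequence of flattened patches. A sketch assuming PyTorch, with illustrative sizes (a 224x224 image, 16x16 patches, embedding dimension 768):

```python
import torch

img = torch.randn(1, 3, 224, 224)                    # one RGB image
p, d = 16, 768
patches = img.unfold(2, p, p).unfold(3, p, p)        # (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * p * p)
embed = torch.nn.Linear(3 * p * p, d)                # linear patch projection
tokens = embed(patches)                              # (1, 196, 768) for the Transformer
```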

Sequential Minimal Optimization

Paper link: https://www.microsoft.com/en-us/research/publication/sequential-minimal-optimization-a-fast-algorithm-for-training-support-vector-machines/ Sequential Minimal Optimization (SMO) is an algorithm for solving the SVM Quadratic Programming (QP) problem efficiently (Platt, n.d.). Developed by John Platt at Microsoft Research, SMO deals with the constraints of the SVM objective by breaking it down into the smallest possible optimization problem at each step: updating just two Lagrange multipliers at a time.
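A sketch of that core step, the analytic update of one pair of multipliers, following the update rules in Platt's paper (the variable names and standalone-function framing here are mine):

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C):
    # Curvature of the objective along the constraint line.
    eta = k11 + k22 - 2 * k12
    if eta <= 0:
        return a1, a2                       # degenerate pair: skip it
    # Unconstrained optimum for a2, then clip to the box [L, H].
    a2_new = a2 + y2 * (E1 - E2) / eta
    if y1 == y2:
        L, H = max(0, a1 + a2 - C), min(C, a1 + a2)
    else:
        L, H = max(0, a2 - a1), min(C, C + a2 - a1)
    a2_new = min(max(a2_new, L), H)
    # a1 moves so the equality constraint sum_i y_i * a_i = 0 still holds.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```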

Instance Segmentation

Mask R-CNN (He et al. 2018) adapts Faster R-CNN (Ren et al. 2017) to include a branch for instance segmentation. This branch predicts a binary mask for each RoI, and the training loss is updated to include a mask term.
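A sketch of that extra loss term, assuming PyTorch and stand-in tensors: a per-pixel binary cross-entropy computed only on the mask predicted for each RoI's ground-truth class.

```python
import torch
import torch.nn.functional as F

num_rois, num_classes, m = 4, 80, 28
mask_logits = torch.randn(num_rois, num_classes, m, m)  # mask branch output
gt_classes = torch.randint(num_classes, (num_rois,))
gt_masks = torch.randint(2, (num_rois, m, m)).float()   # binary targets

# Only the mask for the RoI's own class contributes to the loss.
per_class = mask_logits[torch.arange(num_rois), gt_classes]
loss_mask = F.binary_cross_entropy_with_logits(per_class, gt_masks)
```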

Object Detection

Papers: https://awesomeopensource.com/projects/object-detection Object detection algorithms are evaluated using the mean Average Precision (mAP) across all classes in the dataset. Precision and recall are computed from the predictions and the ground truth. Both a sample's true class and the model's prediction are either positive or negative: either it belongs to a class or it does not. The table below summarizes the possible outcomes between the model's prediction and the true underlying class.

                     Predicted positive    Predicted negative
 Actually positive   True Positive (TP)    False Negative (FN)
 Actually negative   False Positive (FP)   True Negative (TN)
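A minimal sketch of the precision and recall computations (the counts are invented; in detection, a prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5):

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of predictions that are correct
    recall = tp / (tp + fn)      # fraction of ground-truth objects found
    return precision, recall

print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, 0.666...)
```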

Boosting

Combining predictions from multiple sources is usually preferable to relying on a single source. For example, a medical diagnosis would carry much more weight if it were the result of a consensus of several experts. This idea of prediction by consensus is a powerful way to improve classification and regression models. In fact, a committee of models can achieve good performance even if each individual model is conceptually very simple.
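A minimal sketch of such a committee, assuming scikit-learn (1.2+ for the `estimator` keyword) and a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each committee member is a depth-1 stump: conceptually very simple.
stump = DecisionTreeClassifier(max_depth=1)
ensemble = AdaBoostClassifier(estimator=stump, n_estimators=100).fit(X, y)
print("single stump:", stump.fit(X, y).score(X, y))
print("committee:   ", ensemble.score(X, y))
```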

Decision Trees

Resources: https://www.kaggle.com/dmilla/introduction-to-decision-trees-titanic-dataset A decision tree, also known as Classification and Regression Trees (CART), is a model that recursively partitions the input space based on a collection of features. Each partition is split by a very simple binary choice: if yes, branch to the left; if no, branch to the right.
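A minimal sketch with scikit-learn, growing a shallow tree on the Iris dataset and printing its binary splits:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# Each internal node is a yes/no test on a single feature.
print(export_text(tree, feature_names=iris.feature_names))
```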