LLMs

Low Rank Adaptation

Key Concepts

Traditional Fine-Tuning

Fine-tuning a model for a specific task can be expensive if the entire weight matrix is updated. LLMs range from billions to trillions of parameters, making full fine-tuning infeasible for many applications.

Low Rank Decomposition

Low Rank Adaptation (LoRA) is a method of decomposing the weight update matrix \(\Delta W\) into smaller matrices \(A\) and \(B\) such that \(\Delta W \approx AB\). The rank \(r\) of the decomposition is a hyperparameter that can be tuned to balance performance and computational cost (Hu et al. 2021).
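To make the decomposition concrete, here is a minimal NumPy sketch of the idea. The layer size, rank, and variable names are illustrative assumptions, not taken from the post; it only shows how a rank-\(r\) factorization replaces a full weight update.

```python
import numpy as np

# A minimal sketch of LoRA: instead of learning a full update Delta W for a
# (d_out x d_in) weight matrix, learn two small factors A (d_out x r) and
# B (r x d_in) with rank r much smaller than d_out and d_in.
d_out, d_in, r = 768, 768, 8   # hypothetical layer size and rank

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))           # frozen pretrained weights
A = rng.normal(scale=0.01, size=(d_out, r))  # trainable low-rank factor
B = np.zeros((r, d_in))                      # common choice: start at zero so Delta W begins at 0

delta_W = A @ B          # rank-r approximation of the weight update
W_adapted = W + delta_W  # effective weights used for the adapted model

# Parameter savings: full update vs. low-rank factors
full_params = d_out * d_in
lora_params = r * (d_out + d_in)
print(f"full update: {full_params:,} params, LoRA (r={r}): {lora_params:,} params")
```

Raising \(r\) lets \(AB\) capture a richer update at the cost of more trainable parameters, which is the performance/cost trade-off the post describes.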

The Language of LLMs

How do LLMs read and process the high dimensional landscape of text efficiently? Presented as a workshop at UTA's Datathon on April 13, 2024.

Transformers

Table of Contents: Introduction, Definition, Attention, Key-value Store, Scaled Dot Product Attention, Multi-Head Attention, Encoder-Decoder Architecture, Encoder, Decoder, Usage, Resources

Introduction

The story of Transformers begins with “Attention Is All You Need” (Vaswani et al., n.d.). In this seminal work, the authors describe the landscape of sequential models at the time, their shortcomings, and the novel ideas that led to the Transformer's successful application.
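Since the contents point to scaled dot-product attention, here is a minimal NumPy sketch of that mechanism from the cited paper, \( \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^\top / \sqrt{d_k}\right) V \). The shapes and function name are illustrative assumptions, not the post's own code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over the keys
    return weights @ V                               # weighted sum of the values

# Hypothetical toy shapes: 4 query tokens, 6 key/value tokens, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Multi-head attention, covered later in the post, runs several such attention computations in parallel on learned projections of \(Q\), \(K\), and \(V\) and concatenates the results.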