Computer Science

Distributed Databases

Table of Contents Overview Data Fragmentation Data Replication Data Concurrency Distributed systems excel at partitioning large problems into smaller chunks that can be processed in parallel. This requires parallel thinking instead of serial thinking. Many algorithms and solutions that run serially may be easier to adapt to parallel applications than others. Distributed solutions are the natural next step to scaling up a system. In the context of databases, the main challenges related to distribution, replication, distributed transactions, distributed metadata management, and distributed query processing.

NOSQL

Table of Contents NOSQL Characteristics for Distributed Systems NOSQL Data Models CAP Theorem Document-Based NOSQL Systems Key-Value NOSQL Systems Column-Based NOSQL Systems Graph-Based NOSQL Systems NOSQL refers to Not Only SQL. A NOSQL system is commonly a distributed one that focuses on semi-structured data storage, high performance, availability, replication and scalability. These type of systems developed to meet the needs of large-scale internet applications where a traditional SQL database could not.

Structured Query Language

Table of Contents History and Development Schemas Data Types Creation Constraints Retrieving Data Modifying Data Nested Queries Joined Tables Aggregate Functions Grouping WITH Clause Modifying Tables Summary History and Development Structured Query Language (SQL) is a database language for managing data in a relation DBMS. Its original inception was based on a paper by Edgar F. Codd in 1970 titled A Relational Model of Data for Large Shared Data Banks (Codd 1970). Two employees working at IBM in the 1970s, Donald D. Chamberlin and Raymond F. Boyce, developed the first version of SQL in 1974 (Chamberlin and Boyce 1974).

Introduction to Databases

Table of Contents An Online RPG Database From Schema to Database Database Management Systems Creating our RPG Database Recommended Reading: Chapters 1 and 2 from (Elmasri and Navathe 2015) Databases allow us to store, retrieve, and edit different types of data. They should be scalable, secure, and reliable. They should also be able to handle concurrent access and be able to recover from failures. There are multiple types of databases that are optimized for different use cases. Tabular data, for example, is typically stored in a relational database. Large format data such as images, videos, and audio are typically stored in a non-relational database.

Minimum Spanning Trees

Table of Contents Definition Finding the Minimum Spanning Tree Kruskal’s Algorithm Prim’s Algorithm Minimum spanning trees are undirected graphs that connect all of the vertices such that there are no redundant edges and the total weight is minimized. They are useful for finding the shortest path between two points in a graph. Useful application of MSTs include network design: it is useful to know the least expensive path with respect to either latency or resource cost for telecommunications networks, transportation networks, or electrical grids. approximation algorithms: MSTs can be used to approximate the solution to the traveling salesman problem. clustering: MSTs can be used to cluster data points in a graph. image segmentation: MSTs can be used to segment images into smaller regions. Definition Let \(G\) be a connected, undirected graph with edges \(E\), vertices \(V\), and edge weights \(w\). A minimum spanning tree is a subset \(T \subseteq E\) that connects all of the vertices such that the total weight is minimized. The original graph \(G\) is shown below.

Single-Source Shortest Paths

Table of Contents Definition Bellman-Ford Shortest Paths on a DAG Dijkstra’s Algorithm When you hear the term shortest path, you may think of the shortest physical distance between your current location and wherever it is you’re going. Finding the most optimal route via GPS is one of the most widely used mobile applications. Physical paths are not the only types we may wish to find a shortest path for. Other examples include:

Introduction to Graph Theory

Table of Contents What are Graphs? Graph Traversal Algorithms Breadth First Search Depth First Search What are Graphs? A graph is a data structure that is used to represent pairwise relationships between objects. Graphs are used in many applications, such as social networks, maps, and routing algorithms. These notes accompany the series of lectures on graphs for my Foundations of Computing course at the University of Texas - Arlington.

Red-Black Trees

Table of Contents Definition Operations Exercises Red-Black Trees are modified Binary Search Trees that maintain a balanced structure in order to guarantee that operations like search, insert, and delete run in \(O(\log n)\) time. Definition A red-black tree is a binary search tree with the following properties: Every node is either red or black. The root is black. Every NULL leaf is black. If a node is red, then both its children are black. For each node, all simple paths from the node to descendant leaves contain the same number of black nodes. Figure 1: Red-Black Tree from CLRS Chapter 13.

Binary Search Trees

Table of Contents Binary Search Trees Operations Analysis A $n$-ary tree is a graph-based data structure in which each node has up to \(n\) subnodes. It is supported by the following operations (not exclusive): Search Insert Delete Tree-based data structures are defined by the following properties. The size of a tree \(T\) is determined by the total number of nodes in \(T\). The root of a tree \(T\) is the starting point of \(T\). A leaf node of a tree \(T\) is a node that has no sub-nodes. The height of a tree is determined by the length of the shortest path between the root of \(T\) and the lowest leaf node of \(T\). If we limit the number of subnodes each node may have to 2, the structure becomes known as a binary tree. Limiting the structure in this way is of interest to us because of the efficiency benefits seen in operations applied to binary trees. If we narrow the scope of these trees further, we can define a binary search tree whose search operation, as the name might suggest, runs in \(\Theta(lg n)\) worst-case time.

Introduction to Data Structures

Table of Contents Introduction to Data Structures Review: Pointers Arrays Matrices Multi-Dimensional Arrays Stacks Queues Introduction to Data Structures Data structures are fundamental concepts in computer science that allow us to organize and store data in a way that enables efficient access and modification. They are essential building blocks for creating efficient and sophisticated computer programs and databases. Different types of data structures include arrays, linked lists, stacks, queues, trees, graphs, and many more, each serving a specific purpose and suited to specific applications.

Linked Lists

Table of Contents Singly-Linked Lists Doubly-Linked Lists Operations Exercises A linked list is a dynamic and aggregate data structure made up of a collection of nodes. The nodes of a linked list can store any data type and are not enforced to contain the same data type. A basic node structure may be defined as struct node { void *data; struct node *next; }; Singly-Linked Lists A singly-linked list, the most basic form, has a reference to the first node in the list, called the head, and a single link between each node.

Complexity Analysis

Table of Contents The notation of complexity analysis Formal Definition of Asymptotic Notation Common Functions The notation of complexity analysis $O$-notation $O$-notation, often referred to as “Big Oh” notation, describes an upper bound on the behavior of a function. It really means that the function will not grow faster than the a given rate. This rate is typically the highest-order term in the function, and is often referred to as the “dominant term” or “dominant function”.