Table of Contents: What is MapReduce?; Hadoop Distributed File System (HDFS); MapReduce Overview; Hadoop v2 AKA YARN; Summary

These are my personal notes from the book Fundamentals of Database Systems (Elmasri and Navathe 2015). I highly recommend reading the original source material. The contents of this article should only serve as a brief overview of the topic.
What is MapReduce?

MapReduce is a programming model for processing large datasets in parallel. It was originally developed by Jeffrey Dean and Sanjay Ghemawat at Google in 2004 (Dean and Ghemawat 2008). It draws on the functional programming paradigm and is inspired by the map and reduce functions found in Lisp and other functional languages. The Hadoop framework provides an open-source implementation of the MapReduce programming model.
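To make the map and reduce idea concrete, here is a minimal single-process sketch of a word count in Python. It is not Hadoop code: the function names map_phase, shuffle, and reduce_phase are hypothetical, and the grouping step only imitates what a real framework would do across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (a stand-in for the framework's shuffle step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


# Hypothetical input: each string plays the role of one input split.
documents = ["the quick brown fox", "the lazy dog", "The fox"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one key, both phases could in principle run on separate workers, which is the property the model relies on.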
Table of Contents: Overview; Data Fragmentation; Data Replication; Data Concurrency

Distributed systems excel at partitioning large problems into smaller chunks that can be processed in parallel. This requires parallel thinking instead of serial thinking. Some algorithms and solutions that run serially are easier to adapt to parallel execution than others.
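As a small illustration of partitioning a problem into chunks, the sketch below splits a list into pieces, sums each piece on a worker, and then combines the partial results. The helper names chunk and parallel_sum are hypothetical, and a thread pool on one machine is only an assumed stand-in for the nodes of a distributed system (in CPython, threads will not speed up CPU-bound work like this).

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, size):
    """Split data into consecutive pieces of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, chunk_size=4, workers=4):
    """Sum each chunk on a worker, then combine the partial sums."""
    parts = chunk(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)


numbers = list(range(1, 101))
total = parallel_sum(numbers, chunk_size=25)
print(total)
```

Summation adapts easily because the partial results can be combined in any order. An algorithm in which each step depends on the previous one would not split this cleanly.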
Distributed solutions are the natural next step in scaling a system. In the context of databases, the main challenges relate to distribution, replication, distributed transactions, distributed metadata management, and distributed query processing.
Table of Contents: NOSQL Characteristics for Distributed Systems; NOSQL Data Models; CAP Theorem; Document-Based NOSQL Systems; Key-Value NOSQL Systems; Column-Based NOSQL Systems; Graph-Based NOSQL Systems

NOSQL stands for Not Only SQL. A NOSQL system is commonly a distributed one that focuses on semi-structured data storage, high performance, availability, replication, and scalability. These types of systems were developed to meet the needs of large-scale internet applications that traditional SQL databases could not.
Table of Contents: History and Development; Schemas; Data Types; Creation; Constraints; Retrieving Data; Modifying Data; Nested Queries; Joined Tables; Aggregate Functions; Grouping; WITH Clause; Modifying Tables; Summary

History and Development

Structured Query Language (SQL) is a database language for managing data in a relational DBMS. It grew out of the relational model that Edgar F. Codd proposed in his 1970 paper A Relational Model of Data for Large Shared Data Banks (Codd 1970). Two IBM employees, Donald D. Chamberlin and Raymond F. Boyce, developed the first version of SQL in 1974 (Chamberlin and Boyce 1974).
Table of Contents: An Online RPG Database; From Schema to Database; Database Management Systems; Creating our RPG Database

Recommended Reading: Chapters 1 and 2 of (Elmasri and Navathe 2015)
Databases allow us to store, retrieve, and edit different types of data. They should be scalable, secure, and reliable. They should also handle concurrent access and recover from failures. There are multiple types of databases, each optimized for different use cases. Tabular data, for example, is typically stored in a relational database. Large-format data such as images, videos, and audio is typically stored in a non-relational database.