Tuesday, April 5, 2016

NoSQL categories and DynamoDB example

NoSQL has 4 categories:
Key-Value stores (AWS DynamoDB)
Document stores (MongoDB)
Graph stores (Nodes keep the data)
Wide Column / Column Family stores
In a Key-Value store such as DynamoDB, tables have rows, and each row has a key and a value.
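
As a rough illustration of the "rows have key and value" idea, here is a minimal in-memory sketch. The class `KeyValueTable`, its methods, and the sample attributes are hypothetical names made up for this note; this is not the DynamoDB API, only a plain Python dictionary standing in for a table.

```python
# Hypothetical in-memory stand-in for a key-value table (not the DynamoDB API).
class KeyValueTable:
    def __init__(self, key_name):
        self.key_name = key_name  # name of the attribute used as the key
        self.rows = {}            # key -> value (the rest of the row)

    def put(self, item):
        # The key attribute is the only one every row must have.
        key = item[self.key_name]
        self.rows[key] = dict(item)

    def get(self, key):
        # Return a copy of the row, or None if the key is unknown.
        row = self.rows.get(key)
        return dict(row) if row is not None else None


users = KeyValueTable(key_name="user_id")
users.put({"user_id": "u1", "name": "Ann"})
# Rows in the same table may carry different attributes.
users.put({"user_id": "u2", "name": "Bob", "email": "bob@example.com"})
```

Reading a row back is a single lookup by key, for example `users.get("u2")`.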


Querying in Key-Value store
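
A small sketch of the two basic ways to read from a key-value store: a direct lookup by key, and a scan that checks every row against a condition. The table contents and the function names `get_by_key` and `scan` are assumptions for illustration, not calls from any AWS SDK.

```python
# Hypothetical data: key -> value (a dict of attributes).
table = {
    "u1": {"name": "Ann", "city": "Oslo"},
    "u2": {"name": "Bob", "city": "Riga"},
    "u3": {"name": "Cy", "city": "Oslo"},
}


def get_by_key(table, key):
    # Direct lookup: the store goes straight to the row for this key.
    return table.get(key)


def scan(table, predicate):
    # Scan: every row is inspected, so the cost grows with table size.
    return {k: v for k, v in table.items() if predicate(v)}


bob = get_by_key(table, "u2")
in_oslo = scan(table, lambda v: v["city"] == "Oslo")
```

The lookup touches one row, while the scan has to read all of them, which is why key design matters so much in this kind of store.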








CAP theorem - Consistency, Availability, Partition Tolerance

CAP stands for:
Consistency, Availability, Partition Tolerance

SQL
prioritizes Consistency first and then Partition Tolerance

NoSQL
prioritizes Partition Tolerance first and then Availability
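
To make the trade-off more concrete, here is a toy sketch of two replicas during a network partition. The `Replica` class, the `write` function, and the `prefer_consistency` flag are hypothetical and heavily simplified; real databases handle this with far more machinery.

```python
class Replica:
    def __init__(self):
        self.value = None


def write(replicas, value, partitioned, prefer_consistency):
    """Try to write to both replicas; return True if the write is accepted."""
    if partitioned:
        if prefer_consistency:
            # Refuse the write: replicas stay in sync, but the system
            # is not available for writes during the partition.
            return False
        # Accept the write on the reachable replica only: the system
        # stays available, but the replicas now disagree.
        replicas[0].value = value
        return True
    for replica in replicas:
        replica.value = value
    return True


consistent = [Replica(), Replica()]
accepted_c = write(consistent, "v1", partitioned=True, prefer_consistency=True)

available = [Replica(), Replica()]
accepted_a = write(available, "v1", partitioned=True, prefer_consistency=False)
```

In the first case the write is rejected and both replicas still agree; in the second it is accepted and the replicas diverge until the partition heals.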



AWS - Big Data stack components overview

AWS Big Data stack components overview

  • Elastic MapReduce (EMR)
  • Redshift
  • DynamoDB
  • Data Pipeline (ETL tool)
  • Simple Storage Service (S3)
  • Jaspersoft AWS
  • Kinesis (streaming data)
Elastic MapReduce (MapReduce - processing algorithm)
Amazon implementation of Hadoop
Hadoop-on-Demand
Integrated with S3 (Simple Storage Service)
Amazon distro or MapR
MapR - Unlike other Hadoop distributions that require separate clusters for multiple applications, the MapR Platform is built to process both distributed files, database tables, and event streams in one unified layer – an engineering feat in its own right. This enables organizations to support both operational (e.g., HBase) and analytic apps (e.g., Apache Drill, Hive, or Impala) on one cluster, significantly reducing costs as you grow your big data deployment. https://www.mapr.com/why-hadoop/why-mapr
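
The MapReduce processing model mentioned above can be sketched on a single machine as a word count. This is only an illustration of the map, shuffle, and reduce steps in plain Python; the function names and sample lines are made up, and it does not use Hadoop or EMR.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


lines = ["big data on AWS", "big clusters process data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce calls run in parallel on many nodes, and the shuffle moves data between them over the network.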
Redshift 
Cloud-based, Massively Parallel Processing (MPP), column store data warehouse.
Uses common relational, SQL technology.
Integrated with S3 and DynamoDB
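
A tiny sketch of what "column store" means, assuming a made-up orders table: the same records are kept column by column instead of row by row, so an aggregate over one column only has to read that column. This illustrates the layout idea only and says nothing about how Redshift is implemented internally.

```python
# Row-oriented layout: one record per row (hypothetical sample data).
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 30.0},
    {"order_id": 3, "region": "EU", "amount": 15.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate such as SUM(amount) reads a single column.
total_amount = sum(columns["amount"])
```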

DynamoDB
Based on Dynamo, Amazon's internal, seminal Key-Value store
Accommodates unstructured data - no schema needs to be declared
Successor to Amazon SimpleDB

    Data Pipeline (ETL tool - Extract, Transform, Load)
    A workflow system for shaping data and moving it from table to table and from DB to DB
    Serves as an integration tool for the AWS Big Data stack (moves data between components)
    Build pipelines graphically (web console) or programmatically (scripts)
    Works on a scheduled, batch basis
    Integrates with RDS/MySQL (Relational Database Service from Amazon - SQL distributed solution)
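
The Extract, Transform, Load idea can be sketched as three small functions. Everything here is hypothetical (the CSV text, the function names, the `warehouse` list standing in for a target table); it shows the shape of one batch run, not the Data Pipeline service itself.

```python
import csv
import io

# Hypothetical source data; the second record has a bad amount.
SOURCE_CSV = "id,amount\n1,10.5\n2,abc\n3,4.5\n"


def extract(text):
    # Extract: read raw records from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: convert types and drop records that fail to parse.
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            continue
    return clean


def load(rows, target):
    # Load: append the cleaned records to the target store.
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for a target table
loaded = load(transform(extract(SOURCE_CSV)), warehouse)
```

A scheduler would run this chain once per batch window, for example nightly.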

    Important Acronyms
    AWS Amazon Web Services
    EC2 Elastic Compute Cloud
    AMI Amazon Machine Image
    S3 Simple Storage Service
    EMR Elastic MapReduce
    VPC Virtual Private Cloud
    IAM Identity and Access Management
    SSH Secure Shell

    Getting Set Up with AWS
    Create an account
    Create a Key pair
    Create an S3 bucket
    Install SSH client
    Install S3 client
    Install SQL Workbench, drivers