PANIOV: AWS - Big Data stack components overview

Tuesday, April 5, 2016

AWS Big Data stack components overview

Elastic MapReduce (EMR)
Redshift
DynamoDB
Data Pipeline (ETL tool)
Simple Storage Service (S3)
Jaspersoft AWS
Kinesis (streaming data)

Elastic MapReduce (MapReduce - processing algorithm)

Amazon implementation of Hadoop

Hadoop-on-Demand

Integrated with S3 (Simple Storage Service)

Choice of Amazon's Hadoop distribution or MapR

MapR - Unlike other Hadoop distributions that require separate clusters for multiple applications, the MapR Platform is built to process distributed files, database tables, and event streams in one unified layer. This enables organizations to support both operational apps (e.g., HBase) and analytic apps (e.g., Apache Drill, Hive, or Impala) on one cluster, which MapR says significantly reduces costs as a big data deployment grows. https://www.mapr.com/why-hadoop/why-mapr
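
To make the "MapReduce - processing algorithm" note above concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the job. It only illustrates the programming model; it is not EMR or Hadoop code, and the function names are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data on AWS", "big clusters on demand"]))
```

On a real cluster the map and reduce steps run in parallel on many nodes, and the framework performs the shuffle between them.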

Redshift
Cloud-based, Massively Parallel Processing (MPP), column store data warehouse.
Uses common relational, SQL technology.
Integrated with S3 and DynamoDB
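
The S3 integration is usually exercised through Redshift's COPY command, which bulk-loads files from a bucket into a table. The sketch below only builds the SQL text. The table name, bucket, and IAM role ARN are placeholders invented for this example, and the helper assumes trusted input (it does no SQL escaping).

```python
def build_copy_statement(table, s3_uri, iam_role_arn):
    """Return a Redshift COPY statement that loads CSV files from S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own table, bucket, and role.
    print(build_copy_statement(
        "sales",
        "s3://my-example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

The resulting statement can be run from any SQL client connected to the cluster.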

DynamoDB
Based on Dynamo, Amazon's internal, seminal Key-Value store
Accommodates unstructured data - no schema needs to be declared beyond the table's primary key
Successor to Amazon SimpleDB
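
As a rough illustration of the key-value, schemaless model, the toy in-memory table below fixes only the key attribute and lets every item carry its own set of other attributes. This is not the DynamoDB API; the class and method names are invented for the sketch.

```python
class KeyValueTable:
    """Toy in-memory table: only the key attribute is required on each item."""

    def __init__(self, key_name):
        self.key_name = key_name
        self._items = {}

    def put_item(self, item):
        """Store an item under its key, replacing any previous item with that key."""
        if self.key_name not in item:
            raise KeyError(f"item is missing key attribute {self.key_name!r}")
        self._items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        """Return the item stored under key, or None if there is none."""
        return self._items.get(key)


if __name__ == "__main__":
    users = KeyValueTable("user_id")
    # The two items share only the key attribute; the rest differs freely.
    users.put_item({"user_id": "u1", "name": "Ann", "email": "ann@example.com"})
    users.put_item({"user_id": "u2", "last_login": "2016-04-05", "tags": ["beta"]})
    print(users.get_item("u2"))
```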

Data Pipeline (ETL tool - Extract, Transform, Load)

A workflow system for shaping data and moving it from table to table or from database to database

Serves as an integration tool for the AWS Big Data stack (moves data between components)

Build pipelines graphically (WEB) or programmatically (scripts)

Works on a scheduled, batch basis

Integrates with RDS/MySQL (RDS - Relational Database Service, Amazon's managed relational database offering)
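
The extract, transform, load pattern behind a batch pipeline can be sketched in a few lines. The example below is generic Python over in-memory lists, not the Data Pipeline API or its definition format; the row fields (customer, amount) are assumptions made up for the illustration.

```python
def extract(source_rows):
    """Extract: read raw rows from the source (here, a list of dicts)."""
    return list(source_rows)


def transform(rows):
    """Transform: drop rows without an amount and normalise names and types."""
    return [
        {"customer": row["customer"].strip().title(), "amount": float(row["amount"])}
        for row in rows
        if row.get("amount") not in (None, "")
    ]


def load(rows, target_table):
    """Load: append the shaped rows to the target table and report the count."""
    target_table.extend(rows)
    return len(rows)


def run_batch(source_rows, target_table):
    """One batch run: extract, then transform, then load."""
    return load(transform(extract(source_rows)), target_table)


if __name__ == "__main__":
    source = [
        {"customer": " alice ", "amount": "10.5"},
        {"customer": "bob", "amount": ""},
        {"customer": "carol", "amount": "3"},
    ]
    target = []
    print(run_batch(source, target), target)
```

In a scheduled setup, a run like run_batch would be triggered once per period rather than called by hand.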

Important Acronyms

AWS Amazon Web Services

EC2 Elastic Compute Cloud

AMI Amazon Machine Image

S3 Simple Storage Service

EMR Elastic MapReduce

VPC Virtual Private Cloud

IAM Identity and Access Management

SSH Secure Shell

Getting Set Up with AWS

Create an account

Create a Key pair

Create an S3 bucket

Install SSH client

Install S3 client

Install SQL Workbench and database drivers
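
Once the key pair and SSH client are in place, logging in to a node comes down to one ssh command that points at the private key file. The helper below is a small sketch that assembles that command. The key file name and host are placeholders, and the default user "hadoop" is an assumption that fits an EMR master node (other EC2 images use different login names, such as ec2-user on Amazon Linux).

```python
import subprocess


def build_ssh_command(key_path, host, user="hadoop"):
    """Return the argv list for an SSH login that uses a key pair's private key."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def connect(key_path, host, user="hadoop"):
    """Start an interactive SSH session; returns the ssh exit code."""
    return subprocess.call(build_ssh_command(key_path, host, user))


if __name__ == "__main__":
    # Placeholder values; use your own key file and the node's public DNS name.
    print(" ".join(build_ssh_command("my-key-pair.pem", "example-master-node")))
```

SSH clients typically refuse a private key file that other users can read, so restrict its permissions (for example, chmod 400) before connecting.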
