Introduction to Big Data Essentials
This module covers the fundamental concepts of big data, including the Business Imperative, CAP Theorem, Big Data Lambda Architecture, Batch Layer, Speed Layer, and Serving Layer. It delves into the importance of big data, the architecture behind handling large datasets, and the tools and services such as Windows Azure HDInsight utilized in this domain.
Presentation Transcript
Microsoft Big Data Essentials, Module 1 - Introduction to Big Data. Saptak Sen, Microsoft; Bill Ramos, Advaiya
Agenda: Why Big Data?; Big Data Lambda Architecture; Getting started with Windows Azure HDInsight Service
The Business Imperative: 1. Human Fault Tolerance; 2. Minimize CapEx; 3. Hyper Scale on Demand; 4. Low Learning Curve
CAP Theorem: C = Consistency; A = Availability; P = Partition Tolerance
Big Data Lambda Architecture
Big Data Lambda Architecture. Batch layer: stores the master dataset; computes arbitrary views. Speed layer: fast, incremental algorithms; the batch layer eventually overrides the speed layer. Serving layer: random access to batch views; updated by the batch layer.
The Batch Layer: stores the master dataset (in append mode); unrestrained computation; horizontally scalable; high latency.
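The batch layer properties above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the page-view events, the `append_event` helper, and the `compute_batch_view` function are hypothetical names chosen for illustration, and an in-memory list stands in for a distributed store.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of immutable page-view events.
master_dataset = []

def append_event(event):
    """Append a new event; existing records are never updated or deleted."""
    master_dataset.append(event)

def compute_batch_view(events):
    """Recompute the view from scratch over the whole dataset.

    Scanning everything is what makes the batch layer high latency,
    but it also means any function of the data can be computed.
    """
    return Counter(e["url"] for e in events)

append_event({"url": "/home", "ts": 1})
append_event({"url": "/home", "ts": 2})
append_event({"url": "/about", "ts": 3})

batch_view = compute_batch_view(master_dataset)
print(batch_view["/home"], batch_view["/about"])
```

Because the view is always rebuilt from the raw events, a buggy computation can be corrected by fixing the function and rerunning it, which is one way to read the "human fault tolerance" goal listed earlier.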
The Speed Layer: stream processing of data; stores a limited window of data; dynamic computation.
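As a rough illustration of the speed layer, the sketch below keeps a real-time count per URL over a limited time window and updates it incrementally as each event arrives. The `SpeedLayer` class, its method names, and the 60-second window are assumptions made for this example, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque

class SpeedLayer:
    """Hypothetical incremental counter over a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()          # (timestamp, url) pairs still in the window
        self.realtime_view = Counter()  # counts for events inside the window

    def on_event(self, ts, url):
        """Update the view incrementally instead of recomputing it."""
        self.events.append((ts, url))
        self.realtime_view[url] += 1
        self._expire(ts)

    def _expire(self, now):
        """Drop events that have fallen out of the window."""
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_url = self.events.popleft()
            self.realtime_view[old_url] -= 1
            if self.realtime_view[old_url] == 0:
                del self.realtime_view[old_url]

speed = SpeedLayer(window_seconds=60)
speed.on_event(0, "/home")
speed.on_event(30, "/home")
speed.on_event(70, "/about")  # the event at ts=0 is now outside the window
print(dict(speed.realtime_view))
```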
The Serving Layer: queries the batch and real-time views; merges the results.
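The merge step can be sketched in a few lines of Python. This is an illustrative assumption rather than the presenters' implementation: both views are plain dictionaries of counts, the `query` function is a hypothetical name, and the real-time view is assumed to cover only events that the batch view has not yet absorbed, so the two counts can simply be added.

```python
def query(url, batch_view, realtime_view):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)

# Hypothetical precomputed views.
batch_view = {"/home": 120, "/about": 15}
realtime_view = {"/home": 3, "/pricing": 1}

total_home = query("/home", batch_view, realtime_view)
total_pricing = query("/pricing", batch_view, realtime_view)
print(total_home, total_pricing)
```

For views that are not simple counts (for example, unique visitors), the merge function would need to be something other than addition.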
Microsoft Lambda Architecture Support: Federations in Windows Azure SQL Database; Azure tables; Memcached/MongoDB; SQL Server database engine; SQL Server VM (columnstore indexes); Azure Storage Explorer; Microsoft Excel; Power Query; PowerPivot; Power View; Power Map; Reporting Services; LINQ to Hive; Analysis Services; Windows Azure HDInsight; Azure Blob storage; MapReduce, Hive, Pig, Oozie, SSIS; StreamInsight
Yahoo!: Staging Database; SQL Server Analysis Services (SSAS); Apache Hadoop; Microsoft Excel and PowerPivot; Other BI Tools and Custom Applications; SQL Server Connector (Hadoop Hive ODBC); SQL Server Analysis Services (SSAS Cube); Hadoop Data; Third Party Database + Custom Applications
Ferranti Computer Systems: Data Feed from Smart Meters; Reactive Extensions (Rx); SQL Server Database (In-Memory OLTP); Microsoft Dynamics AX; SQL Server Analysis Services; SQL Server Reporting Services; Windows Azure HDInsight
Windows Azure Storage
Demo 1: Setting up the Windows Azure storage account (Azure Blob storage, Azure Storage Explorer)
Blob Storage Concepts: store large amounts of unstructured text or binary data with the fastest read performance; highly scalable, durable, and available file system; blobs can be exposed publicly over HTTP; permissions to blobs can be securely locked down.
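To make the "exposed publicly over HTTP" point concrete, the sketch below builds the address of a blob from the storage account, container, and blob name, following the `<account>.blob.core.windows.net/<container>/<blob>` layout used by Azure Blob storage. The `blob_url` helper and the account, container, and blob names are hypothetical; whether the URL is actually readable depends on the container's access settings.

```python
from urllib.parse import quote

def blob_url(account, container, blob_name, scheme="https"):
    """Build the HTTP(S) address of a blob; names here are placeholders."""
    # quote() percent-encodes characters such as spaces but keeps "/" intact,
    # so virtual folder paths inside the blob name survive.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"

url = blob_url("mystorageaccount", "public-data", "logs/2013/sample log.txt")
print(url)
```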
Getting started with HDInsight Service
Demo 2: Setting up the Windows Azure HDInsight cluster (Windows Azure HDInsight, Azure Blob storage, HDInsight Console)
Demo 3: Loading data into Windows Azure storage for use with HDInsight (Windows Azure HDInsight, Azure Blob storage, HDInsight Console)
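Once data is in Blob storage, Hadoop jobs on HDInsight typically refer to it with a `wasb://` URI of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`. The demo itself is not reproduced in this transcript, so the following is only a sketch of how such a URI is put together; the `wasb_uri` helper and the account, container, and path values are hypothetical.

```python
def wasb_uri(account, container, path, secure=False):
    """Build a wasb:// (or wasbs:// for SSL) URI for a blob path.

    All argument values used below are placeholders, not names from the demo.
    """
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

uri = wasb_uri("mystorageaccount", "mycontainer", "/example/data/input.txt")
print(uri)
```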
Easy Access to Data, Big & Small
Easy Access to Data, Big & Small: Key Features. Search, Access & Shape: simplify access to public & corporate data (Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with PolyBase); easily preview, shape, & format your data. Combine with Unstructured: combine and refine data across multiple sources; gain insight across relational, unstructured, & semi-structured data. Easily Manage & Query: common management of structured & unstructured data; query across relational DB & Hadoop with a single T-SQL query.
Learn more. Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx