Introduction to Big Data Essentials


This module covers the fundamental concepts of big data, including the Business Imperative, CAP Theorem, Big Data Lambda Architecture, Batch Layer, Speed Layer, and Serving Layer. It delves into the importance of big data, the architecture behind handling large datasets, and the tools and services such as Windows Azure HDInsight utilized in this domain.

  • Big Data
  • Lambda Architecture
  • Windows Azure
  • HDInsight
  • CAP Theorem

Uploaded on Mar 10, 2025 | 0 Views



Presentation Transcript


  1. Microsoft Big Data Essentials, Module 1: Introduction to Big Data. Saptak Sen (Microsoft), Bill Ramos (Advaiya)

  2. Agenda: Why Big Data?; Big Data Lambda Architecture; Getting started with Windows Azure HDInsight Service

  3. The Business Imperative: (1) Human Fault Tolerance; (2) Minimize CapEx; (3) Hyper Scale on Demand; (4) Low Learning Curve

  4. CAP Theorem: Consistency (C), Availability (A), Partition Tolerance (P)

  5. Big Data Lambda Architecture

  6. Big Data Lambda Architecture. Batch layer: stores master dataset; computes arbitrary views. Speed layer: fast, incremental algorithms; batch layer eventually overrides speed layer. Serving layer: random access to batch views; updated by batch layer.

  7. The Batch Layer: stores master dataset (in append mode); unrestrained computation; horizontally scalable; high latency

  8. The Speed Layer: stream processing of data; stores a limited window of data; dynamic computation

  9. The Serving Layer: queries the batch and real-time views; merges the results

  10. Microsoft Lambda Architecture Support (diagram labels): Federations in Windows Azure SQL Database; Azure tables; Memcached/MongoDB; SQL Server database engine; SQL Server VM: Columnstore indexes; Azure Storage Explorer; Microsoft Excel; Power Query; PowerPivot; Power View; Power Map; Reporting Services; LINQ to Hive; Analysis Services; Windows Azure HDInsight; Azure Blob storage; MapReduce, Hive, Pig, Oozie, SSIS; Analysis Services; StreamInsight

  11. Yahoo! (diagram labels): Staging Database; SQL Server Analysis Service (SSAS); Apache Hadoop; Microsoft Excel and PowerPivot; Other BI Tools and Custom Applications; SQL Server Connector (Hadoop Hive ODBC); SQL Server Analysis Services (SSAS Cube); Hadoop Data; Third Party Database + Custom Applications

  12. Ferranti Computer Systems (diagram labels): Reactive Extensions (Rx); SQL Server Database (In-Memory OLTP); Microsoft Dynamics AX; SQL Server Analysis Services; SQL Server Reporting Services; Windows Azure HDInsight; Data Feed from Smart Meters

  13. Windows Azure Storage

  14. Demo 1: Setting up the Windows Azure storage account (Azure Blob storage, Azure Storage Explorer)

  15. Blob Storage Concepts: store large amounts of unstructured text or binary data with the fastest read performance; highly scalable, durable, and available file system; blobs can be exposed publicly over HTTP; securely lock down permissions to blobs

  16. Getting started with HDInsight Service

  17. Demo 2: Setting up the Windows Azure HDInsight cluster (Windows Azure HDInsight, Azure Blob storage, HDInsight Console)

  18. Demo 3: Loading data into Windows Azure storage for use with HDInsight (Windows Azure HDInsight, Azure Blob storage, HDInsight Console)

  19. Easy Access to Data, Big & Small

  20. Easy Access to Data, Big & Small: Key Features. Search, Access & Shape: simplify access to public & corporate data; easily preview, shape, & format your data. Combine with Unstructured: combine and refine data across multiple sources; gain insight across relational, unstructured, & semi-structured data. Easily Manage & Query: common management of structured & unstructured data; query across relational DB & Hadoop with a single T-SQL query. Tools and sources named on the slide: Power Query; Windows Azure Marketplace; Windows Azure HDInsight Service; Parallel Data Warehouse with Polybase.

  21. Learn more. Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx

  22. Questions?
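
The batch, speed, and serving layers from slides 6 through 9 can be illustrated with a small in-memory sketch. This is not taken from the presentation and does not use HDInsight or any Azure service; the page-view events, the class names (BatchLayer, SpeedLayer) and the helper functions (ingest, query) are all hypothetical, chosen only to show how an append-only master dataset, a recomputed batch view, an incremental real-time view, and a merging query might fit together.

```python
from collections import Counter, deque


class BatchLayer:
    """Append-only master dataset plus a view recomputed from scratch."""

    def __init__(self):
        self.master = []        # master dataset: records are only ever appended
        self.view = Counter()   # batch view: page -> number of views
        self.processed = 0      # how many master records the view covers

    def append(self, event):
        self.master.append(event)

    def recompute(self):
        # Full recomputation over the whole master dataset (simple, high latency).
        self.view = Counter(e["page"] for e in self.master)
        self.processed = len(self.master)
        return self.processed


class SpeedLayer:
    """Incremental counts for events the batch view has not absorbed yet."""

    def __init__(self):
        self.recent = deque()   # limited window of (sequence number, page)
        self.view = Counter()   # real-time view: page -> number of views

    def update(self, seq, event):
        self.recent.append((seq, event["page"]))
        self.view[event["page"]] += 1

    def expire(self, processed):
        # Events with seq < processed are now covered by the batch view,
        # so the batch result overrides the speed layer for them.
        while self.recent and self.recent[0][0] < processed:
            _, page = self.recent.popleft()
            self.view[page] -= 1
            if self.view[page] == 0:
                del self.view[page]


def ingest(batch, speed, event):
    """Send one event to both layers, tagged with its position in the master dataset."""
    seq = len(batch.master)
    batch.append(event)
    speed.update(seq, event)


def query(batch, speed, page):
    """Serving-layer style query: merge the batch view and the real-time view."""
    return batch.view[page] + speed.view[page]


batch, speed = BatchLayer(), SpeedLayer()
for page in ["home", "home", "about"]:
    ingest(batch, speed, {"page": page})
print(query(batch, speed, "home"))   # answered from the speed layer only

speed.expire(batch.recompute())      # batch run finishes and takes over
ingest(batch, speed, {"page": "home"})
print(query(batch, speed, "home"))   # batch view plus one recent event
```

In this sketch the sequence number is what lets the speed layer discard exactly the events a finished batch run already covers, so a merged query never counts an event twice. A real deployment would replace the in-memory structures with durable storage and a stream processor.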
