Big Data Technologies in Information Technology


Explore the characteristics and design considerations of big data technologies such as Google File System (GFS) at the International Institute of Information Technology, Pune. Learn about big data volume, variety, velocity, veracity, and more.

  • Big Data Tech
  • Information Technology
  • Google File System
  • IT Institute
  • Data Volume


Presentation Transcript


  1. Big Data Technologies
     Prof. Smita Wangikar, Information Technology Department
     International Institute of Information Technology, I²IT
     www.isquareit.edu.in

  2. Big Data Technologies: Characteristics of Big Data
     • Volume, Variety, Velocity, Veracity
     • Volume
     • Internal and External Data
        • Internal: data that is owned by an organization
        • External: data that belongs to an entity other than the organization that wishes to acquire and use it
     • Structured and Unstructured Data
     International Institute of Information Technology, I²IT, P-14, Rajiv Gandhi Infotech Park, Hinjawadi Phase 1, Pune - 411 057 | Phone - +91 20 22933441/2/3 | Website - www.isquareit.edu.in | Email - info@isquareit.edu.in

  3. Google File System
     • Design considerations
     • Interface
     • Architecture
     • Chunk size
     • Metadata
     • Client operations: write
     • Client operations: with server, decoupling and atomic record appends
     • Master operations: logging, where to put a chunk, re-replication and rebalancing
     • Garbage collection
     • Fault tolerance
     • Summary (benefits, limitations)

  4. GFS Design Considerations
     • Built from cheap commodity hardware
     • Expect large files: 100 MB to many GB
     • Support large streaming reads and small random reads
     • Support large, sequential file appends
     • Support producer-consumer queues for many-way merging and file atomicity
     • Sustain high bandwidth by writing data in bulk

  5. GFS Architecture

  6. GFS Chunk Size Interface
     • 64 MB: much larger than typical file system block sizes
     • Advantages of a large chunk size
        • Reduces interaction between client and master
        • Client can perform many operations on a given chunk
        • Reduces network overhead by keeping a persistent TCP connection
        • Reduces the size of the metadata stored on the master
        • The metadata can reside in memory
     • The master stores three major types of metadata
        • Namespaces (file and chunk identifiers)
        • Mapping from files to chunks
        • Locations of each chunk's replicas
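
The effect of the 64 MB chunk size on client-master traffic and on metadata volume can be illustrated with a little arithmetic. The sketch below is not GFS code: the function names are hypothetical, and it only assumes that a byte offset maps to a chunk by integer division, so a client needs one master lookup per chunk index rather than one per block.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated on the slide


def chunk_index(offset: int) -> int:
    """Return the index of the chunk that holds the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> list:
    """Return the chunk indexes touched by reading `length` bytes at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))


def chunk_count(file_size: int, unit: int = CHUNK_SIZE) -> int:
    """Return how many units of size `unit` are needed to hold the file."""
    return -(-file_size // unit)  # ceiling division
```

For example, `chunk_count(1024 ** 3)` gives 16 chunks for a 1 GiB file, whereas `chunk_count(1024 ** 3, unit=4096)` gives 262144 entries for a 4 KiB block size (the 4 KiB figure is an assumed, typical local file system block size, not a number from the slides).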

  7. GFS Client Operation: Write
     • Some chunkserver is primary for each chunk
        • Master grants lease to primary (typically for 60 sec.)
        • Leases renewed using periodic heartbeat messages between master and chunkservers
     • Client asks master for primary and secondary replicas for each chunk
     • Client sends data to replicas in daisy chain
        • Pipelined: each replica forwards as it receives
        • Takes advantage of full-duplex Ethernet links
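
The daisy-chain data flow described on this slide can be sketched with chained Python generators. This is a toy, single-process model under stated assumptions: `Replica`, `push_through_chain`, and `piece_size` are hypothetical names, and there is no real network or concurrency. It only shows that each replica stores a piece and passes it on before the next piece arrives, instead of waiting for the whole payload.

```python
from typing import Iterable, Iterator, List, Tuple


class Replica:
    """Toy stand-in for a chunkserver holding one replica (hypothetical)."""

    def __init__(self, name: str, log: List[Tuple[str, bytes]]) -> None:
        self.name = name
        self.buffer = bytearray()
        self.log = log  # shared log recording the order of arrivals

    def receive(self, pieces: Iterable[bytes]) -> Iterator[bytes]:
        """Store each incoming piece, then forward it to the next replica."""
        for piece in pieces:
            self.buffer.extend(piece)
            self.log.append((self.name, piece))
            yield piece  # forward as soon as the piece has been received


def push_through_chain(data: bytes, chain: List[Replica], piece_size: int = 4) -> None:
    """Send `data` down the chain one piece at a time."""
    stream: Iterable[bytes] = (
        data[i:i + piece_size] for i in range(0, len(data), piece_size)
    )
    for replica in chain:
        stream = replica.receive(stream)
    for _ in stream:  # drain the last replica's output to drive the pipeline
        pass
```

With three replicas and the payload `b"0123456789"`, the shared log shows the first piece `b"0123"` reaching all three replicas before the second piece reaches the first one, and every replica ends up with the full payload.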

  8. GFS Client Operation Write

  9. GFS Client Operation Write with
     • Issues control (metadata) requests to master server
     • Issues data requests directly to chunkservers
     • Caches metadata
     • Does no caching of data
        • No consistency difficulties among clients
        • Streaming reads (read once) and append writes (write once) don't benefit much from caching at client
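
The split between control requests (to the master) and data requests (to chunkservers), together with client-side metadata caching, can be sketched as follows. All class and method names here are hypothetical, and the sketch ignores cache expiry and failures; it only shows that a repeated access to the same chunk does not contact the master again.

```python
from typing import Dict, List, Tuple

ChunkKey = Tuple[str, int]  # (file name, chunk index)


class Master:
    """Toy master that answers metadata lookups only (hypothetical)."""

    def __init__(self, locations: Dict[ChunkKey, List[str]]) -> None:
        self.locations = locations
        self.lookups = 0  # counts control requests received

    def lookup(self, filename: str, index: int) -> List[str]:
        self.lookups += 1
        return self.locations[(filename, index)]


class Client:
    """Toy client that caches metadata but never caches file data."""

    def __init__(self, master: Master) -> None:
        self.master = master
        self.metadata_cache: Dict[ChunkKey, List[str]] = {}

    def locate(self, filename: str, index: int) -> List[str]:
        """Return the chunkservers for a chunk, asking the master only on a miss."""
        key = (filename, index)
        if key not in self.metadata_cache:
            self.metadata_cache[key] = self.master.lookup(filename, index)
        return self.metadata_cache[key]
```

After `locate` returns, the client would send its data request directly to one of the listed chunkservers; that step is left out here.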

  10. HDFS (Hadoop Distributed File System)
      • A distributed file system that provides high-throughput access to application data
      • HDFS uses a master/slave architecture in which one device (the master), called the NameNode, controls one or more other devices (the slaves), called DataNodes
      • It breaks files into blocks (128 MB each) and stores them on DataNodes; each block is replicated on other nodes to achieve fault tolerance
      • The NameNode keeps track of the blocks written to the DataNodes
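
The block splitting and replication described on this slide can be sketched as below. This is an illustration, not HDFS code: the function names are hypothetical, the replication factor of 3 is assumed as a common default, and the round-robin placement is a deliberate simplification that stands in for whatever placement policy the NameNode actually applies.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, as stated on the slide


def split_into_blocks(file_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(
    num_blocks: int, datanodes: List[str], replication: int = 3
) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Assumption: round-robin is used only for illustration.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }
```

A 300 MB file, for instance, splits into blocks of 128 MB, 128 MB, and 44 MB; the mapping returned by `place_replicas` plays the role of the block-to-DataNode table that the NameNode tracks.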

  11. HDFS Architecture

  12. Hadoop's MapReduce Engine

  13. How MapReduce Works
      • A method for distributing computation across multiple nodes
      • Each node processes the data that is stored at that node
      • Consists of two main phases: Map and Reduce
      • The Mapper
         • Reads data as key/value pairs
         • The key is often discarded
         • Outputs zero or more key/value pairs
      • The Shuffle and Sort
         • Output from the mapper is sorted by key
         • All values with the same key are guaranteed to go to the same machine
      • The Reducer
         • Called once for each unique key
         • Gets a list of all values associated with a key as input
         • Outputs zero or more final key/value pairs
         • Usually just one output per input key
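
The three stages on this slide can be sketched in plain Python with a word count, a common teaching example. This is a single-process sketch, not the Hadoop API: `mapper`, `shuffle_and_sort`, `reducer`, and `run_job` are hypothetical names, and a real job would run mappers and reducers on separate nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for each word; the input key (line number) is discarded."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(
    pairs: Iterable[Tuple[str, int]]
) -> List[Tuple[str, List[int]]]:
    """Group values by key and sort by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key: str, values: List[int]) -> Iterator[Tuple[str, int]]:
    """Called once per unique key; emits one (word, total) pair."""
    yield key, sum(values)


def run_job(lines: List[str]) -> List[Tuple[str, int]]:
    """Run map, shuffle/sort, and reduce over the input lines."""
    mapped = (pair for n, line in enumerate(lines) for pair in mapper(n, line))
    results: List[Tuple[str, int]] = []
    for key, values in shuffle_and_sort(mapped):
        results.extend(reducer(key, values))
    return results
```

Running `run_job(["the cat sat", "the dog sat"])` returns `[("cat", 1), ("dog", 1), ("sat", 2), ("the", 2)]`: the mapper ignores its input key, the shuffle step collects all values for a word in one place, and the reducer produces one output per key.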

  15. Map Reduce Example

  16. Thank You
      E-mail: smitaw@isquareit.edu.in
