The Evolution of Data Storage Technologies in the Post-Relational Era

Explore the shift towards NoSQL databases and the forces driving this change. Learn about the history of specialized databases, scalability challenges, and use cases for terabytes of data storage and batch data analysis. Discover how NoSQL is not a new concept but a response to the evolving needs of modern applications.

jveron

Uploaded on Apr 03, 2025

The NoSQL movement or the dawn of the post-relational age

What is the buzz?
o Job Trends
o Search Trends
o Twitter search

Something for your CV

NoSQL
"Not only SQL", or "No SQL" in the sense of no SQL support.
Support for the full SQL language imposes constraints on datastores. So does ACID compliance. So does the need for a fixed database schema.
Many applications need more specialised datastores.
A movement for choice in database architecture.
o CouchBase survey
o Mike Loukides at O'Reilly - an excellent overview
o Polyglot Persistence by Martin Fowler
o Wikipedia comparison
o nosql-databases.org - a rather terrifying set of resources
o Tim Anglade's compilation of interviews

NoSQL is not new
Despite the widespread adoption of the relational data model for business applications, there have always been a wide variety of specialised databases:
o Geographic Information Systems - complex spatial relationships - ArcGIS, e.g. BCC KnowYourPlace
o OLAP - OnLine Analytical Processing - for analysis of transaction data
o Free-text databases, e.g. LexisNexis for legal documents
o Multi-dimensional sparse arrays - Pick and MUMPS
o Object-oriented databases, e.g. ZOPE for the Plone CMS
These databases were directed at the need for complex and flexible data structures.

Forces for change
o Volume of data - Facebook has over 30 petabytes - 30,000 terabytes or 30 million gigabytes
o Volume of transactions - of the order of 1 million writes/sec
o Changeability/flexibility of schema - constant beta
o Complexity of data - UK Legislation

Use case: Terabytes of data need to be stored reliably with no schema requirements
Reliability is a big problem when volumes are large. In a farm of, say, 1000 servers, each with 8 spindles, there is a high probability that at least one disk will be down at any time.
Random-access update is too slow - append new data and merge in batch.
o BigTable from Google
o HBase from Apache
o Dynamo from Amazon
Doug Cutting on Apache's Hadoop
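
The reliability claim above can be checked with a little arithmetic. The sketch below assumes each disk fails independently and is unavailable 0.1% of the time; that rate is an illustrative assumption, not a figure from the slides, and the function name is hypothetical.

```python
def prob_at_least_one_down(disks: int, p_down: float) -> float:
    """Probability that at least one of `disks` independent disks is down."""
    return 1.0 - (1.0 - p_down) ** disks


servers = 1000
spindles_per_server = 8
disks = servers * spindles_per_server  # 8000 disks in the farm

# ASSUMPTION: each disk is unavailable 0.1% of the time, independently of the others.
p_down = 0.001

p_any = prob_at_least_one_down(disks, p_down)
expected_down = disks * p_down  # mean number of disks down at any moment

print(f"P(at least one disk down) = {p_any:.4f}")
print(f"Expected disks down at any moment = {expected_down:.1f}")
```

Under this assumption the chance that every disk is up is vanishingly small, which is why stores at this scale treat disk failure as a normal operating condition and replicate data across servers.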

Use case: Batch data analysis
Where very large transaction datasets need to be filtered and summarised, for example to analyse log files by IP location. In the past these could have been overnight jobs; now they need to be done in minutes at most.
Map-Reduce is an architecture for large-scale distributed computation. MapReduce should really be called MapMergeReduce.
Each MapReduce task is written in Java (or a high-level language like Pig). The framework (such as Hadoop) coordinates the distribution of the map, merge and reduce jobs and the dataflows.
o Input is a database of key/value pairs which are split ('sharded') over many spindles on many servers.
o The user's map operation runs on every server hosting the shards and transforms each key/value input into zero, one or more key/value outputs.
o Merge (shuffle) merges all pairs for the same key and distributes them (e.g. by hashing the keys) to multiple Reduce servers. This too can be user-configurable.
o The user's reduce takes each group of values for the same key and produces zero, one or more key/values for each group.
Successive MapMergeReduce operations can be chained together in a pipeline.
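
The map, merge and reduce steps above can be sketched in a single process. This is a minimal illustration in Python, not Hadoop's API: the function names and the "<ip> <path>" log format are assumptions made for the example, and it counts hits per IP address rather than doing a real IP-to-location lookup.

```python
from collections import defaultdict


def map_merge_reduce(records, mapper, reducer):
    """Run one MapMergeReduce pass over an iterable of (key, value) pairs."""
    # Map: each input pair yields zero, one or more output pairs.
    mapped = [out for key, value in records for out in mapper(key, value)]

    # Merge (shuffle): gather all values that share the same key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce: each group yields zero, one or more output pairs.
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


def map_log_line(offset, line):
    # ASSUMPTION: hypothetical log format "<ip> <path>"; blank lines emit nothing.
    parts = line.split()
    if parts:
        yield parts[0], 1


def reduce_count(ip, counts):
    yield ip, sum(counts)


log = [
    (0, "10.0.0.1 /index"),
    (1, "10.0.0.2 /about"),
    (2, "10.0.0.1 /contact"),
    (3, ""),
]
hits = map_merge_reduce(log, map_log_line, reduce_count)
print(hits)
```

Because the output is again a list of key/value pairs, it can be fed straight into another call to map_merge_reduce, which is the pipeline chaining mentioned above. In a real cluster the merge step would also route each key to a reduce server, for example by hashing the key.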

Map / Reduce

Use case: Document storage and retrieval
Document store
Complex hierarchical documents present problems for storing in a relational database. Every repeated part of the document would be stored in its own table ('shredding'); each repeated part would need to be linked to its parent with a key; and reconstructing the document would require multiple joins over data distributed all over the file system.
Platforms:
o eXist - open source XML store - query with XQuery
o MarkLogic - commercial XML store
o CouchDb - JSON store - query with JavaScript
o MongoDb - JSON store
Telemetric data processing
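
To make the shredding problem above concrete, here is a small sketch that splits one nested document into two relational-style row sets and then rebuilds it. The order document, its field names and both functions are invented for illustration; a document store would simply keep the nested form as is.

```python
def shred(doc_id, doc):
    """Split one nested order document into parent rows and child rows."""
    orders = [{"order_id": doc_id, "customer": doc["customer"]}]
    lines = [
        # Each repeated part gets its own row, linked to its parent by order_id.
        {"order_id": doc_id, "line_no": n, "sku": item["sku"], "qty": item["qty"]}
        for n, item in enumerate(doc["items"], start=1)
    ]
    return orders, lines


def reconstruct(order_id, orders, lines):
    """Rebuild the document: the equivalent of joining the two tables on order_id."""
    order = next(o for o in orders if o["order_id"] == order_id)
    children = sorted(
        (row for row in lines if row["order_id"] == order_id),
        key=lambda row: row["line_no"],
    )
    items = [{"sku": row["sku"], "qty": row["qty"]} for row in children]
    return {"customer": order["customer"], "items": items}


# Hypothetical document with one repeated part ("items").
document = {
    "customer": "Ada",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

order_rows, line_rows = shred(1, document)
rebuilt = reconstruct(1, order_rows, line_rows)
print(rebuilt == document)
```

With only one level of nesting this needs two tables and one join; every further level of repetition in the document adds another table and another join.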

Use case: Fast put/get of keyed data
Key-value store
Where complex data is to be stored but the database is not interested in the internal structure. For example, storing session data, user profiles or shopping carts.
The only operations are:
value = store.get(key)
store.put(key, value)
store.delete(key)
Platforms:
o Project Voldemort
o Rhino
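
The three operations above are small enough to sketch in full. The class below is an in-memory stand-in written for illustration, not the API of any of the platforms listed; the class name, the key format and the JSON encoding of the session data are all assumptions.

```python
import json
from typing import Optional


class KeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how the value is structured.
cart = {"user": "u42", "items": ["book", "pen"]}
store.put("session:u42", json.dumps(cart).encode("utf-8"))

raw = store.get("session:u42")
restored = json.loads(raw.decode("utf-8"))

store.delete("session:u42")
print(restored, store.get("session:u42"))
```

The store never inspects the bytes it holds, so it cannot query by any field inside the value; that restriction is what keeps put and get fast and easy to distribute.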

Use case: Page Caching
Key-value cache
Where the generation of a page takes a significant time, it is better to cache the pages as key/value pairs where the key is a URI and the value is the HTML page. As much of the cache as possible is kept in RAM for rapid access.
Issues: cache flushing
For example, this site views summarised data from an eXist document store: AidView
Platforms:
o Memcached
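
A page cache of this kind can be sketched as a bounded map from URI to HTML. The version below is an illustrative in-process cache, not Memcached's client API; the class name, the least-recently-used eviction policy and the slow_render function are assumptions chosen for the example.

```python
from collections import OrderedDict


class PageCache:
    """Bounded URI -> HTML cache with least-recently-used eviction."""

    def __init__(self, render, max_pages: int) -> None:
        self._render = render          # slow function: uri -> html
        self._max_pages = max_pages    # stands in for the RAM budget
        self._pages = OrderedDict()

    def get(self, uri: str) -> str:
        if uri in self._pages:
            self._pages.move_to_end(uri)     # mark as recently used
            return self._pages[uri]
        html = self._render(uri)             # cache miss: generate the page
        self._pages[uri] = html
        if len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)  # evict the least recently used page
        return html

    def flush(self, uri=None) -> None:
        """Drop one page, or the whole cache when no URI is given."""
        if uri is None:
            self._pages.clear()
        else:
            self._pages.pop(uri, None)


render_calls = []


def slow_render(uri: str) -> str:
    # Hypothetical stand-in for an expensive page generation step.
    render_calls.append(uri)
    return f"<html><body>{uri}</body></html>"


cache = PageCache(slow_render, max_pages=2)
cache.get("/a")
cache.get("/a")      # served from the cache, no second render
cache.get("/b")
cache.get("/c")      # evicts "/a", the least recently used page
cache.flush("/b")    # underlying data changed, so drop the stale page
cache.get("/b")      # rendered again
print(render_calls)
```

The flush method is where the 'cache flushing' issue shows up: something has to tell the cache that the underlying data has changed, otherwise readers keep getting the stale page.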

Use case: Linked data
Graph Database
Where data is composed of simple, highly interrelated facts. For example, there is an RDF version of Wikipedia called dbpedia.
Some use available databases such as MySQL, but the specific form of the data and of the queries on the data suggests native stores:
o Triple (usually quad) stores to support RDF - Jena, Sesame, Virtuoso - query with SPARQL. RDF has a rigid data model: [graph] subject - predicate - object, and is widely used for linked data.
o Custom graph stores - Neo4J - non-standard interfaces
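
The subject - predicate - object model above is simple enough to sketch with plain tuples. This is an illustrative in-memory pattern matcher, not SPARQL and not the API of Jena, Sesame or Virtuoso; the facts, the match function and the omission of the optional graph component are all assumptions made to keep the example short.

```python
Triple = tuple[str, str, str]

# Hypothetical facts, each a (subject, predicate, object) triple.
facts: set[Triple] = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
}


def match(triples, s=None, p=None, o=None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who does alice know?
known_by_alice = [o for _, _, o in match(facts, s="alice", p="knows")]

# Friends of friends: chain two patterns on the shared node.
friends_of_friends = [
    o2
    for _, _, o1 in match(facts, s="alice", p="knows")
    for _, _, o2 in match(facts, s=o1, p="knows")
]

print(known_by_alice, friends_of_friends)
```

Every query is a combination of such patterns joined on shared nodes, which is why a store built around this single shape of data can index it far more directly than a general-purpose relational schema.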

XML/XQuery for graphs
Tutorial for using Neo4j to compute relationships in a graph
Friends relationship
o Some friends as XML
o a bit of XQuery
o The knows relationship expanded
Permissions
o People
o Roles
o a bit of XQuery
o People and permissions
Shortest Path is difficult - Dijkstra's algorithm is tricky to implement in functional languages
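
For contrast with the functional case, here is Dijkstra's algorithm in an imperative style, where the mutable distance table and priority queue are straightforward. It is a sketch in Python rather than XQuery; the friends graph, the unit edge weights and the function name are assumptions made for the example.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over {node: {neighbour: weight}}; returns a node list or None."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if goal not in dist:
        return None  # goal is unreachable from start
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical "knows" graph; every edge has weight 1.
knows = {
    "alice": {"bob": 1, "carol": 1},
    "bob": {"dave": 1},
    "carol": {"dave": 1, "erin": 1},
    "dave": {"erin": 1},
    "erin": {},
}

print(shortest_path(knows, "alice", "erin"))
```

The algorithm leans on in-place updates to dist, prev and the heap; in a language without mutable state each of those has to be threaded through recursive calls, which is the source of the difficulty noted above.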

NoSQL DB types and databases

NoSQL DB characteristics (summary)

Dan McCreary's overview: The CIO's Guide to NoSQL

Risks
o Lack of standardisation
o New technology
o Design cul-de-sac - requirements change
o Lack of available developer skills
o RDBMSs like Oracle and SQL Server are changing too - but just get more complex
A dissenting view - warning - NSFW
