Shared Memory Systems with Multiple Processors

Explore the benefits and drawbacks of shared memory systems and shared disk systems, along with their applications and advantages in modern computing environments. Learn about efficient data accessibility, fault tolerance, scalability, and more in these interconnected systems.

  • Memory Systems
  • Shared Memory
  • Disk Systems
  • Fault Tolerance
  • Scalability




Presentation Transcript


  1. Shared Memory System A shared memory system attaches multiple processors to a single global shared memory via an interconnection channel or communication bus. Each processor has a large cache, so many references to the shared memory are avoided. If a processor writes to a memory location, the cached copies of that location held by other processors must be updated or invalidated.
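As a loose analogy, threads within a single process share one address space, much as the processors above share global memory. A minimal Python sketch (the names `worker` and `counter` are invented for illustration; the lock stands in for the cache-update step the slide mentions):

```python
import threading

# Shared "global memory": every thread reads and writes the same object.
counter = {"value": 0}
lock = threading.Lock()

def worker(increments):
    # Each thread plays the role of one processor; the lock models the
    # update needed to keep a written location consistent across readers.
    for _ in range(increments):
        with lock:
            counter["value"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter["value"] is now 4000: four "processors" wrote 1000 times each
```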

  2. Advantages of Shared Memory System Data is easily accessible to any processor, and one processor can send messages to another efficiently. Disadvantages of Shared Memory System Processor waiting time increases as more processors are added, and the shared bus becomes a bandwidth bottleneck.

  3. Shared Disk System A shared disk system connects multiple processors to multiple shared disks via an interconnection channel, and every processor has its own local memory. Because each processor has its own memory, memory is not contended; only the disks are shared. Systems built around this architecture are called clusters.

  4. Advantages of Shared Disk System Fault tolerance: if a processor or its memory fails, another processor can take over and complete the task. Disadvantage of Shared Disk System Limited scalability, because all data travels through the interconnection channel to the shared disks; adding more processors slows the existing ones down.

  5. Applications of Shared Disk System Digital Equipment Corporation (DEC): DEC clusters running relational databases used the shared disk system (that database product is now owned by Oracle). Shared Nothing System Each processor in a shared nothing system has its own local memory and its own local disk. Processors communicate with each other through an interconnection channel, and any processor can act as a server for the data stored on its local disk.

  6. Advantages of Shared Nothing System Processors and disks can be added as required, so the system can support many processors, which makes it highly scalable. Disadvantages of Shared Nothing System Data must be partitioned across the nodes, and the cost of communicating to access data on another node's disk is much higher than accessing a local disk. Applications of Shared Nothing System The Teradata database machine; the Grace and Gamma research prototypes.

  7. Hierarchical System or Non-Uniform Memory Access A hierarchical system is a hybrid of the shared memory, shared disk, and shared nothing architectures. It is also known as Non-Uniform Memory Access (NUMA). In this system each group of processors has a local memory, but processors from other groups can access that memory coherently. Because NUMA mixes local and remote memory (memory belonging to another group), remote accesses take longer than local ones. Advantages of NUMA It improves the scalability of the system, and the memory bottleneck (shortage of memory) is minimized. Disadvantages of NUMA The cost of the architecture is higher than that of the other architectures.

  8. Intra-Query Parallelism Intra-query parallelism is a form of parallelism in the evaluation of database queries, in which a single query is decomposed into smaller tasks that execute concurrently on multiple processors.

  9. Intra-query parallelism is achieved when several processors cooperate in the execution of a single query to improve the query's response time. Intra-query parallelism is orthogonal to inter-query parallelism, in which multiple independent requests execute concurrently on several processors to improve overall system throughput.

  10. There exist two forms of intra-query parallelism: operator-level parallelism and intra-operator parallelism. Operator-level parallelism is obtained by executing concurrently several operators of the same query. For example, consider a simple query that consists of a scan operator and an aggregation. The scan operator uses a selection condition to filter tuples. The aggregation calculates some statistics over all...
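The scan-plus-aggregation example can be sketched by splitting the scan across workers that each filter one partition and return a partial count. A thread pool is used here for simplicity where a parallel database would use separate processors; all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

rows = [{"salary": s} for s in range(0, 30000, 1000)]  # toy relation, 30 tuples

def scan_and_count(chunk, min_salary):
    # Each worker scans only its partition, applying the selection
    # condition and producing a partial aggregate.
    return sum(1 for row in chunk if row["salary"] > min_salary)

def parallel_count(rows, min_salary, workers=4):
    # Decompose the single query into one scan task per partition...
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(scan_and_count, chunks, [min_salary] * workers)
    # ...then combine the partial aggregates into the final answer.
    return sum(partials)
```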

  11. Introduction to Database Parallelism Database parallelism is a method of implementing parallel processing in a database, with the aim of getting more work done in less time. It covers all the operations a database usually performs, such as loading data, transforming data, and executing queries, carried out concurrently. This improves system performance and enables parallel access to data by exploiting multiple data sources, the memory of multiple systems, and multiple disks, so that the work completes in less time than individual, sequential execution.

  12. Types of Database Parallelism Parallelism has the following types: Interquery Parallelism, Intraquery Parallelism, Pipelined Parallelism, and Intraoperator Parallelism.

  13. 1. Interquery Parallelism In interquery parallelism, different queries or transactions run in parallel, which increases throughput. The response time of an individual transaction is no faster than when it runs in isolation; the main purpose of interquery parallelism is to scale up transaction processing, supporting a large number of transactions per second. It is typically implemented with multi-server and multithreaded systems.

  14. Such a system can efficiently handle a large number of client requests in a short time. When multiple requests are submitted, the system executes them in parallel and increases throughput; different server threads handle multiple requests at the same time.
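A minimal sketch of this multithreaded pattern: independent client requests dispatched to a pool of server threads. The `run_query` function is a stand-in, not a real database call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(query_id):
    # Stand-in for one independent, short-lived client query.
    return ("ok", query_id)

# Many concurrent requests served in parallel by a pool of server threads:
# each query is no faster than in isolation, but throughput goes up.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_query, range(20)))
```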

  15. Interquery parallelism does not speed up an individual query, since each query is still handled by a single processor. Every query is independent and typically takes a very short time to execute. The more users there are, the more queries are generated.

  16. Without interquery parallelism, all queries run on a single processor in a time-shared manner; with it, the queries are distributed over multiple processors. Interquery parallelism can be implemented successfully on SMP systems, where it increases throughput and supports many concurrent users.

  17. 2. Intraquery Parallelism Intraquery parallelism executes a single query across multiple processors and disks. It breaks the query into multiple subtasks, which run in parallel on different processors. As a result, the elapsed time of the query is much shorter than serial execution. This kind of parallelism is especially useful in decision-support systems.

  18. Decision support systems run long queries that are complex for the system to evaluate. These systems are widely used, so database vendors are increasing their support for this type of parallelism. The query is decomposed into lower-level operations such as scan, join, sort, and aggregation.

  19. The lower-level operations so identified are executed concurrently. This parallelism divides a database operation, such as index creation, a database load, or a SQL query, so that it executes in parallel within a single database partition, taking advantage of the multiple processors of a multiprocessor server. It exploits both data parallelism and pipeline parallelism.

  20. It scans large indexes and tables: the index or data being used can be partitioned dynamically and the query executed in parts. The data can be partitioned on key values and the table scanned partition by partition, with the distinct operations executed in parallel.

  21. 3. Pipelined Parallelism Pipelined parallelism breaks a task into a sequence of processing stages. As in any pipeline, the output of one stage is given as input to the next stage. Its scalability is limited, but it parallelizes tasks that depend on one another, allowing their execution to overlap.

  22. A stage may need to consume many input values before it produces any output, which limits the benefit of pipelining. Reading starts on one processor and begins filling the pipeline with data; as soon as data is available in the pipeline, the next stage can start running on another processor and fill the pipeline for the stage after it.
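The staged behavior can be sketched with Python generators, where each stage consumes tuples as the previous stage produces them. The stage names and the toy relation are invented for illustration:

```python
employees = [
    {"name": "Murugan", "salary": 15000},
    {"name": "Anita", "salary": 8000},
    {"name": "Ravi", "salary": 22000},
]

def scan(rows):
    # Stage 1: emit tuples one at a time into the pipeline.
    for row in rows:
        yield row

def select(rows, min_salary):
    # Stage 2: starts consuming as soon as stage 1 produces anything.
    for row in rows:
        if row["salary"] >= min_salary:
            yield row

def project(rows, column):
    # Stage 3: the final stage of the pipeline.
    for row in rows:
        yield row[column]

result = list(project(select(scan(employees), 10000), "name"))
# result == ["Murugan", "Ravi"]
```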

  23. 4. Intraoperator Parallelism Intraoperator parallelism parallelizes the execution of a single relational operator within a query. Consider a query that joins two tables on a common attribute. Parallelism is needed when the tables are huge. The order of tuples does not matter in a relational database.

  24. As a result, the tables can be stored in any order. When a join is performed, each record of one table must be matched against the records of the other to complete the join, and parallelism improves the performance of this matching. Many relational operations lend themselves to parallel execution.

  25. The query is split into subsets involving relational operators or sorting techniques that can run in parallel. These operations include range-partitioning sort, parallel external sort-merge, partitioned join, fragment-and-replicate join, partitioned parallel hash join, projection, aggregation, and so on. Breaking up an individual query in this way improves performance.
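One of the listed operations, the partitioned parallel hash join, can be sketched as follows: both relations are routed to partitions by the hash of the join key, so matching tuples always land in the same partition, and each partition can then be joined independently (sequentially here; one partition per processor on a real system). All names and data are illustrative, and integer keys are used so `value % n_parts` serves as the hash:

```python
from collections import defaultdict

def partitioned_hash_join(left, right, key, n_parts=4):
    # Phase 1: partition both relations on the hash of the join key, so
    # tuples that can match always land in the same partition.
    lparts = [[] for _ in range(n_parts)]
    rparts = [[] for _ in range(n_parts)]
    for row in left:
        lparts[row[key] % n_parts].append(row)
    for row in right:
        rparts[row[key] % n_parts].append(row)
    # Phase 2: join each partition independently; a parallel system would
    # assign one partition to each processor.
    out = []
    for lp, rp in zip(lparts, rparts):
        index = defaultdict(list)
        for row in lp:
            index[row[key]].append(row)
        for row in rp:
            for lrow in index[row[key]]:
                out.append({**lrow, **row})
    return out

emp = [{"emp_id": 1, "name": "Murugan"}, {"emp_id": 2, "name": "Anita"}]
dept = [{"emp_id": 1, "dept": "Sales"}, {"emp_id": 2, "dept": "HR"},
        {"emp_id": 3, "dept": "IT"}]
joined = partitioned_hash_join(emp, dept, "emp_id")
# two matches: Murugan/Sales and Anita/HR; emp_id 3 has no employee row
```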

  26. Advantages and Disadvantages of Database Parallelism Advantages: parallelism breaks a query apart and runs it over multiple nodes; its different forms optimize processing and deliver better results; queries run as multiple threads over partitioned data; resources are distributed and used uniformly; and overall system performance improves. The main disadvantage is that scalability is limited for some forms of parallelism.

  27. Conclusion Parallelism is thus an efficient way of using a database. Distributing the data helps use resources effectively, and parallelism improves system performance while keeping data well organized. Dividing a large task into smaller tasks speeds up the process.

  28. Comparison of Data-Partitioning Strategies in Parallel Databases

  29. Factors to consider include the workload of the system: a. What is the size of the database? b. How many users access the system? c. What types of accesses do they make (reading/writing)? d. What types of data access occur (scanning the entire relation, point queries, range queries)?

  30. Also consider the nature of the data stored in the table: a. the data types; b. the cardinality of each column (the uniqueness of the values stored in it); and so on. This is not a complete list, and different systems weigh different options. In this post, let us discuss the selection of a partitioning strategy based on the access type, that is, based on whether the query scans the entire relation, is a range query, or is a point query.

  31. Let us use the following four queries to explain the concepts:
      A. SELECT * FROM Employee WHERE Emp_ID = 'E101'; (assume Emp_ID is the primary key)
      B. SELECT * FROM Employee WHERE EName = 'Murugan'; (assume EName is a non-key attribute)
      C. SELECT * FROM Employee ORDER BY Phone;
      D. SELECT * FROM Employee WHERE Salary BETWEEN 10000 AND 20000;

  32. 1. Round-Robin Technique The round-robin partitioning strategy distributes the records of a relation across a set of disks by position: it sends the i-th record to disk (i mod n). Hence, this strategy ensures an even distribution of records across the available disks.
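The rule can be sketched directly, with Python lists standing in for disks (names are illustrative):

```python
def round_robin_partition(records, n_disks):
    # The i-th record is sent to disk (i mod n), giving an even spread.
    disks = [[] for _ in range(n_disks)]
    for i, record in enumerate(records):
        disks[i % n_disks].append(record)
    return disks

records = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
disks = round_robin_partition(records, 3)
# disks == [["r0", "r3", "r6"], ["r1", "r4"], ["r2", "r5"]]
```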

  33. Advantages: Suitable for queries that scan the entire relation. For example, Query C is handled efficiently because the data is distributed evenly across the available disks; due to this even distribution, the workload is well balanced among the disks.

  34. These data-partitioning strategies are used to spread data across several processors to build a complete parallel system. Here, a complete parallel system means a parallel database system that delivers the expected improvement in the performance of the whole system. That improvement cannot be achieved without considering how the database will be accessed by end users and related details; the right partitioning technique must therefore be chosen in light of the workload factors listed earlier.

  35. Disadvantages: Records are not placed in any particular order under this strategy, so point queries and range queries cannot be handled efficiently (compared with the other strategies). Queries A, B, and D are answered less efficiently; Query B, for example, would be answered efficiently if the hash partitioning strategy were used.

  36. 2. Hash Partitioning The hash partitioning strategy uses a hash function, applied to the values of a chosen partitioning attribute, to distribute the records across the disks.
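A sketch of hash partitioning on an attribute. The toy hash function here is deterministic purely for illustration; a real system would use a carefully chosen hash, and all names are invented:

```python
def toy_hash(value):
    # Simple deterministic string hash, for illustration only.
    h = 0
    for ch in str(value):
        h = (h * 31 + ord(ch)) % (2 ** 32)
    return h

def hash_partition(records, n_disks, attr):
    # Every record with the same attribute value lands on the same disk,
    # so a point query on attr needs to touch exactly one disk.
    disks = [[] for _ in range(n_disks)]
    for record in records:
        disks[toy_hash(record[attr]) % n_disks].append(record)
    return disks

employees = [{"EName": "Murugan"}, {"EName": "Anita"},
             {"EName": "Murugan"}, {"EName": "Ravi"}]
disks = hash_partition(employees, 3, "EName")
# Both "Murugan" records are guaranteed to sit on the same disk.
```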

  37. Advantages: Sequential scans can be done efficiently; Query C can be answered efficiently because the data is spread across all the disks and the sorting can proceed in parallel. A well-balanced data distribution is possible provided a good partitioning attribute and a good hash function are chosen. Point queries execute much more efficiently than under round-robin partitioning: with round-robin, answering Query B requires searching every disk, whereas with hash partitioning all records with the same value reside in one location, leaving the other processors free to handle other queries. This leads to higher transaction throughput.

  38. Disadvantages: Range queries are difficult to answer (compared with the range partitioning technique), because hashing scatters adjacent values across the disks; to answer Query D, all the processors must be used. It is also not good for queries on non-partitioning attributes: if the table Employee is hash-partitioned on the Emp_ID attribute, then Query B has to scan all the disks.

  39. 3. Range Partitioning The range partitioning strategy partitions the data based on the values of a partitioning attribute. We need to choose a partitioning vector of boundary values on which to partition. For example, records with Salary in the range 100 to 5000 go to disk 1, 5001 to 10000 to disk 2, and so on.
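A sketch using a partitioning vector of salary boundaries matching the example above. `bisect` finds which range a value falls in; disk indices are 0-based here, and all names are illustrative:

```python
import bisect

def range_partition(records, vector, attr):
    # vector = [5000, 10000] means: attr <= 5000 -> disk 0,
    # 5001..10000 -> disk 1, everything larger -> disk 2.
    disks = [[] for _ in range(len(vector) + 1)]
    for record in records:
        disks[bisect.bisect_left(vector, record[attr])].append(record)
    return disks

employees = [{"Salary": 3000}, {"Salary": 5000}, {"Salary": 7500},
             {"Salary": 12000}]
disks = range_partition(employees, [5000, 10000], "Salary")
# disks[0]: salaries 3000 and 5000; disks[1]: 7500; disks[2]: 12000
```

A range query such as Query D then needs to scan only the disks whose boundary interval overlaps the requested range.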

  40. Advantages: Sequential scans can be done in parallel using all the processors, so Query C executes efficiently. A well-balanced data distribution is possible only if a good partitioning vector is chosen, that is, one that splits the records at least roughly equally across the disks.

  41. Point queries are handled efficiently, just as with hash partitioning (compared with the round-robin technique): Queries A and B execute efficiently. For range queries, depending on the range given in the query, only one or a few disks need to be scanned in full, so range queries are also handled efficiently: Query D executes efficiently (provided no execution skew is present). The result is higher throughput and good response time.

  42. Disadvantages: If a query specifies a large range of values, that is, if the result consists of a large number of records concentrated on very few of the available disks, an I/O bottleneck may arise on those disks. This is called execution skew. The same query executed over round-robin- or hash-partitioned disks could perform better than over range-partitioned ones.
