Dangling Tuples

Slide Note

Learn about dangling tuples in DBMS, which are tuples that do not participate in a natural join, potentially indicating consistency issues in the database. Referential integrity constraints play a key role in identifying and addressing dangling tuples.

ewan Follow

Uploaded on Mar 06, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

Dangling Tuples

What is Dangling Tuples In DBMS if there is a tuple that does not participate in a natural join we called it as dangling tuple . It may gives indication consistency problem in the database. Another definition of dangling problem tuple is that a tuple with a foreign key value that not appear in the referenced relation is known as dangling tuple. In DBMS Referential integrity constraints specify us exactly when dangling tuples indicate problem.

Multivalued Dependency Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute. A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it always requires at least three attributes. BIKE_MODEL MANUF_YEAR COLOR M2011 2008 White M2001 2008 Black M3001 2013 White M3001 2013 Black M4006 2017 White M4006 2017 Black Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.

Transaction System The transaction refers to a small unit of any given program that consists of various low-level tasks. Every transaction in DBMS must maintain ACID A (Atomicity), C (Consistency), I (Isolation), D (Durability). One must maintain ACID so as to ensure completeness, accuracy, and integrity of data.

ACID Properties

Atomicity: By this, we mean that either the entire transaction takes place at once or doesn t happen at all. There is no midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations. Abort: If a transaction aborts, changes made to the database are not visible. Commit: If a transaction commits, changes made are visible. Atomicity is also known as the All or nothing rule . Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y.

If the transaction fails after completion of T1 but before completion of T2.( say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the example above, The total amount before and after the transaction must be maintained. Total before T occurs Total after T occurs = Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result, T is incomplete. = 500 400 + + 200 300 = = 700. 700.

Isolation: This concurrently without leading to the inconsistency of the database state. Transactions occur independently Changes occurring in a particular transaction will not be visible to any other transaction until that particular change in that transaction is written to memory or has been committed. This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state achieved these were executed serially in some Let X= 500, Y = 500. property ensures that multiple transactions can occur without interference. order.

Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs. These updates now become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are never lost.

Important points: Responsibility for maintaining properties Property Atomicity Transaction Manager Consistency Application programmer Isolation Concurrency Control Manager Durability Recovery Manager

Serializability Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule. It helps in maintaining the transactions to execute simultaneously without interleaving one another.

Schedules and Serializable Schedules in DBMS Schedules are a series of operations performing one transaction to the other. R(X) means Reading the value: X; and W(X) means Writing the value: X.

Schedules in DBMS are of two types: Serial Schedule -A schedule in which only one transaction is executed at a time, i.e., one transaction is executed completely before starting another transaction.

Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) W(b) R(a) W(a)

Non-serial Schedule- A schedule in which the transactions are interleaving or interchanging. There are several transactions executing simultaneously as they are being used in performing real-world database operations. These transactions may be working on the same piece of data. Hence, the serializability of non-serial schedules is a major concern so that our database is consistent before and after the execution of the transactions.

Example Transaction-1 Transaction-2 R(a) W(a) R(b) W(b) R(b) R(a) W(b) W(a)

What is a serializable schedule? A non-serial schedule is called a serializable schedule if it can be converted to its equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule result in the same then the non-serial schedule is called a serializable schedule.

Testing of Serializability To test the serializability of a schedule, we can use Serialization Graph or Precedence Graph. A serialization Graph is nothing but a Directed Graph ofthe entire transactions ofa schedule. It can be defined as a Graph G(V, E) consisting of a set of directed- edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3, ...,Vn}. The set of edges contains one of the two operations - READ, WRITE performed by acertain transaction.

Cont.. Ti -> Tj, means Transaction-Ti is either performing read or write before the transaction-Tj.

Example If there is a cycle present in the serialized graph then the schedule is non-serializable because the cycle resembles that one transaction is dependent on the other transaction and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, no-cycle means that the non-serial schedule is serializable.

What is a conflicting pair in transactions? Two operations inside a schedule are called conflicting if they meet these three conditions: 1.They belong to two different transactions. 2.They are working on the same data piece. 3.One of them is performing the WRITE operation. To conclude, let s take two operations on data: "a". The conflicting pairs are: 1.READ(a) - WRITE(a) 2.WRITE(a) - WRITE(a) 3.WRITE(a) - READ(a)

Recoverability Recoverability A transaction may not execute completely due to hardware failure, system crash or software issues. In that case, we have to roll back the failed transaction. But some other transaction may also have used values produced by the failed transaction. So, we have to roll back those transactions as well.

Recoverable Schedules Schedules in which transactions commit only after all transactions whose changes they read commit are called recoverable schedules. In other words, if some transaction Tjis reading value updated or written by some other transaction Ti, then the commit of Tjmust occur after the commit of Ti. Example- Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e. completion of first transaction performed first update on data item x, hence given schedule is recoverable. S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;

Example 2 Example 2 Consider the following schedule involving two transactions T1and T2. T1 T2 R(A) W(A) W(A) R(A) commit commit This is a recoverable schedule since T1commits before T2, that makes the value read by T2correct.

Irrecoverable Schedule The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. T2 commits. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. But we have already committed that. So this schedule is irrecoverable schedule. When Tj is reading the value updated by Ti and Tj is committed before committing of Ti, the schedule will be irrecoverable.

Recoverable with Cascading Rollback The table below shows a schedule with two transactions, T1 reads and writes A and that value is read and written by T2. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value written by T1, it should also be rollbacked. As it has not committed, we can rollback T2 as well. So it is recoverable with cascading rollback. Therefore, if Tj is reading value updated by Ti and commit of Tj is delayed till commit of Ti, the schedule is called recoverable with cascading rollback.

Cascadeless Recoverable Rollback The table below shows a schedule with two transactions, T1 reads and writes A and commits and that value is read by T2. But if T1 fails before commit, no other transaction has read its value, so there is no need to rollback other transaction. So this is a Cascadeless recoverable schedule. So, if Tj reads value updated by Ti only after Ti is committed, the schedule will be cascadeless recoverable.

Log and log records The log is a sequence of log records, recording all the update activities in the database. In a stable storage, logs for each transaction are maintained. Any operation which is performed on the database is recorded is on the log. Prior to performing any modification to database, an update log record is created to reflect that modification. An update log record represented as: <Ti, Xj, V1, V2> has these fields: 1.Transaction identifier: Unique Identifier of the transaction that performed the write operation. 2.Data item: Unique identifier of the data item written. 3.Old value: Value of data item prior to write. 4.New value: Value of data item after write operation.

Other type of log records are: 1.<Ti start>: It contains information about when a transaction Ti starts. 2.<Ti commit>: It contains information about when a transaction Ti commits. 3.<Ti abort>: It contains information about when a transaction Ti aborts.

Undo and Redo Operations Undo and Redo Operations Because all database modifications must be preceded by creation of log record, the system has available both the old value prior to modification of data item and new value that is to be written for data item. This allows system to perform redo and undo operations as appropriate: 1.Undo: using a log record sets the data item specified in log record to old value. 2.Redo: using a log record sets the data item specified in log record to new value.

Checkpoint The checkpoint is a type of mechanism where all the previous logs are removed from the system and permanently stored in the storage disk. The checkpoint is like a bookmark. While the execution of the transaction, such checkpoints are marked, and the transaction is executed then using the steps of the transaction, the log files will be created. When it reaches to the checkpoint, then the transaction will be updated into the database, and till that point, the entire log file will be removed from the file. Then the log file is updated with the new step of transaction till next checkpoint and so on. The checkpoint is used to declare a point before which the DBMS was in the consistent state, and all transactions were committed.

Recovery using Checkpoint The recovery system reads log files from the end to start. It reads log files from T4 to T1. Recovery system maintains two lists, a redo-list, and an undo-list. The transaction is put into redo state if the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>. In the redo-list and their previous list, all the transactions are removed and then redone before saving their logs.

Concurrency Control Concurrency Control is provided in a database to: Enforce isolation among transactions. Preserve database consistency through consistency preserving execution of transactions. Resolve read-write and write-read conflicts.

Concurrency Control Techniques are 1.Two-phase locking Protocol 2.Time stamp ordering Protocol 3.Multi version concurrency control 4.Validation concurrency control

Two Phase Locking Locking is an operation which secures: permission to read, OR permission to write a data item. Two phase locking is a process used to gain ownership of shared resources without creating the possibility of deadlock. The 3 activities taking place in the two phase update algorithm are: (i) Lock Acquisition (ii) Modification of Data (iii) Release Lock

Cont.. Two phase locking prevents deadlock from occurring in distributed systems by releasing all the resources it has acquired, if it is not possible to acquire all the resources required without waiting for another process to finish using a lock. This means that no process is ever in a state where it is holding some shared resources, and waiting for another process to release a shared resource which it requires. This means that deadlock cannot occur due to resource contention. A transaction in the Two Phase Locking Protocol can assume one of the 2 phases:

Cont.. Growing Phase: In this phase a transaction can only acquire locks but cannot release any lock. The point when a transaction acquires all the locks it needs is called the Lock Point. Shrinking Phase: In this phase a transaction can only release locks but cannot acquire any.

Time Stamp Ordering A timestamp is a tag that can be attached to any transaction or any data item, which denotes a specific time on which the transaction or the data item had been used in any way. A timestamp can be implemented in 2 ways. One is to directly assign the current value of the clock to the transaction or data item. The other is to attach the value of a logical counter that keeps increment as new timestamps are required. The timestamp of a data item can be of 2 types: (i) W-timestamp(X): This means the latest time when the data item X has been written into. (ii) R-timestamp(X): This means the latest time when the data item X has been read from. These 2 timestamps are updated each time a successful read/write operation is performed on the data item X.

Multiversion Concurrency Control Multiversion schemes keep old versions of data item to increase concurrency. Multiversion 2 phase locking: Each successful write results in the creation of a new version of the data item written. Timestamps are used to label the versions. When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction

Validation Concurrency Control The optimistic approach is based on the assumption that the majority of the database operations do not conflict. The optimistic approach requires neither locking nor time stamping techniques. Instead, a transaction is executed without restrictions until it is committed. Using an optimistic approach, each transaction moves through 2 or 3 phases, referred to as read, validation and write.

Cont.. (i) During read phase, the transaction reads the database, executes the needed computations and makes the updates to a private copy of the database values. All update operations of the transactions are recorded in a temporary update file, which is not accessed by the remaining transactions. (ii) During the validation phase, the transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to a write phase. If the validation test is negative, he transaction is restarted and the changes are discarded. (iii) During the write phase, the changes are permanently applied to the database.

RDBMS A relational database is a type of database that stores and provides access to data points that are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.

RDBMS DBMS Data stored is in table format Data stored is in the file format Multiple data elements are accessible together Individual access of data elements Data in the form of a table are linked together No connection between data Normalisation is not achievable There is normalisation Support distributed database No support for distributed database Data is stored in a large amount Data stored is a small quantity

Here, redundancy of data is reduced with the help of key and indexes in RDBMS Data redundancy is common RDBMS supports multiple users DBMS supports a single user It features multiple layers of security while handling data There is only low security while handling data The software and hardware requirements are higher The software and hardware requirements are low Oracle, SQL Server. XML, Microsoft Access.

Concept of Table Spaces A table space is a storage structure containing tables, indexes, large objects, and long data. They are used to organize data in a database into logical storage groupings that relate to where data is stored on a system. Table spaces are stored in database partition groups.

Segments, Extents and Block Oracle allocates database space for all data in a database. The units of logical database allocation are data blocks, extents, and segments. The following illustration shows the relationships between these data structures:

Data Blocks Data Blocks At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on disk. You set the data block size for every Oracle database when you create the database. This data block size should be a multiple of the operating system's block size within the maximum limit. Oracle data blocks are the smallest units of storage that Oracle can use or allocate. In contrast, all data at the physical, operating system level is stored in bytes. Each operating system has what is called a block size. Oracle requests data in multiples of Oracle blocks, not operating system blocks. Therefore, you should set the Oracle block size to a multiple of the operating system block size to avoid unnecessary I/O. Oracle manages the storage space in the datafiles of a database in units called data blocks. A data block is the smallest unit of I/O used by a database.

Extents The next level of logical database space is called an extent. An extent is a specific number of contiguous data blocks that is allocated for storing a specific type of information.

Dangling Tuples

Download Presentation

Presentation Transcript

Related

More Related Content