Database System Concepts Interview Questions & Answers

  • Question 1. List Four Significant Differences Between A File-processing System And A Dbms?

    Answer :

    Some main differences between a database management system and a file-processing system are:

    • Both systems contain a collection of data and a set of programs which access that data. A database management system coordinates both the physical and the logical access to the data, whereas a file-processing system coordinates only the physical access.
• A database management system reduces the amount of data duplication by ensuring that a physical piece of data is available to all programs authorized to have access to it, whereas data written by one program in a file-processing system may not be readable by another program.
    • A database management system is designed to allow flexible access to data (i.e., queries), whereas a file-processing system is designed to allow predetermined access to data (i.e., compiled programs).
• A database management system is designed to coordinate multiple users accessing the same data at the same time. A file-processing system is usually designed to allow one or more programs to access different data files at the same time. In a file-processing system, a file can be accessed by two programs concurrently only if both programs have read-only access to the file.
  • Question 2. This Chapter Has Described Several Major Advantages Of A Database System. What Are Two Disadvantages?

    Answer :

    Two disadvantages associated with database systems are listed below.

    1. Setup of the database system requires more knowledge, money, skills, and time.
    2. The complexity of the database may result in poor performance.

  • Question 3. Explain The Difference Between Physical And Logical Data Independence?

    Answer :

    • Physical data independence is the ability to modify the physical scheme without making it necessary to rewrite application programs. Such modifications include changing from unblocked to blocked record storage, or from sequential to random access files.
    • Logical data independence is the ability to modify the conceptual scheme without making it necessary to rewrite application programs. Such a modification might be adding a field to a record; an application program’s view hides this change from the program.
  • Question 4. List Five Responsibilities Of A Database Management System?

    Answer :

    A general purpose database manager (DBM) has five responsibilities:

    1. interaction with the file manager.
    2. integrity enforcement.
    3. security enforcement.
    4. backup and recovery.
    5. concurrency control.

  • Question 5. What Are Five Main Functions Of A Database Administrator?

    Answer :

    Five main functions of a database administrator are:

    • To create the scheme definition
    • To define the storage structure and access methods
    • To modify the scheme and/or physical organization when necessary
    • To grant authorization for data access
    • To specify integrity constraints

  • Question 6. List Seven Programming Languages That Are Procedural And Two That Are Non Procedural. Which Group Is Easier To Learn And Use?

    Answer :

    Programming language classification:

    • Procedural: C, C++, Java, Basic, Fortran, Cobol, Pascal
    • Non-procedural: Lisp and Prolog

    Note: Lisp and Prolog support some procedural constructs, but the core of both these languages is non-procedural. In theory, non-procedural languages are easier to learn, because they let the programmer concentrate on what needs to be done, rather than how to do it. This is not always true in practice, especially if procedural languages are learned first.

  • Question 7. List Six Major Steps That You Would Take In Setting Up A Database For A Particular Enterprise?

    Answer :

    Six major steps in setting up a database for a particular enterprise are:

• Define the high-level requirements of the enterprise (this step generates a document known as the system requirements specification).
    • Define a model containing all appropriate types of data and data relationships.
    • Define the integrity constraints on the data.
    • Define the physical level.
    • For each known problem to be solved on a regular basis (e.g., tasks to be carried out by clerks or Web users) define a user interface to carry out the task, and write the necessary application programs to implement the user interface.
    • Create/initialize the database.

• Question 8. Consider A Two-dimensional Integer Array Of Size N × M That Is To Be Used In Your Favorite Programming Language. Using The Array As An Example, Illustrate The Difference?

    1. Between The Three Levels Of Data Abstraction, And
2. Between A Schema And Instances?

    Answer :

    Let tgrid be a two-dimensional integer array of size n × m.

      • The physical level would simply be m × n (probably consecutive) storage locations of whatever size is specified by the implementation (e.g., 32 bits each).
      • The conceptual level is a grid of boxes, each possibly containing an integer, which is n boxes high by m boxes wide.
• There are 2^(m×n) possible views. For example, a view might be the entire array, a particular row of the array, or all n rows but only columns 1 through i.
      • Consider the following Pascal declarations:

          type tgrid = array[1..n, 1..m] of integer;
          var vgrid1, vgrid2 : tgrid;

        Then tgrid is a schema, whereas the values of the variables vgrid1 and vgrid2 are instances.
      • To illustrate further, consider the schema array[1..2, 1..2] of integer. Two instances of this schema are, for example:

          | 1  2 |      | 17  90 |
          | 3  4 |      | 412  8 |
  • Question 9. Explain The Distinctions Among The Terms Primary Key, Candidate Key, And Super Key?

    Answer :

A super key is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set. A super key may contain extraneous attributes: if K is a super key, then so is any superset of K. A super key for which no proper subset is also a super key is called a candidate key. It is possible that several distinct sets of attributes could serve as candidate keys. The primary key is one of the candidate keys that is chosen by the database designer as the principal means of identifying entities within an entity set.
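
    To make the distinction concrete, here is a minimal SQL sketch; the employee table and its attributes are invented for illustration:

      create table employee (
        ssn    char(9),
        email  varchar(50) not null,
        name   varchar(30),
        primary key (ssn),  -- the candidate key chosen as primary key
        unique (email)      -- another candidate key
      )

    Here {ssn} and {email} are both candidate keys; {ssn, name} is a super key but not a candidate key, since its proper subset {ssn} is already a super key.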

  • Question 10. Construct An E-r Diagram For A Car-insurance Company Whose Customers Own One Or More Cars Each. Each Car Has Associated With It Zero To Any Number Of Recorded Accidents?

Answer :

    (The answer in the source is an E-R diagram, not reproduced here: customer, car, and accident entity sets, with customers owning one or more cars and each car participating in zero or more recorded accidents.)


• Question 11. We Can Convert Any Weak Entity Set To A Strong Entity Set By Simply Adding Appropriate Attributes. Why, Then, Do We Have Weak Entity Sets?

    Answer :

    We have weak entities for several reasons:

    • We want to avoid the data duplication and consequent possible inconsistencies caused by duplicating the key of the strong entity.
    • Weak entities reflect the logical structure of an entity being dependent on another entity.
    • Weak entities can be deleted automatically when their strong entity is deleted.
    • Weak entities can be stored physically with their strong entities.

• Question 12. An E-r Diagram Can Be Viewed As A Graph. What Do The Following Mean In Terms Of The Structure Of An Enterprise Schema?

1. The Graph Is Disconnected.
    2. The Graph Is Acyclic.

    Answer :

    1. If a pair of entity sets are connected by a path in an E-R diagram, the entity sets are related, though perhaps indirectly. A disconnected graph implies that there are pairs of entity sets that are unrelated to each other. If we split the graph into connected components, we have, in effect, a separate database corresponding to each connected component.
    2. As indicated in the answer to the previous part, a path in the graph between a pair of entity sets indicates a (possibly indirect) relationship between the two entity sets. If there is a cycle in the graph then every pair of entity sets on the cycle are related to each other in at least two distinct ways. If the E-R diagram is acyclic then there is a unique path between every pair of entity sets and, thus, a unique relationship between every pair of entity sets.

  • Question 13. A Weak Entity Set Can Always Be Made Into A Strong Entity Set By Adding To Its Attributes The Primary Key Attributes Of Its Identifying Entity Set. Outline What Sort Of Redundancy Will Result If We Do So?

    Answer :

    The primary key of a weak entity set can be inferred from its relationship with the strong entity set. If we add primary key attributes to the weak entity set, they will be present in both the entity set and the relationship set and they have to be the same. Hence there will be redundancy.

  • Question 14. Explain The Distinction Between Condition-defined And User-defined Constraints. Which Of These Constraints Can The System Check Automatically?

    Answer :

In a generalization–specialization hierarchy, it must be possible to decide which entities are members of which lower-level entity sets. In a condition-defined design constraint, membership in the lower-level entity sets is evaluated on the basis of whether or not an entity satisfies an explicit condition or predicate. User-defined lower-level entity sets are not constrained by a membership condition; rather, entities are assigned to a given entity set by the database user.

    Condition-defined constraints alone can be automatically handled by the system. Whenever any tuple is inserted into the database, its membership in the various lower level entity-sets can be automatically decided by evaluating the respective membership predicates. Similarly when a tuple is updated, its membership in the various entity sets can be re-evaluated automatically.

  • Question 15. Explain The Distinction Between Total And Partial Constraints?

    Answer :

    In a total design constraint, each higher-level entity must belong to a lower-level entity set. The same need not be true in a partial design constraint. For instance, some employees may belong to no work-team.

  • Question 16. Design A Relational Database For A University Registrar’s Office. The Office Maintains Data About Each Class, Including The Instructor, The Number Of Students Enrolled, And The Time And Place Of The Class Meetings. For Each Student-class Pair, A Grade Is Recorded?

    Answer :

In the original answer, underlined attributes indicate the primary key of each relation; the underlining is not reproduced in this rendering.

    student (student-id, name, program)
    course (courseno, title, syllabus, credits)
    course-offering (courseno, secno, year, semester, time, room)
    instructor (instructor-id, name, dept, title)
    enrols (student-id, courseno, secno, semester, year, grade)
    teaches (courseno, secno, semester, year, instructor-id)
    requires (maincourse, prerequisite)
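
    A hedged DDL sketch for two of these relations follows; the attribute types are assumptions, and the hyphenated names follow the book's convention (a real system would typically use underscores):

      create table course (
        courseno  varchar(10),
        title     varchar(50),
        syllabus  varchar(1000),
        credits   int,
        primary key (courseno)
      )

      create table enrols (
        student-id varchar(10),
        courseno   varchar(10),
        secno      int,
        semester   varchar(8),
        year       int,
        grade      varchar(2),
        primary key (student-id, courseno, secno, semester, year)
      )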

  • Question 17. List Two Reasons Why We May Choose To Define A View?

    Answer :

1. Security conditions may require that the entire logical database not be visible to all users.
    2. We may wish to create a personalized collection of relations that is better matched to a certain user’s intuition than is the actual logical model.

  • Question 18. List Two Major Problems With Processing Update Operations Expressed In Terms Of Views?

    Answer :

    Views present significant problems if updates are expressed with them. The difficulty is that a modification to the database expressed in terms of a view must be translated to a modification to the actual relations in the logical model of the database.

    1. Since the view may not have all the attributes of the underlying tables, insertion of a tuple into the view will insert tuples into the underlying tables, with those attributes not participating in the view getting null values. This may not be desirable, especially if the attribute in question is part of the primary key of the table.
    2. If a view is a join of several underlying tables and an insertion results in tuples with nulls in the join columns, the desired effect of the insertion will not be achieved. In other words, an update to a view may not be expressible at all as updates to base relations.

  • Question 19. List Two Reasons Why Null Values Might Be Introduced Into The Database?

    Answer :

    Nulls may be introduced into the database because the actual value is either unknown or does not exist. For example, an employee whose address has changed and whose new address is not yet known should be retained with a null address. If employee tuples have a composite attribute dependents, and a particular employee has no dependents, then that tuple’s dependents attribute should be given a null value.
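
    A minimal sketch of the first case (the employee relation and attribute names are assumed for illustration):

      -- Jones has moved; the new address is not yet known
      update employee
        set address = null
        where employee-name = 'Jones'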

  • Question 20. Show That, In Sql, <> All Is Identical To Not In?

    Answer :

Let the set S denote the result of an SQL subquery. We compare (x <> all S) with (x not in S). If a particular value x1 satisfies (x1 <> all S), then for every element y of S, x1 ≠ y. Thus x1 is not a member of S and must satisfy (x1 not in S). Similarly, suppose there is a particular value x2 which satisfies (x2 not in S). It cannot be equal to any element w belonging to S, and hence (x2 <> all S) will be satisfied. Therefore the two expressions are equivalent.
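
    For instance, using the book's bank schema (attribute names assumed from that schema), the following two queries return the same customers:

      select customer-name
      from customer
      where customer-city <> all (select branch-city from branch)

      select customer-name
      from customer
      where customer-city not in (select branch-city from branch)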

  • Question 21. Consider The Relational Database Of Figure 4.13. Using Sql, Define A View Consisting Of Manager-name And The Average Salary Of All Employees Who Work For That Manager. Explain Why The Database System Should Not Allow Updates To Be Expressed In Terms Of This View?

    Answer :

create view salinfo as
        select manager-name, avg(salary)
        from manages m, works w
        where m.employee-name = w.employee-name
        group by manager-name

Updates should not be allowed in this view because there is no way to determine how to change the underlying data. For example, suppose the request is "change the average salary of employees working for Smith to $200". Should everybody who works for Smith have their salary changed to $200? Or should the first (or more, if necessary) employee found who works for Smith have their salary adjusted so that the average is $200? Neither approach really makes sense.

  • Question 22. Suppose There Are Two Relations R And S, Such That The Foreign Key B Of R References The Primary Key A Of S. Describe How The Trigger Mechanism Can Be Used To Implement The On Delete Cascade Option, When A Tuple Is Deleted From S?

    Answer :

We define triggers for each relation whose primary key is referred to by the foreign key of some other relation. The trigger would be activated whenever a tuple is deleted from the referred-to relation. The action performed by the trigger would be to visit all the referring relations, and delete all the tuples in them whose foreign-key attribute value is the same as the primary-key attribute value of the deleted tuple in the referred-to relation. This set of triggers will take care of the on delete cascade operation.
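
    A sketch in the same trigger style used later in these answers, with r, s, B, and A as named in the question:

      create trigger cascade-delete-trigger after delete on s
        referencing old row as orow
        for each row
          delete from r
          where r.B = orow.A
      end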

  • Question 23. Write An Assertion For The Bank Database To Ensure That The Assets Value For The Perry Ridge Branch Is Equal To The Sum Of All The Amounts Lent By The Perry Ridge Branch?

    Answer :

The assertion name is arbitrary; we have chosen the name perry. Note that since the assertion applies only to the Perryridge branch, we must restrict attention to the Perryridge tuple of the branch relation rather than writing a constraint on the entire relation.

create assertion perry check
        (not exists (select *
                     from branch
                     where branch-name = 'Perryridge' and
                           assets <> (select sum (amount)
                                      from loan
                                      where branch-name = 'Perryridge')))

  • Question 24. Write An Sql Trigger To Carry Out The Following Action: On Delete Of An Account, For Each Owner Of The Account, Check If The Owner Has Any Remaining Accounts, And If She Does Not, Delete Her From The Depositor Relation?

    Answer :

create trigger check-delete-trigger after delete on account
        referencing old row as orow
        for each row
          delete from depositor
          where depositor.customer-name not in
            (select customer-name from depositor
             where account-number <> orow.account-number)
      end

  • Question 25. For Each Of The Views, Explain How Updates Would Be Performed (if They Should Be Allowed At All)?

    Answer :

    To insert (account-number, name) into the view deer-park we insert the tuple (Deer Park, account-number, null) into the account relation and the tuple (name, account-number) into the depositor relation.

    Updates to the views no-debt and avg-bal present serious problems. If we insert into the no-debt view, the system must reject the insertion if the customer has a loan. The overhead of updating through this view is so high that most systems would disallow update. The avg-bal view cannot be updated since the result of an aggregate operation depends on several tuples, not just one.

  • Question 26. We Described The Use Of Views To Simplify Access To The Database By Users Who Need To See Only Part Of The Database. The Use Of Views As A Security Mechanism. Do These Two Purposes For Views Ever Conflict?

    Answer :

Usually, a well-designed view and security mechanism can avoid conflicts between ease of access and security. However, as the following example shows, the two purposes can conflict if the mechanisms are not designed carefully.

    Suppose we have a database of employee data and a user whose view involves employee data for employees earning less than $10,000. If this user inserts employee Jones, whose salary is $9,000, but accidentally enters $90,000, several existing database systems will accept this update as a valid update through a view. However, the user will be denied access to delete this erroneous tuple by the security mechanism.

  • Question 27. What Is The Purpose Of Having Separate Categories For Index Authorization And Resource Authorization?

    Answer :

    Index and resource authorization should be special categories to allow certain users to create relations (and the indices to operate on them) while preventing these time-consuming and schema-changing operations from being available to many users. Separating index and resource authorization allows a user to build an index on existing relations, say, for optimization purposes, but allows us to deny that user the right to create new relations.

  • Question 28. Database Systems That Store Each Relation In A Separate Operating System File May Use The Operating System’s Security And Authorization Scheme, Instead Of Defining A Special Scheme Themselves. Discuss An Advantage And A Disadvantage Of Such An Approach?

    Answer :

    Database systems have special requirements which are typically more refined than most operating systems. For example, a single user may have different privileges on different files throughout the system, including changing indices and attributes which file systems typically don’t monitor. The advantage of using the operating system’s security mechanism is that it simplifies the database system and can be used for simple (read/write) security measures.

  • Question 29. What Are Two Advantages Of Encrypting Data Stored In The Database?

    Answer :

    1. Encrypted data allows authorized users to access data without worrying about other users or the system administrator gaining any information.
2. Encryption of data may simplify or even strengthen other authorization mechanisms. For example, distribution of the cryptographic key amongst only trusted users is both a simple way to control read access and an added layer of security above that offered by views.

• Question 30. Perhaps The Most Important Data Items In Any Database System Are The Passwords That Control Access To The Database. Suggest A Scheme For The Secure Storage Of Passwords. Be Sure That Your Scheme Allows The System To Test Passwords Supplied By Users Who Are Attempting To Log Into The System?

    Answer :

    A scheme for storing passwords would be to encrypt each password, and then use a hash index on the user-id. The user-id can be used to easily access the encrypted password. The password being used in a login attempt is then encrypted and compared with the stored encryption of the correct password. An advantage of this scheme is that passwords are not stored in clear text and the code for decryption need not even exist!
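
    A minimal sketch of such a scheme, assuming a one-way function h available to the system; h, the table layout, and the :uid/:pw host parameters are illustrative, not a specific product's API:

      create table users (
        user-id        varchar(20) primary key,
        password-hash  varchar(64)   -- h(password); the clear text is never stored
      )

      -- login check for a supplied user-id :uid and password :pw:
      -- the login succeeds if this query returns a row
      select user-id
      from users
      where user-id = :uid and password-hash = h(:pw)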

  • Question 31. Explain What Is Meant By Repetition Of Information And Inability To Represent Information. Explain Why Each Of These Properties May Indicate A Bad Relational Database Design?

    Answer :

    • Repetition of information is a condition in a relational database where the values of one attribute are determined by the values of another attribute in the same relation, and both values are repeated throughout the relation. This is a bad relational database design because it increases the storage required for the relation and it makes updating the relation more difficult.
• Inability to represent information is a condition where a relationship exists among only a proper subset of the attributes in a relation. This is bad relational database design because all the unrelated attributes must be filled with null values; otherwise a tuple without the unrelated information cannot be inserted into the relation.
    • Loss of information is a condition of a relational database which results from the decomposition of one relation into two relations that cannot be combined to recreate the original relation. It is a bad relational database design because certain queries cannot be answered using the reconstructed relation that could have been answered using the original relation.
  • Question 32. Why Are Certain Functional Dependencies Called Trivial Functional Dependencies?

    Answer :

Certain functional dependencies are called trivial functional dependencies because they are satisfied by all relations: a dependency α → β is trivial precisely when β ⊆ α. For example, A → A is satisfied by every relation involving attribute A.

  • Question 33. In Designing A Relational Database, Why Might We Choose A Non-bcnf Design?

    Answer :

    BCNF is not always dependency preserving. Therefore, we may want to choose another normal form (specifically, 3NF) in order to make checking dependencies easier during updates. This would avoid joins to check dependencies and increase system performance.

  • Question 34. Explain Why 4nf Is A Normal Form More Desirable Than Bcnf?

    Answer :

4NF is more desirable than BCNF because it reduces the repetition of information. If we consider a BCNF schema not in 4NF (see Exercise 7.28), we observe that decomposition into 4NF does not lose information provided that a lossless-join decomposition is used, yet redundancy is reduced.

  • Question 35. Explain How Dangling Tuples May Arise. Explain Problems That They May Cause?

    Answer :

Dangling tuples can arise when one tuple is inserted into a decomposed relation but no corresponding tuple is inserted into the other relations in the decomposition. They can cause incorrect values to be returned by queries which form the join of a decomposed relation, since the dangling tuple might not be included. Dangling tuples can be avoided by the specification of referential-integrity constraints.

  • Question 36. For Each Of The Following Application Areas, Explain Why A Relational Database System Would Be Inadequate. List All Specific System Components That Would Need To Be Modified?

    1. computer-aided Design
    2. multimedia Databases

    Answer :

    Each of the applications includes large, specialized data items (e.g., a program module, a graphic image, digitized voice, a document). These data items have operations specific to them (e.g., compile, rotate, play, format) that cannot be expressed in relational query languages. These data items are of variable length making it impractical to store them in the short fields that are allowed in records for such database systems. Thus, the data model, data manipulation language, and data definition language need to be changed. Also, long-duration and nested transactions are typical of these applications. Changes to the concurrency and recovery subsystems are likely to be needed.

  • Question 37. How Does The Concept Of An Object In The Object-oriented Model Differ From The Concept Of An Entity In The Entity-relationship Model?

    Answer :

    An entity is simply a collection of variables or data items. An object is an encapsulation of data as well as the methods (code) to operate on the data. The data members of an object are directly visible only to its methods. The outside world can gain access to the object’s data only by passing pre-defined messages to it, and these messages are implemented by the methods.

  • Question 38. Explain Why Ambiguity Potentially Exists With Multiple Inheritance?

    Answer :

    A class inherits the variables and methods of all its immediate super classes. Thus it could inherit a variable or method of the same name from more than one super-class. When that particular variable or method of an object of the sub-class is referenced, there is an ambiguity regarding which of the super classes provides the inheritance.

For instance, let there be classes teacher and student, both having a variable department. If a class teachingAssistant inherits from both of these classes, any reference to the department variable of a teachingAssistant object is ambiguous.

  • Question 39. Explain How The Concept Of Object Identity In The Object-oriented Model Differs From The Concept Of Tuple Equality In The Relational Model?

    Answer :

    Tuple equality is determined by data values. Object identity is independent of data values, since object-oriented systems use built-in identity.

  • Question 40. Explain The Distinction In Meaning Between Edges In A Dag Representing Inheritance And A Dag Representing Object Containment?

    Answer :

An edge from class A to class B in the DAG representing inheritance means that an object of class B is also an object of class A. It has all the properties that objects of class A have, plus additional ones of its own. In particular, it inherits all the variables and methods of class A. It can of course provide its own implementations for the inherited methods. An edge from class A to class B in the object-containment DAG means that an object of class A contains an object of class B. There need not be any similarities in the properties of A and B. Neither B nor A inherits anything from the other. They function as independent types, to the extent that an object of class A can access the variables of the B object contained in it only via the B object's methods.

  • Question 41. Why Do Persistent Programming Languages Allow Transient Objects? Might It Be Simpler To Use Only Persistent Objects, With Unneeded Objects Deleted At The End Of An Execution?

    Answer :

Creation, destruction and access will typically be more time-consuming and expensive for persistent objects stored in the database than for transient objects in the transaction's local memory. This is because of the overheads in preserving transaction semantics, security and integrity. Since a transient object is purely local to the transaction which created it and does not enter the database, all these overheads are avoided. Thus, in order to provide efficient access to purely local and temporary data, transient objects are provided by persistent programming languages.

  • Question 42. Explain How A Persistent Pointer Is Implemented. Contrast This Implementation With That Of Pointers As They Exist In General-purpose Languages, Such As C Or Pascal?

    Answer :

    Persistent pointers can be implemented as Abstract Data Types (ADTs). These ADTs should provide the typical pointer operations like incrementing and dereferencing, so their usage and regular pointer usage is uniform. Regular pointers on the other hand are usually built-in types, implemented as part of the language.

  • Question 43. If An Object Is Created Without Any References To It, How Can That Object Be Deleted?

    Answer :

If an object is created without any references to it, it can neither be accessed nor deleted via a program. The only way is for the database system to locate and delete such objects by itself. This is called garbage collection. One way to do garbage collection is by the method of mark and sweep. First, the objects referred to directly by programs are marked. Then references from these objects to other objects are followed, and those referred objects are marked. This procedure is followed repeatedly until no more unmarked objects can be reached by following reference chains from the marked objects. At this point, all the remaining unmarked objects are deleted. This method is correct; we can prove that if no new objects are marked after a round of mark and sweep, the remaining unmarked objects are indeed unreferenced.

  • Question 44. Consider A System That Provides Persistent Objects. Is Such A System Necessarily A Database System?

    Answer :

    A database system must provide for such features as transactions, queries (associative retrieval of objects), security, and integrity. A persistent object system may not offer such features.

  • Question 45. Explain The Distinction Between A Type X And A Reference Type Ref(x). Under What Circumstances Would You Choose To Use A Reference Type?

    Answer :

If the type of an attribute is x, then in each tuple of the table, corresponding to that attribute, there is an actual object of type x. If its type is ref(x), then in each tuple, corresponding to that attribute, there is a reference to some object of type x. We choose a reference type for an attribute if that attribute's intended purpose is to refer to an independent object.

  • Question 46. Compare The Use Of Embedded Sql With The Use In Sql Of Functions Defined In A General-purpose Programming Language. Under What Circumstances Would You Use Each Of These Features?

    Answer :

    SQL functions are primarily a mechanism for extending the power of SQL to handle attributes of complex data types (like images), or to perform complex and non-standard operations. Embedded SQL is useful when imperative actions like displaying results and interacting with the user are needed. These cannot be done conveniently in an SQL only environment. Embedded SQL can be used instead of SQL functions by retrieving data and then performing the function’s operations on the SQL result. However a drawback is that a lot of query-evaluation functionality may end up getting repeated in the host language code.
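
    As an illustration, here is an SQL function in roughly the SQL:1999 style used by the book; the authors relation and the exact syntax are assumptions, and dialects vary:

      create function author-count(title varchar(20))
        returns integer
        begin
          declare a-count integer;
          select count(author) into a-count
          from authors
          where authors.title = title;
          return a-count;
        end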

  • Question 47. List The Physical Storage Media Available On The Computers You Use Routinely. Give The Speed With Which Data Can Be Accessed On Each Medium?

    Answer :

    Your answer will be based on the computers and storage media that you use. Typical examples would be hard disk, floppy disks and CD-ROM drives.

  • Question 48. How Does The Remapping Of Bad Sectors By Disk Controllers Affect Data-retrieval Rates?

    Answer :

    Remapping of bad sectors by disk controllers does reduce data retrieval rates because of the loss of sequentiality amongst the sectors. But that is better than the loss of data in case of no remapping!

  • Question 49. Raid Systems Typically Allow You To Replace Failed Disks Without Stopping Access To The System. Thus, The Data In The Failed Disk Must Be Rebuilt And Written To The Replacement Disk While The System Is In Operation. With Which Of The Raid Levels Is The Amount Of Interference Between The Rebuild And Ongoing Disk Accesses Least?

    Answer :

    RAID level 1 (mirroring) is the one which facilitates rebuilding of a failed disk with minimum interference with the on-going disk accesses. This is because rebuilding in this case involves copying data from just the failed disk’s mirror. In the other RAID levels, rebuilding involves reading the entire contents of all the other disks.

  • Question 50. Give An Example Of A Database Application In Which The Reserved-space Method Of Representing Variable-length Records Is Preferable To The Pointer Method.

    Answer :

    In the reserved space method, a query comparing the last existing field in a record to some value requires only one read from the disk. This single read is preferable to the potentially many reads needed to chase down the pointers to the last field if the pointer method is used.

  • Question 51. Give An Example Of A Database Application In Which The Pointer Method Of Representing Variable-length Records Is Preferable To The Reserved-space Method?

    Answer :

    Using the pointer method, a join operation on attributes which are only in the anchor block can be performed on only this smaller amount of data, rather than on the entire relation, as would be the case using the reserved space method. Therefore, in this join example, the pointer method is preferable.

  • Question 52. Explain Why The Allocation Of Records To Blocks Affects Database-system Performance Significantly?

    Answer :

    If we allocate related records to blocks, we can often retrieve most, or all, of the requested records by a query with one disk access. Disk accesses tend to be the bottlenecks in databases; since this allocation strategy reduces the number of disk accesses for a given operation, it significantly improves performance.

  • Question 53. If Possible, Determine The Buffer-management Strategy Used By The Operating System Running On Your Local Computer System, And What Mechanisms It Provides To Control Replacement Of Pages. Discuss How The Control On Replacement That It Provides Would Be Useful For The Implementation Of Database Systems?

    Answer :

    The typical OS uses LRU for buffer replacement. This is often a bad strategy for databases. As explained in Section 11.5.2 of the text, MRU is the best strategy for nested loop join. In general no single strategy handles all scenarios well, and ideally the database system should be given its own buffer cache for which the replacement policy takes into account all the performance related issues.

  • Question 54. In The Sequential File Organization, Why Is An Overflow Block Used Even If There Is, At The Moment, Only One Overflow Record?

    Answer :

    An overflow block is used in sequential file organization because a block is the smallest space which can be read from a disk. Therefore, using any smaller region would not be useful from a performance standpoint. The space saved by allocating disk storage in record units would be overshadowed by the performance cost of allowing blocks to contain records of multiple files.

  • Question 55. Explain Why A Physical Oid Must Contain More Information Than A Pointer To A Physical Storage Location?

    Answer :

    A physical OID needs to have a unique identifier in addition to a pointer to a physical storage location. This is required to prevent dereferences of dangling pointers.

  • Question 56. If Physical Oids Are Used, An Object Can Be Relocated By Keeping A Forwarding Pointer To Its New Location. In Case An Object Gets Forwarded Multiple Times, What Would Be The Effect On Retrieval Speed? Suggest A Technique To Avoid Multiple Accesses In Such A Case?

    Answer :

If an object gets forwarded multiple times, the retrieval speed will decrease because accessing it will require accessing the series of locations through which the object has been successively forwarded to the current location. Multiple accesses can be avoided by always keeping in the oldest location the latest address of the object. This can be done by checking, while forwarding, whether the object has already been forwarded, and in that case updating the forwarding address at the oldest location. Thus, at most two accesses will be required.

  • Question 57. Define The Term Dangling Pointer. Describe How The Unique-id Scheme Helps In Detecting Dangling Pointers In An Object-oriented Database?

    Answer :

    A dangling pointer is a pointer to an area which no longer contains valid data.

In the unique-id scheme to detect dangling pointers, physical OIDs may contain a unique identifier, an integer that distinguishes the OID from the identifiers of other objects that happened to be stored at the same location earlier and were deleted or moved elsewhere. The unique identifier is also stored with the object, and the identifiers in an OID and the corresponding object should match. If the unique identifier in a physical OID does not match the unique identifier in the object to which that OID points, the system detects that the pointer is a dangling pointer, and signals an error.

  • Question 58. When Is It Preferable To Use A Dense Index Rather Than A Sparse Index?

    Answer :

    It is preferable to use a dense index instead of a sparse index when the file is not sorted on the indexed field (such as when the index is a secondary index) or when the index file is small compared to the size of memory.

  • Question 59. Since Indices Speed Query Processing, Why Might They Not Be Kept On Several Search Keys?

    Answer :

    Reasons for not keeping several search indices include:

    1. Every index requires additional CPU time and disk I/O overhead during inserts and deletions.
    2. Indices on non-primary keys might have to be changed on updates, although an index on the primary key might not (this is because updates typically do not modify the primary key attributes).
    3. Each extra index requires additional storage space.
4. For queries which involve conditions on several search keys, efficiency might not be bad even if only some of the keys have indices on them. Therefore database performance is improved less by adding indices when many indices already exist.
  • Question 60. What Is The Difference Between A Primary Index And A Secondary Index?

    Answer :

    The primary index is on the field which specifies the sequential order of the file. There can be only one primary index while there can be many secondary indices.

  • Question 61. Is It Possible In General To Have Two Primary Indices On The Same Relation For Different Search Keys?

    Answer :

In general, it is not possible to have two primary indices on the same relation for different search keys, because the tuples in a relation would have to be stored in a different order for each key in order to keep equal values stored together. We could accomplish this by storing the relation twice and duplicating all values, but for a centralized system this is not efficient.

  • Question 62. Explain The Distinction Between Closed And Open Hashing. Discuss The Relative Merits Of Each Technique In Database Applications?

    Answer :

Open hashing may place keys with the same hash-function value in different buckets. Closed hashing always places such keys together in the same bucket. Thus in this case different buckets can be of different sizes, though the implementation may be by linking together fixed-size buckets using overflow chains. Deletion is difficult with open hashing, as all the buckets may have to be inspected before we can ascertain that a key value has been deleted, whereas in closed hashing only the bucket whose address is obtained by hashing the key value need be inspected. Deletions are more common in databases, and hence closed hashing is more appropriate for them. For a small, static set of data, lookups may be more efficient using open hashing. The symbol table of a compiler would be a good example.

  • Question 63. What Are The Causes Of Bucket Overflow In A Hash File Organization? What Can Be Done To Reduce The Occurrence Of Bucket Overflows?

    Answer :

The causes of bucket overflow are:

    1. Our estimate of the number of records that the relation will have was too low, and hence the number of buckets allotted was not sufficient.
    2. Skew in the distribution of records to buckets. This may happen either because there are many records with the same search-key value, or because the hash function chosen did not have the desirable properties of uniformity and randomness.

To reduce the occurrence of overflows, we can:

    1. Choose the hash function more carefully, and make better estimates of the relation size.
    2. If the estimated size of the relation is nr and the number of records per block is fr, allocate (nr/fr) ∗ (1 + d) buckets instead of (nr/fr) buckets. Here d is a fudge factor, typically around 0.2. Some space is wasted: about 20 percent of the space in the buckets will be empty. But the benefit is that some of the skew is handled and the probability of overflow is reduced, as the worked example below shows.
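
    For instance, if the relation is expected to hold nr = 10,000 records and fr = 20 records fit in a block, we would allocate (10000/20) × (1 + 0.2) = 600 buckets rather than 500, leaving about 20 percent slack to absorb skew.
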
  • Question 64. Why Is A Hash Structure Not The Best Choice For A Search Key On Which Range Queries Are Likely?

    Answer :

A range query cannot be answered efficiently using a hash index; we would have to read all the buckets. This is because key values in the range do not occupy consecutive locations in the buckets; they are distributed uniformly and randomly throughout all the buckets.
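
    For example, a range query such as the following (on the book's bank schema) is served well by an ordered index such as a B+-tree on balance, while a hash index on balance offers no help beyond scanning every bucket:

      select *
      from account
      where balance between 500 and 1000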

  • Question 65. Consider A Grid File In Which We Wish To Avoid Overflow Buckets For Performance Reasons. In Cases Where An Overflow Bucket Would Be Needed, We Instead Reorganize The Grid File. Present An Algorithm For Such A Reorganization?

    Answer :

    Let us consider a two-dimensional grid array. When a bucket overflows, we can split the ranges corresponding to that row and column into two, in both the linear scales. Thus the linear scales will get one additional entry each, and the bucket is split into four buckets. The ranges should be split in such a way as to ensure that the four resultant buckets have nearly the same number of values.

    There can be several other heuristics for deciding how to reorganize the ranges, and hence the linear scales and grid array.

  • Question 66. Show How To Compute Existence Bitmaps From Other Bitmaps. Make Sure That Your Technique Works Even In The Presence Of Null Values, By Using A Bitmap For The Value Null?

    Answer :

    The existence bitmap for a relation can be calculated by taking the union (logical-or) of all the bitmaps on that attribute, including the bitmap for value null.
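
    For example, in a five-tuple relation where an attribute takes value 1 in tuples 1 and 4, value 2 in tuple 2, and is null in tuples 3 and 5, the bitmaps are 10010, 01000, and 00101; their logical-or, 11111, is the existence bitmap.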

  • Question 67. How Does Data Encryption Affect Index Schemes? In Particular, How Might It Affect Schemes That Attempt To Store Data In Sorted Order?

    Answer :

Note that indices must operate on the encrypted data, or else someone could gain access to the index and use it to interpret the data. Otherwise, the index would have to be restricted so that only certain users could access it. To keep the data in sorted order, the index scheme would have to decrypt the data at each level in a tree. Note that hash systems would not be affected.

  • Question 68. Why Is It Not Desirable To Force Users To Make An Explicit Choice Of A Query Processing Strategy? Are There Cases In Which It Is Desirable For Users To Be Aware Of The Costs Of Competing Query-processing Strategies?

    Answer :

In general it is not desirable to force users to choose a query-processing strategy because naive users might select an inefficient strategy. The reason users would make poor choices about processing queries is that they would not know how a relation is stored, nor about its indices. It is unreasonable to force users to be aware of these details, since ease of use is a major objective of database query languages. If users are aware of the costs of competing strategies, they can write queries efficiently, thus helping performance; this could happen if experts were using the system.

  • Question 69. What Are The Advantages And Disadvantages Of Hash Indices Relative To B+-tree Indices? How Might The Type Of Index Available Influence The Choice Of A Query Processing Strategy?

    Answer :

Hash indices enable us to perform point lookups (e.g., σA=r(relation)) very fast, but for range searches the B+-tree index would be much more efficient. If there is a range query to be evaluated, and only a hash index is available, the better strategy might be to perform a file scan rather than use that index.

  • Question 70. Design A Variant Of The Hybrid Merge-join Algorithm For The Case Where Both Relations Are Not Physically Sorted, But Both Have A Sorted Secondary Index On The Join Attributes?

    Answer :

    We merge the leaf entries of the first sorted secondary index with the leaf entries of the second sorted secondary index. The result file contains pairs of addresses, the first address in each pair pointing to a tuple in the first relation, and the second address pointing to a tuple in the second relation. This result file is first sorted on the first relation’s addresses. The relation is then scanned in physical storage order, and addresses in the result file are replaced by the actual tuple values. Then the result file is sorted on the second relation’s addresses, allowing a scan of the second relation in physical storage order to complete the join.

  • Question 71. The Indexed Nested-loop Join Algorithm Can Be Inefficient If The Index Is A Secondary Index, And There Are Multiple Tuples With The Same Value For The Join Attributes. Why Is It Inefficient? Describe A Way, Using Sorting, To Reduce The Cost Of Retrieving Tuples Of The Inner Relation. Under What Conditions Would This Algorithm Be More Efficient Than Hybrid Merge-join?

    Answer :

    If there are multiple tuples in the inner relation with the same value for the join attributes, we may have to access that many blocks of the inner relation for each tuple of the outer relation. That is why it is inefficient. To reduce this cost we can perform a join of the outer relation tuples with just the secondary index leaf entries, postponing the inner relation tuple retrieval. The result file obtained is then sorted on the inner relation addresses, allowing an efficient physical order scan to complete the join.

Hybrid merge-join requires the outer relation to be sorted. The above algorithm does not have this requirement, but for each tuple in the outer relation it needs to perform an index lookup on the inner relation. If the outer relation is much larger than the inner relation, this index lookup cost will be less than the sorting cost; thus this algorithm will be more efficient.

  • Question 72. List The Acid Properties. Explain The Usefulness Of Each?

    Answer :

The ACID properties, and the need for each of them, are listed below; a small transaction sketch follows the list.

• Consistency: Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database. This is typically the responsibility of the application programmer who codes the transactions.
    • Atomicity: Either all operations of the transaction are reflected properly in the database, or none are. Clearly lack of atomicity will lead to inconsistency in the database.
    • Isolation: When multiple transactions execute concurrently, it should be the case that, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently with it. The user view of a transaction system requires the isolation property, and the property that concurrent schedules take the system from one consistent state to another. These requirements are satisfied by ensuring that only serializable schedules of individually consistency preserving transactions are allowed.
    • Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
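
    As a sketch of a transaction that relies on all four properties (the bank schema is assumed, and the begin/commit syntax varies by system):

      begin transaction
      update account set balance = balance - 50
        where account-number = 'A-101';
      update account set balance = balance + 50
        where account-number = 'A-102';
      commit
      -- atomicity: both updates happen or neither; consistency: the total
      -- balance is preserved; isolation: concurrent transactions never see
      -- one update without the other; durability: once committed, the
      -- transfer survives failures.
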
  • Question 73. Suppose That There Is A Database System That Never Fails. Is A Recovery Manager Required For This System?

    Answer :

    Even in this case the recovery manager is needed to perform roll-back of aborted transactions.

  • Question 74. Consider A File System Such As The One On Your Favorite Operating System.

    1. what Are The Steps Involved In Creation And Deletion Of Files, And In Writing Data To A File?
    2. explain How The Issues Of Atomicity And Durability Are Relevant To The Creation And Deletion Of Files, And To Writing Data To Files.

    Answer :

There are several steps in the creation of a file. A storage area is assigned to the file in the file system, a unique i-number is given to the file, and an i-node entry is inserted into the i-list. Deletion of a file involves exactly the opposite steps.

    For the file system user in UNIX, durability is important for obvious reasons, but atomicity is not relevant generally as the file system doesn’t support transactions. To the file system implementor though, many of the internal file system actions need to have transaction semantics. All the steps involved in creation/deletion of the file must be atomic, otherwise there will be unreferenceable files or unusable areas in the file system.

  • Question 75. Database-system Implementers Have Paid Much More Attention To The Acid Properties Than Have File-system Implementers. Why Might This Be The Case?

    Answer :

    Database systems usually perform crucial tasks whose effects need to be atomic and durable, and whose outcome affects the real world in a permanent manner. Examples of such tasks are monetary transactions, seat bookings etc. Hence the ACID properties have to be ensured. In contrast, most users of file systems would not be willing to pay the price (monetary, disk space, time) of supporting ACID properties.

  • Question 76. During Its Execution, A Transaction Passes Through Several States, Until It Finally Commits Or Aborts. List All Possible Sequences Of States Through Which A Transaction May Pass. Explain Why Each State Transition May Occur.

    Answer :

The possible sequences of states are:

    1. active→partially committed→committed. This is the normal sequence a successful transaction will follow. After executing all its statements it enters the partially committed state. After enough recovery information has been written to disk, the transaction finally enters the committed state.
    2. active → partially committed → aborted. After executing the last statement of the transaction, it enters the partially committed state. But before enough recovery information is written to disk, a hardware failure may occur destroying the memory contents. In this case the changes which it made to the database are undone, and the transaction enters the aborted state.
    3. active → failed → aborted. After the transaction starts, if it is discovered at some point that normal execution cannot continue (either due to internal program errors or external errors), it enters the failed state. It is then rolled back, after which it enters the aborted state.
• Question 77. Justify The Following Statement: Concurrent Execution Of Transactions Is More Important When Data Must Be Fetched From (slow) Disk Or When Transactions Are Long, And Is Less Important When Data Is In Memory And Transactions Are Very Short.

    Answer :

If a transaction is very long or fetches data from a slow disk, it takes a long time to complete. In the absence of concurrency, other transactions will have to wait for a longer period of time, and the average response time will increase. Also, while a transaction is reading data from disk, the CPU is idle, so resources are not properly utilized. Hence concurrent execution becomes important in this case. However, when the transactions are short or the data is available in memory, these problems do not occur.

  • Question 78. Explain The Distinction Between The Terms Serial Schedule And Serializable Schedule.

    Answer :

    A schedule in which all the instructions belonging to one single transaction appear together is called a serial schedule. A serializable schedule has a weaker restriction that it should be equivalent to some serial schedule. There are two definitions of schedule equivalence – conflict equivalence and view equivalence. Both of these are described in the chapter.

  • Question 79. Since Every Conflict-serializable Schedule Is View Serializable, Why Do We Emphasize Conflict Serializability Rather Than View Serializability?

    Answer :

    Most of the concurrency control protocols (protocols for ensuring that only serializable schedules are generated) used in practice are based on conflict serializability—they actually permit only a subset of conflict serializable schedules. The general form of view serializability is very expensive to test, and only a very restricted form of it is used for concurrency control.

  • Question 80. What Is A Recoverable Schedule? Why Is Recoverability Of Schedules Desirable? Are There Any Circumstances Under Which It Would Be Desirable To Allow Nonrecoverable Schedules?

    Answer :

    A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the commit operation of Tj. Recoverable schedules are desirable because failure of a transaction might otherwise bring the system into an irreversibly inconsistent state. Nonrecoverable schedules may sometimes be needed when updates must be made visible early due to time constraints, even if they have not yet been committed, which may be required for very long duration transactions.

• Question 81. What Is A Cascadeless Schedule? Why Is Cascadelessness Of Schedules Desirable? Are There Any Circumstances Under Which It Would Be Desirable To Allow Noncascadeless Schedules?

    Answer :

    A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads data items previously written by Ti, the commit operation of Ti appears before the read operation of Tj. Cascadeless schedules are desirable because the failure of a transaction does not lead to the aborting of any other transaction. Of course this comes at the cost of less concurrency. If failures occur rarely, so that we can pay the price of cascading aborts for the increased concurrency, noncascadeless schedules might be desirable.

  • Question 82. What Benefit Does Strict Two-phase Locking Provide? What Disadvantages Result?

    Answer :

    Because it produces only cascadeless schedules, recovery is very easy. But the set of schedules obtainable is a subset of those obtainable from plain two phase locking, thus concurrency is reduced.

  • Question 83. What Benefit Does Rigorous Two-phase Locking Provide? How Does It Compare With Other Forms Of Two-phase Locking?

    Answer :

    Rigorous two-phase locking has the advantages of strict 2PL. In addition it has the property that for two conflicting transactions, their commit order is their serializability order. In some systems users might expect this behavior.

  • Question 84. Most Implementations Of Database Systems Use Strict Two-phase Locking. Suggest Three Reasons For The Popularity Of This Protocol?

    Answer :

    It is relatively simple to implement, imposes low rollback overhead because of cascadeless schedules, and usually allows an acceptable level of concurrency.

  • Question 85. Consider A Database Organized In The Form Of A Rooted Tree. Suppose That We Insert A Dummy Vertex Between Each Pair Of Vertices. Show That, If We Follow The Tree Protocol On The New Tree, We Get Better Concurrency Than If We Follow The Tree Protocol On The Original Tree?

    Answer :

    The proof is in Buckley and Silberschatz, “Concurrency Control in Graph Protocols by Using Edge Locks,” Proc. ACM SIGACT-SIGMOD Symposium on the Principles of Database Systems, 1984.

• Question 86. In Timestamp Ordering, W-timestamp(Q) Denotes The Largest Timestamp Of Any Transaction That Executed Write(Q) Successfully. Suppose That, Instead, We Defined It To Be The Timestamp Of The Most Recent Transaction To Execute Write(Q) Successfully. Would This Change In Wording Make Any Difference?

    Answer :

    It would make no difference. The write protocol is such that the most recent transaction to write an item is also the one with the largest time stamp to have done so.
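
    A short sketch of the write test makes this evident; the data structures are illustrative. A write by Ti succeeds only when TS(Ti) is at least as large as both R-timestamp(Q) and W-timestamp(Q), so W-timestamp(Q) never decreases, and the largest write time stamp is always the time stamp of the most recent successful writer.

    ```python
    class TimestampOrdering:
        def __init__(self):
            self.r_ts = {}               # item -> largest read time stamp
            self.w_ts = {}               # item -> largest write time stamp

        def write(self, ts, item):
            if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
                raise RuntimeError(f"T{ts} rolled back: write({item}) too late")
            self.w_ts[item] = ts         # monotonically non-decreasing

    to = TimestampOrdering()
    to.write(5, 'Q')                     # W-timestamp(Q) = 5
    to.write(7, 'Q')                     # W-timestamp(Q) = 7: newest == largest
    try:
        to.write(6, 'Q')                 # an out-of-order write never succeeds
    except RuntimeError as e:
        print(e)
    ```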

  • Question 87. When A Transaction Is Rolled Back Under Time Stamp Ordering, It Is Assigned A New Time Stamp. Why Can It Not Simply Keep Its Old Time Stamp?

    Answer :

    A transaction is rolled back because a newer transaction has read or written the data which it was supposed to write. If the rolled-back transaction is reintroduced with the same time stamp, the same reason for rollback is still valid, and the transaction will have to be rolled back again. This would continue indefinitely.
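
    Continuing the sketch from Question 86, retrying a transaction with its old time stamp fails forever, while restarting it with a fresh, larger time stamp lets it through:

    ```python
    to2 = TimestampOrdering()
    to2.write(7, 'Q')                 # a newer transaction already wrote Q
    for attempt in range(3):
        try:
            to2.write(6, 'Q')         # old time stamp: rejected every time
        except RuntimeError:
            pass
    to2.write(8, 'Q')                 # fresh time stamp: the write succeeds
    ```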

  • Question 88. In Multiple-granularity Locking, What Is The Difference Between Implicit And Explicit Locking?

    Answer :

    When a transaction explicitly locks a node in shared or exclusive mode, it implicitly locks all the descendants of that node in the same mode, so the transaction need not explicitly lock the descendant nodes. There is no difference in the functionality of these locks; the only difference is in the way they are acquired and in the way their presence is tested.
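
    As a minimal sketch, testing whether a node is implicitly locked amounts to walking up toward the root looking for an explicit lock; the child-to-parent encoding of the granularity hierarchy and the names below are assumptions for illustration.

    ```python
    parent = {                 # child -> parent in the granularity hierarchy
        'rel_account': 'db',
        'page_3': 'rel_account',
        'tuple_42': 'page_3',
    }
    explicit_locks = {}        # node -> 'S' or 'X'

    def lock_explicitly(node, mode):
        explicit_locks[node] = mode

    def effective_mode(node):
        """Return the mode a node is held in, explicitly or implicitly."""
        while node is not None:
            if node in explicit_locks:
                return explicit_locks[node]
            node = parent.get(node)
        return None

    lock_explicitly('rel_account', 'X')   # one explicit lock on the relation
    print(effective_mode('tuple_42'))     # 'X': implicit, no per-tuple entry
    print(effective_mode('db'))           # None: ancestors are not covered
    ```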

  • Question 89. Although SIX Mode Is Useful In Multiple-granularity Locking, An Exclusive And Intention-shared (XIS) Mode Is Of No Use. Why Is It Useless?

    Answer :

    An exclusive lock is incompatible with any other lock mode. Once a node is locked in exclusive mode, none of its descendants can be simultaneously accessed by any other transaction in any mode. Therefore an exclusive and intention-shared declaration has no meaning.
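
    The usual textbook compatibility matrix makes this concrete. The sketch below (a hypothetical helper, following that table) shows that X conflicts with every mode, so an XIS request could never be granted in any situation where X alone would not be.

    ```python
    COMPATIBLE = {
        ('IS', 'IS'): True,  ('IS', 'IX'): True,  ('IS', 'S'): True,
        ('IS', 'SIX'): True, ('IS', 'X'): False,
        ('IX', 'IX'): True,  ('IX', 'S'): False,  ('IX', 'SIX'): False,
        ('IX', 'X'): False,
        ('S', 'S'): True,    ('S', 'SIX'): False, ('S', 'X'): False,
        ('SIX', 'SIX'): False, ('SIX', 'X'): False,
        ('X', 'X'): False,
    }

    def compatible(a, b):
        # The matrix is symmetric, so look up the pair in either order.
        return COMPATIBLE.get((a, b), COMPATIBLE.get((b, a), False))

    # Every mode conflicts with X, so nothing can run under an X-locked
    # subtree anyway; adding IS to X changes nothing.
    assert not any(compatible('X', m) for m in ('IS', 'IX', 'S', 'SIX', 'X'))
    ```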

  • Question 90. Use Of Multiple-granularity Locking May Require More Or Fewer Locks Than An Equivalent System With A Single Lock Granularity. Provide Examples Of Both Situations, And Compare The Relative Amount Of Concurrency Allowed.

    Answer :

    If a transaction needs to access a large set of items, multiple-granularity locking requires fewer locks, whereas if only one item needs to be accessed, a single-granularity system allows this with just one lock. Because all the desired data items are locked and unlocked together in the multiple-granularity scheme, the locking overhead is low, but concurrency is also reduced.

  • Question 91. Consider The Validation-based Concurrency-control. Show That By Choosing Validation(ti), Rather Than Start(ti), As The Time Stamp Of Transaction Ti, We Can Expect Better Response Time Provided That Conflict Rates Among Transactions Are Indeed Low?

    Answer :

    In this concurrency-control scheme, choosing Start(Ti) as the time stamp of Ti gives a subset of the schedules allowed by choosing Validation(Ti) as the time stamp. Using Start(Ti) means that whoever started first must finish first. Transactions could certainly enter the validation phase in the same order in which they began executing, but this is overly restrictive. Since choosing Validation(Ti) causes fewer non-conflicting transactions to restart, it gives better response time.

  • Question 92. Under A Modified Version Of The Timestamp Protocol, We Require That A Commit Bit Be Tested To See Whether A Read Request Must Wait. Explain How The Commit Bit Can Prevent Cascading Abort. Why Is This Test Not Necessary For Write Requests?

    Answer :

    Using the commit bit, a read request is made to wait if the transaction which wrote the data item has not yet committed. Therefore, if the writing transaction fails before commit, we can abort that transaction alone; the waiting read will then access the earlier version in a multiversion system, or the restored value of the data item after the abort in a single-version system. For writes, this commit-bit test is unnecessary: either the write is a “blind” write, and thus independent of the old value of the data item, or there was a prior read, in which case the test was already applied.

  • Question 93. Under What Conditions Is It Less Expensive To Avoid Deadlock Than To Allow Deadlocks To Occur And Then To Detect Them?

    Answer :

    Deadlock avoidance is preferable if the consequences of abort are serious (as in interactive transactions), and if there is high contention and a resulting high probability of deadlock.

  • Question 94. If Deadlock Is Avoided By Deadlock Avoidance Schemes, Is Starvation Still Possible?

    Answer :

    A transaction may become the victim of deadlock-prevention rollback arbitrarily many times, thus creating a potential starvation situation.

  • Question 95. Explain The Phantom Phenomenon. Why May This Phenomenon Lead To An Incorrect Concurrent Execution Despite The Use Of The Two-phase Locking Protocol?

    Answer :

    The phantom phenomenon arises when, due to an insertion or deletion, two transactions logically conflict despite not locking any data items in common. The insertion case is described in the book. Deletion can also lead to this phenomenon: suppose Ti deletes a tuple from a relation while Tj scans the relation. If Ti deletes the tuple and then Tj reads the relation, Ti should be serialized before Tj, yet there is no tuple on which Ti and Tj conflict.

    An interpretation of 2PL as locking just the tuples that are accessed in a relation is incorrect. There is also index or relation data that holds information about the tuples in the relation. This information is read by any transaction that scans the relation, and modified by transactions that update, insert into, or delete from the relation. Hence locking must also be performed on the index or relation data, and this will avoid the phantom phenomenon.

  • Question 96. Devise A Time Stamp-based Protocol That Avoids The Phantom Phenomenon?

    Answer :

    In the text, we considered two approaches to dealing with the phantom phenomenon by means of locking. The coarser-granularity approach obviously works for time stamps as well. The B+-tree index-based approach can be adapted to time stamping by treating index buckets as data items with time stamps associated with them, and requiring that all read accesses use an index. We now show that this simple method works. Suppose a transaction Ti wants to access all tuples with a particular range of search-key values, using a B+-tree index on that search key. Ti will need to read all the buckets in that index which have key values in that range. It can be seen that any delete or insert of a tuple with a key value in the same range will need to write one of the index buckets read by Ti. Thus the logical conflict is converted into a conflict on an index bucket, and the phantom phenomenon is avoided.

  • Question 97. Explain The Difference Between The Three Storage Types-volatile, Nonvolatile, And Stable-in Terms Of I/o Cost?

    Answer :

    Volatile storage is storage which fails when there is a power failure. Cache, main memory, and registers are examples of volatile storage. Nonvolatile storage is storage which retains its content despite power failures. An example is magnetic disk. Stable storage is storage which theoretically survives any kind of failure (short of a complete disaster!). This type of storage can only be approximated by replicating data.

    In terms of I/O cost, volatile memory is the fastest, and nonvolatile storage is typically several times slower. Stable storage is slower than nonvolatile storage because of the cost of data replication.

  • Question 98. Compare The Shadow-paging Recovery Scheme With The Log-based Recovery Schemes In Terms Of Ease Of Implementation And Overhead Cost?

    Answer :

    The shadow-paging scheme is easy to implement for single-transaction systems, but difficult for multiple-transaction systems. In particular, it is very hard to allow multiple updates concurrently on the same page. Shadow paging could suffer from extra space overhead, but garbage collection can take care of that. The I/O overhead for shadow paging is typically higher than for the log-based schemes, since a log-based scheme needs to write one record per update to the log, whereas the shadow-paging scheme needs to write one block per updated block.

  • Question 99. Explain The Difference Between A System Crash And A “disaster.”?

    Answer :

    In a system crash, the CPU goes down and the disk may also crash, but stable storage at the site is assumed to survive system crashes. In a “disaster”, everything at a site is destroyed. Stable storage needs to be distributed to survive disasters.

  • Question 100. Why Is It Relatively Easy To Port A Database From A Single Processor Machine To A Multiprocessor Machine If Individual Queries Need Not Be Parallelized?

    Answer :

    Porting is relatively easy to a shared-memory multiprocessor machine. Databases designed for single-processor machines already provide multitasking, allowing multiple processes to run on the same processor in a time-shared manner, giving the user a view of multiple processes running in parallel. Thus, coarse-granularity parallel machines logically appear identical to single-processor machines, making the porting relatively easy. Porting a database to a shared-disk or shared-nothing multiprocessor architecture is a little harder.

  • Question 101. Instead Of Storing Shared Structures In Shared Memory, An Alternative Architecture Would Be To Store Them In The Local Memory Of A Special Process, And Access The Shared Data By Inter Process Communication With The Process. What Would Be The Drawback Of Such An Architecture?

    Answer :

    The drawbacks are that two interprocess messages would be required to acquire locks: one for the request and one to confirm the grant. Interprocess communication is much more expensive than memory access, so the cost of locking would increase. The process storing the shared structures could also become a bottleneck. The benefit of this alternative is that the lock table is better protected from erroneous updates, since only one process can access it.

  • Question 102. What Is Lock De-escalation, And Under What Conditions Is It Required? Why Is It Not Required If The Unit Of Data Shipping Is An Item?

    Answer :

    In a client-server system with page shipping, when a client requests an item, the server typically grants a lock not on the requested item but on the page containing it, thus implicitly granting locks on all the items in the page; the other items in the page are said to be prefetched. If some other client subsequently requests one of the prefetched items, the server may ask the owner of the page lock to transfer back the lock on this item. If the page-lock owner does not need this item, it de-escalates the page lock that it holds to item locks on all the items that it is actually accessing, and then returns the locks on the unwanted items. The server can then grant the latter lock request. If the unit of data shipping is an item, there are no coarser-granularity locks; even if prefetching is used, it is typically implemented by granting individual locks on each of the prefetched items. Thus, when the server asks for the return of a lock, there is no question of de-escalation; the requested lock is simply returned if the client has no use for it.

  • Question 103. Suppose You Were In Charge Of The Database Operations Of A Company Whose Main Job Is To Process Transactions. Suppose The Company Is Growing Rapidly Each Year, And Has Outgrown Its Current Computer System. When You Are Choosing A New Parallel Computer, What Measure Is Most Relevant: Speed-up, Batch Scale-up, Or Transaction Scale-up? Why?

    Answer :

    With increasing scale of operations, we expect that the number of transactions submitted per unit time increases. On the other hand, we wouldn’t expect most of the individual transactions to grow longer, nor would we require that a given transaction should execute more quickly now than it did before. Hence transaction scale-up is the most relevant measure in this scenario.

  • Question 104. Suppose A Transaction Is Written In C With Embedded Sql, And About 80 Percent Of The Time Is Spent In The Sql Code, With The Remaining 20 Percent Spent In C Code. How Much Speed Up Can One Hope To Attain If Parallelism Is Used Only For The Sql Code?

    Answer :

    Since the part that cannot be parallelized takes 20 percent of the total running time, Amdahl's law limits the speed-up to 1/(0.2 + 0.8/N) for N processors, which approaches 5 as N grows; the best speed-up we can hope for is therefore less than 5.
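
    A quick sketch of the arithmetic, using a hypothetical helper that applies Amdahl's law to the 80/20 split:

    ```python
    def speedup(n, serial=0.2):
        # Amdahl's law: speedup(N) = 1 / (serial + (1 - serial) / N)
        return 1 / (serial + (1 - serial) / n)

    for n in (2, 4, 16, 1000):
        print(n, round(speedup(n), 2))   # 1.67, 2.5, 4.0, 4.98 -> bounded by 5
    ```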

  • Question 105. What Are The Factors That Can Work Against Linear Scale-up In A Transaction Processing System? Which Of The Factors Are Likely To Be The Most Important In Each Of The Following Architectures: Shared Memory, Shared Disk, And Shared Nothing?

    Answer :

    Increasing contention for shared resources prevents linear scale-up with increasing parallelism. In a shared-memory system, contention for memory (which implies bus contention) will result in falling scale-up with increasing parallelism. In a shared-disk system, it is contention for disk and bus access which affects scale-up. In a shared-nothing system, interprocess communication overheads will be the main impeding factor: since there is no shared memory, acquiring locks and other activities that require message passing between processes will take more time with increased parallelism.

  • Question 106. Consider A Bank That Has A Collection Of Sites, Each Running A Database System. Suppose The Only Way The Databases Interact Is By Electronic Transfer Of Money Between One Another. Would Such A System Qualify As A Distributed Database? Why?

    Answer :

    In a distributed system, all the sites typically run the same database management software, and they share a global schema. Each site provides an environment for execution of both global transactions initiated at remote sites and local transactions. The system described in the question does not have these properties, and hence it cannot qualify as a distributed database.

  • Question 107. Consider A Network Based On Dial-up Phone Lines, Where Sites Communicate Periodically, Such As Every Night. Such Networks Are Often Configured With A Server Site And Multiple Client Sites. The Client Sites Connect Only To The Server, And Exchange Data With Other Clients By Storing Data At The Server And Retrieving Data Stored At The Server By Other Clients. What Is The Advantage Of Such An Architecture Over One Where A Site Can Exchange Data With Another Site Only By First Dialing It Up?

    Answer :

    With the central server, each site does not have to remember which site to contact when a particular data item is to be requested. The central server alone needs to remember this, so data items can be moved around easily, depending on which sites access which items most frequently. Other housekeeping tasks are also centralized rather than distributed, making the system easier to develop and maintain. Of course there is the disadvantage of a total shutdown in case the server becomes unavailable; even when it is running, it may become a bottleneck because every request has to be routed via it.

  • Question 108. Discuss The Relative Advantages Of Centralized And Distributed Databases?

    Answer :

    • A distributed database allows a user convenient and transparent access to data that is not stored at the user's own site, while allowing each site control over its own local data. A distributed database can be made more reliable than a centralized system: if one site fails, the database can continue functioning, whereas if a centralized system fails, the database can no longer operate at all. A distributed database also allows parallel execution of queries, and possibly splitting one query into many parts, to increase throughput.
    • A centralized system is easier to design and implement, and is cheaper to operate because messages do not have to be sent.
  • Question 109. Explain How The Following Differ: Fragmentation Transparency, Replication Transparency, And Location Transparency?

    Answer :

    1. With fragmentation transparency, the user of the system is unaware of any fragmentation the system has implemented. A user may formulate queries against global relations and the system will perform the necessary transformation to generate correct output.
    2. With replication transparency, the user is unaware of any replicated data. The system must prevent inconsistent operations on the data. This requires more complex concurrency-control algorithms.
    3. Location transparency means the user is unaware of where data are stored. The system must route data requests to the appropriate sites.
  • Question 110. How Might A Distributed Database Designed For A Local-area Network Differ From One Designed For A Wide-area Network?

    Answer :

    Data transfer on a local-area network (LAN) is much faster than on a wide-area network (WAN). Thus replication and fragmentation will not increase throughput and speed-up as much on a LAN as in a WAN. But even in a LAN, replication has its uses in increasing reliability and availability.

  • Question 111. When Is It Useful To Have Replication Or Fragmentation Of Data?

    Answer :

    Replication is useful when there are many read-only transactions at different sites wanting access to the same data. They can all execute quickly in parallel, accessing local data. But updates become difficult with replication. Fragmentation is useful if transactions on different sites tend to access different parts of the database.

  • Question 112. Explain The Notions Of Transparency And Autonomy. Why Are These Notions Desirable From A Human-factors Standpoint?

    Answer :

    Autonomy is the amount of control a single site has over the local database. It is important because users at that site want quick and correct access to local data items. This is especially true when one considers that local data will be most frequently accessed in a database. Transparency hides the distributed nature of the database. This is important because users should not be required to know about location, replication, fragmentation or other implementation aspects of the database.

  • Question 113. Explain The Difference Between Data Replication In A Distributed System And The Maintenance Of A Remote Backup Site?

    Answer :

    In remote backup systems all transactions are performed at the primary site and the data is replicated at the remote backup site. The remote backup site is kept synchronized with the updates at the primary site by sending all log records. Whenever the primary site fails, the remote backup site takes over processing.

    Distributed systems offer greater availability by keeping multiple copies of the data at different sites, whereas remote backup systems offer less availability at lower cost and execution overhead.

    In a distributed system, transaction code runs at all the sites, whereas in a remote backup system it runs only at the primary site. Distributed-system transactions follow two-phase commit to keep the data in a consistent state, whereas a remote backup system does not follow two-phase commit and so avoids the related overhead.

  • Question 114. Give An Example Where Lazy Replication Can Lead To An Inconsistent Database State Even When Updates Get An Exclusive Lock On The Primary (master) Copy?

    Answer :

    Consider the balance in an account, replicated at N sites, and let the current balance be $100, consistent across all sites. Consider two transactions T1 and T2, each depositing $10 in the account; the balance should be $120 after both transactions have executed. Let the transactions execute in sequence, T1 first and then T2. Suppose the copy of the balance at one of the sites, say s, is not yet consistent with the primary copy after transaction T1 executes (because of the lazy replication strategy), and let transaction T2 read this stale copy of the balance. Then the balance at the primary site will be $110 at the end, instead of the correct $120: T1's update has been lost.
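
    The following minimal Python sketch (the names are illustrative) reproduces the lost update; note that exclusively locking the primary copy for each write does not help, because T2's read of the lazily synchronized replica has already seen stale data.

    ```python
    primary = {'balance': 100}
    replica_s = {'balance': 100}   # copy at site s, synchronized lazily

    def deposit(read_copy, amount):
        new_balance = read_copy['balance'] + amount   # read from some copy
        primary['balance'] = new_balance              # write through the primary

    deposit(primary, 10)       # T1: primary balance becomes 110
    # ... lazy propagation to site s has not yet run ...
    deposit(replica_s, 10)     # T2 reads the stale 100 and writes 110
    print(primary['balance'])  # 110, not 120: T1's deposit is lost
    ```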

  • Question 115. Discuss The Advantages And Disadvantages Of The Two Methods That We Presented In Section 19.5.2 For Generating Globally Unique Time Stamps?

    Answer :

    The centralized approach has the problem of a possible bottleneck at the central site, and the problem of electing a new central site if it goes down. The distributed approach has the problem that many messages must be exchanged to keep the system fair, or else one site can get ahead of all other sites and dominate the database.
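
    As a minimal sketch of the distributed approach (the class and method names are illustrative), each site can concatenate a local counter with its unique site identifier, advancing the counter whenever it sees a larger remote time stamp so that no site races ahead:

    ```python
    class SiteClock:
        def __init__(self, site_id):
            self.site_id = site_id
            self.counter = 0

        def next_timestamp(self):
            self.counter += 1
            return (self.counter, self.site_id)   # counter is the major field

        def observe(self, remote_ts):
            # Advancing the counter on each incoming message keeps one fast
            # site from dominating the global ordering.
            self.counter = max(self.counter, remote_ts[0])

    a, b = SiteClock('A'), SiteClock('B')
    t1 = a.next_timestamp()    # (1, 'A')
    b.observe(t1)
    t2 = b.next_timestamp()    # (2, 'B'): globally unique, ordered after t1
    print(t1 < t2)             # True: tuples compare counter first, then site id
    ```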