Recovery in Distributed Databases


The recovery process in distributed databases is involved. In some cases it is difficult even to determine whether a site is down without exchanging numerous messages with other sites.

Suppose a site sends a message to site Y and expects a response but receives none. There are several possible explanations:

1) The message was not delivered to Y because of communication failure.

2) Site Y is down and could not respond.

3) Site Y is running and sent a response, but the response was not delivered.

Without additional information or the sending of additional messages, it is difficult to determine what actually happened.
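
The ambiguity can be illustrated with a small simulation. This is only a sketch under assumed names: the Failure enum and the send_and_wait function are hypothetical and model the three explanations above, not any real network library. It shows that the sender observes the same thing, a missing response, in all three cases.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Hypothetical labels for the three explanations listed above."""
    REQUEST_LOST = 1   # 1) the message never reached Y
    SITE_DOWN = 2      # 2) Y is down and could not respond
    RESPONSE_LOST = 3  # 3) Y responded, but the response was lost


def send_and_wait(failure: Optional[Failure]) -> Optional[str]:
    """Return what the sender observes: a response, or None on timeout."""
    if failure is Failure.REQUEST_LOST:
        return None  # Y never saw the request
    if failure is Failure.SITE_DOWN:
        return None  # the request arrived at a site that is not running
    response = "ack"  # Y is up and produces a response
    if failure is Failure.RESPONSE_LOST:
        return None  # the response was dropped on the way back
    return response


# The sender's view of each failure scenario.
observations = {f.name: send_and_wait(f) for f in Failure}
```

Every entry in observations is None, so the sender cannot tell the scenarios apart from the missing response alone. It needs extra information, such as further messages to Y or to other sites.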

Another problem with distributed recovery is distributed commit. When a transaction updates data at several sites, it cannot commit until it is sure that its effects at every site cannot be lost. This means that every site must first have recorded the local effects of the transaction permanently in its local log on disk.
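
The commit condition above can be sketched as follows. This is a simplified illustration, not a complete commit protocol: the Site class, the record_local_effects method, and the can_commit function are hypothetical names, and the one-line log format is an assumption. Each site appends the transaction's local effects to its own log file and forces the write to disk. The transaction may commit only if every site reports success.

```python
import os
import tempfile


class Site:
    """Hypothetical site that keeps its own log file on disk."""

    def __init__(self, name, log_dir, up=True):
        self.name = name
        self.up = up
        self.log_path = os.path.join(log_dir, f"{name}.log")

    def record_local_effects(self, txn_id, effects):
        """Force the transaction's local effects to the site log on disk."""
        if not self.up:
            return False  # a site that is down cannot record anything
        with open(self.log_path, "a") as log:
            log.write(f"{txn_id} {effects}\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to write it to disk
        return True


def can_commit(txn_id, effects_by_site, sites):
    """True only if every site has recorded its local effects durably."""
    return all(
        site.record_local_effects(txn_id, effects_by_site[site.name])
        for site in sites
    )


with tempfile.TemporaryDirectory() as log_dir:
    sites = [Site("A", log_dir), Site("B", log_dir)]
    effects = {"A": "x=1", "B": "y=2"}
    all_up = can_commit("T1", effects, sites)    # both sites logged T1
    sites[1].up = False
    one_down = can_commit("T2", effects, sites)  # site B could not log T2
```

When one site cannot record its effects, can_commit returns False and the transaction must not commit, even though the other site has already written its log record.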