ABSTRACT

This chapter focuses on transport layer techniques that exploit host multihoming to provide end-to-end fault tolerance and improved application performance. Mission-critical systems generally rely on redundancy at multiple levels to provide uninterrupted service during resource failures. Following the same approach, such systems can multihome their hosts to improve availability. A host is multihomed if it can be addressed by multiple IP addresses (Braden 1989b). Redundancy at the network layer allows a host to remain accessible even if one of its IP addresses becomes unreachable for an extended period of time (assuming the paths to the multiple interfaces do not share the same failed link). Transport layers that support multihoming allow the traffic of existing connections to be redirected to a peer’s alternate IP address without requiring applications (or users) to abort and re-establish connections. Given the prevalence of path outages on the Internet today, multihoming support at the transport layer can improve the resilience of established connections, and thus application performance. While fault tolerance can be addressed at other layers, we argue that the transport layer is in the best position to detect failures and make end-to-end failover decisions. After all, the transport layer is the lowest layer that is both responsible for end-to-end quality of service and aware of end-to-end path characteristics.