An in-depth look at SA Forum interface specs

Dr. Asif Naseem, Service Availability Forum and GoAhead Software
01, 2006 (2:04 H)
URL: http://www.commsdesign.com/showArticle.jhtml?articleID=178600489

The telecommunication industry seems to be mimicking the transformation that the enterprise computing industry went through during the 1990s: it is moving from a vertical industry model to more of a horizontal model. Maturation and adoption of a key set of standards are making it possible for Telecom Equipment Manufacturers (TEMs) to rely on an emerging ecosystem of Commercial Off-The-Shelf (COTS) components to build network elements, and to focus their precious resources on their core value-add.

Emerging standards
A few key standards address the hardware platform, the operating systems, and the middleware layer of network elements. Three industry consortia are particularly relevant in driving the COTS adoption in the telecom industry.

PCI Industrial Computer Manufacturers Group (PICMG) (www.picmg.org), a consortium of several companies including TEMs, develops and promotes carrier grade equipment standards. A recent set of specifications from this consortium, called AdvancedTCA (ATCA), is fast gaining wide industry acceptance and adoption. ATCA, targeted primarily at developers of telecommunication applications, defines standards for creating a new architecture that eases integration and migration of telecom applications across platforms. Many TEMs have already announced plans to provide network elements based on standard ATCA platforms.

Open Source Development Labs (OSDL) (www.osdl.org) is an industry body dedicated to accelerating the adoption of the Linux operating system for enterprise computing and carrier applications. The use of Carrier Grade Linux (CGL), specified by OSDL, is fast gaining traction among equipment vendors, TEMs, and service providers alike.

Service Availability Forum (SA Forum) (www.saforum.org), a vendor consortium, develops and promotes standard specifications that enable independent software vendors to develop easy-to-integrate, interoperable middleware COTS components.

The Hardware Platform Interface (HPI), the first specification published by the SA Forum, defines a standard interface between the service availability middleware and the hardware platform. The second interface definition, known as the Application Interface Specification (AIS), establishes an interface between the high availability middleware and the application layer. The Systems Management Interfaces provide an umbrella that ties together the management capabilities for HPI and AIS services.

In January 2006, the Forum announced the availability of new and enhanced interfaces representing the complete set of SA Forum specifications, including updates to all these specifications [1]. These specifications are intended to facilitate portability of middleware and applications across multiple platforms, thus reducing the startup cost and the integration effort (See Figure 1).


Figure 1. Elements of SA Forum AIS

Proliferation of such standards enables designers to rapidly build application-ready platforms that utilize various COTS building blocks. This allows equipment vendors to minimize the cost and effort involved in building carrier-grade network elements, and to focus their precious resources on their core competence: communication applications. A high level overview of the SA Forum Application Interface Specification is presented here.

Application interface specification
The Service Availability Forum's Application Interface Specification specifies an Availability Management Framework and seven core services. When implemented together, these services provide a comprehensive set of functionality that allows system designers to build highly available applications that are interoperable and portable across a variety of compliant middleware and platforms (See Figure 2). The Availability Management Framework along with the seven services is described here.


Figure 2. Key standards enable COTS-based network elements

Availability management framework (AMF)
The AMF specifies a software entity that provides service availability by coordinating redundant resources within a cluster to deliver a system with no single point of failure. It provides a consistent view of one logical system that comprises a number of cluster nodes each of which host various resources in a distributed computing environment.

This framework provides a set of APIs to enable highly available applications. It drives the high availability state of various system components, and monitors their health by invoking callback functions of these components, as defined in this API. It also manages the readiness state without exposing it to components. It further allows a component to query the framework for information about a given component's high availability state, using functions defined in the set of AMF APIs.
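The callback-driven relationship described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (ToyAvailabilityFramework, Component, healthcheck, set_ha_state, monitor) are hypothetical and are not the SA Forum AMF API, and the failover policy (promote the first healthy standby) is an assumption made for illustration.

```python
from enum import Enum
from typing import List, Optional


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class Component:
    """A redundant resource managed by the toy framework (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True
        self.ha_state: Optional[HAState] = None

    def healthcheck(self) -> bool:
        # Callback the framework invokes to monitor this component.
        return self.healthy

    def set_ha_state(self, state: HAState) -> None:
        # Callback the framework invokes to drive the HA state.
        self.ha_state = state


class ToyAvailabilityFramework:
    """Toy coordinator; not the SA Forum AMF API."""

    def __init__(self) -> None:
        self._components: List[Component] = []

    def _active(self) -> Optional[Component]:
        for component in self._components:
            if component.ha_state is HAState.ACTIVE:
                return component
        return None

    def register(self, component: Component) -> None:
        # Assumption: the first registered component becomes active,
        # every later one becomes a standby.
        state = HAState.ACTIVE if self._active() is None else HAState.STANDBY
        self._components.append(component)
        component.set_ha_state(state)

    def ha_state_of(self, name: str) -> Optional[HAState]:
        # Query interface: look up the HA state of a named component.
        for component in self._components:
            if component.name == name:
                return component.ha_state
        raise KeyError(name)

    def monitor(self) -> None:
        """Run one health-check pass and fail over if the active is unhealthy."""
        active = self._active()
        if active is None or active.healthcheck():
            return
        self._components.remove(active)
        active.ha_state = None
        for component in self._components:
            if component.healthcheck():
                component.set_ha_state(HAState.ACTIVE)
                break
```

In this sketch a real framework's periodic monitoring is reduced to an explicit monitor() call, which keeps the failover step easy to follow and to test.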

Cluster membership (CMS)
The cluster membership service is fundamental to defining and deploying a system of clustered nodes. It serves a critical cluster node bookkeeping function and, as such, provides applications with up-to-date cluster membership information as nodes enter or leave the system. Applications register callback functions with the cluster membership service (CMS) to receive current cluster membership notifications as changes occur in the cluster configuration.

A cluster consists of a set of configured nodes, each with a unique node name. A member node is a configured node that the CMS recognizes as healthy and well enough connected to be used for deploying highly available applications and services. The CMS is the authority that determines whether a configured node is allowed to become a member node of the cluster. The set of member nodes at a given point in time comprises the cluster membership.
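The distinction between configured nodes and member nodes, together with callback-based change notification, can be sketched as follows. The names here (ToyClusterMembership, track, node_joined, node_left) are hypothetical and do not reproduce the SA Forum interface; the sketch also assumes that health detection happens elsewhere and is reported through node_joined and node_left.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

MembershipCallback = Callable[[FrozenSet[str]], None]


class ToyClusterMembership:
    """Toy membership tracker; not the SA Forum API."""

    def __init__(self, configured_nodes: Iterable[str]) -> None:
        self._configured: Set[str] = set(configured_nodes)
        self._members: Set[str] = set()
        self._callbacks: List[MembershipCallback] = []

    def track(self, callback: MembershipCallback) -> None:
        # Register a callback that receives each new membership snapshot.
        self._callbacks.append(callback)

    def members(self) -> FrozenSet[str]:
        return frozenset(self._members)

    def node_joined(self, name: str) -> None:
        # Only configured nodes may become member nodes.
        if name not in self._configured:
            raise ValueError(f"{name!r} is not a configured node")
        if name not in self._members:
            self._members.add(name)
            self._notify()

    def node_left(self, name: str) -> None:
        if name in self._members:
            self._members.remove(name)
            self._notify()

    def _notify(self) -> None:
        snapshot = self.members()
        for callback in self._callbacks:
            callback(snapshot)
```

Passing an immutable snapshot to each callback means a subscriber cannot accidentally modify the service's own view of the membership.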

Checkpointing (CKPT)
To implement a system capable of seamless recovery from faults, it is important to record and retain dynamic state information that a redundant resource can readily use to resume the service provided by the failed resource. The checkpointing service (CKPT) provides such a service in a highly available system. It provides a facility for processes to record checkpoint data incrementally. In the event of a failure, such checkpoint data can be retrieved, and execution can be resumed using the state recorded before the failure. In AIS, checkpoints are cluster-wide entities that are designated by unique names.

A copy of the data stored in a checkpoint is called a checkpoint replica, which is typically stored in main memory rather than on disk for performance reasons. A given checkpoint may have several checkpoint replicas stored on different nodes in the cluster to protect it against node failures. To avoid accumulation of unused checkpoints in the system, checkpoints have a retention time. When a checkpoint has not been opened by any process for the duration of the retention time, the CKPT automatically deletes the checkpoint.
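A minimal sketch of incremental checkpointing, recovery, and retention-based cleanup follows. It is a single-process toy: replication across nodes is deliberately omitted, the names (ToyCheckpointService, open, write, read, close, expire) are hypothetical rather than the SA Forum API, and the clock is passed in explicitly as a number of seconds so that the retention behavior is deterministic.

```python
from typing import Dict


class ToyCheckpointService:
    """Toy in-memory checkpoint store; not the SA Forum API."""

    def __init__(self, retention_seconds: float) -> None:
        self._retention = retention_seconds
        self._data: Dict[str, Dict[str, bytes]] = {}
        self._open_count: Dict[str, int] = {}
        self._last_closed: Dict[str, float] = {}

    def open(self, name: str) -> None:
        # Checkpoints are designated by unique names; create on first open.
        self._data.setdefault(name, {})
        self._open_count[name] = self._open_count.get(name, 0) + 1
        self._last_closed.pop(name, None)

    def close(self, name: str, now: float) -> None:
        self._open_count[name] -= 1
        if self._open_count[name] == 0:
            # The retention timer starts when nobody has it open any more.
            self._last_closed[name] = now

    def write(self, name: str, section: str, value: bytes) -> None:
        # Incremental update: only the named section is overwritten.
        self._data[name][section] = value

    def read(self, name: str, section: str) -> bytes:
        return self._data[name][section]

    def exists(self, name: str) -> bool:
        return name in self._data

    def expire(self, now: float) -> None:
        """Delete checkpoints left unopened for the full retention time."""
        for name, closed_at in list(self._last_closed.items()):
            if now - closed_at >= self._retention:
                del self._data[name]
                del self._open_count[name]
                del self._last_closed[name]
```

A standby that takes over would open the same checkpoint name, read the sections written by the failed process, and resume from that state.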

Event (EVT)
There are numerous events generated at any given time within a cluster, and these events must be communicated to various components. The event service (EVT) provides a mechanism for applications to subscribe to receive events as and when they occur. The EVT also provides a mechanism where multiple publishers can communicate with multiple subscribers over event channels. An event channel is a concept whereby a publisher opens a channel by invoking the APIs specified by the EVT, and communicates asynchronously with one or more subscribers.

Events consist of a standard header and zero or more bytes of publisher event data. Multiple publishers and multiple subscribers can communicate over the same event channel. Individual publishers and individual subscribers can communicate over multiple event channels. Subscribers are anonymous, which means that they may join and leave an event channel at any time without involving the publisher(s).
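The publish/subscribe relationship over event channels can be sketched as below. The names (ToyEventService, Event, subscribe, unsubscribe, publish) are hypothetical and not the SA Forum EVT API; the single publisher field stands in for the standard header, and delivery is synchronous here for simplicity, whereas the service described above communicates asynchronously.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict


@dataclass(frozen=True)
class Event:
    publisher: str       # stands in for the standard event header
    data: bytes = b""    # zero or more bytes of publisher event data


EventCallback = Callable[[Event], None]


class ToyEventService:
    """Toy event channels; not the SA Forum API."""

    def __init__(self) -> None:
        self._channels: DefaultDict[str, Dict[int, EventCallback]] = defaultdict(dict)
        self._next_id = 0

    def subscribe(self, channel: str, callback: EventCallback) -> int:
        # Subscribers join without the publisher being involved.
        self._next_id += 1
        self._channels[channel][self._next_id] = callback
        return self._next_id

    def unsubscribe(self, channel: str, subscription_id: int) -> None:
        self._channels[channel].pop(subscription_id, None)

    def publish(self, channel: str, event: Event) -> int:
        """Deliver the event to current subscribers; return how many got it."""
        subscribers = list(self._channels[channel].values())
        for callback in subscribers:
            callback(event)
        return len(subscribers)
```

Because the publisher only names a channel, subscribers stay anonymous: they can come and go between two publish calls without the publisher's code changing.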

Messaging (MSG)
A messaging service is designed to address the need for communication between the different components within a cluster, both intra-node and inter-node. Such a service provides an efficient mechanism for communicating a wide variety of information such as application state information (checkpoint data), event and error notifications, fault management information, etc. A messaging service provides an effective way for distributed components to efficiently communicate and coordinate their activities within a cluster. Instead of requiring each resource to manage its various communication complexities, the messaging service does it for them.

The AIS message service (MSG) specifies a buffered message passing system based on the concept of a message queue for processes on the same or on different nodes within a cluster. Messages are written to, and read from, message queues. A single message queue permits a multipoint-to-point communication, and multiple queues can be grouped together to form message queue groups that enable multipoint-to-multipoint communication. These queues can be either persistent or non-persistent.
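The queue and queue-group concepts can be illustrated with a small in-memory model. The names (ToyMessageService, open_queue, create_group, send, receive) are hypothetical and not the SA Forum MSG API. The round-robin distribution across the queues of a group is an assumption made for this sketch, and persistence is not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ToyMessageService:
    """Toy buffered message queues and queue groups; not the SA Forum API."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[bytes]] = {}
        self._groups: Dict[str, List[str]] = {}
        self._cursor: Dict[str, int] = {}

    def open_queue(self, name: str) -> None:
        self._queues.setdefault(name, deque())

    def create_group(self, group: str, queue_names: List[str]) -> None:
        if not queue_names:
            raise ValueError("a group needs at least one queue")
        for name in queue_names:
            self.open_queue(name)
        self._groups[group] = list(queue_names)
        self._cursor[group] = 0

    def send(self, destination: str, message: bytes) -> str:
        """Buffer a message; return the name of the queue that received it."""
        if destination in self._groups:
            # Assumed policy: rotate through the group's queues in order.
            members = self._groups[destination]
            index = self._cursor[destination]
            self._cursor[destination] = (index + 1) % len(members)
            destination = members[index]
        self._queues[destination].append(message)
        return destination

    def receive(self, queue: str) -> Optional[bytes]:
        # Messages are read in the order they were written.
        buffered = self._queues[queue]
        return buffered.popleft() if buffered else None
```

Any number of senders can write to one queue (multipoint-to-point), while a group lets the same senders reach several receiving queues under one name.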

Lock (LCK)
In a cluster of distributed processing components, it is important to provide a mechanism to ensure data consistency and synchronization across the cluster. The lock service (LCK) provides a distributed lock service, intended for use in a cluster, where processes running on different nodes might compete with each other for access to a shared resource. This service provides entities, called lock resources, used to synchronize access to shared resources between application processes. The LCK service supports two locking modes, one for exclusive access and one for shared access. The LCK service specifies a set of mandatory functions (e.g., asynchronous calls, lock timeout, lock wait notifications, etc.) that must be present in all implementations, and a set of optional features (e.g., deadlock detection, lock orphaning, etc.).

A lock service interface allows an application to query for support for one or more of the optional features. If an application depends on one of the optional features for proper operation, it should use this interface to check whether the feature is provided. Of course, if portability is important, use of the optional features must be avoided. However, because they offer powerful functionality, it may make sense to take advantage of them when they are available.
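The two locking modes and the optional-feature query can be sketched as follows. This is a single-process toy with non-blocking lock attempts only; the names (ToyLockResource, ToyLockService, try_lock, supports) and the feature strings are hypothetical and do not reproduce the SA Forum LCK API.

```python
from enum import Enum
from typing import Dict, Iterable


class LockMode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


class ToyLockResource:
    """Toy lock resource with shared and exclusive modes; not the SA Forum API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._holders: Dict[str, LockMode] = {}

    def try_lock(self, owner: str, mode: LockMode) -> bool:
        """Attempt to take the lock without waiting; return True if granted."""
        if owner in self._holders:
            return False
        if mode is LockMode.EXCLUSIVE:
            # Exclusive access requires that nobody else holds the lock.
            granted = not self._holders
        else:
            # Shared access is compatible with other shared holders only.
            granted = LockMode.EXCLUSIVE not in self._holders.values()
        if granted:
            self._holders[owner] = mode
        return granted

    def unlock(self, owner: str) -> None:
        self._holders.pop(owner, None)


class ToyLockService:
    """Reports which optional features this toy implementation offers."""

    def __init__(self, optional_features: Iterable[str] = ()) -> None:
        self._optional = frozenset(optional_features)

    def supports(self, feature: str) -> bool:
        return feature in self._optional
```

An application that relies on an optional feature would call supports() once at startup and fall back to a portable code path when it returns False.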

Information model management service (IMMS)
In a cluster that implements services specified by the SA Forum AIS, it is important to provide a consistent mechanism to represent various components in an information system model, and provide a mechanism to manage them. The IMMS performs administrative operations on components in the system model as well as operations for interrogating the state of the various components represented in the system model.

Various objects of the IMMS information model represent such entities of a cluster as AMF components, checkpoints provided by the CKPT service, message queues provided by the message service, etc. The information model of such a cluster is specified in UML and managed by the IMMS. The objects in the information model have associated attributes and administrative operations, i.e., operations that can be performed on the represented objects through system management interfaces. For management applications, the IMMS provides a set of APIs to create, access, and manage these objects.
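The idea of managed objects with attributes and administrative operations can be sketched with a small registry. Everything here is hypothetical: the class names (ManagedObject, ToyInformationModel), the "AmfComponent" class label, the "admin_state" attribute, and the "lock" operation are illustration-only choices and are not taken from the SA Forum specification.

```python
from typing import Any, Callable, Dict, Tuple


class ManagedObject:
    """One entity of the cluster as represented in the toy model."""

    def __init__(self, name: str, class_name: str, **attributes: Any) -> None:
        self.name = name
        self.class_name = class_name
        self.attributes: Dict[str, Any] = dict(attributes)


AdminOperation = Callable[[ManagedObject], None]


class ToyInformationModel:
    """Toy object registry; not the SA Forum API."""

    def __init__(self) -> None:
        self._objects: Dict[str, ManagedObject] = {}
        self._operations: Dict[Tuple[str, str], AdminOperation] = {}

    def create(self, obj: ManagedObject) -> None:
        if obj.name in self._objects:
            raise ValueError(f"object {obj.name!r} already exists")
        self._objects[obj.name] = obj

    def get_attribute(self, name: str, attribute: str) -> Any:
        # Interrogate the state of a represented object.
        return self._objects[name].attributes[attribute]

    def define_operation(self, class_name: str, operation: str,
                         handler: AdminOperation) -> None:
        # Administrative operations are defined per object class.
        self._operations[(class_name, operation)] = handler

    def invoke(self, name: str, operation: str) -> None:
        obj = self._objects[name]
        self._operations[(obj.class_name, operation)](obj)


def lock_component(obj: ManagedObject) -> None:
    # Hypothetical administrative operation used in the usage example.
    obj.attributes["admin_state"] = "locked"
```

A management application would create objects, read their attributes, and invoke operations by name, without knowing which service implements the underlying entity.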

Logging (LOG)
Logging information is high-level, cluster-significant, function-based (as opposed to implementation-particular) information. It is suited primarily for network or system administrators, or automated tools, that review current and historical logged information to troubleshoot issues such as misconfigurations, network disconnects, and unavailable resources. An SA Forum compliant ecosystem assumes that the AIS LOG service, or some functionally equivalent service, is available for use by service availability applications as well as other AIS services. Some SA Forum AIS services, e.g., NTF, explicitly expect such a logging service to be available in order to function properly.

Notification (NTF)
The NTF service provides a means by which applications running in the cluster can send notifications related to system events. These notifications can be received by interested entities that have subscribed to receive them. A typical subscriber would be the LOG service, which maintains a consolidated view of a cluster-wide system event log.

While the notification service is similar in some respects to the AIS EVT service, which provides the transport for the NTF service, there are two main differences between the two services:

  • The notification service is targeted at applications and/or users external to the system that use the notifications to check on the state of the system
  • The notifications sent through the NTF service have built-in semantics which specify the type of notifications that can be sent (e.g., object creation/deletion, object attribute change, alarm, security alarm, etc.) as well as the content and format of each type of notification
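The built-in typing of notifications, as opposed to the opaque payload of a plain event, can be sketched as follows. The names (ToyNotificationService, Notification, NotificationType) and the fields of a notification are hypothetical and not the SA Forum NTF API; the four enum members mirror the notification types listed above, and a plain list stands in for a logging subscriber.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, FrozenSet, Iterable, List, Tuple


class NotificationType(Enum):
    OBJECT_CREATE_DELETE = "object creation/deletion"
    ATTRIBUTE_CHANGE = "object attribute change"
    ALARM = "alarm"
    SECURITY_ALARM = "security alarm"


@dataclass(frozen=True)
class Notification:
    type: NotificationType
    source: str   # hypothetical field: the object the notification is about
    text: str     # hypothetical field: human-readable detail


NotificationCallback = Callable[[Notification], None]


class ToyNotificationService:
    """Toy typed notifications with per-type subscriptions; not the SA Forum API."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[FrozenSet[NotificationType],
                                      NotificationCallback]] = []

    def subscribe(self, types: Iterable[NotificationType],
                  callback: NotificationCallback) -> None:
        self._subscribers.append((frozenset(types), callback))

    def send(self, notification: Notification) -> int:
        """Deliver to subscribers interested in this type; return the count."""
        delivered = 0
        for types, callback in self._subscribers:
            if notification.type in types:
                callback(notification)
                delivered += 1
        return delivered
```

Because the type is part of every notification, a subscriber such as an alarm manager can filter on it without parsing the payload.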

State of the industry
SA Forum interfaces are being implemented in products that are commercially available. Working with their customers and among themselves, the SA Forum member companies have learned valuable lessons that are extremely useful in implementing commercial products based on SA Forum specifications that meet TEM requirements. Some of these lessons include the following:

  • Providing portability and interoperability of middleware so it can be ported across a variety of platforms. Furthermore, the implementations must allow portability of applications across AIS compliant middleware. The former allows the TEMs the freedom to choose hardware from an ecosystem of COTS suppliers that provide SA Forum HPI compliant hardware. The latter enables them to write applications that port across the underlying platforms (hardware and middleware) from a variety of different vendors.
  • TEMs spend significant effort and resources in integrating and testing other middleware components to create a platform that is application ready, on which they can develop their core value-add. There is a need for SA Forum compliant software to integrate seamlessly with other COTS middleware, e.g., different protocol stacks, in-memory databases, systems management, etc., to minimize the time and effort that would otherwise be required on the part of the TEMs to carry out such an integration.

Two implementations that incorporate these lessons are briefly discussed here.

Carrier grade platform
The first example implementation that strives to achieve a carrier grade application ready platform is the GoAhead SelfReliant suite (See Figure 3). The functionality it provides includes the ability to model the clustered system at the node and at the network level. It implements SA Forum AIS functionality providing failure recovery, fast distributed messaging, standard interfaces for external systems management, and other functionality specified by the interface specification.


Figure 3. Carrier-Grade Application Ready Software Platform

To attain portability across various hardware platforms, this product implements a sophisticated hardware abstraction layer utilizing the SA Forum HPI specification, and an operating system abstraction layer. Together these two layers enable the implementation to run on a variety of hardware platforms such as PCI, cPCI, ATCA and Blade Center running Linux, VxWorks, Solaris, and Windows.

To address the interoperability with other third party COTS software components, there is ease of integration with widely used middleware components such as in-memory databases, protocol stacks, storage management systems, etc. The system provides platform management capabilities through its Platform Resource Management functionality, and enables users to implement functions such as hot swap management, alarm management, etc.

Continuous services
The second example implementation, the Fujitsu package SAFE4 Continuous Services (SAFE4CS), addresses the requirements of the SA Forum Application Interface Specification. The solution (See Figure 4) combines a set of components for the implementation and deployment of carrier grade services and applications in convergent networks. It leverages carrier grade clustering technology and standard management functions, and adds a number of application programming interfaces for implementers of non-stop applications and services, including but not limited to tracing, statistics, external communication, alarms, audit, and recovery.


Figure 4. Fujitsu Siemens carrier-grade availability middleware

There are two major building blocks:

  • Clustering functions
  • Carrier grade feature set

Clustering functions are key in the high availability middleware concept. The SA Forum AIS functions for cluster membership and locking are based upon a proprietary cluster feature set. Additional optional features are available, which help the programmer and the administrator of a high availability installation to work effectively in a clustered environment.

The carrier grade features, used in high-end telecom installations, are based on requirements from network equipment manufacturers for their deployment of advanced services such as Intelligent Networks, soft-switches, location servers or payment servers. They include SA Forum based interfaces for messaging, checkpointing, and events, as well as the availability management framework.

References
[1] SA Forum Augments and Improves Availability Interface Specifications, Press Release, January 18, 2006.
[2] Documentation available at: www.saforum.org/specification

About the Author
Dr. Asif Naseem is currently COO and CTO for GoAhead Software, Inc. in Seattle, Washington. He has more than 18 years of experience in the computer and communications industry. He has served as the General Manager and Director, ICSD, EMEA of Motorola, Inc. where he established and ran a new mobile applications business. As Director of Engineering at AT&T/NCR, he was responsible for developing the LifeKeeper family of products, which was subsequently spun out to SteelEye Technology Inc. Most recently he was Vice President, Business Operations at Iospan Wireless, a broadband wireless company that was acquired by Intel and L3. Dr. Naseem started his career with AT&T Bell Laboratories where he held a variety of technical and management positions.

He is a veteran speaker having presented at national and international events such as ITU Telecom Geneva, GSM World Congress, CTIA and numerous other events. He has also presented papers at conferences organized by ACM, IEEE, and others, and has had articles published in several technical journals and magazines. He has an M.S. in Electrical Engineering and a Ph.D. in Computer Engineering from Michigan State University. He can be reached at: anaseem@goahead.com

 

Copyright © 2003 CMP Media, LLC