Research Contributions


My research contributions fall into three areas: my work at AT&T Laboratories on data integration, workflow management, electronic document archiving, high-performance messaging infrastructures, and event notification services; my research accomplishments while pursuing a Ph.D. degree in Computer Science; and my work on high-performance storage systems.


DataGW: Object-based Heterogeneous Data Integration

Integration of data stored with multiple heterogeneous sources has received a lot of attention in recent years. A common characteristic of existing integration approaches is the adoption of either a relational or a semi-structured data model. DataGW presents a novel approach to data integration by exporting an attribute-based data model for entities stored with relational databases, spreadsheets, property files, and LDAP directories.

DataGW exports a flat information model. This model consists of several entity classes. Each entity class specifies the required and optional attributes that can be present in an instance of the class. The values of these attributes may be stored with different storage managers, e.g., LDAP servers, database servers, file systems, and so on. The storage location of these values is specified in a meta-entry. Each entity instance can be identified by using one or more of its attribute values.

There exists one meta-entry for each entity class. This meta-entry contains the required and optional class attributes, as well as attributes that can be used to improve performance, specify access rights, and so on. Meta-entries are stored in an LDAP directory. Note that the entity classes supported by DataGW do not necessarily correspond to the object classes supported by the LDAP model. This is a crucial distinction since LDAP entities may belong to multiple object classes and, thus, consist of attributes specified in these classes, including duplicate attribute names.
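
As a rough illustration of the model described above, the following Python sketch shows how a meta-entry might map each attribute of an entity class to the storage manager that holds its value. All names here (MetaEntry, DictStore, fetch_entity, the "person" class and its attributes) are hypothetical and are not part of DataGW; in-memory dictionaries stand in for the LDAP and database servers.

```python
from dataclasses import dataclass, field


class DictStore:
    """In-memory stand-in for a storage manager (LDAP server, database, file)."""

    def __init__(self, rows):
        # rows: list of dicts, one per stored record
        self.rows = rows

    def lookup(self, key_attr, key_value, attr):
        """Return the value of `attr` for the record whose key matches, or None."""
        for row in self.rows:
            if row.get(key_attr) == key_value:
                return row.get(attr)
        return None


@dataclass
class MetaEntry:
    """Hypothetical meta-entry: one per entity class."""
    entity_class: str
    required: list
    optional: list
    # attribute name -> name of the store that holds its value
    location: dict = field(default_factory=dict)


def fetch_entity(meta, stores, key_attr, key_value):
    """Assemble a flat entity instance by consulting each attribute's store."""
    entity = {}
    for attr in meta.required + meta.optional:
        value = stores[meta.location[attr]].lookup(key_attr, key_value, attr)
        if value is None:
            if attr in meta.required:
                raise KeyError(f"required attribute {attr!r} is missing")
            continue  # optional attributes may be absent
        entity[attr] = value
    return entity


# Illustrative data: two stores hold different attributes of the same entity.
stores = {
    "ldap": DictStore([{"uid": "jdoe", "mail": "jdoe@example.com"}]),
    "sql": DictStore([{"uid": "jdoe", "office": "B-214"}]),
}
person = MetaEntry(
    entity_class="person",
    required=["uid", "mail"],
    optional=["office", "pager"],
    location={"uid": "ldap", "mail": "ldap", "office": "sql", "pager": "sql"},
)
entity = fetch_entity(person, stores, "uid", "jdoe")
```

The caller sees a single flat set of attributes and never needs to know which store supplied each value; in this sketch the absent optional attribute is simply left out of the result.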


WISE: Workflow Infrastructure, Scheduling, and Enactment

Business processes consist of multiple activities that have to be performed in some sequence. Traditionally, the activities involved and the order in which they are performed were described by processing instructions and rules expressed in a textual format, if documented at all. For some business processes these rules were often embedded in the logic of the software program or the human agent that was responsible for carrying out an activity. Today, organizations change their infrastructure to improve the efficiency of the business as a whole, and they reengineer their business processes to adapt to the changes that take place in their environment. Consequently, planning and managing all the activities and rules involved in a business process becomes more challenging. These challenges can be met by expressing the activities and rules involved in a business process in a concrete and systematic way so that they can be used by a software system and become fully automated.

Workflow management is closely related to the automation and reengineering of a business process in an organization. A workflow may describe business process activities at a conceptual level necessary for understanding, evaluating, and redesigning the business process. Workflow management systems (WFMSs) have been introduced to support the design, execution, and monitoring of business processes. However, existing WFMSs do not provide adequate support for handling deadlines, priorities, and transactional and non-transactional operations. In addition, they lack the ability to manage large volumes of activities, and they do not provide sustained availability to cope with system and logical failures.

The focus of the Workflow Infrastructure, Scheduling, and Enactment (WISE) project is to improve the flexibility and robustness of WFMSs by providing enhanced execution and monitoring support. In particular, WISE aims at an architecture that provides:

  1. Adequate support for handling application and system failures, especially in the case of non-transactional applications and systems;
  2. Improved scalability and load balancing system characteristics to be able to address telecommunications applications, which process hundreds of thousands of workflow instances per day;
  3. Improved run time support by using deadlines and priorities for scheduling, and using information about the load and availability of the agents that execute the workflow activities to boost the performance of the system and, at the same time, reduce operational costs due to escalations.
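
The third goal can be illustrated with a small Python sketch of a scheduler that orders activities by priority and deadline and dispatches each one to the least-loaded available agent. This is only an assumed policy chosen for illustration, not the scheduling algorithm used by WISE; the function name, the activity names, and the agent records are all hypothetical.

```python
import heapq


def schedule(activities, agents):
    """Assign activities to agents.

    activities: list of (name, priority, deadline); a higher priority is more urgent.
    agents: dict of agent name -> {"load": int, "available": bool}.
    Returns a list of (activity name, agent name) in dispatch order.
    """
    # Assumed policy: highest priority first, earliest deadline breaks ties.
    queue = [(-priority, deadline, name) for name, priority, deadline in activities]
    heapq.heapify(queue)

    # Keep available agents in a heap so the least-loaded one is chosen each time.
    pool = [(info["load"], agent) for agent, info in agents.items() if info["available"]]
    if not pool:
        raise RuntimeError("no agent is available")
    heapq.heapify(pool)

    plan = []
    while queue:
        _, _, name = heapq.heappop(queue)
        load, agent = heapq.heappop(pool)
        plan.append((name, agent))
        heapq.heappush(pool, (load + 1, agent))  # the agent now has one more activity
    return plan


activities = [
    ("bill-customer", 1, 50),
    ("provision-line", 5, 30),
    ("test-line", 5, 20),
]
agents = {
    "agent-a": {"load": 2, "available": True},
    "agent-b": {"load": 0, "available": True},
    "agent-c": {"load": 0, "available": False},
}
plan = schedule(activities, agents)
```

In this example the two high-priority activities are dispatched first, in deadline order, to the idle agent, and the unavailable agent is never considered.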

SaveMe: A System for Archiving Electronic Documents Using Messaging Groupware

Today, organizations have to archive an ever-increasing number of documents that are both related to their core business and required to ensure institutional accountability. In addition, organizations have substantial investments in messaging technologies (email and groupware). SaveMe is a document archival system based on network-centric groupware such as Internet standards-based messaging systems. In SaveMe, the actions of archiving, retrieving, and classifying documents are similar to the actions of sending, retrieving, and classifying email into folders. SaveMe leverages existing messaging infrastructures (email is the one common denominator on every computer) and, thus, does not require individual users and IT personnel to learn a new technology. The resulting environment is not intrusive, easier to administer, and a lot easier to deploy.


HERMOD: A Distributed Infrastructure for Electronic Messaging

Hermod is a high-performance, extensible architecture for messaging, based on the store and access paradigm. Hermod consists of multiple heterogeneous data stores: for depositing and retrieving message contents; for managing user folders; for representing and managing the semi-structured nature of messages; and for maintaining information about users and user groups. Both performance and functionality are maximized by using the most appropriate state-of-the-art technology, e.g., databases, file systems, LDAP directories, and text search engines, for each type of data store, and by carefully managing the interactions between the data stores. Hermod uses a simple and uniform interface for the various data stores, resulting in an "open" internal architecture, which allows new data stores to be plugged into the architecture cleanly. Hermod has an active component, wherein internal state information is exported using an event mechanism, which enables a plethora of value-added messaging services to be added in a modular fashion.
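
The combination of a uniform store interface and an event mechanism might look roughly like the Python sketch below. It is a minimal illustration under assumed names (DataStore, MemoryStore, MessageSystem, the "content" and "folders" roles, and the "deposited" event are all hypothetical), and it does not reflect Hermod's actual interfaces.

```python
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Hypothetical uniform interface that every data store implements."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class MemoryStore(DataStore):
    """In-memory stand-in for a database, file system, or directory."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class MessageSystem:
    def __init__(self):
        self.stores = {}     # role name -> DataStore
        self.listeners = []  # callbacks that receive exported events

    def plug_in(self, role, store):
        """Add a store; the core relies only on the DataStore interface."""
        self.stores[role] = store

    def on_event(self, callback):
        self.listeners.append(callback)

    def deposit(self, msg_id, user, body):
        # Message contents and folder membership live in different stores.
        self.stores["content"].put(msg_id, body)
        inbox = self.stores["folders"].get(user) or []
        self.stores["folders"].put(user, inbox + [msg_id])
        # Export the state change so that add-on services can react to it.
        for callback in self.listeners:
            callback({"type": "deposited", "user": user, "msg_id": msg_id})


system = MessageSystem()
system.plug_in("content", MemoryStore())
system.plug_in("folders", MemoryStore())

events = []
system.on_event(events.append)  # a value-added service would subscribe here
system.deposit("m1", "alice", "hello")
```

Because the core touches stores only through put and get, a store backed by a different technology can replace MemoryStore without changes to the deposit logic, and a new service needs only to register a callback.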


READY: A High-Performance Event Notification Service

The proliferation of inexpensive workstations and networks has created a new era in distributed computing. In particular, most modern computer applications are distributed in nature, and they require support by distributed computing platforms. The main goal of existing distributed platforms is to provide an infrastructure that supports the rapid development of value-added services. This means it must be possible to build a new service based on events that occur at existing services without requiring major modifications to existing services. In addition, distributed system users, administrators, and developers require tools and services that enable them to monitor the behavior of the system as a whole.

An event notification service is a key enabling technology for meeting all of the above goals. A notification service accepts event descriptions from suppliers and delivers corresponding event notifications to consumers. As part of initiating contact with the service, a supplier specifies the kinds of events it will supply, while a consumer specifies the kinds of events it is interested in. In combination, these specifications allow the notification service to form an efficient event distribution plan. When a supplier hands an event to the notification service, the service must deliver a notification for the event only to interested consumers, thereby avoiding waste of important shared resources such as network bandwidth.

READY is an event notification service that provides efficient, asynchronous, decoupled communication of event notifications. In READY, communication is asynchronous because the act of supplying an event completes as soon as READY receives the event; decoupled because a supplier need not know which processes will be consumers of its events, and a consumer need not know which processes will act as suppliers of interesting events for which it will receive notifications.
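
The supplier and consumer roles described above can be sketched in a few lines of Python. This is a single-process toy under assumed names (NotificationService, register_supplier, register_consumer, supply, dispatch, and the event kinds are hypothetical), not READY's API: an in-memory queue stands in for asynchronous delivery, and event kinds are matched by simple string equality.

```python
from collections import defaultdict, deque


class NotificationService:
    def __init__(self):
        self.offered = set()                 # event kinds declared by suppliers
        self.interested = defaultdict(list)  # event kind -> consumer callbacks
        self.pending = deque()               # events accepted but not yet delivered

    def register_supplier(self, kinds):
        self.offered.update(kinds)

    def register_consumer(self, kinds, callback):
        for kind in kinds:
            self.interested[kind].append(callback)

    def supply(self, kind, payload):
        """Accept an event and return at once; delivery happens later."""
        if kind not in self.offered:
            raise ValueError(f"no supplier declared event kind {kind!r}")
        self.pending.append((kind, payload))

    def dispatch(self):
        """Deliver queued events to interested consumers only."""
        delivered = 0
        while self.pending:
            kind, payload = self.pending.popleft()
            for callback in self.interested.get(kind, []):
                callback(kind, payload)
                delivered += 1
        return delivered


service = NotificationService()
service.register_supplier({"disk_full", "link_down"})

alarms, audit = [], []
service.register_consumer({"link_down"}, lambda k, p: alarms.append((k, p)))
service.register_consumer({"disk_full", "link_down"}, lambda k, p: audit.append((k, p)))

service.supply("disk_full", "host7")
service.supply("link_down", "router3")
queued_before_dispatch = len(service.pending)
count = service.dispatch()
```

The supplier never names a consumer and each consumer never names a supplier; both refer only to event kinds, which is the decoupling property. The supply call completes as soon as the event is queued, and the consumer interested only in link_down never sees the disk_full event.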


Distributed Transaction Management

The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic-information systems (GIS), and office-information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. I have developed novel techniques whose aim is to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management.

Client-based transaction management is realized by providing logging facilities locally even when data is shared in a global environment. My work consists of several recovery algorithms that use client disks to store recovery-related information (i.e., log records). My algorithms work with both coarse- and fine-granularity locking, and they do not require the merging of client logs at any time. Moreover, my algorithms support fine-granularity locking with multiple clients permitted to update different portions of the same database page at the same time. The database state is recovered correctly when there is a complex crash as well as when the updates performed by different clients on a page are not present on the disk version of the page, even though some of the updating transactions have committed.
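
The idea that each client's log can be used on its own, without merging, can be illustrated with a deliberately simplified Python sketch. It is not one of the recovery algorithms described above: it covers only the redo of committed updates, it assumes that fine-granularity locks keep concurrent clients on disjoint slots of a page, and it omits undo, log sequence numbers, and checkpoints. The Client class, the slot-based page layout, and all identifiers are hypothetical.

```python
import copy


class Client:
    """A client that logs its own updates locally (a list stands in for its disk)."""

    def __init__(self, name):
        self.name = name
        self.log = []  # ("update", txn, page, slot, value) or ("commit", txn)

    def update(self, txn, cache, page, slot, value):
        self.log.append(("update", txn, page, slot, value))  # log before applying
        cache.setdefault(page, {})[slot] = value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def redo(self, disk):
        """Reapply this client's committed updates using only its own log."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                _, _, page, slot, value = rec
                disk.setdefault(page, {})[slot] = value


disk = {"P1": {"a": 0, "b": 0, "c": 0}}
cache = copy.deepcopy(disk)  # shared page image that is lost in the crash

c1, c2 = Client("c1"), Client("c2")
c1.update("t1", cache, "P1", "a", 1)  # two clients update the same page
c2.update("t2", cache, "P1", "b", 2)
c1.commit("t1")
c2.commit("t2")
c2.update("t3", cache, "P1", "c", 3)  # t3 never commits

# Crash: the cache is gone and the disk page holds none of the updates.
recovered = copy.deepcopy(disk)
for client in (c2, c1):  # any order works because the slots are disjoint
    client.redo(recovered)
```

Under these assumptions the committed updates of both clients reappear on the page even though the disk version contained neither, and the uncommitted update is not applied.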

In addition, I have implemented my algorithms in BeSS, and I have studied their performance characteristics using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging. This is because client-based logging is an effective way to reduce dependencies on server CPU and disk resources and, thus, prevents the server from becoming a performance bottleneck as quickly when the number of clients accessing the database increases.


High Performance Storage Managers

My research on storage managers for database systems consists of two systems, EOS and BeSS. Both storage systems are used by numerous research organizations and universities around the world. In addition, BeSS is being productized by NCR as part of a content-based multimedia server for massively parallel architectures.