Main Page

From Kid
Jump to: navigation, search

This website collects different projects related to Heterogeneous Data Integration and Ontology Matching-Mapping, Information System Evolution and Pervasive Systems developed at Politecnico di Milano by:

  • Letizia Tanca (Full Professor) [1]
  • Carlo A. Curino (Ph.D Candidate)[2]
  • Giorgio Orsi (Ph.D Student) [3]

collaborating with researchers from UCLA and UCSD (see specific projects for details).

We would like to thank the numerous students that contributed to these works.

Schema and Data Evolution

Prism: Schema Evolution Tool

Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.

Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators (SMO) to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility, into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable schema evolution testbed for validating PRISM tools and their ability to support legacy queries.

Prima: Transaction Time DB under schema evolution

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning over potentially hundreds of different schema versions. The latter one is realized by (i) introducing Schema Modification Operators (SMO)s to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.

Wikipedia Schema Evolution Benchmark

Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in- depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.

History Metadata Manager

Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage such evolution and preserve their history. In this paper, we describe the Pantha Rei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Database Administrator in planning and evaluating changes, (ii) support automatic rewriting of legacy queries against the current schema version, (iii) enable efficient archiving of the histories of data and metadata, and (iv) support complex temporal queries over such histories. We then introduce the Historical Metadata Manager (HMM), a tool designed to facilitate the process of documenting and querying the schema evolution itself. We use the schema history of the Wikipedia database as a telling example of the many uses and benefits of HMM

Data Integration and Schema Evolution, a unified framework

The life of a modern Information System is often characterized by (i) a push toward integration with other systems, and (ii) the evolution of its data management core in response to continuously changing application requirements. Most of the current proposals dealing with these issues from a database perspective rely on the formal notions of mapping and query rewriting. This paper presents the research agenda of ADAM (Advanced Data And Metadata Manager); by harvesting the recent theoretical advances in this area into a unified framework, ADAM seeks to deliver practical solutions to the problems of automatic schema mapping and assisted schema evolution. The evolution of an Information System (IS) reflects the changes occurring in the application reality that the IS is modelling: thus, ADAM exploits ontologies to capture such changes and provide traceability and automated documentation for such evolution. Initial results and immediate benefits of this approach are presented.

Pervasive Systems

MSA - Mobile Student Assistant

Mobile Location Library There are several types of positioning methods:

Using the mobile phone network: The current cell ID can be used to identify the Base Transceiver Station (BTS) that the device is communicating with and the location of that BTS. Clearly, the accuracy of this method depends on the size of the cell. A cell may be anywhere from 2 to 20 kilometers in diameter.

Using satellites: The Global Positioning System (GPS), controlled by the US Department of Defense. GPS determines the device's position by calculating differences in the times signals from different satellites take to reach the receiver. GPS signals are encoded, so the mobile device must be equipped with a GPS receiver. GPS is potentially the most accurate method (between 4 and 40 meters if the GPS receiver has a clear view of the sky), but it has some drawbacks: The extra hardware can be costly, consumes battery while in use, and requires some warm-up after a cold start to get an initial fix on visible satellites

Hybrid methods: For example, A-GPS — a GPS method which also uses network-based information to speed up location determination.

The goal of my project is Automatic Locaton detection. In details, I have to develop a location-aware service that can detect the position of the mobile device (using the location API for J2ME, jsr 179), which updates the Dimension Tree(DT) according to the obtained position.