Proceedings of the 2004 International Conference on the World Wide Web

Fullname:Proceedings of the 13th International Conference on World Wide Web: Alternate Track Papers & Posters
Editors:Stuart Feldman; Mike Uretsky; Marc Najork; Craig Wills
Location:New York
Dates:2004-May-17 to 2004-May-20
Standard No:hcibib: WWW04-2
  1. WWW 2004-05-17 Volume 2
    1. Sharing educational resources
    2. Web of communities
    3. Quality of service
    4. Industrial practice I
    5. Adaptive e-learning systems
    6. Business processes and conversations
    7. Student tracking and personalization
    8. Industrial practice 2
    9. Semantics and discovery
    10. Posters

WWW 2004-05-17 Volume 2

Sharing educational resources

Semantic resource management for the web: an e-learning application, pp. 1-10
  Julien Tane; Christoph Schmitz; Gerd Stumme
Topics in education are changing at an ever faster pace. E-learning resources tend to be more and more decentralized. Users increasingly need to be able to use the resources of the web. For this, they should have tools for finding and organizing information in a decentralized way. In this paper, we show how an ontology-based tool suite allows users to make the most of the resources available on the web.
Keywords: e-learning, knowledge management, semantic web
EducaNext: a framework for sharing live educational resources with Isabel, pp. 11-18
  Juan Quemada; Gabriel Huecas; Tomás de-Miguel; Joaquín Salvachúa; Blanca Fernandez; Bernd Simon; Katherine Maillet; Efiie Lai-Cong
EducaNext is an educational mediator created within the UNIVERSAL IST Project which supports both the exchange of reusable educational materials based on open standards and the collaboration of educators over the network in the realization of educational activities. The Isabel CSCW application is a group collaboration tool for the Internet supporting audience interconnection over the network, such as distributed classrooms, conferences or meetings. This paper describes the conclusions and feedback obtained from the integration of Isabel into EducaNext and its use for the realization of collaborative educational activities involving distributed classrooms, lectures or workshops, as well as the general conclusions obtained about the integration of synchronous collaboration applications into educational mediators.
Keywords: IEEE standard, Isabel application, LOM, educaNext, educational activity, educational mediators, learning resource, live collaboration over the internet, videoconferencing
The interoperability of learning object repositories and services: standards, implementations and lessons learned, pp. 19-27
  Marek Hatala; Griff Richards; Timmy Eap; Jordan Willms
Interoperability is one of the main issues in creating a networked system of repositories. The eduSource project, in its holistic approach to building a network of learning object repositories in Canada, is implementing an open network for learning services. Its openness is supported by a communication protocol called the eduSource Communications Layer (ECL), which closely implements the IMS Digital Repository Interoperability (DRI) specification and architecture. The ECL, in conjunction with connection middleware, enables any service provider to join the network. EduSource is open to external initiatives as it explicitly supports an extensible bridging mechanism between eduSource and other major initiatives. This paper discusses interoperability in general and then focuses on the design of ECL as an implementation of IMS DRI with supporting infrastructure and middleware. The eduSource implementation has reached a mature state of development and is being deployed in different settings with different partners. Two applications used in evaluating our approach are described: a gateway connecting eduSource and the NSDL initiative, and a federated search connecting eduSource, EdNA and SMETE.
Keywords: interoperability, learning object repositories

Web of communities

An outsider's view on "topic-oriented blogging", pp. 28-34
  J. Bar-Ilan
The number of Web blogs is growing extremely fast, so this phenomenon cannot be ignored. This paper discusses the issue through monitoring a set of blogs for a two-month period in September-October 2003 and characterizing these blogs based on descriptive statistics and content analysis.
Keywords: bloggers, blogs
The role of standards in creating community, pp. 35-40
  Kathi C. Martin
Participation in the web of communities requires a common language, a common technological structure and development of content that is relevant and captivating. This paper reports on a project that both conserves a rich regional cultural heritage and has structured the content developed during this conservation to be fluidly shared with both the domain and the broader communities. It also examines the varied degrees of acceptance within these communities.
Keywords: Dublin core, XML, historic costume collection, museums online archive California, ontology, open archive initiative, semantic web, thesaurus
Network arts: exposing cultural reality, pp. 41-47
  David A. Shamma; Sara Owsley; Kristian J. Hammond; Shannon Bradshaw; Jay Budzik
In this article, we explore a new role for the computer in art as a reflector of popular culture. Moving away from the static audio-visual installations of other artistic endeavors and from the traditional role of the machine as a computational tool, we fuse art and the Internet to expose cultural connections people draw implicitly but rarely consider directly. We describe several art installations that use the World Wide Web as a reflection of cultural reality to highlight and explore the relations between ideas that compose the fabric of our everyday lives.
Keywords: culture, information retrieval, media arts, network arts, software agents, world wide web

Quality of service

A quality model for multichannel adaptive information, pp. 48-54
  Carlo Marchetti; Barbara Pernici; Pierluigi Plebani
The ongoing diffusion of novel and mobile devices offers new ways to provide services across a growing set of network technologies. As a consequence, traditional information systems evolve into multichannel systems in which services are provided through different channels, where a channel is the abstraction of a device and a network. This work proposes a quality model suitable for capturing and reasoning about quality aspects of multichannel information systems. In particular, the model enables a clear separation of modeling aspects of services, networks, and devices. Further, it embeds rules enabling the evaluation of end-to-end quality, which can be used to select services according to the actual quality perceived by users.
Keywords: model, quality of service
Towards context-aware adaptable web services, pp. 55-65
  Markus Keidl; Alfons Kemper
In this paper, we present a context framework that facilitates the development and deployment of context-aware adaptable Web services. Web services are provided with context information about clients that may be utilized to provide personalized behavior. Context is extensible with new types of information at any time without any changes to the underlying infrastructure. Context processing is done by Web services, context plugins, or context services. Context plugins and context services pre- and post-process Web service messages based on the available context information. Both are essential for automatic context processing and automatic adaptation of Web services to new context types without the necessity to adjust the Web services themselves. We implemented the context framework within the ServiceGlobe system, our open and distributed Web service platform.
Keywords: automatic context processing, context, extensibility, extensible framework, information services, service platform, web services
QoS computation and policing in dynamic web service selection, pp. 66-73
  Yutu Liu; Anne H. Ngu; Liang Z. Zeng
The emerging Service-Oriented Computing (SOC) paradigm promises to enable businesses and organizations to collaborate in an unprecedented way by means of standard web services. To support rapid and dynamic composition of services in this paradigm, web services that meet requesters' functional requirements must be able to be located and bound dynamically from a large and constantly changing number of service providers based on their Quality of Service (QoS). In order to enable quality-driven web service selection, we need an open, fair, dynamic and secure framework to evaluate the QoS of a vast number of web services. The fair computation and enforcement of the QoS of web services should have minimal overhead yet be able to achieve sufficient trust from both service requesters and providers. In this paper, we present our open, fair and dynamic QoS computation model for web services selection through implementation of and experimentation with a QoS registry in a hypothetical phone service provisioning marketplace application.
Keywords: QoS, extensible QoS model, ranking of QoS, web services

Industrial practice I

Jena: implementing the semantic web recommendations, pp. 74-83
  Jeremy J. Carroll; Ian Dickinson; Chris Dollin; Dave Reynolds; Andy Seaborne; Kevin Wilkinson
The new Semantic Web recommendations for RDF, RDFS and OWL have, at their heart, the RDF graph. Jena2, a second-generation RDF toolkit, is similarly centered on the RDF graph. RDFS and OWL reasoning are seen as graph-to-graph transforms, producing graphs of virtual triples. Rich APIs are provided. The Model API includes support for other aspects of the RDF recommendations, such as containers and reification. The Ontology API includes support for RDFS and OWL, including advanced OWL Full support. Jena includes the de facto reference RDF/XML parser, and provides RDF/XML output using the full range of the rich RDF/XML grammar. N3 I/O is supported. RDF graphs can be stored in-memory or in databases. Jena's query language, RDQL, and the Web API are both offered for the next round of standardization.
Keywords: Jena, OWL, RDF, RDQL, semantic web
Internet delivery of meteorological and oceanographic data in wide area naval usage environments, pp. 84-88
  Udaykiran Katikaneni; Roy Ladner; Frederick Petry
Access and retrieval of meteorological and oceanographic data from heterogeneous sources in a distributed system present many issues. Effective bandwidth utilization is important for any distributed system. In addition, specific issues need to be addressed in order to assimilate spatio-temporal data from multiple sources. These issues include resolution of differences in datum, map projection and time coordinate. Reduction in the complexity of data formats is a significant factor for fostering interoperability. Simplification of training is important to promote usage of the distributed system. We describe techniques that revolutionize Web-based delivery of meteorological and oceanographic data to address the needs of the Naval/Marine user.
Keywords: meteorological and oceanographic data, resumable object streams
Can web-based recommendation systems afford deep models: a context-based approach for efficient model-based reasoning, pp. 89-93
  Leiguang Gong
Web-based product and service recommendation systems have become an increasingly popular online business practice, with increasing emphasis on modeling customer needs and providing customers with targeted or personalized service solutions in real-time interaction. Almost all commercial web service systems adopt some kind of simple customer segmentation model and shallow pattern matching or rule-based techniques for high performance. The models built on these techniques, though very efficient, have a fundamental limitation in their ability to capture and explain the reasoning in the process of determining and selecting appropriate services or products. However, using deep models (e.g. semantic networks), though desirable for their expressive power, may require significantly more computational resources (e.g. time) for reasoning. This can compromise system performance. This paper reports on a new approach that represents and uses contextual information in semantic net-based models to constrain and prune a potentially very large search space, which results in more efficient reasoning and much improved performance in terms of speed and selectivity, as evidenced by the evaluation results.
Keywords: context, model, reasoning, recommendation systems, semantic network

Adaptive e-learning systems

Model based engineering of learning situations for adaptive web based educational systems, pp. 94-103
  Thierry Nodenot; Christophe Marquesuzaá; Pierre Laforcade; Christian Sallaberry
In this paper, we propose an approach for the engineering of web based educational applications. The applications that we focus on require advanced functionality for regulating and tutoring learners' activities (dynamics of learning). Our approach aims at proposing models, not only to describe details of such learning situations, but also to characterize the constraints that the Learning Management System exploiting such situations must satisfy. In this sense, this approach also contributes to the specification of the Adaptive Web Based Educational System (AWBES) fitted to a particular learning situation. Moreover, this approach for the engineering of learning situations conforms to current software engineering research.
Keywords: UML language, architectures and designs for web-based learning delivery environments, models and metamodels, specification of educational applications
KnowledgeTree: a distributed architecture for adaptive e-learning, pp. 104-113
  Peter Brusilovsky
This paper presents KnowledgeTree, an architecture for adaptive E-Learning based on distributed reusable intelligent learning activities. The goal of KnowledgeTree is to bridge the gap between the currently popular approach to Web-based education, which is centered on learning management systems, and the powerful but underused technologies in intelligent tutoring and adaptive hypermedia. This integrative architecture attempts to address both the component-based assembly of adaptive systems and teacher-level reusability.
Keywords: adaptive content service, adaptive web, content re-use, e-learning, learning object metadata, learning portal, student model server
Authoring of learning styles in adaptive hypermedia: problems and solutions, pp. 114-123
  Natalia Victorovna Stash; Alexandra Ioana Cristea; Paul M. De Bra
Learning styles, as well as the best ways of responding with corresponding instructional strategies, have been intensively studied in the classical educational (classroom) setting. There is much less research on the application of learning styles in the new educational space created by the Web. Moreover, authoring applications are scarce, and they do not provide explicit choices and creation of instructional strategies for specific learning styles. The main objective of the research described in this paper is to provide authors with a tool which will allow them to incorporate different learning styles in their adaptive educational hypermedia applications. In this way, we are creating a semantically significant interface between classical learning styles and instructional strategies and the modern field of adaptive educational hypermedia.
Keywords: adaptive hypermedia, authoring of adaptive hypermedia, learning styles, user modeling

Business processes and conversations

A framework for the server-side management of conversations with web services, pp. 124-133
  Liliana Ardissono; Davide Cardinio; Giovanna Petrone; Marino Segnan
The emerging standards for the publication of Web Services are focused on the specification of the static interfaces of the operations to be invoked, or on the service composition. Few efforts have been made to specify the interaction between a Web Service and the individual consumer, although this aspect is essential to successful service execution.
   In fact, while "one-shot" services may be invoked in a straightforward way, the invocation of services requiring complex interactions, where multiple messages are needed to complete the service, depends on the consumer respecting the business logic of the Web Service.
   In this paper, we propose a framework for the server-side management of the interaction between a Web Service and its consumers. In our approach, the Web Service is in charge of assisting the consumer during the service invocation, by managing the interaction context and instructing the consumer about the operations that can be invoked and their actual parameters, at each step of the conversation. Our framework is based on the exchange of SOAP messages specifying the invocation of Java-based operations. Moreover, in order to support the interoperability with other software environments, the conversation flow specification is exported to a WSDL format that enables heterogeneous consumers to invoke the Web Service in a seamless way.
Keywords: service oriented architectures, tools and technologies for web services development
Decentralized orchestration of composite web services, pp. 134-143
  Girish B. Chafle; Sunil Chandra; Vijay Mann; Mangala Gowri Nanda
Web services make information and software available programmatically via the Internet and may be used as building blocks for applications. A composite web service is one that is built using multiple component web services and is typically specified using a language such as BPEL4WS or WSIPL. Once its specification has been developed, the composite service may be orchestrated either in a centralized or in a decentralized fashion. Decentralized orchestration offers performance improvements in terms of increased throughput and scalability and lower response time. However, decentralized orchestration also brings additional complexity to the system in terms of error recovery and fault handling. Further, incorrect design of a decentralized system can lead to potential deadlock or non-optimal usage of system resources. This paper investigates build time and runtime issues related to decentralized orchestration of composite web services. We support our design decisions with performance results obtained on a decentralized setup using BPEL4WS to describe the composite web services and BPWS4J as the underlying runtime environment to orchestrate them.
Keywords: BPEL4WS, code partitioning, composite web services, decentralized orchestration
CTR-S: a logic for specifying contracts in semantic web services, pp. 144-153
  Hasan Davulcu; Michael Kifer; I. V. Ramakrishnan
A requirements analysis in the emerging field of Semantic Web Services (SWS) (see http://daml.org/services/swsl/requirements/) has identified four major areas of research: intelligent service discovery, automated contracting of services, process modeling, and service enactment. This paper deals with the intersection of two of these areas: process modeling as it pertains to automated contracting. Specifically, we propose a logic, called CTR-S, which captures the dynamic aspects of contracting for services.
   Since CTR-S is an extension of the classical first-order logic, it is well-suited to model the static aspects of contracting as well. A distinctive feature of contracting is that it involves two or more parties in a potentially adversarial situation. CTR-S is designed to model this adversarial situation through its novel model theory, which incorporates certain game-theoretic concepts. In addition to the model theory, we develop a proof theory for CTR-S and demonstrate the use of the logic for modeling and reasoning about Web service contracts.
Keywords: contracts, services composition, web services

Student tracking and personalization

Visualising student tracking data to support instructors in web-based distance education, pp. 154-161
  Riccardo Mazza; Vania Dimitrova
This paper presents a novel approach to using web log data generated by course management systems (CMS) to help instructors become aware of what is happening in distance learning classes. Specifically, techniques from Information Visualization are used to graphically render complex, multidimensional student tracking data collected by CMS. A system, called CourseVis, illustrates the proposed approach. Graphical representations from the use of CourseVis to visualise data from a Java on-line distance course run with WebCT are presented. Findings from the evaluation of CourseVis are presented, and it is argued that CourseVis can help teachers become aware of some social, behavioural, and cognitive aspects related to distance learners. Using graphical representations of student tracking data, instructors can identify tendencies in their classes, or quickly discover individuals that need special attention.
Keywords: Web-based distance education, information visualization, student tracking
Dynamic assembly of learning objects, pp. 162-169
  Robert G. Farrell; Soyini D. Liburd; John C. Thomas
This paper describes one solution to the problem of how to select, sequence, and link Web resources into a coherent, focused organization for instruction that addresses a user's immediate and focused learning need. A system is described that automatically generates individualized learning paths from a repository of XML Web resources. Each Web resource has an XML Learning Object Metadata (LOM) description consisting of General, Educational, and Classification metadata. Dynamic assembly of these learning objects is based on the relative match of the learning object content and metadata to the learner's needs, preferences, context, and constraints. Learning objects are connected into coherent paths based on their LOM topic classifications and the proximity of these topics in a Resource Description Framework (RDF) graph. An instructional sequencing policy specifies how to arrange the objects on the path into a particular learning sequence. The system has been deployed and evaluated within a corporate setting.
Keywords: LOM, RDF, assembly, content management, data retrieval, information retrieval, instruction, learning object, linking, metadata, organization, semantic web
Personalization in distributed e-learning environments, pp. 170-179
  Peter Dolog; Nicola Henze; Wolfgang Nejdl; Michael Sintek
Personalized support for learners becomes even more important when e-Learning takes place in open and dynamic learning and information networks. This paper shows how to realize personalized learning support in distributed learning environments based on Semantic Web technologies. Our approach fills the existing gap between current adaptive educational systems with well-established personalization functionality, and open, dynamic learning repository networks. We propose a service-based architecture for establishing personalized e-Learning, where personalization functionality is provided by various web services. A Personal Learning Assistant integrates personalization services and other supporting services, and provides personalized access to learning resources in an e-Learning network.
Keywords: P2P, adaptation, learning repositories, ontologies, personalization, standards, web services

Industrial practice 2

EdgeComputing: extending enterprise applications to the edge of the internet, pp. 180-187
  A. Davis; J. Parikh; W. E. Weihl
Content delivery networks have evolved beyond traditional distributed caching. With services such as Akamai's EdgeComputing it is now possible to deploy and run enterprise business Web applications on a globally distributed computing platform, to provide subsecond response time to end users anywhere in the world. Additionally, this distributed application platform provides high levels of fault-tolerance and scalability on-demand to meet virtually any need. Application resources can be provisioned dynamically in seconds to respond automatically to changes in load on a given application.
   In some cases, an application can be deployed completely on the global platform without any central enterprise infrastructure. Other applications can require centralizing core business logic and transactional databases at the enterprise data center while the presentation layer and some business logic and database functionality move onto the edge platform.
   Implementing a distributed application service on the Internet's edge requires overcoming numerous challenges, including sandboxing for security, distributed load-balancing and resource management, accounting and billing, deployment, testing, debugging, and monitoring. Our current implementation of Akamai EdgeComputing supports application programming platforms such as Java 2 Enterprise Edition (J2EE) and Microsoft's .NET Framework, in large part because they make it easier to address some of these challenges. In the near future we will also support environments for other application languages such as C, PHP, and Perl.
Keywords: Internet applications, N-tier applications, Web services, distributed applications, edge computing, grid computing, split-tier applications, utility computing, web applications
B2B integration over the Internet with XML: RosettaNet successes and challenges, pp. 188-195
  Suresh Damodaran
The practical experience of RosettaNet in using Web technologies for B2B integration illustrates the transformative power of Web technologies and also highlights challenges for the future. This paper provides an overview of RosettaNet technical standards and discusses the lessons learned from the standardization efforts, in particular, what works and what doesn't. This paper also describes the effort to increase automation of B2B software integration, and thereby to reduce cost.
Keywords: B2B integration, PIP, XML, business process, messaging services

Semantics and discovery

Through different eyes: assessing multiple conceptual views for querying web services, pp. 196-205
  Wolf-Tilo Balke; Matthias Wagner
We present enhancements for UDDI / DAML-S registries allowing cooperative discovery and selection of Web services with a focus on personalization. To find the most useful service in each instance of a request, it is not enough to match the explicit parameters of the request against the service offers. User preferences or implicit assumptions of a user with respect to common knowledge in a certain domain also have to be considered to improve the quality of service provisioning. In the area of Web services, the notion of service ontologies together with cooperative answering techniques can take on much of this responsibility. However, without quality assessments for the relaxation of service requests and queries, personalized service discovery and selection is virtually impossible. This paper focuses on assessing the semantic meaning of query relaxation plans over multiple conceptual views of the service ontology, each one representing a soft query constraint of the user request. Our focus is on the question of what constitutes a minimum amount of necessary relaxation to answer each individual request in a cooperative manner. To incorporate such assessments as early as possible, we propose to integrate ontology-based discovery directly into UDDI directories or query facilities in service provisioning portals. Using the quality assessments presented here, this integration promises to propel today's Web services towards intuitive, user-centered service provisioning.
Keywords: cooperative service discovery, personalization, preference-based service provisioning, semantic web, user profiling, web services
Cooperative middleware specialization for service oriented architectures, pp. 206-215
  Nirmal K. Mukhi; Ravi Konuru; Francisco Curbera
Service-oriented architectures (SOA) will provide the basis of the next generation of distributed software systems, and have already gained enormous traction in the industry through an XML-based instantiation, Web services. A central aspect of SOAs is the looser coupling between applications (services) that is achieved when services publish their functional and non-functional behavioral characteristics in a standardized, machine-readable format. In this paper we argue that in the basic SOA model access to metadata is too static and results in inflexible interactions between requesters and providers. We propose specific extensions to the SOA model to allow service providers and requesters to dynamically expose and negotiate their public behavior, resulting in the ability to specialize and optimize the middleware supporting an interaction. We introduce a middleware architecture supporting this extended SOA functionality, and describe a conformant implementation based on standard Web services middleware. Finally, we demonstrate the advantages of this approach with a detailed real-world scenario.
Keywords: metadata exchange, middleware reconfiguration, service-oriented architecture, web services

Posters


Web engineering with the visual software circuit board, pp. 216-217
  Hovhannes Avoyan; Barry Levine
The Visual Software Circuit Board (VSCB) platform supports a component based methodology for the development of software systems. Circuit board design techniques and methodologies have evolved over decades of electronic device and component engineering. The circuit board approach, now applied to software systems and applications, makes the component based development process easy to visualize and comprehend. This paper describes the VSCB based design methodology with a specific focus on the usage of VSCB for web application engineering.
Keywords: circuit based software development, component based development, rapid application development, visual programming, web application development, web engineering
An efficient and systematic method to generate XSLT stylesheets for different wireless pervasive devices, pp. 218-219
  Thomas Kwok; Thao Nguyen; Linh Lam; Kakan Roy
It is a tedious and cumbersome process to directly update a WML document for the wireless Web because its content is composed of both data and presentation. Thus, XML is used to handle the data while its XSLT stylesheet is used to extract and format the data for presentation. However, different stylesheets have to be used for different devices. An efficient and systematic method based on the idea of generating two separate sets of rules corresponding to the content extracting and formatting parts of the stylesheet is described in this paper. The data extraction part is constructed from content rules while the formatting part is constructed from presentation rules. They are then combined to form a stylesheet by an XSLT generator. A large number of stylesheets corresponding to different devices and a number of standard DTD documents or XML schemas can be generated in this way and stored in a pool during the application setup stage. They will be individually selected from the pool by an XSLT engine to produce different WML documents for different devices at run time.
Keywords: PDA, WML, XML, XSLT, pervasive devices
An application server for the semantic web, pp. 220-221
  Daniel Oberle; Steffen Staab; Raphael Volz
The Semantic Web relies on the complex interaction of several technologies involving ontologies. Therefore, sophisticated Semantic Web applications typically comprise more than one software module. Instead of coming up with proprietary solutions, developers should be able to rely on a generic infrastructure for application development in this context. We call such an infrastructure an Application Server for the Semantic Web; its design and development are based on existing Application Servers. However, we apply and augment their underlying concepts for use in the Semantic Web and integrate semantic technology within the server itself. We provide a short overview of requirements and design issues of such a server and present our implementation and ongoing work, KAON SERVER.
Keywords: application server, ontology, semantic web
ProThes: thesaurus-based meta-search engine for a specific application domain BIBAKFull-Text 222-223
  Pavel Braslavski; Gleb Alshanski; Anton Shishkin
In this poster we introduce ProThes, a pilot meta-search engine (MSE) for a specific application domain. ProThes combines three approaches: meta-search, a graphical user interface (GUI) for query specification, and thesaurus-based query techniques. ProThes attempts to employ domain-specific knowledge, which is represented by both a conceptual thesaurus and results ranking heuristics. Since the knowledge representation is separated from the MSE core, adjusting the system to a specific domain is trouble free. The thesaurus allows for manual query building and automatic query techniques. This poster outlines the overall system architecture, thesaurus representation format, and query operations. ProThes is implemented on the J2EE platform as a Web service.
Keywords: information retrieval, meta-search, query operations, thesaurus, user interface, web services
PipeCF: a scalable DHT-based collaborative filtering recommendation system BIBAKFull-Text 224-225
  Bo Xie; Peng Han; Ruimin Shen
Collaborative Filtering (CF) has proved to be one of the most successful techniques in recommendation systems in recent years. However, traditional centralized CF systems suffer from poor scalability, as their computational complexity increases quickly in both time and space as the number of records in the user database grows. In this paper, we propose a decentralized CF algorithm, called PipeCF, based on the distributed hash table (DHT) method. We also propose two novel approaches to improve the scalability and prediction accuracy of the DHT-based CF algorithm. The experimental data show that our DHT-based CF system has better prediction accuracy, efficiency and scalability than traditional CF systems.
Keywords: collaborative filtering, distributed hash table
Event synchronization for interactive cyberdrama generation on the web: a distributed approach BIBAKFull-Text 226-227
  Stefano Ferretti; Marco Roccetti
The digital generation of a story in which users have influence over the narrative is emerging as an exciting example of computer-based interactive entertainment. Interactive storytelling has existed in non-digital versions for thousands of years, but with the advent of the Web the demand for enabling distributed cyberdrama generation is becoming increasingly common. To govern the complexity stemming from the distributed generation of complex plots, we have devised an event synchronization service that may be exploited to support the distribution of interactive storytelling activities over the Web. The main novelty of our approach is that the semantics of the cyberdrama is exploited to discard obsolete events. This has the positive result of speeding up the activity of drama generation, thus enabling augmented interactivity among dispersed players.
Keywords: computer-based entertainment, cyberdrama generation, interactive storytelling, web-based multiplayer games
Using context- and content-based trust policies on the semantic web BIBAKFull-Text 228-229
  Christian Bizer; Radoslaw Oldakowski
The current discussion about a future Semantic Web trust architecture is focused on reputational trust mechanisms based on explicit trust ratings. What is often overlooked is the fact that, besides ratings, large parts of the application-specific data published on the Semantic Web are also trust relevant and can therefore be used for flexible, fine-grained trust evaluations. In this poster we propose the use of context- and content-based trust mechanisms and outline a trust architecture which allows the formulation of subjective and task-specific trust policies as a combination of reputation-, context- and content-based trust mechanisms.
Keywords: named graphs, semantic web, trust mechanisms, trust policies
EIOP: an e-commerce interoperability platform BIBAKFull-Text 230-231
  Yusuf Tambag
Interoperability has been one of the major problems of e-commerce since its inception. A number of B2B standards such as ebXML, UDDI, RosettaNet, and xCBL have emerged recently to solve the interoperability problem.
   Currently, there exist many B2B standards, each providing competing and complementary solutions to B2B interoperability. There is therefore a need to serve implementations of these standards from a single, central store to ease their use and management. This paper presents EIOP, an E-commerce Interoperability Platform. EIOP is designed to provide a central store for implementations of e-commerce specifications, so that these implementations can be used and configured from a single, central point. It defines the term EIOP Component, which corresponds to plug&play e-commerce applications that are stored in the EIOP.
Keywords: UDDI, e-commerce, ebIOP, ebXML, interoperability
Reactive rules inference from dynamic dependency models BIBAKFull-Text 232-233
  Asaf Adi; Opher Etzion; Dagan Gilat; Royi Ronen; Guy Sharon; Inna Skarbovsky
Defining dependency models is sometimes an easier, more intuitive way of representing an ontology than defining reactive rules directly, as it provides a higher level of abstraction. We briefly introduce the capabilities of the ADI (Active Dependency Integration) model, emphasizing new developments: 1. Support for automatic instantiation of dependencies from an abstract definition that expresses a general dependency in the ontology, namely a "template". 2. Inference of rules for dynamic dependency models where dependencies and entities may be inserted, deleted, and updated. We use the eTrade example to illustrate these capabilities.
Keywords: active databases, active systems, dependency models, event correlation, reactive rules, relationships between entities, rule engine
SPT-based topology algorithm for constructing power efficient wireless ad hoc networks BIBAKFull-Text 234-235
  Szu-Chi Wang; David S. L. Wei; Sy-Yen Kuo
In this paper, we present a localized Shortest Path Tree (SPT) based algorithm for constructing a sub-network with the minimum-energy property for a given wireless ad hoc network. Each mobile node determines its own transmission power based only on its local information. The proposed algorithm constructs local shortest path trees from the unit disk graph. The performance improvements of our algorithm are demonstrated through simulations.
Keywords: power consumption, topology control, wireless ad hoc networks
Business objective based resource management BIBAKFull-Text 236-237
  Sarel Aiber; Dagan Gilat; Ariel Landau; Natalia Razinkov; Aviad Sela; Segev Wasserkrug
Enterprises today wish to manage their IT resources so as to optimize business objectives, such as income, rather than IT metrics, such as response times. Therefore, we introduce a new paradigm, which focuses on such business objective oriented resource management. Additionally, we define a general simulation-based autonomous process enabling such optimizations, and describe a case study, demonstrating the usefulness of such a process.
Keywords: IT policy, business objective, economic considerations, modeling techniques, optimization, simulation
Enhancing the SCORM metadata model BIBAKFull-Text 238-239
  D. Simões; R. Luís; N. Horta
Nowadays, the leading e-learning platforms are converging towards standardization. This paper presents an extension to the SCORM, today's most widely acclaimed e-learning standard, enabling the modelling of course-related entities that surround learning objects and content aggregations, thereby increasing the standard's modelling scope and allowing for gains in efficiency in knowledge dissemination. A prototype is being implemented and tested on VIANET, an original e-learning platform with extensible support for the SCORM.
Keywords: SCORM, e-learning, metadata, modelling, standards
Continuous web: a new image-based hypermedia and scape-oriented browsing BIBAKFull-Text 240-241
  Hiroya Tanaka; Katsumi Tanaka
Conventionally, Web pages have been recognized as documents described by HTML. Image data, such as photographs, logos, maps, illustrations, and decorated text, have been treated as sub-components of Web documents. However, we can alternatively recognize all Web pages as images on the screen. When a Web page is treated as an image, its HTML data is considered to be metadata which describes the image content. Taking such a viewpoint, we propose a new image-based hypermedia which we call continuous web. In our model, there is no distinction between Web images and other images such as photographs.
   Regarding everything on the Web as images leads us to consider a new style of browsing and navigating. We use the term scape-oriented browsing. We define a scape as a collection of continuously accumulated images. For example, whenever we walk in the real world, we can perceive and remember various forms of information through a scape process. Here, we describe new methods for scape-oriented browsing, such as see-through anchors, parallel navigation, and peripheral scape presentation. We have designed and implemented a prototype system based on our model. Our system offers continuous browsing and navigation to users. We explain our concepts and discuss the effectiveness and potential of this approach.
Keywords: hyperimage, images, scape
HPG: a tool for presentation generation in WIS BIBAKFull-Text 242-243
  Bas Rutten; Peter Barna; Flavius Frasincar; Geert-Jan Houben; Richard Vdovjak
Web Information Systems (WIS) support the process of retrieving information from sources on the Web and of presenting them as a hypermedia presentation. Most WIS design methodologies focus on the engineering of the abstract navigation (hyperlinks). The actual presentation generation is less supported. Hera is one of the few WIS methodologies that offer a tool for presentation generation (HPG). The HPG transforms RDF data obtained as the result of a query into a Web presentation suited to the user (in HTML or WML).
Keywords: RDF(S), WIS, XSLT, hypermedia, presentation generation
Automatically generating metadata for digital photographs with geographic coordinates BIBAFull-Text 244-245
  Mor Naaman; Yee Jiun Song; Andreas Paepcke; Hector Garcia-Molina
Given location information on digital photographs, we can automatically generate an abundance of photo-related metadata using off-the-shelf and web-based data sources. These metadata can serve as additional memory cues and filters when browsing a personal or global collection of photos.
Active e-course for constructivist learning BIBAKFull-Text 246-247
  Hai Zhuge; Yanyan Li
An active e-course is a self-representable and self-organizable document mechanism with a flexible structure. The kernel of the active e-course is to organize learning materials into a "concept space" rather than a "page space". Besides a highly interactive service, it supports adaptive learning by dynamically selecting, organizing, and presenting the learning materials for different students. As learning progresses, it also provides assessments of students' learning performance and gives suggestions to guide them in further learning. We have implemented an authoring tool and a course prototype to support constructivist learning.
Keywords: active e-course, constructivist learning, course ontology, semantic link network
Are web pages characterized by color? BIBAKFull-Text 248-249
  Norifumi Murayama; Suguru Saito; Manabu Okumura
When humans guess the content of a web page, not only the text on the page but also its appearance is an important factor.
   However, there have been few studies on the relationship between the content and visual appearance of a web page.
   Investigating the relationship between them, especially between web content and color use, we found a tendency to use particular colors for some kinds of content pages. We think this result opens the way to estimating web content using color information.
Keywords: color, contents of web page
Gossip based streaming BIBAKFull-Text 250-251
  Xinyan Zhang; Jiangchuan Liu
In this paper, we propose a novel multicast streaming protocol for overlay networks, called Gossip Based Streaming (GBS). In GBS, streaming content does not come from a single upstream source, but is delivered to a client from several sources. Although GBS is similar to existing gossip protocols, our design addresses the unique requirements of streaming, such as continuous playback.
   Preliminary results show that GBS performs much better in dynamic user environments.
Keywords: multicast, overlay networks, streaming
A3: framework for user adaptation using XSLT BIBAKFull-Text 252-253
  Daisuke Kanjo; Yukiko Kawai; Katsumi Tanaka
We propose a system called "Adaptation Anywhere & Anytime (A3)", a framework for making web sites and applications adaptable to a user's needs or interests, and we describe the implementation of a web site on A3 using XSLT. Web sites and applications built on A3 automatically construct a user ontology for each user and share these ontologies among themselves. Each site or application uses the user ontology to select appropriate resources for the user and to present them in a suitable form. A3 also offers a method for constructing adaptable web sites using XSLT, so that authors can easily make their sites adaptable.
Keywords: XSLT, ontology, semantic web, user adaptation
The PowerRank web link analysis algorithm BIBAKFull-Text 254-255
  Yizhou Lu; Benyu Zhang; Wensi Xi; Zheng Chen; Yi Liu; Michael R. Lyu; Wei-ying Ma
The web graph follows the power law distribution and has a hierarchical structure, but neither the PageRank algorithm nor any of its improvements leverages these attributes. In this paper, we propose a novel link analysis algorithm, the PowerRank algorithm, which makes use of the power law distribution attribute and the hierarchical structure of the web graph. The algorithm consists of two parts. In the first part, special treatment is applied to the web pages with low "importance" scores. In the second part, the global "importance" score for each web page is obtained by combining those scores together. Our experimental results show that: 1) the PowerRank algorithm computes 10%-30% faster than the PageRank algorithm; 2) the top web pages under PowerRank remain similar to those under PageRank.
Keywords: hierarchy structure, page rank algorithm, power distribution
Implementing a proxy agent based writable web for a dynamic information sharing system BIBAKFull-Text 256-257
  Noriharu Tashiro; Hiromitsu Hattori; Takayuki Ito; Toramatsu Shintani
In this paper, we propose a Web based information sharing system called the Proxy Agent-based Information Sharing (PAIS).
   We also developed a writable Web mechanism called Web browser-based Direct Editing (Wedit), which is a major component of PAIS. Wedit enables public users to effectively edit HTML text in an existing Web browser. Since Wedit was developed with conventional technologies, users quickly learn how to use it. PAIS is implemented using Wedit and a proxy agent. PAIS enables users to share information via Web pages using Wedit. The proxy agent maintains its user's editing data and autonomously sends the user's modification data to other agents in the same community. In PAIS, the proxy agent ensures that certain confidential information in the community is not publicly shared.
Keywords: browsing support, information system, multiagent system
A query algebra for XML P2P databases BIBAKFull-Text 258-259
  Carlo Sartiani
This paper describes a query algebra for queries over XML p2p databases that provides explicit mechanisms for modeling data dissemination, replication constraints, and for capturing the transient nature of data and replicas.
Keywords: XML, peer data management systems, query algebras
Next generation web technologies in content management BIBAKFull-Text 260-261
  Norbeto Fernández-Garcia; Luis Sánchez-Fernandez; Jesús Villamor-Lugo
The development of information and communication technologies and the expansion of the Internet means that, nowadays, there are huge amounts of information available via these emergent media. A number of content management systems have appeared which aim to support the management of these large amounts of content. Most of these systems do not support collaboration among several, distributed sources of managed content. In this paper we present a proposal for an architecture, Infoflex, for the efficient and flexible management of distributed content using Next Generation Web Technologies: Web Services and Semantic Web facilities.
Keywords: content management, semantic web, web services
Web page classification without the web page BIBAKFull-Text 262-263
  Min-Yen Kan
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of URLs for webpage categorization via a two-phase pipeline of word segmentation/expansion and classification. We quantify its performance against document-based methods, which require the retrieval of the source document.
Keywords: abbreviation expansion, text categorization, uniform resource locator, word segmentation
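The first phase of the pipeline described in the abstract above (word segmentation and expansion of a URL) can be sketched roughly as follows. The abbreviation table, the stop list, and the function name `url_tokens` are hypothetical choices made for this example, not the paper's method; the resulting tokens would then be passed to any text classifier in the second phase.

```python
import re
from urllib.parse import urlsplit

# Hypothetical abbreviation table used for the expansion step.
ABBREVIATIONS = {
    "cs": "computer science",
    "dept": "department",
    "univ": "university",
}

# Tokens that carry little topical information (an assumed stop list).
STOP_TOKENS = {"www", "html", "htm", "index"}


def url_tokens(url):
    """Split a URL into lowercase word tokens and expand known abbreviations."""
    parts = urlsplit(url)
    # Use only host and path; split on any run of non-letter characters.
    raw = re.split(r"[^A-Za-z]+", parts.netloc + parts.path)
    tokens = []
    for tok in raw:
        tok = tok.lower()
        if not tok or tok in STOP_TOKENS:
            continue
        tokens.extend(ABBREVIATIONS.get(tok, tok).split())
    return tokens
```

For instance, `url_tokens("http://www.cs.example.edu/dept/courses.html")` yields `["computer", "science", "example", "edu", "department", "courses"]`, all obtained without retrieving the page itself.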
E-learning personalization based on itineraries and long-term navigational behavior BIBAKFull-Text 264-265
  Enric Mor; Julià Minguillón
In this paper we describe a practical framework for studying the navigational behavior of the users of an e-learning environment integrated in a virtual campus. The students navigate through the web-based virtual campus, interacting with learning resources which are structured following the SCORM e-learning standard. Our main goal is to design a usage mining tool for analyzing such user navigational behavior and for extracting relevant information that can be used to validate several aspects related to virtual campus design and usability, and also to determine the optimal scheduling for each course depending on user profile. We intend to extend the sequencing capabilities of the SCORM standard to include the concept of recommended itinerary, by combining teachers' expertise with experience acquired through system usage analysis.
Keywords: SCORM, data mining, e-learning, navigational patterns, personalization
Post-processing InkML for random-access navigation of voluminous handwritten ink documents BIBAKFull-Text 266-267
  Khaireel A. Mohamed; Lioudmila Belenkaia; Thomas Ottmann
The goal of this research is the improvement of browsing voluminous InkML data in two areas: ease of rendering continuous ink-flow for replay-browsing, and ease of random access navigation in eLearning domains. The notion of real-time random access navigation in ink documents has not yet been fully exploited. Users of existing eLearning browsers are restricted to viewing static annotated slides that are inferior in quality when compared to actively replaying the same slides with sequenced ink-flow of the annotated freehand writings. We are developing a tool to investigate ways of managing massive InkML data for efficient "active visible scrolling" of recorded freehand writings in ink documents. This work will also develop and evaluate new post-processing techniques that take advantage of the relationship between ink volumes and active-rendering times for real-time random access navigation.
Keywords: InkML, digital ink, freehand writing, random access
Type based service composition BIBAKFull-Text 268-269
  Ion Constantinescu; Boi Faltings; Walter Binder
Service matchmaking and composition has recently drawn increasing attention in the research community. Most existing algorithms construct chains of services based on exact matches of input/output types. However, this does not work when the available services only cover a part of the range of the input type. We present an algorithm that also allows partial matches and composes them using switches that decide on the required service at runtime based on the actual data type. We report experiments on randomly generated composition problems that show that using partial matches can decrease the failure rate of the integration algorithm using only complete matches by up to 7 times with no increase in the number of directory accesses required. This shows that composition with partial matches is an essential and useful element of web service composition.
Keywords: large scale discovery, partial matches, runtime non-determinism, type based composition, web services
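The partial-match idea in the abstract above can be illustrated with a small sketch. Here a type is modelled simply as a set of admissible values, and a service "partially matches" when it accepts only some of them; the greedy selection, the tuple layout, and the name `compose_switch` are assumptions made for this example rather than the authors' algorithm.

```python
def compose_switch(required, services):
    """Combine partially matching services into one runtime switch.

    required: set of input values the composition must handle
    services: list of (name, accepted_values, function) tuples
    Returns a callable that dispatches on the actual value, or None if the
    services together do not cover the whole required range.
    """
    chosen, covered = [], set()
    for name, accepted, fn in services:
        gain = (accepted & required) - covered
        if gain:  # keep the service only if it covers something new
            chosen.append((accepted, fn))
            covered |= gain
    if covered != required:
        return None  # composition fails: part of the range is uncovered

    def switch(value):
        # Decide at runtime which service handles the actual input value.
        for accepted, fn in chosen:
            if value in accepted:
                return fn(value)
        raise ValueError(f"value outside composed range: {value!r}")

    return switch


# Usage with two hypothetical shipping services, neither of which
# covers the full set of destination countries on its own.
services = [
    ("eu_shipping", {"DE", "FR"}, lambda c: f"eu:{c}"),
    ("us_shipping", {"US"}, lambda c: f"us:{c}"),
]
ship = compose_switch({"DE", "FR", "US"}, services)
```

A composer restricted to exact matches would reject both services here, which is the failure case the abstract says partial matches reduce.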
Interpreting distributed ontologies BIBAKFull-Text 270-271
  Yuzhong Qu; Zhiqiang Gao
The Semantic Web is challenged by the URI meaning issues that arise from putting ontologies in open and distributed environments. In an attempt to clarify some of these issues, this paper proposes a new approach to interpreting distributed ontologies. The approach is built on top of local models semantics and extends it to deal with URI sharing by harmonizing the local models via agreement on vocabulary provenance. The commitment relationship is presented to allow URI sharing between ontologies with richer semantics.
Keywords: OWL, commitment relationship, distributed description logic, vocabulary provenance
A multimodal interaction manager for device independent mobile applications BIBAKFull-Text 272-273
  Florian Wegscheider; Thomas Dangl; Michael Jank; Rainer Simon
This poster presents an overview of the work on an interaction manager of a platform for multimodal applications in 2.5G and 3G mobile phone networks and WLAN environments. The poster describes the requirements for the interaction manager (IM), its tasks and the resulting structure. We examine the W3C's definition of an interaction manager and compare it to our implementation, which accomplishes some additional tasks.
Keywords: device independence, interaction manager, mobile network, multi-user applications, multimodal interface, session management
Modeling the growth of future web BIBAKFull-Text 274-275
  Hai Zhuge; Xue Chen; Xiang Li
The future Web can be imagined as a life network consisting of resource nodes and semantic relationship links between them. Any node has a life span from birth -- adding it to the network -- to death -- removing it from the network. Through establishing and investigating two types of models for such a network, we obtain the same scale free distribution of semantic links. Simulations and comparisons validate the rationality of the proposed models.
Keywords: distribution, evolution, power law, web
Semi-automatic annotation of contested knowledge on the world wide web BIBAKFull-Text 276-277
  Bertrand Sereno; Simon Buckingham Shum; Enrico Motta
We describe a strategy to support the semantic annotation of contested knowledge, in the context of the Scholarly Ontologies project, which aims at building a network of interpretations enriching a corpus of scholarly papers. To model such knowledge, which does not have 'right' and 'wrong' values, we are building on the notion of active recommendations as a means to spark annotators' interest. We finally argue for a different approach to the evaluation of its impact.
Keywords: annotation, contesting interpretations, interface, sense-making
An automatic semantic relationships discovery approach BIBAKFull-Text 278-279
  Hai Zhuge; Liping Zheng; Nan Zhang; Xiang Li
An important obstacle to the success of the Semantic Web is that establishing semantic relationships is labor-intensive. This paper proposes an approach to automatically discovering semantic relationships for constructing the semantic link network. The basic premise of this work is that the semantics of a web page can be reflected by a set of keywords, and the semantic relationship between two web pages can be determined by the semantic relationship between their keyword sets. The approach adopts data mining algorithms to discover the semantic relationships between keyword sets, and then uses deductive and analogical reasoning to enrich the semantic relationships. The proposed algorithms have been implemented. Experiments show that the approach is feasible.
Keywords: algorithm, analogical reasoning, data mining, semantic link network, semantic web
Scheduling web requests in broadcast environments BIBAKFull-Text 280-281
  Jianliang Xu; Wang-Chien Lee; Jiangchuan Liu
On-demand broadcast has been supported in the Internet to enhance system scalability. Unfortunately, most existing on-demand scheduling algorithms do not consider the time constraints associated with web requests. This paper proposes a novel scheduling algorithm, called Slack Inverse Number of requests (SIN), that takes into account the urgency and productivity of serving pending requests. Trace-driven experiments demonstrate that SIN significantly outperforms existing algorithms over a wide range of workloads.
Keywords: on-demand broadcast, scheduling algorithms, time constraints, web
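One plausible reading of the name "Slack Inverse Number of requests" in the abstract above is a score that divides an item's slack (urgency) by its number of pending requests (productivity) and broadcasts the item with the smallest score. The sketch below implements that reading only; the exact formula, the data layout, and the name `pick_next_item` are assumptions, since the abstract does not state them.

```python
def pick_next_item(pending, now):
    """Choose the next data item to broadcast.

    pending: dict mapping item id -> list of deadlines of its pending requests
    now:     current time, in the same units as the deadlines
    Returns the item with the smallest slack / request-count score, or None.
    """
    best_item, best_score = None, None
    for item, deadlines in pending.items():
        if not deadlines:
            continue
        slack = min(deadlines) - now      # urgency: time left for the tightest request
        score = slack / len(deadlines)    # productivity: many requests lower the score
        if best_score is None or score < best_score:
            best_item, best_score = item, score
    return best_item
```

With `pending = {"a": [10], "b": [12, 14, 20], "c": [30]}` and `now = 0`, the scores are 10, 4 and 30, so item `"b"` is broadcast first: it is slightly less urgent than `"a"` but satisfies three requests at once.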
Architecture of a p2p distributed adaptive directory BIBKFull-Text 282-283
  Gennaro Cordasco; Vittorio Scarano; Cristiano Vitolo
Keywords: adaptivity, bookmark sharing, peer to peer
Semantic web applications to e-science in silico experiments BIBAKFull-Text 284-285
  Jun Zhao; Carole Goble; Robert Stevens
This paper explains our research and implementations of manual, automatic and deep annotations of provenance logs for e-Science in silico experiments. Compared to annotating general Web documents, annotations for scientific data require more sophisticated professional knowledge to recognize concepts from documents, and more complex text extraction and mapping mechanisms. This paper introduces a simple automatic annotation approach based on "lexicons" and a deep annotation implemented by semantically populating, translating and annotating provenance logs. We used COHSE (Conceptual Open Hypermedia Services Environment) to annotate and browse provenance logs from the myGrid project, which are conceptually linked together as a hypertext Web of provenance logs and experiment resources, based on the associated conceptual metadata and reasoning over these metadata.
Keywords: annotation, e-science, integration, ontology, provenance, semantic web
Matching web site structure and content BIBAKFull-Text 286-287
  Vassil Gedov; Carsten Stolz; Ralph Neuneier; Michal Skubacz; Dietmar Seipel
To keep an overview of complex corporate web sites, it is crucial to understand the relationship between content, structure and user behavior. In this paper, we describe an approach that allows us to compare web page content with the information implicitly defined by the structure of the web site. We start by describing each web page with a set of keywords. We combine this information with the link structure in an algorithm that generates a context-based description. By comparing both descriptions, we draw conclusions about the semantic relationship between a web page and its neighbourhood. In this way, we indicate whether a page fits the content of its neighbourhood. In doing so, we implicitly identify topics which span several connected web pages. Our approach supports redesign processes by comparing the actual structure and content of a web site with the designer's concepts.
Keywords: semantic description, web content mining, web structure
A web personalization system based on web usage mining techniques BIBAKFull-Text 288-289
  Massimiliano Albanese; Antonio Picariello; Carlo Sansone; Lucio Sansone
In the past few years, web usage mining techniques have grown rapidly together with the explosive growth of the web, both in the research and commercial areas. In this work we present a Web mining strategy for Web personalization based on a novel pattern recognition strategy which analyzes and classifies both static and dynamic features. The results of experiments on the data from a large commercial web site are presented to show the effectiveness of the proposed system.
Keywords: clustering, web personalization, web usage mining
Semantic information portals BIBAKFull-Text 290-291
  Dave Reynolds; Paul Shabajee; Steve Cayzer
In this paper, we describe the notion of a semantic information portal. This is a community information portal that exploits the semantic web standards to improve structure, extensibility, customization and sustainability. We are in the process of developing a prototype directory of environmental organizations as a demonstration of the approach and outline the design challenges involved and the current status of the work.
Keywords: information portals, semantic web
Design of a crawler with bounded bandwidth BIBAKFull-Text 292-293
  Michelangelo Diligenti; Marco Maggini; Filippo Maria Pucci; Franco Scarselli
This paper presents an algorithm to bound the bandwidth of a Web crawler. The crawler collects statistics on the transfer rate of each server to predict the expected bandwidth use for future downloads. The prediction allows us to activate the optimal number of fetcher threads in order to exploit the assigned bandwidth. The experimental results show the effectiveness of the proposed technique.
Keywords: bandwidth optimization, parallel web crawlers
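The approach in the abstract above (collect per-server transfer-rate statistics, predict bandwidth use, then activate as many fetcher threads as the budget allows) can be sketched as follows. The exponential moving average, the default rate for unseen servers, and the names `RateEstimator` and `threads_to_activate` are assumptions made for this example; the abstract does not specify the paper's predictor.

```python
class RateEstimator:
    """Per-server transfer-rate estimate (bytes/second) via a moving average."""

    def __init__(self, alpha=0.5, default_rate=50.0):
        self.alpha = alpha                # weight given to the newest observation
        self.default_rate = default_rate  # assumed rate for servers never seen
        self.rates = {}

    def observe(self, server, num_bytes, seconds):
        rate = num_bytes / seconds
        old = self.rates.get(server)
        self.rates[server] = (
            rate if old is None else self.alpha * rate + (1 - self.alpha) * old
        )

    def predict(self, server):
        return self.rates.get(server, self.default_rate)


def threads_to_activate(queue, estimator, budget):
    """Count how many queued downloads can start without exceeding the budget."""
    total, count = 0.0, 0
    for server in queue:
        rate = estimator.predict(server)
        if total + rate > budget:
            break
        total += rate
        count += 1
    return count
```

After observing 1000 bytes and then 3000 bytes in 10 seconds each from server `"a"`, the estimate for `"a"` is 200 bytes/second; with a budget of 320 and a queue `["a", "b", "b", "a"]` where `"b"` is unseen, three fetchers would be activated.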
A logic-based semantic web html generator -- a poor man's publishing approach BIBAKFull-Text 294-295
  Eero Hyvönen; Arttu Valo; Kim Viljanen; Markus Holi
This paper presents a method and a tool for publishing semantic web content in RDF(S) for the humans as a static HTML page site.
Keywords: content publishing, logic, ontology, semantic web
A method for modeling uncertainty in semantic web taxonomies BIBAKFull-Text 296-297
  Markus Holi; Eero Hyvönen
We present a method for representing and reasoning with uncertainty in RDF(S) and OWL ontologies based on Bayesian networks.
Keywords: ontology, semantic web, uncertainty
Keyword-based fragment detection for dynamic web content delivery BIBAKFull-Text 298-299
  Daniel Brodie; Amrish Gupta; Weisong Shi
Fragment-based caching has been proposed as a promising technique for dynamic Web content delivery and caching. Most of these approaches either assume the fragment-based content is served by the Web server automatically, or look at server-side caching only. There is no method of extracting fragments from existing dynamic Web content, which is of great importance to the success of fragment-based caching. Also, current technologies for supporting dynamic fragments do not take into account changes in fragment spatiality, which is a popular technique in dynamic and personalized Web site design. This paper describes our effort to address these shortcomings. Our first proposal, DyCA, a Dynamic Content Adapter, is a tool for creating fragment-based content from original dynamic content. Our second proposal is an augmentation to the ESI standard that allows it to look up fragment locations in a mapping table attached to the template. This allows the fragments to move across the document without needing to re-serve the template.
Keywords: dynamic web content delivery, fragment detection
Accurate web recommendations based on profile-specific url-predictor neural networks BIBAKFull-Text 300-301
  Olfa Nasraoui; Mrudula Pavuluri
We present a Context Ultra-Sensitive Approach based on two-step Recommender systems (CUSA-2-step-Rec). Our approach relies on a committee of profile-specific neural networks. This approach provides recommendations that are accurate and fast to train because only the URLs relevant to a specific profile are used to define the architecture of each network. We compare the proposed approach with collaborative filtering showing that our approach achieves higher coverage and precision while being faster, and requiring lower main memory at recommendation time. While most recommenders are inherently context sensitive, our approach is context ultra-sensitive because a different recommendation model is designed for each profile separately.
Keywords: collaborative filtering, neural networks, web mining
Choosing the best knowledge base system for large semantic web applications BIBAKFull-Text 302-303
  Yuanbo Guo; Zhengxiang Pan; Jeff Heflin
We present an evaluation of four knowledge base systems with respect to use in large Semantic Web applications. We discuss the performance of each system. In particular, we show that existing systems need to place a greater emphasis on scalability.
Keywords: DAML+OIL, benchmark, evaluation, knowledge base system, semantic web
FADA: find all distinct answers BIBAKFull-Text 304-305
  Hui Yang; Tat-Seng Chua
The wealth of information available on the web makes it an attractive resource for seeking quick answers. While web-based question answering has become an emerging topic in recent years, the problem of efficiently locating a complete set of distinct answers on the Web is far from being solved. We introduce our system, FADA, which relies on question event analysis, web page clustering, and natural language parsing to find reliable distinct answers with high recall. The method has been found to be effective in strengthening state-of-the-art Web question answering techniques by emphasizing answer completeness and uniqueness.
Keywords: question answering, web page classification
Meaning and the semantic web BIBAKFull-Text 306-307
  Bijan Parsia; Peter F. Patel-Schneider
The meaning of names (URI references) is a contentious issue in the Semantic Web. Numerous proposals have been given for how to provide meaning for names in the Semantic Web, ranging from a strict localized model-theoretic semantics to proposals for a unified single meaning. We argue that a slight expansion of the standard model-theoretic semantics for names is sufficient for the present, and can easily be augmented where necessary to allow communities of interest to strengthen this spartan theory of meaning.
Keywords: meaning, representation, semantic web
A semantic approach for designing business protocols BIBAKFull-Text 308-309
  Ashok U. Mallya; Munindar P. Singh
Business processes involve interactions among autonomous partners. We propose that these interactions be specified modularly as protocols. Protocols can be published, enabling implementors to independently develop components that respect published protocols and yet serve diverse interests. A variety of business protocols would be needed to capture subtle business needs. We propose that the same kinds of conceptual abstractions be developed for protocols as for information models. Specifically, we consider (1) refinement: a subprotocol may satisfy the requirements of a superprotocol, but support additional properties and (2) aggregation: a protocol may combine existing protocols. In support of the above, we develop a formal semantics for protocols, an operational characterization of them, and an algebra for protocol composition.
Keywords: business process composition, commitments, web services
Constraint SVG BIBAKFull-Text 310-311
  Cameron L. McCormack; Kim Marriott; Bernd Meyer
We believe it is important for web graphic standards such as SVG to support user interaction and diagrams that can adapt their layout and appearance to their viewing context so as to take into account viewing device characteristics and the viewer's requirements. Previously we suggested that adding expression-based attributes to SVG and using one-way constraints to evaluate these dynamically would considerably improve SVG's support for adaptive layout and user interaction. We describe a minimal backward compatible extension to SVG 1.1, called Constraint SVG (CSVG), that provides such expression-based attributes and its implementation on top of Batik. CSVG also provides another significant extension to SVG 1.1: it allows the author to define new custom elements using XSLT.
Keywords: CSVG, SVG, adaptivity, constraint-based graphics, constraints, differential scaling, document formats, interaction, scalable vector graphics, semantic zooming
A survey of public web services BIBAKFull-Text 312-313
  Su Myeon Kim; Marcel Catalin Rosu
This paper introduces a methodology to provide the first characterization of public Web Services in terms of their evolution, location, complexity, message size, and response time.
Keywords: SOAP, UDDI business registry, WSDL, measurement, web services, web services traffic characteristics
Providing ranked relevant results for web database queries BIBAKFull-Text 314-315
  Ullas Nambiar; Subbarao Kambhampati
Web database users often experience difficulty in articulating their needs using a precise query. Providing a ranked set of possible answers would benefit such users. We propose to provide ranked answers to user queries by identifying a set of queries from the query log whose answers are relevant to the given user query. The relevance detection is done using a domain and end-user independent content similarity estimation technique.
Keywords: content similarity, query suggestion, web-enabled database
On a web browsing support system with 3d visualization BIBAKFull-Text 316-317
  Toshihiro Yamaguchi; Hiromitsu Hattori; Takayuki Ito; Toramatsu Shintani
Existing commercial Web browsers provide various utilities and functions, e.g., Web bookmarks and a browsing history list. Since the bookmark and history functions record only the title and URL of the Web page, users who cannot remember the contents of each Web page have difficulty retracing their steps. In this paper, we propose a bookmark system based on a 3D interface. Additionally, our system offers three main functions: a 3D browsing history function, a marker function, and a look-ahead loading function. These functions enable users to browse Web pages more effectively.
Keywords: 3D technology, visualization, web browser
The web around the corner: augmenting the browser with gps BIBAKFull-Text 318-319
  Davide Carboni; Andrea Piras; Stefano Sanna; Sylvain Giroux
As programmable mobile devices (such as high-end cellular phones and Personal Digital Assistants) become widely adopted, users ask for Internet access on the road. While upcoming technologies like UMTS and Wi-Fi provide broadband wireless communication, Web services and Web browsers do not provide any sort of location-awareness yet. As GPS receivers get cheaper, positioning devices will be embedded into commercial mobile devices. Thus, the position of the user can be used to filter and tailor the information presented to the user, as is already done for language preferences and user-agent.
   This paper describes early results of an ongoing project called GPSWeb, which aims to provide GPS support for Web browsers and an application model for Location-Based Services. It introduces the Location-Based Browsing concept that enhances the classic Web user-Web site interaction.
Keywords: GPS, LBS, browser, JavaScript, location-awareness
Automatically collecting, monitoring, and mining Japanese weblogs BIBAKFull-Text 320-321
  Tomoyuki Nanno; Toshiaki Fujiki; Yasuhiro Suzuki; Manabu Okumura
We present a system that tries to automatically collect and monitor Japanese blog collections that include not only ones made with blog software but also ones written as normal web pages. Our approach is based on extraction of date expressions and analysis of HTML documents. Our system also extracts and mines useful information from the collected blog pages.
Keywords: document analysis, monitoring, text mining, trend analysis, weblogs
A scheme of service discovery and control on ubiquitous devices BIBAKFull-Text 322-323
  Mitsutaka Watanabe; Ken-ichi Takaya; Akishi Seo; Masatomo Hashimoto; Tomonori Izumida; Akira Mori
We have developed a set of hardware and software components to realize ubiquitous computing environments, based on two keywords, "simple" (easy to implement) and "open" (adopt widely publicized specifications). This set has resulted in UBKit (Ubiquity Building Toolkit). The Micro-Server, an instance of UBKit, enables existing consumer electronics to join computer networks. In this paper we propose a scheme for discovery and control of devices attached to micro-servers.
Keywords: ad-hoc network, peer to peer, service discovery, ubiquitous computing
OREL: an ontology-based rights expression language BIBAKFull-Text 324-325
  Yuzhong Qu; Xiang Zhang; Huiying Li
This paper proposes an Ontology-based Rights Expression Language, called OREL. Based on the OWL Web Ontology Language, OREL allows not only users but also machines to handle digital rights at the semantic level. The ontology-based rights model of OREL is also presented. The usage of OREL and its advantages over existing RELs are discussed.
Keywords: OREL, OWL, XrML, rights expression language
A semantic matchmaker service on the grid BIBAKFull-Text 326-327
  Andreas Harth; Stefan Decker; Yu He; Hongsuda Tangmunarunkit; Carl Kesselman
A fundamental task on the Grid is to decide what jobs to run on what computing resources based on job or application requirements. Our previous work on ontology-based matchmaking discusses a resource matchmaking mechanism using Semantic Web technologies. We extend our previous work to provide dynamic access to such matchmaking capability by building a persistent online matchmaking service. Our implementation uses the Globus Toolkit for the Grid service development, and exploits the monitoring and discovery service in the Grid infrastructure to dynamically discover and update resource information. We describe the architecture of our semantic matchmaker service in the poster.
Keywords: grid services, networking and distributed web applications, resource allocation, resource selection, semantic web
Web page ranking using link attributes BIBAKFull-Text 328-329
  Ricardo Baeza-Yates; Emilio Davis
We present a variant of PageRank, WLRank, that considers different Web page attributes to give more weight to some links. Our evaluation shows that the precision of the answers can improve significantly.
Keywords: PageRank, web link ranking
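The idea in the WLRank entry above, giving some links more weight than others inside a PageRank-style computation, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which page attributes are used or how they are combined, so the per-link weights below are hypothetical inputs, and the function name `weighted_pagerank`, the damping factor, and the iteration count are assumptions made for illustration.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power iteration for PageRank with per-link weights.

    links maps each source page to a dict {target page: weight}.
    The weights are hypothetical; a real system would derive them
    from link or page attributes.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the uniform "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for src in pages:
            targets = links.get(src, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling page: spread its rank uniformly over all pages.
                for page in pages:
                    new_rank[page] += damping * rank[src] / n
            else:
                # Split the rank in proportion to the link weights.
                for dst, weight in targets.items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Page "a" links to "b" with three times the weight of its link to "c".
example_links = {
    "a": {"b": 3.0, "c": 1.0},
    "b": {"c": 1.0},
    "c": {"a": 1.0},
}
ranks = weighted_pagerank(example_links)
```

With all weights equal, the sketch reduces to ordinary PageRank, so the weights are the only place where link attributes would enter.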
CC-Buddy: an adaptive framework for maintaining cache coherency using peers BIBAKFull-Text 330-331
  Song Gao; Wee Siong Ng; Weining Qian; Ying Ao Zhou
In this paper, we propose a framework called CC-Buddy for maintaining dynamic data coherency in a peer-to-peer environment. Working on the basis of peer heterogeneity in data coherency requirements, peers in CC-Buddy cooperate with each other to disseminate the updates by pushing. Simulation results show that our solution not only improves the fidelity of the data, but also reduces the workload of servers, thereby achieving high scalability.
Keywords: cache coherency, dynamic data, peer-to-peer
Dynamic search in peer-to-peer networks BIBAKFull-Text 332-333
  Hsinping Wang; Tsungnan Lin; Chia Hung Chen; Yennan Shen
This work specifically addresses the search issues in unstructured peer-to-peer (P2P) systems that involve the design of an efficient search algorithm, the proposed dynamic search, and the modeling of P2P systems reflecting real measured P2P networks. Through simulations, we show that dynamic search outperforms existing search algorithms.
Keywords: Modeling, P2P, Gnutella, search algorithm
A novel heterogeneous data integration approach for p2p semantic link network BIBAKFull-Text 334-335
  Hai Zhuge; Jie Liu
This paper proposes a novel approach to integrate heterogeneous data in P2P networks. The approach includes a tool for building P2P semantic link networks, mechanisms for peer schema mapping, criteria for peer similarity degree measurement, and algorithms for heterogeneous data integration. The approach has three advantages: First, it uses semantic links to describe semantic relationships between peers' data schemas. Second, it deals with the semantic heterogeneity, the structural heterogeneity and the data value inconsistency. Finally, it considers the semantic similarity and structural similarity to forward queries to relevant peers.
Keywords: P2P computing, data integration, semantic link, semantic web
ResEval: a web-based evaluation system for internal medicine house staff BIBAKFull-Text 336-337
  H. J. Feldman; M. M. Triola
The evaluation and assessment of physicians-in-training (house staff) is a complex task. Residency training programs are under increasing pressure [1] to provide accurate and comprehensive evaluations of performance of resident physicians [2,3]. For many years, the Internal Medicine training program at NYU School of Medicine used a single standardized paper form for all evaluation scenarios. This strategy was inadequate as physicians train in multiple diverse settings; evaluation of physicians in the intensive care unit is quite different from that in the general clinics. The paper system resulted in poor compliance by house staff and faculty in the completion of evaluations. In addition, the data being collected from the paper forms was of poor quality due to the non-specific nature of the questions. A committee was formed in 2001, which created a new strategy for evaluating the core competencies of house staff. Given the ubiquity of web accessible computers in the clinical and non-clinical areas of hospitals and the flexibility a computerized system would provide, a web-based evaluation system was designed and implemented. This system allows for on-the-spot evaluations tailored to the evaluator, evaluatee and the venue of the evaluation. During the 2002 residency year, data was collected on satisfaction and use of the system and compared with the previous paper evaluation.
Keywords: HTML, assessment, education, evaluations, house staff, medicine, oracle, python, web
Affinity rank: a new scheme for efficient web search BIBAKFull-Text 338-339
  Y. Liu; B. Zhang; Z. Chen; M. R. Lyu; W. Ma
Maximizing only the relevance between queries and documents will not satisfy users if they want the top search results to present a wide coverage of topics by a few representative documents. In this paper, we propose two new metrics to evaluate the performance of information retrieval: diversity, which measures the topic coverage of a group of documents, and information richness, which measures the amount of information contained in a document. Then we present a novel ranking scheme, Affinity Rank, which utilizes these two metrics to improve search results. We demonstrate how Affinity Rank works by a toy data set, and verify our method by experiments on real-world data sets.
Keywords: affinity rank, diversity, information richness, link analysis
XJ: integration of XML processing into Java BIBAKFull-Text 340-341
  Matthew Harren; Mukund Raghavachari; Oded Shmueli; Michael G. Burke; Vivek Sarkar; Rajesh Bordawekar
The increased importance of XML as a universal data representation format has led to several proposals for enabling the development of applications that operate on XML data. These proposals range from runtime API-based interfaces to XML-based programming languages. The subject of this paper is XJ, a research language that proposes novel mechanisms for the integration of XML as a first-class construct into Java. The design goals of XJ distinguish it from past work on integrating XML support into programming languages -- specifically, the XJ design adheres to the XML Schema and XPath standards, and supports in-place updates of XML data, thereby keeping with the imperative nature of Java. We have also built a prototype compiler for XJ, and our preliminary experimental results demonstrate that the performance of XJ programs can approach that of traditional low-level API-based interfaces, while providing a higher level of abstraction.
Keywords: Java, XML, data integration
Exploiting conceptual modeling for web application quality evaluation BIBAKFull-Text 342-343
  P. Fraternali; P. L. Lanzi; M. Matera; A. Maurino
This paper presents an approach and a toolset for exploiting the benefits of conceptual modeling in the quality evaluation tasks that take place both before the deployment and during the operational life of a Web application. The full version of the paper is available as a technical report at the address: http://www.elet.polimi.it/upload/fraterna/FLMM2004.pdf.
Keywords: conceptual modeling, web application quality, web mining
Web page summarization using dynamic content BIBAKFull-Text 344-345
  Adam Jatowt
Summarizing web pages has recently gained much attention from researchers. Until now two main types of approaches have been proposed for this task: content- and context-based methods. Both of them assume fixed content and characteristics of web documents without considering their dynamic nature. However, the volatility of information published on the Internet argues for the implementation of more time-aware techniques. This paper proposes a new approach towards automatic web page description, which extends the concept of a web page by the temporal dimension. Our method provides a broader view on web document summarization and can complement the existing techniques.
Keywords: change detection, web document, web page summarization
Testbed for information extraction from deep web BIBAKFull-Text 346-347
  Yasuhiro Yamada; Nick Craswell; Tetsuya Nakatoh; Sachio Hirokawa
Search results generated by searchable databases are served dynamically and are far larger than the static documents on the Web. These results pages have been referred to as the Deep Web. We need to extract the target data in results pages to integrate them on different searchable databases. We propose a test bed for information extraction from search results. We chose 100 databases randomly from 114,540 pages with search forms. Therefore, these databases have a good variety. We selected 51 databases which include URLs in a results page and manually identified target information to be extracted. We also suggest evaluation measures for comparing extraction methods and methods for extending the target data.
Keywords: deep web, meta search, testbed, wrapper
Ontological representation of learning objects: building interoperable vocabulary and structures BIBAKFull-Text 348-349
  Jian Qin; Naybell Hernández
The ontological representation of learning objects is a way to deal with the interoperability and reusability of learning objects (including metadata) through providing a semantic infrastructure that will explicitly declare the semantics and forms of concepts used in labeling learning objects. This paper reports the preliminary result from a learning object ontology construction project, which includes an in-depth study of 14 learning objects and over 500 components in these learning objects. An analysis of the types of components and terms used in these objects reveals that most terms fell into the form and subject categories; few pedagogical terms were used. Drawing findings from literature and case study, the authors use a matrix to show relationships in learning objects and relevant knowledge and technologies. Strategies and methods in ontology development and implementation are also discussed.
Keywords: content structures, controlled vocabulary, learning objects, metadata, ontologies
Query and content suggestion based on latent interest and topic class BIBAKFull-Text 350-351
  Noriaki Kawaeme; Hideaki Suzuki; Osamu Mizuno
To improve the process of user information retrieval, we propose the concept of a latent semantic map (LSM), along with a method of generating this map. The novel aspect of the LSM is that it can archive user models and latent semantic analysis on one map to support instantaneous information retrieval. With this characteristic, the LSM can improve search engines in terms of not only user support but also search results.
Keywords: document categorization, document suggestion, information retrieval, latent semantic map, query suggestion
Random surfer with back step BIBAKFull-Text 352-353
  Marcin Sydow
We present a novel link-based ranking algorithm, RBS, which may be viewed as an extension of PageRank by a back-step feature.
Keywords: PageRank, back step, ranking algorithms
Copyright protection on the web: a hybrid digital video watermarking scheme BIBAKFull-Text 354-355
  Pat Pik-Wah Chan; Michael R. Lyu; Roland T. Chin
Video is one of the most popular types of data shared on the Web, and the protection of video copyright is of vast interest. In this paper, we present a comprehensive approach for protecting and managing video copyrights on the Internet with watermarking techniques. We propose a novel hybrid digital video watermarking scheme with scrambled watermarks and error correction codes. The effectiveness of this scheme is verified through a series of experiments, and the robustness of our approach is demonstrated using the criteria of the latest StirMark test.
Keywords: digital watermarking, hybrid, scene change, video
Distributed ranking over peer-to-peer networks BIBAKFull-Text 356-357
  Dexter Chi Wai Siu; Tak Pang Lau
Query flooding is a problem in Peer-to-Peer networks like Gnutella. The Firework Query Model (FQM) addresses this problem by Peer Clustering and routes the query message more intelligently. However, it still has drawbacks such as query flooding inside clusters. The condition can be improved if the query message can be sent directly to the query destination, as the message then does not need to be sent hop by hop. This can be achieved by ranking. By ranking, the network can know the destination and the information quality shared by each peer. We introduce distributed ranking in this paper. We give the background of FQM, outline the proposed method, and conduct a series of experiments that demonstrate the significant reduction of query flooding in a P2P network.
Keywords: distributed peer ranking, peer-to-peer networks
Spam attacks: p2p to the rescue BIBAKFull-Text 358-359
  Ernesto Damiani; Sabrina De Capitani di Vimercati; Stefano Paraboschi; Pierangela Samarati; Andrea Tironi; Luca Zaniboni
We propose a decentralized privacy-preserving approach to spam filtering. Our solution exploits robust digests to identify messages that are a slight variation of one another and a peer-to-peer architecture between mail servers to collaboratively share knowledge about spam.
Keywords: reputation, spam filtering, structured P2P
Site-to-site (s2s) searching using the p2p framework with cgi BIBAKFull-Text 360-361
  Wan Yeung Wong
Peer-To-Peer (P2P) networks like Gnutella improve some shortcomings of Conventional Search Engines (CSE) such as centralized and outdated indexing by distributing the search engines over the peers, which maintain their updated local contents. But they are designed for sharing and searching the contents in personal computers instead of websites. In this work, we propose a novel web information retrieval method called Site-To-Site (S2S) searching, which uses the P2P framework with CGI as protocol. It helps the site owners to turn their websites into autonomous search engines without extra hardware and software costs. In this paper, we introduce S2S searching with some related work. We also describe the system architecture and communication protocol. Finally, we summarize the experimental results, and show that S2S searching works well in one thousand sites.
Keywords: distributed system, peer-to-peer (P2P), search engine, site-to-site (S2S), web information retrieval
Distributed community crawling BIBAKFull-Text 362-363
  Fabrizio Costa; Paolo Frasconi
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the notion of Web communities. The stability properties of the method can be used as an implicit coordination mechanism to increase the efficiency of the crawling task.
Keywords: distributed crawling, web communities, web metrics
Web data integration using approximate string join BIBAKFull-Text 364-365
  Yingping Huang; Gregory Madey
Web data integration is an important preprocessing step for web mining. It is highly likely that several records on the web whose textual representations differ may represent the same real world entity. These records are called approximate duplicates. Data integration seeks to identify such approximate duplicates and merge them into integrated records. Many existing data integration algorithms make use of approximate string join, which seeks to (approximately) find all pairs of strings whose distances are less than a certain threshold. In this paper, we propose a new mapping method to detect pairs of strings with similarity above a certain threshold. In our method, each string is first mapped to a point in a high dimensional grid space, then pairs of points whose distances are 1 are identified. We implement it using Oracle SQL and PL/SQL. Finally, we evaluate this method using real data sets. Experimental results suggest that our method is both accurate and efficient.
Keywords: approximate string join, data integration
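The entry above defines approximate string join as finding all pairs of strings whose distance is below a threshold. As a point of reference, that definition can be written as a naive nested-loop join over edit distance. This is only a baseline illustration, not the grid-space mapping the authors propose (the abstract gives no details of that mapping), and it uses Python rather than the Oracle SQL and PL/SQL named in the abstract. The function names and sample records are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def approximate_join(left, right, threshold):
    """Return (l, r, distance) for every pair with distance < threshold."""
    matches = []
    for l in left:
        for r in right:
            distance = edit_distance(l, r)
            if distance < threshold:
                matches.append((l, r, distance))
    return matches


# Hypothetical records that refer to the same real-world entity.
pairs = approximate_join(["IBM Corp."], ["IBM Corp", "Intel"], threshold=2)
```

The nested loop compares every pair, which is quadratic in the number of records; avoiding that cost is the usual motivation for mapping or filtering schemes such as the one the entry describes.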
Filtering spam e-mail on a global scale BIBAKFull-Text 366-367
  G. Hulten; J. Goodman; R. Rounthwaite
In this paper we analyze a very large junk e-mail corpus which was generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is being collected, and analyze the geographic origins of the e-mail, who the e-mail is targeting, and what the e-mail is selling.
Keywords: Junk E-mail, international e-mail, spam
The effect of different types of site maps on user's performance in an information-searching task BIBAKFull-Text 368-369
  A. Yip
This study examines the effects of different types of site maps on user's performance in an information-searching task for three web sites. Forty-two participants (22 males and 20 females) participated in the study. The results showed significant effects of the types of site maps used. It was found that participants found the correct answers more often, required less time, visited significantly fewer web pages, and required fewer clicks to complete the task when the site map was visible. However, it was found that the participants had a lower success rate in finding the correct answers when the site map had hyperlinks. In addition, the results showed significant performance differences among the three web sites and the effects of a site map were found to be more prominent for a larger web site.
Keywords: hypertext, site map, web navigation
The effect of the back button in a random walk: application for PageRank BIBAKFull-Text 370-371
  Fabien Mathieu; Mohamed Bouklit
Theoretical analysis of the Web graph is often used to improve the efficiency of search engines. The PageRank algorithm, proposed by Brin and Page, is used by the Google search engine to improve the results of the queries. The purpose of this article is to describe an enhanced version of the PageRank algorithm using a realistic model for the back button. We introduce a limited history stack model (you cannot click the back button more than m times in a row), and show that when m=1, the computation of this Back PageRank can be as fast as that of a standard PageRank.
Keywords: back button, flow, PageRank, random walk, web analysis
Distribution of relevant documents in domain-level aggregates for topic distillation BIBAKFull-Text 372-373
  V. Plachouras; I. Ounis
In this paper, we study the distribution of relevant documents in aggregates, formed by grouping the retrieved documents according to their domain. For each aggregate, we take into account its size, and a measure of the correlation between its incoming and outgoing hyperlinks. We report on a preliminary experiment with two TREC topic distillation tasks, where we find that larger aggregates, or those aggregates with correlated hyperlinks, are more likely to contain relevant documents. This result shows that the distribution of domain-level aggregates is potentially useful for finding relevant documents.
Keywords: aggregates, distribution of relevant documents, web IR
A diagrammatic inference system for the web BIBAKFull-Text 374-375
  Michael Wollowski; Peter Nei; Chris Barrell
We developed a diagrammatic inference system for the World Wide Web. Our system enables the creation of diagrams such that the information contained in them can be searched and inference can be performed on it. We developed an XMLSchema for bar, line, and pie charts. Based on it, we developed software that transforms a corresponding XML file into an SVG image, which in turn is rendered by the client as an image. Additionally, we developed a search engine which enables a user to find information explicitly contained in the XML file, and as such in the image. Furthermore, we developed an inference engine which enables a user to locate information that is implicitly contained in the image.
Keywords: XML, inference, inference system, search, searchable diagrams
Digital repository interoperability: design, implementation and deployment of the ecl protocol and connecting middleware BIBAKFull-Text 376-377
  Ty Mey Eap; Marek Hatala; Griff Richards
This paper describes the design and implementation of the eduSource Communication Layer (ECL) protocol. ECL is one outcome of a pan-Canadian project called eduSource Canada to build an open network of interoperable digital repositories. The design goal was to achieve a highly flexible, easy-to-use, and platform independent communication layer protocol that allows new and existing repositories to communicate and share resources across a network. ECL conforms to IMS Digital Repository Interoperability (DRI) specifications and supports four main functions: search/expose, submit/store, gather/expose and request/deliver. The ECL protocol builds on the latest standards and is flexible with respect to metadata schemas and repository contents. To support easy adoption of the protocol we provide middleware components for connecting existing systems. The ECL is currently used in the eduSource network, and we have begun work bridging with other interoperable initiatives such as Open Knowledge Initiative (OKI). Based on our experience, ECL is truly flexible and easy to use.
Keywords: interoperability, middleware, protocols
MetaCrystal: visualizing the degree of overlap between different search engines BIBAKFull-Text 378-379
  Anselm Spoerri
MetaCrystal enables users to visualize and control the degree of overlap between the results returned by different search engines. Several linked overview tools support rapid exploration, facilitate complex filtering operations and guide users toward relevant information. MetaCrystal addresses the problem of the effective fusion of different search results by helping users to visually combine and filter the top results returned by the different engines. Users can apply weights to the search engines to create their own ranking functions. They can control the degree of overlap by modifying the URL directory depth used to match documents or by changing the number of top documents being compared.
Keywords: information visualization, meta searching
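The entry above says overlap can be controlled by the URL directory depth used to match documents and by the number of top documents compared. A minimal sketch of that matching step follows. It is an illustration under assumptions, not MetaCrystal's implementation: the function names, the rule of keeping the host plus the first `depth` path segments, and the sample result lists are all hypothetical.

```python
from urllib.parse import urlsplit


def truncate_url(url, depth):
    """Keep the lower-cased host plus the first `depth` path segments."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return "/".join([parts.netloc.lower()] + segments[:depth])


def overlap(results_by_engine, depth, top_k):
    """Map each truncated URL to the set of engines returning it in their top_k."""
    found = {}
    for engine, urls in results_by_engine.items():
        for url in urls[:top_k]:
            found.setdefault(truncate_url(url, depth), set()).add(engine)
    return found


# Hypothetical top results from two engines.
sample_results = {
    "engine_a": ["http://example.org/docs/intro.html", "http://other.net/x"],
    "engine_b": ["http://example.org/docs/faq.html"],
}
shallow = overlap(sample_results, depth=1, top_k=2)
deep = overlap(sample_results, depth=2, top_k=2)
```

At depth 1 the two `example.org/docs` pages count as the same document and are found by both engines; at depth 2 they are distinct, so lowering the depth increases the measured overlap.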
HuskySim: a simulation toolkit for application scheduling in computational grids BIBAKFull-Text 380-381
  Mohamed A. Kerasha; Ian Greenshields
Grid computing -- the assemblage of heterogeneous distributed clusters of computers viewed as a single virtual machine -- promises to serve as the next major paradigm in distributed computing. Since Grids are assemblages of (usually) autonomous systems (autonomous clusters, supercomputers, or even single workstations) scheduling can become a complex affair which must take into consideration not just the requirements (and scheduling decisions) made at the point of the job's origin, but also the scheduling requirements (and decisions) made at remote points on the fabric, and in particular scheduling decisions made by a remote autonomous system onto which the local job has been scheduled. The current existing scheduling models range from static, where each of the programs is assigned once to a processor before execution of the program commences, to dynamic, where a program may be reassigned to different processors, or a hybrid approach, which combines characteristics of both techniques [1,4,5].
   To address this issue, we have developed a JAVA based discrete event Grid simulator toolkit called HuskySim. The HuskySim toolkit provides core functionalities (e.g., compute objects, network objects, and scheduling objects) that can be used to simulate a distributed computing environment. Furthermore, it can be used to predict the performance of various classes of Grid scheduling algorithms including: Static scheduling algorithms, Dynamic scheduling, Adaptive Scheduling.
   We adopted an object-oriented design, which allows an easy mapping and integration of simulation objects into the simulation program. This approach simplifies the simulation of multitasking and distributed data processing models. Our model of multitasking processing is based on an interrupt driven mechanism.
   As shown in Figure 1, the simulator works by relaying messages between the core engine and the simulation modules through the message handling sub-system. Once the architecture, the load distribution, and the scheduling algorithms are defined, the object registration subsystem sends a NEW OBJECT REQUEST MESSAGE to the object class libraries and builds a skeleton for the requested simulation experiment.
   Workload traces can be generated using probabilistic models. The currently supported distributions are: Uniform, Poisson, Exponential, Normal, Erlang, and Power Tailed. It is also possible to use real world load traces. Moreover, we augmented the simulator with a statistical module. Using the statistical module provided with HuskySim, the core simulation engine can send messages to perform various types of analysis on the performance data including: variance reduction, regression, time series analysis, clustering, and data mining.
   In order to quantify the system performance, the simulator provides various performance metrics including: CPU utilization, disk utilization, application turnaround time, latency, make span, host to host bandwidth, jammed bandwidth, and TCP/IP traffic data. These measurements are handled through the measurement sub-system.
   Furthermore, HuskySim can be used to simulate the classes of algorithmic and parametric adaptive Grid schedulers, in which the scheduling algorithm need not be fixed in advance. Instead, the scheduling algorithm is selected at run time based on the current workload on the Grid fabric in order to operate at a near optimal level.
Keywords: adaptive scheduling, computational grids, discrete event simulation, performance prediction
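To make the idea of a probabilistically generated workload trace in the HuskySim abstract above concrete, here is a minimal sketch of one of the listed cases, Poisson job arrivals, produced by summing exponential inter-arrival times. It is written in Python for brevity and is not HuskySim's Java API; the function name `poisson_arrivals` and its parameters are hypothetical.

```python
import random

def poisson_arrivals(rate, horizon, seed=0):
    """Return job arrival times in (0, horizon] for a Poisson process with the given rate.

    Inter-arrival gaps of a Poisson process are exponentially distributed,
    so the trace is built by accumulating exponential samples.
    """
    rng = random.Random(seed)  # fixed seed so a simulation run can be repeated
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return arrivals
        arrivals.append(t)
```

A discrete event simulator would typically turn each arrival time into a job-submission event and hand it to the scheduler under test.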
Computing personalized pageranks BIBAKFull-Text 382-383
  Franco Scarselli; Ah Chung Tsoi; Markus Hagenbuchner
A recently published approach to adaptive page rank, using the solution of quadratic optimization methods with a set of simple constraints, is modified to permit classification of web pages according to their page contents, URLs. This modification allows the approach to be more adapted to the needs of focussed crawlers, or personalized search engines.
Keywords: interface personalization, pagerank, search engines
Rank aggregation for meta-search engines BIBAKFull-Text 384-385
  Ka Wai Lam; Chi Ho Leung
In this paper, we present an algorithm for merging results from different data sources in a meta-search engine. We extend an algorithm that was developed for ranking players of a round-robin tournament to the more general case in which the ranking input is given by multiple sources. The problem in a meta-search engine can be represented by a complete directed graph, which can be used by the Majority Spanning Tree (MST) algorithm. It is especially useful when the system must integrate and merge the query results returned from various search engines in a consistent manner.
Keywords: meta-search engines, rank aggregation
Using semantic web approach in augmented audio reality system for museum visitors BIBAKFull-Text 386-387
  Leila Kalantari; Marek Hatala; Jordan Willms
In this paper, we describe our work in progress on the reasoning module of ec(h)o, an augmented audio-reality interface for museum visitors utilizing spatialized soundscapes and a semantic web approach to information. We used ontologies to describe the semantics of sound objects and to represent the user model. A rule-based system for selecting sound objects uses semantic descriptions of objects, the visitor's interaction history, and heuristics for continuity of the dialogue between the user and the system.
Keywords: augmented-audio reality, inference rules, ontologies, user model
A storage and indexing framework for p2p systems BIBAKFull-Text 388-389
  Adina Crainiceanu; Prakash Linga; Ashwin Machanavajjhala; Johannes Gehrke; Jayavel Shanmugasundaram
We present a modularized storage and indexing framework that cleanly separates the functional components of a P2P system, enabling us to tailor the P2P infrastructure to the specific needs of various Internet applications, without having to devise completely new storage management and index structures for each application.
Keywords: indexing framework, p2p, peer-to-peer
P-tree: a p2p index for resource discovery applications BIBAKFull-Text 390-391
  Adina Crainiceanu; Prakash Linga; Johannes Gehrke; Jayavel Shanmugasundaram
We propose a new distributed, fault-tolerant Peer-to-Peer index structure for resource discovery applications called the P-tree. P-trees efficiently support range queries in addition to equality queries.
Keywords: indexing, peer-to-peer, range queries, resource discovery
Updating PageRank with iterative aggregation BIBAKFull-Text 392-393
  Amy Nicole Langville; Carl Dean Meyer
We present an algorithm for updating the PageRank vector [1]. Due to the scale of the web, Google only updates its famous PageRank vector on a monthly basis. However, the Web changes much more frequently. Drastically speeding the PageRank computation can lead to fresher, more accurate rankings of the webpages retrieved by search engines. It can also bring the goal of real-time personalized rankings within reach. On two small subsets of the web, our algorithm updates PageRank using just 25% and 14%, respectively, of the time required by the original PageRank algorithm. Our algorithm uses iterative aggregation techniques [7, 8] to focus on the slow-converging states of the Markov chain. The most exciting feature of this algorithm is that it can be joined with other PageRank acceleration methods, such as the dangling node lumpability algorithm [6], quadratic extrapolation [4], and adaptive PageRank [3], to realize even greater speedups (potentially a factor of 60 or more speedup when all algorithms are combined). Our solution harnesses the power of iterative aggregation principles for Markov chains to allow for much more frequent updates to the valuable ranking vectors.
Keywords: Markov chains, aggregation, disaggregation, link analysis, PageRank, power method, stationary vector, updating
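For orientation, the baseline that the abstract above measures against is the power method for the PageRank stationary vector. The sketch below shows only that baseline, not the authors' iterative aggregation update; the damping factor 0.85, the uniform treatment of dangling nodes, and the function name `pagerank` are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power method on a link graph given as {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly (assumed convention).
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # stop once the vector has effectively stopped changing
            break
    return rank
```

Every iteration touches every link, which is why recomputing from scratch after each change to the Web is costly and why updating methods are attractive.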
Visual web mining BIBAKFull-Text 394-395
  Amir H. Youssefi; David J. Duke; Mohammed J. Zaki
Analysis of web site usage data involves two significant challenges: firstly the volume of data, arising from the growth of the web, and secondly, the structural complexity of web sites. In this paper we apply Data Mining and Information Visualization techniques to the web domain in order to benefit from the power of both human visual perception and computing; we term this Visual Web Mining. In response to the two challenges, we propose a generic framework, where we apply Data Mining techniques to large web data sets and use Information Visualization methods on the results. The goal is to correlate the outcomes of mining Web Usage Logs and the extracted Web Structure by visually superimposing the results. We design several new information visualization diagrams.
Keywords: data mining, frequent access patterns, information visualization, visual data exploration, web usage mining
Small world peer networks in distributed web search BIBAKFull-Text 396-397
  R. Akavipat; L-S. Wu; F. Menczer
In ongoing research, a collaborative peer network application is being proposed to address the scalability limitations of centralized search engines. Here we introduce a local adaptive routing algorithm used to dynamically change the topology of the peer network based on a simple learning scheme driven by query response interactions among neighbors. We test the algorithm via simulations with 70 model users based on actual Web crawls. We find that the network topology rapidly converges from a random network to a small world network, with emerging clusters that match the user communities with shared interests.
Keywords: peer collaborative search, small world, topical crawlers
TV2Web: generating and browsing web with multiple LOD from video streams and their metadata BIBAKFull-Text 398-399
  Kazutoshi Sumiya; Mahendren Munisamy; Katsumi Tanaka
We propose a method of automatically constructing Web content from video streams with metadata that we call TV2Web. The Web content includes thumbnails of video units and caption data generated from metadata. Users can watch TV on a normal Web browser. They can also manipulate Web content with zooming metaphors to seamlessly alter the level of detail (LOD) of the content being viewed. They can search for favorite scenes faster than with analog video equipment, and experience a new cross-media environment. We also developed a prototype of the TV2Web system and discuss its implementation.
Keywords: generation of Web content, level of detail, metadata, video stream, web browser from video streams and their metadata
Self-learning web question answering system BIBAFull-Text 400-401
  Dmitri Roussinov; Jose Robles
While being quite successful in providing keyword based access to web pages, commercial search portals, such as Google, Yahoo, AltaVista, and AOL, still lack the ability to answer questions expressed in a natural language. In this paper, we present a probabilistic approach to automated question answering on the Web. Our approach is based on pattern matching and answer triangulation. By taking advantage of the redundancy inherent in the Web, each answer found by the system is triangulated (confirmed or disconfirmed) against other possible answers. Our approach is entirely self-learning: it does not involve any linguistic resources, nor does it require any manual tuning. Thus, the proposed approach can easily be replicated in other information systems with large redundancy.
Integrating elliptic curve cryptography into the web's security infrastructure BIBAKFull-Text 402-403
  Vipul Gupta; Douglas Stebila; Sheueling Chang Shantz
RSA is the most popular public-key cryptosystem on the Web today but long-term trends such as the proliferation of smaller, simpler devices and increasing security needs will make continued reliance on RSA more challenging over time. We offer Elliptic Curve Cryptography (ECC) as a suitable alternative and describe our integration of this technology into several key components of the Web's security infrastructure. We also present experimental results quantifying the benefits of using ECC for secure web transactions.
Keywords: Apache, elliptic curve cryptography, Mozilla, openSSL
On mining webclick streams for path traversal patterns BIBAKFull-Text 404-405
  Hua-Fu Li; Suh-Yin Lee; Man-Kwan Shan
Mining user access patterns from a continuous stream of Web-clicks presents new challenges over traditional Web usage mining in a large static Web-click database. Modeling user access patterns as maximal forward references, we present a single-pass algorithm, StreamPath, for online discovery of frequent path traversal patterns from an extended prefix tree-based data structure which stores the compressed and essential information about users' moving histories in the stream. Theoretical analysis and performance evaluation show that the space requirement of StreamPath is limited to a logarithmic boundary, and the execution time, compared with previous multiple-pass algorithms [2], is fast.
Keywords: data stream mining, path traversal patterns, web-click streams
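The modelling step named in the abstract above, turning a click sequence into maximal forward references, can be sketched as follows. This follows the usual definition (a forward path is emitted at the moment the user first moves backward to an already visited page) and is only an illustration of the preprocessing idea; it is not the StreamPath algorithm or its prefix tree, and the function name is hypothetical.

```python
def maximal_forward_references(session):
    """Split one user's click sequence into maximal forward references."""
    path, refs, moving_forward = [], [], False
    for page in session:
        if page in path:
            # Backward move: the path built so far is maximal if we were going forward.
            if moving_forward:
                refs.append(list(path))
            del path[path.index(page) + 1:]  # return to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        refs.append(list(path))  # the session ended on a forward move
    return refs
```

For the click sequence A, B, C, D, C, B, E this yields the two references A-B-C-D and A-B-E; frequent traversal patterns are then mined over such references.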
Web image learning for searching semantic concepts in image databases BIBAKFull-Text 406-407
  Chu-Hong Hoi; Michael R. Lyu
Without textual descriptions or label information of images, searching semantic concepts in image databases is still a very challenging task. While automatic annotation techniques are yet a long way off, we can seek alternative techniques to solve this difficult issue. In this paper, we propose to learn from Web images for searching the semantic concepts in large image databases. To formulate effective algorithms, we suggest using support vector machines to attack the problem. We evaluate our algorithm in a large image database and demonstrate preliminary yet promising results.
Keywords: image retrieval, relevance feedback, semantic searching, support vector machine, web image learning
An XPath-based discourse analysis module for spoken dialogue systems BIBAKFull-Text 408-409
  Giuseppe Di Fabbrizio; Charles Lewis
This paper describes an XPath-based discourse analysis module for Spoken Dialogue Systems that allows the dialogue author to easily manipulate and query both the user input's semantic representation and the dialogue context using a simple and compact formalism. We show that, in managing the human-machine interaction, the discourse context and the dialogue history are effectively represented as Document Object Model (DOM) structures. DOM defines interfaces that dialogue scripts can use to dynamically access and update the content, the structure and the style of the documents. In general, this approach applies also to richer multimedia and multimodal interactions where the interpretation of the user input depends on a combination of input modality.
Keywords: XPath, discourse analysis, spoken dialogue systems
Metadata co-development: a process resulting in metadata about technical assistance to educators BIBAKFull-Text 410-411
  Michael B. Knapp; Sara Dexter; Robert McLaughlin
Metadata development can be challenging because the vocabulary should be flexible and extensible, widely applicable, interoperable, and both machine and human readable. We describe how we engaged members of organizations in the field of technical assistance to educators in a process of metadata development, and the challenges we faced. The result was an ontology for the communities of practice that is interoperable and can evolve; it was then used to catalogue resources for dissemination via the Semantic Web.
Keywords: RDF, education, metadata, resource cataloging, semantic web, technical assistance
RDF triples in XML BIBAKFull-Text 412-413
  Jeremy J. Carroll; Patrick Stickler
RDF/XML does not layer RDF on top of XML in a useful way. We use a simple direct representation of the RDF abstract syntax in XML. We add the ability to name graphs, noting that in practice this is already widely used. We use XSLT as a general syntactic extensibility mechanism to provide human friendly macros for our syntax. This provides a simple serialization solving a persistent problem in the Semantic Web.
Keywords: RDF, XML, semantic web
Automatic extraction of web search interfaces for interface schema integration BIBAKFull-Text 414-415
  Hai He; Weiyi Meng; Clement Yu; Zonghuan Wu
This paper provides an overview of a technique for extracting information from the Web search interfaces of e-commerce search engines that is useful for supporting automatic search interface integration. In particular, we discuss how to group elements and labels on a search interface into attributes and how to derive certain meta-information for each attribute.
Keywords: metasearch engine, search engine, search interface extraction, search interface representation
Clustering e-commerce search engines BIBAKFull-Text 416-417
  Qian Peng; Weiyi Meng; Hai He; Clement Yu
In this paper, we sketch a method for clustering e-commerce search engines by the type of products/services they sell. This method utilizes the special features of interface pages of such search engines. We also provide an analysis of different types of ESE interface pages.
Keywords: document clustering, search engine categorization
Publishing museum collections on the semantic web: the MuseumFinland portal BIBAKFull-Text 418-419
  Eero Hyvönen; Miikka Junnila; Suvi Kettula; Eetu Mäkelä; Samppa Saarela; Mirva Salminen; Ahti Syreeni; Arttu Valo; Kim Viljanen
Museum collections contain large amounts of data and semantically rich, mutually interrelated metadata in heterogeneous databases. The publication of museum collections on the web is therefore a very promising application domain for semantic web techniques. We present a semantic web portal called "MuseumFinland -- Finnish Museums on the Semantic Web" [3] that contains some 4,000 cultural artifacts from the collections of three museums using three different database schemas and database systems. The system is based on seven RDF(S) ontologies consisting of some 10,000 classes and individuals.
Keywords: content publishing, ontology, semantic web
Ontalk: ontology-based personal document management system BIBAKFull-Text 420-421
  Hak Lae Kim; Hong Gee Kim; Kyung-Mo Park
In this paper, we present our development of a document management and retrieval tool, which is named Ontalk. Our system provides a semi-automatic metadata generator and an ontology-based search engine for electronic documents. Ontalk can create or import various ontologies in RDFS or OWL for describing the metadata. Our system, which is built upon .NET technology, can easily communicate with or be flexibly plugged into many different programs.
Keywords: document management, inference etc., knowledge management, ontology
Best bets: thousands of queries in search of a client BIBAKFull-Text 422-423
  Giuseppe Attardi; Andrea Esuli; Maria Simi
A number of applications require selecting targets for specific contents on the basis of criteria defined by the contents providers rather than selecting documents in response to user queries, as in ordinary information retrieval. We present a class of retrieval systems, called Best Bets, that generalize Information Filtering and encompass a variety of applications including editorial suggestions, promotional campaigns and targeted advertising, such as Google AdWords. We developed techniques for implementing Best Bets systems that address performance issues for large scale deployment, such as efficient query search, incremental updates and dynamic ranking.
Keywords: information filtering, information retrieval, proactive content delivery, query, search
XML data mediator integrated solution for XML roundtrip from XML to relational BIBAKFull-Text 424-425
  Nianjun Zhou; George Mihaila; Dikran Meliksetian
This paper presents a system for efficient data transformations between XML and relational databases, called XML Data Mediator (XDM). XDM enables the transformation by externalizing the specification of the mapping in a script and using an efficient run-time engine that automates the conversion task. The runtime engine is independent from the mapping script. A parser converts a mapping script into an internal conversion object. For the mapping from relational to XML, we use a tagging tree as a conversion object inside the runtime engine, and use an SQL outer-join scheme to combine multiple SQL queries in order to reduce the number of backend relational database accesses. For the mapping from XML to relational, the conversion object is a shredding tree, and we use an innovative algorithm to process the XML as a stream in order to achieve linear complexity with respect to the size of the XML document.
Keywords: RDBMS, XML, XSL, relational database, shredding
Browser-based applications: positive transference or interference? BIBAKFull-Text 426-427
  Mark S. Silver; Sidne G. Ward
Applications that run on top of web browsers dominate the Internet today. Given the many similarities among these applications' features, positive transference from one to another is often seen as an important source of ease-of-use for such applications. This paper examines the many differences in the way similar features are implemented in different browser-based applications, analyzing the way these inconsistencies can lead to negative transference (interference) that degrades rather than enhances usability.
Keywords: browser-based applications, interference, transference, usability
SEMPL: a semantic portal BIBAKFull-Text 428-429
  M. S. Perry; E. Stiles
Semantic Web technology is intended for the retrieval, collection, and analysis of meaningful data with significant automation afforded by machine understandability of data [1]. As one illustration of semantic web technology in action, we present SEMPL, a semantic web portal for the Large Scale Distributed Information Systems lab (LSDIS) at the University of Georgia. SEMPL, which is powered by a state of the art commercial system, Semagix Freedom [7], uses an ontology-driven approach to provide semantic browsing, linking, and contextual querying of content within the portal. By using the ontology based information integration technique, SEMPL can specify the context of a particular piece of research information, annotate web pages, and provide links to semantically related areas enabling a rich contextual retrieval of information.
Keywords: ontology, semantic portal, semantic web
Can I find what I'm looking for? BIBAKFull-Text 430-431
  Patrizia Andronico; Marina Buzzi; Barbara Leporini
In recent years, search engine research has grown rapidly in areas such as algorithms, strategies and architecture, increasing both effectiveness and quality of results. However, a very important aspect that is often neglected is the user interface. In this work we analyzed the interfaces of several popular search tools from the user's point of view, and collected individual feedback in order to determine whether it is possible to improve interface design.
Keywords: accessibility, search engine, usability, user interface
Online feedback by tests and reporting for eLearning and certification programs BIBAKFull-Text 432-433
  Dirk Bade; Georg Nüssel; Gerd Wilts
The evaluation of eLearning success is an indispensable business requirement of education programs: the easy registration of 'visits' to eLearning websites is, however, not sufficient in most cases. Additional metrics from authenticated logins and reports of learning activity and success -- as obtained from specific online tests -- are required. The aim is to document the acceptance, progress and return on investment (ROI) of eLearning programs, and set up additional training well tailored to the needs of a specific learning community. An example from a corporate certification program demonstrates the applicability of the proposed processes.
Keywords: blended learning, eLearning, online tests
A generic uiml vocabulary for device- and modality independent user interfaces BIBAKFull-Text 434-435
  Rainer Simon; Michael Jank Kapsch; Florian Wegscheider
We present in this poster our work on a User Interface Markup Language (UIML) vocabulary for the specification of device- and modality independent user interfaces. The work presented here is part of an application-oriented project. One of the results of the project is a prototype implementation of a generic platform for device independent multimodal mobile applications. The poster presents the requirements for a generic user interface description format and explains our approach on an integrated description of user interfaces for both graphical and voice modality. A basic overview of the vocabulary structure, its language elements and main features is presented.
Keywords: UIML, device-independence, generic user interface description, mobile devices, mobile networks, multimodal user interfaces, multimodality, voice interfaces
Semantic api matching for automatic service composition BIBAFull-Text 436-437
  Doina Caragea; Tanveer Syeda-Mahmood
In this paper, we address the problem of matching I/O descriptions of services to enable their automatic service composition. Specifically, we develop a method of semantic schema matching and apply it to the API schemas constituting the I/O descriptions of services. The algorithm assures an optimal match of corresponding entities by obtaining a maximum matching in a bi-partite graph formed from the attributes.
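The final step mentioned in the abstract above, obtaining a maximum matching in a bipartite graph formed from the attributes, can be illustrated with the standard augmenting-path method. This is a generic sketch, not the authors' algorithm: how candidate edges are derived from semantic similarity is not described in the abstract, so the `edges` input (which attribute may correspond to which) and the attribute names in the usage note are assumptions.

```python
def maximum_matching(edges):
    """Maximum bipartite matching; `edges` maps each left attribute to its candidate right attributes."""
    match_right = {}  # right attribute -> left attribute currently assigned to it

    def try_assign(left, seen):
        for right in edges.get(left, ()):
            if right in seen:
                continue
            seen.add(right)
            # Take a free right node, or re-route its current partner elsewhere.
            if right not in match_right or try_assign(match_right[right], seen):
                match_right[right] = left
                return True
        return False

    for left in edges:
        try_assign(left, set())
    return {left: right for right, left in match_right.items()}
```

With candidates `{"addr": ["postalCode", "street"], "zip": ["postalCode"]}`, `addr` first takes `postalCode` and is then re-routed to `street` so that `zip` can also be matched, giving a matching of size two.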
Delivering web service coordination capability to users BIBAKFull-Text 438-439
  Tom Oinn; Matthew Addis; Justin Ferris; Darren Marvin; Mark Greenwood; Carole Goble; Anil Wipat; Peter Li; Tim Carver
As web service technology matures there is growing interest in exploiting workflow techniques to coordinate web services. Bioinformaticians are a user community who combine web resources to perform in silico experiments. These users are scientists and not information technology experts; they require workflow solutions that have a low cost of entry for service users and providers. Problems satisfying these requirements with current techniques led to the development of the Simple conceptual unified flow language (Scufl). Scufl is supported by the Freefluo enactment engine [1], and the Taverna editing workbench [3]. The extensibility of Scufl, supported by these tools, means that workflows coordinating web services can be matched to how users view their problems. The Taverna workbench exploits the web to keep Scufl simple by retrieving detail from URIs when required, and by scavenging the web for services. Scufl and its tools are not bioinformatics specific. They can be exploited by other communities who require user-driven composition and execution of workflows coordinating web resources.
Keywords: bioinformatics, e-science, scientific workflows, web programming, web service coordination, web services
Dealing with different distributions in learning from BIBAKFull-Text 440-441
  Xiaoli Li; Bing Liu
In the problem of learning with positive and unlabeled examples, existing research all assumes that positive examples P and the hidden positive examples in the unlabeled set U are generated from the same distribution. This assumption may be violated in practice. In such cases, existing methods perform poorly. This paper proposes a novel technique A-EM to deal with the problem. Experimental results with product page classification demonstrate the effectiveness of the proposed technique.
Keywords: classification, positive and unlabeled learning
TCOZ approach to semantic web services design BIBAKFull-Text 442-443
  Jin Song Dong; Yuan Fang Li; Hai Wang
Complex Semantic Web (SW) services may have intricate data state, autonomous process behavior and concurrent interactions. The design of such SW service systems requires precise and powerful modelling techniques to capture not only the ontology domain properties but also the services' process behavior and functionalities. In this paper we apply an integrated formal modeling language, Timed Communicating Object Z (TCOZ), to design SW services. Furthermore, the paper presents the development of the systematic translation rules and tools which can automatically extract the SW ontology and services semantic markup from the formal TCOZ design model.
Keywords: DAML+OIL, DAML-S, TCOZ, formal methods, semantic web
C3W: clipping, connecting and cloning for the web BIBAKFull-Text 444-445
  Jun Fujima; Aran Lunzer; Kasper Hornbæk; Yuzuru Tanaka
Many of today's Web applications support just simple trial-and-error retrievals: supply one set of parameters, obtain one set of results. For a user who wants to examine a number of alternative retrievals, this form of interaction is inconvenient and frustrating. It can be hard work to keep finding and adjusting the parameter specification widgets buried in a Web page, and to remember or record each result set. Moreover, when using diverse Web applications in combination -- transferring result data from one into the parameters for another -- the lack of an easy way to automate that transfer merely increases the frustration. Our solution is to integrate techniques for each of three key activities: clipping elements from Web pages to wrap an application; connecting wrapped applications using spreadsheet-like formulas; and cloning the interface elements so that several sets of parameters and results may be handled in parallel. We describe a prototype that implements this solution, showing how it enables rapid and flexible exploration of the resources accessible through user-chosen combinations of Web applications. Our aim in this work is to contribute to research on making optimal use of the wealth of information on the Web, by providing interaction techniques that address very practical needs.
Keywords: Web application linkage, Web navigation, intelligent Pad, interfaces, subjunctive
A web services architecture for learning object discovery and assembly BIBAKFull-Text 446-447
  Claus Pahl; Ronan Barrett
Courseware systems are often based on an assembly of different components, addressing the different needs of storage and delivery functionality. The Learning Technology Standard Architecture LTSA provides a generic architectural framework for these systems. Recent developments in Web technology -- e.g. the Web services framework -- have greatly enhanced the flexible and interoperable implementation of courseware architectures.
   We argue that in order to make the Web services philosophy work, two enhancements to the LTSA approach are required. Firstly, a combination with metadata annotation is needed to support the discovery of educational Web services. Secondly, if these components are to be provided in the form of services, more support is needed for their assembly. Architectural patterns of a finer degree of granularity shall satisfy this need.
Keywords: architecture, assembly, discovery, interface descriptions, metadata, teaching and learning environments, web services
On the temporal dimension of search BIBAKFull-Text 448-449
  Philip S. Yu; Xin Li; Bing Liu
Web search is probably the single most important application on the Internet. The most famous search techniques are perhaps the PageRank and HITS algorithms. These algorithms are motivated by the observation that a hyperlink from a page to another is an implicit conveyance of authority to the target page. They exploit this social phenomenon to identify quality pages, e.g., "authority" pages and "hub" pages. In this paper we argue that these algorithms miss an important dimension of the Web, the temporal dimension. The Web is not a static environment. It changes constantly. Quality pages in the past may not be quality pages now or in the future. These techniques favor older pages because these pages have many in-links accumulated over time. New pages, which may be of high quality, have few or no in-links and are left behind. Bringing new and quality pages to users is important because most users want the latest information. Research publication search has exactly the same problem. This paper studies the temporal dimension of search in the context of research publication search. We propose a number of methods to deal with the problem. Our experimental results show that these methods are highly effective.
Keywords: publication search, temporal dimension of search, web search
Integrating learning objects into an open learning environment: evaluation of learning processes in an informatics learning lab BIBAKFull-Text 450-451
  Johann Magenheim; Olaf Scheel
The Didactics of Informatics research group at the University of Paderborn is involved in efforts to design, implement and evaluate a web-based learning laboratory for informatics (ILL). The ILL mainly serves the purpose of an open interactive learning environment for software engineering. The poster presentation shows the main components of an ILL and the types of media that are used. A didactical concept, learning strategies and the efforts to create self-organizing learning communities in the ILL are also topics of the poster. Finally, an evaluation concept will be presented including some basic results of empirical research which was done during a seminar held in the summer term 2003.
Keywords: blended learning, computer-based exploration environment, deconstruction of software, informatics learning lab, learning communities, learning objects
Combining link and content analysis to estimate semantic similarity BIBAKFull-Text 452-453
  Filippo Menczer
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic associations between pages therefore crucially affect the performance of any search tool. Here I begin to quantitatively analyze the relationship between content, link, and semantic similarity measures across a massive number of Web page pairs. Maps of semantic similarity across textual and link similarity highlight the potential and limitations of lexical and link analysis for relevance approximation, and provide us with a way to study whether and how text and link based measures should be combined.
Keywords: Web search, content and link similarity, precision, recall, semantic maps
Graph-based text database for knowledge discovery BIBAKFull-Text 454-455
  Junji Tomita; Hidekazu Nakawatase; Megumi Ishii
While we expect to discover knowledge in the texts available on the Web, such discovery usually requires many complex analysis steps, most of which require different text handling operations such as similar text search or text clustering. Drawing an analogy from the relational data model, we propose a text representation model that simplifies the steps. The model represents texts in a formal manner as subject graphs, described herein, and provides text handling operations whose inputs and outputs are identical in form, i.e., a set of subject graphs. We develop a graph-based text database, which is based on the model, and an interactive knowledge discovery system. Trials of the system show that it allows the user to interactively and intuitively discover knowledge in Web pages by combining text handling operations defined on subject graphs in various orders.
Keywords: interactive search, knowledge discovery, subject graphs
Combining individual tutoring with automatic course sequencing in WBT systems BIBAKFull-Text 456-457
  Denis Helic; Hermann Maurer; Nick Scerbakov
Usually, the success of systems using automatic course sequencing depends strongly on careful authoring and foreseeing of all curriculum alternatives before any learning session even starts. We believe that tutors, starting from a simple generic curriculum, and assuming that they have the proper tools, can much more easily create curriculum alternatives as an immediate response to the current learning situation. In this paper we present a tool that provides a flexible environment for tutors, allowing them to customize and develop the curriculum on-the-fly. However, since individual tutoring is quite expensive, we briefly discuss possibilities for enabling automatic adjustment of course curriculum to learners' needs by combining on-the-fly curriculum alternatives created by tutors with well-known automatic course sequencing techniques.
Keywords: WBT, course curriculum, course sequencing, tutoring
Time-based contextualized-news browser (t-cnb) BIBAKFull-Text 458-459
  Akiyo Nadamoto; Katsumi Tanaka
We propose a new way of browsing contextualized-news articles. Our prototype browser system is called a Time-based Contextualized-News Browser (T-CNB). The T-CNB concurrently and automatically presents a series of related pages for one news source while browsing the user-specified page. It extracts the past related pages from a user-specified news article on the web. The related pages outline the progress of the user-specified news article. We call the related pages 'contextual pages'.
   Using the T-CNB, a user only needs to specify one news article on the web. The user then automatically receives past related news articles, which provide a wider understanding of the topic. The T-CNB automatically generates and presents contextualized news articles.
Keywords: contextualized news articles, topic graph, web browser
Similarity spreading: a unified framework for similarity calculation of interrelated objects BIBAKFull-Text 460-461
  Gui-Rong Xue; Hua-Jun Zeng; Zheng Chen; Wei-Ying Ma; Yong Yu
In many Web search applications, similarities between objects of one type (say, queries) can be affected by the similarities between their interrelated objects of another type (say, Web pages), and vice versa. We propose a novel framework called similarity spreading to take account of the interrelationship and improve the similarity calculation. Experimental results show that the proposed framework can significantly improve the accuracy of the similarity measurement of the objects in a search engine.
Keywords: interrelated, mutual reinforcement, similarity spreading
SLA based profit optimization in web systems BIBAKFull-Text 462-463
  Li Zhang; Danilo Ardagna
With the rapid growth of eBusiness, Web services are becoming a commodity. To reduce the management cost for the IT infrastructure, companies often outsource their IT services to third party service providers. Large service centers have been set up to provide services to many customers by sharing the IT resources. This leads to the efficient use of resources and a reduction of the operating cost. The service provider and their customers often negotiate utility based Service Level Agreements (SLAs) to determine the cost and penalty based on the achieved performance level. The system is based on a centralized controller which can control the request volumes at various servers and the scheduling policy at each server. The controller can also decide to turn ON or OFF servers depending on the system load. This paper designs a resource allocation scheduler for such web environments so as to maximize the profits associated with multiple class SLAs.
Keywords: SLA optimization, load balancing, quality of service, resource allocation, utility function
OpenMVC: a non-proprietary component-based framework for web applications BIBAKFull-Text 464-465
  Ronan Barrett; Sarah Jane Delany
The lack of standardised approaches in the development of web-based systems is an ongoing issue for the developers of commercial software. To address this issue we propose a hybrid development framework for web-based solutions that combines many of the best attributes of existing frameworks but utilises open, standardised W3C technologies where possible. This framework, called openMVC, is an evolution of the Model-View-Controller (MVC) pattern. An implementation of openMVC has been built over a 5-tier architecture using Java and .NET.
Keywords: MVC, W3C, XML, XML schema, XSLT, frameworks, patterns, web services
Structuring and presenting annotated media repositories BIBAKFull-Text 466-467
  Lloyd Rutledge; Jacco van Ossenbruggen; Lynda Hardman
We generate hypermedia presentations from annotated media repositories using simple document structure as an intermediate phase. This poster applies Web style technologies to this process. Results include style specification for accessing semantically annotated media repositories, for determining document structure from semantic structure and for applying this document structure to the final presentation.
Keywords: RDF, XHTML+SMIL, XSLT, document structure, semantics, style
Distributed location aware web crawling BIBAKFull-Text 468-469
  Odysseas Papapetrou; George Samaras
Distributed crawling has shown that it can overcome important limitations of today's crawling paradigm. However, the optimal benefits of this approach are usually limited to the sites hosting the crawler. In this work, we propose a location-aware method, called IPMicra, that utilizes an IP address hierarchy and allows crawling of links in a near-optimal, location-aware manner.
Keywords: distributed web crawling, location aware web crawling
BizCQ: using continual queries to cope with changes in business information exchange BIBAKFull-Text 470-471
  Wei Tang; Kipp Jones; Ling Liu; Calton Pu
In this poster, we propose the framework of BizCQ, a system to apply Continual Queries [7][8] on Web-based content to manage information exchanges between two business partners. In this poster, we describe ways to leverage previous research in Web monitoring techniques applied to the everyday problem of managing change within a business environment, and focus on the difficulties of managing changes that are caused by external parties in business-to-business (B2B) information exchanges.
Keywords: B2B, business-to-business, change response, continual query, information quality, semantic web
Towards a flash search engine based on expressive semantics BIBAKFull-Text 472-473
  Dawei Ding; Jun Yang; Qing Li; Liping Wang; Liu Wenyin
Flash, as a multimedia format, is becoming more and more popular on the Web. However, previous works on Flash are totally based on low-level features, which makes it impractical to build a content-based Flash search engine. To address this problem, our paper proposes expressive semantics for bridging the gap between low-level features and user queries. To smoothly incorporate expressive semantics into a search engine, an eigenvector-based model is devised to map a user query to expressive semantics with the aid of a link analysis method. Our experimental results confirm that expressive semantics is a promising approach to understanding and hence searching Flash movies more efficiently.
Keywords: classification, eigenvector, expressive semantics, flash retrieval, search engine, web application
VersaTutor: architecture for a constraint-based intelligent tutor generator BIBAKFull-Text 474-475
  Viswanathan Kodaganallur; Rob R. Weitz; David Rosenthal
Intelligent tutoring systems have demonstrated their utility in a variety of domains. However, they are notoriously resource intensive to build. We report here on the development of a software tool that enables non-software developers to declaratively create intelligent tutors. This intelligent tutor generator creates applications with rich user interaction and powerful theory-based remediation capabilities. It utilizes the Constraint Based Tutoring paradigm and is generic enough to create tutors in several domains. It is easily extensible through plug-ins.
Keywords: constraint based tutors, distance learning, instructional technology, intelligent tutoring, model tracing tutors
Efficient web change monitoring with page digest BIBAKFull-Text 476-477
  David Buttler; Daniel Rocco; Ling Liu
The Internet and the World Wide Web have enabled a publishing explosion of useful online information, which has produced the unfortunate side effect of information overload: it is increasingly difficult for individuals to keep abreast of fresh information. In this paper we describe an approach for building a system for efficiently monitoring changes to Web documents. This paper has three main contributions. First, we present a coherent framework that captures different characteristics of Web documents. The system uses the Page Digest encoding to provide a comprehensive monitoring system for content, structure, and other interesting properties of Web documents. Second, the Page Digest encoding enables improved performance for individual page monitors through mechanisms such as short-circuit evaluation, linear time algorithms for document and structure similarity, and data size reduction. Finally, we develop a collection of sentinel grouping techniques based on the Page Digest encoding to reduce redundant processing in large-scale monitoring systems by grouping similar monitoring requests together. We examine how effective these techniques are over a wide range of parameters and have seen an order of magnitude speed up over existing Web-based information monitoring systems.
Keywords: document storage, scalability, web document monitoring
Experiments with Persian text compression for web BIBAKFull-Text 478-479
  Farhad Oroumchian; Ehsan Darrudi; Fattane Taghiyareh; Neeyaz Angoshtari
The increasing importance of Unicode for text encoding implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. The approach presented in this paper aims to reduce the storage and the transmission time for Persian text files in web-based applications and the Internet. The basic idea here is to compute the most repetitive n-grams in the Persian text and replace them by a single character in the user-defined sections of Unicode. The compression will be done on the server side once and the decompression process is eliminated completely. The rendering process in the browser will do the decompression. There is no need for any additional program or add-ins for decompression to be installed on the browser or client side. The user needs only to download the proper Unicode font once. A genetic algorithm is utilized to select the most appropriate n-grams. In the best case, we have achieved a 52.26% reduction of the file size. The method is general, and applies equally well to English and other languages.
Keywords: Farsi, genetic algorithm, n-gram compression, unicode
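The n-gram substitution idea in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: n-grams are chosen by plain frequency rather than by a genetic algorithm, replacement characters are drawn from the Unicode Private Use Area starting at U+E000, and the input text contains no Private Use Area characters. The function names `build_table`, `compress`, and `expand` are hypothetical. In the approach described, expansion is performed by the font at rendering time; `expand` exists here only to check the round trip.

```python
from collections import Counter

# Assumption: replacement characters come from the Private Use Area (U+E000 onward).
PUA_START = 0xE000


def build_table(text, n=2, table_size=16):
    """Map the most frequent n-grams to single Private Use Area characters.

    Frequency ranking stands in for the genetic algorithm mentioned in the abstract.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {
        gram: chr(PUA_START + k)
        for k, (gram, _) in enumerate(counts.most_common(table_size))
    }


def compress(text, table, n=2):
    """Greedy left-to-right replacement of known n-grams by their single character."""
    out = []
    i = 0
    while i < len(text):
        gram = text[i:i + n]
        if gram in table:
            out.append(table[gram])
            i += n
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def expand(compressed, table):
    """Inverse mapping, used here only to verify that no information is lost."""
    reverse = {ch: gram for gram, ch in table.items()}
    return "".join(reverse.get(ch, ch) for ch in compressed)
```

The sketch counts code points, not bytes; the actual saving depends on the encoding used for transmission, which the abstract does not specify.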
Pride: peer-to-peer reputation infrastructure for decentralized environments BIBAKFull-Text 480-481
  Prashant Dewan; Partha Dasgupta
Peer-to-peer (P2P) networks use the fundamental assumption that the nodes in the network will cooperate and will not cheat. In the absence of any common goals shared by the nodes of a peer-to-peer network, external motivation to cooperate and be trustworthy is mandated. Digital Reputations can be used to inject trust among the nodes of a network. This paper presents PRIDE, a reputation system for decentralized peer-to-peer networks. PRIDE uses self-certification, a scheme for identification of peers using digital certificates similar to SDSI certificates; an elicitation-storage protocol for the exchange of recommendations; and an IP Based Safeguard (IBS) to mitigate a peer's vulnerability to 'liar farms'.
Keywords: peer-to-peer, reputation systems, security
Answering similarity queries in peer-to-peer networks BIBKFull-Text 482-483
  Panos Kalnis; Wee Siong Ng; Beng Chin Ooi; Kian-Lee Tan
Keywords: image, peer-to-peer, similarity
Efficient PageRank approximation via graph aggregation BIBAKFull-Text 484-485
  Andrei Z. Broder; Ronny Lempel; Farzin Maghoul; Jan Pedersen
We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. We (1) partition the Web's graph into classes of quasi-equivalent vertices, (2) project the page-based random walk to be approximated onto those classes, and (3) compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web-graph containing over 1.4 billion pages, and were able to produce a ranking that has Spearman rank-order correlation of 0.95 with respect to PageRank. A simplistic implementation of our method required less than half the running time of a highly optimized implementation of PageRank, implying that larger speedup factors are probably possible.
Keywords: link analysis, search engines, web information retrieval
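The three steps listed in the abstract above (partition by host, project the walk onto hosts, compute the stationary distribution, then reconstruct page scores) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the host-level transition weights are simply page-link counts, the damping factor is 0.85, and a host's score is split among its pages in proportion to in-link count plus one. The abstract does not specify the projection weights or the reconstruction rule, so both are assumptions, and all function names are hypothetical.

```python
from collections import defaultdict


def power_iteration(out_weights, nodes, damping=0.85, iters=100):
    """Stationary distribution of a damped random walk on a weighted graph."""
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            succ = out_weights.get(u, {})
            total = sum(succ.values())
            if total == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
            else:
                for v, w in succ.items():
                    nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank


def aggregate_by_host(links, host_of):
    """Steps 1-2: classes are hosts; edge weight = number of page-level links (assumed)."""
    weights = defaultdict(lambda: defaultdict(float))
    for src, dst in links:
        weights[host_of[src]][host_of[dst]] += 1.0
    return weights


def approximate_pagerank(links, host_of):
    """Step 3 plus reconstruction of a page-level distribution."""
    hosts = sorted(set(host_of.values()))
    host_rank = power_iteration(aggregate_by_host(links, host_of), hosts)

    in_degree = defaultdict(int)
    for _, dst in links:
        in_degree[dst] += 1
    pages_by_host = defaultdict(list)
    for page, host in host_of.items():
        pages_by_host[host].append(page)

    # Assumed reconstruction rule: share host mass by (in-links + 1).
    page_rank = {}
    for host, pages in pages_by_host.items():
        denom = sum(in_degree[p] + 1 for p in pages)
        for p in pages:
            page_rank[p] = host_rank[host] * (in_degree[p] + 1) / denom
    return page_rank
```

Because the iteration runs over hosts rather than pages, each step touches far fewer nodes, which is where a speedup of the kind the abstract reports would plausibly come from.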
Outlink estimation for PageRank computation under missing data BIBAFull-Text 486-487
  Sreangsu Acharyya; Joydeep Ghosh
The enormity and rapid growth of the web-graph forces quantities such as its PageRank to be computed under missing information consisting of outlinks of pages that have not yet been crawled. This paper examines the role played by the size and distribution of this missing data in determining the accuracy of the computed PageRank, focusing on questions such as (i) the accuracy of pageranks under missing information, (ii) the size at which a crawl process may be aborted while still ensuring reasonable accuracy of pageranks, and (iii) algorithms to estimate pageranks under such missing information. The first couple of questions are addressed on the basis of certain simple bounds relating the expected distance between the true and computed pageranks and the size of the missing data. The third question is explored by devising algorithms to predict the pageranks when full information is not available. A key feature of the "dangling link estimation" and "clustered link estimation" algorithms proposed is that they do not need to run the PageRank iteration afresh once the outlinks have been estimated.
Converting UML to OWL ontologies BIBAKFull-Text 488-489
  Dragan Gasevic; Dragan Djuric; Vladan Devedzic; Violeta Damjanovi
This paper presents automatic generation of the Web Ontology Language (OWL) from a UML model. The solution is based on an MDA-defined architecture for ontology development and the Ontology UML Profile (OUP). The conversion that we present here transforms an ontology from its OUP definition (i.e. XML Metadata Interchange -- XMI) into an OWL description. Accordingly, we illustrate how an OUP-developed ontology can be shared with ontological engineering tools (i.e. Protégé).
Keywords: OWL, UML profiles, XSLT, ontology
DANTE: annotation and transformation of web pages for visually impaired users BIBAKFull-Text 490-491
  Yeliz Yesilada; Simon Harper; Carole Goble; Robert Stevens
Most Web pages are designed for visual interaction so the mobility, or ease of travel, of visually impaired Web travellers is reduced [2]. Objects that support travel and mobility are not in an appropriate form for nonvisual interaction. Our goal is to enhance the mobility of visually impaired Web travellers by annotating pages with a travel ontology that aims to encapsulate rich structural and navigational knowledge. We propose a semi-automated tool 'Dante' which aims to analyse Web pages to extract travel objects, discover their roles, annotate them with a travel ontology and transform pages based on the annotations to enhance the provided mobility support. This poster introduces the travel ontology and presents how Web pages are annotated with this ontology to guide the transformations.
Keywords: mobility, semantic annotation, tool, travel, visual impairment
An agent system reasoning about the web and the user BIBAKFull-Text 492-493
  Giovambattista Ianni; Francesco Ricca; Francesco Calimeri; Vincenzino Lio; Stefania Galizia
The paper describes some innovations related to the ongoing work on the GSA prototype, an integrated information retrieval agent. In order to improve the original system effectiveness, we propose the GSA2 system, introducing a new internal architecture based on a message-passing framework and on an ontology description formalism (WOLF, Web ontology Framework). GSA2 is conceived in order to describe and easily perform reasoning on "facts about the web and the user". The most innovative aspect of the project is its customizable and flexible reasoning system based on Answer Set Programming, which plays the role of the central decision-making module and allows the Agent to take proactive decisions. The introduction of a logic language allows one to describe, program and plan behaviors of the Agent easily and quickly, and to experiment with a large variety of Information Retrieval strategies. Both the System Architecture and WOLF are general and reusable, and the result constitutes a good example of real implementation of agents based on logics.
Keywords: agents, answer set programming, information retrieval, logic programming
Associative sources and agents for zero-input publishing BIBAKFull-Text 494-495
  David Wolber; Christopher H. Brooks
This paper presents an associative agent that allows seamless navigation from one's own personal space to third-party associative sources, as well as the personal spaces of other users. The agent provides users with access to a dynamically growing list of information sources, all of which follow a common associative sources API that we have defined. The agent also allows users to act as sources themselves and take part in peer-to-peer knowledge sharing.
Keywords: agents, aggregation, associativity, context, polymorphism, reconnaissance, web services
Surfing the web by site BIBAKFull-Text 496-497
  David Gibson
We provide a system for surfing the web at a high level of abstraction, which is an analogy of the web browser, but which displays entire sites at a time. It allows a principled investigation of what is present, based on an overview of all available information. We show a site's relation to other sites, the broad nature of the information contained and how it is structured, and how it has changed over time. Our current system maintains a continuously updated archive of 40 million sites representing 1.9 billion web pages, and enables real-time navigation through the sea of web sites.
Keywords: large scale systems, novel browsing paradigms, web navigation strategies
Compositional knowledge management for medical services on semantic web BIBAKFull-Text 498-499
  Yugyung Lee; Chintan Patel; Soon Ae Chun; James Geller
The vision of the Semantic Web is to reduce manual discovery and usage of Web resources (documents and services) and to allow software agents to automatically identify these Web resources, integrate them and execute them for achieving the intended goals of the user. Such a composed Web service may be represented as a workflow, called service flow. Current Web service standards are not sufficient for automatic composition. This paper presents different types of compositional knowledge required for Web service discovery and composition. As a proof of concept, we have implemented our framework in a cardiovascular domain which requires advanced service discovery and composition across heterogeneous platforms of multiple organizations.
Keywords: pragmatic knowledge, service composition
OntoMiner: bootstrapping ontologies from overlapping domain specific web sites BIBAKFull-Text 500-501
  Hasan Davulcu; Srinivas Vadrevu; Saravanakumar Nagarajan
In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain specific ontologies with high precision and recall.
Keywords: data mining, ontology, semantic web, web mining
Lessons from a Gnutella-web gateway BIBAKFull-Text 502-503
  Brian D. Davison; Wei Zhang; Baoning Wu
We present a gateway between the WWW and the Gnutella peer-to-peer network that permits searchers on one side to search and retrieve files on the other side of the gateway. This work improves the accessibility of files across different delivery platforms, making it possible to use a single search modality. We outline our design and implementation, present access statistics from a test deployment and discuss lessons learned.
Keywords: Gnutella, World Wide Web, peer-to-peer, search engine
Hybrid multicasting in large-scale service networks BIBAKFull-Text 504-505
  Jingwen Jin; Klara Nahrstedt
The importance of service composition has been widely recognized in the Internet research community due to its high flexibility in allowing development of customized applications. So far little attention has been paid to composite services' runtime performance-related aspects, which are of great importance to wide-area applications. Service composition in the wide area actually creates a new type of routing problem which we call QoS service routing. We study this problem in large networks (e.g., the Web) and provide distributed and scalable routing solutions with various optimization goals. Most importantly, we propose ways to reduce redundancies of data delivery and service execution through explorations of different types of multicast (service multicast and data multicast) in one-to-many application scenarios.
Keywords: QoS, multicast, service composition