| Need for non-visual feedback with long response times in mobile HCI | | BIBAK | Full-Text | 775-781 | |
| Virpi Roto; Antti Oulasvirta | |||
| When browsing Web pages with a mobile device, the system response times are
variable and much longer than on a PC. Users must repeatedly glance at the
display to see when the page finally arrives, although mobility demands a
Minimal Attention User Interface. We conducted a user study with 27
participants to discover the point at which visual feedback stops reaching the
user in a mobile context. In the study, we examined how attention was deployed
between the phone and the environment during page loading in several different
everyday mobility contexts, and compared these to a laboratory setting. The
first part of the page typically appeared on the screen in 11 seconds, but we
found that the user's visual attention usually shifted away from the mobile
browser between 4 and 8 seconds in the mobile context. In contrast, the
continuous span of attention to the browser was more than 14 seconds in the
laboratory condition. Based on our study results, we recommend that mobile
applications provide multimodal feedback for delays of more than four seconds. Keywords: attention, mobile web, mobility, multimodal feedback, usability | |||
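The four-second guideline maps naturally onto a watchdog-timer pattern in client code. The following minimal Python sketch illustrates one way a mobile application might cue the user non-visually once a page load exceeds that threshold; the `fetch` callback and the `notify_non_visually` stub are hypothetical placeholders, not from the paper.

```python
import threading

FEEDBACK_DEADLINE_S = 4.0  # delay threshold suggested by the study

def notify_non_visually():
    # Placeholder: a real device would vibrate or play an audio cue here.
    print("beep/vibrate: page still loading")

def load_page(fetch):
    """Run the blocking `fetch` and emit non-visual feedback if it
    takes longer than the four-second deadline."""
    timer = threading.Timer(FEEDBACK_DEADLINE_S, notify_non_visually)
    timer.start()
    try:
        return fetch()      # blocking page retrieval
    finally:
        timer.cancel()      # page arrived (or failed): stop the cue
```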
| An environment for collaborative content acquisition and editing by coordinated ubiquitous devices | | BIBAK | Full-Text | 782-791 | |
| Yutaka Kidawara; Tomoyuki Uchiyama; Katsumi Tanaka | |||
| Digital content is not only stored by servers on the Internet, but also on
various embedded devices belonging to ubiquitous networks. In this paper, we
propose a content processing mechanism for use in an environment enabling
collaborative acquisition of embedded digital content in real-world situations.
We have developed a network management device that makes it possible to acquire
embedded content using coordinated ubiquitous devices. The management device
actively configures a network that includes content-providing devices and
browsing devices to permit sharing of various items of digital content. We
also developed a Functional web mechanism for processing embedded web content
in the real world without a keyboard. This mechanism adds various functions to
conventional web content. These functions are activated by messages from a
Field in a content processing device. We constructed a practical prototype
system, simple enough for children to use, called the "Virtual Insect
Catcher". Through a test with 48 children, we demonstrated that this system
can be used to acquire embedded web content, retrieve related content from the
Internet, and then create new web content. We also describe the proposed
mechanism and the system testing. Keywords: RFID, embedded content, functional web, multiple device operating,
ubiquitous computing, ubiquitous network | |||
| Can semantic web be made to flourish? | | BIBA | Full-Text | 792 | |
| David Wood; Zavisa Bjelogrlic; Bernadette Hyland; Jim Hendler; Kanzaki Masahide | |||
| This panel's objective will be to discuss whether the Semantic Web can be made to grow in a "viral" manner, like the World Wide Web did in the early 1990s. The scope of the discussion will include efforts by the World Wide Web Consortium's Semantic Web Best Practices & Deployment Working Group to identify and publish best practices of Semantic Web practitioners, and the barriers to adoption of those practices by a wider community. The concept of "best practices" as it applies to a distributed, diverse and partially-defined Semantic Web will be discussed and its relevance debated. Specifically, panelists will discuss the capability of standards bodies, commercial companies and early adopters to create a viral technology. | |||
| Current trends in the integration of searching and browsing | | BIBA | Full-Text | 793 | |
| Andrei Z. Broder; Yoelle S. Maarek; Krishna Bharat; Susan Dumais; Steve Papa; Jan Pedersen; Prabhakar Raghavan | |||
| Searching and browsing have been the two basic information discovery paradigms since the early days of the Web. More than ten years down the road, three schools seem to have emerged: (1) the search-centric school argues that guided navigation is superfluous, since free-form search has become so good and the search UI so common that users can satisfy all their needs via simple queries; (2) the taxonomy navigation school claims that users have difficulty expressing informational needs; and (3) the meta-data centric school advocates the use of meta-data for narrowing large sets of results, and is successful in e-commerce, where it is known as "multi-faceted search". This panel brings together experts and advocates for all three schools, who will discuss these approaches and share their experiences in the field. We will ask the audience to challenge our experts with real information architecture problems. | |||
| Do we need more web performance research? | | BIBA | Full-Text | 794 | |
| Michael Rabinovich; Giovanni Pacifici; Michele Colajanni; Krithi Ramamritham; Bruce Maggs | |||
| This panel will discuss the future and purpose of Web performance research, concentrating on the reasons for modest success in the adoption of research results in practice. The panel will in particular examine factors that hinder technology transfer in the Web performance area, consider examples of past successes and failures in this arena, and stimulate the discussion on how to make Web performance research more relevant. | |||
| Mobile multimedia services | | BIBA | Full-Text | 795 | |
| Behzad Shahraray; Wei-Ying Ma; Avideh Zakhor; Noboru Babaguchi | |||
| This panel will mainly focus on the role that media processing can play in creating mobile communications, information, and entertainment services. A major premise of our discussion is that media processing techniques go beyond compression and can be employed to monitor, filter, convert, and repurpose information. Such automated techniques can serve to create personalized information and entertainment services in a cost-effective way, adapt existing content for consumption on mobile devices, and circumvent the inherent limitations of mobile devices. Some examples of the applications of media processing techniques for mobile service generation will be given. | |||
| On culture in a world-wide information society: toward the knowledge society -- the challenge | | BIBA | Full-Text | 796 | |
| Alfredo M. Ronchi; Lynn Thiesmeyer; Antonella Quacchia; Georges Mihajes; Katsuhiro Onoda; Ranjit Makkuni | |||
| Starting from more than ten years of experience and achievements in online cultural content, the panel aims to provide a comprehensive view of controversial issues and unsolved problems in both the WWW and cultural communities, to stimulate lively, thoughtful, and sometimes provocative discussions. Panelists will outline the relevance of digital collections of intangible heritage and endangered archives and discuss the following topics: the "global" Web vs. the preservation of "local" cultural identities, cultural diversities and their relevance in delivering web-based services, the preservation and future of digital memories, and Web-based development and sustainability models. We expect the panelists to actively engage the audience and help them broaden their understanding of the issues. URL: http://www.medicif.org/Events/MEDICI_events/WWW2005/default.htm. | |||
| Exploiting the dynamic networking effects of the web | | BIBA | Full-Text | 797 | |
| Ramesh Sarukkai; Soumen Chakrabarti; Gary William Flake; Narayanan Shivakumar; Asim M. Ansari | |||
| This panel aims to explore the dynamic networking effects of the Web. Today, linkages on the Web are augmented with dynamic connectivities based on various monetization strategies: e.g. ads and sponsored links. Such linkages change the dynamics of user click/flow on the Web. The key focus of this panel is to debate whether/how such dynamic effects on the Web can be modeled and best exploited. How can we derive cooperative placement strategies that are optimal from a customer perspective? As the World Wide Web becomes more dynamic with fluid link placements guided by different factors, optimizing link placement in a cooperative fashion across the Web will be an integral and crucial component. URL: http://research.yahoo.com/workshops/www2005/NetworkingEffectsWeb/. | |||
| Querying the past, present and future: where we are and where we will be | | BIBA | Full-Text | 798 | |
| Ling Liu; Andrei Z. Broder; Dieter Fensel; Carole Goble; Calton Pu | |||
| This panel will focus on exploring future enhancements of Web technology for active Internet-scale information delivery and dissemination. It will ask the questions of whether the current Web technology is sufficient, what can be leveraged in this endeavor, and how a combination of ideas from a variety of existing disciplines can help in meeting the new challenges of large scale information dissemination. Relevant existing technologies and research areas include: active databases, agent systems, continual queries, event Web, publish/subscribe technology, sensor and stream data management. We expect that some suggestions may be in conflict with current, well-accepted approaches. | |||
| Web engineering: technical discipline or social process? | | BIBA | Full-Text | 799 | |
| Bebo White; David Lowe; Martin Gaedke; Daniel Schwabe; Yogesh Deshpande | |||
| This panel aims to explore the nature of the emerging Web engineering discipline. It will attempt to strongly engage with the issue of whether Web Engineering is currently, and (more saliently) should be in the future, viewed primarily as a technical design discipline with its attention firmly on the way in which Web technologies can be leveraged in the design process, or whether it should be viewed primarily as a socio-positioned discipline which focuses on the nature of the way in which projects are managed, needs are understood and users interact. | |||
| Web services considered harmful? | | BIBA | Full-Text | 800 | |
| Rohit Khare; Jeff Barr; Mark Baker; Adam Bosworth; Tim Bray; Jeffery McManus | |||
| It has been estimated that all of the Web Services specifications and proposals ("WS-*") weigh in at several thousand pages by now. At the same time, their predecessor technologies such as XML-RPC have developed alongside other "grassroots" technologies like RSS. This debate has arguably even risen to the architectural level, contrasting "service-oriented architectures" with REST-based architectural styles. Unfortunately, the multiple overlapping specifications, standards bodies, and vendor strategies tend to obscure the very real successes of providing machine-automatable services over the Web today. This panel asks: are current community processes for developing, debating, and adopting Web Services helping or hindering the adoption of Web Services technology? URL: http://labs.commerce.net/wiki/images/1/19/CN-TR-04-05.pdf. | |||
| A personalized search engine based on web-snippet hierarchical clustering | | BIBAK | Full-Text | 801-810 | |
| Paolo Ferragina; Antonio Gulli | |||
| In this paper we propose a hierarchical clustering engine, called SnakeT,
that is able to organize on-the-fly the search results drawn from 16 commodity
search engines into a hierarchy of labeled folders. The hierarchy offers a
complementary view to the flat-ranked list of results returned by current
search engines. Users can navigate through the hierarchy driven by their search
needs. This is especially useful for informative, polysemous and poor queries.
SnakeT is the first complete and open-source system in the literature that offers both hierarchical clustering and folder labeling with variable-length sentences. We extensively tested SnakeT against all available web-snippet clustering engines, and show that it achieves efficiency and efficacy close to those of the best known engine, Vivisimo.com. Recently, personalized search engines have been introduced with the aim of improving search results by focusing on the users rather than on their submitted queries. We show how to plug SnakeT on top of any (un-personalized) search engine in order to obtain a form of personalization that is fully adaptive, privacy preserving, scalable, and non-intrusive for the underlying search engines. Keywords: information extraction, new search applications and interfaces, personalized
web ranking, search engines, web snippets clustering | |||
| Ranking definitions with supervised learning methods | | BIBAK | Full-Text | 811-819 | |
| Jun Xu; Yunbo Cao; Hang Li; Min Zhao | |||
| This paper is concerned with the problem of definition search. Specifically,
given a term, we are to retrieve definitional excerpts of the term and rank the
extracted excerpts according to their likelihood of being good definitions.
This is in contrast to the traditional approaches of either generating a single
combined definition or simply outputting all retrieved definitions. Definition
ranking is essential for the task. Methods for performing definition ranking
are proposed in this paper, which formalize the problem as either
classification or ordinal regression. A specification for judging the goodness
of a definition is given. We employ SVM as the classification model and
Ranking SVM as the ordinal regression model, respectively; both rank
definition candidates according to their likelihood of being good definitions.
Features for constructing the SVM and Ranking SVM models are defined. An
enterprise search system based on this method has been developed and has been
put into practical use. Experimental results indicate that the use of SVM and
Ranking SVM can significantly outperform the baseline methods of using
heuristic rules or employing the conventional information retrieval method of
Okapi. This is true both when the answers are paragraphs and when they are
sentences. Experimental results also show that SVM or Ranking SVM models
trained in one domain can be adapted to another domain, indicating that generic
models for definition ranking can be constructed. Keywords: classification, ordinal regression, search of definitions, text mining, web
mining, web search | |||
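The Ranking SVM used here follows the standard reduction of ordinal regression to binary classification over pairwise feature differences. The sketch below shows that generic reduction only, with random placeholder features and graded labels standing in for the paper's actual definition features.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    """Turn graded labels into binary examples on feature differences:
    (x_i - x_j) is labeled +1 iff candidate i is the better definition."""
    diffs, signs = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j]); signs.append(+1)
                diffs.append(X[j] - X[i]); signs.append(-1)
    return np.array(diffs), np.array(signs)

X = np.random.rand(20, 5)             # placeholder candidate features
y = np.random.randint(0, 3, size=20)  # graded "goodness" labels
Xp, yp = pairwise_transform(X, y)
w = LinearSVC().fit(Xp, yp).coef_.ravel()
ranking = np.argsort(-(X @ w))        # candidate indices, best first
```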
| Identifying link farm spam pages | | BIBAK | Full-Text | 820-829 | |
| Baoning Wu; Brian D. Davison | |||
| With the increasing importance of search in guiding today's web traffic,
more and more effort has been spent to create search engine spam. Since link
analysis is one of the most important factors in current commercial search
engines' ranking systems, new kinds of spam aiming at links have appeared.
Building link farms is one technique that can degrade link-based ranking
algorithms. In this paper, we present algorithms for detecting these link farms
automatically by first generating a seed set based on the common link set
between incoming and outgoing links of Web pages and then expanding it. Links
between identified pages are re-weighted, providing a modified web graph to use
in ranking page importance. Experimental results show that we can identify most
link farm spam pages and the final ranking results are improved for almost all
tested queries. Keywords: HITS, PageRank, link analysis, spam, web search engine | |||
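The seed-generation step the abstract describes, flagging pages whose incoming and outgoing link sets overlap heavily and then expanding the set, can be sketched as follows. The thresholds are illustrative assumptions, not the paper's tuned values.

```python
def seed_spam_pages(pages, in_links, out_links, threshold=3):
    """Flag pages with a large common set of in- and out-links,
    the telltale mutual linking of a link farm."""
    return {p for p in pages
            if len(in_links.get(p, set()) & out_links.get(p, set())) >= threshold}

def expand(seeds, in_links, out_links, min_hits=3):
    """Iteratively add pages that link to many already-identified pages."""
    spam, changed = set(seeds), True
    while changed:
        changed = False
        for p in set(in_links) | set(out_links):
            if p not in spam and len(out_links.get(p, set()) & spam) >= min_hits:
                spam.add(p)
                changed = True
    return spam
```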
| The volume and evolution of web page templates | | BIBAK | Full-Text | 830-839 | |
| David Gibson; Kunal Punera; Andrew Tomkins | |||
| Web pages contain a combination of unique content and template material,
which is present across multiple pages and used primarily for formatting,
navigation, and branding. We study the nature, evolution, and prevalence of
these templates on the web. As part of this work, we develop new randomized
algorithms for template extraction that perform approximately twenty times
faster than existing approaches with similar quality. Our results show that
40-50% of the content on the web is template content. Over the last eight
years, the fraction of template content has doubled, and the growth shows no
sign of abating. Text, links, and total HTML bytes within templates are all
growing as a fraction of total content at a rate of between 6 and 8% per year.
We discuss the deleterious implications of this growth for information
retrieval and ranking, classification, and link analysis. Keywords: algorithms, boilerplate, data cleaning, data mining, templates, web mining | |||
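The abstract does not spell out the authors' randomized extraction algorithm; a common baseline for template detection is to fingerprint page fragments and mark those recurring across many pages of a site as template material. The sketch below illustrates that baseline only, not the paper's method.

```python
import hashlib
from collections import Counter

def fingerprints(blocks):
    """Hash each block (e.g. a serialized DOM subtree) to a fingerprint."""
    return {hashlib.md5(b.encode()).hexdigest() for b in blocks}

def template_blocks(pages_blocks, min_fraction=0.5):
    """Blocks occurring on at least `min_fraction` of a site's pages
    are treated as template content."""
    counts = Counter()
    for blocks in pages_blocks:
        counts.update(fingerprints(blocks))
    n = len(pages_blocks)
    return {fp for fp, c in counts.items() if c / n >= min_fraction}
```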
| The infocious web search engine: improving web searching through linguistic analysis | | BIBAK | Full-Text | 840-849 | |
| Alexandros Ntoulas; Gerald Chao; Junghoo Cho | |||
| In this paper we present the Infocious Web search engine [23]. Our goal in
creating Infocious is to improve the way people find information on the Web by
resolving ambiguities present in natural language text. This is achieved by
performing linguistic analysis on the content of the Web pages we index, which
is a departure from existing Web search engines that return results mainly
based on keyword matching. This additional step of linguistic processing gives
Infocious two main advantages. First, Infocious gains a deeper understanding of
the content of Web pages so it can better match users' queries with indexed
documents and therefore can improve relevancy of the returned results. Second,
based on its linguistic processing, Infocious can organize and present the
results to the user in more intuitive ways. In this paper we present the
linguistic processing technologies that we incorporated in Infocious and how
they are applied in helping users find information on the Web more efficiently.
We discuss the various components in the architecture of Infocious and how each
of them benefits from the added linguistic processing. Finally, we
experimentally evaluate the performance of a component which leverages
linguistic information in order to categorize Web pages. Keywords: concept extraction, crawling, indexing, information retrieval, language
analysis, linguistic analysis of web text, natural language processing,
part-of-speech tagging, phrase identification, web search engine, web
searching, word sense disambiguation | |||
| How to make web sites talk together: web service solution | | BIBAK | Full-Text | 850-855 | |
| Hoang Pham Huy; Takahiro Kawamura; Tetsuo Hasegawa | |||
| Integrating web sites to provide more efficient services is a very promising
direction on the Internet: for example, searching for a house to rent based on
the train system, or planning a holiday under several constraints such as
hotels and air tickets. From a resource viewpoint, current web sites already
provide quite enough information. The challenge, however, is that these web
sites merely present information and do not support any mechanism for
exchanging it. As a consequence, a human user very often has to take on the
role of "linking" several web sites by browsing each one and gathering the
concrete information. The reason is historical: web sites were developed for
human browsing, and so they do not support any machine-understandable
mechanism.
Current research in the WWW environment already proposes several solutions for making new web sites understandable to other web sites so that they can be integrated. The question, however, is how to integrate existing web sites with these new ones. Evidently, redeveloping all of them is an unacceptable solution. In this paper, we propose a Web Service Gateway solution to "wrap" existing web sites in Web services. Thus, without any effort to duplicate the web sites' code, these services inherit all features of the sites while being enriched with other Web service features such as UDDI publishing, semantic description, etc. This proposal was developed at Toshiba as the Web Service Gateway and Wrapper Generator System. Using these systems, several integrated applications were built; they are also presented and evaluated in this paper. Keywords: WSDL, service development, web service, web site, wrapper | |||
| Diversified SCM standard for the Japanese retail industry | | BIBAK | Full-Text | 856-863 | |
| Koichi Hayashi; Naoki Koguro; Reki Murakami | |||
| In this paper, we present the concept of a diversified SCM (supply chain
management) standard and distributed hub architecture which were used in B2B
experiments for the Japanese retail industry. The conventional concept of B2B
standards develops a single ideal set of business transactions to be supported.
In contrast, our concept allows a wide range of diverse business transaction
patterns necessary for industry supply chains. An industry develops a standard
SCM model that partitions the whole supply chain into several transaction
segments, each of which provides alternative business transaction patterns. For
B2B collaboration, companies must agree on a collaboration configuration, which
chooses the transaction alternatives from each segment. To support the
development of a B2B system that executes an agreed collaboration, we introduce
an SOA (service oriented architecture) based pattern called a distributed hub
architecture. As a hub of B2B collaboration, it includes a complete set of
services that can process every possible business transaction included in a
standard SCM model. However, it does not function as a centralized service that
coordinates participants. Instead, it is deployed on every participant and
executes the assigned part of the supply chain collaboratively with other
distributed hubs. Based on this concept, we analyzed actual business
transactions in the Japanese retail industry and developed a standard SCM
model, which represents more than a thousand possible transaction patterns.
Based on the model, we developed an experimental system for the Japanese retail
industry. The demonstration experiment involved major players in the industry
including one of the largest general merchandise stores, one of the largest
wholesalers, and major manufacturers in Japan. Keywords: B2B collaboration, SOA (service oriented architecture), business process
management, ebXML, retail industry, standardization, supply chain management,
web services | |||
| Crawling a country: better strategies than breadth-first for web page ordering | | BIBAK | Full-Text | 864-872 | |
| Ricardo Baeza-Yates; Carlos Castillo; Mauricio Marin; Andrea Rodriguez | |||
| This article compares several page ordering strategies for Web crawling
under several metrics. The objective of these strategies is to download the
most "important" pages "early" during the crawl. As the coverage of modern
search engines is small compared to the size of the Web, and it is impossible
to index all of the Web for both theoretical and practical reasons, it is
relevant to index at least the most important pages.
We use data from actual Web pages to build Web graphs and execute a crawler simulator on those graphs. As the Web is very dynamic, crawling simulation is the only way to ensure that all the strategies considered are compared under the same conditions. We propose several page ordering strategies that are more efficient than breadth-first search and strategies based on partial PageRank calculations. Keywords: scheduling policy, web crawler, web page importance | |||
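All of the compared strategies fit one best-first crawling skeleton that differs only in its priority function. A minimal sketch follows; the `fetch_links` and `priority` callbacks are hypothetical placeholders.

```python
import heapq

def crawl(seeds, fetch_links, priority, budget=1000):
    """Best-first crawler: always fetch the highest-priority frontier URL.
    With priority = -depth this reduces to breadth-first; a partial-PageRank
    strategy would periodically recompute scores over the known graph."""
    frontier = [(-priority(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen, order = set(seeds), []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)
        order.append(url)
        for v in fetch_links(url):       # outlinks discovered on fetch
            if v not in seen:
                seen.add(v)
                heapq.heappush(frontier, (-priority(v), v))
    return order
```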
| Internet search engines: past and future | | BIBA | Full-Text | 873 | |
| Jan O. Pedersen | |||
| I will review the short history of Internet Search Engines from early first generation systems to the current crop of stock market darlings. Many of the underlying technology problems remain the same, but the business has become significantly more sophisticated and high-powered. I will touch on some of the economics driving the remarkable success of these services and make some predictions about future trends. | |||
| News in the age of the web | | BIBA | Full-Text | 874 | |
| Krishna Bharat | |||
| One of the most exciting and successful examples of the Web impacting society is online news. The history of the news industry from print to the online medium is an interesting journey. Broadcast news transformed society by making news available instantly rather than once a day. While more channels became available, barriers to entry remained high and mainstream opinions continued to dominate. News on the net has brought in a number of valuable transformations, allowing news to be made (potentially) more accessible, diverse, democratic, personalized and interactive than before. Blogging has now made "citizen reporting" possible. As with any disruptive technology, online news has both positive and negative implications, such as the threat of disinformation. Computer assisted news is a fun area of research that draws upon prior work in information retrieval, data mining and user interfaces. Given the volume of online news being generated today, the ability to find news and related facts quickly and with high relevance affects both readers and journalists. The talk will address the social implications as well as the technical challenges in the dissemination of online news, with a focus on Google News. Google News is an automated service that makes over 4,500 online, English sources searchable and browseable in real time, with an emphasis on breadth of coverage. | |||
| Technical challenges in exploiting the web as a business resource | | BIBA | Full-Text | 875 | |
| Andrew Tomkins | |||
| In this talk, I'll describe some recent indicators suggesting that businesses are on the cusp of operational exploitation of the web as a decision support resource. From consumer research and purchasing behavior to enterprise brand tracking, intelligence gathering, and advertising, the web is suddenly on everybody's mind -- not as an exciting future possibility, but as an exploitable resource. I'll describe some technological approaches to employing this resource, talk about what's possible today, and describe some challenges for the future. As a running example, I'll cover IBM's WebFountain system: its architecture, analytical model, and applications. | |||
| DoCoMo's challenge towards new mobile services | | BIBA | Full-Text | 876 | |
| Kiyoyuki Tsujimura | |||
| NTT DoCoMo, the provider of "i-mode" mobile Internet service, which accommodates over 40 million subscribers in Japan, is now working to create new types of mobile communications services featuring visual content and contactless IC technology. | |||
| Automatic text processing to enhance product search for on-line shopping | | BIBA | Full-Text | 877 | |
| Gilles Vandelle | |||
| The growing eCommerce business requires an advanced way of searching for products. Buyers today are not only using the web to accomplish transactions but also to search for and select products that fit their needs. The products are now global, but users want a site that uses their own language when shopping. This talk will describe how Kelkoo built a solution used across Europe. The multiple European languages have been addressed with a simple linguistic approach combined with machine learning technologies. In this talk we will put the emphasis on the use of machine learning to address local diversity. | |||
| How search engines shape the web | | BIBA | Full-Text | 879 | |
| Byron Dom; Krishna Bharat; Andrei Broder; Marc Najork; Jan Pedersen; Yoshinobu Tonomura | |||
| The state of the web today has been and continues to be greatly influenced by the existence of web-search engines. This panel will discuss the ways in which search engines have affected the web in the past and ways in which they may affect it in the future. Both positive and negative effects will be discussed, as will potential measures to combat the latter. Besides the obvious ways in which search engines help people find content, other effects to be discussed include: the whole phenomenon of web-page spam, based on both text and links (e.g. link farms), the business of "Search Engine Optimization" (optimizing pages to rank highly in web-search results), and the bid-terms business and the associated problem of click fraud, to name a few. | |||
| The anatomy of a news search engine | | BIBAK | Full-Text | 880-881 | |
| A. Gulli | |||
| Today, news browsing and searching are among the most important Internet
activities. This paper introduces a general framework for building a news
search engine by describing Velthune, an academic news search engine available
online. Keywords: extraction, information, news search engines, syndication | |||
| Preferential walk: towards efficient and scalable search in unstructured peer-to-peer networks | | BIBAK | Full-Text | 882-883 | |
| Hai Zhuge; Xue Chen; Xiaoping Sun | |||
| To improve search efficiency and reduce unnecessary traffic in Peer-to-Peer
(P2P) networks, this paper proposes a trust-based probabilistic search
algorithm, called preferential walk (P-Walk). Every peer ranks its neighbors
according to its searching experience; highly ranked neighbors are queried
with higher probability. Simulation results show that P-Walk is not only
efficient, but also robust against malicious behaviors. Furthermore, we measure
peers' rank distribution and draw implications. Keywords: P2P, power-law, probability, search, trust | |||
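The core of P-Walk as described, forwarding queries to neighbors with probability proportional to their learned rank, can be sketched as follows. The reinforcement constants are assumptions; the paper's exact update rule is not given in the abstract.

```python
import random

def pick_neighbor(ranks):
    """Choose a neighbor with probability proportional to its rank score."""
    peers = list(ranks)
    return random.choices(peers, weights=[ranks[p] for p in peers], k=1)[0]

def update_rank(ranks, neighbor, hit, reward=1.0, penalty=0.1):
    """Reinforce neighbors whose direction answered the query; decay others."""
    delta = reward if hit else -penalty
    ranks[neighbor] = max(ranks.get(neighbor, 1.0) + delta, 0.1)
```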
| Car racing through the streets of the web: a high-speed 3D game over a fast synchronization service | | BIBAK | Full-Text | 884-885 | |
| Stefano Cacciaguerra; Stefano Ferretti; Marco Roccetti; Matteo Roffilli | |||
| The growth of the Internet has brought a new age for game developers.
Exciting new, highly interactive Massively Multiplayer Online Games (MMOGs)
may now be deployed on the Web, thanks to new scalable distributed solutions
and amazing 3D graphics systems plugged directly into standard browsers. Along
this line, taking advantage of a mirrored game server architecture, we
developed a 3D car racing multiplayer game for use over the Web, freely
inspired by Armagetron. Game servers are kept synchronized through the use of
a fast synchronization scheme that is able to drop obsolete game events to
uphold playability while preserving game state consistency. Preliminary
results confirm that smart 3D spaces may be created over the Web where the
magic of gaming is reproduced for the pleasure of a huge number of players.
This result may be obtained only by combining highly accurate event
synchronization technologies with 3D scene-graph-based rendering software. Keywords: MMOG, scene graph, synchronization | |||
| A fast XPATH evaluation technique with the facility of updates | | BIBAK | Full-Text | 886-887 | |
| Ashish Virmani; Suchit Agarwal; Rahul Thathoo; Shekhar Suman; Sudip Sanyal | |||
| This paper addresses the problem of fast retrieval of data from XML
documents by providing a labeling schema that can easily handle simple as well
as complex XPATH queries and also provide for updates without the need for the
entire document being re-indexed in the RDBMS. We introduce a new labeling
schema called the "Z-Label" for efficiently processing XPATH queries involving
child and descendant axes.
The use of "Z-Label" coupled with the indexing schema provides for smooth updates in the XML document. Keywords: Dewey indexing, XML, XPath query optimization, biaxes path expression,
updates | |||
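The abstract leaves the Z-Label itself undescribed, but the Dewey indexing named in the keywords illustrates how prefix-structured labels answer child and descendant axis tests without touching the stored document. A small illustrative sketch, not the paper's scheme:

```python
def is_descendant(anc, desc):
    """With Dewey-style labels ('1.3.2'), desc lies on the descendant
    axis of anc iff anc's components are a proper prefix of desc's."""
    a, d = anc.split("."), desc.split(".")
    return len(a) < len(d) and d[:len(a)] == a

def is_child(anc, desc):
    return is_descendant(anc, desc) and \
        len(desc.split(".")) == len(anc.split(".")) + 1

assert is_descendant("1.3", "1.3.2.5")
assert is_child("1.3", "1.3.2") and not is_child("1.3", "1.3.2.5")
```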
| Mapping XML instances | | BIBA | Full-Text | 888-889 | |
| Sai Anand; Erik Wilde | |||
| For XML-based applications in general and B2B applications in particular, mapping between differently structured XML documents, to enable exchange of data, is a basic problem. A generic solution to the problem is of interest and desirable both in an academic and practical sense. We present a case study of the problem that arises in an XML based project, which involves mapping of different XML schemas to each other. We describe our approach to solving the problem, its advantages and limitations. We also compare and contrast our approach with previously known approaches and commercially available software solutions. | |||
| How much is a keyword worth? | | BIBAK | Full-Text | 890-891 | |
| Ramesh R. Sarukkai | |||
| How much is a keyword worth? At the crux of every search is a query that is
composed of search keywords. Sponsors bid for placement on such keywords using
a variety of factors, the key being the relative demand for the keyword, and
its ability to drive customers to their site. In this paper, we explore the
notion of "worth of a keyword". We determine the keyword's worth by tying it to
the end criteria that needs to be maximized. As an illustrative example,
keyword searches that drive e-commerce transactions are modeled and methods for
estimating the Return On Investment/value of a keyword from the association
data is discussed. Keywords: ROI, e-commerce, optimization, search keyword valuation, sponsored keyword
recommendation, sponsored listing | |||
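The abstract does not give the model's equations; one plausible formulation of a transaction-driven keyword worth, consistent with the ROI framing but only an assumption here, is:

```latex
\mathrm{worth}(k) \approx \Pr(\mathrm{click}\mid k)\,
                          \Pr(\mathrm{purchase}\mid \mathrm{click},k)\,
                          \overline{\mathrm{revenue}}(k),
\qquad
\mathrm{ROI}(k) = \frac{\mathrm{worth}(k)-\mathrm{cpc}(k)}{\mathrm{cpc}(k)},
```

where cpc(k) is the cost per click bid on keyword k; the paper's actual estimator works from association data between searches and transactions.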
| Predicting outcomes of web navigation | | BIBAK | Full-Text | 892-893 | |
| Jacek Gwizdka; Ian Spence | |||
| Two exploratory studies examined the relationships among web navigation
metrics, measures of lostness, and success on web navigation tasks. The web
metrics were based on counts of visits to web pages, properties of the web
usage graph, and similarity to an optimal path. Metrics based on similarity to
an optimal path were good predictors of lostness and task success. Keywords: compactness, lostness, path similarity, stratum, web navigation | |||
| XAR-miner: efficient association rules mining for XML data | | BIBAK | Full-Text | 894-895 | |
| Sheng Zhang; Ji Zhang; Han Liu; Wei Wang | |||
| In this paper, we propose a framework, called XAR-Miner, for mining ARs from
XML documents efficiently. In XAR-Miner, raw data in the XML document are
first preprocessed and transformed into either an Indexed Content Tree
(IX-tree) or multi-relational databases (Multi-DB), depending on the size of
the XML document and the memory constraints of the system, for efficient data
selection and AR mining. Task-relevant concepts are generalized to produce generalized
meta-patterns, based on which the large ARs that meet the support and
confidence levels are generated. Keywords: XML data, association rule mining, meta-patterns | |||
| X-warehouse: building query pattern-driven data warehouses for XML data | | BIBAK | Full-Text | 896-897 | |
| Ji Zhang; Wei Wang; Han Liu; Sheng Zhang | |||
| In this paper, we propose an approach to materialize XML data warehouses
based on the frequent query patterns discovered from historical queries issued
by users. The schemas of integrated XML documents in the warehouse are built
using these frequent query patterns represented as Frequent Query Pattern Trees
(FreqQPTs). Using hierarchical clustering technique, FreqQPTs are clustered and
merged to produce a specified number of integrated XML documents for actual
data feeding. The maintenance issue of the data warehouse is also treated in
this paper. Keywords: XML data, data integration, data warehouse, query patterns | |||
| TotalRank: ranking without damping | | BIBAK | Full-Text | 898-899 | |
| Paolo Boldi | |||
| PageRank is defined as the stationary state of a Markov chain obtained by
perturbing the transition matrix of a web graph with a damping factor α
that spreads part of the rank. The choice of α is eminently empirical,
but most applications use α = 0.85; nonetheless, the selection of α
is critical, and some believe that link farms may use this choice
adversarially. Recent results [1] prove that the PageRank of a page is a
rational function of α, and that this function can be approximated quite
efficiently: this fact can be used to define a new form of ranking, TotalRank,
that averages PageRanks over all possible α's. We show how this rank can
be computed efficiently, and provide some preliminary experimental results on
its quality and comparisons with PageRank. Keywords: Kendall's τ, link farms, pageRank, ranking | |||
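The construction is compact enough to state. Writing PageRank with uniform preference vector v = 1/N and row-stochastic link matrix P, and using the identity that the integral of (1-α)α^k over [0,1] equals 1/((k+1)(k+2)):

```latex
r(\alpha) = (1-\alpha)\, v \sum_{k \ge 0} \alpha^{k} P^{k},
\qquad
t = \int_{0}^{1} r(\alpha)\, d\alpha
  = v \sum_{k \ge 0} \frac{P^{k}}{(k+1)(k+2)},
```

so TotalRank t can be accumulated by the same power-iteration-style passes used for PageRank, with the k-th term damped by 1/((k+1)(k+2)).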
| MemoSpace: a visualization tool for web navigation | | BIBAK | Full-Text | 900-901 | |
| Jacqueline Waniek; Holger Langner; Falk Schmidsberger | |||
| A central aspect of reducing orientation problems in web navigation concerns
the design of adequate navigation aids. Visualizing a user's navigation path
in the form of a temporal-spatial template can function as an external memory
of the user's search history, thereby helping the user find previously visited
sites, get an overview of the search process, and, moreover, giving structure
to the complex World Wide Web (WWW) environment. This paper presents an
application for dynamic two- and three-dimensional visualizations of users'
navigation paths, called MemoSpace. In an exploratory study, user behavior
with and subjective evaluation of a MemoSpace application were examined. Keywords: MemoSpace, navigation | |||
| The indexable web is more than 11.5 billion pages | | BIBAK | Full-Text | 902-903 | |
| A. Gulli; A. Signorini | |||
| In this short paper we estimate the size of the public indexable web at 11.5
billion pages. We also estimate the overlap and the index size of Google, MSN,
Ask/Teoma and Yahoo! Keywords: index sizes, search engines, size of the web | |||
| A language for expressing user-context preferences in the web | | BIBAK | Full-Text | 904-905 | |
| Juan Ignacio Vázquez; Diego López de Ipiña | |||
| In this paper, we introduce WPML (WebProfiles Markup Language) for
expressing user-context preferences information in the Web. Using WPML a
service provider can negotiate and obtain user-related information to
personalise the service experience without explicit manual configuration by
the user, while preserving the user's privacy using P3P. Keywords: HTTP, ambient intelligence, context-aware, cookies, profiles, state
management, web | |||
| Retrieving multimedia web objects based on PageRank algorithm | | BIBAK | Full-Text | 906-907 | |
| Christopher C. Yang; K. Y. Chan | |||
| Hyperlink analysis has been widely investigated to support the retrieval of
Web documents in Internet search engines. It has been shown that hyperlink
analysis significantly improves the relevance of search results, and these
techniques have been adopted in many commercial search engines, e.g. Google.
However, hyperlink analysis is mostly utilized in the ranking of Web pages
only, not of other multimedia objects such as images and video. In this
project, we propose a modified Multimedia PageRank algorithm to
support the searching of multimedia objects in the Web. Keywords: HITS, PageRank, content based retrieval, hyperlink analysis, multimedia
retrieval, web search engines | |||
| Automatic generation of web portals using artificial ants | | BIBAK | Full-Text | 908-909 | |
| Hanene Azzag; Gilles Venturini; Christiane Guinot | |||
| We present in this work a new model (named AntTree) based on artificial ants
for hierarchical document clustering. This model is inspired by the
self-assembly behavior of real ants. We have simulated this behavior to build a
hierarchical tree-structured partitioning of a set of documents, according to
the similarities between these documents. We have successfully compared our
results to those obtained by ascending hierarchical clustering. Keywords: artificial ants, hierarchical clustering, portals sites, web | |||
| Persistence in web based collaborations | | BIBAK | Full-Text | 910-911 | |
| N. Bryan-Kinns; P. G. T. Healey; J. Lee | |||
| We outline work on web based support for group creativity. We focus on a
study of the effect persistence of participants' musical contributions has on
their mutual engagement. Keywords: HCI, collaboration, creativity, music, user interfaces | |||
| Popular web hot spots identification and visualization | | BIBAK | Full-Text | 912-913 | |
| D. Avramouli; J. Garofalakis; D. J. Kavvadias; C. Makris; Y. Panagis; E. Sakkopoulos | |||
| This work makes a two-fold contribution: it presents software to analyse
logfiles and visualize popular web hot spots and, additionally, presents an
algorithm that uses this information to identify subsets of the website
that exhibit heavy access patterns. Such information is extremely valuable to
the site maintainer, since it indicates points that may need content
intervention and/or site graph restructuring. Experimental validation verified
that the visualization tool, when coupled with algorithms that infer frequent
traversal patterns, is both effective in indicating popular hot spots and
efficient in doing so by using graph-based representations of popular
traversals. Keywords: access visualization, maximal forward path, usage mining | |||
| Information flow using edge stress factor | | BIBAK | Full-Text | 914-915 | |
| Franco Salvetti; Savitha Srinivasan | |||
| This paper shows how a corpus of instant messages can be employed to detect
de facto communities of practice automatically. A novel algorithm based on the
concept of Edge Stress Factor is proposed and validated. Results show that this
approach is fast and effective in studying collaborative behavior. Keywords: graph clustering, social network analysis | |||
| Adaptive filtering of advertisements on web pages | | BIBAK | Full-Text | 916-917 | |
| Babak Esfandiari; Richard Nock | |||
| We present a browser extension to dynamically learn to filter unwanted
images (such as advertisements or flashy graphics) based on minimal user
feedback. To do so, we apply the weighted majority algorithm using pieces of
the Uniform Resource Locators of such images as predictors. Experimental
results tend to confirm that the accuracy of the predictions converges quickly
to very high levels. Keywords: advertisement filtering, interface agents, weighted majority | |||
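A minimal instantiation of the scheme the abstract describes, URL fragments acting as experts in a weighted majority vote with wrong experts' weights halved on user feedback, might look like this; the fragmenting regex and the β value are assumptions, not the paper's settings.

```python
import re

BETA = 0.5  # multiplicative penalty for experts that predict wrongly

class AdFilter:
    """Weighted-majority vote over URL fragments as ad/not-ad experts."""

    def __init__(self):
        self.experts = {}  # fragment -> [weight, label it predicts]

    @staticmethod
    def fragments(url):
        return set(re.split(r"[/.?&=_:-]+", url.lower())) - {""}

    def predict(self, url):
        vote = 0.0
        for f in self.fragments(url):
            if f in self.experts:
                w, label = self.experts[f]
                vote += w if label == 1 else -w
        return 1 if vote > 0 else 0       # 1 = advertisement

    def update(self, url, true_label):
        """User feedback: enroll new fragments, halve wrong experts."""
        for f in self.fragments(url):
            if f not in self.experts:
                self.experts[f] = [1.0, true_label]
            elif self.experts[f][1] != true_label:
                self.experts[f][0] *= BETA

f = AdFilter()
f.update("http://ads.example.com/banner_728x90.gif", 1)
print(f.predict("http://ads.example.com/banner_468x60.gif"))  # -> 1
```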
| WEBCAP: a capacity planning tool for web resource management | | BIBAK | Full-Text | 918-919 | |
| Sami Habib; Maytham Safar | |||
| A staggering number of multimedia applications are being introduced every
day. Yet, the inordinate delays encountered in retrieving multimedia documents
make it difficult to use the Web for real-time applications such as educational
broadcasting, video conferencing, and multimedia streaming. The problem of
delivering multimedia documents in time while placing the least demands on the
client, network and server resources is a challenging optimization problem.
WEBCAP is an ongoing project that explores applying capacity planning
techniques to manage or tune Web resources (client, network, server) for
optimal or near-optimal performance, minimizing the retrieval cost while
satisfying the real-time constraints and available resources. The WEBCAP
project consists of four software modules: object extractor, object
representer, object scheduler, and system tuner. The four modules are
connected serially with three feedback loops. In this paper, we focus on how
to extract objects from a multimedia document and how to represent them as
object and operation flow graphs while maintaining precedence relations among the objects. Keywords: capacity-planning, multimedia, scheduling | |||
| Finding the search engine that works for you | | BIBAK | Full-Text | 920-921 | |
| Kin F. Li; Wei Yu; Shojiro Nishio; Yali Wang | |||
| A search engine evaluation model that considers over seventy performance and
feature parameters is presented. The design of a web-based system that allows
the user to tailor the model to his/her own preference, and to evaluate search
engines of interest, is introduced. The results presented to the user identify
the search engine that best suits his/her needs. Keywords: performance evaluation, personalization, search engines | |||
| Information retrieval in P2P networks using genetic algorithm | | BIBAK | Full-Text | 922-923 | |
| Wan Yeung Wong; Tak Pang Lau; Irwin King | |||
| Hybrid Peer-to-Peer (P2P) networks based on the direct connection model have
two shortcomings: high bandwidth consumption and poor semi-parallel search.
Both can be alleviated by the query propagation model. In this paper, we
propose a novel query routing strategy called GAroute based on the query
propagation model. Given the current P2P network topology and the relevance
level of each peer, GAroute returns a list of query routing paths that cover
as many relevant peers as possible. We model this as the Longest Path Problem
in a directed graph, which is NP-complete, and we obtain high-quality (0.95 in
100 peers) approximate solutions in polynomial time by using a Genetic
Algorithm (GA). We describe the problem modeling and the proposed GA for
finding long paths. Finally, we summarize the experimental results, which
measure the scalability and quality of different searching algorithms.
According to these results, GAroute works well in large-scale P2P networks. Keywords: P2P, genetic algorithm, longest path problem, query routing | |||
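The abstract gives the shape of the method (evolve candidate routing paths, score by length/coverage) without its operators. As a rough illustration only, a mutation-only evolutionary sketch for growing long simple paths in a directed graph, not GAroute's actual encoding or crossover, could be:

```python
import random

def grow(graph, path, max_len):
    """Extend a simple path with random unvisited successors."""
    visited = set(path)
    while len(path) < max_len:
        nbrs = [n for n in graph.get(path[-1], []) if n not in visited]
        if not nbrs:
            break
        path.append(random.choice(nbrs))
        visited.add(path[-1])
    return path

def mutate(graph, path, max_len):
    """Truncate at a random point, then regrow randomly."""
    cut = random.randint(1, len(path)) if len(path) > 1 else 1
    return grow(graph, path[:cut], max_len)

def long_path(graph, start, pop=30, gens=50, max_len=100):
    population = [grow(graph, [start], max_len) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=len, reverse=True)
        elite = population[:pop // 2]                 # selection by length
        population = elite + [mutate(graph, random.choice(elite)[:], max_len)
                              for _ in range(pop - len(elite))]
    return max(population, key=len)
```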
| An investigation of cloning in web applications | | BIBAK | Full-Text | 924-925 | |
| Damith C. Rajapakse; Stan Jarzabek | |||
| Cloning (ad hoc reuse by duplication of design or code) speeds up
development, but also hinders future maintenance. Cloning also hints at reuse
opportunities that, if exploited systematically, might have positive impact on
development and maintenance productivity. Unstable requirements and tight
schedules pose unique challenges for Web Application engineering that encourage
cloning. We are conducting a systematic study of cloning in Web Applications of
different sizes, developed using a range of Web technologies, and serving
diverse purposes. Our initial results show cloning rates up to 63% in both
newly developed and already maintained Web Applications. Expected contribution
of this work is two-fold: (1) to confirm potential benefits of reuse-based
methods in addressing clone related problems of Web engineering, and (2) to
create a framework of metrics and presentation views to be used in other
similar studies. Keywords: clone analysis, clone metrics, clones, software maintenance, software reuse,
web applications, web engineering | |||
| A more precise model for web retrieval | | BIBAK | Full-Text | 926-927 | |
| Junli Yuan; Hung Chi; Qibin Sun | |||
| Most research on web retrieval latency is object-level based, which we
believe is insufficient and sometimes inaccurate. In this paper, we propose a
fine-grained, operation-level Web Retrieval Dependency Model (WRDM) to capture
the web retrieval process more precisely. Our model reveals some new
factors in web retrieval which cannot be seen at object level but are very
important to studies in the web retrieval area. Keywords: dependency, latency, model, performance, web retrieval | |||
| Extracting semantic structure of web documents using content and visual information | | BIBAK | Full-Text | 928-929 | |
| Rupesh R. Mehta; Pabitra Mitra; Harish Karnick | |||
| This work aims to provide a page segmentation algorithm which uses both
visual and content information to extract the semantic structure of a web page.
The visual information is utilized using the VIPS algorithm and the content
information using a pre-trained Naive Bayes classifier. The output of the
algorithm is a semantic structure tree whose leaves represent segments, each
with a unique topic; the contents of a leaf segment may, however, be
physically distributed across the web page. This structure can be useful in many web
applications like information retrieval, information extraction and automatic
web page adaptation. This algorithm is expected to outperform other existing
page segmentation algorithms since it utilizes both content and visual
information. Keywords: DOM, VIPS, naive Bayes classifier, page segmentation, topic hierarchy | |||
| A quality framework for web site quality: user satisfaction and quality assurance | | BIBAK | Full-Text | 930-931 | |
| Brian Kelly; Richard Vidgen | |||
| Web site developers need to use standards and best practices to ensure
that Web sites are functional, accessible and interoperable. However, many Web
sites fail to achieve such goals. This short paper describes how a Web site
quality assessment method (E-Qual) might be used in conjunction with a quality
assurance framework (QA Focus) to provide a rounded view of Web site quality
that takes account of end user and developer perspectives. Keywords: best practices, quality assurance, standards, web site quality | |||
| WebRogue: virtual presence in web sites | | BIBAK | Full-Text | 932-933 | |
| Alessandro Soro; Ivan Marcialis; Davide Carboni; Gavino Paddeu | |||
| WebRogue is an application for virtual presence over the Web. It provides
the Web Browser with a chat subwindow that allows users connected to the same
Web site to meet, share opinions and cooperate in a totally free,
non-moderated and uncensored environment. Each time the user loads a Web page
in the Web Browser, WebRogue opens a discussion channel in a centralized
server application, which is completely decoupled from the Web server, using
the URL of the Web site as a key. Thus, whenever a new page is loaded, the
user can see who is connected, as if entering a physical site. Interactivity
is supported by means of two types of commands: communication commands allow
synchronous interaction, as with chat or instant messaging software; social
commands allow cooperation: group surfing, exchanging visit-cards, and waiting in line. Keywords: chat, virtual presence, web, web communities | |||
| An economic model of the worldwide web | | BIBAK | Full-Text | 934-935 | |
| George Kouroupas; Elias Koutsoupias; Christos H. Papadimitriou; Martha Sideri | |||
| We believe that much novel insight into the worldwide web can be obtained
from taking into account the important fact that it is created, used, and run
by selfish optimizing agents: users, document authors, and search engines.
On-going theoretical and experimental analysis of a simple abstract model of
www creation and search based on user utilities illustrates this point: We find
that efficiency is higher when the utilities are more clustered, and that
power-law statistics of document degrees emerge very naturally in this context.
More importantly, our work sets up many more elaborate questions, related,
e.g., to www search algorithms seen as author incentives, to search engine
spam, and to search engine quality and competition. Keywords: economic model, game theory, market, power laws, price of anarchy, utility
function, web search | |||
| Adaptive page ranking with neural networks | | BIBAK | Full-Text | 936-937 | |
| Franco Scarselli; Sweah Liang Yong; Markus Hagenbuchner; Ah Chung Tsoi | |||
| Recent developments in the area of neural networks provided new models which
are capable of processing general types of graph structures. Neural networks
are well-known for their generalization capabilities. This paper explores the
idea of applying a novel neural network model to a web graph to compute an
adaptive ranking of pages. Some early experimental results indicate that the
new neural network models generalize exceptionally well when trained on a
relatively small number of pages. Keywords: adaptive page rank, graph processing, neural networks | |||
| The WT10G dataset and the evolution of the web | | BIBAK | Full-Text | 938-939 | |
| Wei-Tsen Milly Chiang; Markus Hagenbuchner; Ah Chung Tsoi | |||
| The purpose of this paper is threefold. First, we study the evolution of the
web based on data available from an earlier snapshot of the web and compare
the results with those predicted in [2]. Secondly, we establish whether the
WT10G dataset, a popular benchmark for the development and evaluation of
Internet-based applications, is appropriate for these tasks. Finally, we ask
whether there is a need to collect a new dataset for such purposes. The
findings are that the appropriateness of using the popular WT10G dataset in
recent Internet-based experiments is questionable and that there is a need for
a newly collected dataset for the development and evaluation of algorithms
related to Internet search engine development. Keywords: rate of change, standard datasets, web evolution | |||
| A semantic-link-based infrastructure for web service discovery in P2P networks | | BIBAK | Full-Text | 940-941 | |
| Jie Liu; Hai Zhuge | |||
| An important issue arising from P2P applications is how to accurately and
efficiently retrieve the required Web services from large-scale repositories.
This paper addresses this issue by organizing services in an overlay that
combines the Semantic Service Link Network and the Chord P2P network. A
service request is first routed in the Chord according to the given service
operation names and keywords. Then, the same request is routed in the Semantic
Link Network according to the service link type and semantic matching.
Compared with previous P2P service discovery approaches, the proposed approach
has two advantages: (1) it produces more accurate and meaningful results when
searching for particular services in a P2P network; and (2) it enables users
and peers to discover services in a more flexible way. Keywords: peer-to-peer, semantic link, web service | |||
| Automatic generation of link collections and their visualization | | BIBAK | Full-Text | 942-943 | |
| Osamu Segawa; Jun Kawai; Kazuyuki Sakauchi | |||
| In this paper, we describe a method of generating link collections in a
user-specified category by comprehensively collecting existing link collections
and analyzing their hyperlink references. Moreover, we propose a visualization
method for a bird's-eye view of the generated link collections. Our methods
are effective for intuitively grasping the trends of significant sites and
keywords in a category. Keywords: hyperlink analysis, link collection, visualization | |||
| Predictive ranking: a novel page ranking approach by estimating the web structure | | BIBAK | Full-Text | 944-945 | |
| Haixuan Yang; Irwin King; Michael R. Lyu | |||
| PageRank (PR) is one of the most popular ways to rank web pages. However, as
the Web continues to grow in volume, it is becoming more and more difficult to
crawl all the available pages. As a result, the page ranks computed by PR are
only based on a subset of the whole Web. This produces inaccurate outcomes
because of the inherently incomplete information (dangling pages) that exists
in the calculation. To overcome this incompleteness, we propose a new variant of
the PageRank algorithm called, Predictive Ranking (PreR), in which different
classes of dangling pages are analyzed individually so that the link structure
can be predicted more accurately. We detail our proposed steps. Furthermore,
experimental results show that this algorithm achieves encouraging results when
compared with previous methods. Keywords: PageRank, link analysis, predictive ranking | |||
| Webified video: media conversion from TV program to web content and their integrated viewing method | | BIBAK | Full-Text | 946-947 | |
| Hisashi Miyamori; Katsumi Tanaka | |||
| A method is proposed for viewing broadcast content that converts TV programs
into Web content and integrates the results with related information retrieved
using local and/or Internet content. Keywords: fusion of broadcast and web content, media conversion, metadata generation,
next-generation storage TV, scene search, topic segmentation | |||
| Personal TV viewing by using live chat as metadata | | BIBAK | Full-Text | 948-949 | |
| Hisashi Miyamori; Satoshi Nakamura; Katsumi Tanaka | |||
| We propose a new TV viewing method by personalizing TV programs with live
chat information on the Web. It enables a new way of viewing TV content from
different perspectives reflecting viewers' viewpoints. Keywords: digest, fusion of broadcast and web content, live chat, metadata generation,
semantic analysis, viewer, viewpoint | |||
| Accuracy enhancement of function-oriented web image classification | | BIBAK | Full-Text | 950-951 | |
| Koji Nakahira; Toshihiko Yamasaki; Kiyoharu Aizawa | |||
| We propose a function-oriented classification of web images and show new
applications using this categorization. We defined nine categories of images,
taking into account their functions in web pages, and classified web images by
using a Support Vector Machine (SVM) in a tree-structured way. In order to
achieve high classification accuracy, we employed two kinds of features,
image-based and text-based, which can be used together or separately at
different stages of the classification. We also utilized DCT coefficients to
distinguish photographic images from illustrations. As a result, accurate
classification has been achieved. Finally, we show page summarization as a
new application made feasible for the first time by our new categories of WWW
images. Keywords: classification, support vector machine, web images | |||
| Hera presentation generator | | BIBAK | Full-Text | 952-953 | |
| Flavius Frasincar; Geert-Jan Houben; Peter Barna | |||
| Semantic Web Information Systems (SWIS) are Web Information Systems that use
Semantic Web technologies. Hera is a model-driven design methodology for SWIS.
In Hera, models are represented in RDFS and model instances in RDF. The Hera
Presentation Generator (HPG) is an integrated development environment that
supports the presentation generation layer of the Hera methodology. The HPG is
based on a pipeline of data transformations driven by different Hera models. Keywords: RDF(S), SWIS, WIS, design environment, semantic web | |||
| Can link analysis tell us about web traffic? | | BIBAK | Full-Text | 954-955 | |
| Marcin Sydow | |||
| In this paper we measure the correlation between link-analysis
characteristics of Web pages, such as in- and out-degree, PageRank and RBS,
and measures obtained from real Web traffic analysis. Measurements on real
data from the Polish Web show that PageRank is observably, but not strongly,
correlated with actual visits made by Web users to Web pages, and that our RBS
algorithm [2] is more strongly correlated with traffic data than PageRank in some cases. Keywords: PageRank, RBS, link analysis, web traffic analysis | |||
| Analyzing web page headings considering various presentation | | BIBAK | Full-Text | 956-957 | |
| Yushin Tatsumi; Toshiyuki Asahi | |||
| Exploiting document structure can ease the usability problems that arise when
web pages designed for PCs are browsed on non-PC terminals. For example, by
extracting headings from the document structure and showing them selectively
within a display, users can easily grasp an overview of a page. In this paper,
as a basic part of document structure analysis, we propose a heading analysis
method for web pages that takes various presentation styles into account.
Evaluation experiments confirmed that our proposed method could extract many
headings that cannot be extracted using HTML element names alone. Keywords: content adaptation, heading analysis, web document analysis | |||
| Predicting navigation patterns on the mobile-internet using time of the week | | BIBAK | Full-Text | 958-959 | |
| Martin Halvey; Mark T. Keane; Barry Smyth | |||
| A predictive analysis of user navigation on the Internet is presented that
exploits time-of-the-week data. Specifically, we investigate time as an
environmental factor in making predictions about user navigation. We analyze a
large sample of user navigation data (over 3.7 million
sessions from 0.5 million users) in a mobile-Internet context to determine
whether user surfing patterns vary depending on the time of the week on which
they occur. We find that the use of time improves the predictive accuracy of
navigation models. Keywords: WAP, WWW, browsing, log file analysis, mobile, mobile-web, navigation,
prediction, user modeling | |||
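The abstract does not give the model's form; one plausible minimal realization of a time-conditioned navigation model is a first-order Markov predictor keyed on a time-of-week bucket. The bucket granularity and names below are assumptions for illustration.

```python
from collections import Counter, defaultdict

# transitions[(bucket, page)] counts which page followed `page` when it
# was visited in a given time-of-week bucket (e.g. "Mon-am", "Sat-pm").
transitions = defaultdict(Counter)

def observe(bucket, page, next_page):
    transitions[(bucket, page)][next_page] += 1

def predict(bucket, page):
    """Most likely next page given the current page and time bucket."""
    counts = transitions[(bucket, page)]
    return counts.most_common(1)[0][0] if counts else None
```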
| Finding group shilling in recommendation system | | BIBAK | Full-Text | 960-961 | |
| Xue-Feng Su; Hua-Jun Zeng; Zheng Chen | |||
| In the age of information explosion, recommender systems have proved
effective in coping with information overload in e-commerce. However,
unscrupulous producers shill such systems in many ways to make a profit,
rendering them imprecise and unreliable in the long term. Among the many
shilling behaviors, a new form of attack, called group shilling, has appeared
and does great harm to these systems. Because group shilling users are well
organized and hide among normal users, they are hard to find by traditional
methods. However, group shilling users are similar to one another to some
extent, since they all shill the same target items. We propose a similarity
spreading algorithm to find these group shilling users and protect the
recommender system from unfair ratings. Our algorithm tries to find these
cunning group shilling users by iteratively propagating similarities from items
to users. Experiments show that our similarity spreading algorithm improves
the precision of the system and provides it with reliable
protection. Keywords: collaborative filtering, group shilling, recommendation system | |||
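The abstract states only that similarities are propagated from items to users iteratively; the exact update rule is in the paper. A generic similarity-propagation scaffold in that spirit, with every detail below assumed for illustration, might look like:

```python
import numpy as np

def spread_similarity(R, iters=10):
    """Alternately propagate similarity between users and items.

    R is a (users x items) rating matrix. Profiles that concentrate
    their ratings on the same (target) items accumulate high mutual
    similarity, which can flag coordinated group-shilling accounts.
    """
    sim_items = np.eye(R.shape[1])
    for _ in range(iters):
        sim_users = R @ sim_items @ R.T        # user similarity via items
        sim_users /= max(np.abs(sim_users).max(), 1e-12)
        sim_items = R.T @ sim_users @ R        # item similarity via users
        sim_items /= max(np.abs(sim_items).max(), 1e-12)
    return sim_users
```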
| SLL: running my web services on your WS platforms | | BIBAK | Full-Text | 962-963 | |
| Donald Kossmann; Christian Reichel | |||
| Today, the choice of a particular programming language limits the
alternative products that can be used to deploy the program. For instance, a
Java program must be executed using a Java VM. This limitation is particularly
harmful for the emergence of new programming paradigms like SOA and Web
Services, because platforms for new, innovative programming languages are
typically not as stable and mature as the established platforms for traditional
programming paradigms. The purpose of this work is to break the strong ties
between programming languages and runtime environments and thus make it
possible to innovate at both ends independently. The specific focus is
on Web Services and Service-Oriented Architectures; focusing on this domain
makes it possible to achieve this goal with affordable effort. The key idea is
to introduce a Service Language Layer (SLL) which gives a high-level
abstraction of a service-oriented program and which can easily and efficiently
be executed on alternative Web Services platforms. Keywords: XML, XML-based service language, decoupling, service language layer,
transformation, web services | |||
| An agent system for ontology sharing on WWW | | BIBAK | Full-Text | 964-965 | |
| Kotaro Nakayama; Takahiro Hara; Shojiro Nishio | |||
| Semantic Web Services (SWS), a new-generation WWW technology, will
facilitate the automation of Web service tasks, including automated Web service
discovery, execution, composition and mediation, by using XML-based metadata
and ontologies. There have been several efforts to build knowledge
representation languages for Web Services. However, only a few attempts have so
far been made to develop applications based on SWS. In particular, front-end
agent systems for users are an urgent research area. The purpose of this paper is to
introduce our new integrated front-end agent system for ontology management and
SWS management. Keywords: agent technologies, ontology, semantic web, web services | |||
| Introducing multimodal character agents into existing web applications | | BIBAK | Full-Text | 966-967 | |
| Kimihito Ito | |||
| This paper proposes a framework in which end-users can instantaneously
modify existing Web applications by introducing a multimodal user interface.
The authors use the IntelligentPad architecture and MPML as the basis of the
framework. Example applications include character agents that read the latest
news on a news Web site. The framework does not require users to write any
program code or scripts to add a multimodal user interface to existing Web
applications. Keywords: IntelligentPad, MPML, multimodal user interface, web application | |||
| Interactive web-wrapper construction for extracting relational information from web documents | | BIBAK | Full-Text | 968-969 | |
| Tsuyoshi Sugibuchi; Yuzuru Tanaka | |||
| In this paper, we propose a new user interface for interactively specifying
Web wrappers that extract relational information from Web documents. In this
study, we focused on shortening the user's trial-and-error cycle when
constructing a wrapper. Our approach combines a light-weight wrapper
construction method with a dynamic previewing interface that quickly shows how
the generated wrapper works. We adopted a simple algorithm that can construct a
Web wrapper from given extraction examples in less than 100 milliseconds. Using
this algorithm, our system dynamically generates a new wrapper from the
stream of the user's mouse events specifying extraction examples, and
immediately updates a preview that shows how the generated wrapper
extracts HTML nodes from the source Web document. Through this animated
display, a user can try many wrapper constructions with different combinations
of extraction examples simply by moving the mouse over the Web document, and
quickly reach a good set of examples that yields the intended wrapper. Keywords: information extraction, user interfaces, web wrappers | |||
| Multispace information visualization framework for the intercomparison of data sets retrieved from web services | | BIBAK | Full-Text | 970-971 | |
| Masahiko Itoh; Yuzuru Tanaka | |||
| We introduce a new visualization framework for the intercomparison of more
than one data set retrieved from Web services. In our framework, we use
multiple visualization spaces simultaneously, each of which visualizes a single
data set retrieved from a Web service. For this purpose, we provide a new 3D
component for accessing Web services and a 3D space component in which the
data set retrieved from the Web service is visualized. Moreover, our
framework provides users with various operations applicable to these space
components, i.e., union, intersection, set-difference, cross-product,
selection, projection, and join. Keywords: IntelligentBox, WorldBottle, visualization, web service | |||
| On the feasibility of low-rank approximation for personalized PageRank | | BIBAK | Full-Text | 972-973 | |
| András A. Benczúr; Károly Csalogány; Tamás Sarlós | |||
| Personalized PageRank expresses backlink-based page quality around
user-selected pages in a similar way to PageRank over the entire Web.
Algorithms for computing personalized PageRank on the fly are either limited to
a restricted choice of page selection or believed to behave well only on
sparser regions of the Web. In this paper we show the feasibility of computing
personalized PageRank by a k < 1000 low-rank approximation of the PageRank
transition matrix; with our algorithm we may compute an approximate
personalized PageRank by multiplying an n x k matrix, a k x n matrix, and the
n-dimensional personalization vector. Since low-rank approximations are
accurate on dense regions, we hope that our technique will combine well with known algorithms. Keywords: link analysis, low-rank approximation, personalized PageRank, singular value
decomposition, web information retrieval | |||
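Once the rank-k factors are in hand, the computation named in the abstract reduces to two matrix-vector products. A minimal sketch, assuming U (n x k) and V (k x n) come from a rank-k approximation (e.g. a truncated SVD) of the matrix that maps personalization vectors to their personalized PageRank:

```python
import numpy as np

def approx_ppr(U, V, v):
    """Approximate personalized PageRank for personalization vector v,
    given a rank-k factorization: two matrix-vector products, O(nk)
    work instead of a full power iteration per personalization."""
    return U @ (V @ v)
```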
| An architecture for personal semantic web information retrieval system | | BIBAK | Full-Text | 974-975 | |
| Haibo Yu; Tsunenori Mine; Makoto Amamiya | |||
| The semantic Web and Web service technologies have provided both new
possibilities and challenges for automatic information processing. There has
been much research on applying these new technologies to current personal Web
information retrieval systems, but none addresses the semantic issues from a
whole-life-cycle and architectural point of view. Web services provide
a new way for accessing Web resources, but until now, they have been managed
separately from traditional Web contents resources. In this poster, we propose
a conceptual architecture for a personal semantic Web information retrieval
system. It incorporates semantic Web, Web services and multi-agent technologies
to enable not only precise location of Web resources but also the automatic or
semi-automatic integration of hybrid Web contents and Web services. Keywords: information retrieval system, semantic web, web portal, web services | |||
| TruRank: taking PageRank to the limit | | BIBAK | Full-Text | 976-977 | |
| Sebastiano Vigna | |||
| PageRank is defined as the stationary state of a Markov chain depending on a
damping factor α that spreads uniformly part of the rank. The choice of
α is eminently empirical, and in most cases the original suggestion
α=0.85 by Brin and Page is still used. It is a common belief that values of
α closer to 1 give a "truer to the web" PageRank, but a smaller α
accelerates convergence. Recently, however, it has been shown that when
α=1 all pages in the core component are very likely to have rank 0 [1].
This behaviour makes it difficult to understand PageRank when α≈1,
as it converges to a meaningless value for most pages. We propose a simple and
natural modification to the standard preprocessing performed on the adjacency
matrix of the graph, resulting in a ranking scheme we call TruRank. TruRank
ranks the web with principles almost identical to PageRank, but it gives
meaningful values also when α≈1. Keywords: PageRank, web graph | |||
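For reference, the standard PageRank fixed-point equation behind this discussion (a textbook formulation, not quoted from the paper; TruRank changes only how the adjacency matrix is preprocessed, not this general form):

```latex
% r: rank vector, P: row-stochastic transition matrix of the web graph,
% n: number of pages, \alpha: damping factor (commonly 0.85)
r \;=\; \alpha\, P^{\top} r \;+\; (1-\alpha)\,\tfrac{1}{n}\,\mathbf{1}
```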
| An information extraction engine for web discussion forums | | BIBAK | Full-Text | 978-979 | |
| Hanny Yulius Limanto; Nguyen Ngoc Giang; Vo Tan Trung; Jun Zhang; Qi He; Nguyen Quang Huy | |||
| In this poster, we present an information extraction engine for web-based
forums. The engine analyzes the HTML files crawled from web forums, deduces the
wrapper (template) of the pages and extracts the information about posts (e.g.,
author, title, content, number of replies and views, etc.). Extraction is an
important module for a forum search engine, since it helps to understand the
content of a forum HTML page and facilitates ranking during retrieval. We
discuss the system architecture of the extraction engine in the context of a
forum search engine and present the extraction engine's components. We also
briefly introduce the extraction process and discuss some implementation
issues. Keywords: discussion board, forums, information extraction, information retrieval,
search engine | |||
| Mining web site's topic hierarchy | | BIBAK | Full-Text | 980-981 | |
| Nan Liu; C. Yang | |||
| Searching and navigating a Web site is a tedious task, and hierarchical
models, such as site maps, are frequently used to organize a Web site's
content. In this work, we propose to model a Web site's content structure using
the topic hierarchy, a directed tree rooted at a Web site's homepage in which
the vertices and edges correspond to Web pages and hyperlinks. Our algorithm
for mining a Web site's topic hierarchy utilizes three types of information
associated with a Web site: link structure, directory structure and Web pages'
content. Keywords: content structure, topic hierarchy, web mining | |||
| Consistency checking of UML model diagrams using the XML semantics approach | | BIBAK | Full-Text | 982-983 | |
| Yasser Kotb; Takuya Katayama | |||
| A software design is often modeled as a collection of Unified Modeling
Language (UML) diagrams. Different aspects of the software system are covered
by different UML diagrams, which creates a substantial risk that the overall
specification of the system becomes inconsistent and incomplete. This makes it
necessary to check the consistency between related UML diagrams. In addition,
as the software system evolves, the diagrams are modified, which again leads
to possible inconsistency and incompleteness between different versions of
these diagrams. In this paper, we employ our previously proposed XML semantics
approach, which checks the semantic consistency of XML documents using
attribute grammar techniques, to check the consistency of UML diagrams. The
key idea is to translate the UML diagrams into equivalent XMI documents and
then check the consistency of these XMI documents, which are a special form of
XML, using our XML semantics approach. Keywords: UML, XMI, XML, attribute grammars, model checking | |||
| Delivering new web content reusing remote and heterogeneous sites. A DOM-based approach | | BIBAK | Full-Text | 984-985 | |
| Luis Álvarez Sabucedo; Luis Anido Rifón | |||
| This contribution addresses the development of new web sites that reuse
already existing contents from external sources. Unlike common links to other
resources, which retrieve the whole resource, we propose an approach in which
partial retrieval is possible: the unit of data reuse is a node in a DOM tree.
This solution permits the partial reuse of external and heterogeneous web
contents with no need for client (browser) modifications and just minor changes
for web servers. Keywords: DOM, HTTP, URL, content reuse, hypertext, interoperability, reusability, web
server | |||
| Multi-step media adaptation: implementation of a knowledge-based engine | | BIBAK | Full-Text | 986-987 | |
| Peter Soetens; Matthias De Geyter | |||
| Continuing changes in the domains of consumer devices and multimedia formats
demand a new approach to media adaptation. The publication of customized
content on a device requires an automatic adaptation engine that takes into
account the specifications of both the device and the material to be published.
These specifications can be expressed using a single domain ontology that
describes the concepts of the media adaptation domain. In this document, we
provide insight into the implementation of an adaptation engine that exploits
this domain knowledge. We explain how this engine, through the use of
description matching and Semantic Web Services, composes a chain of adaptation
services that adapts the original content to the needs of the target
device. Keywords: OWL, content adaptation, device independence, multimedia, semantic web,
services, standards | |||
| A clustering method for news articles retrieval system | | BIBAK | Full-Text | 988-989 | |
| Hiroyuki Toda; Ryoji Kataoka | |||
| Organizing the results of a search helps the user get an overview of the
information returned. We regard clustering as the task of creating labels for
a list of items; focusing on news articles, we propose a clustering method
that uses named entity extraction. Keywords: document clustering, named entity, search result organization | |||
| The language observatory project (LOP) | | BIBAK | Full-Text | 990-991 | |
| Yoshiki Mikami; Pavol Zavarsky; Mohd Zaidi Abd Rozan; Izumi Suzuki; Masayuki Takahashi; Tomohide Maki; Irwan Nizan Ayob; Paolo Boldi; Massimo Santini; Sebastiano Vigna | |||
| The first part of the paper provides a brief description of the Language
Observatory Project (LOP) and highlights the major technical difficulties to be
addressed. The latter part describes how we responded to these difficulties by
adopting UbiCrawler as the data collecting engine for the project. A close
collaboration between the two groups is producing quite satisfactory results. Keywords: character sets, language, language digital divide, language identification,
scripts, web crawler | |||
| Association search in semantic web: search + inference | | BIBAK | Full-Text | 992-993 | |
| Liang Bangyong; Tang Jie; Li Juanzi | |||
| Association search searches for certain instances on the Semantic Web and
then makes inferences from and about the instances found. In this paper, we
pose the problem of association search and give our preliminary solution using
Bayesian networks. We first define association search and its categorization
in detail, and then define the tasks in association search. We take the
ontology taxonomy as the network structure of the Bayesian network and use the
query log of instances to estimate the network parameters. After the Bayesian
network is constructed, we give the solution for association
search in the network. Keywords: Bayesian network, inference, knowledge management, ontology | |||
| XHTML meta data profiles | | BIBAK | Full-Text | 994-995 | |
| Tantek Çelik; Eric A. Meyer; Matthew Mullenweg | |||
| In this paper, we describe XHTML Meta Data Profiles (XMDP), which use XHTML
to define a simple profile format that is both human- and machine-readable.
XMDP can be used to extend XHTML by defining new link relationships, meta data
properties/values, and class name semantics. XMDP has already been used to
extend semantic XHTML to represent social networks, document licensing, voting,
and tagging. Keywords: HTML, WWW, XFN, XHTML, XMDP, class names, link relationships, lowercase
semantic web, meta data, microformats, profiles, reuse, schema, world wide web | |||
| An adaptive middleware infrastructure for mobile computing | | BIBAK | Full-Text | 996-997 | |
| Ronnie Cheung | |||
| In a mobile environment, where mobile applications suffer from limitations
and variations in system resource availability, it is desirable for
applications to adapt their behavior accordingly. It is also necessary to
achieve optimal application performance. However, adaptation performed by
mobile applications themselves usually suffers from the problem of unfairness
to other applications; in contrast, adaptation by the operating system focuses
on overall system performance while neglecting the needs of individual
applications. Hence, the adaptation task is best coordinated by a middleware
that can cater to each application's needs on a fair basis while maintaining
optimal system performance. This is
achieved by a context-aware mobile middleware that sits in between the mobile
application and the operating environment. Keywords: adaptation, middleware infrastructure, mobile environments | |||
| Data versioning techniques for internet transaction management | | BIBAK | Full-Text | 998-999 | |
| Ramkrishna Chatterjee; Gopalan Arun | |||
| An Internet transaction is a transaction that involves communication over
the Internet using standard Internet protocols such as HTTPS. Such transactions
are widely used in Internet-based applications such as e-commerce. With the
growth of the Internet, the volume and complexity of Internet transactions are
rapidly increasing. We present data versioning techniques that can reduce the
complexity of managing Internet transactions and improve their scalability and
reliability. These techniques have been implemented using standard database
technology, without any changes to the database kernel. Our initial empirical
results argue for the effectiveness of these techniques in practice. Keywords: internet transaction, scalability, versioning | |||
| Using visual cues for extraction of tabular data from arbitrary HTML documents | | BIBAK | Full-Text | 1000-1001 | |
| Bernhard Krüpl; Marcus Herzog; Wolfgang Gatterbauer | |||
| We describe a method to extract tabular data from web pages. Rather than
just analyzing the DOM tree, we also exploit visual cues in the rendered
version of the document to extract data from tables which are not explicitly
marked with an HTML table element. To detect tables, we rely on a variant of
the well-known X-Y cut algorithm as used in the OCR community. We implemented
the system by directly accessing Mozilla's box model that contains the
positional data for all HTML elements of a given web page. Keywords: table detection, visual analysis, web information extraction | |||
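A simplified sketch of the X-Y cut idea the paper builds on: recursively split the set of rendered bounding boxes at whitespace gaps, alternating cut direction. The paper's variant works on Mozilla's box model; the gap threshold and representation here are assumptions.

```python
def xy_cut(boxes, axis=0, min_gap=10):
    """Recursively partition (x0, y0, x1, y1) boxes at whitespace gaps.

    Leaves of the recursion are candidate blocks / table regions.
    """
    if len(boxes) <= 1:
        return [boxes]
    lo, hi = (0, 2) if axis == 0 else (1, 3)
    boxes = sorted(boxes, key=lambda b: b[lo])
    groups, current, reach = [], [boxes[0]], boxes[0][hi]
    for b in boxes[1:]:
        if b[lo] - reach >= min_gap:   # whitespace gap: cut here
            groups.append(current)
            current = [b]
        else:
            current.append(b)
        reach = max(reach, b[hi])
    groups.append(current)
    if len(groups) == 1:               # no gap along this axis
        return [boxes] if axis == 1 else xy_cut(boxes, 1, min_gap)
    out = []
    for g in groups:                   # alternate the cut direction
        out.extend(xy_cut(g, 1 - axis, min_gap))
    return out
```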
| Describing namespaces with GRDDL | | BIBAK | Full-Text | 1002-1003 | |
| Erik Wilde | |||
| Describing XML Namespaces is an open issue for many users of XML
technologies, and even though namespaces are one of the foundations of XML,
there is no generally accepted and widely used format for namespace
descriptions. We present a framework for describing namespaces based on GRDDL
using a controlled vocabulary. Using this framework, namespace descriptions
can be easily generated, harvested, and published in human- or machine-readable
form. Keywords: languages, management | |||
| Building an open source meta-search engine | | BIBAK | Full-Text | 1004-1005 | |
| A. Gulli; A. Signorini | |||
| In this short paper we introduce Helios, a flexible and efficient open
source meta-search engine. Helios currently runs on top of 18 search engines
(in the Web, Books, News, and Academic publication domains), but additional
search engines can easily be plugged in. We also report some performance
measurements made during its development. Keywords: meta search engines, open source | |||
| Design and implementation of a feedback controller for slowdown differentiation on internet servers | | BIBAK | Full-Text | 1006-1007 | |
| Jianbin Wei; Cheng-Zhong Xu | |||
| Proportional slowdown differentiation (PSD) aims to maintain slowdown ratios
between different classes of clients according to their pre-specified
differentiation parameters. In this paper, we design a feedback controller to
allocate processing rate on Internet servers for PSD. In this approach, the
processing rate of a class is adjusted by an integral feedback controller
according to the difference between the target slowdown ratio and the achieved
one. The initial rate of a class is estimated from the predicted workload using
queueing theory. We implement the feedback controller in an Apache Web server.
The experimental results under various environments demonstrate the
controller's effectiveness and robustness. Keywords: feedback control, quality of service, slowdown | |||
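One step of an integral controller of the kind described can be sketched as follows; the gains, the reference class, and the queueing-theoretic initialization of rates are given in the paper, so everything below is an illustrative assumption.

```python
def integral_control_step(classes, Ki=0.1):
    """Adjust per-class processing rates toward target slowdown ratios.

    Each class dict carries: 'rate' (allocated processing rate),
    'slowdown' (measured slowdown), 'weight' (differentiation
    parameter). classes[0] serves as the reference class.
    """
    ref = classes[0]
    for c in classes[1:]:
        target = c['weight'] / ref['weight']          # desired ratio
        achieved = c['slowdown'] / ref['slowdown']    # measured ratio
        error = target - achieved
        # Integral action: accumulate the error into the rate setting.
        # A class that is too slow (achieved > target) gets more rate.
        c['rate'] = max(c['rate'] - Ki * error, 0.0)
    return classes
```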
| MiSpider: a continuous agent on web pages | | BIBAK | Full-Text | 1008-1009 | |
| Yujiro Fukagaya; Tadachika Ozono; Takayuki Ito; Toramatsu Shintani | |||
| In this paper, we propose a Web-based agent system called MiSpider, which
provides intelligent web services on web browsers. MiSpider enables users to
use agents in existing browsers anywhere in the world with nothing more than
Internet access. MiSpider agents are persistent: an agent's state does not
change when the user moves to another page. Moreover, agents can communicate
with one another via message passing. Keywords: browsing support, information system, multiagent system | |||
| Automatically learning document taxonomies for hierarchical classification | | BIBAK | Full-Text | 1010-1011 | |
| Kunal Punera; Suju Rajan; Joydeep Ghosh | |||
| While several hierarchical classification methods have been applied to web
content, such techniques invariably rely on a pre-defined taxonomy of
documents. We propose a new technique that extracts a suitable hierarchical
structure automatically from a corpus of labeled documents. We show that our
technique groups similar classes closer together in the tree and discovers
relationships among documents that are not encoded in the class labels. The
learned taxonomy is then used along with binary SVMs for multi-class
classification. We demonstrate the efficacy of our approach by testing it on
the 20-Newsgroup dataset. Keywords: automatic taxonomy learning, hierarchical classification | |||
| Web page marker: a web browsing support system based on marking and anchoring | | BIBAK | Full-Text | 1012-1013 | |
| Takahiro Koga; Noriharu Tashiro; Tadachika Ozono; Takayuki Ito; Toramatsu Shintani | |||
| In this paper, we propose a web browsing support system, called WPM, which
provides marking and anchoring functions in ordinary web browsers. WPM users
can mark words and phrases on web pages from their browsers, without the extra
plug-ins that similar systems require, and can anchor words to refer to them
later. WPM makes it possible to mark up an existing Web page just as one would
mark up paper. By partially changing the character decoration, the marked text
is emphasized, which improves readability. WPM is implemented using a proxy
agent, so the system can be used in everyday browsing without the user being
conscious of it. Keywords: browsing support, marking, proxy agent | |||
| An approach for realizing privacy-preserving web-based services | | BIBK | Full-Text | 1014-1015 | |
| Wei Xu; R. Sekar; I. V. Ramakrishnan; V. N. Venkatakrishnan | |||
Keywords: information flow, privacy, web service | |||
| Exploiting the web for point-in-time file sharing | | BIBAK | Full-Text | 1016-1017 | |
| Roberto J. Bayardo; Sebastian Thomschke | |||
| We describe a simple approach to "point-in-time" file sharing based on
time-expiring web links and personal webservers. This approach to file sharing
is useful in environments where instant messaging clients are varied and don't
necessarily support (compatible) file transfer protocols. We discuss the
features of such an approach along with a successfully deployed implementation
now in wide use throughout the IBM Corporation. Keywords: file sharing, instant messaging, personal web server | |||
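A time-expiring link can be realized by embedding an expiry timestamp and a keyed MAC over it in the URL, so the link can be checked statelessly and cannot be extended or forged. The paper's exact URL scheme is not given; this is a minimal sketch under that assumption.

```python
import hashlib, hmac, time

SECRET = b"server-side secret"   # known only to the personal webserver

def make_link(path, ttl_seconds=3600):
    """Return a URL that stops working after ttl_seconds."""
    expires = int(time.time()) + ttl_seconds
    sig = hmac.new(SECRET, f"{path}|{expires}".encode(), hashlib.sha256)
    return f"{path}?expires={expires}&sig={sig.hexdigest()}"

def check_link(path, expires, sig):
    """Verify the signature and that the link has not expired."""
    good = hmac.new(SECRET, f"{path}|{expires}".encode(),
                    hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, sig) and int(expires) > time.time()
```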
| Using OWL for querying an XML/RDF syntax | | BIBAK | Full-Text | 1018-1019 | |
| Rubén Tous; Jaime Delgado | |||
| Some recent initiatives try to profit from RDF to make XML documents
interoperate at the semantic level. Ontologies are used to establish semantic
connections among XML languages, and some mechanisms have been defined to query
them with native XML query languages like XPath and XQuery. Structure-mapping
approaches generally define a simple translation between trivial XPath
expressions and an RDF query language like RDQL; however, some XPath
constructs cannot be covered by a structure-mapping strategy. In contrast, our
work takes the model-mapping approach, which respects node order and allows
mapping of all XPath axes. The resulting XPath implementation is schema-aware
and IDREF-aware, so it can be used to exploit inheritance
hierarchies defined in one or more XML schemas. Keywords: RDF, XML, XPath, idref-awareness, interoperability, ontologies,
schema-awareness, semantic integration | |||
| Signing individual fragments of an RDF graph | | BIBAK | Full-Text | 1020-1021 | |
| Giovanni Tummarello; Christian Morbidoni; Paolo Puliti; Francesco Piazza | |||
| Being able to determine the provenance of statements is a fundamental step
in any Semantic Web trust modeling. We propose a methodology for signing small
groups of RDF statements. Groups of statements signed with this methodology can
be safely inserted into any existing triple store without loss of provenance
information, since only standard RDF semantics and constructs are used. The
methodology has been implemented and is available as an open source library as
well as deployed in a Semantic Web P2P project. Keywords: RDF, digital signature, semantic web, trust | |||
| Hybrid semantic tagging for information extraction | | BIBAK | Full-Text | 1022-1023 | |
| Ronen Feldman; Benjamin Rosenfeld; Moshe Fresko; Brian D. Davison | |||
| The semantic web is expected to have an impact at least as big as that of
the existing HTML-based web, if not greater. However, the challenge lies in
creating this semantic web and in converting existing web information into the
semantic paradigm. One of the core technologies that can help in the migration
process is automatic markup: the semantic markup of content, providing
semantic tags that describe the raw content. This paper describes a hybrid
statistical and knowledge-based information extraction model, able to extract
entities and relations at the sentence level. The model attempts to retain and
improve the high accuracy levels of knowledge-based systems while drastically
reducing the amount of manual labor by relying on statistics drawn from a
training corpus. The implementation of the model, called TEG (Trainable
Extraction Grammar), can be adapted to any IE domain by writing a suitable set
of rules in a SCFG (Stochastic Context Free Grammar) based extraction language,
and training them using an annotated corpus. The experiments show that our
hybrid approach outperforms both purely statistical and purely knowledge-based
systems, while requiring orders of magnitude less manual rule writing and a
smaller amount of training data. We also demonstrate the robustness of our
system under conditions of poor training data quality. This makes the system
very suitable for converting legacy web pages to semantic web pages. Keywords: HMM, information extraction, rules based systems, semantic web, text mining | |||
| GalaTex: a conformant implementation of the XQuery full-text language | | BIBAK | Full-Text | 1024-1025 | |
| Emiran Curtmola; Sihem Amer-Yahia; Philip Brown; Mary Fernández | |||
| We describe GalaTex, the first complete implementation of XQuery Full-Text,
a W3C specification that extends XPath 2.0 and XQuery 1.0 with full-text
search. XQuery Full-Text provides composable full-text search primitives such
as keyword search, Boolean queries, and keyword-distance predicates. GalaTex is
intended to serve as a reference implementation for XQuery Full-Text and as a
platform for addressing new research problems such as scoring full-text query
results, optimizing XML queries over both structure and text, and evaluating
top-k queries on scored results. GalaTex is an all-XQuery implementation
initially focused on completeness and conformance rather than on efficiency. We
describe its implementation on top of Galax, a complete XQuery implementation. Keywords: XQuery, conformant prototype, full-text | |||
| Guidelines for developing trust in health websites | | BIBAK | Full-Text | 1026-1027 | |
| E. Sillence; P. Briggs; L. Fishwick; P. Harris | |||
| How do people decide which health websites to trust and which to reject?
Thirteen participants, all diagnosed with hypertension, were invited to search
for information and advice relating to hypertension. Participants took part in
a four-week study engaging in both free and directed web searches. A content
analysis of the group discussions revealed support for a staged model of trust
in which mistrust or rejection of websites is based on design factors and trust
or selection of websites is based on content factors such as source credibility
and personalization. A number of guidelines for developing trust in health
websites are proposed. Keywords: computer mediated communication, credibility, health, internet, social
identity, trust | |||
| Efficient structural joins with on-the-fly indexing | | BIBAK | Full-Text | 1028-1029 | |
| Kun-Lung Wu; Shyh-Kwei Chen; Philip S. Yu | |||
| Previous work on structural joins mostly focuses on maintaining offline
indexes on disk, and most approaches also require the elements in both sets to
be sorted. In this paper, we study an on-the-fly, in-memory indexing approach to
structural joins. There is no need to sort the elements or maintain indexes on
disks. We identify the similarity between the structural join problem and the
stabbing query problem, and extend a main memory-based indexing technique for
stabbing queries to structural joins. Keywords: XML, containment queries, structural joins | |||
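For orientation, the semantics of an ancestor/descendant structural join under region encoding: each element carries a (start, end) interval from a document-order numbering, and a is an ancestor of d exactly when a's interval contains d's. The naive sort-based baseline below only illustrates what the join computes; the paper's contribution is precisely to avoid the sorting by using an in-memory stabbing-query index instead.

```python
import bisect

def structural_join(ancestors, descendants):
    """All (a, d) pairs where interval a = (start, end) contains d."""
    ancestors = sorted(ancestors)               # sort by start position
    starts = [a[0] for a in ancestors]
    result = []
    for d in descendants:
        i = bisect.bisect_left(starts, d[0])    # ancestors starting before d
        for a in ancestors[:i]:
            if d[1] <= a[1]:                    # d nested inside a
                result.append((a, d))
    return result
```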
| Processing link structures and linkbases on the web | | BIBAK | Full-Text | 1030-1031 | |
| François Bry; Michael Eckert | |||
| Hyperlinks are an essential feature of the World Wide Web, highly
responsible for its success. XLink improves on HTML's linking capabilities in
several ways. In particular, XLink links can be "out-of-line" (i.e., not
defined at a link source) and collected in (possibly several) linkbases, which
considerably eases building complex link structures.
Modeling of link structures and processing of linkbases under the Web's "open world linking" are aspects neglected by XLink. Adding a notion of "interface" to XLink, as suggested in this work, considerably improves the modeling of link structures. When a link structure is traversed, the choice of the relevant linkbase(s) may become ambiguous. We suggest three linkbase management modes, governing the binding of a linkbase to a document, to resolve this ambiguity. Keywords: XLink, hyperlink, link modeling and processing, linkbase | |||
| A comprehensive comparative study on term weighting schemes for text categorization with support vector machines | | BIBAK | Full-Text | 1032-1033 | |
| Man Lan; Chew-Lim Tan; Hwee-Boon Low; Sam-Yuan Sung | |||
| Term weighting, which converts documents into vectors in the term space, is
a vital step in automatic text categorization. In this paper, we conducted
comprehensive experiments to compare various term weighting schemes with SVM
on two widely-used benchmark data sets. We also present a new term weighting
scheme, tf-rf, which improves the term's discriminating power. The controlled
experimental results show that this newly proposed tf-rf scheme is
significantly better than other widely-used term weighting schemes. Compared
with schemes based on the tf factor alone, the idf factor does not improve,
and may even decrease, the term's discriminating power for
text categorization. Keywords: SVM, categorization, term weighting schemes, text | |||
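The abstract does not spell out the tf-rf formula; in the authors' published description of the scheme, the relevance frequency of a term depends on how many positive versus negative training documents contain it. A sketch under that reading, with parameter names assumed:

```python
import math

def tf_rf(tf, pos_docs_with_term, neg_docs_with_term):
    """tf-rf weight: term frequency scaled by relevance frequency.

    The rf factor grows when the term occurs mostly in positive-category
    training documents, boosting its discriminating power.
    """
    rf = math.log2(2 + pos_docs_with_term / max(1, neg_docs_with_term))
    return tf * rf
```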
| A model for short-term content adaptation | | BIBAK | Full-Text | 1034-1035 | |
| Marco Benini; Alberto Trombetta; Michela Acquaviva | |||
| This paper proposes a model for short-term content adaptation whose aim is
to satisfy the contingent needs of users by adjusting the information a
web-application provides on the basis of a short-term user profile. The
mathematical model results in the design of an adaptive filter that profiles
users by observing their queries to the application and that adjusts the
answers of the application according to the inferred user needs. Also, the
mathematical model ensures the correctness of the filter, that is, the filter
is guaranteed to exhibit a coherent short-term adaptive behaviour. Keywords: information filtering, user modelling | |||
| Semantic virtual environments | | BIBAK | Full-Text | 1036-1037 | |
| Karsten A. Otto | |||
| Today's Virtual Environment (VE) systems share a number of issues with the
HTML-based World Wide Web. Their content is usually designed for presentation
to humans, and thus is not suitable for machine access. This is complicated by
the large number of different data models and network protocols in use.
Accordingly, it is difficult to develop VE software, such as agents, services,
and tools.
In this paper we adapt the Semantic Web idea to the field of virtual environments. Using the Resource Description Framework (RDF) we establish a machine-understandable abstraction of existing VE systems -- the Semantic Virtual Environments (SVE). On this basis it is possible to develop system-independent software, which can even operate across VE system boundaries. Keywords: components, framework, integration, semantic web, virtual environments | |||
| Verify Feature Models using Protégé OWL | | BIBAK | Full-Text | 1038-1039 | |
| Hai Wang; Yuan Fang Li; Jing Sun; Hongyu Zhang | |||
| Feature models are widely used in domain engineering to capture common and
variant features among systems in a particular domain. However, the lack of a
widely-adopted means of precisely representing and formally verifying feature
models has hindered the development of this area. This paper presents an
approach to modeling and verifying feature diagrams using Semantic Web
ontologies. Keywords: OWL, feature modeling, ontologies, semantic web | |||
| Multiple strategies detection in ontology mapping | | BIBAK | Full-Text | 1040-1041 | |
| Jie Tang; Yong Liang; Zi Li | |||
| Ontology mapping is the task of finding semantic relationships between
entities (i.e. concepts, attributes and relations) of two ontologies. In the
existing literature, many (semi-)automatic approaches that combine several
mapping strategies (namely multi-strategy mapping) have attracted considerable
interest. However, experiments show that multi-strategy mapping does not
always outperform its single-strategy counterpart. We mainly consider the
following questions: For a new, unseen mapping task, should one use a
multi-strategy or a single-strategy? And if the task is suitable for
multi-strategy, then which strategies should be selected in the combined
scenario? This paper proposes an approach to multiple-strategy detection for
ontology mapping. The results obtained so far show that multi-strategy
detection significantly improves both precision and recall. Keywords: multi-strategy detection, ontology mapping, semantic web | |||
| A study on combination of block importance and relevance to estimate page relevance | | BIBAK | Full-Text | 1042-1043 | |
| Shen Huang; Yong Yu; Shengping Li; Gui-Rong Xue; Lei Zhang | |||
| Previous work has shown that segmenting web pages into "semantically
independent" blocks can help to improve whole-page retrieval. One key and
unexplored issue is how to combine block importance with relevance to a given
query. In this poster, we first propose an automatic way to measure block
importance to improve retrieval. User information need is then taken into
account to refine block importance for different users. Keywords: block importance, block relevance, information need, iterative combination | |||
| Towards autonomic web-sites based on learning automata | | BIBAK | Full-Text | 1044-1045 | |
| Pradeep S; Chitra Ramachandran; Srinath Srinivasa | |||
| Autonomics or self-reorganization becomes pertinent for web-sites serving a
large number of users with highly varying workloads. An important component of
self-adaptation is to model the behaviour of users and adapt accordingly. This
paper proposes a learning-automata based technique for model discovery. User
access patterns are used to construct an FSM model of user behaviour that in
turn is used for prediction and prefetching. The proposed technique uses a
generalization algorithm to classify behaviour patterns into a small number of
generalized classes. It has been tested on both synthetic and live data-sets
and has shown a prediction hit-rate of up to 89% on a real web-site. Keywords: autonomic website, generalization, learning automata | |||
| On business activity modeling using grammars | | BIBAK | Full-Text | 1046-1047 | |
| Savitha Srinivasan; Arnon Amir; Prasad Deshpande; Vladimir Zbarsky | |||
| Web-based applications offer a mainstream channel for businesses to manage
their activities. We model such business activity in a grammar-based framework.
Backus-Naur form notation is used to represent the syntax of a regular
grammar corresponding to Web log patterns of interest. Then, a deterministic
finite state machine is used to parse Web logs against the grammar. Detected
tasks are associated with metadata such as time taken to perform the activity,
and aggregated along relevant corporate dimensions. Keywords: data mining, web log analysis | |||
| Soundness proof of Z semantics of OWL using institutions | | BIBAK | Full-Text | 1048-1049 | |
| Dorel Lucanu; Yuan Fang Li; Jin Song Dong | |||
| The correctness of the Z semantics of OWL is the theoretical foundation of
using software engineering techniques to verify Web ontologies. As OWL and Z
are based on different logical systems, we use institutions to represent their
underlying logical systems and use institution morphisms to prove the
correctness of the Z semantics for OWL DL. Keywords: OWL, Z, comorphism of institutions, institution | |||
| An analysis of search engine switching behavior using click streams | | BIBAK | Full-Text | 1050-1051 | |
| Yun-Fang Juan; Chi-Chao Chang | |||
| In this paper, we propose a simple framework to characterize the switching
behavior between search engines based on click streams. We segment users into a
number of categories based on their search engine usage during two adjacent
time periods and construct the transition probability matrix across these usage
categories. The principal eigenvector of the transposed transition probability
matrix represents the limiting probabilities, which are proportions of users in
each usage category at steady state. We experiment with this framework using
click streams focusing on two search engines: one with a large market share and
the other with a small market share. The results offer interesting insights
into search engine switching. The limiting probabilities provide empirical
evidence that small engines can still retain their fair share of users over time. Keywords: Markov chain, clustering, limiting probabilities, principal eigenvectors,
probability matrix, search engines, sequence, session, switching behavior,
transition | |||
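The limiting probabilities described in the abstract can be computed by power iteration on the transposed transition matrix; a minimal sketch, with the usage categories assumed to be indexed 0..n-1:

```python
import numpy as np

def limiting_probabilities(P, iters=1000):
    """Steady-state share of users per usage category.

    P[i, j] is the probability of moving from category i to category j
    between adjacent time periods. The principal eigenvector of P^T,
    found here by power iteration, gives the limiting probabilities.
    """
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = P.T @ pi
        pi /= pi.sum()     # renormalize to a probability vector
    return pi
```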
| Comparing relevance feedback algorithms for web search | | BIBAK | Full-Text | 1052-1053 | |
| Vishwa Vinay; Ken Wood; Natasa Milic-Frayling; Ingemar J. Cox | |||
| We evaluate three different relevance feedback (RF) algorithms, Rocchio,
Robertson/Sparck-Jones (RSJ), and Bayesian, in the context of Web search. We use
a target-testing experimental procedure whereby a user must locate a specific
document. For user relevance feedback, we consider all possible user choices of
indicating zero or more relevant documents from a set of 10 displayed
documents. Examination of the effects of each user choice permits us to compute
an upper-bound on the performance of each RF algorithm.
We find that there is significant variation in the upper-bound performance of the three RF algorithms and that the Bayesian algorithm approaches the best possible. Keywords: evaluation, relevance feedback, web search | |||
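As background for one of the three algorithms compared, the classic Rocchio update moves the query vector toward the mean of the relevant documents and away from the mean of the non-relevant ones. The coefficient values shown are conventional defaults, not the paper's settings:

```python
import numpy as np

def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio relevance-feedback step over term-weight vectors."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(np.asarray(nonrelevant, dtype=float), axis=0)
    return np.maximum(q, 0.0)   # negative weights are commonly clipped
```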
| SAT-MOD: moderate itemset fittest for text classification | | BIBAK | Full-Text | 1054-1055 | |
| Jianlin Feng; Huijun Liu; Jing Zou | |||
| In this paper, we present a novel association-based method called SAT-MOD
for text classification. SAT-MOD views a sentence rather than a document as a
transaction, and uses a novel heuristic called MODFIT to select the most
significant itemsets for constructing a category classifier. On the Reuters
corpus, SAT-MOD has been demonstrated to be comparable in effectiveness to
well-known alternatives such as LinearSVM, and much better than current
document-level word-association based methods. Keywords: MODFIT (moderate itemset fittest) heuristic, text classification | |||
| Applying NavOptim to minimise navigational effort | | BIBAK | Full-Text | 1056-1057 | |
| David Lowe; Xiaoying Kong | |||
| A major factor in the effectiveness of users' interaction with Web
applications is the ease with which they can locate the information and
functionality they are seeking. Effective design is, however, complicated by
the multiple design purposes and diverse users that Web applications typically
support. In this paper we describe a navigational design method aimed at
optimising designs through minimising navigational entropy. The approach
at optimising designs through minimizing navigational entropy. The approach
uses a theoretical navigational depth for the various information and service
components to moderate a nested hierarchical clustering of the content. Keywords: design, efforts metrics, navigation architecture | |||
| Building reactive web applications | | BIBAK | Full-Text | 1058-1059 | |
| Federico M. Facca; Stefano Ceri; Jacopo Armani; Vera Demaldé | |||
| The Adaptive Web is a new research area addressing the personalization of
the Web experience for each user. In this paper we propose a new high-level
model for the specification of Web applications that takes into account the
way users interact with the application in order to supply appropriate contents
or gather profile data. We consider entire processes (rather than single
properties) as the smallest information units, allowing for automatic
restructuring of application components. For this purpose, a high-level
Event-Condition-Action (ECA) paradigm is proposed, which enables capturing
arbitrary (and timed) clicking behaviors. Keywords: adaptive web, design method, eca rule, user modeling | |||
| Detection of phishing webpages based on visual similarity | | BIBAK | Full-Text | 1060-1061 | |
| Liu Wenyin; Guanglin Huang; Liu Xiaoyue; Zhang Min; Xiaotie Deng | |||
| An approach to detection of phishing webpages based on visual similarity is
proposed, which can be utilized as a part of an enterprise solution for
anti-phishing. A legitimate webpage owner can use this approach to search the
Web for suspicious webpages which are visually similar to the true webpage. A
webpage is reported as a phishing suspect if the visual similarity is higher
than its corresponding preset threshold. Preliminary experiments show that the
approach can successfully detect phishing webpages in online use. Keywords: anti-phishing, information filtering, visual similarity, web document
analysis | |||
| Modeling the author bias between two on-line computer science citation databases | | BIBAK | Full-Text | 1062-1063 | |
| Vaclav Petricek; Ingemar J. Cox; Hui Han; Isaac G. Councill; C. Lee Giles | |||
| We examine the differences and similarities between two on-line computer
science citation databases, DBLP and CiteSeer. The database entries in DBLP are
inserted manually while the CiteSeer entries are obtained autonomously. We show
that the CiteSeer database contains considerably fewer single author papers.
This bias can be modeled by an exponential process with intuitive explanation.
The model permits us to predict that the DBLP database covers approximately 30%
of the entire literature of Computer Science. Keywords: DBLP, acquisition bias, bibliometrics, citeSeer | |||
| Hubble: an advanced dynamic folder system for XML | | BIBAK | Full-Text | 1064-1065 | |
| Ning Li; Joshua Hui; Hui-I Hsiao; Kevin Beyer | |||
| Organizing large document collections for finding information easily and
quickly has always been an important user requirement. This paper describes a
flexible and powerful dynamic folder technology, called Hubble, which exploits
XML semantics to precisely categorize XML documents into categories or folders. Keywords: XML, categorization, content navigation, dynamic folder | |||
| Support for arbitrary regions in XSL-FO: a proposal for extending XSL-FO semantics and processing model | | BIBAK | Full-Text | 1066-1067 | |
| Ana Cristina B. da Silva; Joao B. S. de Oliveira; Fernando T. M. Mano; Thiago B. Silva; Leonardo L. Meirelles; Felipe R. Meneguzzi; Fabio Giannetti | |||
| This paper proposes an extension of the XSL-FO standard which allows the
specification of an unlimited number of arbitrarily shaped page regions. These
extensions are built on top of XSL-FO 1.1 to enable flow content to be laid out
into arbitrary shapes and allowing for page layouts currently available only to
desktop publishing software. Such a proposal is expected to leverage XSL-FO
towards usage as an enabling technology in the generation of content intended
for personalized printing. Keywords: LaTeX, SVG, XML, XSL-FO, digital printing | |||
| Improved timing control for web server systems using internal state information | | BIBAK | Full-Text | 1068-1069 | |
| Xue Liu; Rong Zheng; Jin Heo; Lui Sha | |||
| How to effectively allocate system resource to meet the Service Level
Agreement (SLA) of Web servers is a challenging problem. In this paper, we
propose an improved scheme for autonomous timing performance control in Web
servers under highly dynamic traffic loads. We devise a novel delay regulation
technique called Queue Length Model Based Feedback Control, which utilizes the
server's internal state information to reduce response time variance in the
presence of bursty traffic. Both simulation and experimental studies using synthesized
workloads and real-world Web traces demonstrate the effectiveness of the
proposed approach. Keywords: SLA, control theory, feedback, queueing model, web server | |||
| Service discovery and measurement based on DAML-QoS ontology | | BIBAK | Full-Text | 1070-1071 | |
| Chen Zhou; Liang-Tien Chia; Bu-Sung Lee | |||
| As more and more Web services are deployed, Web service discovery
mechanisms become essential. Similar services can have quite different QoS
behaviors. For service selection and management purposes, it is necessary to
clearly specify QoS constraints and metrics definitions for Web services. We
investigate semantic QoS specification and introduce our design principles for
it. Based on specification refinement and conformance, we introduce a QoS
matchmaking algorithm with multiple matching degrees. A matchmaking prototype
is designed to prove feasibility. Well-defined metrics can be further utilized
by measurement organizations to monitor and
evaluate the promised service level objectives. Keywords: QoS, matchmaking, semantic web, web service discovery | |||
| Boosting SVM classifiers by ensemble | | BIBAK | Full-Text | 1072-1073 | |
| Yan-Shi Dong; Ke-Song Han | |||
| Support vector machines (SVM) currently achieve state-of-the-art
performance on text classification (TC) tasks. Due to the complexity of TC
problems, it is a challenge to systematically develop classifiers with better
performance. We attack this problem with ensemble methods, which are often
used to boost weak classifiers such as decision trees and neural networks;
whether they are effective for strong classifiers is not yet clear. Keywords: classifier design and evaluation, information filtering, machine learning,
neural nets, text processing | |||
| Adaptive query routing in peer web search | | BIBAK | Full-Text | 1074-1075 | |
| Le-Shin Wu; Ruj Akavipat; Filippo Menczer | |||
| An unstructured peer network application was proposed to address the query
forwarding problem of distributed search engines and scalability limitations of
centralized search engines. Here we present novel techniques to improve local
adaptive routing, showing they perform significantly better than a simple
learning scheme driven by query response interactions among neighbors. We
validate prototypes of our peer network application via simulations with 500
model users based on actual Web crawls. We finally compare the quality of the
results with those obtained by centralized search engines, suggesting that our
application can draw advantages from the context and coverage of the peer
collective. Keywords: adaptive query routing, peer collaborative search, topical crawlers | |||
| Transforming web contents into a storybook with dialogues and animations | | BIBAK | Full-Text | 1076-1077 | |
| Kaoru Sumi; Katsumi Tanaka | |||
| This paper describes a medium, called Interactive e-Hon, for helping
children to understand contents from the Web. It works by transforming
electronic contents into an easily understandable "storybook world." In this
world, easy-to-understand contents are generated by creating 3D animations that
include contents and metaphors, and by using a child-parent model with dialogue
expression and a question-answering style comprehensible to children. Keywords: agent, animation, dialogue, information presentation, media conversion | |||
| AVATAR: an approach based on semantic reasoning to recommend personalized TV programs | | BIBAK | Full-Text | 1078-1079 | |
| Yolanda Blanco; José J. Pazos; Alberto Gil; Manuel Ramos; Ana Fernández; Rebeca P. Díaz; Martín López; Belén Barragáns | |||
| In this paper a TV recommender system called AVATAR (AdVAnce Telematic
search of Audiovisual contents by semantic Reasoning) is presented. This tool
uses the experience gained in the field of the Semantic Web to personalize the
TV programs shown to the end users. The main contribution of our system is a
process of semantic reasoning carried out on the descriptions of the TV
contents -- provided by means of metainformation -- and on the viewer
preferences -- contained in personal profiles. Such process allows to diversify
the offered suggestions maintaining the personalization, given that the aim is
to find contents appealing for the users, which are related semantically to
their programs of interest.
Here the framework proposed for this reasoning is introduced, by including (i) the OWL ontology chosen to represent the knowledge of our application domain, (ii) the organization of the user profiles, (iii) the query language LIKO, which is intended to browse the ontology and (iv) the semantic relations inferred from the system knowledge base. Keywords: TV recommender system, inference of semantic relations, ontologies, semantic
web | |||
| WAND: a meta-data maintenance system over the internet | | BIBAK | Full-Text | 1080-1081 | |
| Anubhav Bhatia; Saikat Mukherjee; Saugat Mitra; Srinath Srinivasa | |||
| WAND is a meta-data management system that provides a file-system tree for
users of an internet based P2P network. The tree is robust and retains its
structure even when nodes (peers) enter and leave the network. The robustness
is based on a concept of virtual folders that are automatically created to
retain paths to lower level folders whenever a node hosting a higher-level
folder moves away. Other contributions of the WAND system include its novel
approach towards managing root directory information and handling network
partitions. Keywords: maintenance, meta-data, peer-to-peer, wide-area distributed file system | |||
| Composite event queries for reactivity on the web | | BIBAK | Full-Text | 1082-1083 | |
| James Bailey; François Bry; Paula-Lavinia Pätrânjan | |||
| Reactivity on the Web is an emerging issue. The capability to automatically
react to events (such as updates to Web resources) is essential for both Web
services and Semantic Web systems. Such systems need to have the capability to
detect and react to complex, real life situations. This presentation gives
flavours of the high-level language XChange, for programming reactive behaviour
on the Web. Keywords: composite events, event-condition-action rules, reactive languages, web | |||
| Learning how to learn with web contents | | BIBAK | Full-Text | 1084-1085 | |
| Akihiro Kashihara; Shinobu Hasegawa | |||
| Learning Web contents requires learners not only to navigate the Web pages
to construct their own knowledge from the contents learned at and between the
pages, but also to control their own navigation and knowledge construction
processes. However, it is not so easy to control the learning processes. The
main issue addressed is how to help learners learn how to learn with Web
contents. This paper discusses how to design a meta-learning tool. Keywords: hyperspace, learning affordance, meta-learning, navigational learning, web
contents | |||
| From user-centric web traffic data to usage data | | BIBAK | Full-Text | 1086-1087 | |
| Thomas Beauvisage; Houssem Assadi | |||
| In this paper, we describe a user-centric Internet usage data processing
platform. Raw usage data is collected using a software probe installed on a
panel of Internet users' workstations. It is then processed by our platform.
The transformation of raw usage data into qualified information usable by
researchers studying the sociology of Internet usage requires a series of
relatively complex processes drawing on a wide variety of resources. We use a combination
of ad hoc rule-based systems and external resources to qualify the visited Web
pages. We also implemented topological and temporal indicators in order to
describe the dynamics of Web sessions. Keywords: internet uses, traffic analysis, usage data, user-centric traffic data, web
usage mining | |||
| Multichannel publication of interactive media documents in a news environment | | BIBAK | Full-Text | 1088-1089 | |
| Tom Beckers; Nico Oorts; Filip Hendrickx; Rik Van De Walle | |||
| Multichannel publication of multimedia presentations poses a significant
challenge for the generic description of presentation content and for the
system needed to convert these descriptions into final-form presentations. We
present a solution based on the XiMPF document model and a component-based
system architecture. Keywords: XML, device independence, framework, interactivity, multichannel
publication, multimedia, standards | |||
| Advanced fault analysis in web service composition | | BIBAK | Full-Text | 1090-1091 | |
| L. Ardissono; L. Console; A. Goy; G. Petrone; C. Picardi; M. Segnan; D. Theseider Dupré | |||
| Currently, fault management in Web Services orchestrating multiple suppliers
relies on a local analysis that does not span across individual services, thus
limiting the effectiveness of recovery strategies. We propose to address this
limitation by employing Model-Based Diagnosis to enhance fault analysis. In our
approach, a Diagnostic Web Service is added to the set of Web Services
providing the overall service, and acts as a supervisor of their execution, by
identifying anomalies and explaining them in terms of faults to be repaired. Keywords: diagnosis, fault management, web service composition | |||
| Mining directed social network from message board | | BIBAK | Full-Text | 1092-1093 | |
| Naohiro Matsumura; David E. Goldberg; Xavier Llorà | |||
| In the paper, we present an approach to mining a directed social network
from a message board on the Internet where vertices denote individuals and
directed links denote the flow of influence. Influence is measured by tracking
terms that propagate among individuals via messages. A distance capturing the
contextual similarity between individuals is obtained as well, since influence
indicates the degree of shared interest represented by those terms, as sketched below. Keywords: directed social network, internet message board | |||
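A minimal sketch of this influence measure, assuming a hypothetical thread format in which each message records its author, the terms it uses, and the author it replies to; the term-overlap weighting is an illustrative stand-in for the paper's propagation-based measure:

```python
from collections import defaultdict

# Hypothetical message-board thread: (author, terms used, author replied to)
messages = [
    ("alice", {"p2p", "dht"}, None),
    ("bob",   {"dht", "chord"}, "alice"),
    ("carol", {"dht", "p2p", "chord"}, "bob"),
]

# Directed influence edges parent -> replier, weighted by terms that propagate
influence = defaultdict(int)
for author, terms, parent in messages:
    if parent is None:
        continue
    parent_terms = next(t for a, t, _ in messages if a == parent)
    influence[(parent, author)] += len(terms & parent_terms)

print(dict(influence))  # {('alice', 'bob'): 1, ('bob', 'carol'): 2}
```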
| Incremental page rank computation on evolving graphs | | BIB | Full-Text | 1094-1095 | |
| Prasanna Desikan; Nishith Pathak; Jaideep Srivastava; Vipin Kumar | |||
| Enhancing the privacy of web-based communication | | BIBAK | Full-Text | 1096-1097 | |
| Aleksandra Korolova; Ayman Farahat; Philippe Golle | |||
| A profiling adversary is an adversary whose goal is to classify a population
of users into categories according to the messages they exchange. This adversary
models the most common privacy threat against web-based communication.
We propose a new encryption scheme, called stealth encryption, that protects users from profiling attacks by concealing the semantic content of plaintext while preserving its grammatical structure and other non-semantic linguistic features, such as word frequency distribution. Given English plaintext, stealth encryption produces ciphertext that cannot efficiently be distinguished from normal English text (our techniques apply to other languages as well). Keywords: privacy, profiling, protection | |||
| Generating XSLT scripts for the fast transformation of XML documents | | BIBAK | Full-Text | 1098-1099 | |
| Dong-Hoon Shin; Kyong-Ho Lee | |||
| This paper proposes a method of generating XSLT scripts, which support the
fast transformation of XML documents, given one-to-one matching relationships
between leaf nodes of XML schemas. The proposed method enhances the
transformation speed of the generated XSLT scripts by reducing template calls.
Experimental results show that the proposed method generates XSLT scripts
that transform XML documents faster than those produced by previous
work. Keywords: XML, XSLT, document transformation | |||
| ALVIN: a system for visualizing large networks | | BIBK | Full-Text | 1100-1101 | |
| Davood Rafiei; Stephen Curial | |||
Keywords: network visualization, sampling, visualizing the web | |||
| Analysis of topic dynamics in web search | | BIBAK | Full-Text | 1102-1103 | |
| Xuehua Shen; Susan Dumais; Eric Horvitz | |||
| We report on a study of topic dynamics for pages visited by a sample of
people using MSN Search. We examine the predictive accuracies of probabilistic
models of topic transitions for individuals and groups of users. We explore
temporal dynamics by comparing the accuracy of the models for predicting topic
transitions at increasingly distant times in the future. Finally, we discuss
directions for applying models of search topic dynamics (a toy estimation sketch follows below). Keywords: topic analysis, topic transition, user modeling, web search | |||
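A probabilistic model of topic transitions can be made concrete as a first-order Markov chain estimated from visit sequences. A minimal sketch with hypothetical topics, sessions, and smoothing (not the authors' exact models):

```python
import numpy as np

topics = ["news", "shopping", "travel"]
idx = {t: i for i, t in enumerate(topics)}

# Hypothetical per-user sequences of page topics visited while searching
sessions = [["news", "news", "travel"],
            ["shopping", "travel", "travel", "news"]]

counts = np.zeros((len(topics), len(topics)))
for s in sessions:
    for a, b in zip(s, s[1:]):
        counts[idx[a], idx[b]] += 1

# Laplace-smoothed transition matrix: P[i, j] = P(next topic j | current topic i)
P = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)
print(P)
```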
| Clustering for probabilistic model estimation for CF | | BIBAK | Full-Text | 1104-1105 | |
| Qing Li; Byeong Man Kim; Sung Hyon Myaeng | |||
| Based on the type of collaborative objects, a collaborative filtering (CF)
system falls into one of two categories: item-based CF and user-based CF.
Clustering is the basic idea in both cases: users are classified into groups
sharing similar preferences, or items into groups with similar attributes or
characteristics. Observing that in user-based CF each user community is
characterized by a Gaussian distribution over the ratings for each item, and
that in item-based CF the ratings of each user within an item community
likewise follow a Gaussian distribution, we propose a method of probabilistic
model estimation for CF in which objects (users or items) are classified into
groups based on content information and ratings simultaneously, and
predictions are made using the Gaussian distribution of ratings (see the
sketch below). Experiments on a real-world data set show that our approach
performs favorably. Keywords: collaborative filtering, information filtering, probabilistic model | |||
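A minimal sketch of the Gaussian-per-group idea, with hypothetical ratings and pre-assigned groups; the joint grouping over content and ratings, which is the paper's actual contribution, is omitted:

```python
import statistics

# Hypothetical ratings for one item, grouped by user community
group_ratings = {"g1": [4, 5, 4, 5], "g2": [1, 2, 2, 1]}

# Fit one Gaussian (mean, stdev) per group for this item
models = {g: (statistics.mean(r), statistics.stdev(r))
          for g, r in group_ratings.items()}

def predict(group):
    """Predict a member's rating as the mean of the group's Gaussian."""
    mu, _sigma = models[group]
    return mu

print(models)
print(predict("g1"))  # a g1 user is predicted to rate this item ~4.5
```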
| An experimental study on large-scale web categorization | | BIBAK | Full-Text | 1106-1107 | |
| Tie-Yan Liu; Yiming Yang; Hao Wan; Qian Zhou; Bin Gao; Hua-Jun Zeng; Zheng Chen; Wei-Ying Ma | |||
| Taxonomies of the Web typically have hundreds of thousands of categories and
skewed category distribution over documents. It is not clear whether existing
text classification technologies can perform well on and scale up to such
large-scale applications. To understand this, we evaluated several
representative methods (Support Vector Machines, k-Nearest Neighbor and
Naive Bayes) on Yahoo! taxonomies. In particular, we examined the
effectiveness/efficiency tradeoff in classifiers with hierarchical setting
compared to conventional (flat) setting, and tested popular threshold tuning
strategies for their scalability and accuracy in large-scale classification
problems. Keywords: algorithm complexity, parameter tuning strategies, text categorization, very
large web taxonomies | |||
| Site abstraction for rare category classification in large-scale web directory | | BIBAK | Full-Text | 1108-1109 | |
| Tie-Yan Liu; Hao Wan; Tao Qin; Zheng Chen; Yong Ren; Wei-Ying Ma | |||
| Automatically classifying the Web directories is an effective way to manage
Web information. However, our experiments showed that state-of-the-art text
classification technologies do not achieve acceptable performance on this
task. According to our analysis, the main problem is the lack of effective
training data in the rare categories of Web directories. To tackle this
problem, we propose a novel technique named Site Abstraction that synthesizes
new training examples from the website containing an existing training
document. The main idea is to propagate features through parent-child relationships in the sitemap tree.
Experiments showed that our method significantly improved the classification
performance. Keywords: hierarchical classification, site abstraction, support vector machines
(SVM), text classification, web directory | |||
| Designing learning services: from content-based to activity-based learning systems | | BIBAK | Full-Text | 1110-1111 | |
| Pythagoras Karampiperis; Demetrios Sampson | |||
| The need for e-learning systems that support a diverse set of pedagogical
requirements has been identified as an important issue in web-based education.
Until now, significant R&D effort has been devoted to developing web-based
educational systems tailored to specific pedagogical approaches. The most
advanced of them are based on the IEEE Learning Technology Systems Architecture
and use standardized content structuring based on the ADL Sharable Content
Object Reference Model in order to enable sharing and reusability of the
learning content. However, sharing of learning activities among different
web-based educational systems still remains an open issue. The open question is
how web-based educational systems should be designed in order to enable reusing
and repurposing of learning activities. In this paper we propose an authoring
system, referred to as ASK-LDT, that utilizes the Learning Design principles to
provide the means for designing activity-based learning services and systems. Keywords: architectures, authoring tools, learning activities, learning design,
reusability | |||
| Topological spaces of the web | | BIBK | Full-Text | 1112-1113 | |
| Gabriel Ciobanu; Dănuţ Rusu | |||
Keywords: separation, topology, density, web metrics | |||
| Extracting context to improve accuracy for HTML content extraction | | BIBAK | Full-Text | 1114-1115 | |
| Suhit Gupta; Gail Kaiser; Salvatore Stolfo | |||
| Previous work on content extraction utilized various heuristics such as link
to text ratio, prominence of tables, and identification of advertising. Many of
these heuristics were associated with "settings", whereby some heuristics could
be turned on or off and others parameterized by minimum or maximum threshold
values. A given collection of settings -- such as removing table cells with
high ratios of linked to non-linked text and removing all apparent advertising --
might work very well for a news website, but leave little or no content
for the reader of a shopping site or a web portal. We present a new technique,
based on incrementally clustering websites using search engine snippets, to
associate a newly requested website with a particular "genre", and then employ
settings previously determined to be appropriate for that genre, with
dramatically improved content extraction results overall. Keywords: DOM trees, HTML, accessibility, content extraction, context, reformatting,
speech rendering | |||
| Constructing extensible XQuery mappings | | BIBAK | Full-Text | 1116-1117 | |
| Gang Qian; Yisheng Dong | |||
| Constructing and maintaining semantic mappings is necessary but troublesome
in data sharing systems. While most current work focuses on seeking automated
techniques to solve this problem, this paper proposes a combination model for
constructing extensible mappings between XML schemas. In our model, complex
global mappings are constructed by first defining simple atomic mappings for
each target schema element, and then combining them using a few basic
operators. At the same time, we provide automated support for constructing such
combined mappings. Keywords: XQuery, automated support, extensibility, mapping | |||
| TJFast: effective processing of XML twig pattern matching | | BIBAK | Full-Text | 1118-1119 | |
| Jiaheng Lu; Ting Chen; Tok Wang Ling | |||
| Finding all the occurrences of a twig pattern in an XML database is a core
operation for efficient evaluation of XML queries. A number of algorithms have
been proposed to process a twig query based on region encoding. In this paper,
building on a novel labeling scheme called extended Dewey, we propose an
efficient holistic twig join algorithm named TJFast. Compared to previous
work, our algorithm only needs to access the labels of leaf query nodes. We
report our experimental results to show that our algorithms are superior to
previous approaches in terms of the number of elements scanned and query
performance. Keywords: holistic twig join, labeling scheme | |||
| Web services security configuration in a service-oriented architecture | | BIBAK | Full-Text | 1120-1121 | |
| Takeshi Imamura; Michiaki Tatsubori; Yuichi Nakamura; Christopher Giblin | |||
| Security is one of the major concerns when developing mission-critical
business applications, and this concern motivated the Web Services Security
specifications. However, the existing tools to configure the security
properties of Web Services give a technology-oriented view; only assisting in
choosing data to encrypt and the encryption algorithms to use. A user must
manually bridge the gap between the security requirements and the
configuration, which could cause extra configuration costs and lead to
potential misconfiguration hazards. To ease this situation, we propose
refining security requirements from the business level down to the technology
level, leveraging the concepts of Service-Oriented Architecture (SOA) and
Model-Driven Architecture (MDA). Security requirements are gradually
transformed into more detailed requirements or countermeasures, bridging the
gap through best-practice patterns. Keywords: best practice pattern, model-driven architecture, security configuration,
service-oriented architecture, web services security | |||
| BackRank: an alternative for PageRank? | | BIBAK | Full-Text | 1122-1123 | |
| Mohamed Bouklit; Fabien Mathieu | |||
| This paper extends a previous work, The Effect of the Back Button
in a Random Walk: Application for PageRank [5]. We introduce an enhanced
version of the PageRank algorithm using a realistic model of the Back button,
thus improving the random surfer model. We show that in the special case where
the history is bound to a single page (the Back button cannot be used twice in
a row), we can produce an algorithm that does not need many more resources than
a standard PageRank. This algorithm, BackRank, can converge up to 30% faster
than a standard PageRank and suppresses most of the drawbacks induced by the
existence of pages without links (a simulation sketch follows below). Keywords: PageRank, back button, flow, random walk, web analysis | |||
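A minimal Monte Carlo sketch of a random surfer with a one-step Back button, respecting the constraint that Back cannot be pressed twice in a row. The graph, damping factor, and back-probability are hypothetical, and this simulates the surfing model rather than implementing the authors' BackRank iteration:

```python
import random
from collections import Counter

def back_button_surfer(links, n_steps=200_000, damping=0.85, p_back=0.3, seed=0):
    """Monte Carlo surfer whose history holds at most one page."""
    rng = random.Random(seed)
    pages = list(links)
    cur, prev = rng.choice(pages), None
    visits = Counter()
    for _ in range(n_steps):
        visits[cur] += 1
        if rng.random() > damping:                    # teleport; history cleared
            cur, prev = rng.choice(pages), None
        elif prev is not None and rng.random() < p_back:
            cur, prev = prev, None                    # press Back; now unavailable
        elif links[cur]:
            cur, prev = rng.choice(links[cur]), cur   # follow a random out-link
        else:                                         # dangling page: Back or teleport
            cur, prev = (prev, None) if prev is not None else (rng.choice(pages), None)
    total = sum(visits.values())
    return {p: visits[p] / total for p in pages}

toy_graph = {"a": ["b"], "b": ["a", "c"], "c": []}  # "c" has no out-links
print(back_button_surfer(toy_graph))
```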
| Finding the boundaries of information resources on the web | | BIBAK | Full-Text | 1124-1125 | |
| Pavel Dmitriev; Carl Lagoze; Boris Suchkov | |||
| In recent years, many algorithms for the Web have been developed that work
with information units distinct from individual web pages. These include
segments of web pages or aggregations of web pages into web communities. Using
these logical information units has been shown to improve the performance of
many web algorithms. In this paper, we focus on a type of logical information
units called compound documents. We argue that the ability to identify compound
documents can improve information retrieval, automatic metadata generation, and
navigation on the Web. We propose a unified framework for identifying the
boundaries of compound documents, which combines both structural and content
features of constituent web pages. The framework is based on a combination of
machine learning and clustering algorithms, with the former algorithm
supervising the latter one. Experiments on a collection of educational web
sites show that our approach can reliably identify most of the compound
documents on these sites. Keywords: WWW, clustering, compound documents | |||
| Semantic search of schema repositories | | BIB | Full-Text | 1126-1127 | |
| Tanveer Syeda-Mahmood; Gauri Shah; Lingling Yan; Willi Urban | |||
| Improving text collection selection with coverage and overlap statistics | | BIBAK | Full-Text | 1128-1129 | |
| Thomas Hernandez; Subbarao Kambhampati | |||
| In an environment of distributed text collections, the first step in the
information retrieval process is to identify which of all available collections
are more relevant to a given query and which should thus be accessed to answer
the query. We address the challenge of collection selection when there is full
or partial overlap between the available text collections, a scenario which has
not been examined previously despite its real-world applications. To that end,
we present COSCO, a collection selection approach which uses
collection-specific coverage and overlap statistics. We describe our
experimental results which show that the presented approach displays the
desired behavior of retrieving more new results early on in the collection
order, and performs consistently and significantly better than CORI, previously
considered to be one of the best collection selection systems (a greedy sketch of the coverage-and-overlap idea follows below). Keywords: collection overlap, collection selection, statistics gathering | |||
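The core intuition -- prefer the collection expected to add the most new results, discounting documents likely already retrieved from collections chosen earlier -- can be sketched as a greedy ordering. The statistics below are hypothetical stand-ins for COSCO's coverage and overlap estimates:

```python
# Hypothetical per-collection statistics for one query topic
stats = {
    "C1": {"coverage": 120, "overlap": {"C2": 80, "C3": 10}},
    "C2": {"coverage": 100, "overlap": {"C1": 80, "C3": 5}},
    "C3": {"coverage": 40,  "overlap": {"C1": 10, "C2": 5}},
}

def greedy_order(stats):
    """Order collections by expected *new* results, discounting overlap
    with the collections that have already been selected."""
    order = []
    remaining = set(stats)
    while remaining:
        gain = lambda c: stats[c]["coverage"] - sum(
            stats[c]["overlap"].get(d, 0) for d in order)
        best = max(remaining, key=gain)
        order.append(best)
        remaining.remove(best)
    return order

print(greedy_order(stats))  # ['C1', 'C3', 'C2']: C3 adds more new results than C2
```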
| A framework for handling dependencies among web services transactions | | BIBAK | Full-Text | 1130-1131 | |
| Seunglak Choi; Jungsook Kim; Hyukjae Jang; Su Myeon Kim; Junehwa Song; Hangkyu Kim; Yunjoon Lee | |||
| This paper proposes an effective Web services (WS) transaction management
framework that automatically manages the inconsistencies arising when the
isolation of WS transactions is relaxed. Keywords: isolation relaxation, transaction management protocol, transaction model,
web services | |||
| Middleware services for web service compositions | | BIBAK | Full-Text | 1132-1133 | |
| Anis Charfi; Mira Mezini | |||
| WS-* specifications cover a variety of issues ranging from security and
reliability to transaction support in web services. However, these
specifications do not address web service compositions. On the other hand, BPEL,
the emerging standard web service composition language, allows the
specification of the functional part of the composition as a business process,
but falls short in expressing non-functional properties such as security,
reliability and persistence. In this paper, we propose an approach for the
transparent integration of technical concerns in web service compositions. Our
approach is driven by the analogy between web services and software components
and is inspired by server-side component models such as Enterprise Java
Beans. The main components of our framework are the process container, the
middleware services and the deployment descriptor. Keywords: BPEL, middleware, web service composition | |||
| Application networking on peer-to-peer networks | | BIBAK | Full-Text | 1134-1135 | |
| Mu Su; Chi-Hung Chi | |||
| This paper proposes the AN.P2P architecture to facilitate efficient
peer-to-peer content delivery with heterogeneous presentation requirements. In
general, the AN.P2P enables a peer to deliver the original content objects and
an associated workflow to other peers. The workflow is composed of content
adaptation tasks. Hence, the recipient can reuse the original object to
generate appropriate presentations for other peers. Keywords: application networking, peer-to-peer content distribution | |||
| Web data cleansing for information retrieval using key resource page selection | | BIBAK | Full-Text | 1136-1137 | |
| Yiqun Liu; Canhui Wang; Min Zhang; Shaoping Ma | |||
| With the explosion of pages on the WWW, covering more useful information with
limited storage and computation resources is becoming increasingly important in
web IR research. Using analysis of non-content page features, we propose a
clustering-based method to select high-quality pages from the whole page set.
Although the resulting page set contains only 44.3% of the whole collection, it
is related to more than 98% of the links and covers about 90% of the key
information. Link properties and retrieval effects are also examined, and
experimental results show that the key resource selection method is well suited
to data cleansing: the resulting page set outperforms the whole collection, with smaller
size and better retrieval performance. Keywords: non-content feature, web IR, web data cleansing | |||
| Web resource geographic location classification and detection | | BIBAK | Full-Text | 1138-1139 | |
| Chuang Wang; Xing Xie; Lee Wang; Yansheng Lu; Wei-Ying Ma | |||
| The rapid pervasion of the web into users' daily lives has made it important
to capture location-specific information on the web, since most human
activities occur locally, around where a user is located. This is
especially true in the increasingly popular mobile and local search
environments. Thus, how to correctly and effectively detect locations from web
resources has become a key challenge to location-based web applications. In
this paper, we first explicitly distinguish the locations of web resources into
three types to cater to different application needs: 1) provider location; 2)
content location; and 3) serving location. Then we describe a unified system
that computes each of the three locations, employing a set of algorithms and
different geographic sources. Keywords: content location, geographic location, location-based web application,
provider location, serving location, web location | |||
| Ontology-based learning content repurposing | | BIBAK | Full-Text | 1140-1141 | |
| Katrien Verbert; Dragan Gasević; Jelena Jovanović; Erik Duval | |||
| This paper investigates basic research issues that need to be addressed for
developing an architecture that enables repurposing of learning objects in a
flexible way. Currently, there are a number of Learning Object Content Models
(e.g. the SCORM Content Aggregation Model) that define learning objects and
their components in a more or less precise way. However, these models do not
allow repurposing of fine-grained components (sentences, images). We developed
an ontology-based solution for content repurposing. The ontology is a solid
basis for an architecture that will enable on-the-fly access to learning object
components and that will facilitate repurposing these components. Keywords: content models, learning objects, metadata, ontologies, repurposing | |||
| Representing personal web information using a topic-oriented interface | | BIBAK | Full-Text | 1142-1143 | |
| Zhigang Hua; Hao Liu; Xing Xie; Hanqing Lu; Wei-Ying Ma | |||
| Nowadays, Web activities have become daily practice for people. It is
therefore essential to organize and present this continuously increasing Web
information in a more usable manner. In this paper, we develop a novel
approach to reorganizing personal Web information into a topic-oriented interface.
In our approach, we propose utilizing anchor, title and URL information to
represent the content of browsed Web pages rather than the content
body. Furthermore, we explored three methods to organize personal Web
information: 1) top-down statistical clustering; 2) salience phrase based
clustering; and 3) support vector machine (SVM) based classification. Finally,
we conducted a usability study to verify the effectiveness of our proposed
solution. The experimental results demonstrated that users could visit the
pages that have been browsed previously more easily with our approach than
existing solutions. Keywords: clustering, personal web information, topic classification, user information
mining, user interface | |||
| Web2Talkshow: transforming web content into TV-program-like content based on the creation of dialogue | | BIBAK | Full-Text | 1144-1145 | |
| Akiyo Nadamoto; Masaki Hayashi; Katsumi Tanaka | |||
| We propose a new browsing system called "Web2Talkshow". It transforms
declarative-based web content into humorous dialog-based TV-program-like
content that is presented through cartoon animation and synthesized speech. The
system does this based on keywords in the original web content. Web2Talkshow
enables users to get desired web content easily, pleasantly, and in a
user-friendly way while continuing to work on other tasks. Thus,
using it will be much like watching TV. Keywords: TV-program-like content, dialogue, humor, web content | |||
| WCAG formalization with W3C standards | | BIBAK | Full-Text | 1146-1147 | |
| Vicente Luque Centeno; Carlos Delgado Kloos; Martin Gaedke; Martin Nussbaumer | |||
| Web accessibility is defined by a set of checkpoints that are rather
expensive to evaluate or to spot. However, using W3C technologies, this cost
can be greatly reduced. This article presents a rule set, formalized with W3C
technologies, for the automatable checkpoints of WCAG 1.0. Keywords: WAI, WCAG, XPath, XPointer, XQuery | |||
| Bootstrapping ontology alignment methods with APFEL | | BIBK | Full-Text | 1148-1149 | |
| Marc Ehrig; Steffen Staab; York Sure | |||
Keywords: alignment, machine learning, mapping, matching, ontology | |||
| Understanding the function of web elements for mobile content delivery using random walk models | | BIBAK | Full-Text | 1150-1151 | |
| Xinyi Yin; Wee Sun Lee | |||
| In this paper, we describe a method for understanding the function of web
elements. It classifies web elements into five functional categories: Content
(C), Related Links (R), Navigation and Support (N), Advertisement (A) and Form
(F). We construct five graphs for a web page, and each graph is designed such
that most of the probability mass of the stationary distribution is
concentrated in nodes belonging to its corresponding category. We perform random
walks on these graphs until convergence and classify each element based on its
rank values in the different graphs (see the sketch below). Our experiments show
that the new method performs very well compared to basic machine learning methods. Keywords: HTML, WWW (world wide web), classification | |||
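Running a random walk to convergence amounts to computing the stationary distribution of a transition matrix; a minimal power-iteration sketch over a toy, hypothetical four-element graph (the construction of the five category-specific graphs is the paper's contribution and is not shown):

```python
import numpy as np

def stationary_distribution(P, tol=1e-10, max_iter=10_000):
    """Power iteration for the stationary distribution of a row-stochastic P."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Toy transition matrix over four web-page elements (rows sum to 1)
P = np.array([[0.10, 0.60, 0.20, 0.10],
              [0.30, 0.40, 0.20, 0.10],
              [0.20, 0.20, 0.50, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
print(stationary_distribution(P))  # elements ranked by stationary probability
```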
| Does learning how to read Japanese have to be so difficult: and can the web help? | | BIBK | Full-Text | 1152-1153 | |
| Julien Quint; Ulrich Apel | |||
Keywords: kanji, Japanese, SVG, graphetic dictionary, reading help | |||
| The semantic webscape: a view of the semantic web | | BIBAK | Full-Text | 1154-1155 | |
| Juhnyoung Lee; Richard Goodwin | |||
| It has been a few years since the semantic Web was initiated by W3C, but its
status has not been quantitatively measured. It is crucial to understand the
status at this early stage, for researchers, developers and administrators to
gain insight into what will come in this field. The objective of our work is to
quantitatively measure and present the status of the semantic Web. We conduct a
longitudinal study of semantic Web pages to track trends in the use of
semantic markup languages. This paper presents early results of this study with
two historical data sets, from October 2003 and October 2004. Our results show
that while semantic Web adoption is at a very early stage, its growth outpaced
that of the entire Web over the period. Also, RDF (Resource Description
Framework) dominates among semantic markup languages, accounting for about 98%
of all semantic pages on the Web. It has been used in a variety of metadata
annotation applications. This study shows that the most popular application is
RSS (RDF Site Summary) for syndicating news and blogs, which accounts for more
than 60% of all semantic Web pages. It also shows that the use of OWL (Web
Ontology Language), which was recommended by W3C in early 2004, increased by
900% over the period. Keywords: RSS, markup languages, ontology, semantic web | |||
| A modeling approach to federated identity and access management | | BIBAK | Full-Text | 1156-1157 | |
| Martin Gaedke; Johannes Meinecke; Martin Nussbaumer | |||
| As the Web is increasingly used as a platform for heterogeneous
applications, we are faced with new requirements for authentication,
authorization and identity management. Modern architectures have to control
access not only to single, isolated systems, but to whole business-spanning
federations of applications and services. This task is complicated by the
diversity of today's specifications concerning e.g. privacy, system integrity
and distribution in the web. As an approach to such problems, in this paper, we
introduce a solution catalogue of reusable building blocks for Identity and
Access Management (IAM). The concepts of these blocks have been realized in a
configurable system that supports IAM solutions for Web-based applications. Keywords: federation, identity and access management, reuse, security | |||
| XSLT by example | | BIBAK | Full-Text | 1158-1159 | |
| Daniele Braga; Alessandro Campi; Roberto Cappa; Damiano Salvi | |||
| XQBE (XQuery By Example, [1]), a visual dialect of XQuery, uses hierarchical
structures to express transformations between XML documents. XSLT, the standard
transformation language for XML, is increasingly popular among programmers and
Web developers for separating the application and presentation layers of Web
applications. However, its syntax and its rule-based execution paradigm are
rather intricate, and the number of XSLT experts is limited; the availability
of easier "dialects" could be extremely valuable and may contribute to the
adoption of XML for developing data-centered Web applications and services.
With this motivation in mind, we adapted XQBE to serve as a visual interface
for expressing XML-to-XML transformations and generate the XSLT code that
performs such transformations. Keywords: XML, XQuery, semi-structured data, visual query languages | |||
| Automated semantic web services orchestration via concept covering | | BIBAK | Full-Text | 1160-1161 | |
| T. Di Noia; E. Di Sciascio; F. M. Donini; A. Ragone; S. Colucci | |||
| We exploit the recently proposed Concept Abduction inference service in
Description Logics to solve Concept Covering problems. We propose a framework
and a polynomial greedy algorithm for semantics-based automated Web service
orchestration, fully compliant with Semantic Web technologies. We show that the
proposed approach is able to deal with inexact solutions, computing an
approximate orchestration with respect to an agent request modeled in a subset of
OWL-DL. Keywords: description logics, orchestration, semantic web, semantic web services | |||
| Answering order-based queries over XML data | | BIBAK | Full-Text | 1162-1163 | |
| Zografoula Vagena; Nick Koudas; Divesh Srivastava; Vassilis J. Tsotras | |||
| Order-based queries over XML data include XPath navigation axes such as
following-sibling and following. In this paper, we present holistic algorithms
that evaluate such order-based queries. An experimental comparison with
previous approaches shows the performance benefits of our algorithms. Keywords: XML, holistic algorithms, order-based queries | |||
| A publish and subscribe collaboration architecture for web-based information | | BIBAK | Full-Text | 1164-1165 | |
| M. Brian Blake; David H. Fado; Gregory A. Mack | |||
| Markup languages, representations, schemas, and tools have significantly
increased the ability for organizations to share their information. Languages,
such as the Extensible Markup Language (XML), provide a vehicle for
organizations to represent information in a common, machine-interpretable
format. Although these approaches facilitate the collaboration and integration
of inter-organizational information, the reality is that the schema
representations behind these languages are reasonably difficult to learn, and
automated schema integration (without semantics or ontology mappings) is
currently an open problem. In this paper, we introduce an architecture and
service-oriented infrastructure to facilitate organizational collaboration that
combines the push features of the publish/subscribe protocol with distributed
registry storage capabilities. Keywords: distributed and heterogeneous information management, management of
semi-structured data | |||
| Migrating web application sessions in mobile computing | | BIBAK | Full-Text | 1166-1167 | |
| G. Canfora; G. Di Santo; G. Venturi; E. Zimeo; M. V. Zito | |||
| The capability to change user agents while working is starting to appear in
state-of-the-art mobile computing, due to the proliferation of different kinds
of devices, ranging from personal wireless devices to desktop computers, and to
the consequent need to migrate working sessions from one device to a more
apt one. Research results on low-level hand-off are not sufficient to solve the
problem at the application level. The paper presents a scheme for session
hand-off in Web applications which, by exploiting a proxy-based architecture,
is able to work without intervention in existing
code. Keywords: mobile computing, session hand-off, web applications | |||
| Video quality estimation for internet streaming | | BIBK | Full-Text | 1168-1169 | |
| Amy Reibman; Subhabrata Sen; Jacobus Van der Merwe | |||
Keywords: network measurement, performance, streaming, video quality | |||
| An approach for ontology-based elicitation of user models to enable personalization on the semantic web | | BIBAK | Full-Text | 1170-1171 | |
| Ronald Denaux; Lora Aroyo; Vania Dimitrova | |||
| A novel framework for eliciting a user's conceptualization based on an
ontology-driven dialog is presented here. It has been integrated in an
RDF/OWL-based architecture of an adaptive learning content management system.
The implemented framework is illustrated with an application scenario to deal
with the cold start problem and to enable tailoring the system's behavior to
the needs of each individual user. Keywords: adaptive content management, application of semantic web technologies,
personalization on the semantic web, user modeling | |||
| Analyzing online discussion for marketing intelligence | | BIBAK | Full-Text | 1172-1173 | |
| Natalie Glance; Matthew Hurst; Kamal Nigam; Matthew Siegler; Robert Stockton; Takashi Tomokiyo | |||
| We present a system that gathers and analyzes online discussion as it
relates to consumer products. Weblogs and online message boards provide forums
that record the voice of the public. Woven into this discussion is a wide range
of opinion and commentary about consumer products. Given its volume, format and
content, the appropriate approach to understanding this data is large-scale web
and text data mining. By using a wide variety of state-of-the-art techniques
including crawling, wrapping, text classification and computational
linguistics, online discussion is gathered and annotated within a framework
that provides for interactive analysis that yields marketing intelligence for
our customers. Keywords: computational linguistics, content systems, information retrieval, machine
learning, text mining | |||
| Exploiting the deep web with DynaBot: matching, probing, and ranking | | BIBAK | Full-Text | 1174-1175 | |
| Daniel Rocco; James Caverlee; Ling Liu; Terence Critchlow | |||
| We present the design of Dynabot, a guided Deep Web discovery system.
Dynabot's modular architecture supports focused crawling of the Deep Web with
an emphasis on matching, probing, and ranking discovered sources using two key
components: service class descriptions and source-biased analysis. We describe
the overall architecture of Dynabot and discuss how these components support
effective exploitation of the massive Deep Web data available. Keywords: crawling, deep web, probing, service class | |||
| A framework for determining necessary query set sizes to evaluate web search effectiveness | | BIBA | Full-Text | 1176-1177 | |
| Eric C. Jensen; Steven M. Beitzel; Ophir Frieder; Abdur Chowdhury | |||
| We describe a framework of bootstrapped hypothesis testing for estimating the confidence in one web search engine outperforming another over any randomly sampled query set of a given size. To validate this framework, we have constructed and made available a precision-oriented test collection consisting of manual binary relevance judgments for each of the top ten results of ten web search engines across 896 queries and the single best result for each of those queries. Results from this bootstrapping approach over typical query set sizes indicate that examining repeated statistical tests is imperative, as a single test is quite likely to find significant differences that do not necessarily generalize. We also find that the number of queries needed for a repeatable evaluation in a dynamic environment such as the web is much higher than previously studied. | |||
| Wireless SOAP: optimizations for mobile wireless web services | | BIBAK | Full-Text | 1178-1179 | |
| Naresh Apte; Keith Deutsch; Ravi Jain | |||
| We propose a set of optimization techniques, collectively called Wireless
SOAP (WSOAP), to compress SOAP messages transmitted across a wireless link. The
Name Space Equivalency technique rests on the observation that exact recovery
of compressed messages is not required at the receiver; an equivalent form
suffices. The WSDL Aware Encoding technique obtains further savings by
utilizing knowledge of the underlying WSDL by means of an offline protocol we
define. We summarize the design, implementation and performance of our Wireless
SOAP prototype, and show that Wireless SOAP can reduce message sizes by 3x-12x
compared to SOAP. Keywords: SOAP, WSDL, applications, compression, networks, services, web services,
wireless | |||
| METEOR: metadata and instance extraction from object referral lists on the web | | BIBAK | Full-Text | 1180-1181 | |
| Hasan Davulcu; Srinivas Vadrevu; Saravanakumar Nagarajan; Fatih Gelgi | |||
| The Web has established itself as the largest public data repository ever
available. Even though the vast majority of information on the Web is formatted
to be easily readable by the human eye, "meaningful information" is still
largely inaccessible to computer applications. In this paper we present
the METEOR system which utilizes various presentation and linkage regularities
from referral lists of various sorts to automatically separate and extract
metadata and instance information. Experimental results for the university
domain with 12 computer science department Web sites, comprising 361 individual
faculty and course home pages indicate that metadata and instance extraction
average 85% and 88% F-measure, respectively. METEOR achieves
this performance without any domain specific engineering requirement. Keywords: extraction, instance, metadata, object, semantic, web | |||
| Merkle tree authentication of HTTP responses | | BIBAK | Full-Text | 1182-1183 | |
| Roberto J. Bayardo; Jeffrey Sorensen | |||
| We propose extensions to existing web protocols that allow proofs of
authenticity of HTTP server responses, whether or not the HTTP server is under
the control of the publisher. These extensions protect users from content that
may be substituted by malicious servers, and therefore have immediate
applications in improving the security of web caching, mirroring, and relaying
systems that rely on untrusted machines [2,4]. Our proposal relies on Merkle
trees to support 200 and 404 response authentication while requiring only a
single cryptographic hash of trusted data per repository. While existing web
protocols such as HTTPS can provide authenticity guarantees (in addition to
confidentiality), HTTPS consumes significantly more computational resources,
and requires that the hosting server act without malice in generating responses
and in protecting the publisher's private key (a generic Merkle-tree sketch follows below). Keywords: authenticity, merkle hash tree, web content distribution | |||
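A minimal sketch of the generic Merkle-tree construction such response authentication builds on: one trusted root hash authenticates any leaf via a logarithmic-size proof. The repository contents are hypothetical, and this is the textbook construction rather than the paper's exact protocol:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    """All levels of the tree, from leaf hashes up to the single root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                     # duplicate last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(leaves, index):
    """Sibling hashes (with our node's side) needed to verify leaves[index]."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))  # 0: our node is left child
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, pos in proof:
        node = h(node + sibling) if pos == 0 else h(sibling + node)
    return node == root

docs = [b"GET /index.html -> 200 body", b"GET /a.html -> 200 body",
        b"GET /b.html -> 200 body", b"GET /missing -> 404"]
root = merkle_levels(docs)[-1][0]                 # the single trusted hash
assert verify(docs[1], merkle_proof(docs, 1), root)
print("response authenticated against root", root.hex()[:16])
```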
| Cyclone: an encyclopedic web search site | | BIBAK | Full-Text | 1184-1185 | |
| Atsushi Fujii; Katunobu Itou; Tetsuya Ishikawa | |||
| We propose a Web search site called "Cyclone", in which a user can retrieve
encyclopedic term descriptions on the Web. Cyclone searches the Web for
headwords and page fragments describing the headwords. High-quality page
fragments are selected as term descriptions and are classified into domains.
The number of current headwords is over 700,000. Keywords: encyclopedias, extraction, organization, web search | |||
| Automated synthesis of executable web service compositions from BPEL4WS processes | | BIBAK | Full-Text | 1186-1187 | |
| M. Pistore; P. Traverso; P. Bertoli; A. Marconi | |||
| We propose a technique for the automated synthesis of new composite web
services. Given a set of abstract BPEL4WS descriptions of component services,
and a composition requirement, we automatically generate a concrete BPEL4WS
process that, when executed, interacts with the components and satisfies the
requirement.
We implement the proposed approach exploiting efficient representation techniques, and we show its scalability over case studies taken from a real world application and over a parameterized domain. Keywords: automated synthesis, business processes, web service composition | |||
| Web log mining with adaptive support thresholds | | BIBAK | Full-Text | 1188-1189 | |
| Jian-Chih Ou; Chang-Hung Lee; Ming-Syan Chen | |||
| With the fast increase in Web activities, Web data mining has recently
become an important research topic. However, most previous studies of mining
path traversal patterns are based on a uniform support threshold, without
taking into consideration such important factors as the length of a pattern,
the positions of Web pages, and the importance of a particular pattern. In
view of this, we study and apply a Markov chain model to determine the support
thresholds of Web documents (a toy illustration follows below). Furthermore,
by properly employing techniques devised for joining reference sequences, a
new procedure for mining Web traversal patterns is proposed in this paper. Keywords: Markov model, path traversal pattern, web mining | |||
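One way such an adaptive threshold can work: score each candidate path by its probability under a first-order Markov model, so that longer or less likely paths receive proportionally lower support thresholds. A toy sketch with hypothetical probabilities, not the paper's exact formulation:

```python
# Hypothetical first-order Markov model estimated from web logs
start = {"A": 0.5, "B": 0.3, "C": 0.2}                       # initial-page probabilities
trans = {("A", "B"): 0.6, ("B", "C"): 0.4, ("A", "C"): 0.1}  # transition probabilities

def path_probability(path):
    """Probability of traversing `path` under the Markov model."""
    p = start.get(path[0], 0.0)
    for a, b in zip(path, path[1:]):
        p *= trans.get((a, b), 0.0)
    return p

scale = 0.5  # hypothetical global tuning constant
for path in [("A",), ("A", "B"), ("A", "B", "C")]:
    # Longer / rarer paths get proportionally lower support thresholds
    print(path, "threshold =", path_probability(path) * scale)
```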
| Focused crawling by exploiting anchor text using decision tree | | BIBAK | Full-Text | 1190-1191 | |
| Jun Li; Kazutaka Furuse; Kazunori Yamaguchi | |||
| Focused crawlers are considered as a promising way to tackle the scalability
problem of topic-oriented or personalized search engines. To design a focused
crawler, the choice of strategy for prioritizing unvisited URLs is crucial. In
this paper, we propose a method that uses a decision tree over the anchor texts
of hyperlinks to prioritize unvisited URLs (see the sketch below). We conducted
experiments on real data sets from four Japanese universities and verified our approach. Keywords: anchor text, decision tree learning, focused crawling, shortest path | |||
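A minimal sketch of this style of frontier prioritization, using scikit-learn's decision tree over bag-of-words anchor features; the anchor texts, labels, and URLs are hypothetical, and the original paper builds its own decision tree rather than using this library:

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical anchor texts labeled on-topic (1) / off-topic (0)
anchors = ["course syllabus", "faculty list", "campus map",
           "lecture notes", "parking info", "cafeteria menu"]
labels = [1, 1, 0, 1, 0, 0]

vec = CountVectorizer(binary=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(vec.fit_transform(anchors), labels)

# Prioritize unvisited URLs by the predicted on-topic probability of their anchor
frontier = {"http://example.edu/exams": "exam schedule",
            "http://example.edu/cafe": "cafeteria hours"}
scores = {url: clf.predict_proba(vec.transform([text]))[0][1]
          for url, text in frontier.items()}
print(sorted(frontier, key=scores.get, reverse=True))  # crawl best-scoring first
```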
| One project, four schema languages: medley or melee? | | BIBA | Full-Text | 1192 | |
| Makoto Murata | |||
| This talk first gives an overview of an XML project for e-Local Governments, which is under the auspices of MIAC (Ministry of Internal Affairs and Communications) of Japan. This talk then focuses on schema authoring and user interfaces. In particular, the use of four schema languages, namely RELAX NG, W3C XML Schema, DTD, and Schematron, is highlighted. | |||