Co-Chairs
Apostolos ANTONACOPOULOS
    University of Liverpool, UK

Jianying HU
    IBM T.J. Watson Research Center, USA

 

Program Committee
Henry BAIRD
    Palo Alto Research Center, USA

Thomas BREUEL
    Palo Alto Research Center, USA

Horst BUNKE
    Univ. Bern, Switzerland

Andreas DENGEL
    DFKI, Germany

David DOERMANN
    Univ. Maryland, USA

Matthew HURST
    Intelliseek, USA

Rolf INGOLD
    Univ. Fribourg, Switzerland

Peter KING
    Univ. Manitoba, Canada

Koichi KISE
    Osaka Prefecture Univ., Japan

Nicholas KUSHMERICK
    Univ. College Dublin, Ireland

Young-Bin KWON
    Chungang Univ., South Korea

Dan LOPRESTI
    Palo Alto Research Center, USA

Ethan MUNSON
    Univ. Wisconsin, USA

Fuad RAHMAN
    BCL Technologies, USA

Cecile ROISIN
    INRIA Rhône-Alpes, France

Larry SPITZ
    Document Recognition Technologies, New Zealand

Ah-Hwee TAN
    Nanyang Technological University, Singapore

Chew-Lim TAN
    National Univ. Singapore, Singapore

Christine VANOIRBEEK
    EPFL, Switzerland

Marcel WORRING
    Univ. Amsterdam, The Netherlands

WDA2003 is sponsored in part by:





Proceedings of the

Second International Workshop on
Web Document Analysis

(WDA2003)

Edinburgh, UK
August 3, 2003
(co-located with ICDAR2003)

 

Message from the Co-Chairs

 

Session I. Content Extraction

Extracting Structure from HTML Documents for Language Visualization and Analysis
      Robert P. Futrelle, Andrea E. Grimes and Mingyan Shao

Structuring Web Pages Based on Repetition of Elements
      Tomoyuki Nanno, Suguru Saito and Manabu Okumura

Information Extraction from HTML Documents by Structural Matching
      Thomas M. Breuel

STAN: Structural Analysis for Web Documents
      Johannes Goller

Session II. Content Repurposing

Reflowable Document Images for the Web
      Thomas M. Breuel

Adaptive Document Layout via Manifold Content
      Charles Jacobs, Wilmot Li and David H. Salesin

Web Document Analysis: How Can Natural Language Processing Help in Determining Correct Content Flow?
      Hassan Alam, Fuad Rahman and Yuliya Tarnikova

Web Document Manipulation for Small Screen Devices: A Review
      Hassan Alam and Fuad Rahman

Session III. Modeling and Annotation

Dynamic Generation of Multi-Modal Crosswords in Web Documents
      Stefan Jaeger, Masaki Nakagawa, Hermann Hild, Roald Wolff and Guido Wojke

Visual GQM Approach to Quality-driven Development of Electronic Documents
      Henryk Krawczyk and Bogdan Wiszniewski

A Xanalogical Collaborative Editing Environment
      Angelo Di Iorio and Fabio Vitali

Session IV. Images on The Web

Protecting Websites with Reading-Based CAPTCHAs
      Henry S. Baird and Mark Luk 

A Relevance Model for Web Image Search
      Cheng Thao and Ethan V. Munson

A Novel Web Image Processing Algorithm for Text Area Identification that Helps Commercial OCR Engines to Improve Their Web Image Recognition Efficiency
      S. J. Perantonis, B. Gatos and V. Maragos

Improving Rendering and OCRability of Color Images for Web Publishing
      Abhishek Gattani and Hareish Gur

Session V. Web Mining

A Comparison of Two Novel Algorithms for Clustering Web Documents
      Adam Schenker, Mark Last, Horst Bunke and Abraham Kandel

Domain-Specific Web Site Identification: The CROSSMARC Focused Web Crawler
      Konstantinos Stamatakis, Vangelis Karkaletsis, Georgios Paliouras, James Horlock, Claire Grover, James R. Curran and Shipra Dingare

Knowledge Management on the Web: Global Anarchy or Global Standardization?
      György Sebestyén