Olivier Liechti, Mark J. Sifer and Tadao Ichikawa
ISL, Hiroshima University, 1-4-1 Kagamiyama,
Higashi-Hiroshima, 739 Japan
email@example.com, firstname.lastname@example.org and email@example.com
A common foundation for emerging Metadata formats is the eXtensible Markup Language [c,d,e,f] (XML) developed by the W3C, which provides a common syntax for a large number of Metadata formats. As discussed in , different applications of XML are envisaged and classified into four broad categories: (1) applications that require the Web client to mediate between two or more heterogeneous databases, (2) applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client, (3) applications that require the Web client to present different views of the same data to different users, and (4) applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users. CDF[g], DOM[h], MCF[i] and OSD[j] are some of the numerous acronyms for recently proposed Metadata formats. WebCollections[s] are an application of XML for describing the properties of some object, e.g. the properties of a Web document. Many other proposals have been made, in the mathematical, chemical, commercial application domains for example.
In this paper, we present the Structured Graph Format (SGF), for describing Web site structure. We also present SGMapper, a client side tool, which uses SGF Metadata to generate interactive site maps. We describe in detail how SGMapper cooperates with a standard Web browser and how any Web server can easily be set up to provide SGF Metadata. We illustrate one possible use of SGF, which could however been applied in many other settings.
The problem addressed by SGMapper is to facilitate navigation in hyperspace. When users navigate large sites by using hyperlinks only, they have difficulty maintaining context [4,5]. Many approaches to this problem try to provide some kind of overview in combination with a detailed portion of the site. Fisheye lenses are a method for browsing large graphs , allocating a large proportion of screen space to a restricted part of the graph and squeezing the rest into the remaining area. Fisheye lensing has been combined with zooming and applied to Web navigation in . Related non-linear display methods for trees are hyperbolic maps  and perspective walls . Another approach used by WebToc  and Nif-T-Nav  is to display the site structure as a hierarchy, which can be navigated like a file system. With the HyperG system , a site author explicitly designs a hierarchy of nodes in addition to associative links. An advantage of this is, that both a hierarchy view and a local view of the surrounding associative network are available.
SGF is based on structured graphs , originally developed in the settings of software engineering and recently applied to multimedia data modeling . Structured graphs were designed to allow scalable browsing and editing of very large graphs and therefore may be useful in the context of the Web. They distinguish between hierarchical and associative links, where hierarchical links are intended to capture aggregation relationships. In addition to the explicit associative links, implicit associative links are generated to show associations between sub-hierarchies. This provides for a close integration of the hierarchy and network views, allowing some simple visual queries to be made.
The paper starts with a simple example, showing how the structure of a Web site can be described with a hierarchy and a network. The notion of implicit associative link is introduced. At the same time, navigation of the example site using SGMapper is presented. SGF is introduced with the encoding of the example site structure. Its specification is then given in the form of its Document Type Description (DTD). Some implementation and integration issues are exposed. Final conclusions are given.
The upper frame in Fig. 2a and b illustrates how the site hierarchy, generated on the basis of SGF Metadata sent by the server, is displayed in SGMapper. The user can click on a node to select it, or shift-click on a node to zoom in the hierarchy. In Fig. 2b, the user has zoomed on the Staff node by shift-clicking it and can see only the ancestors and descendants of this node. These two user operations determine the nature of the information displayed in a window: in the first case (Fig. 2a), the user can see an overview of the hierarchy without seeing labels for every node, while in the second case (Fig. 2b) details for only a portion of the structure are shown. Both views are useful and it is possible to switch from one to the other in a single operation. Moreover, the user can open multiple windows, displaying the hierarchy with different focus points. When a node is selected, visual feedback is given in all windows. This helps the user maintain context while getting details. Finally, because SGMapper is designed to cooperate with a standard Web browser, if the user double-clicks on a node, the corresponding URL is automatically downloaded and displayed in the browser.
Figure 3 shows a portion of the site network reachable from Pr. Smith displayed in SGMapper. The upper frame represents the hierarchy and the lower one represents the network. The user has selected Pr. Smith in the hierarchy frame. Consequently, Pr. Smith appears in the center of the network frame, where each node has on its left side the associative link sources and on its right side the associative link destinations. For example, DBMS is on both the left and right sides of Pr. Smith, because there are two opposing links joining these nodes. The user can explore the network either by clicking or shift-clicking a node. The resulting behavior is the same as in the hierarchy frame, that is either overview or detailed information of the structure is displayed, or both at the same time if multiple windows are opened.
SGMapper provides a number of features, which allow users to select
portions of the structure and to determine which explicit/implicit associations
between them are displayed. Space prevents us from describing these functions
in detail. However, we would like to illustrate their ability to answer
simple visual queries. Consider the research groups A, B and C and the
member categories Staff and Students. The user could be interested to find
out, what kind of members are associated with each group. Figure 5a to
c shows, how the user could answer this question with SGMapper. The user
needs to select the Member's sub-hierarchy and to successively click on
each of the three Research groups to see the associations. First, group
A is selected and no association with any member is seen. Second, the user
clicks on group B, as shown in Fig. 5a. There is an (implicit) link between
Group B and Staff, but not between Group B and Students. Third, the user
clicks on Group C, as shown in Fig. 5b. There are (implicit) links between
Group C and Staff, and between Group C and Students. The interpretation
of the three views gives the answer: both students and staff members are
associated with group C, only staff members are associated with group B,
and no one is associated with group A. In Fig. 5c, another function of
the viewer is used to further restrict the displayed associations.
|<!DOCTYPE STRUCTUREDGRAPH SYSTEM "SGF.dtd">
<NODE NODEID="N001" LABEL="SGLab_Home_Page">
<SGATT NAME="URL" VALUE="http://www.sglab.edu" />
<NODE NODEID="N002" LABEL="Teaching">
<SGATT NAME="URL" VALUE="http://www.sglab.edu/teaching.html" />
<NODE NODEID="N003" LABEL="Research">
<SGATT NAME="URL" VALUE="http://www.sglab.edu/research.html" />
<NODE NODEID="N016" LABEL="Project_Z">
<SGATT NAME="URL" VALUE="http://www.sglab.edu/projects/project_z.html" />
<NODE NODEID="N019" LABEL="Peter">
<SGATT NAME="URL" VALUE="http://www.sglab.edu/~peter" />
<LINK SOURCE="N001" DEST="N002"> </LINK>
<LINK SOURCE="N019" DEST="N020"> </LINK>
Table 2 gives the SGF DTD. Without explaining in detail the syntax of an XML DTD [c], we describe the SGF validity constraints with reference to the related XML keywords given in parenthesis:
|<!-- SGF DTD -->
<!-- a SG is defined by 3 sets: nodes, hierarchical and associative links -->
<!ELEMENT STRUCTUREDGRAPH (SGATT*,NODES,HIERARCHY,NETWORK)>
<!-- the set of nodes -->
<!ELEMENT NODES (NODE*)>
<!-- the set of hierarchical links -->
<!ELEMENT HIERARCHY (LINK*)>
<!-- the set of associative links -->
<!ELEMENT NETWORK (LINK*)>
<!-- a node -->
<!ELEMENT NODE (SGATT*)>
NODEID ID #REQUIRED
LABEL CDATA #IMPLIED >
<!-- a link between 2 nodes -->
<!ELEMENT LINK (SGATT*)>
SOURCE IDREF #REQUIRED
DEST IDREF #REQUIRED
LABEL CDATA #IMPLIED >
<!-- SGATT allows definition of application specific attributes -->
<!ELEMENT SGATT EMPTY>
NAME CDATA #REQUIRED
VALUE CDATA #REQUIRED >
The overall behavior of the system is illustrated in Fig. 6: at initialization time, the spy registers both to be alerted when URLs are fetched by the browser (1) and to handle SGF files (2). When a HTML resource is downloaded from the Web (3), the browser alerts the spy (4). When the resource MIME type is SGF (5), then the browser stores it on disk and sends a request to the spy to open it (6). The spy can then forward the Metadata to the Viewer, which can display it (7). When the user double-clicks on a node in the viewer, this alerts the spy (8), which asks the browser to download the corresponding URL (9). The resource is finally fetched and displayed in the browser (10).
At the time of writing, the implementation of this architecture has been almost completed, though some effort is still needed to achieve full inter operability between the viewer and the browser. Both the spy and the viewer have been implemented on MacOS, the former in C++ and the latter in Java. Communication between the spy and the browser is done via Apple Events, communication between the spy and the viewer is done over TCP/IP. The viewer uses Microsoft msxml  [n] XML parser. There should not be any major obstacles in porting the code to the Windows platform, as only the communication between the spy and the browser would need to be rewritten, using Windows based IPC (DDE for example).
Ideally when a visitor arrives on a site, not necessary via the home
page, he/she should be able to get the Metadata quickly and easily, that is
by having to follow a minimal number of links. One possibility, as shown
in Fig. 7, is to have in every document a link to a unique HTML document
(map.html), that gives access to an SGF document (map.sgf). This indirection
for example allows a traditional site map to be provided together with
some information about the Metadata format and available clients.
The easiest way to satisfy the second requirement is to rely on MIME types [o], which are a standard way of determining the content of a file sent across TCP/IP networks. For this reason, we have defined a MIME type for SGF files, namely "text/x-sgf". Web servers can easily be configured to send particular MIME types, generally by adding a line in a file called "mime.types" or the like, which defines a mapping between MIME types and a file extensions.
Finally the question of how to build the SGF document capturing the site structure remains. A straightforward method is to manually encode the information using a text editor. A more realistic approach would be to create or adapt a site design tool, such as NetObject Fusion [p], SiteMill [q] or Front Page [r], which could automatically generate an SGF description of the site on the basis of relationships defined by the designer. Another approach would be to generate the Metadata by parsing the documents of the site. This process would be easier, if the HTML documents contained some extra information about the links, as proposed in .
Benefits of this system include (1) efficiency, as the structure is downloaded once, in a single transfer, permitting exploration of the site structure without sending further requests to the server, (2) help for navigation, as both context and detail information about a site are given to the user by multiple windows, and (3) the ability to answer visual queries, taking advantage of implicit associative links defined by the Structured Graph formalism.
SGMapper is only one of the possible uses of SGF Metadata. Because SGF is a public format, any client application could request SGF documents from a Web servers and process them. Software agents could take advantage of this, by regularly checking the evolution of the structure of a site and initiate appropriate actions. For example, think of an agent, which would notify you every time a new paper is written by your favourite author. Altogether, SGF illustrates three of the envisioned use of XML: (1) applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client, (2) applications that require the Web client to present different views of the same data to different users, and (3) applications in which intelligent Web agents attempt to tailor information discovery to the needs of individual users.
A first step in our future work is to complete and enhance the integration of SGMapper with Web browsers. Beyond this, we would like to explore the use of other capabilities of structured graphs , such as support for links having multiple destinations and sources, link hierarchies and link types. We have started to implement a robot, which explores a Web site and generates its SGF structure description using a set of rules to distinguish between hierarchical and associative links. We have also started an extension of the system, which would support storing the user's browsing history in SGF. The associated Structured Graph would grow dynamically as the user would visit different sites, either by using SGF Metadata if provided, or by infering the nature of the links with a similar set of rules used by the robot. Format specification, demonstrations and progress reports will be published under SGF home page [t].
|Olivier Liechti received his Diploma in Computer Science at the University of Fribourg, Switzerland in 1995. He has been working for several years at Prisme Informatique in Switzerland, where he was involved in DBMS and intranet related projects. He is now a PhD student at the Information System Laboratory in Hiroshima University, Japan. His main research interests are in Web technologies, CSCW systems, mobile and ubiquitous computing environments.|
|Mark Sifer is presently a JSPS research fellow with the Faculty of Engineering at Hiroshima University. Previously he was a Lecturer at the University of Technology Sydney, where he also completed a PhD. He has also worked at a range of companies and organisations as a software developer. His main research interest is in graph abstraction models and their application to the design of graph based software tools.|
|Tadao Ichikawa graduated from Waseda University also receiving his Doctor of Engineering degree from Waseda University. Prior to his professorship at Hiroshima University, he worked at the Research and Development Laboratory of Kokudai Denshin Denwa Co., Ltd. (KDD) in Tokyo. He moved to Hiroshima University in 1979, where he is responsible for research and education in information and computer sciences as Professor of the Information Systems Laboratory. He initiated the IEEE Symposium on Visual Languages and the IEEE International Conference on Multimedia Computing and Systems in 1984 and 1994, respectively. At the IEEE (Institute of Electrical and Electronics Engineers) Computer Society, he also founded the Technical Committee on Multimedia Computing and the Task Force on YUFORIC (Youth Forum in Computer Science and Engineering)in 1992 and 1995, respectively. He is on the editorial board of the IEEE Transactions on Knowledge and Data Engineering and the International Journal on Visual Languages and Computing. He currently serves as Board of Govenors at the IEEE Computer Society. He is an IEEE Fellow.|