This paper considers the general problem of media and modality conversion in human-computer interaction. It discusses a generic approach based on abstract models, independent of modalities. The particular case of interfaces for people with disabilities illustrates the discussion.
Fifteen years ago, the development of personal computers, accompanied by the generalisation of networks - especially the Internet - and the appearance of Braille and speech devices, opened new perspectives for visually impaired people. At home, at work or at school, the use of electronic data should help social inclusion. Indeed, a priori, these data can be processed by a computer and used through different modalities; multimodal interfaces should therefore be able to present them optimally according to each user's needs and specific characteristics.
Electronic documents are usually designed to be used by one multimodal interface, implementing one static set of modalities, and not by several multimodal interfaces using various alternative modalities. The following discussion is illustrated by results from several research projects we carried out during the last five years: Web accessibility [Archambault and Burger, 1999,Duchateau et al., 1999,Archambault et al., 2000]; the design of a specific Web browser [Burger and Hadjadj, 1999]; the adaptation of workstations [Schwarz et al., 2000] with a view to enabling blind persons to work in an ordinary environment; and pedagogical tools for blind pupils [Burger et al., 1996].
Human communication is performed through different channels corresponding to our five senses and to our means of expression (speech, gesture). Depending on the structure of the information transmitted, each channel corresponds to various modalities. For instance, Braille and tactile diagrams are two modalities corresponding to the tactile sense.
A multimodal interface is an interface able to use various modalities and to provide users with various kinds of interaction, possibly through various channels of communication. For instance, standard user interfaces (Figure 1) involve a graphical screen, a mouse, a keyboard and a loudspeaker. These devices correspond to specific modalities and types of interaction. As there is no or very little redundancy between them, this is called exclusive multimodality [Coutaz, 1991,Burger, 1994]. Usually the data which can be accessed through these interfaces are formatted specifically to fit that scheme.
In a growing number of cases, the same information needs to be accessed using different types of devices, corresponding to alternative modalities.
In each of these cases, the information has to be adapted to the alternative device: it must be converted to fit the specific presentation rules associated with each modality. Therefore the data model should include all the elements necessary to satisfy the presentation rules of each modality.
These conversions often require additional information that is difficult to extract from the standard interface, because it is expressed graphically or through the layout of graphical objects in the window.
In order to optimise the data presentation, reformulation is often necessary. For instance, in a Web page, links have a special colour and are often underlined, so the user can detect them easily. To facilitate the translation of such presentations into Braille, several reformulations are possible: putting the link into brackets; making the text of the link blink; or using speech together with a special sound at the start of the utterance (i.e. another audio modality) [Archambault et al., 2000].
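Such a reformulation rule can be sketched as a small conversion function; the function name, the bracket convention and the earcon placeholder below are illustrative assumptions, not part of any described implementation:

```python
def reformulate_link(text, modality):
    """Convert a visually marked hyperlink (colour, underline) into a
    form that an alternative modality can convey.  A hypothetical sketch:
    the modality names and markers are illustrative only."""
    if modality == "braille":
        return f"[{text}]"            # brackets replace colour/underlining
    if modality == "speech":
        return f"<earcon> {text}"     # special sound at the start of the utterance
    return text                       # visual modality: leave the text unchanged
```

In practice the marker per modality would be configurable, since Braille readers and speech users may prefer different conventions.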
Even if the standard interface is purely textual, reorganisation of the data may be necessary to optimise performance. In a workstation adaptation project, we noticed a remarkable improvement in the productivity of blind workers (from +25% to +60%, achieving performance similar to that of the other workers), thanks to data reorganisation [Schwarz et al., 2000].
Web pages often include a large number of links grouped at the top of the page. The simple translation of such HTML documents into a sequential modality, like Braille or speech synthesis, will result in a drastic reduction of information accessibility, because of the very structure of these documents. This is not the case with database servers, which make it possible to access raw data and then to structure and format its presentation specifically for each user. One may consider that the representation of information in a database is compatible with a data model independent of modalities, while this is not the case for HTML documents.
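The contrast can be illustrated with a minimal sketch, assuming a database-style record whose fields carry no presentation: a visual renderer may put the links first, as many pages do, while a sequential renderer reorders content for Braille or speech. All names and formatting rules here are hypothetical:

```python
# Modality-neutral record, as a database might return it (illustrative).
article = {
    "title": "Opening hours",
    "body": "The library is open from 9 to 18.",
    "links": [{"label": "Contact", "target": "/contact"}],
}

def render_visual(doc):
    """Visual layout: navigation links grouped at the top of the page."""
    links = " | ".join(l["label"] for l in doc["links"])
    return f"{links}\n\n{doc['title']}\n{doc['body']}"

def render_sequential(doc):
    """Sequential modality (Braille, speech): content first, links last,
    so the reader is not forced through the navigation block."""
    links = "; ".join(f"link: {l['label']}" for l in doc["links"])
    return f"{doc['title']}. {doc['body']} {links}"
```

The same raw data thus yields a presentation adapted to each modality, which a fixed HTML layout cannot offer.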
In some cases the specificity of a modality makes it necessary to provide special tools. This is the case with modalities which do not allow a global view of the document.
In the case of a Web page, the user cannot know whether the page is long or short, nor how many links it contains. The contextual global information available to a sighted user before he or she starts reading the page is not accessible to users of a non-visual interface. A summary of the page content may compensate effectively for this limitation, even if it is only based on a statistical analysis of this information.
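A deliberately naive sketch of such a statistical summary, counting words and links in the raw HTML (the thresholds and wording are assumptions for illustration):

```python
import re

def summarise_page(html):
    """Statistical page summary for non-visual users: approximate length
    and number of links.  A rough sketch; real tools would parse the HTML
    properly rather than use regular expressions."""
    links = len(re.findall(r"<a\s", html, flags=re.IGNORECASE))
    words = len(re.sub(r"<[^>]+>", " ", html).split())
    size = "short" if words < 300 else "long"
    return f"This is a {size} page ({words} words) containing {links} link(s)."
```

Announced before reading starts, such a summary restores part of the global view a sighted user gets at a glance.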
Another case is when the information given by the context, for instance the layout of the screen, is necessary to understand the document. In some cases, specific help should be added. Once again, if the data model does not contain enough information, this help has to be designed specifically by an expert.
Initially, the use of the Internet seemed very promising for blind users because it was purely textual. The fast development of graphical techniques made access more difficult, and the accessibility of the Web quickly became a problem. When the data model is linked to only one mode of communication, for example graphics, the only way to enable conversions into other modalities is to provide a textual alternative. Otherwise the conversion will cause an important drop in information accessibility. In the case of a Web server home page, the consequences may prove disastrous (inability to access the information available on the server or on other related information sources).
To make HTML documents accessible, it was necessary to set up a large number of guidelines [WAI, 1998] - a sizeable effort undertaken by the Web Accessibility Initiative. The main rule of the guidelines is that all information included in documents should be expressed in textual mode.
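A minimal automated check of that rule might flag images lacking a textual alternative (`alt` attribute). This is only a sketch of the idea, using a crude regular expression rather than a real HTML parser:

```python
import re

def missing_text_alternatives(html):
    """Return the <img> tags that carry no alt attribute, i.e. images
    whose information is not expressed in textual mode.  Illustrative
    sketch only: a production checker would parse the document tree."""
    imgs = re.findall(r"<img\b[^>]*>", html, flags=re.IGNORECASE)
    return [tag for tag in imgs if "alt=" not in tag.lower()]
```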
Universal access to technologies means that user interfaces to these technologies become transparent; that is, the interface should display information in the most appropriate way for each user. In other words, the information should be converted to fit the modalities used by the user. To perform these conversions, we can observe from the cases presented previously that:
For current projects, we need more and more robustness in adaptations, and more adaptivity. Therefore we have to design software models that fit these characteristics.
The TIM project [Archambault et al., 2000] concerns the adaptation and design of computer games for all visually impaired children (from 3-4 years old, including children with additional disabilities). The aim is to develop a tool allowing the production of adaptable and adaptive games. This implies the possibility for the game to change its modality according to:
For instance, in the case of question-answer games, if the player is a young blind child who is starting to learn Braille, speech synthesis can be used for the questions, with the proposed answers displayed on the Braille display. Then, if the child gives correct answers quickly for a while, the questions will also be displayed on the Braille display. Conversely, when the child obviously has difficulties, the speech synthesiser can read the answers at the same time as they are displayed in Braille. For younger children, who can hardly understand synthesised speech, the questions should be transmitted in audio format (with a recorded 'real' voice).
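The adaptive behaviour described above can be sketched as a simple decision function. The age cut-off, the success-rate thresholds and the modality labels are hypothetical values chosen for illustration, not parameters of the TIM project itself:

```python
def choose_modalities(age, recent_correct, recent_total):
    """Pick output modalities for a question-answer game from the
    player's age and recent performance.  Thresholds are illustrative."""
    if age < 6:
        # Too young for synthesised speech: use a recorded 'real' voice.
        return {"questions": "recorded audio", "answers": "recorded audio"}
    rate = recent_correct / max(recent_total, 1)
    if rate > 0.8:
        # Reading well: move the questions onto the Braille display too.
        return {"questions": "braille", "answers": "braille"}
    if rate < 0.4:
        # Struggling: let speech read the answers alongside Braille.
        return {"questions": "speech", "answers": "speech + braille"}
    # Default: speech for questions, Braille display for the answers.
    return {"questions": "speech", "answers": "braille"}
```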
In fact, for young children, the adaptation implies larger conversions. Many games are based upon a global view of the layout and upon visual memory. It is often not enough to simply present the game elements in an alternative way; it is necessary to change the scenario of the interaction. Therefore the classical software models used until now to adapt interfaces are inappropriate.
The approach of the TIM project is to provide each user with a specific interface, instead of maintaining a standard interface and specific adaptations. The interface will be generated by an interface generator capable of building an adaptive interface for any specific user, from a set of default values corresponding to standard use and according to a scenario of interaction (Figure 3). The feedback from the user will then allow the interface to adapt to his or her needs and competences.
In most cases the conversion from one multimodal interface to another will cause an important drop in information transmission, because data models are very close to the standard display, especially regarding the Web. In fact, technologies have developed much more rapidly than our awareness of their proper use. For most Web designers the HTML language is simply a language describing the appearance of Web pages, rather than the structure and the semantics of the corresponding documents.
The conversion of a standard interface to meet the requirements of a specific user or user community will be replaced by the generation of a multimodal interface adapted to these requirements. Thus we need the data model to be totally independent of the modalities, including, depending on the kind of application and the targeted group of users, some interaction scenarios and alternatives. Such models should allow the use of any modality (alone or in conjunction with other modalities), and of any output or input device, including future devices which have not been developed yet.
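One way to sketch such a modality-independent model is to store content with abstract roles and let each modality register its own renderer, so that a future device only needs to supply a new rendering function. All class, role and function names below are illustrative assumptions:

```python
class Element:
    """A modality-neutral piece of content: an abstract role plus text."""
    def __init__(self, role, content):
        self.role = role          # e.g. "question", "answer", "link"
        self.content = content    # plain text, free of any presentation

RENDERERS = {}                    # registry: modality name -> render function

def renderer(modality):
    """Decorator registering a rendering function for one modality."""
    def register(fn):
        RENDERERS[modality] = fn
        return fn
    return register

@renderer("braille")
def to_braille(el):
    # Example presentation rule: links are marked with brackets.
    return f"[{el.content}]" if el.role == "link" else el.content

@renderer("speech")
def to_speech(el):
    # Speech announces the role before the content.
    return f"{el.role}: {el.content}"

def present(el, modality):
    """Dispatch to whichever renderer the modality registered;
    future devices simply add an entry to RENDERERS."""
    return RENDERERS[modality](el)
```

Because the `Element` carries no presentation, the same content can be routed to Braille, speech, or any modality registered later, alone or in combination.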