E.NISSAN

An introduction to using the CuProS
metarepresentation language for defining flexible
nested-relation structures for monolingual
and multilingual terminological databases
Ephraim Nissan
School of Computing and Mathematical Sciences,
University of Greenwich

Abstract

In this paper, a preliminary discussion is provided of how to structure a terminological database by resorting to nested relations whose schema is in turn regulated by means of a metaschema coded in the CuProS language (for "Customization Production System").

The database management system is called Raffaello. The application outlined is to a multilingual database of cognate lexical items.

Introduction

In a sense, this paper is a prelude intended to convey the taste of a forthcoming monograph of mine, now almost completed and tentatively titled Raffaello and CuProS for Structuring Terminological Databases Flexibly. A separate paper in this volume describes Onomaturge, an expert system incorporating knowledge on word formation for the purposes of neologisation. Along with that project, a pool of tools for managing its lexical database was also developed, and named Raffaello. The lexical database of Onomaturge was represented in nested relations (coded as embedded parentheses), practically a tree of properties whose depth (i.e., number of levels) is basically unlimited other than by practical considerations. Examples were given in that paper as illustrations.
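To make the idea of nested relations "coded as embedded parentheses" concrete, here is a minimal sketch in Python. It is not Raffaello code: the function names `parse_nested` and `depth`, and the attribute names in the sample string, are assumptions made purely for illustration. It reads a parenthesised string into a tree and reports the number of levels.

```python
def parse_nested(text):
    """Parse a parenthesised string into nested Python lists of atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested level
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()        # close the level, attach it to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # an atom: attribute name or value
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def depth(node):
    """Number of nesting levels below and including this node (atoms count 0)."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth(child) for child in node), default=0)


# Hypothetical attribute names, chosen only for the example.
sample = "(ROOT (ATTRIBUTE-A value-1) (ATTRIBUTE-B (ATTRIBUTE-C value-2)))"
tree = parse_nested(sample)[0]
print(tree)
print(depth(tree))
```

Nothing in the parser bounds the depth, which mirrors the remark that the number of levels is limited only by practical considerations.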

The twin subjects of the forthcoming monograph, as preliminarily outlined in the present paper, are:

(a) a description of a metarepresentation-driven tool for retrieval from deep, flexibly structured nested relations, and

(b) an outline of an application to a multilingual database of cognate lexical items.

One of the lectures at the 1999 EAFT Conference in Paris described an ongoing project whose goal is the construction of a terminological database for the languages of Scandinavia and the contiguous regions. More than one language family is involved, yet the far-reaching implications of phylogenetic cognacy among sets of languages are apparent, for all the degrees of freedom afforded by the decision-making of separate terminological committees. Lexical databases for language families are of particular interest to, for example, Romance terminologists where a coordinated development of new sublexicons is sought, but historical databases (e.g., for Semitologists) also stand to benefit from such representations where lexical contacts across languages are taken into account.

When Raffaello nested relations are used to represent a variety of information on terminology and lexicography, the schema of attributes is quite flexible, and alternative equivalent subschemata are possible. The approach is incremental, in the sense that not only can the database itself be augmented with new data (that is, additions to the "object-level" of the database), but new attributes and new representation syntax can also be added (that is, additions to the "meta-level" of the database).

The metadatabase, that is to say, the metaschema in which the schema of the database is specified, is a sequence (within which order is immaterial) of rules, each specifying which structures can be nested under the attribute which lends the rule its name and appears in its left-hand side. The programming language of the metaschema is CuProS, and strictly speaking users don't even have to learn it, as they can recycle extant (possibly superabundant) schemata. By means of the metaschema coded in CuProS, we achieve a separation of:

- the constraints on attribute nesting (and thus, on thematic coherence inside chunks of represented knowledge), from

- the description of the semantics of attributes (data-types, etc.: complex semantics, in turn, can be described in meta-level nested relations, associated with attributes).
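The role of the metaschema can be illustrated with a small sketch in Python. This is emphatically not CuProS syntax, which is not reproduced here. The dictionary `METASCHEMA`, the function `violations`, and every attribute name other than LEXICAL-ENTRY and SEMANTIC-SHIFT are assumptions introduced for the example. Each rule is named after an attribute and lists what may be nested under it, and the order of the rules is immaterial.

```python
# Hypothetical metaschema, written as a Python dict rather than in CuProS.
# Each rule is named after an attribute and lists what may be nested under it.
# Only LEXICAL-ENTRY and SEMANTIC-SHIFT are attribute names from the paper;
# the other names are invented for this illustration.
METASCHEMA = {
    "FRAME": {"KIND", "LEXICAL-ENTRY", "SEMANTIC-SHIFT"},
    "LEXICAL-ENTRY": {"LANGUAGE", "TERM"},
    "SEMANTIC-SHIFT": {"POINTER"},
}


def violations(attribute, content, metaschema, path=()):
    """Return the paths of attributes nested where no rule allows them."""
    here = path + (attribute,)
    if not isinstance(content, dict):
        return []  # a terminal value: nothing is nested under it
    allowed = metaschema.get(attribute, set())
    found = []
    for child, sub in content.items():
        if child in allowed:
            found.extend(violations(child, sub, metaschema, here))
        else:
            found.append("/".join(here + (child,)))
    return found


# Object-level data: a frame that uses an attribute the metaschema lacks.
frame = {
    "KIND": "semantics-to-lexicon",
    "LEXICAL-ENTRY": {"LANGUAGE": "English", "TERM": "guava"},
    "BIBLIOGRAPHY": "Johnson 1939",
}

before = violations("FRAME", frame, METASCHEMA)

# Meta-level addition: extend the rule for FRAME with a new attribute.
METASCHEMA["FRAME"].add("BIBLIOGRAPHY")
after = violations("FRAME", frame, METASCHEMA)

print(before)
print(after)
```

The sketch checks only the nesting constraints. The semantics of each attribute (data types and so on) would be described separately, which is the separation stated above. A Python dict cannot repeat an attribute at the same level, so this is a simplification of real nested relations.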

Raffaello is actually more general than its application to terminology, but its most significant use to date has indeed been in this area.

The Multilingual Database Schema

In the database of the Onomaturge expert system for word formation, deeply nested relations, often large (tens or on occasion even hundreds of lines of code), represent chunks of knowledge associated with lexical entries, and possibly also with morphological word-formation rules. The schema of the lexicographic database can subserve more than just the word-formation task of the control component of Onomaturge. For example, I eventually developed a representational approach to (qualitative rather than word-count) expectations about how widespread knowledge about semantic or encyclopedic objects (or even terminological objects) is across such social categories as age groups, professional environments, and so on (Nissan 1995, 1987b).

As to the schema for multilingual lexical representation, it was developed from the turn of the decade. It includes an architecture, a representation in terms of graphs, and an attribute-nesting schema, to subserve a knowledge-base on a set of languages related either phylogenetically or by lexical borrowing. The exemplification has in the main concerned Semitic languages. The schema developed for this project accounts for the relation between constellations of semantic concepts and constellations of derivatives (related by roots and morphologic derivation patterns, parallel because of cognacy in a given family of languages).

Structures in my multilingual, nested-relation knowledge-base schema comprise the following (while not necessarily all of them):

- lexical frames, of:
  - roots,
  - families of roots, or
  - derivatives,
- semantics-to-lexicon frames,
- purely semantic/encyclopedic frames,
- semantic-shift frames (accounting for the historical evolution of what a given term signifies),
- onomasiological relations (that state how a given concept is expressed, even metaphorically, in various languages or possibly dialects),
- frames of morphological formation patterns,

and optionally also:

- frames of languages (or dialects, or strata),
- frames of textual corpora,
- frames of speaking communities (ethnographic frames),
- bibliographical frames.

Moreover, other kinds of knowledge representation can be incorporated in the nested relations, especially charts, or else partitioned semantic networks, a widespread formalism from artificial intelligence. This may be appropriate for expressing complex information that would not fit easily in a preordained standard pattern.

Of the charts in particular, I devised a kind that is suited for expressing the relation between roots and morphological derivatives, and the respective semantic concepts they convey, in the perspective of historical dictionaries for a given language family. Such charts can be drawn, then translated into the database representation in frames (here, this term is being used interchangeably with "nested relations").

Let each sheet of the graph representation be subdivided into an upper part, reserved for the universe of semantic concepts, and a lower part, reserved for the lexicon, that is, the union of the set of lexical roots and of the set of derivatives (or terms in general). In the top part of a sheet, semantic concepts are graphically enclosed in "clouds". Pairs of clouds can be united by "semantic edges". If the label of a semantic edge is the symbol in which an S surmounts a ~ sign, then the edge is nondirected and symbolizes semantic equivalence (a tentative notion, as synonymy can be found only in context, not in isolation). More generally, a label consisting of the symbol in which an S surmounts two superposed ~ signs means "semantically related". All other semantic edges are arcs (i.e., directed edges), and indicate semantic shift, whose nature can be expressed by a label.

Unenclosed strings in the lexical part of the sheet indicate lexical entries: either terms or roots. If information is associated with a given string, it may be convenient to enclose them together in a box. Such lexical objects are connected to a semantic concept (a "cloud") by a nondirected edge, termed a "lexical/semantic edge". Arcs from roots to their derivatives ("derivation arcs") are contained in the lexical part of the sheet.

Besides, lexical objects found there can be connected to each other by nondirected edges, labelled either by a symbol in which an M surmounts a ~ sign (for "morphological equivalence"), or by a symbol in which an M surmounts two superposed ~ signs (for "morphological near-equivalence").

In the semantic part of the sheet, arcs indicating semantic shift (or, in general, semantic aspects of etymology) between concepts are drawn as a dashed line, possibly labelled:

1) with a capitalized string explaining the kind of shift (such as ANALOGY, and the like); or

2) with a lower-case string corresponding to the name of an intermediate concept; or

3) with both (e.g., ANALOGY followed by the name of a concept that is the feature involved in the analogy).
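The kinds of nodes and edges just described can be summarised as a data structure. The following Python sketch is one possible encoding, not the representation used in Raffaello: the class names `EdgeKind`, `Edge` and `Sheet`, the field names, and the endpoint checks are all assumptions. The sample data reuse the shift of "musculus" from "little mouse" to "muscle" mentioned later in this paper; labelling that shift ANALOGY is likewise an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    SEMANTIC_EQUIVALENCE = "S over ~"         # nondirected, cloud to cloud
    SEMANTICALLY_RELATED = "S over ~ ~"       # nondirected, cloud to cloud
    SEMANTIC_SHIFT = "semantic shift"         # directed (dashed arc)
    LEXICAL_SEMANTIC = "lexical/semantic"     # nondirected, lexicon to cloud
    DERIVATION = "derivation"                 # directed, root to derivative
    MORPH_EQUIVALENCE = "M over ~"            # nondirected, within the lexicon
    MORPH_NEAR_EQUIVALENCE = "M over ~ ~"     # nondirected, within the lexicon


DIRECTED = {EdgeKind.SEMANTIC_SHIFT, EdgeKind.DERIVATION}
SEMANTIC = {
    EdgeKind.SEMANTIC_EQUIVALENCE,
    EdgeKind.SEMANTICALLY_RELATED,
    EdgeKind.SEMANTIC_SHIFT,
}


@dataclass(frozen=True)
class Edge:
    kind: EdgeKind
    source: str
    target: str
    shift_kind: str = ""    # capitalised label, e.g. "ANALOGY"
    via_concept: str = ""   # lower-case name of an intermediate concept

    @property
    def directed(self):
        return self.kind in DIRECTED


@dataclass
class Sheet:
    concepts: set = field(default_factory=set)   # upper part: "clouds"
    roots: set = field(default_factory=set)      # lower part: roots
    terms: set = field(default_factory=set)      # lower part: derivatives/terms
    edges: list = field(default_factory=list)

    def add_edge(self, edge):
        """Append an edge after checking that its endpoints suit its kind."""
        lexicon = self.roots | self.terms
        if edge.kind in SEMANTIC:
            ok = edge.source in self.concepts and edge.target in self.concepts
        elif edge.kind is EdgeKind.LEXICAL_SEMANTIC:
            # By convention here, the lexical object is stored as the source.
            ok = edge.source in lexicon and edge.target in self.concepts
        elif edge.kind is EdgeKind.DERIVATION:
            ok = edge.source in self.roots and edge.target in self.terms
        else:
            ok = edge.source in lexicon and edge.target in lexicon
        if not ok:
            raise ValueError(f"endpoints do not fit edge kind {edge.kind.name}")
        self.edges.append(edge)


sheet = Sheet(concepts={"little mouse", "muscle"}, terms={"musculus"})
sheet.add_edge(Edge(EdgeKind.LEXICAL_SEMANTIC, "musculus", "little mouse"))
sheet.add_edge(Edge(EdgeKind.LEXICAL_SEMANTIC, "musculus", "muscle"))
# The ANALOGY label is assumed for illustration only.
sheet.add_edge(Edge(EdgeKind.SEMANTIC_SHIFT, "little mouse", "muscle",
                    shift_kind="ANALOGY"))
```

The two optional fields of `Edge` cover the three labelling options listed above: a kind of shift, an intermediate concept, or both.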

In our graph representation, we distinguish between

- an upper, cursory level: a graph depicting relations in an aggregated way, including roots but excluding derivatives; this being useful for gaining insight at an introductory level; and

- a detail level: the root's derivatives, too, are included in the lexical part of the sheet, while the semantic part of the sheet depicts also semantic shifts.
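The relation between the two levels can be sketched as a projection from the detail level to the cursory one. The Python fragment below is an assumption-laden illustration: the dictionary layout, the edge-kind strings, the function name `cursory_view`, and the placeholder node names are all hypothetical. It also reads the description above as implying that semantic-shift arcs appear only at the detail level.

```python
def cursory_view(sheet):
    """Derive a cursory-level sheet from a detail-level one.

    Derivatives are dropped, together with every edge touching them;
    semantic-shift arcs are dropped too (an interpretation, see lead-in).
    """
    hidden = sheet["derivatives"]
    edges = [
        e for e in sheet["edges"]
        if e["kind"] != "semantic-shift"
        and e["source"] not in hidden
        and e["target"] not in hidden
    ]
    return {
        "concepts": set(sheet["concepts"]),
        "roots": set(sheet["roots"]),
        "derivatives": set(),
        "edges": edges,
    }


# Placeholder names only; no real lexical data are implied.
detail = {
    "concepts": {"concept-1", "concept-2"},
    "roots": {"root-1"},
    "derivatives": {"derivative-1"},
    "edges": [
        {"kind": "lexical/semantic", "source": "root-1", "target": "concept-1"},
        {"kind": "derivation", "source": "root-1", "target": "derivative-1"},
        {"kind": "lexical/semantic", "source": "derivative-1", "target": "concept-2"},
        {"kind": "semantic-shift", "source": "concept-1", "target": "concept-2"},
    ],
}

cursory = cursory_view(detail)
print(cursory["edges"])
```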

In the short compass of this paper, it would not be possible to explain in detail such a graphic representation, let alone the variety of kinds of frames corresponding to different kinds of information or knowledge about terminology.

Anyway, let it suffice here to point out the difference between:

- an encyclopedic entry for a lexicalised semantic concept, such as the sample frame shown in Appendix A for "guava"; and

- frames that properly describe terminology itself.

This way, an onomasiological (semantics-to-lexicon) frame also named "guava" may include a property stating which kind of frame it is (namely, a semantics-to-lexicon frame), and then a chunk (i.e., a subtree) which, under the header (i.e., the attribute) LEXICAL-ENTRY, would state what a guava is called in one or more languages.

Moreover, another chunk in the same frame could, under the header (i.e., the attribute) SEMANTIC-SHIFT, include pointers to other small frames, each describing a particular pattern of semantic shifts:

- the name for a pear eventually coming to denote "guava" (this is the case in the Swahili dictionary [Johnson 1939]);

- the name for an olive being used to denote "guava" (this occurs in Omani and other Arabic dialects from the south of the Arabian peninsula [Serjeant 1988]);

- the name for a guava being used to mean a lie (it happens in some Latin American varieties of Spanish);

- a diminutive from the name for a guava being used to mean a kind of a shrub: in Cuban Spanish, a "guayabito" is a shrub whose leaves resemble those of the guava shrub, and whose fruit has the size of a cherry, but the same name also came to mean "little mouse".
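The frame just described, with its pointers to small semantic-shift frames, might be laid out as follows. This Python sketch is not the Raffaello syntax. Only the attribute names LEXICAL-ENTRY and SEMANTIC-SHIFT come from the text; the attributes KIND, LANGUAGE, TERM, FROM, TO and WHERE, the frame identifiers, and the helper `resolve_shifts` are assumptions introduced for illustration.

```python
# A hypothetical store of frames, keyed by frame name.
FRAMES = {
    "guava": {
        "KIND": "semantics-to-lexicon",
        "LEXICAL-ENTRY": [
            {"LANGUAGE": "English", "TERM": "guava"},
        ],
        # Pointers (here simply names) to small semantic-shift frames.
        "SEMANTIC-SHIFT": [
            "shift-pear-to-guava",
            "shift-olive-to-guava",
            "shift-guava-to-lie",
            "shift-guava-diminutive-to-shrub",
        ],
    },
    "shift-pear-to-guava": {
        "FROM": "pear", "TO": "guava", "WHERE": "Swahili",
    },
    "shift-olive-to-guava": {
        "FROM": "olive", "TO": "guava",
        "WHERE": "Omani and other southern Arabian dialects of Arabic",
    },
    "shift-guava-to-lie": {
        "FROM": "guava", "TO": "lie",
        "WHERE": "some Latin American varieties of Spanish",
    },
    "shift-guava-diminutive-to-shrub": {
        "FROM": "guava (diminutive: guayabito)", "TO": "a kind of shrub",
        "WHERE": "Cuban Spanish",
    },
}


def resolve_shifts(name, frames):
    """Follow the SEMANTIC-SHIFT pointers of a frame and return the targets."""
    return [frames[pointer] for pointer in frames[name].get("SEMANTIC-SHIFT", [])]


for shift in resolve_shifts("guava", FRAMES):
    print(shift["FROM"], "->", shift["TO"], "(", shift["WHERE"], ")")
```

Keeping each shift in its own small frame lets several concept frames point to the same pattern of shift.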

In turn, the lexical semantic concept "mouse" is involved in other semantic shifts in various languages: in the Emilia-Romagna region of Italy, "musclen", formed as a diminutive which literally means "little mouse", names the kiwi fruit, which quite evidently happened in the last few decades. Better known is the shift of the Latin noun "musculus" from the sense "little mouse" to the sense "muscle", with a calque (`akhbar) being attempted in mediaeval Hebrew medical terminology, whereas in a successful shift in contemporary Hebrew the same term, `akhbar (literally: "mouse"), denotes, like the English "mouse", the piece of computing equipment. For Luganda, a language of Uganda, the Kitching and Blackledge Luganda-English dictionary lists a zoonym for "chameleon" that also means "muscle". For the semantic shift of a term from the sense "muscle" into the sense that is subordinate to "reptilian", the classical example in onomasiology is the Latin "lacerta" (i.e., "lizard" and "arm muscle").

Representing this information by using my schema of representation is straightforward. In the forthcoming monograph, it is shown how less picturesque kinds of information that arguably are more central to the usual concerns of the developers of terminological databases can also be represented handily by resorting to the same approach. For example, Appendix B of the present paper shows the CuProS code of the metarepresentation of frames of lexical roots; the syntax cannot be explained in full here, but comments are included.

Appendix A: A semantic/encyclopedic frame for the lexical concept "guava"

Lay or technical common-sense features of guava fruits and plants can be represented as in the nested relation shown here.

N.B.: whatever follows a semicolon on a line is just comment.

Appendix B: The CuProS code of the metarepresentation of frames of lexical roots

Bibliography

D. Geeraerts (1983), "Reclassifying Semantic Change", in Quaderni di Semantica, 4(2), pp. 217-240

F. Johnson (1939), A Standard Swahili English Dictionary, Oxford University Press, Oxford

A.L. Kitching and G.R. Blackledge (1925), A Luganda-English and English-Luganda Dictionary, The Uganda Book Shop, Kampala

E. Nissan (1986), "The Frame-Definition Language for Customizing the RAFFAELLO Structure-Editor in Host Expert Systems", in Proceedings of the First International Symposium on Methodologies for Intelligent Systems (ISMIS'86), Knoxville, Tennessee, ACM SIGART Press, Z. Ras and M. Zemankova, eds., New York, pp. 8-18

E. Nissan (1987), "Nested-Relation Based Frames in RAFFAELLO. Representation & Meta-representation Structure & Semantics for Knowledge Engineering", International Workshop on Theory and Applications of Nested Relations and Complex Objects, Darmstadt, Germany, INRIA, France, pp. 95-99

E. Nissan (1987), "Knowledge Acquisition and Metarepresentation: Attribute Autopoiesis", in Proceedings of the Second International Symposium on Methodologies for Intelligent Systems (ISMIS'87), Charlotte, North Carolina, Z. Ras and M. Zemankova, eds., North-Holland, Amsterdam, pp. 240-247

E. Nissan (1987b), "Exception-Admissibility and Typicality in Proto-Representations", in Proceedings of the First International Conference on Terminology and Knowledge Engineering, Trier, Germany, 1987, H. Czap and C. Galinski, eds., Indeks Verlag, Frankfurt, pp. 253-267

E. Nissan (1992), "Deviation Models of Regulation: A Knowledge-Based Approach", in Informatica e Diritto, Year XVIII (2nd Series, Vol. 1), No. 1/2, pp. 181-212

E. Nissan (1995), "Meanings, Expression, and Prototypes", in Pragmatics and Cognition, 3(2), pp. 317-364

E. Nissan and H. Weiss (1995), "The HyperJoseph Project. Part B: A Representation Syntax for Intertextuality, that Takes into Account Translation, Editing, and the Page Layout of Given Editions", in Proceedings of the Fourth International Conference on Bible and Computers (AIBI'94), Amsterdam, August 15-18, 1994, F. Poswick, ed., Geneva & Paris: Champion-Slatkine, for Maredsous (Belgium): Association Internationale Bible et Informatique, pp. 163-173

M.Z. Özsoyoglu (1988), ed., Special issue on Nested Relations, The IEEE Data Engineering Bulletin, 11(3), IEEE

R. Sappan (1987), The Logical-Rhetorical Classification of Semantic Changes, Braunton, Merlin

R.B. Serjeant (1988), Review of The Spoken Arabic of Khabura on the Batina of Oman, by A.A. Brockett (Journal of Semitic Studies Monograph, 7. University of Manchester, 1985). Journal of Semitic Studies, 33(1), pp. 146-149
