An introduction to using the CuProS metarepresentation language for defining flexible nested-relation structures for monolingual and multilingual terminological databases
Ephraim Nissan
School of Computing and Mathematical Sciences, University of Greenwich
Abstract
Introduction
In a sense, this paper is a prelude intended to convey the flavour of a forthcoming monograph of mine, now almost completed and tentatively titled Raffaello and CuProS for Structuring Terminological Databases Flexibly. A separate paper in this volume describes Onomaturge, an expert system incorporating knowledge on word formation for the purposes of neologisation. Alongside that project, a suite of tools for managing its lexical database was developed and named Raffaello. The lexical database of Onomaturge was represented in nested relations (coded as embedded parentheses): in practice, a tree of properties whose depth (i.e., number of levels) is limited only by practical considerations. That paper gives examples as illustrations. The twin subjects of the forthcoming monograph, as preliminarily outlined in the present paper, are:
One of the lectures at the 1999 EAFT Conference in Paris described an ongoing project to build a terminological database for the languages of Scandinavia and the contiguous regions. More than one language family is involved, yet the far-reaching implications of phylogenetic cognacy among sets of languages are apparent, notwithstanding the degrees of freedom afforded by the decisions of separate terminological committees. Lexical databases for language families are of particular interest to, for example, Romance terminologists seeking a coordinated development of new sublexicons, but historical databases (e.g., for Semitologists) also stand to benefit from such representations when lexical contacts across languages are taken into account. When Raffaello nested relations are used to represent a variety of information on terminology and lexicography, the schema of attributes is quite flexible, and alternative equivalent subschemata are possible. The approach is incremental: not only can the database itself be augmented with new data (that is, additions to the "object level" of the database), but new attributes and new representation syntax can also be added (that is, additions to the "meta-level" of the database). The metadatabase, that is, the metaschema in which the schema of the database is specified, is a sequence of rules (within which order is immaterial), each specifying which structures can be nested under the attribute that gives the rule its name and appears in its left-hand side. The programming language of the metaschema is CuProS; strictly speaking, users do not even have to learn it, as they can recycle extant (possibly superabundant) schemata. By means of the metaschema coded in CuProS, we achieve a separation of:
The Multilingual Database Schema
In the database of the Onomaturge expert system for word formation, deeply nested relations, often large (tens or occasionally even hundreds of lines of code), represent chunks of knowledge associated with lexical entries, and possibly also with morphological word-formation rules. The schema of the lexicographic database can serve more than just the word-formation task of the control component of Onomaturge. For example, I eventually developed a representational approach to (qualitative rather than word-count) expectations about how widespread knowledge about semantic or encyclopedic objects (or even terminological objects) is across such social categories as age groups, professional environments, and so on (Nissan 1995, 1987b). The schema for multilingual lexical representation was developed from the turn of the decade. It includes an architecture, a representation in terms of graphs, and an attribute-nesting schema, serving a knowledge base on a set of languages related either phylogenetically or by lexical borrowing. The examples have mainly concerned Semitic languages. The schema developed for this project accounts for the relation between constellations of semantic concepts and constellations of derivatives (related by roots and morphological derivation patterns, which are parallel because of cognacy within a given language family). Structures in my multilingual, nested-relation knowledge-base schema comprise the following (though not necessarily all of them):
and optionally also:
Moreover, other kinds of knowledge representation can be incorporated in the nested relations, especially charts, or else partitioned semantic networks, a widespread formalism from artificial intelligence. This may be appropriate for expressing complex information that would not fit easily into a preordained standard pattern. As for charts, I devised a kind suited to expressing the relation between roots and morphological derivatives, and the respective semantic concepts they convey, from the perspective of historical dictionaries for a given language family. Such charts can be drawn, then translated into the database representation in frames (here, this term is used interchangeably with "nested relations").
Let each sheet of the graph representation be subdivided into an upper part, reserved for the universe of semantic concepts, and a lower part, reserved for the lexicon, that is, the union of the set of lexical roots and the set of derivatives (or terms in general). In the top part of a sheet, semantic concepts are graphically enclosed in "clouds". Pairs of clouds can be joined by "semantic edges". A nondirected semantic edge labelled with a symbol in which an S surmounts a ~ sign symbolizes semantic equivalence; this is, however, a tentative notion, as synonymy can be found only in context, not in isolation. More generally, a label consisting of an S surmounting two superposed ~ signs means "semantically related". All other semantic edges are arcs (i.e., directed edges) and indicate semantic shift, whose nature can be expressed by a label. Unenclosed strings in the lexical part of the sheet indicate lexical entries: either terms or roots. If information is associated with a given string, it may be convenient to enclose the two together in a box. Such lexical objects are connected to a semantic concept (a "cloud") by a nondirected edge, termed a "lexical/semantic edge".
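To make the sheet conventions concrete, here is a minimal Python sketch, under stated assumptions, that models clouds, lexical entries, semantic edges, and lexical/semantic edges as plain data, and then translates a sheet into one nested frame per concept. It is not Raffaello code: the class `Sheet`, its methods, and the edge-label strings are hypothetical. The frame keys reuse the attribute names LEXICAL-ENTRY and SEMANTIC-SHIFT that the paper uses for onomasiological frames, but the frame layout is an assumption. The usage lines encode the Latin "musculus" shift from "little mouse" to "muscle", which the paper discusses.

```python
from dataclasses import dataclass, field

# Labels for semantic edges, following the drawing conventions in the text.
# The string values themselves are hypothetical, not Raffaello notation.
SEM_EQUIVALENT = "S~"   # nondirected: S over one ~ (tentative equivalence)
SEM_RELATED = "S~~"     # nondirected: S over two superposed ~ (related)
SEM_SHIFT = "shift"     # directed arc: semantic shift from a to b


@dataclass
class Sheet:
    """One sheet: concept clouds above, lexical entries (terms or roots) below."""
    clouds: set = field(default_factory=set)
    lexical: dict = field(default_factory=dict)         # entry -> "term" or "root"
    semantic_edges: list = field(default_factory=list)  # (kind, a, b, label)
    lex_sem_edges: list = field(default_factory=list)   # (entry, cloud)

    def add_semantic_edge(self, kind, a, b, label=None):
        # Semantic edges join two clouds in the upper part of the sheet.
        if a not in self.clouds or b not in self.clouds:
            raise ValueError("semantic edges may only join clouds")
        self.semantic_edges.append((kind, a, b, label))

    def link(self, entry, cloud):
        # A lexical/semantic edge joins a lexical entry to a cloud.
        if entry not in self.lexical or cloud not in self.clouds:
            raise ValueError("a lexical/semantic edge joins an entry to a cloud")
        self.lex_sem_edges.append((entry, cloud))

    def to_frames(self):
        """Translate the sheet into one nested frame per concept (layout assumed)."""
        frames = {}
        for cloud in sorted(self.clouds):
            frames[cloud] = {
                "LEXICAL-ENTRY": sorted(e for e, c in self.lex_sem_edges if c == cloud),
                # Only outgoing shift arcs are recorded; the nondirected
                # edges are left out of this simplified translation.
                "SEMANTIC-SHIFT": [b for k, a, b, _ in self.semantic_edges
                                   if k == SEM_SHIFT and a == cloud],
            }
        return frames


# Usage: Latin "musculus" shifting from "little mouse" to "muscle".
sheet = Sheet(clouds={"little mouse", "muscle"}, lexical={"musculus": "term"})
sheet.add_semantic_edge(SEM_SHIFT, "little mouse", "muscle")
sheet.link("musculus", "little mouse")
sheet.link("musculus", "muscle")
frames = sheet.to_frames()
```

Keeping the two parts of the sheet as separate collections mirrors the upper/lower split of the drawing, and the checks in `add_semantic_edge` and `link` reject edges that would cross the parts in the wrong way.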
Arcs from roots to their derivatives ("derivation arcs") are contained in the lexical part of the sheet. In addition, lexical objects found there can be connected to each other by nondirected edges, labelled either by a symbol in which an M surmounts a ~ sign (for "morphological equivalence"), or by a symbol in which an M surmounts two superposed ~ signs (for "morphological near-equivalence"). In the semantic part of the sheet, arcs indicating semantic shift (or, in general, semantic aspects of etymology) between concepts are drawn as dashed lines, possibly labelled:
In our graph representation, we distinguish between
In the short compass of this paper, it is not possible to explain this graphic representation in detail, let alone the variety of kinds of frames corresponding to different kinds of information or knowledge about terminology. Suffice it here to point out the difference between:
This way, an onomasiological (semantics-to-lexicon) frame, also named "guava", may include a property stating which kind of frame it is (namely, a semantics-to-lexicon frame), and then a chunk (i.e., a subtree) which, under the header (i.e., the attribute) LEXICAL-ENTRY, states how a guava is called in one or more languages. Moreover, another chunk in the same frame could, under the header (i.e., the attribute) SEMANTIC-SHIFT, include pointers to other small frames, each describing a particular pattern of semantic shifts:
In turn, the lexical semantic concept "mouse" is involved in other semantic shifts in various languages. In the Emilia-Romagna region of Italy, "musclen", a diminutive literally meaning "little mouse", names the kiwi fruit, a shift that evidently happened only in the last few decades. Better known is the shift of the Latin noun "musculus" from the sense "little mouse" to the sense "muscle", with a calque (`akhbar) attempted in mediaeval Hebrew medical terminology, whereas in contemporary Hebrew the same term, `akhbar (literally "mouse"), has successfully come to denote, like the English "mouse", the computer pointing device. For Luganda, a language of Uganda, browsing Kitching and Blackledge's (1925) Luganda-English dictionary I found a zoonym for "chameleon" that also means "muscle". For the semantic shift of a term from the sense "muscle" into a sense subordinate to "reptilian", the classical example in onomasiology is the Latin "lacerta" (i.e., "lizard" and "arm muscle"). Representing this information with my representation schema is straightforward. The forthcoming monograph shows how less picturesque kinds of information, arguably more central to the usual concerns of developers of terminological databases, can also be represented handily with the same approach. For example, Appendix B of the present paper shows the CuProS code of the metarepresentation of frames of lexical roots; the syntax cannot be explained in full here, but comments are included.
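The metaschema idea described in the introduction (a set of rules, each naming an attribute in its left-hand side and stating what may be nested under it) can be approximated by the following Python sketch. It is not CuProS, whose syntax this paper does not spell out: the rule table `METASCHEMA`, the function `validate`, and every attribute name other than LEXICAL-ENTRY and SEMANTIC-SHIFT (e.g. FRAME, KIND, LANGUAGE, TERM, POINTER, PLURAL) are hypothetical. Because the rules live in a dictionary, their order is immaterial, and adding a rule entry is a meta-level change that leaves existing object-level data untouched.

```python
# Hypothetical metaschema: each rule maps an attribute (the rule's
# left-hand side) to the attributes that may be nested directly under it.
# A dict makes rule order immaterial, as the text says of CuProS rules.
METASCHEMA = {
    "FRAME": {"KIND", "LEXICAL-ENTRY", "SEMANTIC-SHIFT"},
    "LEXICAL-ENTRY": {"LANGUAGE", "TERM"},
    "SEMANTIC-SHIFT": {"POINTER"},
}


def validate(node, schema, path=()):
    """Return a list of violations found in a nested relation.

    A node is an atom (str) or a pair (attribute, children). Attributes
    with no rule are treated as leaves holding only atoms (an assumption).
    """
    if isinstance(node, str):
        return []
    attribute, children = node
    here = "/".join(path + (attribute,))
    allowed = schema.get(attribute, set())
    errors = []
    for child in children:
        if isinstance(child, str):
            if attribute in schema:  # ruled attributes nest structures, not atoms
                errors.append(f"{here}: atom {child!r} not allowed")
            continue
        if child[0] not in allowed:
            errors.append(f"{here}: {child[0]} not allowed here")
        errors.extend(validate(child, schema, path + (attribute,)))
    return errors


# A toy onomasiological frame; identifiers such as "guava-shift-1" are invented.
guava = ("FRAME", [
    ("KIND", ["semantics-to-lexicon"]),
    ("LEXICAL-ENTRY", [("LANGUAGE", ["English"]), ("TERM", ["guava"])]),
    ("SEMANTIC-SHIFT", [("POINTER", ["guava-shift-1"])]),
])
ok = validate(guava, METASCHEMA)  # []

# Object-level data using an attribute the metaschema does not yet allow.
extended = ("FRAME", [("LEXICAL-ENTRY", [("TERM", ["guava"]), ("PLURAL", ["guavas"])])])
before = validate(extended, METASCHEMA)  # ['FRAME/LEXICAL-ENTRY: PLURAL not allowed here']
METASCHEMA["LEXICAL-ENTRY"].add("PLURAL")  # a meta-level addition
after = validate(extended, METASCHEMA)  # []
```

The last three lines illustrate the incremental approach in miniature: the data are rejected until a single rule is extended, after which they validate without any change to the frame itself.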
Appendix A: A semantic/encyclopedic frame for the lexical concept "guava"
Lay or technical common-sense features of guava fruits and plants can be represented as in the nested relation shown here. N.B.: whatever follows a semicolon on a line is just a comment.
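Since frames are coded as embedded parentheses, with everything after a semicolon on a line being a comment, text in this notation can be read into a tree with a few lines of code. The Python sketch below is an illustration under assumptions, not Raffaello's own reader: it treats any run of characters other than parentheses, whitespace, and semicolons as an atom, ignores string quoting, and uses a small invented sample rather than the guava frame of this appendix. The function names are hypothetical.

```python
import re


def strip_comments(text):
    """Drop everything from a semicolon to the end of each line."""
    return "\n".join(line.split(";", 1)[0] for line in text.splitlines())


def tokenize(text):
    # Parentheses are tokens; any other run of non-space characters is an atom.
    return re.findall(r"[()]|[^\s()]+", strip_comments(text))


def parse(tokens):
    """Consume one expression from the token list; lists become Python lists."""
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # an atom
    items = []
    while True:
        if not tokens:
            raise ValueError("missing ')'")
        if tokens[0] == ")":
            tokens.pop(0)  # discard the closing parenthesis
            return items
        items.append(parse(tokens))


def read_frame(text):
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("empty input")
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after the frame")
    return tree


SAMPLE = """
(guava                       ; hypothetical sample, not the appendix frame
  (KIND semantics-to-lexicon)
  (LEXICAL-ENTRY (English guava)))
"""
tree = read_frame(SAMPLE)
```

Here `tree` is a list whose head is the frame name and whose remaining elements are the nested attribute subtrees, so the depth of nesting in the text carries over directly to the depth of the Python structure.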
Appendix B: The CuProS code of the metarepresentation of frames of lexical roots
Bibliography
D. Geeraerts (1983), "Reclassifying Semantic Change", in Quaderni di Semantica, 4(2), pp. 217-240
F. Johnson (1939), A Standard Swahili English Dictionary, Oxford University Press, Oxford
A.L. Kitching and G.R. Blackledge (1925), A Luganda-English and English-Luganda Dictionary, The Uganda Book Shop, Kampala
E. Nissan (1986), "The Frame-Definition Language for Customizing the RAFFAELLO Structure-Editor in Host Expert Systems", in Proceedings of the First International Symposium on Methodologies for Intelligent Systems (ISMIS'86), Knoxville, Tennessee, Z. Ras and M. Zemankova, eds., ACM SIGART Press, New York, pp. 8-18
E. Nissan (1987), "Nested-Relation Based Frames in RAFFAELLO. Representation & Meta-representation Structure & Semantics for Knowledge Engineering", International Workshop on Theory and Applications of Nested Relations and Complex Objects, Darmstadt, Germany, INRIA, France, pp. 95-99
E. Nissan (1987), "Knowledge Acquisition and Metarepresentation: Attribute Autopoiesis", in Proceedings of the Second International Symposium on Methodologies for Intelligent Systems (ISMIS'87), Charlotte, North Carolina, Z. Ras and M. Zemankova, eds., North-Holland, Amsterdam, pp. 240-247
E. Nissan (1987b), "Exception-Admissibility and Typicality in Proto-Representations", in Proceedings of the First International Conference on Terminology and Knowledge Engineering, Trier, Germany, 1987, H. Czap and C. Galinski, eds., Indeks Verlag, Frankfurt, pp. 253-267
E. Nissan (1992), "Deviation Models of Regulation: A Knowledge-Based Approach", in
E. Nissan (1995), "Meanings, Expression, and Prototypes", in Pragmatics and Cognition, 3(2), pp. 317-364
E. Nissan and H. Weiss (1995), "The HyperJoseph Project. Part B: A Representation
M.Z. Özsoyoglu (1988), ed., Special issue on Nested Relations, The IEEE Data Engineering Bulletin, 11(3), IEEE
R. Sappan (1987), The Logical-Rhetorical Classification of Semantic Changes, Braunton, Merlin
R.B. Serjeant (1988), Review of The Spoken Arabic of Khabura on the Batina of Oman, by A.A. Brockett (Journal of Semitic Studies Monograph, 7. University of Manchester, 1985), Journal of Semitic Studies, 33(1), pp. 146-149