Using Natural Language in Software Development

This paper has been submitted for publication in the Journal of Object-Oriented Programming - Report on Object Analysis and Design. Once published, copyright transfers to that journal. Authorization for copies should be obtained there. This early draft is being made available for professional peer communication.

Nik Boyd

Software developers translate ideas into code. However, a treacherous chasm yawns early in the software development process. On one side of this gap is the natural language used to describe customer problems and solution usage requirements. On the other side are the formal languages used in software development for analysis, design and programming. How can we bridge this gap? This paper describes techniques for transforming natural language into rhetoric suitable for conceptual modeling. These techniques help fill the gap between the informal natural language used to describe problems and the formal modeling languages used to specify software solutions.

First, the nature of the problem is considered - the ambiguity and vagueness inherent in natural language and the need for precision and comprehension in formal models. Next, the value of bridging the gap between informal natural language and formal conceptual models is considered - the need for commonality among stakeholders and developers, and the preparation of problems and requirements for requirements management and object-oriented analysis. Previous work regarding the use of natural language in software design is surveyed and considered. Further uses for natural language in software development are then proposed - a set of goals and guiding principles, a framework for applying linguistic metaphors to software design, and some initial suggestions for analyzing natural language to develop conceptual models. Some final observations and suggestions for further investigation are offered in conclusion.

THE DESCRIPTION OF PROBLEMS AND REQUIREMENTS

Software development invariably begins with some human need or desire - to explore or solve a problem, to automate a business process, to entertain, to educate, to communicate, to transact and track commerce, to explore and share information and knowledge, etc. The human needs that have been touched by computers are truly varied and numerous. However, we often have difficulty describing precisely what we want and need computers to do, and why. So, we must first serve the human need to express and understand problems. After all, how can we hope to solve a problem without first making it intelligible?

We use natural language to describe our needs and problems, but it's often complex, vague and ambiguous. Sentences are complex when they contain clauses and phrases that describe and relate several objects, conditions, events and/or actions. Sentences are vague when they contain generalizations, or they are missing important information, especially the subject or objects needed by a verb for completeness. Sentences are ambiguous when they are open to multiple interpretations. All these troubles arise (naturally) when we discuss our needs and problems using natural language.

On the other hand, software requires more precision, formality and simplicity than is commonly found in natural language. Computer programming languages typically limit expressions to a few simple computational primitives and their combinations. Computational primitives are often hardwired into the syntax of a programming language. Software components and classes can be built from these primitives, but their construction is typically tedious and error-prone. Also, the translation from requirements to code often reduces, eliminates or distorts much of the original meaning intended by the requirements.

Given these factors, it's not surprising that the translation of natural language descriptions into usable software poses quite a challenge. We need ways to reduce ambiguity and complexity without sacrificing the richness and meaning of natural language. We need to be able to clearly and consistently express and share our intentions: our purposes, objectives and goals, and the obstacles and problems that provide opportunities for software-based solutions. Ideally, programming languages will eventually increase our ability to approximate the expressive power of natural language. Meanwhile, we need ways to systematically map our rich and meaningful ideas into the limited forms of expression supported by programming languages.

DEVELOPMENT REQUIRES EFFECTIVE COLLABORATION

Many people collaborate to specify and build software systems. Stakeholders include customers, sponsors, domain experts, and system users. Developers include analysts, designers, programmers, and quality assurance staff. Managers include product managers, project managers, and quality managers. The people who participate in problem and solution specification may play many of these roles, but they always share one need: a common understanding, i.e., shared mental models of domain problems and of how software solves them.

The initial stages of problem specification usually involve the various stakeholders and a requirements analyst. Each stakeholder has a viewpoint and opinions regarding the nature of the problem(s) to be solved and the needs to be satisfied by the software solution. All these (sometimes disparate) viewpoints need inclusion, reconciliation and representation in the problem specification. The requirements analyst often serves as a mediator among the stakeholders to establish consensus regarding needs and priorities. Then, the requirements analyst inevitably serves as a translator between the stakeholders and the software developers, who often speak very different languages. Stakeholders speak in terms of business needs, goals and perceivable functionality. Software developers speak in terms of models, design and programming, size, effort, and schedule.

It is a responsibility of the requirements analyst to understand the limitations of modeling formalisms and natural language. The requirements analyst can help stakeholders reshape their thinking to achieve their desired outcomes, often by rephrasing problems and reframing¹ the contexts within which those problems are embedded. Requirements analysts need to be able to explore alternatives for modeling, but stakeholders must validate that the analytic conceptual models match their mental models. So, we still need natural language for the humane expression of those models. Problems and requirements begin with natural language. For the benefit of (often non-technical) stakeholders, they must ultimately be expressed in natural language as well, even though they must also be analyzed and formalized for translation into software.

The importance of effective communication between all the participants involved with information systems cannot be overstated. The quality of the communication between people determines their commonality² - the degree to which they share mental models of domain problems and solution usage requirements (Figure 1). Commonality ultimately determines the quality of the resulting software solution - the degree to which the software solves the domain problems and satisfies the usage requirements.

Commonality supports software quality by ensuring that the people involved in specifying a problem and those involved in solving the problem share a common understanding of the problem. Without commonality, the likelihood of developing a satisfactory and usable software solution is drastically reduced. So, stakeholders and developers need to share overall and detailed mental models of the relevant problem elements, issues and rules, as well as the objectives and goals that need to be satisfied by a software solution.

The kinds of linguistic analysis discussed in this paper are based on the traditional word categories found in English grammar.³ Through sentence and word analysis of ordinary informal discussions, we can extract a great deal of information useful to object-oriented software designs. We can rephrase sentences so that they are meaningful on multiple levels and to multiple audiences, including the various kinds of stakeholders and software developers.

Figure 1. Commonality

With suitable constraints on syntax, natural language can serve as a rich source for natural conceptual models. Natural conceptual models retain much of the character of the natural language from which they were derived. Natural conceptual models can help people share and compare their mental models, and thereby help them establish commonality with respect to a particular problem domain. Conceptual models (in general) are especially valuable for large systems (they help manage complexity), for systems that involve medium to large teams, and for projects and products that endure for several years (they help a business transfer knowledge and train staff).

Linguistic analysis and conceptual models can also dramatically ease the process of object-oriented analysis. Conceptual models provide a preliminary vocabulary and relational models for the elements of a problem domain. In turn, these models ease the task of object-oriented analysis - the analysis work becomes more a matter of organization and selection than of raw object discovery. Also, conceptual models provide a link from the object-oriented analysis models back to the source material found in the original problem descriptions and solution usage requirements. This link directly supports greater traceability - from code back through design and analysis models, back through conceptual models to the problem descriptions and usage requirements.

Do Methodologies Address the Problem?

Software development methodologies (SDMs) typically begin only after a problem has been articulated (Figure 2). Each SDM prescribes a relatively mechanical process for transforming requirements into code. So, SDMs tend to be solution oriented rather than problem oriented. While SDMs may recommend the documentation of problems and requirements, they seldom provide any guidance regarding how to describe a problem in the first place. In this regard, object-oriented methods are no better than structured methods.

SDMs usually presume that a problem to be solved and the uses of a software solution have been well articulated. Waterfall SDMs further presume that the requirements will not change substantially between the time they were collected and the completion of system development. However, many years of industry experience have shown both these presumptions to be flawed (in both theory and practice). More often than not, problem descriptions and usage requirements are informal, imprecise, incomplete, and/or internally inconsistent. Even when a problem has been relatively well articulated, usage requirements on a software solution often change, especially in business domains where products and services must quickly respond to customer demands and competitive market pressures.

Figure 2. Missing Steps

If anything, these factors accentuate the need for requirements elicitation, analysis and management. Requirements provide the baseline against which software developers can measure their solution to determine whether it solves the problems and satisfies the needs of stakeholders. Largely as a result of these factors, requirements management has become an important area of practice for software development. The Software Engineering Institute (SEI) has characterized requirements management as key process area (KPA) #1 in its Capability Maturity Model (CMM).⁴ Requirements management provides the basis for making a software process repeatable, so it serves as the foundation on which the remainder of the CMM rests. Any methodology that does not explicitly address requirements and that does not include requirements management is fundamentally incomplete. It does not address the real problem of software development: how do we engineer customer satisfaction? At least part of the answer must be to build our models using the customers' language - i.e., their vocabulary and mental models. So, natural conceptual models can serve as a good starting point for requirements management, because they add formality over natural language while retaining proximity to their natural language origins.

RELATED WORK

In the past, the use of natural language elements to influence software design has been largely tacit and informal. During the 1970s, Smalltalk⁵ appeared and offered the opportunity for explicitly "object-oriented" programming. Smalltalk was derived from earlier programming languages, like Simula,⁶ that focused on objects, especially objects that were independent of particular programs (i.e., they were reusable). The extensible keyword syntax of Smalltalk directly supports the modeling of natural language expressions, especially in languages that use a subject - verb - object (SVO) word order. An example of this capability is provided later in this paper.

In the late 1970s, Halstead⁷ analyzed English prose for operands and operators in his pioneering work on software science. Unfortunately, rather than taking a fresh look at the classification problem, Halstead's categorization of operands and operators from natural language elements was based on previous work by Miller, Newman and Friedman.⁸ Also, his categorization of numbers was (admittedly) inconsistent. In spite of these limitations, his explorations raised the possibility that software theory and quantitative analysis might extend to natural language, and they suggested the existence of a mapping from natural language to computational primitives (operands and operators).

In the early 1980s, Abbott⁹,¹⁰ proposed an approach to Ada program design based on linguistic analysis of informal strategies written in English. Thereafter, Booch¹¹,¹² extended this approach into an explicitly object-oriented design process. Abbott's approach involved developing an informal strategy using natural language and then formalizing the strategy by identifying the data types, objects (variables of those types), and operators (applied to those objects). Congruent with the object-oriented approach, Abbott's work focused on the use of nouns and noun phrases as references in natural language, especially common nouns, proper nouns, mass nouns, and their units of measure. Common nouns suggest data types (i.e., object classes). Proper nouns and references suggest objects. Verbs, attributes, predicates, and descriptive expressions suggest operators. Control structures are suggested by English phrases using if, then, else, for, do, until, when, etc. Abbott's work provided an initial set of heuristics for mapping natural language elements to operands and operators (i.e., objects and their methods). He concluded the following:

"The main lesson to be learned from this exercise is that programs can be developed in terms that match intuitive data types and operators. The concepts used to understand a problem originally are generally the best concepts to use to write the program to solve the problem. This is not to say that the first idea one has is necessarily the best approach to take. It is often the case that one's original idea for an algorithm can be greatly improved. Nevertheless, it is usually a good idea to identify and formalize the intuitive concepts, that is, data types, with which the program is concerned."
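Abbott's heuristics amount to a lookup from word categories to suggested design elements. A minimal sketch follows, assuming GNU Smalltalk 3.x; the category symbols (#commonNoun, #controlWord, and so on) and the sample sentence are hypothetical, and the words are tagged by hand rather than by any parser.

```smalltalk
"Abbott's heuristics as a lookup table (GNU Smalltalk sketch).
 Category symbols are hypothetical; words are tagged by hand."
| heuristics tagged suggestions |
heuristics := Dictionary new.
heuristics at: #commonNoun put: 'data type (class)'.
heuristics at: #properNoun put: 'object'.
heuristics at: #verb put: 'operator (method)'.
heuristics at: #predicate put: 'operator (method)'.
heuristics at: #controlWord put: 'control structure'.

"Hand-tagged words from: if the balance covers the bill, pay GTE."
tagged := #(#('if' #controlWord) #('balance' #commonNoun)
            #('covers' #predicate) #('bill' #commonNoun)
            #('pay' #verb) #('GTE' #properNoun)).

"Pair each word with the design element its category suggests."
suggestions := tagged collect: [:pair |
    (pair at: 1) -> (heuristics at: (pair at: 2)
                         ifAbsent: ['no suggestion'])].

"Show each word beside its suggested design element."
suggestions do: [:each |
    Transcript showCr: each key , ' -> ' , each value].
```

Unknown categories fall back to 'no suggestion', in keeping with the point that heuristics only suggest; a person still decides which concepts to formalize.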

This paper also supports the use of natural language as the conceptual basis for software development. Retaining as much as possible of the syntax and semantics of natural language problem descriptions in our software models has clear advantages for explaining those models and for transferring to others the original domain knowledge upon which they were based. We can also extend Abbott's admonition against the naïve use of natural language. The initial description of a problem may need to be examined critically and rephrased in order to discover the "best" concepts with which to represent and solve the problem (even before consideration of which might be the "best" algorithm for its solution).

In 1989, Saeki, Horai, and Enomoto¹³ proposed a software design process based on natural language. Their work elaborates upon and complements the heuristics offered by Abbott. They focused particularly on the identification of dynamic system behavior as expressed by the verbs in a natural language description. Their work offers many useful ideas regarding the information needed to represent the relationships that exist between natural language elements and some rules for selecting message senders and receivers. However, care is needed: without appropriate balance, a focus on verbs can skew the orientation of software designs away from objects and toward processes and functions.

In 1990, Carasik, Johnson, Patterson, and Von Glahn¹⁴ exposed the limitations of using entity - relationship models to define semantic intensions, and they argued persuasively for a unified view of meaning and for the use of conceptual modeling languages and knowledge representation techniques to represent meaning. They argued that the division of the world into noun (entity), verb (relationship), and adjective (value) traditionally used by entity - relationship models is not helpful for the formal representation of meaning, and that these distinctions are not significant for conceptual models. However, while conceptual equivalents often exist among a related noun, verb and descriptive adjective, the selection of nouns, verbs or adjectives to articulate domain semantics is often not arbitrary to stakeholders. While purified conceptual models may be ideal for knowledge representation, reducing conceptual models to basic roles and case relations between concepts makes them too arcane for general use by stakeholders. So, while entity - relationship models may not ultimately be adequate to the task of domain description, conceptual models need not diverge so far from natural language as to be unintelligible to stakeholders.

In 1992, Cockburn¹⁵ investigated the application of linguistic metaphors to object-oriented design. Cockburn's investigation leads us toward mechanisms for reducing adverbs and adjectives to verbs (that apply to nouns). While his investigation does not supply one, a linguistic model for such transformations between the primary word classes is rather straightforward. This paper offers such a model and suggests that adverbs and adjectives should not merely be reduced to their corresponding verbs. They need to be considered part of a larger framework - adverbs, adjectives and even verbs may also be reduced to nouns. We also explore the use of language as a metaphorical basis (analogical source) for the structure (syntax) of objects and object messages, and for the naming (semantics) of software components - class names, object references, variable names and method names.

Again in 1992, Cordes and Carver¹⁶ made one of the first attempts to apply automated tools to requirements analysis and the automatic generation of object models from requirements documents. While the translation of the initial requirements into a suitable knowledge base requires human interaction to resolve ambiguities, the subsequent translation of the domain knowledge into object models is automated. Cordes and Carver acknowledged that the translation of formalized knowledge into object models is sensitive to the quality of the initial requirements specification (as one would expect). Still, their process and tools can help a requirements analyst begin to bridge the gap between informal requirements and formal software models. The prospect of automating more of the analysis process is certainly intriguing.

More recently, the RECORD (REquirements COllection, Reuse and Documentation)¹⁷ project at Umeå University in Sweden has integrated a number of tools and techniques, including natural language processing, with the intent of providing a complete solution for requirements collection, analysis, management and object-oriented modeling. While the tools are still under development, the project goals are encouraging.

Over the past 25 years, we can observe that the object-oriented approach has been progressively extended to cover programming, design and analysis. This tradition and trend continues with the extension of the object-oriented approach into the realms of requirements elicitation, domain description and modeling.

GOALS AND PRINCIPLES

The primary goal of this work is to make natural language suitable for conceptual modeling. Toward this end, we want analytic processes that produce a consistently (but not overly) simple syntax and that reduce complexity, ambiguity and vagueness without sacrificing completeness and essential meaning. These processes for simplifying natural language need to be straightforward, so that people can learn and apply them easily. The mapping from simplified natural language to conceptual models (and vice-versa) needs to be relatively obvious, so that people can readily comprehend the conceptual models and see their linkage back to the natural language from which they originated. The mapping from conceptual models to object-oriented analysis models needs to be suggestive, so that further analysis and object-oriented design can easily select and elaborate upon the essential domain and usage concepts.

The analytic practices recommended in this paper are based on several observations regarding natural language, including the relative dependencies that word meanings have upon each other, the utility that conceptual models provide for establishing commonality among stakeholders, the inherent limitations that models have in terms of scope and time, the equivalences that can be found between nouns, verbs, and adjectives, and the consequent relevance that all words contribute to our comprehension of problems and requirements.

Conceptual Relativity

Words only have meaning in the context of their usage. So, concepts only have meaning within the context of other related concepts. Each concept has a web of associations with other related concepts. Conceptual models make concepts and their relationships explicit through graphs and narratives. Such formal analytic treatment makes the models sharable and comparable.

Graphical models give us overviews of the relationships between concepts, allowing us to better integrate and comprehend them. Narrative descriptions provide us with opportunities for applying our linguistic knowledge and the richness of linguistic metaphors. Usage models and the design of human-computer interactions give us opportunities to apply many kinds of metaphors, including visual and gestural metaphors, in addition to the linguistic metaphors considered in this paper.

Conceptual Utility

Conceptual models can serve as the foundation for establishing commonality and developing software solutions. Commonality exists when people share similar ideas. Quality software development requires commonality. Conceptual models provide a formal mechanism for sharing and comparing the constructions people make internally - i.e., their mental models.

Commonality also provides stakeholders with the opportunity for greater participation in the development of software solutions. As stakeholders formalize their knowledge and needs, software developers become less burdened by interpretation and more free to concern themselves with translating the concepts from models into code. In turn, this shifts more responsibility for the quality of the system specifications onto the shoulders of the stakeholders, rightly giving them more control over the quality of the final system.

Conceptual Limitation

Conceptual models focus on selected relationships between selected concepts. Conceptual models are convenient and valuable within a limited domain and for a limited time - i.e., until more elegant and complete models are developed. Renovations in conceptual models are normal. Renovations can be expected until our understanding and models of a problem domain mature. Even after our understanding of a domain has stabilized and then matured, occasional revolutions can occur. Such conceptual revolutions are often the result of paradigm shifts (revelations) and should be considered a healthy aspect of system growth and maturity.

Object-oriented software development projects exhibit a natural cycle of expansion and consolidation. Over time, new objects are added to and integrated with existing objects in a system. As an object system grows, its design often shows signs of stress or brittleness - initial designs rarely have a perfect or long-term fit. Continued growth may become very difficult without redesigning some parts of the system. During system evolution, consolidating the lessons learned into the design gives a system the potential for continued growth. Systems evolve holistically - i.e., as our understanding of the problems and requirements grows, our conceptual models, analytic models, designs and implementations evolve with it. This fact underscores the importance of integrated tools that support the coevolution of analyses, designs, and implementations - i.e., round-trip engineering.

Conceptual Equivalence

There is a property of words analogous to the wave - particle duality that exists in physics. In physics, whether a physical entity behaves like a wave or a particle depends on the point of view and the kind of interaction. Similarly, an essential concept can be expressed as a noun, a verb or a descriptive adjective. In other words, under each set of semantically related noun - verb - adjective combinations lies an essential concept.

When examined, we find that each essential concept has a root expression as a noun, a verb, or a descriptive adjective. The expression of a concept begins in one of these three word classes. However, by appending (or removing) appropriate affixes, many instances of these word classes can be transformed into one another. Figure 3 shows a representative sample of these kinds of transformations.

Figure 3. Word Class Transformations

However, notice that not all of these transformations make sense for all words, even if they are technically feasible (which is also not always the case). For example, the adjective blue does not lend itself well to suffixes like -ity (blueity), -ize (blueize), or even -ly (bluely). So, some potential word derivations are simply nonsense. Others may have some semblance of meaning, but only in a prosaic or poetic context, not in a computational one. Where a set of semantically related words does exist, the underlying essential concept can be said to be independent of any specific word class. Alternatively, we could say that all three word classes (noun, verb, and adjective) taken together provide the fullest expression of an essential concept.

This principle has significance for object modeling. While nouns and verbs have no primacy in natural language, they are of paramount concern in object-oriented software models. We want to find useful subjects and objects - i.e., domain objects with usable faces. The principle of conceptual equivalence allows us to reshape statements until they are useful for building conceptual models, especially models that focus on objects (nouns, operands) and their associated faces (verbs, operators). Simplifying phrases and sentences to focus on nouns and verbs allows us to more easily design analogous software object relationships and messages. It also allows us to better trace our designs back to their conceptual origins.
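The affix transformations in Figure 3, together with the caveat that a feasible derivation is not always a real word, can be sketched as a derivation step filtered through a lexicon. This assumes GNU Smalltalk 3.x; the five-word lexicon is a hypothetical stand-in for a real dictionary.

```smalltalk
"Affix derivation with a lexicon check (GNU Smalltalk sketch).
 The tiny lexicon is a hypothetical stand-in for a real dictionary."
| lexicon derive |
lexicon := Set withAll: #('pay' 'payment' 'payable' 'blue' 'blueness').

"Append a suffix; answer the new word only if the lexicon knows it."
derive := [:root :suffix | | candidate |
    candidate := root , suffix.
    (lexicon includes: candidate) ifTrue: [candidate] ifFalse: [nil]].

(derive value: 'pay' value: 'ment') printNl.    "verb to noun"
(derive value: 'pay' value: 'able') printNl.    "verb to adjective"
(derive value: 'blue' value: 'ness') printNl.   "adjective to noun"
(derive value: 'blue' value: 'ity') printNl.    "feasible but nonsense: nil"
```

Plain concatenation ignores spelling adjustments (serialize plus -able gives serializable, not serializeable), so a practical tool would need morphological rules as well as a lexicon.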

Conceptual Relevance

Most object-oriented methodologies only consider nouns and verbs to be relevant during analysis, largely because our fundamental computational model supports only operands and operators. However, consideration of the other word classes can also add significant value to the collection and analysis of problems and requirements. Considering the import of adjectives, adverbs, prepositions, articles, conjunctions, and interjections during analysis enriches the process of object discovery and retains more of the semantic content and metaphoric richness of the source material. Every word in a statement can contribute value to a conceptual model. Therefore, none of the words in a statement should be eliminated arbitrarily or prematurely. Every word should be considered.

Nouns, verbs and descriptive adjectives contribute concepts.
Predicative prepositions contribute arguments to verbs.
Genitive prepositions contribute relations to nouns.
Limiting adjectives contribute cardinality information.
Articles can also indicate cardinality.
Descriptive adverbs are often merely dressed up adjectives.
Conjunctions can link clauses to build rules.
Interjections can guide users toward correct system usage.
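Several of these contributions can be shown in one small model. The sketch below assumes GNU Smalltalk 3.x and a hypothetical statement, "a customer owns several accounts; each account has one owner"; the class and selector names are illustrative, not prescribed by the approach.

```smalltalk
"Word classes shaping a model (GNU Smalltalk sketch) for the hypothetical
 statement: a customer owns several accounts; each account has one owner."
Object subclass: Customer [
    | name accounts |
    Customer class >> named: aString [ ^self new setName: aString ]
    setName: aString [
        name := aString.
        "Limiting adjective 'several': a collection, i.e., many."
        accounts := OrderedCollection new ]
    name [ ^name ]
    "Genitive 'the accounts of a customer': a relation from Customer."
    accounts [ ^accounts ]
    "Verb 'owns': a method whose argument is the sentence object."
    owns: anAccount [
        accounts add: anAccount.
        anAccount setOwner: self ]
]

Object subclass: BankAccount [
    | owner |
    "Article 'one owner': a single reference, not a collection."
    owner [ ^owner ]
    setOwner: aCustomer [ owner := aCustomer ]
]

| alice checking savings |
alice := Customer named: 'Alice'.
checking := BankAccount new.
savings := BankAccount new.
alice owns: checking; owns: savings.
alice accounts size printNl.
checking owner name displayNl.
```

Each design choice is annotated with the word that motivated it, which keeps a trace from the model back to the original statement.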

Figure 4 shows a simplified model of the relationships between the word classes, indicating how they function with respect to each other in phrases and sentences.

Figure 4. A (Very) Simplified Language Model

Extensive consideration of how the word classes function in relation to each other and the value that each word class contributes to conceptual models is beyond the scope of this introductory paper. A more detailed examination of each word class is available on the Web.¹⁸ Next, we consider the metaphors (analogs) between natural language and software design elements.

LINGUISTIC METAPHORS IN SOFTWARE DESIGN

Metaphors play a fundamental role in communication. Observe that natural language is rich with metaphors. Our words are pregnant with meaning. Metaphors are the lifeblood of natural language, dispersing inspiration and vital semantic depth throughout the social body of humanity. It seems only natural then to examine how we communicate. How can we preserve as much as possible the metaphoric richness of natural language while we solve the concrete design problems we encounter as software professionals?

Software objects communicate and collaborate by exchanging information, responding to requests received from their neighbors. So, software object interactions can be designed to resemble human communications. We can make our object-oriented software designs more intelligible by organizing and naming their parts based on linguistic analysis of problem descriptions and usage requirements.

We can extend linguistic metaphors over several levels of software organizational structure, from the structure of object messages to the interactions between societies of networked processes and resources. Whether or not intentional, the design patterns present in software resemble patterns in human communication. We can distinguish the following levels and analogs for software objects versus natural language elements (see Table 1).

Table 1. Linguistic Metaphors in Software Design

Software element	Natural language analog	Description
Messages	resemble sentences	Method names resemble verb and noun phrases. Instance and class names resemble sentence subjects and objects.
Methods	resemble scripts	Methods are composed of message sequences that resemble conversations between agents.
Classes	resemble conversational agents	Classes have distinct responsibilities (knowledge and behavior), areas of expertise (facets), and jargons (interfaces). Class names often include nouns and descriptive adjectives. Interface names may use verbs that have been converted to descriptive adjectives - e.g., Serializable.
Subsystems	resemble collaborative teams	Each member (class) in a subsystem / package plays a distinct role with concomitant responsibilities.
Processes	resemble agencies	Finished software products assemble and reuse packaged classes into individual processes. Each process provides services and work products (akin to a small business).
Intranodal clusters	resemble communities	Locally interacting processes resemble provincial human communities (or enterprises).
Internodal clusters	resemble societies	Globally interacting processes resemble global human societies (or global enterprises).

Syntactic Metaphors: Messages and Sentences

Software object messages resemble natural language sentences. More specifically, the overall order of object message elements resembles the syntax of a simple subject - verb - object (SVO) sentence. The message receiver plays the role of the sentence subject. The message pattern plays the role of the verb, incorporating any predicative prepositions associated with the verb. The message arguments (if any) play the roles of the sentence objects.

The original phrasing of intentions may need refinement and rephrasing to make them suitable for modeling. Consider the following intention and its refinements.

Intention - pay my current phone bill

Refinement - pay my current phone bill using funds from my checking account

Refinement - BofA account #12345-67890 transfer $62.50 US dollars to GTE account #310-399-8888

The syntax of the programming language Smalltalk is particularly well suited to modeling sentence syntax. Smalltalk keyword message patterns can be designed so that they resemble sentences and noun phrases. The following expressions show how Smalltalk naming conventions can be used to model sentence structures.

"Smalltalk example"
funds := Money withAmount: 62.50 ofCurrency: 'USD'.
sourceAccount := CheckingAccount 
    forBusinessNamed: 'BofA' accountNumber: '12345-67890'.
targetAccount := BillingAccount 
    forBusinessNamed: 'GTE' accountNumber: '310-399-8888'.
sourceAccount transfer: funds to: targetAccount.

When used in this manner, Smalltalk can serve as a conceptual modeling language as well as a programming language. It provides a significant degree of built-in traceability from the code back to the original problem description. By contrast, the syntax of most other programming languages does not tag message arguments with meaningful words; instead, the arguments must be supplied as a comma-separated list delimited by parentheses.

// Java example
Money funds = new Money( 62.50, "USD" );
CheckingAccount sourceAccount = new CheckingAccount( "BofA", "12345-67890" );
BillingAccount targetAccount = new BillingAccount( "GTE", "310-399-8888" );
sourceAccount.transfer( funds, targetAccount );

Note how this mapping drops the predicative preposition (to) that accompanies the verb. While this kind of syntax clearly limits the expressiveness of a programming language, it still allows us to model the nouns and verbs found in natural language, and thereby support a basic level of traceability.
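One common workaround, which is not part of the original example, is a method chain that restores the lost preposition as a method name. The Java sketch below assumes hypothetical `Money`, `Account`, and `Transfer` classes; amounts are kept in cents to avoid floating-point rounding, and currency conversion is ignored.

```java
import java.util.Objects;

// Hypothetical value class: an amount in minor units (cents) plus a currency code.
final class Money {
    final long cents;
    final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }
}

// Hypothetical account identified by business name and account number.
class Account {
    final String business;
    final String number;
    long balanceCents;

    Account(String business, String number, long openingCents) {
        this.business = business;
        this.number = number;
        this.balanceCents = openingCents;
    }

    // The verb "transfer" begins the sentence; "to" completes it later.
    Transfer transfer(Money funds) {
        return new Transfer(this, funds);
    }
}

// Intermediate object holding the partially built "sentence".
final class Transfer {
    private final Account source;
    private final Money funds;

    Transfer(Account source, Money funds) {
        this.source = source;
        this.funds = funds;
    }

    // Completes "source transfer funds to target" and moves the money.
    void to(Account target) {
        if (source.balanceCents < funds.cents) {
            throw new IllegalStateException("insufficient funds");
        }
        source.balanceCents -= funds.cents;
        target.balanceCents += funds.cents;
    }
}
```

With these classes the request reads much like the original refinement, `source.transfer(new Money(6_250, "USD")).to(target);`, at the cost of one extra intermediate object per partially built sentence.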

Facial Metaphors: Interfaces, Facets, Surfaces

Cognitive scientists have shown that face recognition plays an important role in our cognitive processing. So, what relevance do faces have for software design? Our faces serve as our primary means of communicating with each other. So, it seems natural to consider the faces that software components use to communicate with each other and with users. Three kinds of faces are central to the design of software components: interfaces, facets, and surfaces (i.e., human-computer interfaces).

As noted previously, software object messages can be designed to model natural language sentences. Another aspect of this metaphoric relationship between software object messaging and natural language can be expressed as follows - software objects are specialists. Each software object provides a specific set of services and understands only a limited set of messages. Thus, the expressions they understand and utilize are like jargons - i.e., specialized languages.

Organizing these jargons has become an important element of practice in object-oriented analysis and design. Such jargons are typically organized into interfaces (faces between parts) that are named groups of messages. Recent advances in programming models directly support this kind of organization. The Java programming language includes syntax for defining and implementing interfaces. Microsoft COM and OMG CORBA both have their own software object interface definition languages (IDLs).
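In Java, each jargon can be captured as an interface. The sketch below is illustrative only; the names `FundsSource`, `FundsTarget`, `CheckingAccount`, and `BillingAccount` are hypothetical. A client holding a `FundsTarget` reference can send only `deposit`, even when the underlying object understands more messages.

```java
// Each interface is a named group of messages: one "jargon" or face.
interface FundsSource {
    long availableCents();
    void withdraw(long cents);
}

interface FundsTarget {
    void deposit(long cents);
}

// A checking account speaks both jargons.
final class CheckingAccount implements FundsSource, FundsTarget {
    private long balanceCents;

    CheckingAccount(long openingCents) {
        balanceCents = openingCents;
    }

    @Override public long availableCents() {
        return balanceCents;
    }

    @Override public void withdraw(long cents) {
        if (cents > balanceCents) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balanceCents -= cents;
    }

    @Override public void deposit(long cents) {
        balanceCents += cents;
    }
}

// A billing account only receives payments, so it offers only one face.
final class BillingAccount implements FundsTarget {
    private long receivedCents;

    @Override public void deposit(long cents) {
        receivedCents += cents;
    }

    long receivedCents() {
        return receivedCents;
    }
}
```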

In the realm of domain analysis, the use of facets (little faces) has emerged as an important technique for classifying the features of software components (Figure 5).¹⁹ Faceted classification of software parts supports systematic reuse of analysis and design, as well as software component reuse. Facets reflect different aspects of the software components they describe. They can reflect the viewpoints of different stakeholders, or different roles that a component may play in various relationships.
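Faceted classification can be sketched as a set of facet/term pairs attached to each asset, with retrieval by matching any subset of facets. The `Asset` and `AssetLibrary` classes, and facet names such as function, object, and medium, are assumptions for illustration rather than Prieto-Díaz's published scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A reusable asset described by its facets (facet name -> descriptive term).
final class Asset {
    final String name;
    final Map<String, String> facets;

    Asset(String name, Map<String, String> facets) {
        this.name = name;
        this.facets = Map.copyOf(facets);
    }

    // Matches when every facet named in the query carries the same term here.
    boolean matches(Map<String, String> query) {
        for (Map.Entry<String, String> e : query.entrySet()) {
            if (!e.getValue().equals(facets.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}

// A simple catalog of assets searchable by facet.
final class AssetLibrary {
    private final List<Asset> assets = new ArrayList<>();

    void add(Asset asset) {
        assets.add(asset);
    }

    // Returns the names of all assets whose facets satisfy the query, in insertion order.
    List<String> find(Map<String, String> query) {
        List<String> names = new ArrayList<>();
        for (Asset a : assets) {
            if (a.matches(query)) {
                names.add(a.name);
            }
        }
        return names;
    }
}
```

A query that names fewer facets is broader, which is what lets different stakeholders search the same library from their own viewpoints.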

Figure 5. Assets have Facets

In human-computer interactions, information and services from software objects are projected onto the surfaces (faces over objects) of graphical and/or textual displays. Users observe and manipulate software object models through such views. The metaphors used to design the "look and feel" of these displayed surfaces have evolved over several years. Recently, such surface designs have begun to evolve from merely graphical user interfaces (GUIs) to truly object-oriented user interfaces (OOUIs). Object-oriented user interfaces support direct manipulation of the underlying software objects via appropriate and intuitive metaphors and affordances presented on display surfaces.

Interfaces, facets and surfaces all serve as faces for software objects. Each face has its own language appropriate for discussion, analysis and design. However, the underlying elements of these languages all have a common basis in the metaphors offered by natural language.

Social Metaphors: Roles and Responsibilities

Given the metaphorical similarities between software object communications and human communications, it seems natural to organize software object systems in ways similar to those in which human groups are organized. In fact, it has been shown that the best object-oriented software designs often result from a responsibility-driven approach.²⁰ Collaborating software objects are assigned roles and responsibilities within a system.²¹ Responsibilities for knowledge and behaviors are distributed appropriately among the collaborating software objects. Decisions regarding the assignment of responsibilities are often captured on Class-Responsibility-Collaborator (CRC) cards (Figure 6).
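The three parts of a CRC card can also be recorded as data. In this hypothetical Java sketch (the `CrcCard` and `CrcDeck` names are assumptions), a deck of cards reports collaborators that no card defines yet, which flags responsibilities delegated to a class nobody has designed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One index card: a class name, what it knows and does, and whom it works with.
final class CrcCard {
    final String className;
    final List<String> responsibilities;
    final List<String> collaborators;

    CrcCard(String className, List<String> responsibilities, List<String> collaborators) {
        this.className = className;
        this.responsibilities = List.copyOf(responsibilities);
        this.collaborators = List.copyOf(collaborators);
    }
}

// The cards produced during one design session, keyed by class name.
final class CrcDeck {
    private final Map<String, CrcCard> cards = new LinkedHashMap<>();

    void add(CrcCard card) {
        cards.put(card.className, card);
    }

    // Collaborators named on some card but lacking a card of their own.
    List<String> missingCollaborators() {
        List<String> missing = new ArrayList<>();
        for (CrcCard card : cards.values()) {
            for (String c : card.collaborators) {
                if (!cards.containsKey(c) && !missing.contains(c)) {
                    missing.add(c);
                }
            }
        }
        return missing;
    }
}
```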

Figure 6. CRC Card

How well responsibilities are assigned depends on a number of factors, primarily related to keeping data and functions close to where they are used. Recognition of the importance of these design optimizations led to the development of the Law of Demeter.²² This law prohibits the cascading of generic accessor functions (each of which returns an object that existed before the call), but allows the cascading of constructor functions (each of which returns an object that did not exist before the call). The Law of Demeter supports coupling control, narrow interfaces, information hiding, restriction and localization. So, the Law of Demeter serves as a useful heuristic for deciding how best to distribute responsibilities in an object system.
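The difference between cascading accessors and delegating a request can be sketched with the familiar paperboy-and-wallet illustration. All class names below are hypothetical. `collectReachingThrough` cascades accessors through the customer into the wallet and manipulates the wallet's state directly, while `collect` sends one message to its immediate collaborator and lets the customer delegate to its own part.

```java
final class Wallet {
    private long cents;

    Wallet(long cents) {
        this.cents = cents;
    }

    long getCents() {
        return cents;
    }

    void setCents(long cents) {
        this.cents = cents;
    }

    // The wallet itself decides whether it can cover the amount.
    boolean pay(long amount) {
        if (amount > cents) {
            return false;
        }
        cents -= amount;
        return true;
    }
}

final class Customer {
    private final Wallet wallet;

    Customer(Wallet wallet) {
        this.wallet = wallet;
    }

    // Generic accessor: exposes a part and invites callers to reach through it.
    Wallet getWallet() {
        return wallet;
    }

    // Delegation: callers ask the customer, who asks its own wallet.
    boolean pay(long amount) {
        return wallet.pay(amount);
    }
}

final class Paperboy {
    long collectedCents;

    // Violates the Law of Demeter: customer -> wallet -> balance.
    void collectReachingThrough(Customer customer, long due) {
        Wallet w = customer.getWallet();
        if (w.getCents() >= due) {
            w.setCents(w.getCents() - due);
            collectedCents += due;
        }
    }

    // Follows the Law of Demeter: one message to an immediate collaborator.
    void collect(Customer customer, long due) {
        if (customer.pay(due)) {
            collectedCents += due;
        }
    }
}
```

Both methods compute the same result; the difference is that only `collect` leaves the wallet's representation free to change without touching `Paperboy`.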

Because several qualities are desirable in object-oriented designs, the responsibility expressed by a verb may span multiple objects in the final system design. Responsibility may be partly distributed over several collaborating objects in a system, or it may be further decomposed within a given object class. Each related method signature may have more or fewer arguments depending on the computational needs of each method. So, very general action verbs may need to be decomposed into their constituent actions in relation to the objects that undergo changes.
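As a sketch of such a decomposition, assume the general verb pay (as in "pay my current phone bill") is split into debit, credit, and mark-paid actions, each assigned to the object that undergoes the change. The `Account`, `Bill`, and `BillPayer` classes are hypothetical.

```java
// The account whose balance changes when it is debited or credited.
final class Account {
    private long balanceCents;

    Account(long openingCents) {
        balanceCents = openingCents;
    }

    long balanceCents() {
        return balanceCents;
    }

    void debit(long cents) {
        if (cents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceCents -= cents;
    }

    void credit(long cents) {
        balanceCents += cents;
    }
}

// The bill, which changes state once it has been paid.
final class Bill {
    final long amountDueCents;
    private boolean paid;

    Bill(long amountDueCents) {
        this.amountDueCents = amountDueCents;
    }

    boolean isPaid() {
        return paid;
    }

    void markPaid() {
        paid = true;
    }
}

final class BillPayer {
    // "pay a bill" = debit the source, credit the biller, mark the bill paid.
    static void pay(Bill bill, Account source, Account biller) {
        if (bill.isPaid()) {
            return;                               // nothing left to pay
        }
        source.debit(bill.amountDueCents);        // the source account changes
        biller.credit(bill.amountDueCents);       // the biller's account changes
        bill.markPaid();                          // the bill itself changes
    }
}
```

The single verb pay thus becomes three methods on three classes, each with only the arguments its own computation needs.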

The assignment of responsibilities to an object depends on the role(s) it plays in a system. Roles have become an increasingly important metaphor for communicating object-oriented software designs. The codification of object-oriented software design knowledge in Design Patterns²³ is founded in part on the metaphor of roles. Software design patterns describe reusable collaborations between design elements. Each design element plays an identifiable role with well-defined responsibilities.
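The Observer pattern from Design Patterns illustrates this: its collaboration is defined by two roles, Subject and Observer, and any class may be assigned either one. In the Java sketch below, `Thermometer` and `Display` are hypothetical participants chosen only to show the roles being played.

```java
import java.util.ArrayList;
import java.util.List;

// Role: an observer is told when something it watches has changed.
interface Observer {
    void update(Subject source);
}

// Role: a subject keeps track of its observers and notifies them of changes.
abstract class Subject {
    private final List<Observer> observers = new ArrayList<>();

    void attach(Observer o) {
        observers.add(o);
    }

    protected void notifyObservers() {
        for (Observer o : observers) {
            o.update(this);
        }
    }
}

// A concrete class playing the Subject role.
final class Thermometer extends Subject {
    private int celsius;

    int celsius() {
        return celsius;
    }

    void setCelsius(int value) {
        celsius = value;
        notifyObservers();
    }
}

// A concrete class playing the Observer role.
final class Display implements Observer {
    String lastShown = "";

    @Override public void update(Subject source) {
        if (source instanceof Thermometer) {
            lastShown = ((Thermometer) source).celsius() + " C";
        }
    }
}
```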

Community Metaphors: Distributed Objects and Networked Agents

Networked and independent processes communicate with each other through message exchanges. So, the metaphors for designing communicating processes are analogous to those used to design interacting objects, especially if the processes are themselves composed of objects. This has led to the development of distributed object systems, extending the software object metaphors beyond the scope of a single process and across the communities of computers interlinked throughout the world. As a result, we are seeing the emergence of software agents that serve as human representatives in virtual communities. As we evolve these agents, they are becoming interactive, autonomous and self-replicating (viral), and useful for searching, querying, accessing, filtering, and reporting from networked information resources.

There are also efforts to make these software agents exhibit a kind of intelligence by supplying them with encoded knowledge in formalized ontologies. The knowledge encoded by such ontologies embodies conceptual relationships, purified semantic information that is primarily linguistic in origin.

Expressive Limitations

Over two decades of industry experience has shown the benefits of building software systems with objects, including rapid deployment, easier maintenance, and substantial reuse of both code and design. Software objects provide a natural and convenient way to organize and model systems in software. The object-oriented approach creates significant opportunities for building software models of high quality and conceptual integrity. The elements of the best software object system designs closely resemble those of the domain they model, including their structure, organization, and behavior.

There are also some limitations in the object-oriented approach. Today's commercial object-oriented programming languages still limit object designs to the embodiment of noun and verb phrases. If used at all, the other word classes are only used in naming conventions for classes, instances and methods. Unfortunately, the limitations of programming languages have profoundly influenced most analysis and design methodologies.

However, the limitations in the expressiveness of object-oriented programming languages need not limit how we analyze problems and requirements. Previous analytic approaches that consider only nouns and verbs are incomplete. Restricting analysis to nouns and verbs prematurely limits and artificially impoverishes the conceptual models we build, which in turn directly affects the software systems we build.

NATURAL CONCEPTUAL MODELS

The modeling languages used for software development tend to be elaborate and complex (e.g., the Unified Modeling Language²⁴ - UML). The primary reason for this is that models constructed with these languages need to include a level of detail sufficient to specify the design of a software system. While simplified versions of these languages may be used during analysis, these languages tend to focus on solution modeling rather than problem modeling. These kinds of models are appropriate for software developers, but they are not especially suitable for stakeholders (who are not generally trained in their construction and interpretation).

The languages used to model conceptual structures (e.g., Sowa's Conceptual Graphs²⁵ - CG) focus on the semantics of natural language and tend to be syntactically simple, but (partly as a result of their simple syntax) the models they produce tend to be large and difficult to interpret. Conceptual structures are appropriate for machines and computational linguists. They provide a mechanism for representing knowledge and the semantics of natural language expressions. They provide a level of formality suitable for manipulation by computers - i.e., for natural language processing and natural language understanding systems. While conceptual structures are derived from natural language, they are arcane and unintelligible to lay people (i.e., not suitable for general human consumption and interpretation).

Stakeholders need a modeling language that provides some degree of formality, while remaining very simple and close to natural language. These motivations led to the development of a natural conceptual modeling language (NCML) to support natural conceptual models. Natural conceptual models have both a graphical form and a textual form. However, this paper focuses on the textual form. Even with suitable modeling tools, building and maintaining graphical models can be time consuming - i.e., costly. This is not to say that graphical models are not valuable. They are, but a balanced approach to their use is recommended (e.g., they are useful for documentation, especially for the "hot spots" in a system that exhibit the most complexity and need the most explanation). For an introduction to the graphical language and a complete example of its use, please refer to the related paper available on the Web.²⁶

Syntactic Normalization and Semantic Exploration

People have twisted minds and formulate complex thoughts. At least, they often express their thoughts with complex sentences. This complexity makes human communication more efficient by reducing or eliminating redundant phrases. Such eloquence is certainly appropriate for ordinary rhetoric. However, to make sentences suitable for modeling, we need to greatly simplify their syntax, often at the expense of some apparent redundancy - e.g., repeatedly naming a subject in several related sentences.

There are several transformations that can be applied to sentences that preserve their overall meaning while producing simple and consistent syntactic formats. Collectively, we can characterize these transformations as syntactic normalization. Simplifying the syntax of sentences makes them less prone to ambiguity. Such simplicity can also help people to share and compare their mental models. Thus, the additional formality supports commonality.

Sometimes a complex sentence cannot be simplified to the desired degree without first exposing the meaning of some of its constituent phrases. In these situations, semantic exploration may provide clues for recovering nouns or verbs from descriptive adjectives and adverbs, and other kinds of phrases.

Normalized sentences are simple declarative sentences, sometimes called kernel sentences or nuclear sentences. The following definition extends that presented by Carasik et al.,¹⁴ while integrating distinctions identified by Abbott.¹⁰ Clauses in normal form have the following characteristics.

They use common nouns, mass nouns, units of measure, attribute nouns, direct references, and descriptive expressions.
They use unmarked number (singular) for sentence subjects (and other objects where possible).
They use unmarked mood (indicative), voice (active), number (plural), and polarity (affirmative) for verbs.
They use complete verbs with appropriate prepositions and objects.

Recommendations for determiners include the following:

Use the indefinite articles - a, an (or no article)	for most singular nouns to indicate one instance of many (potentially a class).
Use the indefinite pronoun - some (or no determiner)	for plural nouns to indicate the existence of many instances (of a collection or class).
Use the (distributive) indefinite pronoun - each	to indicate that an instance is a member of a collection (usually a class).
Use the definite article - the	only when a sentence subject (or object) refers to a domain singleton - i.e., there is only ever one instance of the referenced object.
Use a proper noun to refer to a singleton,	but the class(es) of which the instance is a member should also be identified.
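These recommendations are mechanical enough to sketch in code. The Java sketch below assumes a hypothetical `NounReference` classification chosen by the analyst; it covers the first four recommendations (proper nouns are not handled), takes the plural form as input rather than guessing it, and uses a spelling-based a/an heuristic that English pronunciation does not always follow.

```java
// Hypothetical classification of what a noun phrase refers to.
enum NounReference {
    ONE_OF_MANY,   // one instance of many (potentially a class): "a", "an"
    MANY,          // several instances of a collection or class: "some"
    EACH_MEMBER,   // a member of a collection, usually a class: "each"
    SINGLETON      // the only instance ever present in the domain: "the"
}

final class Determiners {
    private Determiners() {}

    // Builds a noun phrase with the recommended determiner.
    static String phrase(NounReference ref, String singular, String plural) {
        switch (ref) {
            case ONE_OF_MANY:
                return indefiniteArticle(singular) + " " + singular;
            case MANY:
                return "some " + plural;
            case EACH_MEMBER:
                return "each " + singular;
            case SINGLETON:
                return "the " + singular;
            default:
                throw new IllegalArgumentException("unknown reference: " + ref);
        }
    }

    // Spelling-based heuristic; the real a/an choice follows pronunciation.
    private static String indefiniteArticle(String word) {
        return !word.isEmpty() && "aeiouAEIOU".indexOf(word.charAt(0)) >= 0 ? "an" : "a";
    }
}
```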

The following table shows some illustrative examples of normalized sentences. Each example is preceded by the number of actants required by the example verb. The verbs listed are typical, including examples of both transitive and intransitive verbs. The initial examples are prototypical - i.e., the verbs only have placeholders for actants (x, y). The remaining examples identify appropriate prepositions, and the actants name thematic semantic roles that are appropriately associated with the example verb.

(1) x begins
(1) x exists
(1) x ends
(2) x becomes y
(2) x emits y
(2) a predator captures a prey
(3) a giver gives a gift to a recipient
(4) a sender sends a message to a receiver with a medium
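A normalized verb frame can be recorded as a subject role, a verb, and an ordered list of objects with their prepositions. The Java sketch below uses hypothetical `Actant` and `VerbFrame` classes. It counts actants the same way as the examples above, renders the kernel sentence, and derives a Smalltalk-style keyword selector in which each preposition becomes a keyword, as with `transfer:to:` in the earlier Smalltalk example. It assumes lowercase role names and that every object after the first carries a preposition.

```java
import java.util.List;

// One object of a verb: an optional preposition and a thematic role.
final class Actant {
    final String preposition;   // "" when the object follows the verb directly
    final String role;

    Actant(String preposition, String role) {
        this.preposition = preposition;
        this.role = role;
    }
}

// A normalized verb frame: the subject's role, the verb, and its ordered objects.
final class VerbFrame {
    final String subjectRole;
    final String baseVerb;      // e.g. "send"
    final String verbForm;      // e.g. "sends": indicative, active, affirmative
    final List<Actant> objects;

    VerbFrame(String subjectRole, String baseVerb, String verbForm, List<Actant> objects) {
        this.subjectRole = subjectRole;
        this.baseVerb = baseVerb;
        this.verbForm = verbForm;
        this.objects = List.copyOf(objects);
    }

    // The subject counts as an actant, as in the numbered examples.
    int actantCount() {
        return 1 + objects.size();
    }

    // Renders the kernel sentence with indefinite articles.
    String sentence() {
        StringBuilder sb = new StringBuilder(withArticle(subjectRole)).append(' ').append(verbForm);
        for (Actant a : objects) {
            if (!a.preposition.isEmpty()) {
                sb.append(' ').append(a.preposition);
            }
            sb.append(' ').append(withArticle(a.role));
        }
        return sb.toString();
    }

    // Keyword selector: the receiver plays the subject, the verb (plus any
    // first preposition) names the first argument, later prepositions the rest.
    String selector() {
        if (objects.isEmpty()) {
            return baseVerb;                      // unary message, e.g. "begin"
        }
        StringBuilder sb = new StringBuilder(baseVerb);
        String first = objects.get(0).preposition;
        if (!first.isEmpty()) {
            sb.append(Character.toUpperCase(first.charAt(0))).append(first.substring(1));
        }
        sb.append(':');
        for (int i = 1; i < objects.size(); i++) {
            String p = objects.get(i).preposition;
            if (p.isEmpty()) {
                throw new IllegalStateException("object " + i + " needs a preposition");
            }
            sb.append(p).append(':');
        }
        return sb.toString();
    }

    private static String withArticle(String noun) {
        return ("aeiou".indexOf(noun.charAt(0)) >= 0 ? "an " : "a ") + noun;
    }
}
```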

A Representative Example

The following brief example was excerpted from a larger one. The full example describes the operations of a hazardous chemical storage depot regulated by the Environmental Protection Agency (EPA). The key problem associated with its operations is the allocation of storage resources (i.e., hazardous chemical drum storage space allocation).

"The drums are stored in special storage buildings; in the depot there are also buildings that house scientific and administrative staff. Each storage building is licensed to hold a maximum number of drums."

"Management has introduced a company regulation that requires the depot manager be able to monitor the depot and to always be able to check if the depot is in a vulnerable state. The regulation states that a depot is vulnerable if any two neighboring buildings contain the maximum number of drums."

The first sentence describes the depot buildings. Several applications of syntactic normalization are needed to simplify the sentence and split it into its constituent clauses. The second sentence describes the licensing of the storage buildings. A mix of syntactic normalization and semantic exploration is needed to reveal the (several) ideas contained in the sentence. Likewise, the sentences in the second paragraph need a mixture of normalization and exploration.

The asterisks (*) in Table 2 indicate those sentences retained in the final domain model. Candidate domain elements are indicated with bold. Candidate relations are indicated with italic. Rule formations are indicated by underlined conjunctions.

Table 2. Example Analysis


Convert verb to active voice		Special storage buildings store the drums.
Convert subject to singular	*	A storage building stores some drums.
Verb isolation		(Staff) buildings house scientific and administrative staff.
Simple generalization		scientific and administrative staff = staff members
Convert subject to singular	*	A staff building houses staff members.
Verb extraction - "in the depot"	*	The depot contains some storage buildings.
	*	The depot contains some staff buildings.
Categorization	*	The depot is a hazardous chemical storage facility.

Convert verb to active voice		Who licenses the storage buildings? The EPA licenses each storage building to hold a maximum number of drums.
Categorization	*	The EPA regulates hazardous chemical storage facilities.
	*	The EPA is a regulatory agency.
Verb nominalization - of store		A storage building stores drums = drum storage.
Verb nominalization - of license	*	The EPA issues a drum storage license for each storage building.
Implications	*	Each storage building provides drum storage.
	*	A drum storage license permits drum storage.
	*	A drum storage license limits (a storage building's) drum storage.
	*	Each storage building has a drum storage license.
Verb nominalization - of limit	*	Each storage building has a drum storage limit.

Verb isolation	*	Management introduced a company regulation.
	*	A company regulation requires depot monitoring.
Adjective nominalization - of vulnerable		vulnerable state = depot vulnerability
	*	The depot manager monitors the depot for a depot vulnerability.

Denominalization - of neighboring	*	A storage building neighbors (another) storage building.
Abstraction via nominalization		"building contains a maximum number of drums" = a full storage building
Detailed decomposition	*	A full storage building exists when its (storage) drum count equals its drum storage limit.
Concept definition (rule)	*	A depot vulnerability exists if a full storage building neighbors (another) full storage building.
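The retained sentences map quite directly onto classes and methods. The Java sketch below is one possible reading, not the model produced in the extended example: the class and method names are hypothetical, drum counts are integers, and the neighbor relation is assumed to be symmetric.

```java
import java.util.ArrayList;
import java.util.List;

// "Each storage building has a drum storage limit." "A storage building stores some drums."
final class StorageBuilding {
    final String name;
    final int drumStorageLimit;
    private int drumCount;
    private final List<StorageBuilding> neighbors = new ArrayList<>();

    StorageBuilding(String name, int drumStorageLimit) {
        this.name = name;
        this.drumStorageLimit = drumStorageLimit;
    }

    // Stores drums, never beyond the licensed limit.
    void store(int drums) {
        if (drums < 0 || drumCount + drums > drumStorageLimit) {
            throw new IllegalArgumentException("exceeds drum storage limit of " + name);
        }
        drumCount += drums;
    }

    // "A storage building neighbors (another) storage building." Kept symmetric.
    void neighbor(StorageBuilding other) {
        if (other != this && !neighbors.contains(other)) {
            neighbors.add(other);
            other.neighbors.add(this);
        }
    }

    // "A full storage building exists when its drum count equals its drum storage limit."
    boolean isFull() {
        return drumCount == drumStorageLimit;
    }

    List<StorageBuilding> neighbors() {
        return List.copyOf(neighbors);
    }
}

// "The depot contains some storage buildings."
final class Depot {
    private final List<StorageBuilding> storageBuildings = new ArrayList<>();

    void add(StorageBuilding building) {
        storageBuildings.add(building);
    }

    // "A depot vulnerability exists if a full storage building neighbors
    //  (another) full storage building."
    boolean isVulnerable() {
        for (StorageBuilding b : storageBuildings) {
            if (!b.isFull()) {
                continue;
            }
            for (StorageBuilding n : b.neighbors()) {
                if (n.isFull()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Each method comment quotes the normalized sentence it implements, which preserves the traceability from code back to the problem description discussed earlier.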

Extended Example Available

Linguistic analysis of the remainder of this problem can be found on the World-Wide Web.²⁶ The Web example also shows the kind of results that can be produced by the step-by-step application of syntactic normalization and semantic exploration. The Web example includes the resulting graphical conceptual models and an introduction to the natural conceptual modeling language (NCML).

CONCLUSION

Natural conceptual models advance the practice of software development, linking linguistic analysis to domain and usage requirements analysis, object-oriented analysis and software design. This work suggests that concepts and natural language elements can be directly linked to the design of software implementations, especially using an object-oriented approach. This paper has highlighted the importance of a thorough examination of the problems and requirements expressed by domain experts and solution stakeholders.

Conceptual models can help to establish commonality among solution providers and stakeholders, helping them to build a shared understanding of their problems and how their software solutions will be used. Commonality helps to ensure the correct implementation of software solutions, and conceptual models help to link software object models back to the original problem descriptions and solution usage requirements.

Clearly, natural language analysis and conceptual modeling can have a profound influence on software design, especially when coupled with an object-oriented approach. The necessary discipline can be summarized in the following manner.

Syntactic normalization - Keep sentence syntax simple.
Semantic exploration - Look for hidden nouns and verbs (especially in adjectives and adverbs).
Build and validate domain and usage vocabularies and models based on the discoveries offered by the first two steps.

Observations and Future Work

One of the primary motivations behind this work has been to discover whether and to what extent it can truly be said, with respect to software, that, "the description of a problem is its solution." On the surface, this idea seems naïvely simplistic, but the object-oriented approach to software design seems to support it. The application of linguistic metaphors to software design needs further examination and refinement, especially as they apply to design patterns and distributed computing systems. While many of the heuristics for linguistic analysis have been developed,¹⁸ more work is needed to establish a comprehensive and coherent treatment of the various word classes in the practice of natural conceptual modeling and object-oriented software development.

A number of practical uses for natural conceptual models are also possible and worth further investigation (even though some are speculative in nature).

Conceptual models could provide the foundation for model-based requirements management. When linked to object-oriented models for analysis, design and programming, problem and requirements models may provide better ways to quantify and manage feature sets for evolutionary system development - e.g., for iterative process models like HP's EvoFusion.²⁷
Some of the tools available to natural language understanding (NLU) systems make possible the development of a linguistic analysis assistant for problem modeling and reformulation. The resulting models could be fed into requirements management tools and CASE tools for analysis, design and implementation.
Natural conceptual models may also serve as a bridge to other kinds of formal models. They may have useful application in other areas, including ontology development, knowledge management, natural language modeling,²⁸ the study of metaphors,²⁹ and the application of speech-act theory to the design of information systems.³⁰
In the (distant?) future, natural language analysis and speech recognition might contribute to the practical foundations for a conversation-based natural language programming environment.

Acknowledgments

Thanks to Alistair Cockburn for coining the phrase "computational rhetoric" and for his continuing focus on issues related to the interfaces between humans and technology. His work continues to serve as a fundamental source of inspiration. Thanks also to Phil Shinn for his insightful review and comments on early drafts of this paper.

REFERENCES

1. Robert Dilts, John Grinder, Richard Bandler, Judith DeLozier, Leslie Cameron-Bandler. Neuro-Linguistic Programming, Vol. 1. Meta Publications, Santa Cruz, CA, 1979.
2. Stephen Lankton. Practical Magic: The Clinical Applications of Neuro-Linguistic Programming. Meta Publications, Santa Cruz, CA, 1979.
3. John Opdyke. Harper's English Grammar. Warner Books, Inc., New York, NY, Aug. 1987.
4. Mark Paulk, Charles Weber, Suzanne Garcia, Mary Beth Chrissis, Marilyn Bush. Key Practices of the Capability Maturity Model, Version 1.1. CMU/SEI-93-TR-025, Carnegie-Mellon University / Software Engineering Institute, Feb. 1993.
5. Adele Goldberg, Alan Kay, editors. Smalltalk-72 Instruction Manual. Technical Report SSL-76-6, Xerox PARC, Mar. 1976.
6. O. Dahl, K. Nygaard. SIMULA - an ALGOL-Based Simulation Language. Communications of the ACM 9(9):671-678, Sep. 1966.
7. Maurice Halstead. Elements of Software Science. Elsevier North-Holland, Inc., New York, NY, 1977.
8. G. Miller, E. Newman, E. Friedman. Length Frequency Statistics of Written English. Information and Control 1:370-389, 1958.
9. Russell Abbott. Report on Teaching Ada. Technical Report SAI-81-313-WA, Science Applications, Inc., Dec. 1980.
10. Russell Abbott. Program Design by Informal English Descriptions. Communications of the ACM 26(11):882-894, Nov. 1983.
11. Grady Booch. Object-Oriented Design. Ada Letters 1(3):64-76, Mar.-Apr. 1982.
12. Grady Booch. Object-Oriented Analysis and Design with Applications, 2nd Ed. Benjamin/Cummings, Redwood City, CA, 1994.
13. Motoshi Saeki, Hisayuki Horai, Hajime Enomoto. Software Development Process from Natural Language Specification. Proceedings of the 11th International Conference on Software Engineering (ICSE-11), IEEE Computer Society Press, 1989.
14. Robert Carasik, Steve Johnson, Donald Patterson, George Von Glahn. Towards a Domain Description Grammar: An Application of Linguistic Semantics. ACM SIGSOFT Software Engineering Notes 15(5):28-43, Oct. 1990.
15. Alistair Cockburn. Using Natural Language as a Metaphoric Basis for Object-Oriented Modeling and Programming. IBM Technical Report TR-36.0002, 1992.
16. David Cordes, Doris Carver. An Object-Based Requirements Modeling Method. Journal of the American Society for Information Science 43(1):62-71, Jan. 1992.
17. See the RECORD project at Umeå University - http://www.cs.umu.se/~jubo/RECORD.html
18. Nik Boyd. Natural Language Analysis for Domain and Usage Models. http://home.labridge.com/~nikboyd/papers/rhetoric/.
19. Ruben Prieto-Díaz. Implementing Faceted Classification for Software Reuse. Communications of the ACM 34(5):88-97, May 1991.
20. Robert Sharble, Samuel Cohen. The Object-Oriented Brewery: A Comparison of Two Object-Oriented Development Methods. ACM SIGSOFT Software Engineering Notes 18(2):60-73, Apr. 1993.
21. Rebecca Wirfs-Brock, Brian Wilkerson, Lauren Weiner. Designing Object-Oriented Software. Prentice-Hall, Englewood Cliffs, NJ, 1990.
22. Karl Lieberherr, Ian Holland, Arthur Riel. Object-Oriented Programming: An Objective Sense of Style. Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages and Applications, Sep. 1988. For further information, refer to the Demeter home page at http://www.ccs.neu.edu/research/demeter/.
23. Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, MA, 1995. For further information, refer to the Patterns home page at http://hillside.net/patterns/patterns.html.
24. Grady Booch, Ivar Jacobson, James Rumbaugh. The Unified Modeling Language. Rational Software, 1997. For further information, refer to the Rational home page at http://www.rational.com/.
25. John Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.
26. Nik Boyd. A Natural Conceptual Modeling Language. This paper introduces a new simplified graphical notation for conceptual models and provides an extensive example of its application to a sample problem. See http://home.labridge.com/~nikboyd/papers/educe/models/.
27. See references to Evolutionary Fusion on the Fusion home page - http://www.hpl.hp.com/fusion/.
28. See Natural Language Modeling - http://www.sharp-informatics.com/.
29. See the Metaphor Center - http://metaphor.uoregon.edu/metaphor.html.
30. Terry Winograd. A Language/Action Perspective on the Design of Co-operative Work. Human Computer Interaction 3(1):3-30, 1988.
