LEE HARTMAN

Southern Illinois University

Carbondale, IL, U.S.A.

Lhartman@siu.edu

 

Phono (Version 4.0):  Software for Modeling Regular Historical Sound Change

(In Actas [del] VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20-24 de enero del 2003 [Santiago de Cuba, 2003], I.606-609)

 

1.  What Is Phono?

Phono is a software tool for developing and testing models of regular historical sound change.  A model consists essentially of an ordered set of sound-change rules.  The user types ancestor words on the keyboard, and the program displays on the screen the successive stages of development and the final descendant form.  The program was developed in conjunction with a model for Latin-to-Spanish (based mainly on Otero 1971 and Hartman 1974; see also Hartman 1985), but it is designed to operate models for any natural language.  For the Spanish model, the ancestor words are mainly documented (in Latin), and the purpose of Phono is to test hypotheses about sound-change rules:  their contents, their relative chronology, and their cumulative effects.  For models of languages with undocumented proto-forms, the program can test not only the contents and order of proposed sound-change rules, but also the proposed forms of reconstructed ancestor words.  This was the case when bin Muzaffar (1996a and 1996b) used Phono to model the development of Shawnee from Proto-Algonquian.

The prospect of modeling language evolution often raises questions that make it necessary to clarify what Phono will not do.  Phono deals only with regular change, although the derivation of some exceptional words can be simulated by temporarily masking a single rule in the chronological sequence (see Hartman 1986).  The program deals with just one line of descendancy at a time (rather than deriving several sister-language forms simultaneously).  It simulates only “downstream” derivation (that is, Phono cannot put a sound-change model “into reverse” and project upstream to generate ancestor words).  Thus Phono does not carry out directly the process of comparative reconstruction of one ancestor word from several descendants in different daughter languages, although it may indirectly help with that enterprise by testing hypotheses.

2.  History.

The development of Phono began in the early 1980s.  Version 1 was written in PL/I for an IBM 360 mainframe, with the Spanish model hard-coded in the program.  Beginning with that first version and continuing to the present, the procedure has been to receive the etymon (ancestor word), as a character string; to translate it to a set of binary feature values (+syllabic, –high, etc.); to perform the historical derivation in terms of these feature values; and finally to retranslate the resulting values back to the form of a character string for output.  For Version 1, the font available for input and output was limited in effect to the uppercase letters of the Roman alphabet.  For input, the phonetic values of these letters could be refined and specified by means of a set of “adjustment rules”, applied prior to the sound-change derivation.  Latin ancestor words could be entered almost entirely in Latin orthography, with the help of adjustment rules that served, for example, to reinterpret the letter <X> as a phonetic sequence of [ks], to assign stress to words, and so on.  And in output, the uppercase alphabet’s lack of precision was remedied by means of a system of so-called “feature-based diacritics” (derived from Burton-Hunter 1976), whereby each segment of the character string could be accompanied by a listing of the feature values in which the segment differed from the default values of the character in the alphabet.  The velar nasal “eng”, for example, would appear as uppercase <N> (the nearest equivalent character in the alphabet), qualified as [–coronal, +high, +back], and so on.  This notation required an elaborate two-dimensional display for each stage of the derivation, but it did insure that no phonetic detail would be lost (see Hartman 1981).

From the beginning, it was clear that Phono needed the capability to perform “batch testing”.  Each time a model is altered during its development, it needs to be retested in the Batch mode, to insure that the new alterations—made in order to account for one group of words—do not cause erroneous results for some other, unforeseen group of words.  For the purpose of this kind of testing, Phono reads some large number of pairs of words—etymon and reflex (that is, ancestor and descendant forms)—from a data file.  For each pair, Phono performs its derivation on the etymon and compares the result with the known reflex, marking each pair as either a “good” or a “bad” match.  The bad matches serve to signal where the model has been impaired or needs improvement.  In this process, the input of the known reflex form, like the input of the etymon, requires its own set of adjustment rules to provide phonetic detail sufficient for the matching operation.  Versions 1, 2, and 3 used a single alphabet for etymon input, for output of the derivation, and—in the Batch mode—for input of the known reflex for comparison.        

The second major version (around 1988) was written in Pascal for the personal computer (DOS operating system), with the Spanish model (and, in principle, any other model) to be read as data and interpreted by the program.  The output font was composed of symbols from the ASCII character set, including some ad hoc conventions such as the “$” sign to represent the palatal sibilant.  In this version the output display, with its feature-based diacritics, still occupied an entire screen for each stage of the derivation.

Version 3.0 (1993, Pascal) brought four innovations:  (1) an internal editor for the alphabet, making it possible to customize the alphabet’s feature values to the user’s taste, with little risk of typographical error; (2) a display of the whole derivation in a single screen; (3) “Word-Trace” and “Rule-Trace” procedures to bring together a list of all the words affected by a particular rule, or of all the rules that affect a particular word; and (4) a system of on-line Help screens. 

Version 3.1 (1994, Pascal) added to the alphabet-editor a set of internal editors for rule makeup and rule order, further reducing the risk of typographical errors.  This version was made available on the Web for downloading.

The newest version, number 4, released in 2002, is written in Visual Basic to run on the Windows operating system.  It is available for downloading from http://mypage.siu.edu/lhartman.  Its most notable advance over Version 3 is the display of output in a standard phonetic font:  the alphabet of the International Phonetic Association.  This IPA font is a copyrighted product of SIL International (www.sil.org), ©1993, used with permission, and bundled with the program.  While previous versions used a single alphabet for etymon input, for output of the derivation, and for input of known reflex forms in the Batch mode, Version 4 handles these three functions with three separate alphabets.  The phonetic alphabet for output is “read-only”, while the two alphabets for input—of the etymon and of the known reflex respectively—are subject to editing by the user.  Adjustment rules are still necessary to supply phonetic detail to the keyboard notation of input, but the role of these rules is reduced by the use of independent alphabets for etymon and reflex.  Thanks to the regularity of the Spanish spelling system (unlike English, for example), reflex words in this language can be supplied in their orthographic form, leaving to the adjustment rules such tasks as silencing the letter <h>, or interpreting the letter <c> as [k], or as an [s] or theta sound, or as the palatal affricate “che”, according to the following letter.

Like its predecessors, Version 4 has the potential to enhance the precision of the output strings by means of feature-based diacritics—if the user chooses to view them.  In the feature matrix of the word ready for output, if a segment matches the values of a symbol in the Phonetic Alphabet in every one of its 20 features, then that symbol appears on the screen in black.  But if any of the feature values of the segment differ from those of the “nearest equivalent” symbol in the Phonetic Alphabet, then the symbol is displayed in blue, and the user can touch it with the mouse-pointer to see the names and signs of the features whose values differ.  In Version 4, this capability is much less needed than before, given the improved precision of the IPA font.

Like previous versions, Version 4 offers the possibility of marking any rule as “persistent” in the chronological sequence.  Persistent rules (defined by Chafe 1968) are those that reapply throughout the derivation whenever their conditions occur.  Each time an ordinary rule (a so-called “transient” rule) brings about a change in the word, Phono traverses the entire list of persistent rules automatically.  For example, a rule of assimilation within consonant clusters can be made persistent to apply to new clusters whenever these are formed by the loss of a vowel. 

3.  Computing Challenges

Given the decision to carry out the historical derivation at the level of binary feature values (rather than, say, through a process of whole-segment replacement), the main challenges of the Phono project have come down to questions of notation:  (1) what keyboard conventions to use for the input of ancestor words and, in the Batch mode, for the input of known descendant words; (2) what screen conventions and font to use for the output display of derivations; and (3) how to represent the rules in a form that would be sufficiently versatile to accommodate any rule that might occur, and yet most easily learnable for the user who wishes to edit the rules (see Hartman 1993a and 1993b).

The question of notation for word input was solved fairly simply through the use of input adjustment rules (one set for the etymon words, another set for the known reflex words of the Batch mode).  In the new version, the need for these rules has been reduced, but not eliminated, by the use of separate alphabets for etymon and reflex.

The question of output notation (the main problem cited by Becker 1996 in his review of Version 3.2) has been largely solved in Version 4 through the use of the IPA phonetic font.  Feature-based diacritics are still available for signaling unexpected discrepancies in derivations, but the need for them has likewise been greatly reduced by the precision of the IPA alphabet.

The third notation problem, that of the rules themselves, has proven to be a harder nut to crack.  Phonologists have a somewhat standard way of expressing sound rules in the format “A —> B / C _ D”, meaning that element A becomes element B in the environment following C and preceding D—or in other words, every instance of “CAD” becomes “CBD”.  But my experience with the Spanish model shows that this convention—even with the refinements of curly braces, angled brackets, parentheses, etc. that were codified by Chomsky and Halle (1968)—can be inadequate to express the complexity of some naturally occurring rules.   As a result, Phono continues in Version 4 with essentially the same rule notation as previous versions:  rules are expressed mainly in terms of binary feature values and locations in the word, using a hierarchy of if-clauses followed by a series of then-clauses.  This system, although logical within itself, requires some learning effort on the part of the user.  On the other hand, Version 4’s Rule Editor is arguably more user-friendly than that of earlier versions:  it operates almost entirely by movements and clicks of the mouse, and the user is encouraged to work with it exclusively, rather than edit the text of rules with a word processor and risk introducing typographical errors. 

4.  Future Developments.

Phono’s Version 4 is complete in its essential components:  the apparatus for derivations, in both the Interactive mode and the Batch mode for pair-testing; and the internal editors for the input alphabets and the contents and order of rules.  At the time of this writing, some auxiliary features of Version 3 have yet to be transferred to Version 4:  the so-called “singleton” batch mode, in which a list of ancestor words can be read and put through derivation with their outcomes written to a data file rather than to the screen; and the so-called “trace” procedures—Rule Trace and Word Trace—mentioned above.  These features will be incorporated in the next few months.  Likewise in the near future a system of on-line Help screens will be incorporated (at present, Version 4 is bundled with a “readme” file that serves as a user’s manual).  I hope that, in the future, I (or other programmers, since the source code is public) can find ways to make the rule notation more similar to the standard notation that is familiar to linguists.  I encourage researchers to use Phono to test models for languages other than Spanish, from a variety of language families, and to provide feedback for further development of the program, in order to insure that it serves the needs of historical phonology universally.

References

Becker, Donald A.  1996.  “Historical Linguistics as a Hacker’s Paradise:  Review of Phono 3.2”.  Glot International, 2:22.

Bin Muzaffar, Towhid.  1996a.  “Computer Simulation of Shawnee Historical Phonology”.  M.A. thesis, Memorial University of Newfoundland.

__________.  1996b.  “Computer Simulation of Shawnee Historical Phonology”.  In Canadian Linguistic Association Annual Conference Proceedings (Calgary:  Calgary Working Papers in Linguistics), pp. 293-303.

Burton-Hunter, Sarah K.  1976.  “Romance Etymology:  A Computerized Model”.  Computers and the Humanities, 10:217-220.

Chafe, Wallace.  1968.  “The Ordering of Phonological Rules”.  International Journal of American Linguistics, 34:115-136.

Chomsky, Noam, and Morris Halle.  1968.  The Sound Pattern of English.  New York:  Harper & Row.

Hartman, Steven Lee.  1974.  “An Outline of Spanish Historical Phonology”.  Papers in Linguistics, 7:123-191.

__________.  1981.  "A Universal Alphabet for Experiments in Comparative Phonology".  Computers and the Humanities, 15:75-82.

__________.  1985.  "A Computer Model of Spanish Historical Sound Change".  In Homenaje a Álvaro Galmés de Fuentes (Madrid:  Gredos), 2:89-98.

__________.  1986.  "Learnèd Words, Popular Words, and 'First Offenders'".  In Oswaldo Jaeggli and Carmen Silva-Corvalán (eds.), Studies in Romance Linguistics (Dordrecht:  Foris), pp. 87-98.

__________.  1993a.  “Three Problems of Notation in Modeling Sound Change”.  Paper presented at Round Table on Computer Applications in Historical Linguistics, Brussels, Belgium, December 8. 

__________.  1993b.  "Writing Rules for a Computer Model of Sound Change."  In Southern Illinois Working Papers in Linguistics and Language Teaching, 2:31-39.

Otero, Carlos-Peregrín.  1971.  Evolución y revolución en romance.  Barcelona:  Seix Barral.