Generating Words and Semantic Space

adapted from a March 1, 1997 post by Paul M. Hoffman

Whew! I've finally caught up on four+ months of unread conlang digests, and can now re-de-lurk. It's a good thing it's not possible to overdose on this stuff! (Is it? Egad, where'd I put that cardiac needle?!)

Anyhow, right near the end of those four months, snuggling comfortably up close to the present, Matt Pearson wrote:

I suppose if I ever rose from the ranks of computer sub-literacy, I might try my hand at setting up a computer-generated vocabulary. But I suspect that once I had generated the list, I'd want to go back and tweak it, altering or completely changing some of the less apropos words to fit my aesthetic sensibilities. Also, being a realist, I'd want to establish (apparent) etymological connections between various roots, which would involve changing them to make them more similar.

Yes! For me, that's always been one of the great pleasures of artlanging: leaving traces (sometimes subtle, sometimes whack-upside-the-cabeza obvious) of historical connections between words.

For me, one of the most fun things is dividing up the 'lexical-semantic space' of Tokana in ways that differ from English. For example, where English has four words, "bring", "hold", "carry", and "wear", Tokana has a single word, "kespa". On the other hand, Tokana has two words for "shell", depending on whether you're talking about a hard shell (e.g. of a shellfish or turtle) or a soft shell (e.g. of an arthropod or an egg). Furthermore, the word for "soft shell", "eket", also means "fingernail".

I love the 'soft shell' - 'fingernail' connection! Concocting senses is a favorite of mine, too. The challenge is, given a map of the semantic universe (well...), partition it uniquely and interestingly.

I don't think that doing a computer-generated vocabulary would allow me to make those kinds of connections, although I suppose it would give me a corpus of words which I could then go back and revise. Perhaps this is how some of you work?

I use Roget's and a few good bilingual dictionaries (notably the Zulu dictionary by Doke and Vilakazi and those great lexica of Austronesian languages I get at the local university library, not that I can remember any of the authors or titles now, of course). Now that I know about WordNet, I feel more and more tempted to try to devise some simple algorithms for generating usable "sense group" glosses.

Another tool I've used in the past is Carl Darling Buck's _A Dictionary of Selected Synonyms in the Principal Indo-European Languages_, which shows the semantic and phonological development of several hundred IE roots over time in a variety of IE languages. If I had it here, I'd give a few examples; it's very handy when you're creating vocabulary and need a bit of inspiration.

OK, I'll go ahead and spill my guts with reckless abandon: what I *really* yearn for is a kind of "sound-sense-gene engineering laboratory" (pardon the silly term) to help speed up the process of lexicon building. Mix etymalarkey and senscrambling, stir well, and apply heat. For example, start with a single unremarkable word (apostrophe = stress marker):

  'kaistu 'ripe'

Create a few straightforward derivatives:

  'kaistuam   'ripen (tr.)'

  'yakaistu   'unripe'

  kaistu'auzu 'overly ripe'
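The derivational step can be sketched in a few lines of Python. This is a minimal sketch, assuming only the affix shapes and stress behaviour read off the three examples above (prefix ya- 'un-', suffix -am 'make ...', suffix -auzu 'overly ...', with ya- and -auzu pulling the stress mark onto themselves). The names Affix, AFFIXES, derive and derive_gloss are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affix:
    form: str           # affix shape, without the stress mark
    is_prefix: bool     # True for prefixes, False for suffixes
    takes_stress: bool  # True if the affix attracts the stress mark (')
    gloss: str          # gloss template; "{}" stands for the base gloss


# Hypothetical affix table, inferred only from the 'kaistu examples.
AFFIXES = {
    "causative": Affix("am", is_prefix=False, takes_stress=False, gloss="make {}"),
    "negative": Affix("ya", is_prefix=True, takes_stress=True, gloss="un{}"),
    "excessive": Affix("auzu", is_prefix=False, takes_stress=True, gloss="overly {}"),
}


def derive(stem: str, affix: Affix) -> str:
    """Attach an affix to a stress-marked stem (apostrophe = stress)."""
    if affix.takes_stress:
        bare = stem.replace("'", "")  # the stem gives up its own stress
        marked = "'" + affix.form
        return marked + bare if affix.is_prefix else bare + marked
    return affix.form + stem if affix.is_prefix else stem + affix.form


def derive_gloss(gloss: str, affix: Affix) -> str:
    """Fill the affix's gloss template with the base gloss."""
    return affix.gloss.format(gloss)


for name, affix in AFFIXES.items():
    print(derive("'kaistu", affix), derive_gloss("ripe", affix))
```

Run as-is, this prints 'kaistuam make ripe, 'yakaistu unripe, and kaistu'auzu overly ripe, reproducing the hand-made forms. A per-affix stress flag is a shortcut; a fuller lexicon would probably want a real stress rule.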

Mutate the words' shapes and senses:

  'kaistu   'be ripe'

  'kaistiam 'ripen, bring to fruition; achieve, accomplish;
               perform, enact'

  'yagestu  'be unripe, naive, a "greenhorn"'

  kes'tauzu 'be overly ripe; bad, spoiled, rotten; unctuous, smarmy'

Take out some of the original and added senses, substitute a rival word to cover the original sense 'ripe', make a phonological alteration or two, and there you are:

  e'bamu    'be ripe'

  'kaihtiam 'perform (a service, a duty, or some other "good" thing);
               enact (a law); achieve, accomplish'
               
  'yagehtu  'be new (at something); be a virgin'

  ke'tsauzu 'be overly ripe; unctuous, smarmy, treacly'
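The "phonological alteration or two" can be modelled as an ordered list of rewrite rules applied one after another. This is a minimal sketch, assuming three hypothetical rules chosen to reproduce the changes above: s/t metathesis across the stress mark (kes'tauzu > ke'tsauzu), s > h before t ('kaistiam > 'kaihtiam, 'yagestu > 'yagehtu), and k > g between vowels. The ai > e change behind 'yagestu is not modelled. SOUND_CHANGES and apply_changes are illustrative names.

```python
import re

# Hypothetical ordered sound changes, as (regex, replacement) pairs.
# Order matters: the metathesis must run before s > h removes the "st" cluster.
SOUND_CHANGES = [
    (r"s't", "'ts"),                     # s/t metathesis; stress stays before the cluster
    (r"st", "ht"),                       # s > h before t
    (r"(?<=[aeiou])k(?=[aeiou])", "g"),  # k > g between vowels
]


def apply_changes(word: str) -> str:
    """Run every rule, in order, over the whole word."""
    for pattern, replacement in SOUND_CHANGES:
        word = re.sub(pattern, replacement, word)
    return word


for word in ["'kaistiam", "'yagestu", "kes'tauzu", "'yakaistu"]:
    print(word, ">", apply_changes(word))
```

Applied to 'yakaistu, these rules give 'yagaihtu rather than 'yagehtu. That difference shows how much depends on which rules are included and in what order.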

That took me ten or fifteen minutes to do by hand, using my _Roget's International Thesaurus_, a few simple phonological change patterns I've run into in various languages, and a little imagination. Surely much of it could have been done in a minute or two by a computer program, requiring only a modicum of my own time to polish things up. Right? It *should* be easy to get a computer to perform sense "branching" much as I did, though I doubt it would pick the same branches I would follow.
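One way to get a computer to do sense "branching" is a random walk over a table of sense extensions. This is a minimal sketch, assuming a hand-built association table (SENSE_LINKS, a hypothetical name) filled only with the senses from the 'kaistu example. As suspected above, the walk will not necessarily pick the branches a person would; the fixed seed only makes a run repeatable.

```python
import random

# Hypothetical sense-association table: each sense lists senses it may
# branch into. Entries are taken from the 'kaistu example above.
SENSE_LINKS = {
    "ripen": ["bring to fruition"],
    "bring to fruition": ["achieve", "accomplish"],
    "accomplish": ["perform", "enact"],
    "overly ripe": ["spoiled", "rotten"],
    "rotten": ["unctuous", "smarmy"],
    "unripe": ["naive", "greenhorn"],
    "greenhorn": ["new (at something)"],
}


def branch(sense: str, steps: int, rng: random.Random) -> list[str]:
    """Random walk of up to `steps` hops through SENSE_LINKS."""
    path = [sense]
    for _ in range(steps):
        options = SENSE_LINKS.get(path[-1], [])
        if not options:  # dead end: no further extensions recorded
            break
        path.append(rng.choice(options))
    return path


rng = random.Random(1997)  # fixed seed so a run can be reproduced
for start in ["ripen", "overly ripe", "unripe"]:
    print(start, "->", "; ".join(branch(start, 3, rng)[1:]))
```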

My feeling is that this sort of thing can be done not too godawfully by a computer program as long as all the necessary "parts" are in place and work well together. Here's my own first stab at what the requirements for this are...

Minimal Requirements for a Word Shape and Sense Laboratory

All of the following components must be available:

  1. Solid random word generation (i.e., a "structured" probabilistic
     model that meets whatever phonotactic constraints you want)

  2. A good variety of morphological and phonological "transformations"
     to apply (using the word loosely and not in any orthodox sense)

  3. Good sense data from a variety of languages -- a database of
     senses such as those for French _fade_ 'insipid, tasteless;
     lame (joke); drab, dull; stale (smell)'

  4. A human being to bring this all together and in the darkness --
     er, tweak the output.
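Requirement #1 might be sketched as a weighted, syllable-by-syllable generator. This is a minimal sketch under several assumptions. The phoneme inventory and weights are drawn loosely from the example words above. Two constraints are hypothetical: a vowel-initial syllable is allowed only at the start of a word, and a coda s is allowed only word-medially before t, as in kaistu. ONSETS, NUCLEI, generate_word and is_well_formed are illustrative names.

```python
import random
import re

# Hypothetical inventories with relative weights (higher = more frequent).
ONSETS = {"k": 4, "t": 4, "s": 3, "m": 2, "b": 1, "g": 1, "y": 1, "z": 1, "h": 1}
NUCLEI = {"a": 5, "i": 3, "u": 3, "e": 2, "ai": 1, "au": 1}

# The same constraints as a regex, for checking hand-edited words too.
WELL_FORMED = re.compile(
    r"[ktsmbgyzh]?(?:ai|au|[aeiu])(?:(?:st|[ktsmbgyzh])(?:ai|au|[aeiu]))*"
)


def pick(table: dict[str, int], rng: random.Random) -> str:
    """Choose one key from a weight table."""
    return rng.choices(list(table), weights=list(table.values()))[0]


def generate_word(rng: random.Random, min_syll: int = 2, max_syll: int = 3) -> str:
    n = rng.randint(min_syll, max_syll)
    syllables = []
    forced_onset = None
    for i in range(n):
        if forced_onset is not None:
            onset = forced_onset           # a coda s must be followed by t
        elif i == 0 and rng.random() < 0.2:
            onset = ""                     # vowel-initial only at word start
        else:
            onset = pick(ONSETS, rng)
        forced_onset = None
        coda = ""
        if i < n - 1 and rng.random() < 0.15:
            coda, forced_onset = "s", "t"  # coda s only word-medially
        syllables.append(onset + pick(NUCLEI, rng) + coda)
    return "".join(syllables)


def is_well_formed(word: str) -> bool:
    """True if the word satisfies the phonotactic constraints above."""
    return WELL_FORMED.fullmatch(word) is not None


rng = random.Random(42)
print([generate_word(rng) for _ in range(8)])
```

Every generated word passes is_well_formed by construction. The regex is mainly useful for re-checking forms after the hand-tweaking of #4.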

It seems to me that #1 and (especially) #4 are already present in abundance, but where are #2 and #3? I've been collecting interesting glosses from bilingual dictionaries for years, but what I have now is a disorganized mishmash of notes spread throughout many notebooks, computer files, etc. Maybe now is the time to establish a more rigorous "Semantic Scrapbook" with examples from various languages -- and people's imaginations -- of interesting sense-connections like that between 'overly ripe' and 'unctuous, smarmy'. What do you all think? I'll dig through my notebooks and see what I can come up with for starters.
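A first cut at such a "Semantic Scrapbook" could be as simple as a list of word entries with their senses, plus a query for senses that share a word. This is a minimal sketch, assuming one record per (language, word) pair. Entry, SCRAPBOOK, find_connections and linked_senses are hypothetical names, and the sample entries are only the ones quoted in this post, with the invented ke'tsauzu labelled as such.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    language: str
    word: str
    senses: tuple[str, ...]


# Sample entries, taken from the examples in this post.
SCRAPBOOK = [
    Entry("French", "fade", ("insipid", "tasteless", "lame (joke)", "drab", "dull", "stale (smell)")),
    Entry("Tokana", "eket", ("soft shell", "fingernail")),
    Entry("Tokana", "kespa", ("bring", "hold", "carry", "wear")),
    Entry("invented", "ke'tsauzu", ("overly ripe", "unctuous", "smarmy", "treacly")),
]


def find_connections(a: str, b: str) -> list[Entry]:
    """Entries in which senses a and b are covered by a single word."""
    return [e for e in SCRAPBOOK if a in e.senses and b in e.senses]


def linked_senses(sense: str) -> list[str]:
    """Every sense that shares a word with `sense`, in any language."""
    found = {s for e in SCRAPBOOK if sense in e.senses for s in e.senses}
    found.discard(sense)
    return sorted(found)


print([e.word for e in find_connections("overly ripe", "smarmy")])  # ["ke'tsauzu"]
print(linked_senses("fingernail"))  # ['soft shell']
```

A real scrapbook would probably also want a source citation per entry and some normalization of gloss wording, since one dictionary's "dull" may be another's "drab".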

Oops! I forgot #5:

  5. Time to do all this...  (*Sigh!*  As if I didn't have too many
     frying pans on the fire as it is!)
