Manifesto on JAR's Next Language

Originally from spring 2001, updated 11 December 2001, 2 January 2002, and 23 June 2002

("Jargol" is apparently the name of a pedigreed stud bull.)

Reasons to design a programming language:

Languages are for

No matter what your goals are initially, demands will always expand, so it is always best to start by figuring out how to do abstraction.

One of my goals is to try to reduce the sophistry, insularity, and inbreeding of the programming activity. I want the culture and terminology of programming to be closer to that of mathematics, engineering, science, and even the humanities. This means abstracting away from von Neumann computers and working toward a formal language for expressing human, logical, and scientific problems and their solutions.

Meaning and Information

The central ideas that I'd like to capture are meaning and information.

Things written in the language should have meaning. Names should have meaning. A meaning may be assigned a priori (i.e. dictated by me) or may be derived by combining things that already have meaning.

Conversely, as many useful meanings as possible should have expression in the language. One should be able to express anything that any plausible meta-program might care about. If a particular meta-program doesn't understand something, it can just ignore it, and chances are some other meta-program will be able to use it.

I'm not sure I know what meaning is, but I'm striving for a common sense definition that blends into mathematical meaning. I am not referring to the so-called "meanings" of denotational semantics.

Example (not a good one): What does a+b mean? A mathematician might say it means that there is a binary ACI operator (ACI = associative, commutative, and identity-possessing) and it is being applied to two things, a and b, whose meaning derives from the context of discussion - that is, a, +, and b are all pronouns, but we know that a+b = b+a (where = is itself defined).
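The ACI reading of + can be spot-checked mechanically. What follows is a minimal Python sketch, not part of any language design; the helper name satisfies_aci and the sample values are assumptions made for illustration. Checking finitely many samples can only fail to find a counterexample; it never proves the laws.

```python
from itertools import product

def satisfies_aci(op, identity, samples):
    """Spot-check the three ACI laws on a finite set of samples.

    True only means that no counterexample was found among the samples.
    """
    for a, b, c in product(samples, repeat=3):
        if op(op(a, b), c) != op(a, op(b, c)):   # associative
            return False
    for a, b in product(samples, repeat=2):
        if op(a, b) != op(b, a):                 # commutative
            return False
    for a in samples:
        if op(a, identity) != a:                 # identity-possessing
            return False
    return True

integers = [-2, 0, 1, 7]
plus_is_aci = satisfies_aci(lambda a, b: a + b, 0, integers)
minus_is_aci = satisfies_aci(lambda a, b: a - b, 0, integers)
```

On these samples addition passes and subtraction does not, which is the sense in which "+" carries more meaning than an arbitrary binary operator.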

By information I mean Platonic bags of bits -- immutable and identity-free. Bignums and symbols in Lisp are like this, but information may also be structured (S-expression-like). Some memory might hold different information at different times, but memory has identity and is therefore not information.
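The distinction between information and memory can be sketched in Python. This is only an illustration under stated assumptions: the names Pair and Cell are hypothetical, and Python itself can still tell two equal values apart with "is", so the sketch treats content equality as the only comparison that matters for information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    """Structured information: immutable and compared by content."""
    head: object
    tail: object

class Cell:
    """Memory: a place with identity that can hold different
    information at different times."""
    def __init__(self, contents):
        self.contents = contents

# Two separately built values with the same content are the same information.
x = Pair(1, Pair(2, None))
y = Pair(1, Pair(2, None))
same_information = (x == y)

# Two cells are distinct memory even while they hold the same information.
m = Cell(x)
n = Cell(x)
same_memory = (m is n)

# Updating one cell changes what it holds, not the information itself.
m.contents = Pair(3, None)
```

After the update, n still holds the original pair and x is unchanged; only the cell m refers to something different.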

I'm not totally happy with the term "information," since it implies that someone is being informed of something (knows something that they didn't know before), and I want a term that means uninterpreted bits. However, I haven't found anything better. "Data" and "datum" are close, but in English they mean "given" and have the semantics of an independent variable, whereas I'm looking for a term that would also include dependent variables (outputs). "Utterance" is clumsy. "Message" is almost right. "Artifact" has been proposed. "String," "text," "content," "expression," "resource," and "version" all have some good properties, but none works well.

[23 June 2002] Today I like the term "number". There is good mathematical precedent for using this word to mean an object with rich internal structure: the game/numbers described by John Horton Conway and presented in Knuth's book Surreal Numbers.

Design concepts

Essential features for any language I'd care to design right now:

The following are ideas that I'd like to try out.

The following are flaky ideas that I want to try but would probably give up on pretty quickly if they didn't work.

Community Glossary

The purpose of the language's community glossary is to encourage and abet, but not to require, the language's users to use consistent terminology across all programs.

The community glossary will just be a set of term descriptions submitted to a central repository by people who are using the language. Submission will be easy -- perhaps an automatic part of a program build process. E.g. if a program includes a definition of a term and the definition is declared "community" then the definition will be submitted to the repository. Discussions of the merits and details of definitions will also be encouraged and archived.
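One way the submission step might look, as a rough Python sketch. Everything here is an assumption made for illustration: the names Definition, Repository, and build, the "community" flag, and the in-memory list standing in for the central repository. The author and date values are placeholders.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Definition:
    term: str
    text: str
    author: str
    submitted: date
    community: bool = False   # hypothetical marker for a definition declared "community"

class Repository:
    """Stand-in for the central repository: just an in-memory list."""
    def __init__(self):
        self.entries = []

    def submit(self, definition):
        self.entries.append(definition)

def build(definitions, repository):
    """Hypothetical build step: forward only the community definitions."""
    sent = [d for d in definitions if d.community]
    for d in sent:
        repository.submit(d)
    return sent

repo = Repository()
program_definitions = [
    Definition("sorted", "the elements of a list occur in order",
               "example-author", date(2002, 1, 1), community=True),
    Definition("scratch", "a term private to this program",
               "example-author", date(2002, 1, 1)),
]
sent = build(program_definitions, repo)
```

Only the definition marked as community reaches the repository; the private one stays with the program.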

New entries should have some support -- some evidence that this definition of the term is consistent with established usage in natural language.

[Maybe I can get Greenspun or KMP interested in this?]

Before using a term in a program, programmers will be encouraged to consult the community glossary and link to a particular definition found there, and it will be easy for them to do so.

Example: There might be community agreement that "sorted" is an adjective that means that the elements of a list occur in order. "Element," "list," and "order" would also have definitions in the glossary.

Another example: One might go to the community glossary and look up "sort". There might be three or four definitions of this term, written in natural language prose, each with a date and author. Some authors might say "sort" is something you do to a list, while for others it has the sense of "kind", as in universal algebra. Each author will argue that his or her definition is good, and others will argue against them, in an archived discussion.

Definitions may or may not be operational. For example, "halt" could be given a definition in terms of halting of Turing machines. Even terms such as "sorted list," which can be made operational, would have an implementing program only as a backup to a declarative definition.

In the extreme, a programmer writes programs that simply implement meanings defined in terms that are all defined in the community glossary. The program then has a readily understood informal or formal specification.

As in a dictionary of English, multiple definitions are acceptable, although discouraged. A use of a term should be associated with the particular definition desired. Somehow there has to be a mechanism to allow good definitions to become ascendant and others to be phased out. I don't know how to do that in a way that scales.
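Associating a use of a term with one particular definition could be as simple as naming the term together with a sense. The Python sketch below is an assumption-laden illustration: the sense numbers, the names definitions_of and link, and the placeholder authors are all hypothetical, and it says nothing about how good definitions would become ascendant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    term: str
    sense: int      # hypothetical way to tell definitions of one term apart
    text: str
    author: str

# Two competing definitions of the same term, keyed by (term, sense).
glossary = {
    ("sort", 1): Definition("sort", 1,
                            "to arrange the elements of a list in order",
                            "author A"),
    ("sort", 2): Definition("sort", 2,
                            "a kind, as in universal algebra",
                            "author B"),
}

def definitions_of(term):
    """All definitions on file for a term, in sense order."""
    return [glossary[key] for key in sorted(glossary) if key[0] == term]

def link(term, sense):
    """Resolve a use of a term to the single definition it names.

    Raises KeyError if that definition is not in the glossary.
    """
    return glossary[(term, sense)]

candidates = definitions_of("sort")
chosen = link("sort", 2)
```

A program that uses "sort" in the universal-algebra sense would carry the link to sense 2, so a reader or meta-program is never left to guess which definition was meant.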

As suggested above, the glossary may contain programs (preferably public domain) that illustrate, exemplify, or even help to define terms (e.g. tests).

The glossary will also propose grammatical aspects of terms, such as parts of speech.


kmp: not a functional language; learn moo; paranoia; clumping of term definitions; qualified names, environments, and alpha conversion; synchronization not good; substandards.org

jar 2002-06-23: name of language should be Google-unique