SRFI x -*- outline -*-

* Title

Syntax for lexical case sensitivity contexts

* Author

Taylor Campbell

* Abstract

This SRFI defines a mechanism by which to locally toggle between
lexical case sensitivity and insensitivity of symbols.

* Issues

** Default case behaviour

Whether case insensitivity is the 'right' default is a matter not
important to the goals of this proposal.  Personally, I am vehemently
opposed to making case sensitivity the default, but this is
irrelevant.  The status quo is case insensitivity, and any transition
to case sensitivity will be long & tedious and involve the updating of
much code.  Given that the present default is case insensitivity, and
given the existence of such embedded sublanguages as are mentioned in
the rationale section, the #cs/#ci mechanism is a useful one to solve
many problems easily, succinctly, and immediately.  It is also an
already established mechanism.  Even if case sensitivity be adopted at
some point in the future, #cs/#ci would not hurt, and indeed it would
be useful for programmers who still wish to write code
case-insensitively or who do not have the time to go through all of
their code to downcase it.

** Internationalization

Some may object to the restriction of case folding to the English
alphabet in the ASCII character set.  This is not a restriction,
however, but rather merely a necessary specification to make the
#cs/#ci mechanism useful, without extending to issues beyond the scope
of the present proposal, namely internationalization and case
distinctions in non-English languages.  There is no restriction
imposed because this proposal specifies nothing beyond the scope
necessary to make #cs/#ci useful, and non-English letters to which
some form of case folding may apply cannot portably be written in
Scheme programs anyway.  If extensions to the allowed character set
for identifiers are made, they may include case folding behaviour in
case-insensitive contexts, but internationalization is not the subject
of this proposal.
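The narrowness of ASCII-only folding can be pictured with a small
sketch.  It is illustrative only: the names `ascii-fold-char' and
`ascii-fold-string' are hypothetical, folding downward is an
assumption (this proposal does not fix the direction of folding), and
the arithmetic assumes an ASCII-compatible character encoding.

```scheme
;; Hypothetical helper: fold only the letters A-Z to lower case and
;; leave every other character, including any non-ASCII one, untouched.
;; Assumes an ASCII-compatible encoding, where 'a' = 'A' + 32.
(define (ascii-fold-char c)
  (if (and (char>=? c #\A) (char<=? c #\Z))
      (integer->char (+ (char->integer c) 32))
      c))

;; Apply the character-level fold to every character of a string.
(define (ascii-fold-string s)
  (list->string (map ascii-fold-char (string->list s))))

(ascii-fold-string "Frob-GROVEL!")   ; => "frob-grovel!"
```

A system that later admits more characters in identifiers could
replace the range test with its own folding rule without touching the
rest of the reader.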

** Terminology

'Case sensitivity' and 'case insensitivity' are somewhat misleading
terms.  The correct terms would be 'case preservation' and 'case
folding,' because symbols internally are case-sensitive, but the
reader has the option of folding or preserving case.  However, the
words 'sensitive' and 'insensitive' are already established by the
existing support for #cs/#ci, and #cp/#cf are acronyms less likely to
be recognized.

* Rationale

Many applications of Scheme involve embedded sublanguages that
require, for some reason or another, case sensitivity of identifiers.
For example, the Scheme Shell, scsh [1], has a sublanguage for writing
Unix program invocations in S-expressions with symbols, which requires
case sensitivity of the symbols in order to be able to correctly
denote Unix file system elements; another example is SXML [2], which
represents XML trees in S-expressions, and, since XML is case
sensitive, SXML must be as well.  Yet, in the Scheme language,
identifiers are case insensitive, a convention that has been
established in Lisp for decades and on which much existing code, and
many coding conventions, rely.  It is therefore useful to be able to
toggle between case sensitivity and case insensitivity of identifiers
to support embedding these sublanguages cleanly within Scheme.

* Informal specification

In the lexical syntax of Scheme, the sequence `#cs' enters a lexical
case sensitivity context for the following complete S-expression, so
that the case of identifiers written in that S-expression is not
folded but preserved exactly as it was written.  Similarly, the
sequence `#ci' enters a lexical case insensitivity context (the
default) for the following complete S-expression, so that the case of
identifiers written in that S-expression is folded as the Scheme
system does by default.

In these examples, assume that the expressions are evaluated in a
lexically case-insensitive context:

  (eq? 'foo 'foo)                         => #t
  (eq? 'foo 'FOO)                         => #t
  (eq? 'foo #ci 'foo)                     => #t
  (eq? 'foo #ci 'FOO)                     => #t
  (eq? 'foo #cs 'foo)                     => unspecified
  (eq? 'foo #cs 'FOO)                     => unspecified
  (eq? #cs 'foo #cs 'foo)                 => #t
  (eq? #cs 'foo #cs 'FOO)                 => #f
  (eq? #cs 'FOO #cs 'FOO)                 => #t

  (symbol->string 'foo)                   => unspecified
  (string=? (symbol->string 'foo)
            (symbol->string #ci 'foo))    => #t
  (symbol->string #cs 'foo)               => "foo"
  (symbol->string #cs 'FoO)               => "FoO"
  (eq? (string->symbol "fOo") #cs 'fOo)   => #t

  (map symbol->string '#cs(foo FOO fOo))
    => ("foo" "FOO" "fOo")
  (map symbol->string '#cs(foo #ci x FOO))
    => ("foo" "x" "FOO") or ("foo" "X" "FOO")

  (define #cs FOO 5)
  (define #ci FOO 3)
  foo                                     => 3
  #ci FoO                                 => 3
  #cs FOO                                 => 5 or 3
  #cs foo                                 => unspecified, possibly error

** Notes

  . #cs/#ci affect only symbols.  That is, other case-insensitive
    parts of Scheme are not affected by them; for instance, #cs #X123
    is the same as #ci #X123; both are the hexadecimal literal for the
    number written in decimal as 291.

  . #cs/#ci do not affect reader abbreviations such as quote, in the
    sense that the quote mark still expands to the symbol QUOTE with
    the default case: #cs 'XYZ is equivalent to #cs (#ci QUOTE XYZ),
    for any XYZ.  The same is true of ` as an abbreviation for
    QUASIQUOTE, , for UNQUOTE, and ,@ for UNQUOTE-SPLICING.

  . #cs/#ci are themselves case-insensitive; that is, #cs, #cS, #CS, &
    #Cs are all equivalent, and likewise with #ci, #cI, #CI, & #Ci.

  . #cs/#ci must be followed by a delimiter.  For example, `#cs'FOO',
    `#cs 'FOO', `#cs(#ci quote FOO)', and
    `#cs ; Frob grovel!
     'FOO'
    are all equivalent and valid, but `#csFOO' is not valid (and
    therefore not equivalent to `#cs FOO').

** Definition of case insensitivity

Case folding in case-insensitive contexts is specified only for the
English alphabetic characters that are already defined by R5RS as
allowed in source programs.  In particular, it is not specified for
non-English languages in which case might typically provide
significant semantic distinctions in names.
Other specifications may provide this, but this was deemed a can of
worms not relevant to the base of this proposal.

* Implementation

A simple, nearly-R5RS implementation of #cs/#ci is included.  It is a
slight modification of Scheme48's reader, which is a very simple
readtable-driven recursive descent parser; two lines of the original
code are changed and about twenty are added.  It is in
s48-read-ci-cs.scm; see that file for the copyright & licensing terms.

* Copyright

Copyright (C) 2005 Taylor Campbell.  All rights reserved.

Don't distribute this.  I'm not liable; if stuff goes wrong, it's all
your fault, *nyah nyah*.  This copyright notice will change in the
real SRFI document to something reasonable (in particular, the
official SRFI copyright notice, surprise surprise).
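
* Appendix: illustrative reader sketch

The scoping rule of the informal specification (a #cs or #ci prefix
governs exactly the next complete S-expression) can be modelled by
passing the current folding mode as an argument to a recursive descent
reader.  What follows is a toy sketch under stated assumptions, not
the code in s48-read-ci-cs.scm: every procedure name is hypothetical,
input is a list of characters, only symbols, lists, the quote
abbreviation, comments, and #cs/#ci are handled (so `#X123' is
rejected), folding is assumed to go downward, and `error' is assumed
to be available as in SRFI 23.

```scheme
;; Each reader takes a list of characters and a flag FOLD?, and
;; returns a pair (datum . remaining-characters).

(define (delimiter? c)
  (or (char-whitespace? c) (memv c '(#\( #\) #\' #\;))))

(define (skip-line cs)                  ; stop at end of line or input
  (if (or (null? cs) (char=? (car cs) #\newline))
      cs
      (skip-line (cdr cs))))

(define (skip-blanks cs)                ; skip whitespace and comments
  (cond ((null? cs) cs)
        ((char-whitespace? (car cs)) (skip-blanks (cdr cs)))
        ((char=? (car cs) #\;) (skip-blanks (skip-line cs)))
        (else cs)))

;; Collect characters up to the next delimiter, folding only if asked.
(define (read-symbol cs fold?)
  (let loop ((cs cs) (acc '()))
    (if (or (null? cs) (delimiter? (car cs)))
        (cons (string->symbol (list->string (reverse acc))) cs)
        (loop (cdr cs)
              (cons (if fold? (char-downcase (car cs)) (car cs))
                    acc)))))

(define (read-list cs fold?)
  (let ((cs (skip-blanks cs)))
    (cond ((null? cs) (error "unexpected end of input"))
          ((char=? (car cs) #\)) (cons '() (cdr cs)))
          (else
           (let* ((head (read-datum cs fold?))
                  (tail (read-list (cdr head) fold?)))
             (cons (cons (car head) (car tail)) (cdr tail)))))))

;; After `#': accept c/C then s/S or i/I, followed by a delimiter,
;; and read the next datum in the mode the prefix selects.
(define (read-case-prefix cs fold?)
  (if (and (pair? cs) (pair? (cdr cs)) (pair? (cddr cs))
           (char=? (char-downcase (car cs)) #\c)
           (memv (char-downcase (cadr cs)) '(#\s #\i))
           (delimiter? (caddr cs)))
      (read-datum (cddr cs) (char=? (char-downcase (cadr cs)) #\i))
      (error "unsupported # syntax in this sketch")))

(define (read-datum cs fold?)
  (let ((cs (skip-blanks cs)))
    (cond ((null? cs) (error "unexpected end of input"))
          ((char=? (car cs) #\() (read-list (cdr cs) fold?))
          ((char=? (car cs) #\)) (error "unexpected close parenthesis"))
          ((char=? (car cs) #\')
           ;; The abbreviation yields `quote' in the default (folded)
           ;; case, whatever the current mode is.
           (let ((r (read-datum (cdr cs) fold?)))
             (cons (list (string->symbol "quote") (car r)) (cdr r))))
          ((char=? (car cs) #\#) (read-case-prefix (cdr cs) fold?))
          (else (read-symbol cs fold?)))))

;; Top level: the default context is case-insensitive.
(define (read-from-string str)
  (car (read-datum (string->list str) #t)))

(map symbol->string (cadr (read-from-string "'#cs(foo #ci X FOO)")))
;; => ("foo" "x" "FOO")
```

Because FOLD? is an ordinary argument, the mode selected by an inner
#ci ends with the datum it prefixes, and the enclosing #cs context
resumes for FOO without any explicit save or restore.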