SRFI x                                                  -*- outline -*-

* Title

Syntax for lexical case sensitivity contexts

* Author

Taylor Campbell

* Abstract

This SRFI defines a mechanism by which to locally toggle between
lexical case sensitivity and insensitivity of symbols.

* Issues

** Default case behaviour

Whether case insensitivity is the 'right' default is not important to
the goals of this proposal.  Personally, I am vehemently opposed to
making case sensitivity the default, but that is beside the point.  The
status quo is case insensitivity, and any transition to case
sensitivity will be long & tedious and will involve updating much
code.  Given that case insensitivity is the present default, and given
embedded sublanguages such as those mentioned in the Rationale section,
the #cs/#ci mechanism solves many problems easily, succinctly, and
immediately.  It is also an already established mechanism.  Even if
case sensitivity is adopted at some point in the future, #cs/#ci would
do no harm; indeed, it would remain useful for programmers who still
wish to write code case-insensitively or who lack the time to go
through all of their code and downcase it.

** Internationalization

Some may object to restricting case folding to the English alphabet of
the ASCII character set.  This is not a restriction, however, but
merely the specification necessary to make the #cs/#ci mechanism
useful without extending into issues beyond the scope of the present
proposal, namely internationalization and case distinctions in
non-English languages.  Nothing is forbidden: the proposal simply
specifies nothing beyond what #cs/#ci need, and non-English letters to
which some form of case folding might apply cannot portably be written
in Scheme programs anyway.  If the character set allowed in identifiers
is extended, such extensions may define case-folding behaviour in
case-insensitive contexts, but internationalization is not the subject
of this proposal.

** Terminology

'Case sensitivity' and 'case insensitivity' are somewhat misleading
terms.  The accurate terms would be 'case preservation' and 'case
folding,' because symbols themselves are always case-sensitive; it is
the reader that either folds or preserves case.  However, the words
'sensitive' and 'insensitive' are already established by existing
support for #cs/#ci, and #cp/#cf would be less readily recognized.

* Rationale

Many applications of Scheme involve embedded sublanguages that require,
for one reason or another, case-sensitive identifiers.  For example,
the Scheme Shell, scsh [1], has a sublanguage for writing Unix program
invocations as S-expressions containing symbols, which must be case
sensitive in order to denote Unix file system elements correctly;
another example is SXML [2], which represents XML trees as
S-expressions and, since XML is case sensitive, must be case sensitive
as well.  Yet in the Scheme language identifiers are case insensitive,
a convention established in Lisp for decades and one on which much
existing code and many coding conventions rely.  It is therefore useful
to be able to toggle between case sensitivity and case insensitivity of
identifiers, so that these sublanguages can be embedded cleanly within
Scheme.

* Informal specification

In the lexical syntax of Scheme, the sequence `#cs' enters a lexical
case sensitivity context for the following complete S-expression, so
that the case of identifiers written in that S-expression is not folded
but preserved exactly as written.  Similarly, the sequence `#ci' enters
a lexical case insensitivity context (the default) for the following
complete S-expression, so that the case of identifiers written in that
S-expression is folded as the Scheme system does by default.

In these examples, assume that the expressions are evaluated in a
lexically case-insensitive context.  Results marked unspecified depend
on whether the system folds identifiers to lower or upper case:

  (eq? 'foo 'foo)                       => #t
  (eq? 'foo 'FOO)                       => #t
  (eq? 'foo #ci 'foo)                   => #t
  (eq? 'foo #ci 'FOO)                   => #t
  (eq? 'foo #cs 'foo)                   => unspecified
  (eq? 'foo #cs 'FOO)                   => unspecified
  (eq? #cs 'foo #cs 'foo)               => #t
  (eq? #cs 'foo #cs 'FOO)               => #f
  (eq? #cs 'FOO #cs 'FOO)               => #t
  (symbol->string 'foo)                 => unspecified
  (string=? (symbol->string 'foo)
            (symbol->string #ci 'foo))  => #t
  (symbol->string #cs 'foo)             => "foo"
  (symbol->string #cs 'FoO)             => "FoO"
  (eq? (string->symbol "fOo")
       #cs 'fOo)                        => #t
  (map symbol->string
       '#cs(foo FOO fOo))               => ("foo" "FOO" "fOo")
  (map symbol->string
       '#cs(foo #ci x FOO))             => ("foo" "x" "FOO")
                                        or ("foo" "X" "FOO")

  (define #cs FOO 5)
  (define #ci FOO 3)
  foo                                   => 3
  #ci FoO                               => 3
  #cs FOO                               => 5 or 3
  #cs foo                               => unspecified, possibly error
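The reading behaviour in these examples can be reproduced with a toy
reader.  The following is a minimal sketch in R7RS-small Scheme, not
part of the proposal: it handles only lists, symbols, the quote mark,
and the #cs/#ci prefixes, omits the delimiter check, and assumes the
system folds toward lower case.  The names `read-from-string',
`read-datum', and the other helpers are hypothetical.  The point is
that the case context is an ordinary argument passed down the
recursive descent, and a prefix changes it only for the one datum it
reads.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical mini-reader.  FOLD? is #t in a case-insensitive context.
;; Folding toward lower case is an assumption; R5RS leaves the
;; direction to the implementation.

(define (fold-ascii name)
  ;; Only A-Z are folded, per the proposal.
  (string-map (lambda (c)
                (if (and (char<=? #\A c) (char<=? c #\Z)) (char-downcase c) c))
              name))

(define (delimiter? c)
  (or (eof-object? c) (char-whitespace? c) (memv c '(#\( #\) #\" #\;))))

(define (skip-whitespace port)
  (let ((c (peek-char port)))
    (when (and (char? c) (char-whitespace? c))
      (read-char port)
      (skip-whitespace port))))

(define (read-datum port fold?)
  (skip-whitespace port)
  (let ((c (peek-char port)))
    (cond ((eof-object? c) (error "unexpected end of input"))
          ((char=? c #\() (read-char port) (read-list port fold?))
          ((char=? c #\') (read-char port)
           ;; The abbreviation always yields QUOTE in the default case,
           ;; whatever context it appears in.
           (list 'quote (read-datum port fold?)))
          ((char=? c #\#) (read-char port) (read-prefixed port))
          (else (read-symbol port fold?)))))

(define (read-prefixed port)
  ;; Accept #cs / #ci in any letter case; the new context applies only
  ;; to the single datum that follows.
  (let* ((c1 (read-char port))
         (c2 (read-char port)))
    (unless (and (char? c1) (char? c2) (char-ci=? c1 #\c)
                 (memv (char-downcase c2) '(#\s #\i)))
      (error "only #cs and #ci are supported in this sketch"))
    (read-datum port (char-ci=? c2 #\i))))

(define (read-list port fold?)
  (skip-whitespace port)
  (let ((c (peek-char port)))
    (cond ((eof-object? c) (error "unterminated list"))
          ((char=? c #\)) (read-char port) '())
          (else (let ((head (read-datum port fold?)))  ; read left to right
                  (cons head (read-list port fold?)))))))

(define (read-symbol port fold?)
  (let loop ((chars '()))
    (let ((c (peek-char port)))
      (cond ((not (delimiter? c)) (read-char port) (loop (cons c chars)))
            ((null? chars) (error "unexpected character" c))
            (else (let ((name (list->string (reverse chars))))
                    (string->symbol (if fold? (fold-ascii name) name))))))))

(define (read-from-string text)
  ;; The top level is a case-insensitive context.
  (read-datum (open-input-string text) #t))

;; Returns the list (quote (foo x FOO)).
(read-from-string "'#cs(foo #ci X FOO)")
```

Under the lower-case assumption this matches the first alternative of
the `map symbol->string' example above.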

** Notes

  . #cs/#ci affect only symbols; other case-insensitive parts of the
    lexical syntax are not affected by them.  For instance, #cs #X123 is
    the same as #ci #X123: both are the hexadecimal literal for the
    number 291.
  . #cs/#ci do not affect reader abbreviations such as quote, in the
    sense that the quote mark still expands to the symbol QUOTE in the
    default case: #cs 'XYZ is equivalent to #cs (#ci QUOTE XYZ), for
    any XYZ.  The same is true of the backquote (`) as an abbreviation
    for QUASIQUOTE, the comma (,) for UNQUOTE, and comma-at (,@) for
    UNQUOTE-SPLICING.
  . #cs/#ci are themselves case-insensitive; that is, #cs, #cS, #CS, &
    #Cs are all equivalent, and likewise with #ci, #cI, #CI, & #Ci.
  . #cs/#ci must be followed by a delimiter.  For example, `#cs'FOO',
    `#cs 'FOO', `#cs(#ci quote FOO)', and `#cs           ; Frob grovel!
    'FOO' are all equivalent and valid, but `#csFOO' is not valid (and
    therefore not equivalent to `#cs FOO').
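The prefix rules in the notes above (the prefix is itself
case-insensitive, and it must be followed by a delimiter) can be
checked by a small helper.  This R7RS-small sketch is hypothetical: the
name `case-prefix' is invented for illustration, and treating the quote
mark as an acceptable terminator is an assumption drawn from the
`#cs'FOO' example, since the quote mark is not a delimiter in R5RS.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical helper.  TEXT holds source code and I indexes a #\#.
;; Returns cs or ci for a case-context prefix, #f for any other #
;; syntax, and signals an error for a prefix glued to the next token.
(define (prefix-terminator? c)
  ;; R5RS delimiters, plus the quote mark (assumed from "#cs'FOO").
  (or (char-whitespace? c) (memv c '(#\( #\) #\" #\; #\'))))

(define (case-prefix text i)
  (let ((len (string-length text)))
    (and (< (+ i 2) len)
         (char=? (string-ref text i) #\#)
         (char-ci=? (string-ref text (+ i 1)) #\c)       ; #c or #C
         (let ((mode (char-downcase (string-ref text (+ i 2)))))
           (and (memv mode '(#\s #\i))
                (let ((j (+ i 3)))
                  ;; End of text also terminates the prefix token.
                  (unless (or (= j len)
                              (prefix-terminator? (string-ref text j)))
                    (error "#cs/#ci must be followed by a delimiter" text))
                  (if (char=? mode #\s) 'cs 'ci)))))))

(case-prefix "#CS 'FOO" 0)   ; => cs
(case-prefix "#cI(x)" 0)     ; => ci
(case-prefix "#x123" 0)      ; => #f, not a case prefix
```

Calling (case-prefix "#csFOO" 0) signals an error rather than reading
`#cs FOO'.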

** Definition of case insensitivity

Case folding in case-insensitive contexts is specified only for the
English alphabetic characters that R5RS already allows in source
programs.  In particular, it is not specified for letters of
non-English languages, in which case distinctions may carry
significant meaning in names.  Other specifications may address this,
but it was deemed a can of worms irrelevant to the core of this
proposal.
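The folding this section specifies amounts to the following R7RS-small
sketch.  The name `fold-identifier' is hypothetical, and folding toward
lower case is an assumption, since R5RS leaves the direction to the
implementation.  Characters outside A-Z are returned unchanged here
only because the proposal does not specify them; an implementation may
do otherwise.

```scheme
(import (scheme base) (scheme char))

;; Fold an identifier's spelling in a case-insensitive context.  Only
;; the ASCII letters A-Z are touched; every other character is left
;; alone.  Lower-case folding is an assumed choice.
(define (fold-identifier name)
  (string-map (lambda (c)
                (if (and (char<=? #\A c) (char<=? c #\Z))
                    (char-downcase c)
                    c))
              name))

(fold-identifier "Call-With-Current-Continuation")
;; => "call-with-current-continuation"
```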

* Implementation

A simple, nearly-R5RS implementation of #cs/#ci is included.  It is a
slight modification of Scheme48's reader, a very simple
readtable-driven recursive descent parser: it changes two lines of the
original code and adds about twenty.  It is in s48-read-ci-cs.scm; see
that file for the copyright & licensing terms.
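One way a readtable-driven reader can carry the lexical case context
is sketched below in R7RS-small Scheme.  This is an independent
illustration, not the code in s48-read-ci-cs.scm: the handler names,
the `folding?' parameter, and the `readtable' association list are
hypothetical, only lists, symbols, and #cs/#ci are handled, and
lower-case folding is assumed.  The context lives in a parameter
object, so only the `#' handler touches it, and `parameterize'
restores the enclosing context once the prefixed datum has been read.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical sketch, not the code in s48-read-ci-cs.scm.
(define folding? (make-parameter #t))   ; top level: case-insensitive

(define (skip-whitespace port)
  (let ((c (peek-char port)))
    (when (and (char? c) (char-whitespace? c))
      (read-char port)
      (skip-whitespace port))))

(define (read-token port)
  (let loop ((acc '()))
    (let ((c (peek-char port)))
      (if (or (eof-object? c) (char-whitespace? c) (memv c '(#\( #\))))
          (list->string (reverse acc))
          (begin (read-char port) (loop (cons c acc)))))))

(define (intern-name name)
  ;; Fold A-Z only, and only in a case-insensitive context.
  (string->symbol
   (if (folding?)
       (string-map (lambda (c)
                     (if (and (char<=? #\A c) (char<=? c #\Z)) (char-downcase c) c))
                   name)
       name)))

(define (read-item port)
  (skip-whitespace port)
  (let* ((c (peek-char port))
         (entry (and (char? c) (assv c readtable))))
    (cond ((eof-object? c) (error "unexpected end of input"))
          (entry (read-char port) ((cdr entry) port))   ; dispatch
          (else (intern-name (read-token port))))))

(define (read-list-tail port)      ; called after "(" is consumed
  (skip-whitespace port)
  (let ((c (peek-char port)))
    (cond ((eof-object? c) (error "unterminated list"))
          ((char=? c #\)) (read-char port) '())
          (else (let ((head (read-item port)))
                  (cons head (read-list-tail port)))))))

(define (read-hash port)           ; called after "#" is consumed
  (let* ((c1 (read-char port))
         (c2 (read-char port)))
    (unless (and (char? c1) (char? c2) (char-ci=? c1 #\c)
                 (memv (char-downcase c2) '(#\s #\i)))
      (error "only #cs and #ci are handled in this sketch"))
    ;; The new context lasts exactly as long as reading one datum.
    (parameterize ((folding? (char-ci=? c2 #\i)))
      (read-item port))))

;; Maps a dispatch character to its handler.
(define readtable
  (list (cons #\( read-list-tail)
        (cons #\) (lambda (port) (error "unexpected )")))
        (cons #\# read-hash)))

;; Returns (foo (Bar baz) qux).
(read-item (open-input-string "(Foo #CS (Bar #ci Baz) Qux)"))
```

Passing the flag as an explicit argument through every handler works
just as well; the parameter object merely keeps the other handlers'
signatures unchanged.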

* Copyright

Copyright (C) 2005 Taylor Campbell.  All rights reserved.

Don't distribute this.  I'm not liable; if stuff goes wrong, it's all
your fault, *nyah nyah*.  This copyright notice will change in the real
SRFI document to something reasonable (in particular, the official SRFI
copyright notice, surprise surprise).