This is Taylor R. Campbell's blag.
Copyright (c) 2006--2010, Taylor R. Campbell.
Verbatim copying and distribution of this entire article are
permitted worldwide, without royalty, in any medium, provided this
notice, and the copyright notice, are preserved.
Send comments to the email address with a host of the little-endian
order of the domain net.mumble and with a username of campbell.
Before reading this, please take a grain of salt and hold it in your
mouth with your tongue against your cheek for several minutes; also,
send in your sarcas-o-meter in for a tune-up just in case. (Some of
the entries are serious, though. I think it is pretty obvious which
ones are serious and which ones are not, but your mileage may vary.)
2010-01-01 Paredit patches, still
Not enough users have given me feedback on the patches to paredit in
. Thus, paredit 22 has
not yet been released. See the (chronologically) previous (but
textually next) entry.
Should there be a mailing list for paredit? Contact me if you have
an opinion on this question, or would like to offer to administer
such a mailing list.
2009-07-10 Paredit patches
Paredit 22 is under development. There is a directory of patches to
paredit at each in the
form of an elisp file that contains a blurb summarizing what changes
it makes when it is loaded, using the `load-file' command or the
`load' function, after paredit.el. If you would like to contribute
to the design of paredit 22, please try these patches and send any
feedback about them to me.
This message will self-destruct when paredit 22 is released.
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme
Suppose you want to release a resource after a part of a program
that uses the resource. For example, consider a connection to a
database that should be disconnected, a file descriptor that should
be closed, or a buffer for sensitive data such as a pass phrase that
should be destroyed by zeros or random data. Once the resource is
released, it can be neither reused nor readily reconstituted.
Is the resource represented in Scheme by a single object? If so,
ensure that with the object is registered a finalizer that releases
the resource, so that even in the face of an interactive ^C at the
REPL the resource will be released. All of the above examples
qualify.
(define (open-database-connection database-descriptor)
(let ((handle (allocate-database-handle)))
(register-finalizer handle %close-database-connection)
;; After this point, if we are interrupted -- say with ^C at
;; the REPL --, somewhere the handle will be recorded so that
;; its resources can be released even if the usual logic to
;; close the connection is never reached.
(%open-database-connection database-descriptor handle)
handle))
Is it /necessary/ for the resource to be released immediately after
the part of the program that uses it? If so, use REWIND-PROTECT at
. In this
case, control can never re-enter the extent that uses the resource.
Of the above examples, only a buffer for sensitive data qualifies.
(define (call-with-pass-phrase prompt receiver)
(let ((buffer (make-pass-phrase-buffer)))
(rewind-protect
(lambda ()
(prompt-for-pass-phrase prompt buffer)
(receiver buffer))
(lambda ()
(clear-pass-phrase-buffer buffer)))))
Is it /desirable/ for the resource to be released immediately after
the part of the program that uses it? If so, use UNWIND-PROTECT at
. Of the
above examples, both the database connection and file descriptor
qualify.
(define (call-with-database-connection database-descriptor
procedure)
(let ((handle (open-database-connection database-descriptor)))
(unwind-protect (lambda () (procedure handle))
(lambda ()
;; If a continuation is captured inside PROCEDURE so that
;; this cannot execute immediately when PROCEDURE returns,
;; then closing the database connection here is redundant,
;; since registered with HANDLE is a finalizer that will
;; close the connection. However, provided that the
;; finalizer have no effect if the connection is already
;; closed, this redundancy is harmless and enables
;; UNWIND-PROTECT to close the connection immediately if
;; no such continuation has been captured.
(close-database-connection handle)))))
Rationale: Scheme's CWCC (or composable continuations) enables
powerful control abstractions such as inverting control. For
example, (FOR-EACH ) applies to each
element of in sequence. With CWCC, one can reify the
iteration of the list by FOR-EACH into a first-class cursor object
that can be advanced independently. This way, for example, we can
traverse multiple lists in parallel using only the one-list FOR-EACH
procedure.
If we rather consider a procedure (FOR-EACH-QUERY-RESULT ), the situation is not much different, except that
FOR-EACH-QUERY-RESULT may entail the opening of a connection to a
database, and the connection is a resource which should be released
after the query results are finished being received. Using CWCC to
invert control of FOR-EACH-QUERY-RESULT is just as valid as using
CWCC to invert control of FOR-EACH, but if FOR-EACH-QUERY-RESULT
uses REWIND-PROTECT (a common idiom in unfortunately much Scheme
code), it inhibits control inversion. UNWIND-PROTECT allows the
control inversion.
Why involve finalization in UNWIND-PROTECT, rather than just
associating a finalizer with the object representing the connection
to the database? The difference is in the intent implied by the use
of UNWIND-PROTECT that if control cannot re-enter the protected
extent, the protector can be immediately applied when control exits
the protected extent extent even if no garbage collection is about
to run finalizers.
Note that one cannot in general reconcile
(1) needing to release the resource immediately after control exits
the extent that uses it, and
(2) enjoying the benefits of powerful control abstractions such as
inversion of control.
However, if (1) is relaxed to
(1') needing to release the resource immediately after control
/returns normally from/ the extent that uses it,
then one can reconcile (1') and (2) by writing
(CALL-WITH-VALUES (LAMBDA () )
(LAMBDA RESULTS
(APPLY VALUES RESULTS))), or simply
(BEGIN0 )
using the common name BEGIN0 for this idiom. (In Common Lisp, this
is called PROG1.)
Also note that condition systems are not involved; they are a red
herring. Any condition handler that releases a resource should do
so only when the condition implies that the resource is no longer
usable; for example, if a database connection becomes desynchronized
somehow and cannot proceed, then it is reasonable for the handler
for this condition to close the connection and destroy any of its
state. However, when handling conditions that do not imply that the
resource is no longer usable, the resource should not be released.
An enclosing handler may continue within a protected extent, or even
exit and re-enter the protected extent, for instance while iterating
over a reified database cursor procured by control inversion.
Handling any condition by releasing the resource defeats these
useful operations.
At is an
implementation of UNWIND-PROTECT for MIT Scheme. This definition is
lighter-weight than the portable one. However, it does not attempt
to run the protector immediately if it can.
Appendix A: John Cowan has asked for elaboration on the inversion of
control for iteration with FOR-EACH.
(define (list->stream list)
(iteration-procedure->stream
(lambda (receiver)
(for-each receiver list))))
(define (iteration-procedure->stream iteration-procedure)
(lazy
(call-with-current-continuation
(lambda (return-stream)
(iteration-procedure
(lambda (element)
(call-with-current-continuation
(lambda (return-to-iteration)
(return-stream
(stream-cons element
(lazy
(call-with-current-continuation
(lambda (return-cdr)
(set! return-stream return-cdr)
(return-to-iteration))))))))))
(return-stream stream-nil)))))
;;; Simpler definition with reset & shift.
(define (iteration-procedure->stream iteration-procedure)
(lazy
(reset
(begin
(iteration-procedure
(lambda (element)
(shift continue-iteration
(stream-cons element (lazy (continue-iteration))))))
stream-nil))))
Appendix B, discussion question: Should UNWIND-PROTECT evaluate
protection forms in the dynamic context of the call to
UNWIND-PROTECT? This rules out using UNWIND-PROTECT within
REWIND-PROTECT, which may be an onerous requirement.
2009-03-21 On file systems, Unix, and brain damage [updated]
Earlier this month, users of the alpha version of Ubuntu, the first
to support the ext4 file system by default, reported an immense
increase in data loss after crashes leading to corruption of their
desktop environments. After a crash, many files that were recently
written were truncated on disk to zero bytes.[1]
How did this happen?
To atomically replace a file's old data with new data, applications
write the new data to a file linked at a temporary pathname, and then
rename the link to the permanent pathname, overwriting the link to
the file with the old data. This sequence of operations, with the
atomicity guarantee of the POSIX rename() function, guarantees that
at any time, at the permanent pathname will be a link either to a
file containing the old data or to the file containing the new data.
This guarantee is made, at least, during normal system operation.
POSIX guarantees nothing when the system crashes or the power fails.
If the system is even bootable after such an event, POSIX permits it
to have replaced the contents of the file system with a collection of
cat macros. Although this behaviour is permitted, it is not useful.
System crashes and power failures are facts of life, and one of the
goals of journalling file systems is to mitigate their damage to the
file system.
In the name of performance, the ext4 file system as it was bundled
with the alpha version of Ubuntu would delay the allocation of blocks
on the disk for a file so that they can be allocated contiguously
together when needed. Periodically the system forces allocation of
blocks and commits them to the disk; by default, this happens on
sixty-second intervals. But the ext4 file system would sometimes
commit renames to the journal before allocating blocks for the new
data, during a sixty-second window. If the system crashes, and the
journal is replayed, then the rename may be applied -- before the new
data are written to the disk. This effectively reorders the sequence
that many applications rely on to atomically replace files,
destroying its atomicity.
The developers of the ext4 file system answered with several
`solutions': [2], [3]
(1) make your systems more reliable with UPSs and by avoiding buggy
proprietary drivers that cause such frequent crashes, which the
developers of the ext4 file system don't encounter;
(2) use SQLite and other databases instead of writing files to disk;
(3) change all the applications that do write files to disk so that
they fsync before closing and renaming; and if that becomes a
performance problem then
(4) run fsync periodically in a separate thread.
Solution (1) is no excuse for a reliable file system, and is a
dangerous sentiment to be held by the developer of a file system
likely to find widespread use in personal computers running Ubuntu,
computers which may not have the resources available in data centres
for reliability. This suggests that the developers of the ext4 file
system have never seriously stress-tested it with forced power cycles
during heavy I/O loads!
About solution (2), and the general feeling of `we like databases!',
I can only refer to jwz's rant about mbox summary files. Technical
merit aside, solutions (2) and (3) are impractical because there are
simply too many programs to change.
Solutions (3) and (4) are disturbingly misguided. Even if it were
practical to change every application's write/rename sequence into a
write/fsync/rename sequence, to do so would be semantically wrong.
Only programs that require a file's data to be written to permanent
storage before proceeding should use fsync; this is what fsync does.
For example, editors such as Emacs fsync files before informing users
that the files are written to disk; users are unhappy if the files
that their editors told them were written to disk were not. Mail
receiving agents fsync files where they store mail before telling the
sending agents that the mail has been accepted; users are unhappy if
they are told that mail was delivered when it was not.
Programs that require only that the replacement operation be atomic,
and that it not leave a file in a corrupted intermediate state,
should not use fsync, because the purpose of fsync is to provide a
guarantee which these programs do not need. For example, the KDE
window configuration is not critical user data, but may be stored on
the file system. It is OK for the machine to forget that a user has
dragged a window a few pixels, but it is not OK for KDE to read a
corrupted intermediate state of the file between the old and new
versions -- hence atomic write/rename replacement without fsync.
Extraneous fsyncs, in fact, degrade performance, especially for small
files that are frequently overwritten -- it gives the file system no
opportunity to keep the intermediate versions solely in memory.
Calling fsync periodically in a separate thread is nonsense -- if a
program cannot proceed until a file's data are on permanent storage,
it should call fsync and wait; if a program can proceed, then it
shouldn't call fsync. The operating system already periodically
synchronizes the file system with permanent storage -- at least, one
would hope so; if it doesn't do this, then it has a bug.
Actually, fsync does not guarantee that a file's data are written to
permanent storage before proceeding. POSIX (or at least the SUSv3,
)
claims that it shall:
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes. The nature of
the transfer is implementation-defined. The fsync() function shall
not return until the system has completed that action or until an
error is detected.
Many man pages echo this sentiment. For example, NetBSD's fsync(2)
man page begins with:
fsync() causes all modified data and attributes of fd to be moved
to a permanent storage device.
But on AIX, Darwin, Linux, NetBSD, and probably other Unices as well,
fsync does not actually guarantee that the file's data are written to
permanent storage. Instead it only nudges the file system to force
some blocks associated with the file slightly sooner. If the file
system is backed by a hard disk device that has a write cache, the
blocks might just end up in the write cache -- which is volatile, and
which may evaporate in the case of a power failure.
Common protocols by which operating systems communicates with storage
devices have commands to request that the storage devices force their
write caches to permanent physical media. Some devices ignore these
commands, in which case one cannot win, but the commands can be sent
nevertheless, and this is the best effort that the operating system
can make. And AIX, Darwin, Linux, and NetBSD all provide ways for
programs to request this -- different ways, that is.
AIX and NetBSD provide a system call fsync_range, which has a flag
FDISKSYNC (NetBSD) or O_DSYNC (AIX) that requests any blocks
associated with the file to be forced to permanent physical media.
Linux provides a fancier but similar system call sync_file_range.
Darwin provides an fcntl command F_FULLFSYNC. These approaches are
all slightly different, and for file descriptors not backed by
permanent storage devices they all report different errors. So just
to force a file's data to disk reasonably reliably on at least four
different Unices and to ignore the cases of file descriptors not
backed by permanent storage, it takes a good seventy lines of code
littered with cpp conditionals, together with autoconf or
autoconf-like detection of the relevant system calls.
Ugh!
Fortunately for the hapless users of Ubuntu, the developers of the
ext4 file system have patched it so that renaming a link to a file
that overwrites a link to another file forces any pending data for
the first file to be sent to the disk before the directory structure
is updated to reflect the rename. This may not guarantee that the
data are actually, really, truly on permanent physical media, but
it's better than waiting sixty seconds before they are even delivered
to the device that will eventually write them to permanent physical
media.
But this kerfuffle has not inspired confidence in me about the ext4
file system. In [3], the developers of the ext4 file system make
such absurd suggestions as making a laptop mode in which the kernel
ignores fsync, so that even those applications that use fsync
semantically correctly will not have the very guarantee that is its
sole reason for existence! When did performance become a concern
overriding robustness in file systems?
[1] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781
[2] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[3] http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[Update, 2009-03-26: I have been pointed to Linus Torvalds' ideas
on the subject, archived at . I
hope that the ext4 developers take heed of his sentiment.]
2009-03-21 Correction to the optimization of resource release in
control brackets
Due to an unstoppable deluge of the concern of readers after the
2009-02-04 correction, I am obliged to correct the optimization in
the original 2006-05-01 entry.
The correct optimization is to associate a flag with each wind point
which is by default clear and which will be set by CWCC or by any
procedures that reify composable continuations. (This is not an
onerous requirement: invocation of the reified continuations must run
in time linearly proportional to the depth of the wind point tree
anyway.) If the flag is clear when the after thunk is called, then
the finalizer can be run immediately.
2009-02-04 Correction to control brackets and resource release
It has come to my attention that I identified the wrong criterion
for when protector thunks established by UNWIND-PROTECT may be
called. The criterion I stated is that control may never again
return from the protected thunk. The correct criterion is that
control may never again re-enter the protected thunk. (Consider,
for example, a coroutine that does not terminate and return.)
The correct definition for *UNWIND-PROTECT in the appendix is:
(define (*unwind-protect thunk protector)
(let ((protector-cell (make-protector-cell protector)))
(dynamic-wind
(lambda ()
(identity-procedure protector-cell))
thunk
(lambda ()
unspecific))))
In Scheme systems that provide an REGISTER-FINALIZER procedure,
which requests the garbage collector to apply a procedure when a
given object's storage is reclaimed, one might write:
(define (*unwind-protect thunk protector)
(let ((object (cons 0 0)))
(register-finalizer object protector)
(dynamic-wind
(lambda ()
...refer to OBJECT to prevent reclamation of its storage...)
thunk
values)))
2009-01-04
I forgot about the 2007-mm-dd entry. I'm not sorry.
2007-mm-dd On macros and expression in Scheme
[Placeholder for a long-promised entry which I haven't yet forgotten
about.]
2007-08-23 On the Theatrical Security Administration's cleverness
During my flight to Seattle one month ago, my case containing a
guitar was `randomly' selected to be searched, and the TSA left a
gift for me inside: an Orwellian notice about protecting me and my
fellow passengers by randomly violating our constitutional rights
(which might, they add, involve accidentally damaging the bags, with
no liability on the TSA). Flying back to Boston, I left a prominent
copy of the Fourth Amendment to the US Constitution in the same case
-- specifically, in the body of the case. When I opened up the case
this morning, I found no notice from the TSA, but the slip of paper
on which was written the Fourth Amendment was tucked into the very
top of the case, where the headstock of the guitar goes. The paper
does not fit through the neck of the case; it could not have slid
there by its own volition, even when tossed by the roughest of
baggage handlers.
I'll shortly discover whether the TSA's X-ray machine affects ASA
400 film, too. The TSA goon who demanded to put my camera and its
film through that machine did let pass an unintentionally helpful
hint, though: to avoid damaging any film, just tell the TSA goons
that its speed is ASA 1000!
2007-07-16 On debugging in Scheme
The two Scheme systems that I use primarily these days are MIT
Scheme and Scheme48
. Both have decent debuggers that work for most
cases. Both have facilities for limiting output to avoid excessive
output for deeply nested data structures, or infinite recursion in
the printer for circular data structures. Both have rudimentary but
working mechanisms for examining pretty much all data in the system.
(I can make objects that Scheme48 and MIT Scheme fail to inspect,
but most people can't, and this is an uncommon operation.)
Let me make this clear right now: these are rudimentary tools. They
have some minor niceties (for instance, the usual mode of operation
in MIT Scheme's debugger is to have a REPL within every stack
frame), but these minor niceties are not especially profound.
Rudimentary as these tools are, though, they *work*, and they
suffice to examine and debug any reasonable program.
And MIT Scheme and Scheme48 are nearly unique in the respect of
having general, if rudimentary, debugging tools. This baffles me.
This flummoxes me. This flabbergasts me. (No, this is not
something new; this is something that I have been aware of for a
long time, and privately fumed about.)
Today I wanted to debug a program in Gambit. Gambit, too, has a
rudimentary debugger. Yet it completely fails in two prominent
ways:
(1) There is no limitation to the output at the REPL. (The values
of local variables on stack frames are printed truncated, but that
doesn't help if I want to inspect them more thoroughly.) Although
it is possible, with a bizarre incantation about *read* tables
(boggle) to protect the *writer* against circular output, there is
no apparent provision for limiting the depth of the output, and the
default is for the writer and pretty-printer to lose horribly with
infinite recursion until you hit C-c. And the default format for
printing records involves recursively printing all of the records'
components. Graphs? You lose.
(2) There is no object inspector by which to interactively inspect
the components of objects without having to grovel through megabytes
of output from an infinitely recurring writer. No, there is not
even a way to inspect the local variables of a closure found on the
stack. In fact, I couldn't even find a way to look at the condition
that was signalled.
And Gambit is better than most other Schemes.
Chicken will print a spare trace of call frames that were on the
stack. That is all. It does not even show the nesting of the
frames; garbage frames that have already returned, but are retained
on the stack because of Chicken's architecture, are shown along with
everything else. And the stack is traced *after* the error message
is printed, so if the error message printed a complex object, what
the poor, deprived programmer will see as a trace of the call frames
on the stack is procedures used to print the error message --
information that is completely irrelevant to the actual problem.
I wrote an object inspector for Chicken (the `dissector'), but it's
useless because (1) records no longer contain any introspective
information, and (2) it is impossible to get at the objects in call
frames on the stack.
DrScheme will give a helpful display of a flurry of arrows covering
the whole source file in question, so that programmers can see by
what flow of control they arrived at the error. Flow of data?
What's that?
If you plan to create a new implementation of Scheme which you mean
to be useful to other humans, *please* implement a general debugger
and object inspector. The state of debugging tools in Scheme is
absolutely appalling and utterly inexcusible.
(I'm told that Chez has a general, if rudimentary, debugger as well;
because Chez is commercial and proprietary, however, and the free
software Schemes out there meet my needs far better than Chez does
anyway, I don't have a licence for Chez with which to experiment
with its debugger.)
2007-07-15 On the public domain, licence terms for software, and
practical legal choices
I am not a lawyer. This is not legal advice. I have not pursued
legal studies, and have no background in civil litigation, copyright
law, or contract law. (The terminology that I employ in this
article will without a doubt induce any lawyer to cringe.) For that
matter, I'd rather devote my time to hacking than to burying my head
in legal studies for decades: Scheme is a much nicer language for
expressing ideas concisely and clearly than English, but
unfortunately our laws are expressed in English.
It is precisely because I am more interested in hacking than legal
analysis, in fact, and because I should prefer never to worry about
embroilment in courts of law as a consequence of careless decisions,
that I write now.
For the past couple of years, I have been releasing my publicly
available software into the public domain. I ought to be precise,
however: I have been releasing said software with a message claiming
that it is in the public domain. I ignored remarks equivocating the
notion that any works can be released into the public domain, my
reason being that it was simply absurd for an author to be unable to
forswear any and all claims of copyright: surely, if an author has
no intention of claiming copyright, she ought to be able to
relinquish the right of doing so. And the simplest status for works
lacking copyright claims is release into the public domain.
Not only did I place message stating that the code was in the public
domain, but I also added a disclaimer of all warranties -- a rather
inept and negligent disclaimer, which read `All warranties are
disclaimed.' My reason for this was that I preferred to avoid
cluttering all of my files with long, uppercase, distracting
disclaimers of all warranties that some lawyer at Berkeley could
think of and enumerate; and that I preferred also to avoid requiring
the distribution of a separate LICENCE file with every other file I
released, purely for the sake of laziness.
What I failed to consider, though, was the actual implications of
these messages. Consequently, on realizing this, I briefly
considered the following line of reasoning, after encountering
another suggestion that works may not clearly be released into the
public domain:
1. What matters in the preamble to the software is what terms a
court of law will accept and base decisions upon.
2. If a court of law will not accept a statement that the software
is released into the public domain, there is no reason for it
to prefer the acceptance of a statement of licence terms: one
could imagine two copies of a file, one with the terms and one
without, and of neither one could anyone claim authority.
My second point was faulty, though, because it questioned only
witness credibility, which is an unrelated issue; and I omitted a
third point, on contract law, warranty, and gifts. I honestly don't
know whether a court of law will accept the preamble to a source
file, but my guess is that it probably would. The more important
question is whether a user (perhaps some corporate entity -- whether
good or evil, according to your favourite definition of that
spectrum --, or some governmental entity, or anyone) can have
confidence in the terms set by the author.
There are two aspects of the software in which users may wish to
have confidence: (1) the users want to be confident that the
copyright and licence terms are fixed, and that the author will
never pursue or even threaten litigation for violations of licence;
and (2) the users may also want the power to pursue the author for
damages if the author's software fails at the cost of the users.
Contrariwise, the authors of free software (except perhaps djb)
prefer to disclaim themselves of any such warranties.
[This entry is not yet finished. Sorry.]
Summary: The public domain is murky, even if it may seem to make
common sense. I prefer the modified (3-clause) BSD licence for
similar terms and greater protection, or at least a clearer idea of
the protection for all parties involved. It is also fairly concise.
Furthermore, it has been examined by lawyers. [I don't know whether
it has been tested in court, and don't have the time or inclination
to research this in greater detail. Any references to cases
involving the 3-clause BSD licence would be welcome, though.]
I repeat, though, that I am not a lawyer, and that this is not legal
advice.
If the terms of the modified BSD licence are not to your taste, John
Cowan has a utility for choosing another one by some simple
criteria: .
2007-07-14 Summarizing several issues with compilation and
interpretation
I ran out of steam writing last month about compilation and
interpretation, so here I'll try to summarize as succinctly as I can
a few points about which a disturbing number of people are confused.
Look to the previous entry for definitions of `interpreter' and
`compiler'.
1. Most implementations of Lisps are not pure interpreters; most are
based on compilers in some way or another. Call them
implementations, not interpreters or compilers: they may *have*
interpreters or compilers, but they are not themselves
interpreters or compilers.
2. A `compiler' is not a tool chain that turns source code into
stand-alone executables. A compiler is a tool that analyzes the
meaning of programs and generates programs in another language
with equivalent meaning.
3. Your physical machine is an interpreter. The universe is a kind
of interpreter as well.
4. No compiled program is `stand-alone'. Not only do compiled
programs still require interpreters to execute them, but they
also must be structured in the appropriate format for the
intended operating system under which they will run, and they
generally require support from some run-time libraries. It just
so happens that most computers that most people are interested in
developing programs for have libc built-in. It also happens that
most computers don't already have the run-time libraries needed
by Lisp programs.
2007-06-19 Concerning compilation and interpretation, and an
interpretation of the compilation of confusion surrounding their
meaning and execution
I often find myself in an unpleasant proximity to a bickering
arglebargle about whether compiled languages or interpreted
languages are better; whether Java is compiled or interpreted;
whether Sun's Java implementation (or others) is a compiler or an
interpreter; whether Lisp can conceivably be compiled, if it has
EVAL built-in; how Lisp systems like SBCL can be compilers if they
can't spit out files that someone's favourite operating system can
execute (which is, in fact, not really true); and other matters of
confused frivolity concerning compilation and interpretation.
Sometimes I even find a participant or two in the discussion who
tries to allay the heat by suggesting that the distinction between
compilation and interpretation is murky at best, citing examples
like Java implementations that involve both Java to byte-code
compilers and byte-code interpreters, or modern x86 implementations,
which translate the x86 instruction set every time the i-cache is
refilled into an internal more RISC-like format to execute.
Sometimes I even do this myself, if I am in a hurry and lack the
time to explain the subject more precisely.
Let us commence with definitions:
/Compilation/ is the analysis of a program's meaning, rendering
the same semantics into another language. A /compiler/ is a
total function from programs to meanings.
/Interpretation/ is the execution of a program given the state of
the machine, either diverging or yielding a final answer. An
/interpreter/ is a partial function from programs and machine
states (or simply machine states, if the program is a part of
the machine state) to answers.
(Programming language theoreticians in the audience will recognize
that compilation corresponds with denotational semantics and that
interpretation corresponds with operational semantics. I hope,
however, that such members of the audience are already adequately
familiar with the concepts that they need read no further. I hope.)
A usable language implementation that is based on a compiler also
requires an interpreter. When most people hear about `compilers',
they are hearing about compilers that map programs in languages like
C to machine programs, i.e. programs in languages for the
instruction sets implemented by physical processors such as the x86,
the PowerPC, the SPARC, the PDP-10, and so on.
[This entry is not yet finished.]
2007-05-13 SLIME48
(Apologies: the post prognosticated by the previous preparation will
be postponed.)
After several long months of version incompatibility, there is a new
version of SLIME48 available that will work with CVS SLIME as of
2007-05-13. A tarball is available here:
.
Bug reports are welcome. Further information is at
.
2007-05-09 In preparation for a clarification of macros in Scheme,
and their implications on style
This blag has remained dormant for a very long time. I must deeply
apologize to my avid readers who no doubt were left in a state of
prolonged mourning for the stagnancy of their only worthwhile source
of meaning in their lives. In a short while, however, no longer
will you poor souls remain cold, alone, and bereft of your font of
Schemerical wisdom, for I shall soon bring flames upon the now
cowering subject of macros in Scheme -- from DEFMACRO to
SYNTAX-RULES, and all in between and beyond --, and their
philosophical implications on style and expressiveness of programs!
For you who cannot stand to wait until the weekend (when I hope to
find time for this inflammatory ecriturial extravaganza), I give you
a brief taste, the draft of a Lisp style guide whose composition I
commenced this past weekend:
2006-09-16 On networking interfaces in Scheme
Most Scheme systems have very limited networking interfaces, usually
with the notion that BSD sockets are not nice, and that `high-level'
verbs like `open a TCP connection to this IP address, on this port'
are nice. This is a bad idea, and Scheme systems ought to support
wrappers around the whole BSD sockets API.
The BSD sockets API is really a fairly extensive generalization of
many different flavours of sockets, classifying them according to
types with common operations (stream sockets, datagram sockets,
&c.), across many different networking domains (IPv4 and IPv6 being
the most prevalent today, but of which there are many others, some
of which remain useful, like local Unix sockets). But instead of
taking advantage of this unification set down by BSD sockets, most
Scheme systems *specialize* the interface to provide wrappers for
TCP (i.e. internet stream sockets), and sometimes UDP (i.e internet
datagram sockets), and sometimes local Unix sockets (with one of two
types).
In the first place, this is poor practice simply because of the
amount of work necessary for the same returns. In essence, the
specialized wrappers tear down a layer of abstraction merely by
pretending that it isn't there -- while actually building on top of
it. The result is that different specialized wrappers must be made
for everything intended to be wrapped, and a great deal of code must
be duplicated. There is a great deal of C code out there, used as
the basis of Scheme networking libraries, that duplicates the same
system calls just with different constant arguments (SOCK_STREAM vs
SOCK_DATAGRAM, AF_INET vs AF_INET6 vs AF_UNIX, &c.), or different
types of socket addresses (sockaddr_in, sockaddr_in6, sockaddr_un,
&c.). Few Scheme systems expose these constants and socket address
structures to Scheme; most just hide the `messy details' in C.
Actually, `the same returns' is not quite true anyway. The wrappers
are generally very fixed in form, and inextensible. Most don't even
support any socket options, or have specific procedures to deal with
specific socket options. Few have any support for multicast IP. I
have never seen support for out-of-band data transmission in any
specialized socket wrapper for Scheme. The distinction between
shutdown and close is usually either absent or swept under a rug.
A general BSD sockets wrapper gets all of these for free, on *any*
socket type or domain (except for multicast IP, of course), and can
be made to be easily amenable to extension for anything not already
wrapped, as long as sockets' descriptors can be exposed
straightforwardly in an FFI. Specialized wrappers have to duplicate
all this functionality for every type and domain of socket they
distinguish between, and they rarely expose any internal interface
for FFIs -- that, or they simply don't support any of it.
Protocol independence in programs is impossible with specialized
OPEN-TCP-CONNECTION procedures: IPv6 has been around for over ten
years, and many standard libraries support it trivially through
abstractions like BSD sockets in conjunction with getaddrinfo(3),
yet programs that support IPv6 well are few and far between because
programmers specialize the sockets with the AF_INET family -- and
Scheme systems that specialize the BSD sockets to this family
specifically *discourage* protocol independence.
Unfortunately, programmer laziness in conjunction with a misguided
desire for `nicer' interfaces tends toward not supporting any more
esoteric capabilities of BSD sockets or flexibility for later
extension or protocol independence.
Of the Schemes I can think of off the top of my head, only scsh and
Gauche have BSD sockets interfaces. These all have specialized
wrappers, usually just for TCP, and some UDP, though I think none
support any local Unix sockets: Scheme48, Chicken, PLT, Gambit,
RScheme, MIT Scheme, SISC...
2006-09-14 R6RS
The first public draft of the Revised^6 Report on the Algorithmic
Language Scheme has been released, at .
The module system, the string API, and the arithmetic specification
are all still quite broken. A number of other smaller problems
pervade the report. They claim that an operational semantics has
been written, but it's not included, and the denotational semantics
has been taken out. Why, I do not know.
I'm not sure whether I have enough interest left in it to rant much
more about it.
2006-09-06 Zob veeblefitzer
Foo bar baz quux zot.
Frob grovel full lexical...
Mumble; chumble spuzz.
2006-07-30 Update: Guillaume Germain found in quantum superposition
between Washington and Minnesota
Title says it all, folks.
2006-05-29 Missing: Guillaume Germain, author of Termite
Monsieur Germain was last seen a little over a week ago in the
#scheme IRC channel on Freenode, declaring that he was preparing to
undertake the dark journey into the Lesser Evil Empire of Amazon.
We can be thankful that he turned down Temptation when she came to
his door with an interview at the Greater Evil Empire, but now, even
at the Lesser one, he has disappeared and not been heard from in
over a week. The Scheme Underground has not published a press
release stating that he has possibly been detected in a dark, dank,
and dingy dungeon in a basement somewhere in Washington, but the
details are sketchy. Preparations are not being made for a raid in
the area.
[Update: M. Germain appeared spontaneously to inform me that his
surname is spelt without a trailing `e,' and subsequently vanished
just as spontaneously. Whereabouts are still unknown.]
2006-05-29 Pink chickens
Chickens, under their feathers, are pink. I don't know whether any
of you have ever plucked and feathered a chicken, but it turns out
pink underneath. At least, this is what I presume, because I have
not myself ever feathered a chicken. I have seen only featherful
chickens in farms and chickens in supermarkets which other persons
have already had the kindness to feather for me. Still, I can
assure you that chickens are pink under the feathers.
Now, what are really peculiar are chickens whose *feathers* are
pink, and I mean not just slightly pinkish but really bright, hot
pink. These things are really bizarre, and I have never seen one in
my life; nor have I ever heard of them. But can you imagine seeing
a chicken there, yet with blazing pink photons streaming at your
retinas, instead of their ordinary brownish reddish hue?
[Update: I am informed, among the voluminous profusion of
commentary from readers, that an ideal such chicken would be five
feet in height and capable of wearing a saddle.]
2006-05-01 On control brackets and resource release
[Note, 2010-01-01: There are some errors in this entry. See the
entries
2009-02-04 Correction to control brackets and resource release
2009-03-21 Correction to the optimization of resource release in
control brackets.
For a conciser summary, see the entry
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme.]
DYNAMIC-WIND has long been a source of contention in the Lisp world.
It serves a very particular purpose, but this purpose is not very
well understood, and many think that this purpose is the same as the
purpose of other Lisps' UNWIND-PROTECT. While the general idea of
UNWIND-PROTECT is useful, it is not directly applicable in Scheme,
because Scheme has more powerful control abstractions than other
Lisps', and it therefore requires more powerful means of controlling
them.
(UNWIND-PROTECT form protection-form ...), a special form, evaluates
the first form and returns its value, but before returning the value
also executes the protection forms. Furthermore, if a throw in the
primary form transfers control to outside the whole UNWIND-PROTECT
form, this, too, will execute the protection forms. This allows
programmers to *reliably* clean things up at certain points in the
program -- for instance, to close open files, to shut down sockets,
to release database handles, &c.; here is hypothetical example of
its use:
(define (call-with-input-file pathname procedure)
(let ((input-port (open-input-file pathname)))
(unwind-protect (procedure input-port)
(close-input-port input-port))))
(DYNAMIC-WIND before during after), a procedure of three procedural
parameters, calls the during thunk, ensuring that any transfer of
control into it calls the before thunk, and that any transfer of
control out of it calls the after thunk. Normal control transfers,
i.e. the initial call to the during thunk and its final return,
qualify as control transfers too. This means that, any time control
is inside the dynamic extent of the during thunk, any state
established by the before thunk will be in effect, but any time
control is outside its dynamic extent, that state will be torn down
by the after thunk. Here is an actual example of DYNAMIC-WIND's
use, from the Edwin text editor:
(define (with-current-local-bindings! thunk)
(dynamic-wind
(lambda ()
(install-buffer-local-bindings! (current-buffer)))
thunk
(lambda ()
(uninstall-buffer-local-bindings! (current-buffer)))))
This ensures that, within THUNK, all Edwin variables have any values
specified by the buffer's local bindings. This is the kind of use
that DYNAMIC-WIND is appropriate for: a sort of control bracket that
ensures the presence of certain dynamic state or context within a
certain dynamic extent.
Now, unfortunately, DYNAMIC-WIND can be seen as somewhat similar to
UNWIND-PROTECT, because for basic cases (involving no sophisticated
exploitation of Scheme's continuation reification) DYNAMIC-WIND with
a null entrance thunk seems to be equivalent to UNWIND-PROTECT.
This is misleading, however. Either control may return to the
dynamic extent of the during thunk in the DYNAMIC-WIND, in which
case the after thunk will be called twice -- which is unacceptably
wrong --, or the before thunk really signals an error if it is
called twice. This, though, violates the philosophy described in
the first sentence of the R5RS:
Programming languages should be designed not by piling feature
on top of feature, but by removing the weaknesses and
restrictions that make additional features appear necessary.
Signalling an error in a DYNAMIC-WIND before thunk is usually a
limitation and a weakness in a program. We have an *extremely*
powerful means of abstraction of control -- CWCC[1] -- and we ought
not to prevent it from working merely because we want to prevent a
certain fragment of code from running more than once. This should,
however, spur a deeper look at why the code is being run more than
once in the first place.
DYNAMIC-WIND is meant to establish local dynamic state for specific
dynamic extents within programs, by setting it up in before thunks
and tearing it down in after thunks. These operations, to set up
and to tear down dynamic state, are meant to be reversible, local
changes.[2] That is, permanent actions, such as closing a port,
that one would put in a protection form of an UNWIND-PROTECT, are
entirely inappropriate in a DYNAMIC-WIND after thunk.
Yet the basic idea of UNWIND-PROTECT is still useful. What,
exactly, though, is useful from UNWIND-PROTECT, and what is
incompatible with Scheme? The concept of permanent finalization is
useful, and the identification of a good point to perform it is
useful. *Requiring* that it be performed at this point, though, is
incompatible with Scheme, because this point may be passed several
times, yet we must perform the finalization once and only once.
Many languages, and many Scheme systems, provide ways to ensure that
certain code be run when an object is no longer reachable. We can
apply this same idea, in fact, to continuations! The protection
thunk in an UNWIND-PROTECT of a lesser Lisp is run only once the
continuation is about to become unreachable. The only difference
between Scheme and CWCCless Lisps is that the point at which
continuations become unreachable is unfixed in Scheme.
So we must use a general object finalization mechanism -- for the
kind of object known as a continuation. This is easy to concoct,
though: we need only to make an object, to associate with that
object the protection thunk so that it will be executed when the
object become unreachable, and to make a link from the continuation
to that object. There is an implementation of this in MIT Scheme at
the end of this entry.
This, though, is non-optimal. The protection thunk may run at any
time in the system. We should *like* it to be run as soon as
control returns through the continuation in question, but this is
valid only if that continuation has not been reified and will not
ever be reused. We can say that a continuation is /dirty/ if it has
been reified and may be instantiated again, and that a continuation
is /clean/ if and only if there will be exactly one instance of it,
which will be destroyed as soon as control returns through it.
If we store a bit in any continuation involved with UNWIND-PROTECT,
i.e. any continuation with an associated protection thunk, then we
can initially clear the bit, to indicate that it is clean, and
instrument CWCC to mark all such continuations as dirty by setting
their bits. When control returns through a protected continuation,
it checks the bit. If the bit is clear, then the continuation is
clean, and it runs the protection thunk, because control will never
again return to this point; it also deregisters any finalization
procedure. If the bit is set, then the continuation is dirty, and
general finalization system for objects of unlimited extent will
handle the protection thunk.
(We need to be careful in the case of a reified continuation
instantiated on the stack whose heap storage has been reclaimed.
The continuation will be marked dirty, but before control returns
through it for the last time its protection thunk might be run,
because the GC could finalize its heap storage.)
Both DYNAMIC-WIND and UNWIND-PROTECT are useful for different
applications in a language with powerful control abstractions such
as Scheme. DYNAMIC-WIND, though, does not subsume the operation of
UNWIND-PROTECT, and UNWIND-PROTECT cannot be taken literally from a
language with weaker control abstractions. By examining the core of
the intent of UNWIND-PROTECT, though, we can separate its actual
purpose from an optimization. We can implement the actual purpose,
and then analyze cases where the optimization can still be useful,
in order to provide UNWIND-PROTECT efficiently in Scheme. The
result is that we can retain the powerful control abstraction of
CWCC and in the same language integrate the same idioms of resource
release as other languages provide.
Of course, there are still some limited cases where restriction is
really necessary. There are a number of procedures in MIT Scheme
that work with sensitive strings, and into which reentry is not an
option, in order to enforce the proper destruction of sensitive
information. In this case, DYNAMIC-WIND is appropriate to reject
reentry and to nullify sensitive strings on exit.
[1] Some would argue that composable continuations ought to be
enough for everyone, but the issues here are exactly the same
whether we use CWCC or composable continuations.
[2] This is not *exactly* true, but symmetry is a very desirable
property of DYNAMIC-WIND before & after thunks, and reversible
set-up & tear-down operations are usual.
Appendix: Variation on a Theme of UNWIND-PROTECT
(define continuation-protectors
(make-gc-finalizer (lambda (protector) (protector))
cell?
cell-contents
set-cell-contents!))
(define (make-protector-cell protector)
(let ((protector-cell
(make-cell (let ((protector-state (get-dynamic-state)))
(lambda ()
(let ((state (get-dynamic-state)))
(set-dynamic-state! protector-state #f)
(protector)
(set-dynamic-state! state #f)))))))
(add-to-gc-finalizer! continuation-protectors protector-cell)
protector-cell))
(define get-dynamic-state
(access GET-DYNAMIC-STATE
(->environment '(RUNTIME STATE-SPACE))))
(define set-dynamic-state!
(access SET-DYNAMIC-STATE!
(->environment '(RUNTIME STATE-SPACE))))
(define-syntax unwind-protect
(syntax-rules ()
((UNWIND-PROTECT form protection0 protection1 ...)
(*UNWIND-PROTECT (LAMBDA () form)
(LAMBDA () protection0 protection1 ...)))))
(define (*unwind-protect thunk protector)
(let* ((protector-cell (make-protector-cell protector))
(result (thunk)))
;; This ensures that the continuation will hold a reference to
;; the cell, so that it will not be garbage-collected until
;; after either control returns to this continuation or a
;; throw causes this continuation to be discarded and made
;; unreachable.
(identity-procedure protector-cell)
result))
2006-04-17 Microsoft vermin
I am shocked -- *shocked!* -- to hear that the developer of Termite
had an interview for a potential job at Microsoft. (Termite, for
you ignorant wretches, is Erlang-in-Scheme for Gambit. Find it on
your own.)
2006-04-07 XEmacs S-expression motion
Turns out paredit.el 20 beta doesn't support XEmacs very well after
all. The problem is that XEmacs's S-expression motion commands
don't signal SCAN-ERROR conditions like GNU Emacs does: they just
signal general ERROR conditions. Programs *cannot* usefully
discriminate S-expression motion scan errors from other kinds of
errors, and, on software engineering principles, I refuse to modify
the code to catch and ignore all errors. Grrrmph.
(If any XEmacs developers read this (as if): Could you please
either explain why the S-expression motion code signals ERROR
conditions rather than SCAN-ERROR conditions in specific, or, if
there is no good reason, fix XEmacs?)
2006-03-30 Immobilizing blobs
There is a furry, orange blob on my lap preventing me from moving.
I believe the scientific classification for this kind of blob is
`cat.' Luckily, the state of immobility is only temporary,
apparently regulated by some sort of nutritive urges in the blob,
and also occasional boredom and needs for entertainment.
2006-03-23 I got better...
For those of you who actually *read* this thing and were a bit
worried after my 2006-03-07 post, I've recovered mostly, although
I'm maintaining a good cough. But apparently you remain ill, since
you haven't overcome the affliction of blag addiction, as you're
still reading this now.
2006-03-10 paredit.el for XEmacs; deletion keys
Recently I have been fiddling with making paredit.el work for
XEmacs. Not many changes seem to be necessary -- although I haven't
yet tested it exhaustively under XEmacs, as I'm basing my changes on
Martin Gasbichler's port to XEmacs --, since paredit.el consists
mainly of pretty basic elisp, but one of the most annoying problems
to deal with was getting the deletion keys (forward-delete and
backward-delete, or Delete & Backspace) right.
[I forgot to finish this post, so it was sitting in an autosave file
for two weeks, and, though I had a big rant all planned out two
weeks ago about getting deletion keys right, I've forgotten what I
had to say. Sorry. Also, paredit.el for XEmacs isn't finished yet
because I keep forgetting to get around to it.]
[Update, 2009-12-31: Thanks to gjs, I now have an arbitrarily large
collection of round tuits. Fortunately, I think I made paredit run
on XEmacs a couple of years ago.]
2006-03-07 More flu!
Not only was the flu great fun, but, after only a few days of having
seemed to be mostly rid of it, it has decided to come back to bring
*more* joy to my life! Mmmmm...
2006-02-24 Flu!
The flu is great fun. I recommend that any of you who have not had
the pleasure of experiencing it, find a contagious surface
*right*now* and infect yourselves with the glee of knowing what fun
and games await you!
2006-02-23 Paredit.el versions 18 & 19
This happened a while ago, but paredit.el 18 is now the released
version, and paredit.el 19 is the new beta. See
;
find the relevant files yourself. Version 19 supports insertion of
balanced delimiters other than parentheses, and a fancy S-expression
transportation device suggested by Guillaume Germain, but I haven't
yet finished the latter and put it up on the web yet.
2006-02-06 On the dangers of blaggery
Until several acquaintences coerced me into recording a `blog' (a
term which I abhor almost as much as the concept it denotes), I
tended to avoid the the things in general, excepting semi-frequent
browsing of Planet Lisp . There are a
number of reasons for my eschewal of the reading or writing of
blags.
The first reason is the content, or absence thereof. There is a
tendency among blag writers to write when it amuses them to write,
not when they have anything to say. I need not elaborate further on
this point; I am sure that the reader is quite familiar with the
saturating profusion of content-free blags throughout the web.
The second reason is that blags are astonishingly efficient sinks of
time. Right now I ought to be doing something else. But I am not
doing something else. This is because there is some aspect of the
nature of blags that compels their authors to write entries, and,
even in the rare case that the author has anything worthwhile to
say, the time composing the entry would be better wasted on a
different amusement device, such as a cat. (More accurately put,
the author ought to spend more time amusing the cat than dribbling
drivel onto the web.)
This urge to write blag entries instead of doing useful things such
as acquiescing to feline demands takes its toll also in other ways.
For example, programmers will often briefly emit mutterings on their
blags about their latest objects of fascination, without releasing
their code. If the existence of that code is an equivocal one, this
sputtering of substanceless spoutage is known in some circles as
'wanking,' a technical term. Wanking is not, and never has been,
desirable behaviour to be associated with. Actual code can speak
for itself, but speaking for that which has no existence, or at best
a vaporous existence, is about as useful to others as a fish is to a
bicycle. Or something like that. Yet these programmers persist in
writing blag entries instead of substantial code.
Finally, the blag is usually directed at an unclear audience. Since
there is no particular audience that the blag is intended to cater
to, and so for any particular audience much information is discussed
in much greater detail than is necessary, and much other necessary
information is omitted. Arguments are ill-defended because the
blagger has no connection to his audience when writing the blag
entry; he must build all the argument on a nebulous idea of what and
whom he's trying to argue against. The result is poorly laid-out,
ill-conceived, and unpersuasive arguments or other types of entries.
Thank you for slogging through the awful post above. I hope that
its mere form and existence have convinced you of what it failed to
argue for itself. At the very least, perhaps it will dissuade you
from assigning further merit to this blag!
2006-02-04 Disturbing blaggery
I have noticed a disturbing urge to write more material in this joke
of a text file in the past few days. I will have to quell this
strange compulsion. I feared it would happen from the start.
2006-02-03 Option types, optional parameters
Languages like ML and Haskell that are heavily based on pattern
matching have a very simple but elegant mechanism for representing
value and absence of value without employing a special value that is
still a value like any other: option types. The basic idea is that
there is a type which I shall denote with (OPTION ), where
is the type of the value that may be present. Two kinds of
terms have this type: (SUPPLIED ) and (ABSENT), where
has the type . For example, a TABLE-ENTRY procedure might
return (SUPPLIED ) or (ABSENT) to denote that
there is no value in the table associated with the given key.
The idiom in Lisps tends to be to use the false constant (#F or NIL)
for representing the absence of value, but this inhibits #F or NIL
from being itself a present and useful value. (The conflation of
the false constant and the empty list in old, crufty Lisps
exacerbates this.) This poses a problem, for example, if you want a
general table mapping some key to any object -- including #F -- but
you also want to be able to determine when an association is absent
in the table. I call this conflation /domain contagion/, because
NIL or #F or the symbol NOT-HERE are all still in the domain of
expressible values, when they should be kept apart, because their
whole purpose is to express the *absence* of an expressible value.
There are actually two different problems here. The first I shall
leave for another day, although it is more relevant to the
TABLE-ENTRY example. It is that procedures like TABLE-ENTRY must
convey information about control flow to their callers via data
flow. This indirection ought not to be necessary. Olin Shivers
wrote a paper called 'Multi-return function calls' (although I think
that 'multi-continuation procedure calls' is a better term) about
this fairly recently. The basic idea is that, rather than giving a
procedure a single continuation, which itself will discriminate the
data that the procedure returns to it, the caller supplies a
*choice* of several continuations, and the callee picks one to
return to, discarding the rest. The problem of denoting absence of
value in procedures like TABLE-ENTRY can be solved by passing two
continuations to TABLE-ENTRY: one for the case where there is a
value (which continuation accepts one argument), and one for the
case where there is no value (which continuation is nullary). I
recommend the paper; I may write something on the subject later,
too.
The other problem is not in eliding the indirection of data flow,
but in storing data. Here is a simple example: optional parameters.
There is no control flow involved here. If a procedure accepts
optional parameters, it simply has several variables which may or
may not have been supplied as arguments. Now, while there are some
cases where it would suffice to specify in the lambda default values
for the case of absence of the arguments, there are other cases
where it is useful either to pass on the optional parameter's value
to another procedure (and retain the supplied/absent status) or to
conditionalize on whether the parameter was supplied in order to do
something fancier altogether.
Here are several strategies currently employed by existing Scheme
systems:
1. Give no provision for specifying default values in the lambda
syntax itself, and instead
a. use a special value, such as #F (DSSSL) or some 'default
object' token (current MIT Scheme).
-- This is subject to domain contagion as discussed
above.
b. make references to the variables corresponding with
optional parameters trap, and provide a special form
DEFAULT-OBJECT? which tests whether references to those
variables would trap in order to conditionalize on their
presence as arguments (older MIT Scheme).
-- This requires some pretty kludgey changes to the
semantics of variables in the language. Furthermore,
it inhibits the passage of unsupplied optional
parameters on to other procedures: you can't simply
pass on an unsupplied parameter to another procedure
by passing the value of the corresponding variable,
because references to that variable trap. (MIT
Scheme no longer uses this semantics for optional
arguments.)
2. Require that the default value be supplied. (PLT's OPT-LAMBDA)
-- This is inconvenient in some cases, and, while there are
ways around domain contagion with this strategy, they are
pretty inconvenient and they defeat simple static type
analysis (which is still useful for dynamically typed
languages like Scheme):
(LET ((NOT-HERE (CONS 'NOT-HERE '())))
(LAMBDA (... #!OPTIONAL (X NOT-HERE) ...)
... (IF (EQ? X NOT-HERE) ...) ...)))
3. Bind another variable to a boolean flag indicating whether an
optional parameter was supplied. (Common Lisp)
-- This fixes most of the disadvantages of the above, but it
still does not allow the convenient passage of unsupplied
optional parameters on to other procedures. It can also
get to be syntactically pretty heavy-weight in usage.
All of the problems of domain contagion, type analysis defeat,
passage of unsupplied optional parameters onto other procedures, and
complexity of usage, would be alleviated by the introduction of an
option type as in Haskell or ML (or even dynamically typed languages
heavily focussed on pattern matching like Erlang). There would be
two remaining issues: undesirable boxing, and the method by which to
conditionalize on option types.
When it comes to data already stored in the heap, the boxing of an
option type is not much of an issue; it is merely one more layer of
indirection. In fact, on, say, a 64-bit system, where three tag
bits is ideal, one possibility would be to use one of those tag bits
for flagging option types. (This may not be a great idea, but it is
an idea nonetheless which might bear worth experimenting.) As for
optional parameters, for which run-time costs of heap allocation is
not a reasonable option (sorry for the pun), we must first assume
that the compiler knows about the option type in the first place;
with this assumption in mind, it is not difficult for the compiler
to recognize trivially optimizable tests of the option's presence.
For example, it could map such tests to comparisons against an NARGS
register.
The syntax for testing optional data might be, without pattern
matching, a bit clumsy. There are a couple ways it might be
accomplished: a new special form could be added, (IF-OPTION (