This is Taylor R. Campbell's blag. -*- outline -*-
Copyright (c) 2006--2013, Taylor R. Campbell.
Verbatim copying and distribution of this entire article are permitted
worldwide, without royalty, in any medium, provided this notice, and
the copyright notice, are preserved.
2013-03-21 Fun with C (or, @!*#%!#^&!^#$^@!%#)
Consider the following fragment of C code with the type T defined
to be an integer type:
uint64_t x = UINT64_C(0x0123456789abcdef);
T n = 64;
printf("%016"PRIx64"\n", (x & ~(n - 1)));
Say we're on amd64, with
sizeof(char) = 1,
sizeof(short) = 2,
sizeof(int) = 4, and
sizeof(long) = 8,
and two's-complement arithmetic -- nothing exotic or pathological
here.
Without running the code, can you guess what it prints for each of
the following definitions of the the type T?
typedef uint8_t T; typedef int8_t T;
typedef uint16_t T; typedef int16_t T;
typedef uint32_t T; typedef int32_t T;
typedef uint64_t T; typedef int64_t T;
Hint: It's not 0123456789abcdc0 in every case.
2012-10-31 Moving mathematics forward
I have made a tremendous breakthrough tonight in the greatest open
problem of all mathematics, the Riemann hypothesis, by finding an
example of the elusive Riemann zeta pumpkin. Based on my extensive
research into the literature on the subject[*], I believe I am the
first to have come across such a beast. Unfortunately, I don't
know how to talk to the camera with which I catalogued my evidence
of this critter, so proof by photography will have to wait until I
figure out how to talk to a USB PTP or MTP device in NetBSD.
[*] 30 seconds of browsing Google image search results.
[Update, 2012-11-02: I tried gphoto2 and libmtp, but neither
worked. gphoto2 can identify the device (`gphoto2 -a'), but
everything else fails with a `bad parameter' error or something.
libmtp can't even seem to identify that there is a device. ktrace
just shows that ioctl(USB_DO_REQUEST) fails with EIO or something;
I think I need to dig into the USB layer to find out what's wrong.
I'll add more details when I have physical access to the device
again.]
[Update, 2012-12-02: Here's a photograph:
http://mumble.net/~campbell/images/riemann-zeta-pumpkin.jpg
I'm afraid I cheated, though.]
2012-10-17 What's an epigraph?
An anonymous interlocutor asked me to define the word `epigraph'
some months ago.
An epigraph's a witty pith,
or perhaps a pithy wit,
That you could start a chapter with
to prime the reader's mind a bit.
2012-10-04
My verse may be worse but your prose hurts my nose.
(dribblings from a late night when bad poetry was coming out like a
fit of sneezes)
2012-08-24 What's wrong with Python
I just found this Python code in the wild:
f = open(pathname)
try:
x = foo(f)
finally:
f.close()
process(x)
This is wrong at so many levels I don't know what to say. And it
wouldn't be fair to blame whoever was responsible for putting the
code in this state, because this is very nearly the right way to
write it in Python, and Python will happily accept this in spite of
its insensibility.
2012-05-23 Energy and computation
Critics have noted that the premise in _The Matrix_ of using the
heat of human metabolism as an energy source, the energy premise,
is hopelessly contrived and detracts from the film. And, indeed,
it is hopelessly contrived, but this doesn't bother me because I
have a much better, more plausible premise to replace it -- and a
reason why they would never have used the better premise in the
film.
Instead of the energy premise, I take a computational premise: what
the machines exploit the humans for is not energy but computation
-- human brains are very good at learning quickly and optimizing
approximate solutions to NP-hard problems. That way, the machines
don't need to reverse-engineer the human brain or develop (`learn')
approximation algorithms of their own.
The computational premise, though, doesn't lend itself to a cute
image of Morpheus holding up a D-cell and saying `...turning humans
into this.' The energy premise is easy to explain, if contrived;
whereas it would be a hopeless cinematic blunder to attempt to
explain the computational premise to a general audience in
Morpheus's brief monologue to Neo. Thus, I am satisfied with a
better premise for the film, and forgive the creators for not using
it.
2012-05-21
2012-05-17
Metre matters more to me than rhyme.
Teeter tatters tore to tea, then thyme.
...wait, does that sound rhyte?
2012-05-13T07:34:55Z
Is a poem like a joke in that if the meaning must be explained,
then the value is lost?
2012-04-24 Mail
Mail is hosed. Unreliable queues, mail stores without integrity
checks or sensible indexing, broken flags, hopelessly confused MUA
designs, and more... Argh!
2012-03-15T13:25:46Z Random
`Fix random kernel memory corruption by algo_doublehash(). And by
"random" I don't mean just "arbitrary" as in using an uninitialized
pointer, but random as in corrupting the contents of memory addresses
chosen using a crypto-strength random number generator.'
2012-03-15T06:23:47Z Beware the IDEs of March
Beware the IDEs of March, my son --
The tools that bite, the builds that botch.
Beware Eclipse, Emacs, and shun
The kludgious Binfordsnotch!
...what?
2012-01-22 Laptop
I have been asked a few times what laptop I chose to buy last year.
The answer is that I didn't. The laptops I was most strongly
considering were the Thinkpad W510 and the Sony Vaio Z. But I got
fed up with shopping; instead I got a new disk for an old
MacBook1,1 that I had lying around, and fixed some bugs in NetBSD
to make it run on that.
2011-12-19 The FM Principle
It often amazes me to marvel at just how much of computing is
founded upon the FM principle.
Woodpeckers.
2011-10-29
Antediluvian barmy curmudgeons!
Desolate ergatives fuming their grudgeons!
Heckling isthmus of jocular kerosene!
Lepidopteric morose nitrotoluene!
Peripatetic Orphean qwertigial
Runcible syrup of toad urns vestigial!
[If you comment on this article, please be sure to mention bananas.]
Winkle-picked xanthines, ytterbium seder of
Zoölogues, zits, Zapatistas, &c.!
2011-08-{29,30} paredit patches
There's a pile of patches potentially for paredit 23 at
.
Try them out and let me know what they break, or, in the unlikely
event that they don't break your whole editing feel, whether you
like them, and what cases you especially like in them. That way, I
can write down the cases so they won't be broken again in the
future.
2011-08-29 Crypto library
I am thinking of writing a crypto library to complement djb's
well-known one by providing some bits he omitted. I will call it
1-[5-(1,3-benzodioxol-5-yl)-1-oxo-2,4-pentadienyl]piperidine.
2011-05-09 FOCUS THEFT SHOULD BE A FELONY
!@*#^$@!&*#^@%!*#@$^%!#&*@!%
2011-03-32 New job, project status
I have accepted a position at Microsoft in the finance department,
working on in-house tools for optimizing Microsoft's tax structure.
I am moving to Redmond; I fly out tonight. Anyone in the Seattle
area know any excellent seafood restaurants?
My new manager has asked me to port paredit to Visual Studio so
that we can use it in-house, and since I have only limited time for
personal projects, I shall be dropping support for paredit.el in
GNU Emacs (and paredit.scm in Edwin).
Today I completed and committed a long-running project to switch
MIT Scheme to use dynamic scope by default, in preparation for the
upcoming Scheme standard produced by the working groups; and I also
committed my (incomplete) patches to change Edwin's default key
bindings to match modern vi conventions.
2011-02-24 Paredit on old Emacs
Does anyone still use paredit on XEmacs, or on GNU Emacs 21?
2011-01-17 Laptop
I am looking to buy a new laptop. But I am picky. At
I wrote down a lot of
constraints. Can anyone help me to satisfy them?
2011-01-14 PGP
I'm pretty frustrated with GnuPG.
Straightforward operations (decrypt a message -- no, really, just
*decrypt* it) are not straightforward to perform with the gpg
utility. There are half a bazillion knobs controlling gpg's
behaviour so that straightforward command lines do not have
straightforward semantics. And half the operations can be done
only interactively -- by which I mean not just that it prints
prompts, but that it prints them to /dev/tty and reads input from
/dev/tty.
It's not obvious what the gpg utility is actually doing whenever I
use it, especially when it prompts me for a pass phrase to decrypt
one of my keys to make a signature without telling me what it's
signing. Just about everything keeps state in several different
files in one's GnuPG home directory; it's not clear what operations
are actually stateless.
There is no good introduction to the meaning and use of the trust
model, or any coherent or interactive picture for it. The output
of `gpg --list-keys', `gpg --list-sigs', `gpg --list-packets', and
so on, is confusing on multiple levels.
Here are three things I'd like to see in an OpenPGP implementation:
1. A collection of utilities to perform straightforward operations
straightforwardly. Examples: a utility to generate an RSA key
pair; a utility to decrypt a packet given the Elgamal private
key; a utility to make each particular type of signature; &c.
2. A packet inspector. This could be graphical (maybe Gtk+), or it
could be embedded into Emacs. It would replace `gpg
--list-packets', and allow the user to interactively explore the
packets, see all the information stored therein, decrypt
messages, verify signatures, and so on. Think Wireshark for
OpenPGP.
3. An interactive truth maintenance system. Fundamentally, a PKI
is a system for distributing and interpreting signed assertions,
such as `J. Random Luser is the owner of key 1234abcd', signed
with key 1234abcd. An OpenPGP key ring is a pile of public keys
and signed assertions. The cryptographic signatures make the
assertions as plausible as they would be if made in person, but
we need a computer to handle the crypto and draw plausible
conclusions from the premises we specify.
But we may find some of the conclusions suspect; in that case,
we want to ask why the computer drew those conclusions, and what
premises we must reject if we want to reject the conclusions.
Or we may decide that we're not happy with some of the premises,
and we'd like to see what conclusions lie on shaky grounds --
we'd like to see what conclusions we give up if we reject the
premises. These are exactly the functions of an interactive
truth maintenance system.
The GnuPG trust model is essentially a set of inference rules,
which could be used with such a truth maintenance system. But
gpg provides no tool to explore the `web of trust' that its
trust model induces; it just has an implementation of the engine
to infer conclusions given some premises.
Perhaps some day I'll actually write some of these, or find that
somebody already has.
P.S. It's hard to google anything about PGP. Google turns up
`----BEGIN PGP SIGNED MESSAGE-----' a lot more than it turns up
useful results.
P.P.S. I have glanced at netpgp, a relatively new implementation
of OpenPGP based on openpgpsdk that will probably be part of NetBSD
6.0. Its behaviour does not appear substantially different from
that of GnuPG, though, other than that netpgp is less polished so
far.
2010-11-04 Gtk+
In a Gtk+ program I use regularly, while entering text, I often
accidentally hit C-w out of habit from Unix to kill the preceding
word. Instead of killing the word, this closes the window, to my
interminable frustration. It's not as bad as C-M-DEL zapping X
when I want to kill the preceding S-expression in Emacs, but it is
frustrating enough that I sought to fix this.
It turns out that if you put `gtk-can-change-accels = 1' in your
~/.gtkrc-2.0, hover over the menu item to which the key is bound,
and hit DEL, then the key binding goes away.
Is this documented? Yes! Well, no. Well, sorta kinda, maybe?
Actually, not really. When I googled `"gtk-can-change-accels"',
the first hit I got was from gtk.org, and appears to be the only
actual documentation relevant to the subject. All the other hits
were from mailing list archives, forum posts, blug posts, and FAQs
from various applications, saying that one can use the option to do
such and such.
So what does this actual documentation say? Well, first of all,
it's API documentation, not remotely fit for user consumption. And
here's all that it has to say about the option: `Whether menu
accelerators can be changed by pressing a key over the menu item.'
Does it mention hitting DEL? No. Does it say how to quote DEL, in
order to bind it to an action? No, of course not! Does it mention
~/.gtkrc-2.0? No.
In fact...is there any documentation at all for ~/.gtkrc-2.0? No!
Not even API documentation! You just have to know about it!
Googling it just turned up more mailing list archives, forum posts,
blorg posts, and FAQs with tips about how to do such and such by
frotzing your ~/.gtkrc-2.0 file...
2010-10-09 Paredit 22
Paredit 22 is released:
Not all the patches got incorporated into it. Patches potentially
for paredit 23 are in
.
2010-09-20 Paredit 22 beta
I have merged the aforementioned patches into a new paredit 22
beta, at as
usual; lists the
changes as the output of `darcs changes' on the Darcs repository at
.
Nobody has opined about a paredit mailing list or offered to
administer one for me, so I presume there's no interest.
2010-01-01 Paredit patches, still
Not enough users have given me feedback on the patches to paredit in
. Thus, paredit 22 has
not yet been released. See the (chronologically) previous (but
textually next) entry. [Update: Since the time of writing, the
aforementioned entry has self-destructed.]
Should there be a mailing list for paredit? Contact me if you have
an opinion on this question, or would like to offer to administer
such a mailing list.
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme
Suppose you want to release a resource after a part of a program
that uses the resource. For example, consider a connection to a
database that should be disconnected, a file descriptor that should
be closed, or a buffer for sensitive data such as a pass phrase that
should be destroyed by zeros or random data. Once the resource is
released, it can be neither reused nor readily reconstituted.
Is the resource represented in Scheme by a single object? If so,
ensure that with the object is registered a finalizer that releases
the resource, so that even in the face of an interactive ^C at the
REPL the resource will be released. All of the above examples
qualify.
(define (open-database-connection database-descriptor)
(let ((handle (allocate-database-handle)))
(register-finalizer handle %close-database-connection)
;; After this point, if we are interrupted -- say with ^C at
;; the REPL --, somewhere the handle will be recorded so that
;; its resources can be released even if the usual logic to
;; close the connection is never reached.
(%open-database-connection database-descriptor handle)
handle))
Is it /necessary/ for the resource to be released immediately after
the part of the program that uses it? If so, use REWIND-PROTECT at
. In this
case, control can never re-enter the extent that uses the resource.
Of the above examples, only a buffer for sensitive data qualifies.
(define (call-with-pass-phrase prompt receiver)
(let ((buffer (make-pass-phrase-buffer)))
(rewind-protect
(lambda ()
(prompt-for-pass-phrase prompt buffer)
(receiver buffer))
(lambda ()
(clear-pass-phrase-buffer buffer)))))
Is it /desirable/ for the resource to be released immediately after
the part of the program that uses it? If so, use UNWIND-PROTECT at
. Of the
above examples, both the database connection and file descriptor
qualify.
(define (call-with-database-connection database-descriptor
procedure)
(let ((handle (open-database-connection database-descriptor)))
(unwind-protect (lambda () (procedure handle))
(lambda ()
;; If a continuation is captured inside PROCEDURE so that
;; this cannot execute immediately when PROCEDURE returns,
;; then closing the database connection here is redundant,
;; since registered with HANDLE is a finalizer that will
;; close the connection. However, provided that the
;; finalizer have no effect if the connection is already
;; closed, this redundancy is harmless and enables
;; UNWIND-PROTECT to close the connection immediately if
;; no such continuation has been captured.
(close-database-connection handle)))))
Rationale: Scheme's CWCC (or composable continuations) enables
powerful control abstractions such as inverting control. For
example, (FOR-EACH ) applies to each
element of in sequence. With CWCC, one can reify the
iteration of the list by FOR-EACH into a first-class cursor object
that can be advanced independently. This way, for example, we can
traverse multiple lists in parallel using only the one-list FOR-EACH
procedure.
If we rather consider a procedure (FOR-EACH-QUERY-RESULT ), the situation is not much different, except that
FOR-EACH-QUERY-RESULT may entail the opening of a connection to a
database, and the connection is a resource which should be released
after the query results are finished being received. Using CWCC to
invert control of FOR-EACH-QUERY-RESULT is just as valid as using
CWCC to invert control of FOR-EACH, but if FOR-EACH-QUERY-RESULT
uses REWIND-PROTECT (a common idiom in unfortunately much Scheme
code), it inhibits control inversion. UNWIND-PROTECT allows the
control inversion.
Why involve finalization in UNWIND-PROTECT, rather than just
associating a finalizer with the object representing the connection
to the database? The difference is in the intent implied by the use
of UNWIND-PROTECT that if control cannot re-enter the protected
extent, the protector can be immediately applied when control exits
the protected extent extent even if no garbage collection is about
to run finalizers.
Note that one cannot in general reconcile
(1) needing to release the resource immediately after control exits
the extent that uses it, and
(2) enjoying the benefits of powerful control abstractions such as
inversion of control.
However, if (1) is relaxed to
(1') needing to release the resource immediately after control
/returns normally from/ the extent that uses it,
then one can reconcile (1') and (2) by writing
(CALL-WITH-VALUES (LAMBDA () )
(LAMBDA RESULTS
(APPLY VALUES RESULTS))), or simply
(BEGIN0 )
using the common name BEGIN0 for this idiom. (In Common Lisp, this
is called PROG1.)
Also note that condition systems are not involved; they are a red
herring. Any condition handler that releases a resource should do
so only when the condition implies that the resource is no longer
usable; for example, if a database connection becomes desynchronized
somehow and cannot proceed, then it is reasonable for the handler
for this condition to close the connection and destroy any of its
state. However, when handling conditions that do not imply that the
resource is no longer usable, the resource should not be released.
An enclosing handler may continue within a protected extent, or even
exit and re-enter the protected extent, for instance while iterating
over a reified database cursor procured by control inversion.
Handling any condition by releasing the resource defeats these
useful operations.
At is an
implementation of UNWIND-PROTECT for MIT Scheme. This definition is
lighter-weight than the portable one. However, it does not attempt
to run the protector immediately if it can.
Appendix A: John Cowan has asked for elaboration on the inversion of
control for iteration with FOR-EACH.
(define (list->stream list)
(iteration-procedure->stream
(lambda (receiver)
(for-each receiver list))))
(define (iteration-procedure->stream iteration-procedure)
(lazy
(call-with-current-continuation
(lambda (return-stream)
(iteration-procedure
(lambda (element)
(call-with-current-continuation
(lambda (return-to-iteration)
(return-stream
(stream-cons element
(lazy
(call-with-current-continuation
(lambda (return-cdr)
(set! return-stream return-cdr)
(return-to-iteration))))))))))
(return-stream stream-nil)))))
;;; Simpler definition with reset & shift.
(define (iteration-procedure->stream iteration-procedure)
(lazy
(reset
(begin
(iteration-procedure
(lambda (element)
(shift continue-iteration
(stream-cons element (lazy (continue-iteration))))))
stream-nil))))
Appendix B, discussion question: Should UNWIND-PROTECT evaluate
protection forms in the dynamic context of the call to
UNWIND-PROTECT? This rules out using UNWIND-PROTECT within
REWIND-PROTECT, which may be an onerous requirement.
2009-03-21 On file systems, Unix, and brain damage [updated]
Earlier this month, users of the alpha version of Ubuntu, the first
to support the ext4 file system by default, reported an immense
increase in data loss after crashes leading to corruption of their
desktop environments. After a crash, many files that were recently
written were truncated on disk to zero bytes.[1]
How did this happen?
To atomically replace a file's old data with new data, applications
write the new data to a file linked at a temporary pathname, and then
rename the link to the permanent pathname, overwriting the link to
the file with the old data. This sequence of operations, with the
atomicity guarantee of the POSIX rename() function, guarantees that
at any time, at the permanent pathname will be a link either to a
file containing the old data or to the file containing the new data.
This guarantee is made, at least, during normal system operation.
POSIX guarantees nothing when the system crashes or the power fails.
If the system is even bootable after such an event, POSIX permits it
to have replaced the contents of the file system with a collection of
cat macros. But while this behaviour is permitted, it is not useful.
System crashes and power failures are facts of life, and a well-
engineered file system should limit, not exacerbate, their damage to
user data.
In the name of performance, the ext4 file system as it was bundled
with the alpha version of Ubuntu would delay the allocation of blocks
on the disk for a file so that they can be allocated contiguously
together when needed. Periodically the system forces allocation of
blocks and commits them to the disk; by default, this happens on
sixty-second intervals. But the ext4 file system would sometimes
commit renames to the journal before allocating blocks for the new
data, during a sixty-second window. If the system crashes, and the
journal is replayed, then the rename may be applied -- before the new
data are written to the disk. This effectively reorders the sequence
that many applications rely on to atomically replace files,
destroying its atomicity.
The developers of the ext4 file system answered with several
`solutions': [2], [3]
(1) make your systems more reliable with UPSs and by avoiding buggy
proprietary drivers that cause such frequent crashes, which the
developers of the ext4 file system don't encounter;
(2) use SQLite and other databases instead of writing files to disk;
(3) change all the applications that do write files to disk so that
they fsync before closing and renaming; and if that becomes a
performance problem then
(4) run fsync periodically in a separate thread.
Solution (1) is no excuse for a reliable file system, and is a
dangerous sentiment to be held by the developer of a file system
likely to find widespread use in personal computers running Ubuntu,
computers which may not have the resources available in data centres
for reliability. This suggests that the developers of the ext4 file
system have never seriously stress-tested it with forced power cycles
during heavy I/O loads!
About solution (2), and the general feeling of `we like databases!',
I can only refer to jwz's rant about mbox summary files. Technical
merit aside, solutions (2) and (3) are impractical because there are
simply too many programs to change.
Solutions (3) and (4) are disturbingly misguided. Even if it were
practical to change every application's write/rename sequence into a
write/fsync/rename sequence, to do so would be semantically wrong.
Only programs that require a file's data to be written to permanent
storage before proceeding should use fsync; this is what fsync does.
For example, editors such as Emacs fsync files before informing users
that the files are written to disk; users are unhappy if the files
that their editors told them were written to disk were not. Mail
receiving agents fsync files where they store mail before telling the
sending agents that the mail has been accepted; users are unhappy if
they are told that mail was delivered when it was not.
Programs that require only that the replacement operation be atomic,
and that it not leave a file in a corrupted intermediate state,
should not use fsync, because the purpose of fsync is to provide a
guarantee which these programs do not need. For example, the KDE
window configuration is not critical user data, but may be stored on
the file system. It is OK for the machine to forget that a user has
dragged a window a few pixels, but it is not OK for KDE to read a
corrupted intermediate state of the file between the old and new
versions -- hence atomic write/rename replacement without fsync.
Extraneous fsyncs, in fact, degrade performance, especially for small
files that are frequently overwritten -- it gives the file system no
opportunity to keep the intermediate versions solely in memory.
Calling fsync periodically in a separate thread is nonsense -- if a
program cannot proceed until a file's data are on permanent storage,
it should call fsync and wait; if a program can proceed, then it
shouldn't call fsync. The operating system already periodically
synchronizes the file system with permanent storage -- at least, one
would hope so; if it doesn't do this, then it has a bug.
Actually, fsync does not guarantee that a file's data are written to
permanent storage before proceeding. POSIX (or at least the SUSv3,
)
claims that it shall:
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes. The nature of
the transfer is implementation-defined. The fsync() function shall
not return until the system has completed that action or until an
error is detected.
Many man pages echo this sentiment. For example, NetBSD's fsync(2)
man page begins with:
fsync() causes all modified data and attributes of fd to be moved
to a permanent storage device.
But on AIX, Darwin, Linux, NetBSD, and probably other Unices as well,
fsync does not actually guarantee that the file's data are written to
permanent storage. Instead it only nudges the file system to force
some blocks associated with the file slightly sooner. If the file
system is backed by a hard disk device that has a write cache, the
blocks might just end up in the write cache -- which is volatile, and
which may evaporate in the case of a power failure.
Common protocols by which operating systems communicates with storage
devices have commands to request that the storage devices force their
write caches to permanent physical media. Some devices ignore these
commands, in which case one cannot win, but the commands can be sent
nevertheless, and this is the best effort that the operating system
can make. And AIX, Darwin, Linux, and NetBSD all provide ways for
programs to request this -- different ways, that is.
AIX and NetBSD provide a system call fsync_range, which has a flag
FDISKSYNC (NetBSD) or O_DSYNC (AIX) that requests any blocks
associated with the file to be forced to permanent physical media.
Linux provides a fancier but similar system call sync_file_range.
Darwin provides an fcntl command F_FULLFSYNC. These approaches are
all slightly different, and for file descriptors not backed by
permanent storage devices they all report different errors. So just
to force a file's data to disk reasonably reliably on at least four
different Unices and to ignore the cases of file descriptors not
backed by permanent storage, it takes a good seventy lines of code
littered with cpp conditionals, together with autoconf or
autoconf-like detection of the relevant system calls.
Ugh!
Fortunately for the hapless users of Ubuntu, the developers of the
ext4 file system have patched it so that renaming a link to a file
that overwrites a link to another file forces any pending data for
the first file to be sent to the disk before the directory structure
is updated to reflect the rename. This may not guarantee that the
data are actually, really, truly on permanent physical media, but
it's better than waiting sixty seconds before they are even delivered
to the device that will eventually write them to permanent physical
media.
But this kerfuffle has not inspired confidence in me about the ext4
file system. In [3], the developers of the ext4 file system make
such absurd suggestions as making a laptop mode in which the kernel
ignores fsync, so that even those applications that use fsync
semantically correctly will not have the very guarantee that is its
sole reason for existence! When did performance become a concern
overriding robustness in file systems?
[1] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781
[2] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[3] http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[Update, 2009-03-26: I have been pointed to Linus Torvalds' ideas
on the subject, archived at . I
hope that the ext4 developers take heed of his sentiment.]
[Update, 2012-01-22: All file systems are hosed. Some are just more
obviously hosed than others. Argh.]
2009-03-21 Correction to the optimization of resource release in
control brackets
Due to an unstoppable deluge of the concern of readers after the
2009-02-04 correction, I am obliged to correct the optimization in
the original 2006-05-01 entry.
The correct optimization is to associate a flag with each wind point
which is by default clear and which will be set by CWCC or by any
procedures that reify composable continuations. (This is not an
onerous requirement: invocation of the reified continuations must run
in time linearly proportional to the depth of the wind point tree
anyway.) If the flag is clear when the after thunk is called, then
the finalizer can be run immediately.
2009-02-04 Correction to control brackets and resource release
It has come to my attention that I identified the wrong criterion
for when protector thunks established by UNWIND-PROTECT may be
called. The criterion I stated is that control may never again
return from the protected thunk. The correct criterion is that
control may never again re-enter the protected thunk. (Consider,
for example, a coroutine that does not terminate and return.)
The correct definition for *UNWIND-PROTECT in the appendix is:
(define (*unwind-protect thunk protector)
(let ((protector-cell (make-protector-cell protector)))
(dynamic-wind
(lambda ()
(identity-procedure protector-cell))
thunk
(lambda ()
unspecific))))
In Scheme systems that provide an REGISTER-FINALIZER procedure,
which requests the garbage collector to apply a procedure when a
given object's storage is reclaimed, one might write:
(define (*unwind-protect thunk protector)
(let ((object (cons 0 0)))
(register-finalizer object protector)
(dynamic-wind
(lambda ()
...refer to OBJECT to prevent reclamation of its storage...)
thunk
values)))
2009-01-04
I forgot about the 2007-mm-dd entry. I'm not sorry.
2007-mm-dd On macros and expression in Scheme
[Placeholder for a long-promised entry which I haven't yet forgotten
about.]
2007-08-23 On the Theatrical Security Administration's cleverness
During my flight to Seattle one month ago, my case containing a
guitar was `randomly' selected to be searched, and the TSA left a
gift for me inside: an Orwellian notice about protecting me and my
fellow passengers by randomly violating our constitutional rights
(which might, they add, involve accidentally damaging the bags, with
no liability on the TSA). Flying back to Boston, I left a prominent
copy of the Fourth Amendment to the US Constitution in the same case
-- specifically, in the body of the case. When I opened up the case
this morning, I found no notice from the TSA, but the slip of paper
on which was written the Fourth Amendment was tucked into the very
top of the case, where the headstock of the guitar goes. The paper
does not fit through the neck of the case; it could not have slid
there by its own volition, even when tossed by the roughest of
baggage handlers.
I'll shortly discover whether the TSA's X-ray machine affects ASA
400 film, too. The TSA goon who demanded to put my camera and its
film through that machine did let pass an unintentionally helpful
hint, though: to avoid damaging any film, just tell the TSA goons
that its speed is ASA 1000!
2007-07-16 On debugging in Scheme [updated 2046-03-32]
The two Scheme systems that I use primarily these days are MIT
Scheme and Scheme48
. Both have decent debuggers that work for most
cases. Both have facilities for limiting output to avoid excessive
output for deeply nested data structures, or infinite recursion in
the printer for circular data structures. Both have rudimentary but
working mechanisms for examining pretty much all data in the system.
(I can make objects that Scheme48 and MIT Scheme fail to inspect,
but most people can't, and this is an uncommon operation.)
Let me make this clear right now: these are rudimentary tools. They
have some minor niceties (for instance, the usual mode of operation
in MIT Scheme's debugger is to have a REPL within every stack
frame), but these minor niceties are not especially profound.
Rudimentary as these tools are, though, they *work*, and they
suffice to examine and debug any reasonable program.
And MIT Scheme and Scheme48 are nearly unique in the respect of
having general, if rudimentary, debugging tools. This baffles me.
This flummoxes me. This flabbergasts me. (No, this is not
something new; this is something that I have been aware of for a
long time, and privately fumed about.)
Today I wanted to debug a program in Gambit. Gambit, too, has a
rudimentary debugger. Yet it completely fails in two prominent
ways:
(1) There is no limitation to the output at the REPL. (The values
of local variables on stack frames are printed truncated, but that
doesn't help if I want to inspect them more thoroughly.) Although
it is possible, with a bizarre incantation about *read* tables
(boggle) to protect the *writer* against circular output, there is
no apparent provision for limiting the depth of the output, and the
default is for the writer and pretty-printer to lose horribly with
infinite recursion until you hit C-c. And the default format for
printing records involves recursively printing all of the records'
components. Graphs? You lose.
(2) There is no object inspector by which to interactively inspect
the components of objects without having to grovel through megabytes
of output from an infinitely recurring writer. No, there is not
even a way to inspect the local variables of a closure found on the
stack. In fact, I couldn't even find a way to look at the condition
that was signalled.
And Gambit is better than most other Schemes.
Chicken will print a spare trace of call frames that were on the
stack. That is all. It does not even show the nesting of the
frames; garbage frames that have already returned, but are retained
on the stack because of Chicken's architecture, are shown along with
everything else. And the stack is traced *after* the error message
is printed, so if the error message printed a complex object, what
the poor, deprived programmer will see as a trace of the call frames
on the stack is procedures used to print the error message --
information that is completely irrelevant to the actual problem.
I wrote an object inspector for Chicken (the `dissector'), but it's
useless because (1) records no longer contain any introspective
information, and (2) it is impossible to get at the objects in call
frames on the stack.
DrScheme will give a helpful display of a flurry of arrows covering
the whole source file in question, so that programmers can see by
what flow of control they arrived at the error. Flow of data?
What's that?
If you plan to create a new implementation of Scheme which you mean
to be useful to other humans, *please* implement a general debugger
and object inspector. The state of debugging tools in Scheme is
absolutely appalling and utterly inexcusible.
(I'm told that Chez has a general, if rudimentary, debugger as well;
because Chez is commercial and proprietary, however, and the free
software Schemes out there meet my needs far better than Chez does
anyway, I don't have a licence for Chez with which to experiment
with its debugger.)
2007-07-15 On the public domain, licence terms for software, and
practical legal choices
I am not a lawyer. This is not legal advice. I have not pursued
legal studies, and have no background in civil litigation, copyright
law, or contract law. (The terminology that I employ in this
article will without a doubt induce any lawyer to cringe.) For that
matter, I'd rather devote my time to hacking than to burying my head
in legal studies for decades: Scheme is a much nicer language for
expressing ideas concisely and clearly than English, but
unfortunately our laws are expressed in English.
It is precisely because I am more interested in hacking than legal
analysis, in fact, and because I should prefer never to worry about
embroilment in courts of law as a consequence of careless decisions,
that I write now.
For the past couple of years, I have been releasing my publicly
available software into the public domain. I ought to be precise,
however: I have been releasing said software with a message claiming
that it is in the public domain. I ignored remarks equivocating the
notion that any works can be released into the public domain, my
reason being that it was simply absurd for an author to be unable to
forswear any and all claims of copyright: surely, if an author has
no intention of claiming copyright, she ought to be able to
relinquish the right of doing so. And the simplest status for works
lacking copyright claims is release into the public domain.
Not only did I place message stating that the code was in the public
domain, but I also added a disclaimer of all warranties -- a rather
inept and negligent disclaimer, which read `All warranties are
disclaimed.'. My reason for this was that I preferred to avoid
cluttering all of my files with long, uppercase, distracting
disclaimers of all warranties that some lawyer at Berkeley could
think of and enumerate; and that I preferred also to avoid requiring
the distribution of a separate LICENCE file with every other file I
released, purely for the sake of laziness.
What I failed to consider, though, was the actual implications of
these messages. Consequently, on realizing this, I briefly
considered the following line of reasoning, after encountering
another suggestion that works may not clearly be released into the
public domain:
1. What matters in the preamble to the software is what terms a
court of law will accept and base decisions upon.
2. If a court of law will not accept a statement that the software
is released into the public domain, there is no reason for it
to prefer the acceptance of a statement of licence terms: one
could imagine two copies of a file, one with the terms and one
without, and of neither one could anyone claim authority.
My second point was faulty, though, because it questioned only
witness credibility, which is an unrelated issue; and I omitted a
third point, on contract law, warranty, and gifts. I honestly don't
know whether a court of law will accept the preamble to a source
file, but my guess is that it probably would. The more important
question is whether a user (perhaps some corporate entity -- whether
good or evil, according to your favourite definition of that
spectrum --, or some governmental entity, or anyone) can have
confidence in the terms set by the author.
There are two aspects of the software in which users may wish to
have confidence: (1) the users want to be confident that the
copyright and licence terms are fixed, and that the author will
never pursue or even threaten litigation for violations of licence;
and (2) the users may also want the power to pursue the author for
damages if the author's software fails at the cost of the users.
Contrariwise, the authors of free software (except perhaps djb)
prefer to disclaim themselves of any such warranties.
[This entry is not yet finished. Sorry.]
Summary: The public domain is murky, even if it may seem to make
common sense. I prefer the modified (3-clause) BSD licence for
similar terms and greater protection, or at least a clearer idea of
the protection for all parties involved. It is also fairly concise.
Furthermore, it has been examined by lawyers. [I don't know whether
it has been tested in court, and don't have the time or inclination
to research this in greater detail. Any references to cases
involving the 3-clause BSD licence would be welcome, though.]
I repeat, though, that I am not a lawyer, and that this is not legal
advice.
If the terms of the modified BSD licence are not to your taste, John
Cowan has a utility for choosing another one by some simple
criteria: .
2007-07-14 Summarizing several issues with compilation and
interpretation
I ran out of steam writing last month about compilation and
interpretation, so here I'll try to summarize as succinctly as I can
a few points about which a disturbing number of people are confused.
Look to the previous entry for definitions of `interpreter' and
`compiler'.
1. Most implementations of Lisps are not pure interpreters; most are
based on compilers in some way or another. Call them
implementations, not interpreters or compilers: they may *have*
interpreters or compilers, but they are not themselves
interpreters or compilers.
2. A `compiler' is not a tool chain that turns source code into
stand-alone executables. A compiler is a tool that analyzes the
meaning of programs and generates programs in another language
with equivalent meaning.
3. Your physical machine is an interpreter. The universe is a kind
of interpreter as well.
4. No compiled program is `stand-alone'. Not only do compiled
programs still require interpreters to execute them, but they
also must be structured in the appropriate format for the
intended operating system under which they will run, and they
generally require support from some run-time libraries. It just
so happens that most computers that most people are interested in
developing programs for have libc built-in. It also happens that
most computers don't already have the run-time libraries needed
by Lisp programs.
2007-06-19 Concerning compilation and interpretation, and an
interpretation of the compilation of confusion surrounding their
meaning and execution
I often find myself in an unpleasant proximity to a bickering
arglebargle about whether compiled languages or interpreted
languages are better; whether Java is compiled or interpreted;
whether Sun's Java implementation (or others) is a compiler or an
interpreter; whether Lisp can conceivably be compiled, if it has
EVAL built-in; how Lisp systems like SBCL can be compilers if they
can't spit out files that someone's favourite operating system can
execute (which is, in fact, not really true); and other matters of
confused frivolity concerning compilation and interpretation.
Sometimes I even find a participant or two in the discussion who
tries to allay the heat by suggesting that the distinction between
compilation and interpretation is murky at best, citing examples
like Java implementations that involve both Java to byte-code
compilers and byte-code interpreters, or modern x86 implementations,
which translate the x86 instruction set every time the i-cache is
refilled into an internal more RISC-like format to execute.
Sometimes I even do this myself, if I am in a hurry and lack the
time to explain the subject more precisely.
Let us commence with definitions:
/Compilation/ is the analysis of a program's meaning, rendering
the same semantics into another language. A /compiler/ is a
total function from programs to meanings.
/Interpretation/ is the execution of a program given the state of
the machine, either diverging or yielding a final answer. An
/interpreter/ is a partial function from programs and machine
states (or simply machine states, if the program is a part of
the machine state) to answers.
(Programming language theoreticians in the audience will recognize
that compilation corresponds with denotational semantics and that
interpretation corresponds with operational semantics. I hope,
however, that such members of the audience are already adequately
familiar with the concepts that they need read no further. I hope.)
A usable language implementation that is based on a compiler also
requires an interpreter. When most people hear about `compilers',
they are hearing about compilers that map programs in languages like
C to machine programs, i.e. programs in languages for the
instruction sets implemented by physical processors such as the x86,
the PowerPC, the SPARC, the PDP-10, and so on.
[This entry is not yet finished.]
2007-05-13 SLIME48
(Apologies: the post prognosticated by the previous preparation has
been permanently postponed.)
After several long months of version incompatibility, there is a new
version of SLIME48 available that will work with CVS SLIME as of
2007-05-13. A tarball is available here:
.
Bug reports are welcome. Further information is at
.
2007-05-09 In preparation for a clarification of macros in Scheme,
and their implications on style
This blag has remained dormant for a very long time. I must deeply
apologize to my avid readers who no doubt were left in a state of
prolonged mourning for the stagnancy of their only worthwhile source
of meaning in their lives. In a short while, however, no longer
will you poor souls remain cold, alone, and bereft of your font of
Schemerical wisdom, for I shall soon bring flames upon the now
cowering subject of macros in Scheme -- from DEFMACRO to
SYNTAX-RULES, and all in between and beyond --, and their
philosophical implications on style and expressiveness of programs!
For you who cannot stand to wait until the weekend (when I hope to
find time for this inflammatory ecriturial extravaganza), I give you
a brief taste, the draft of a Lisp style guide whose composition I
commenced this past weekend:
2006-09-16 On networking interfaces in Scheme
Most Scheme systems have very limited networking interfaces, usually
with the notion that BSD sockets are not nice, and that `high-level'
verbs like `open a TCP connection to this IP address, on this port'
are nice. This is a bad idea, and Scheme systems ought to support
wrappers around the whole BSD sockets API.
The BSD sockets API is really a fairly extensive generalization of
many different flavours of sockets, classifying them according to
types with common operations (stream sockets, datagram sockets,
&c.), across many different networking domains (IPv4 and IPv6 being
the most prevalent today, but of which there are many others, some
of which remain useful, like local Unix sockets). But instead of
taking advantage of this unification set down by BSD sockets, most
Scheme systems *specialize* the interface to provide wrappers for
TCP (i.e. internet stream sockets), and sometimes UDP (i.e internet
datagram sockets), and sometimes local Unix sockets (with one of two
types).
In the first place, this is poor practice simply because of the
amount of work necessary for the same returns. In essence, the
specialized wrappers tear down a layer of abstraction merely by
pretending that it isn't there -- while actually building on top of
it. The result is that different specialized wrappers must be made
for everything intended to be wrapped, and a great deal of code must
be duplicated. There is a great deal of C code out there, used as
the basis of Scheme networking libraries, that duplicates the same
system calls just with different constant arguments (SOCK_STREAM vs
SOCK_DATAGRAM, AF_INET vs AF_INET6 vs AF_UNIX, &c.), or different
types of socket addresses (sockaddr_in, sockaddr_in6, sockaddr_un,
&c.). Few Scheme systems expose these constants and socket address
structures to Scheme; most just hide the `messy details' in C.
Actually, `the same returns' is not quite true anyway. The wrappers
are generally very fixed in form, and inextensible. Most don't even
support any socket options, or have specific procedures to deal with
specific socket options. Few have any support for multicast IP. I
have never seen support for out-of-band data transmission in any
specialized socket wrapper for Scheme. The distinction between
shutdown and close is usually either absent or swept under a rug.
A general BSD sockets wrapper gets all of these for free, on *any*
socket type or domain (except for multicast IP, of course), and can
be made to be easily amenable to extension for anything not already
wrapped, as long as sockets' descriptors can be exposed
straightforwardly in an FFI. Specialized wrappers have to duplicate
all this functionality for every type and domain of socket they
distinguish between, and they rarely expose any internal interface
for FFIs -- that, or they simply don't support any of it.
Protocol independence in programs is impossible with specialized
OPEN-TCP-CONNECTION procedures: IPv6 has been around for over ten
years, and many standard libraries support it trivially through
abstractions like BSD sockets in conjunction with getaddrinfo(3),
yet programs that support IPv6 well are few and far between because
programmers specialize the sockets with the AF_INET family -- and
Scheme systems that specialize the BSD sockets to this family
specifically *discourage* protocol independence.
Unfortunately, programmer laziness in conjunction with a misguided
desire for `nicer' interfaces tends toward not supporting any more
esoteric capabilities of BSD sockets or flexibility for later
extension or protocol independence.
Of the Schemes I can think of off the top of my head, only scsh and
Gauche have BSD sockets interfaces. These all have specialized
wrappers, usually just for TCP, and some UDP, though I think none
support any local Unix sockets: Scheme48, Chicken, PLT, Gambit,
RScheme, MIT Scheme, SISC...
2006-09-14 R6RS
The first public draft of the Revised^6 Report on the Algorithmic
Language Scheme has been released, at .
The module system, the string API, and the arithmetic specification
are all still quite broken. A number of other smaller problems
pervade the report. They claim that an operational semantics has
been written, but it's not included, and the denotational semantics
has been taken out. Why, I do not know.
I'm not sure whether I have enough interest left in it to rant much
more about it.
2006-09-06 Zob veeblefitzer
Foo bar baz quux zot.
Frob grovel full lexical...
Mumble; chumble spuzz.
2006-07-30 Update: Guillaume Germain found in quantum superposition
between Washington and Minnesota
Title says it all, folks.
2006-05-29 Missing: Guillaume Germain, author of Termite
Monsieur Germain was last seen a little over a week ago in the
#scheme IRC channel on Freenode, declaring that he was preparing to
undertake the dark journey into the Lesser Evil Empire of Amazon.
We can be thankful that he turned down Temptation when she came to
his door with an interview at the Greater Evil Empire, but now, even
at the Lesser one, he has disappeared and not been heard from in
over a week. The Scheme Underground has not published a press
release stating that he has possibly been detected in a dark, dank,
and dingy dungeon in a basement somewhere in Washington, but the
details are sketchy. Preparations are not being made for a raid in
the area.
[Update: M. Germain appeared spontaneously to inform me that his
surname is spelt without a trailing `e', and subsequently vanished
just as spontaneously. Whereabouts are still unknown.]
2006-05-29 Pink chickens
Chickens, under their feathers, are pink. I don't know whether any
of you have ever plucked and feathered a chicken, but it turns out
pink underneath. At least, this is what I presume, because I have
not myself ever feathered a chicken. I have seen only featherful
chickens in farms and chickens in supermarkets which other persons
have already had the kindness to feather for me. Still, I can
assure you that chickens are pink under the feathers.
Now, what are really peculiar are chickens whose *feathers* are
pink, and I mean not just slightly pinkish but really bright, hot
pink. These things are really bizarre, and I have never seen one in
my life; nor have I ever heard of them. But can you imagine seeing
a chicken there, yet with blazing pink photons streaming at your
retinas, instead of their ordinary brownish reddish hue?
[Update: I am informed, among the voluminous profusion of
commentary from readers, that an ideal such chicken would be five
feet in height and capable of wearing a saddle.]
2006-05-01 On control brackets and resource release
[Note, 2010-01-01: There are some errors in this entry. See the
entries
2009-02-04 Correction to control brackets and resource release
2009-03-21 Correction to the optimization of resource release in
control brackets.
For a conciser summary, see the entry
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme.]
DYNAMIC-WIND has long been a source of contention in the Lisp world.
It serves a very particular purpose, but this purpose is not very
well understood, and many think that this purpose is the same as the
purpose of other Lisps' UNWIND-PROTECT. While the general idea of
UNWIND-PROTECT is useful, it is not directly applicable in Scheme,
because Scheme has more powerful control abstractions than other
Lisps', and it therefore requires more powerful means of controlling
them.
(UNWIND-PROTECT form protection-form ...), a special form, evaluates
the first form and returns its value, but before returning the value
also executes the protection forms. Furthermore, if a throw in the
primary form transfers control to outside the whole UNWIND-PROTECT
form, this, too, will execute the protection forms. This allows
programmers to *reliably* clean things up at certain points in the
program -- for instance, to close open files, to shut down sockets,
to release database handles, &c.; here is hypothetical example of
its use:
(define (call-with-input-file pathname procedure)
(let ((input-port (open-input-file pathname)))
(unwind-protect (procedure input-port)
(close-input-port input-port))))
(DYNAMIC-WIND before during after), a procedure of three procedural
parameters, calls the during thunk, ensuring that any transfer of
control into it calls the before thunk, and that any transfer of
control out of it calls the after thunk. Normal control transfers,
i.e. the initial call to the during thunk and its final return,
qualify as control transfers too. This means that, any time control
is inside the dynamic extent of the during thunk, any state
established by the before thunk will be in effect, but any time
control is outside its dynamic extent, that state will be torn down
by the after thunk. Here is an actual example of DYNAMIC-WIND's
use, from the Edwin text editor:
(define (with-current-local-bindings! thunk)
(dynamic-wind
(lambda ()
(install-buffer-local-bindings! (current-buffer)))
thunk
(lambda ()
(uninstall-buffer-local-bindings! (current-buffer)))))
This ensures that, within THUNK, all Edwin variables have any values
specified by the buffer's local bindings. This is the kind of use
that DYNAMIC-WIND is appropriate for: a sort of control bracket that
ensures the presence of certain dynamic state or context within a
certain dynamic extent.
Now, unfortunately, DYNAMIC-WIND can be seen as somewhat similar to
UNWIND-PROTECT, because for basic cases (involving no sophisticated
exploitation of Scheme's continuation reification) DYNAMIC-WIND with
a null entrance thunk seems to be equivalent to UNWIND-PROTECT.
This is misleading, however. Either control may return to the
dynamic extent of the during thunk in the DYNAMIC-WIND, in which
case the after thunk will be called twice -- which is unacceptably
wrong --, or the before thunk really signals an error if it is
called twice. This, though, violates the philosophy described in
the first sentence of the R5RS:
Programming languages should be designed not by piling feature
on top of feature, but by removing the weaknesses and
restrictions that make additional features appear necessary.
Signalling an error in a DYNAMIC-WIND before thunk is usually a
limitation and a weakness in a program. We have an *extremely*
powerful means of abstraction of control -- CWCC[1] -- and we ought
not to prevent it from working merely because we want to prevent a
certain fragment of code from running more than once. This should,
however, spur a deeper look at why the code is being run more than
once in the first place.
DYNAMIC-WIND is meant to establish local dynamic state for specific
dynamic extents within programs, by setting it up in before thunks
and tearing it down in after thunks. These operations, to set up
and to tear down dynamic state, are meant to be reversible, local
changes.[2] That is, permanent actions, such as closing a port,
that one would put in a protection form of an UNWIND-PROTECT, are
entirely inappropriate in a DYNAMIC-WIND after thunk.
Yet the basic idea of UNWIND-PROTECT is still useful. What,
exactly, though, is useful from UNWIND-PROTECT, and what is
incompatible with Scheme? The concept of permanent finalization is
useful, and the identification of a good point to perform it is
useful. *Requiring* that it be performed at this point, though, is
incompatible with Scheme, because this point may be passed several
times, yet we must perform the finalization once and only once.
Many languages, and many Scheme systems, provide ways to ensure that
certain code be run when an object is no longer reachable. We can
apply this same idea, in fact, to continuations! The protection
thunk in an UNWIND-PROTECT of a lesser Lisp is run only once the
continuation is about to become unreachable. The only difference
between Scheme and CWCCless Lisps is that the point at which
continuations become unreachable is unfixed in Scheme.
So we must use a general object finalization mechanism -- for the
kind of object known as a continuation. This is easy to concoct,
though: we need only to make an object, to associate with that
object the protection thunk so that it will be executed when the
object become unreachable, and to make a link from the continuation
to that object. There is an implementation of this in MIT Scheme at
the end of this entry.
This, though, is non-optimal. The protection thunk may run at any
time in the system. We should *like* it to be run as soon as
control returns through the continuation in question, but this is
valid only if that continuation has not been reified and will not
ever be reused. We can say that a continuation is /dirty/ if it has
been reified and may be instantiated again, and that a continuation
is /clean/ if and only if there will be exactly one instance of it,
which will be destroyed as soon as control returns through it.
If we store a bit in any continuation involved with UNWIND-PROTECT,
i.e. any continuation with an associated protection thunk, then we
can initially clear the bit, to indicate that it is clean, and
instrument CWCC to mark all such continuations as dirty by setting
their bits. When control returns through a protected continuation,
it checks the bit. If the bit is clear, then the continuation is
clean, and it runs the protection thunk, because control will never
again return to this point; it also deregisters any finalization
procedure. If the bit is set, then the continuation is dirty, and
general finalization system for objects of unlimited extent will
handle the protection thunk.
(We need to be careful in the case of a reified continuation
instantiated on the stack whose heap storage has been reclaimed.
The continuation will be marked dirty, but before control returns
through it for the last time its protection thunk might be run,
because the GC could finalize its heap storage.)
Both DYNAMIC-WIND and UNWIND-PROTECT are useful for different
applications in a language with powerful control abstractions such
as Scheme. DYNAMIC-WIND, though, does not subsume the operation of
UNWIND-PROTECT, and UNWIND-PROTECT cannot be taken literally from a
language with weaker control abstractions. By examining the core of
the intent of UNWIND-PROTECT, though, we can separate its actual
purpose from an optimization. We can implement the actual purpose,
and then analyze cases where the optimization can still be useful,
in order to provide UNWIND-PROTECT efficiently in Scheme. The
result is that we can retain the powerful control abstraction of
CWCC and in the same language integrate the same idioms of resource
release as other languages provide.
Of course, there are still some limited cases where restriction is
really necessary. There are a number of procedures in MIT Scheme
that work with sensitive strings, and into which reentry is not an
option, in order to enforce the proper destruction of sensitive
information. In this case, DYNAMIC-WIND is appropriate to reject
reentry and to nullify sensitive strings on exit.
[1] Some would argue that composable continuations ought to be
enough for everyone, but the issues here are exactly the same
whether we use CWCC or composable continuations.
[2] This is not *exactly* true, but symmetry is a very desirable
property of DYNAMIC-WIND before & after thunks, and reversible
set-up & tear-down operations are usual.
Appendix: Variation on a Theme of UNWIND-PROTECT
(define continuation-protectors
(make-gc-finalizer (lambda (protector) (protector))
cell?
cell-contents
set-cell-contents!))
(define (make-protector-cell protector)
(let ((protector-cell
(make-cell (let ((protector-state (get-dynamic-state)))
(lambda ()
(let ((state (get-dynamic-state)))
(set-dynamic-state! protector-state #f)
(protector)
(set-dynamic-state! state #f)))))))
(add-to-gc-finalizer! continuation-protectors protector-cell)
protector-cell))
(define get-dynamic-state
(access GET-DYNAMIC-STATE
(->environment '(RUNTIME STATE-SPACE))))
(define set-dynamic-state!
(access SET-DYNAMIC-STATE!
(->environment '(RUNTIME STATE-SPACE))))
(define-syntax unwind-protect
(syntax-rules ()
((UNWIND-PROTECT form protection0 protection1 ...)
(*UNWIND-PROTECT (LAMBDA () form)
(LAMBDA () protection0 protection1 ...)))))
(define (*unwind-protect thunk protector)
(let* ((protector-cell (make-protector-cell protector))
(result (thunk)))
;; This ensures that the continuation will hold a reference to
;; the cell, so that it will not be garbage-collected until
;; after either control returns to this continuation or a
;; throw causes this continuation to be discarded and made
;; unreachable.
(identity-procedure protector-cell)
result))
2006-04-17 Microsoft vermin
I am shocked -- *shocked!* -- to hear that the developer of Termite
had an interview for a potential job at Microsoft. (Termite, for
you ignorant wretches, is Erlang-in-Scheme for Gambit. Find it on
your own.)
2006-04-07 XEmacs S-expression motion
Turns out paredit.el 20 beta doesn't support XEmacs very well after
all. The problem is that XEmacs's S-expression motion commands
don't signal SCAN-ERROR conditions like GNU Emacs does: they just
signal general ERROR conditions. Programs *cannot* usefully
discriminate S-expression motion scan errors from other kinds of
errors, and, on software engineering principles, I refuse to modify
the code to catch and ignore all errors. Grrrmph.
(If any XEmacs developers read this (as if): Could you please
either explain why the S-expression motion code signals ERROR
conditions rather than SCAN-ERROR conditions in specific, or, if
there is no good reason, fix XEmacs?)
2006-03-30 Immobilizing blobs
There is a furry, orange blob on my lap preventing me from moving.
I believe the scientific classification for this kind of blob is
`cat'. Luckily, the state of immobility is only temporary,
apparently regulated by some sort of nutritive urges in the blob,
and also occasional boredom and needs for entertainment.
2006-03-23 I got better...
For those of you who actually *read* this thing and were a bit
worried after my 2006-03-07 post, I've recovered mostly, although
I'm maintaining a good cough. But apparently you remain ill, since
you haven't overcome the affliction of blag addiction, as you're
still reading this now.
2006-03-10 paredit.el for XEmacs; deletion keys
Recently I have been fiddling with making paredit.el work for
XEmacs. Not many changes seem to be necessary -- although I haven't
yet tested it exhaustively under XEmacs, as I'm basing my changes on
Martin Gasbichler's port to XEmacs --, since paredit.el consists
mainly of pretty basic elisp, but one of the most annoying problems
to deal with was getting the deletion keys (forward-delete and
backward-delete, or Delete & Backspace) right.
[I forgot to finish this post, so it was sitting in an autosave file
for two weeks, and, though I had a big rant all planned out two
weeks ago about getting deletion keys right, I've forgotten what I
had to say. Sorry. Also, paredit.el for XEmacs isn't finished yet
because I keep forgetting to get around to it.]
[Update, 2009-12-31: Thanks to gjs, I now have an arbitrarily large
collection of round tuits. Fortunately, I think I made paredit run
on XEmacs a couple of years ago.]
2006-03-07 More flu!
Not only was the flu great fun, but, after only a few days of having
seemed to be mostly rid of it, it has decided to come back to bring
*more* joy to my life! Mmmmm...
2006-02-24 Flu!
The flu is great fun. I recommend that any of you who have not had
the pleasure of experiencing it, find a contagious surface
*right*now* and infect yourselves with the glee of knowing what fun
and games await you!
2006-02-23 Paredit.el versions 18 & 19
This happened a while ago, but paredit.el 18 is now the released
version, and paredit.el 19 is the new beta. See
;
find the relevant files yourself. Version 19 supports insertion of
balanced delimiters other than parentheses, and a fancy S-expression
transportation device suggested by Guillaume Germain, but I haven't
yet finished the latter and put it up on the web yet.
2006-02-06 On the dangers of blaggery
Until several acquaintences coerced me into recording a `blog' (a
term which I abhor almost as much as the concept it denotes), I
tended to avoid the the things in general, excepting semi-frequent
browsing of Planet Lisp . There are a
number of reasons for my eschewal of the reading or writing of
blags.
The first reason is the content, or absence thereof. There is a
tendency among blag writers to write when it amuses them to write,
not when they have anything to say. I need not elaborate further on
this point; I am sure that the reader is quite familiar with the
saturating profusion of content-free blags throughout the web.
The second reason is that blags are astonishingly efficient sinks of
time. Right now I ought to be doing something else. But I am not
doing something else. This is because there is some aspect of the
nature of blags that compels their authors to write entries, and,
even in the rare case that the author has anything worthwhile to
say, the time composing the entry would be better wasted on a
different amusement device, such as a cat. (More accurately put,
the author ought to spend more time amusing the cat than dribbling
drivel onto the web.)
This urge to write blag entries instead of doing useful things such
as acquiescing to feline demands takes its toll also in other ways.
For example, programmers will often briefly emit mutterings on their
blags about their latest objects of fascination, without releasing
their code. If the existence of that code is an equivocal one, this
sputtering of substanceless spoutage is known in some circles as
`wanking', a technical term. Wanking is not, and never has been,
desirable behaviour to be associated with. Actual code can speak
for itself, but speaking for that which has no existence, or at best
a vaporous existence, is about as useful to others as a fish is to a
bicycle. Or something like that. Yet these programmers persist in
writing blag entries instead of substantial code.
Finally, the blag is usually directed at an unclear audience. Since
there is no particular audience that the blag is intended to cater
to, and so for any particular audience much information is discussed
in much greater detail than is necessary, and much other necessary
information is omitted. Arguments are ill-defended because the
blagger has no connection to his audience when writing the blag
entry; he must build all the argument on a nebulous idea of what and
whom he's trying to argue against. The result is poorly laid-out,
ill-conceived, and unpersuasive arguments or other types of entries.
Thank you for slogging through the awful post above. I hope that
its mere form and existence have convinced you of what it failed to
argue for itself. At the very least, perhaps it will dissuade you
from assigning further merit to this blag!
2006-02-04 Disturbing blaggery
I have noticed a disturbing urge to write more material in this joke
of a text file in the past few days. I will have to quell this
strange compulsion. I feared it would happen from the start.
2006-02-03 Option types, optional parameters
Languages like ML and Haskell that are heavily based on pattern
matching have a very simple but elegant mechanism for representing
value and absence of value without employing a special value that is
still a value like any other: option types. The basic idea is that
there is a type which I shall denote with (OPTION ), where
is the type of the value that may be present. Two kinds of
terms have this type: (SUPPLIED ) and (ABSENT), where
has the type . For example, a TABLE-ENTRY procedure might
return (SUPPLIED ) or (ABSENT) to denote that
there is no value in the table associated with the given key.
The idiom in Lisps tends to be to use the false constant (#F or NIL)
for representing the absence of value, but this inhibits #F or NIL
from being itself a present and useful value. (The conflation of
the false constant and the empty list in old, crufty Lisps
exacerbates this.) This poses a problem, for example, if you want a
general table mapping some key to any object -- including #F -- but
you also want to be able to determine when an association is absent
in the table. I call this conflation /domain contagion/, because
NIL or #F or the symbol NOT-HERE are all still in the domain of
expressible values, when they should be kept apart, because their
whole purpose is to express the *absence* of an expressible value.
There are actually two different problems here. The first I shall
leave for another day, although it is more relevant to the
TABLE-ENTRY example. It is that procedures like TABLE-ENTRY must
convey information about control flow to their callers via data
flow. This indirection ought not to be necessary. Olin Shivers
wrote a paper called `Multi-return function calls' (although I think
that `multi-continuation procedure calls' is a better term) about
this fairly recently. The basic idea is that, rather than giving a
procedure a single continuation, which itself will discriminate the
data that the procedure returns to it, the caller supplies a
*choice* of several continuations, and the callee picks one to
return to, discarding the rest. The problem of denoting absence of
value in procedures like TABLE-ENTRY can be solved by passing two
continuations to TABLE-ENTRY: one for the case where there is a
value (which continuation accepts one argument), and one for the
case where there is no value (which continuation is nullary). I
recommend the paper; I may write something on the subject later,
too.
The other problem is not in eliding the indirection of data flow,
but in storing data. Here is a simple example: optional parameters.
There is no control flow involved here. If a procedure accepts
optional parameters, it simply has several variables which may or
may not have been supplied as arguments. Now, while there are some
cases where it would suffice to specify in the lambda default values
for the case of absence of the arguments, there are other cases
where it is useful either to pass on the optional parameter's value
to another procedure (and retain the supplied/absent status) or to
conditionalize on whether the parameter was supplied in order to do
something fancier altogether.
Here are several strategies currently employed by existing Scheme
systems:
1. Give no provision for specifying default values in the lambda
syntax itself, and instead
a. use a special value, such as #F (DSSSL) or some `default
object' token (current MIT Scheme).
-- This is subject to domain contagion as discussed
above.
b. make references to the variables corresponding with
optional parameters trap, and provide a special form
DEFAULT-OBJECT? which tests whether references to those
variables would trap in order to conditionalize on their
presence as arguments (older MIT Scheme).
-- This requires some pretty kludgey changes to the
semantics of variables in the language. Furthermore,
it inhibits the passage of unsupplied optional
parameters on to other procedures: you can't simply
pass on an unsupplied parameter to another procedure
by passing the value of the corresponding variable,
because references to that variable trap. (MIT
Scheme no longer uses this semantics for optional
arguments.)
2. Require that the default value be supplied. (PLT's OPT-LAMBDA)
-- This is inconvenient in some cases, and, while there are
ways around domain contagion with this strategy, they are
pretty inconvenient and they defeat simple static type
analysis (which is still useful for dynamically typed
languages like Scheme):
(LET ((NOT-HERE (CONS 'NOT-HERE '())))
(LAMBDA (... #!OPTIONAL (X NOT-HERE) ...)
... (IF (EQ? X NOT-HERE) ...) ...)))
3. Bind another variable to a boolean flag indicating whether an
optional parameter was supplied. (Common Lisp)
-- This fixes most of the disadvantages of the above, but it
still does not allow the convenient passage of unsupplied
optional parameters on to other procedures. It can also
get to be syntactically pretty heavy-weight in usage.
All of the problems of domain contagion, type analysis defeat,
passage of unsupplied optional parameters onto other procedures, and
complexity of usage, would be alleviated by the introduction of an
option type as in Haskell or ML (or even dynamically typed languages
heavily focussed on pattern matching like Erlang). There would be
two remaining issues: undesirable boxing, and the method by which to
conditionalize on option types.
When it comes to data already stored in the heap, the boxing of an
option type is not much of an issue; it is merely one more layer of
indirection. In fact, on, say, a 64-bit system, where three tag
bits is ideal, one possibility would be to use one of those tag bits
for flagging option types. (This may not be a great idea, but it is
an idea nonetheless which might bear worth experimenting.) As for
optional parameters, for which run-time costs of heap allocation is
not a reasonable option (sorry for the pun), we must first assume
that the compiler knows about the option type in the first place;
with this assumption in mind, it is not difficult for the compiler
to recognize trivially optimizable tests of the option's presence.
For example, it could map such tests to comparisons against an NARGS
register.
The syntax for testing optional data might be, without pattern
matching, a bit clumsy. There are a couple ways it might be
accomplished: a new special form could be added, (IF-OPTION (