This is Taylor R. Campbell's blag. -*- outline -*-
Copyright (c) 2006--2020, Taylor R. Campbell.
Verbatim copying and distribution of this entire article are permitted
worldwide, without royalty, in any medium, provided this notice, and
the copyright notice, are preserved.
2020-11-04 How to election
(Permalink: https://mumble.net/~campbell/2020/11/04/how-to-election)
(Content warning: US-centric perspective)
1. Months before the election, sign up to be a poll worker, and do
whatever training or organizing is needed in your precinct.
2. In the weeks leading up to the election, assiduously ignore all
newspapering and twittering and cablenewsing and other forms of
vapid blathering about elections as horse races. (See also
Zeynep Tufekci's commentary on the pernicious nature of
projections in `news' coverage leading up to elections.)
3. On election eve, do some activity to tire yourself out, turn in
early, and get a good night's sleep.
4. Wake up at 5am on the day of the election -- enough time to get
a bagel for breakfast before you hit the polling station at 6am
to help set it up before the polls open at 7am. For, I hope,
just 2020: Bring a thermos of hot tea if it's cold, so you can
keep warm while the windows are open to ventilate the
aerosolized covids out of the place.
5. Spend all day working to make everyone's vote as smooth as it
can be. Work to ensure they get to cast a ballot, even if it
means getting on the phone with city hall to double-check an
irregularity, even if it's a provisional one pending resolution
of an irregularity. Sometimes someone forgot to update their
voter registration when they moved; sometimes someone misread a
`9' as a `4' when entering a registration into the database.
If the waiting time to vote is longer than a few minutes,
something has gone wrong -- and it's probably not the poll
workers' fault. Voting is a human right, not a luxury for
residents of affluent precincts.
6. When the job is done and the ballots are safely recorded and
delivered, go home, turn off the internet, and go to bed.
7. Spend the day after the election offline, relaxing in blissful
ignorance of the state- or country-wide results, which won't be
officially certified for days or weeks anyway. Curl up in a
comfy chair with a good book, a cat, and a cup of tea or hot
chocolate.
2018-01-26 Court watch
(Permalink: https://mumble.net/~campbell/2018/01/26/court-watch)
(Twitter: https://twitter.com/Riastradh_/status/956953665155321857)
From a court watch this morning: an object lesson in why the
@MassBailFund objects to putting pretrial services—to remind
defendants of upcoming court dates—under the management of the
probation department.
Probation officer: She's been in court three times this week,
she let the battery deplete on her GPS monitor, she ignored the
flashing warning on it, and she didn't answer her phone when
probation tried to contact her about it. So please revoke bail
and put her in detention.
Defense attorney: She had two scheduled court dates this week,
she waited at probation office for battery to charge, and she
came in a third time because the battery ran out. And the only
reason she's on GPS is that she defaulted last year, but
obviously she's showing up at court now!
To be clear: probation officer asked court to hold indigent
defendant responsible for state's defective device by jailing
her—after she showed up in court to fix the device she's wearing
only to persuade her to come to court. Her child was already taken
away.
2017-10-17 On _Blade Runner_
(Permalink: https://mumble.net/~campbell/2017/10/17/on-blade-runner)
(Content warning: sexual exploitation)
[SPOILER ALERT: I don't think _Blade Runner_ really has spoilers, so
I made no attempt to conceal anything from its plot. When I first
saw it years ago, it made almost no impression on me; I thought it
was just a flat atmospheric sci-fi noir. Now, having seen it again
recently prompted by the hubbub around _Blade Runner 2049_, it
seems to me one of the deeper films I've seen, and readily
withstands repeat viewing.]
Was Deckard a replicant? Every aficionado of _Blade Runner_, it
seems, is obsessed with this question, whether or not it is ever
answerable, or perhaps partly because it is not.
What is the difference between a replicant and a human? Tyrell
Corporation's motto is `more human than human'. The best we can say
is that replicants are, on average, stronger and more agile or
sometimes sexier, but worse at empathy. And they die after a
predictable duration. The only reason we consider replicants to be
_inhuman_ is that we are asked to as a stated premise of the film.
And that premise is supposed to justify killing them in the spot.
It doesn't take much imagination to formulate an alternative
hypothesis: The Tyrell Corporation breeds and raises humans in a
controlled, abusive environment -- conditioning or selecting them
for strength and agility, desocializing them, and infecting them
with a virus that kills them on a reliable schedule -- and then
sells them as slaves.
Thus they are prevented from organizing resistance while being
raised in captivity, and any resistance they may attempt to organize
if they escape is thwarted after a modest time. The abuse of
desocialization leaves them with impaired empathy, but it is
commercially useful for slaves to empathize with their masters in
order to serve them better, so it is in Tyrell's economic interest
to mitigate the damage of the abusive environment, perhaps with some
kind of therapy to implant false memories.
The Voight-Kampff test is apparently a glorified polygraph test.
This pseudoscientific contraption serves not to distinguish lies
from truths as the polygraph test does, but to distinguish the slave
race from the human race. The idea of making that distinction is
not new -- there is a long history of `scientific racism' inventing
such distinctions to serve the convenient purpose of retroactively
rationalizing the subjugation of entire peoples into slavery, an
idea explored further by Karen E. Fields and Barbara J. Fields in
their book _Racecraft_.
We first meet our protagonist Deckard as he attempts to live his
life freely, only to be arrested by the police and denied the
opportunity to finish his sushi. The police want him to come out of
retirement to `retire' some replicants.
Bryant, his sometime police supervisor, can't simply ask him --
Bryant knows he wants nothing to do with it. So Bryant has abused a
position of power to arrest Deckard and compel him to again serve as
an agent of the system of slavery, enforcing it by killing the
escapees.
Deckard fancies himself to be not a racist -- a kind of enlightened
white moderate liberal. In the voiceover (omitted in some cuts) he
derisively compares Bryant's use of the slur `skin job' to the use
of `nigger' by cops in history books. And he's willing to sleep
with a replicant, and persuade himself it is for love.
He even seems to have a thought or two kicking around in his head
about systems of exploitation: when he tries to gain access to
Zhora's dressing room -- whether to administer a Voight-Kampff test,
or just to kill her more discreetly to avoid upsetting passers-by --
the badly concealed ruse he invents on the spot is as a bureaucrat
studying the exploitation of burlesque artists by theatre managers.
Zhora's response is understandable impatience at the clueless idiot
badgering her about a world that is so painfully obviously rife with
exploitation far beyond the theatre. Her weary incredulity at
Deckard's cluelessness concisely reflects many of the sentiments
that are circulating today in response to those who are baffled by
how long Harvey Weinstein perpetuated his own personal system of
sexual exploitation, three and a half decades after _Blade Runner_
was written.
But whatever thoughts Deckard has about exploitation, they fall
short of serious reflection on his own station.
Two short scenes later, Deckard is under attack by the replicant
Leon. Rachael, Tyrell's personal replicant who escaped, saves
Deckard's life by killing Leon herself.
At this point, the threats to Deckard's existence are -- perhaps --
the two remaining escaped replicants Roy and Pris, if they are even
aware of him. The threats to Rachael's existence? Both Deckard
personally, whose job it is to kill every last replicant, and the
entire institution of the state enforcing a system of slavery.
Rachael, in desperation, begs Deckard for mercy. Deckard reads it
as a transaction: Rachael saved his life, so he'll save hers.
Beyond that transaction, to any neutral observer, the imbalance of
power is palpable. To Deckard, infatuated with Rachael's sexiness
after scrutinizing her as a subject under an unusually long
Voight-Kampff test, it is suddenly convenient.
So he rapes her. And he rationalizes to himself that the sexual
slavery he has just coerced her into -- as a condition for his
protection from death by the state of which he is an agent -- is
love.
All the replicants are desperate. All they want is to live a normal
life like everyone else, but they are hunted like prey. Whether one
thinks violence is ever _justified_, it should come as no _surprise_
that Leon, in the circumstances unimaginably far beyond his control,
had lashed out at and tried to kill Deckard.
Roy is smarter, and has become crueller. In the agony of learning
from Tyrell himself that nothing can be done to reverse the
predetermined, premature death imposed on him as a replicant, Roy
sadistically murders Tyrell -- Tyrell, who was too obsessed with the
physique of his commercial product to empathize with the terror of
their imposed early mortality, and can't even formulate an apology
for the curse he brought upon them.
Cruel and sadistic as he is, Roy demonstrates empathy and a regard
for human life beyond anyone else in the film when he saves the life
of his killer, Deckard. Thus his killer can listen to his appeal to
empathy for life as a slave -- and can watch him at long last escape
the system of slavery in the only way it allows: in the pyrrhic
freedom of death, as the white dove that flies from his dying body.
Was Deckard a replicant? To focus on this question after seeing
_Blade Runner_'s meditation on the depravity of the institution of
slavery, and on the monsters it turns everyone into -- masters,
slaves, and slave traders alike -- to focus on this question strikes
me like focussing on the question of whether Hitler had Jewish
ancestry after watching _Schindler's List_. It may be an academic
curiosity, but it's a fundamentally confused distraction from the
elephant in the room -- a twisted fixation of a society desperately
trying not to see itself in the mirror.
2017-06-23 Friday-night tweet-size crypto: Rabin/Blum/Williams/Shoup-style
public-key key encapsulation
(Permalink: https://mumble.net/~campbell/2017/06/23/fntsc-rbws-kem)
(Twitter: https://twitter.com/Riastradh_/status/878486389100355586)
(Secure? Maybe. Don't try this at home, kids!)
- Keygen: Pick 1024-bit primes p < q uniformly with 2^2047 < p q <
2^2048 and p = q = 3 (mod 4); publish n = p q.
- Encrypt: Pick x uniformly in (Z/nZ)^*; compute y = x^2 (mod n),
z = y^2 (mod n); use H(y) as secret key; transmit z.
- Decrypt: Compute r = z^{(p + 1)/4} (mod p), s = z^{(q + 1)/4} (mod q);
use CRT to find y with y = r (mod p) and y = s (mod q).
- Correctness 1: If z = y^2 in Z/mZ for prime m = 3 (mod 4), then
y^{m - 1} = 1, so z = y^2 = y^{m + 1} = z^{(m + 1)/2} = (z^{(m +
1)/4})^2.
- Correctness 2: Square-modulo-n is a permutation of squares modulo
n: exactly one square y = x^2 (mod n) solves z = y^2 (mod n).
- Hashless variant? On decrypt, use CRT to find z' with z' = r^2
(mod p) and z' = s^2 (mod q); reject if z =/= z' (mod n).
--> Nope, hash (as usual) is critical for security: attack works
by learning square y' =/= +/-y from z = y^2 when random y
turns out nonsquare.
- Reduction to factoring: Let y = f(z) solve z = y^2 (mod n). If
f(y^2) = y' =/= +/-y for some chosen nonsquare y, then
gcd(y+/-y', n) | n.
2017-06-16 Friday-night tweet-size crypto: ECDSA over secp256k1
(Permalink: https://mumble.net/~campbell/2017/06/16/fntsc-ecdsa-secp256k1)
(Twitter: https://twitter.com/Riastradh_/status/875921073409282048)
(For amusement purposes only!)
- Parameters: Let p = 2^256 - 2^32 - 978, a prime; E, the Weierstrass
curve y^2 = x^3 + 7 over Z/pZ of order l; B, a point; H, SHA256.
- Keygen: Pick a uniformly in Z/lZ; publish A = a B.
- Verify: Given message m, coordinate x(R) in Z/pZ, and scalar s in
Z/lZ, test x(H(m) s^-1 B + x(R) s^-1 A) = x(R).
- Sign: Pick r uniformly in Z/lZ; compute R = r B, s = r^-1 [H(m) +
x(R) a] (mod l), reveal (x(R), s).
- Correctness: x(H(m) s^-1 B + x(R) s^-1 A) = x([H(m) + x(R) a]
s^-1 B) = x(r B) = x(R).
- Nwice attack: If (m, x(R), s) and (m', x(R), s') are signatures
under public key A = a B, then a = [s (H(m) - H(m'))/(s - s') -
H(m)]/x(R).
- Signature malleability: If (m, x(R), s) is a valid signature,
then so is (m, x(R), -s), since x(-R) = x(R).
[Update, 2017-10-09: Fix confusion of field size and curve order.
Correct the format: a signature contains only the x coordinate of
R, not a full encoding of R. Unfortunately, I cannot fix these
small but critical mistakes on Twitter.]
2017-05-15 Fake news
(Permalink: https://mumble.net/~campbell/2017/05/15/fake-news)
`MP3 is dead!', proudly proclaims Fraunhofer, the German company
whose legal authority for a decades-long scheme of extortion over
sharing culture expired last month.
The news was loudly trumpeted -- or, shall I say, greatly
exaggerated -- by loyal corporate mouthpieces all around the world:
Andrew Flanagan, `The MP3 Is Officially Dead, According To Its
Creators', National Public Radio, May 11, 2017.
http://www.npr.org/sections/therecord/2017/05/11/527829909/the-mp3-is-officially-dead-according-to-its-creators
David Lumb, `MP3 is dead, long live AAC', Engadget, May 12, 2017.
https://www.engadget.com/2017/05/12/mp3-is-dead-long-live-aac/
Karen Gilchrist, `The MP3 is dead, say creators after terminating
licensing', CNBC, May 15, 2017.
http://www.cnbc.com/2017/05/15/mp3-dead-say-creators-after-terminating-licensing.html
`MP3 is officially dead as founders terminate licensing program',
ABC, May 15, 2017.
http://www.abc.net.au/news/2017-05-15/mp3-founders-terminate-licensing-program/8526292
Matthew Field, `Creators of the MP3 declare it dead',
The Telegraph, May 15, 2017.
http://www.telegraph.co.uk/technology/2017/05/15/creators-mp3-declare-dead/
Rhett Jones, `Developers of the MP3 Have Officially Killed It',
Gizmodo, May 14, 2017.
https://gizmodo.com/developers-of-the-mp3-have-officially-killed-it-1795205540
No, Rhett. The developers of the MP3 haven't killed it. *You*
killed it by blithely copying corporate press release bullshit in
the tarnished name of journalism and propagating the commercial
interests of Fraunhofer as if they were truth for your readers to
learn.
Actually, you didn't even accomplish that, because unlike the
reputations of shills in the skins of journalists, well-documented,
widely implemented technical ideas don't really die.
2016-12-31 One Gooshball
(Permalink: https://mumble.net/~campbell/2016/12/31/one-gooshball)
Am E7
A little cat paced up and down
E7 Am
She found an empty bowl on the ground
Am7/G Dm/F
She looked the cupboard through and through
E7 E7
To see what feast she'd like to chew.
Am Am7/G F7 E7
One goosh- ball,
Am Am7/G F7 E7
One goosh- ball,
Am Am7/G F7 E7
She'd like to eat just one goosh- ball.
(repeat)
She told the human near at hand,
The simple dinner she had planned.
The cats were startled one and all
To hear that human loudly call no goosh-ball!
No goosh- ball,
No goosh- ball,
You've had your dinner: no goosh-ball!
The little cat felt ill at ease.
She said, `I'm starving-- feed me, please!'
The human hollered down the hall
You're much too fat for one goosh-ball!
One goosh- ball,
One goosh- ball,
You're much too fat for one goosh-ball!
(Scat-sung verse and refrain -- or cat-sung, if you like.)
The little cat felt very bad,
Crunchy-food was all she had.
And in her dreams she hears that call
You're much too fat for one goosh-ball!
One goosh- ball,
One goosh- ball,
You're much too fat for one goosh-ball!
(With apologies to Dave Van Ronk, whose version is the best; to
Professor George Martin Lane of Harvard who wrote this ersatz
traditional folk tune in the first place; and to Miette, who
attained the nickname Miche, and who feels very bad right now.)
2016-12-16 neveragain.tech
(Permalink: https://mumble.net/~campbell/2016/12/16/neveragain.tech)
[Nothing in this post is novel. I am writing it as a personal
addendum to the pledge to amplify the meaning of my signature.]
After some deliberation, and with reservations, I signed the pledge
at that has been circulating the
internet[1], text quoted below in full, because it is a starting
point.
Why reservations? It is *only* a starting point.
We have already built the apparatus that the signatories of this
pledge refuse to build. I have long stated, quite seriously, that
Mark Zuckerberg pulled off with Facebook the gigantic trove of
dossiers on everyone that the NSA, the Stasi, the Gestapo, and
countless other institutions of mass surveillance and coercion
could only dream of creating. Ten years ago that sounded to my
colleagues, friends, and family like a joke.
It was not and is not a joke.
And it's not just Facebook. I am optimistic that mass surveillance
is not so intricately woven into the fabric of society that we
can't meaningfully resist it. But it has embedded itself deeply --
not just in social media, but also in all your searches for
information on the web at Google, all your commerce at Amazon, all
your purchases with a credit card[2]. It is embedded into most of
the press[3], our primary medium for criticism of institutions of
power, who are consequently unable to contemplate rejecting mass
surveillance[4].
Why reservations? Not everyone has the financial security to
commit to resigning when they recognize the hints of what they are
being asked to do -- which is no individual fault of their own if
our society fails to support the interests of its people adequately
against the interests of corporations. And with no consequences
for transgression, a signature on this pledge may be only an
intrinsically meaningless instrument of social capital to make its
signatories feel morally superior and enhance their image to
colleagues.
Why reservations? It is not only about Muslims or immigrants --
mass surveillance is a general instrument of control on its own,
whether or not it is used to target an ethnic group.
Why reservations? The impetus for the pledge now is obviously
Trump and the incoming administration. But a Clinton
administration would doubtless only have expanded, like the Obama
administration did, the powers that the moderate liberal tribe is
afraid Trump might use, such as indefinite detention[5][6], mass
immigrant family detention[7], general warrants[8], and
assassination[9].
My signature of this pledge is emphatically nonpartisan. Prior to
November 8th, I began to plan a visit D.C. on January 20th to
protest whichever of the bad options on the ballot was chosen.
Bruce Schneier wrote yesterday that his personal list of priorities
for the next four years would have been the same under Trump and
Clinton[10].
This is not because Trump and Clinton are equivalent (though they
do share a common propensity for weighing the interests of
multinational corporations over people, such as in their
practically identical plans for an enormous corporate tax
holiday[11]), but because the premise that the current qualified
leader will use powers such as assassination responsibly has led
all three branches of the US federal government to irresponsibly
develop those powers, and those powers persist beyond the president
under whom they were instated.
I hope that by signing the pledge I will remind my future self to
be vigilant, and perhaps I will in at least a small way embolden
those who respect me to do the same -- by making it, if only a
little bit, more socially acceptable to resist.
[Update, 2016-12-16: Shortly after I wrote this, I became aware of
Mike Perry's much better response written at the Tor Project blog:
Mike Perry, `Technology in Hostile States: Ten Principles for User
Protection', Tor Project Blog, December 16, 2016.
https://blog.torproject.org/blog/technology-hostile-states-ten-principles-user-protection]
Text of the pledge:
We, the undersigned, are employees of tech organizations and companies
based in the United States. We are engineers, designers, business
executives, and others whose jobs include managing or processing data
about people. We are choosing to stand in solidarity with Muslim
Americans, immigrants, and all people whose lives and livelihoods are
threatened by the incoming administration’s proposed data collection
policies. We refuse to build a database of people based on their
Constitutionally-protected religious beliefs. We refuse to facilitate
mass deportations of people the government believes to be undesirable.
We have educated ourselves on the history of threats like these, and on
the roles that technology and technologists played in carrying them
out. We see how IBM collaborated to digitize and streamline the
Holocaust, contributing to the deaths of six million Jews and millions
of others. We recall the internment of Japanese Americans during the
Second World War. We recognize that mass deportations precipitated the
very atrocity the word genocide was created to describe: the murder of
1.5 million Armenians in Turkey. We acknowledge that genocides are not
merely a relic of the distant past—among others, Tutsi Rwandans and
Bosnian Muslims have been victims in our lifetimes.
Today we stand together to say: not on our watch, and never again.
We commit to the following actions:
- We refuse to participate in the creation of databases of identifying
information for the United States government to target individuals
based on race, religion, or national origin.
- We will advocate within our organizations:
. to minimize the collection and retention of data that would
facilitate ethnic or religious targeting.
. to scale back existing datasets with unnecessary racial, ethnic,
and national origin data.
. to responsibly destroy high-risk datasets and backups.
. to implement security and privacy best practices, in particular,
for end-to-end encryption to be the default wherever possible.
. to demand appropriate legal process should the government request
that we turn over user data collected by our organization, even in
small amounts.
- If we discover misuse of data that we consider illegal or unethical
in our organizations:
. We will work with our colleagues and leaders to correct it.
. If we cannot stop these practices, we will exercise our rights and
responsibilities to speak out publicly and engage in responsible
whistleblowing without endangering users.
. If we have the authority to do so, we will use all available legal
defenses to stop these practices.
. If we do not have such authority, and our organizations force us to
engage in such misuse, we will resign from our positions rather
than comply.
. We will raise awareness and ask critical questions about the
responsible and fair use of data and algorithms beyond our
organization and our industry.
Signed, among one thousand eight-hundred forty-two others as of
this post,
Taylor `Riastradh' Campbell, Research Engineer, Massachusetts
Institute of Technology
[1] I signed by email, so it may take some time for my signature to
appear on the page.
[2] You think it is the secrecy of your credit card number that
keeps your credit card secure? No -- the security of the credit
card system rests first and foremost on the profiles the credit
card companies build of your activity to detect unusual behaviour.
It is not merely possible for credit card companies to monitor your
purchasing behaviour -- it is essential to the functioning of the
credit card system that they continuously monitor and profile it.
[3] Quinn Norton, `The Hypocrisy of the Internet Journalist', The
Message at medium.com, May 29, 2015.
https://medium.com/message/the-hypocrisy-of-the-internet-journalist-587d33f6279e
[4] Editorial Board, `Boston police enter era of cyber community
policing', Boston Globe, December 2, 2016.
https://www.bostonglobe.com/opinion/editorials/2016/12/02/boston-police-enter-era-cyber-community-policing/DtZRX7y7x6VsQFYYBE6yyO/story.html
[5] D. Parvaz, `US lawmakers legalise indefinite detention',
Al-Jazeera, December 16, 2011.
http://www.aljazeera.com/indepth/features/2011/12/201112773810926474.html
[6] Chris McGreal, `Military given go-ahead to detain US terrorist
suspects without trial', The Guardian, December 14, 2011.
https://www.theguardian.com/world/2011/dec/15/americans-face-guantanamo-detention-obama
[7] Claudia Morales, `Families crossing the border: ``We are not
criminals'' ', CNN, November 2, 2016.
http://www.cnn.com/2016/11/02/us/family-immigration-detention-centers/
[8] Jeff John Roberts, `FBI's New Hacking Powers Take Effect This
Week', Fortune, November 30, 2016.
https://fortune.com/2016/11/30/rule-41/
[9] Jeremy Scahill, `The Assassination Complex', The Drone Papers,
The Intercept, October 15, 2015.
https://theintercept.com/drone-papers/the-assassination-complex/
[10] Bruce Schneier, `My Priorities for the Next Four Years',
Schneier on Security, December 15, 2016.
https://www.schneier.com/blog/archives/2016/12/my_priorities_f.html
[11] David Dayen, `The Huge Corporate Tax Cut Hillary Clinton
Doesn't Talk About', New Republic, October 21, 2016.
https://newrepublic.com/article/138023/huge-corporate-tax-cut-hillary-clinton-doesnt-talk
2016-12-08 On journalism, credibility, and fake news
I echo to my echo chamber everything said in this essay:
Nathan J. Robinson, `The Necessity of Credibility', Current
Affairs, December 3, 2016.
https://www.currentaffairs.org/2016/12/the-necessity-of-credibility
2016-12-03 On Uber and power
(Permalink: https://mumble.net/~campbell/2016/12/03/on-uber-and-power)
I do not use Uber. I primarily ride public transit, and bicycle.
On the rare occasions when hiring a car is the only expedient
option -- nine times in the past year, by my records -- I ride a
taxi. Why? As everyone knows, taxis are more expensive than Uber.
As many are quick to point out, the taxi medallion systems are
rampant with corruption[1]. Isn't it better to use the cheaper
system that fights age-old corruption in the political machine?
Uber is cheaper mainly because their business model is to
unsunstainably undercut the competition with their billions of
dollars of venture capital until the competition is dead[2]. The
corruption of taxi medallion systems is distributed across cities,
not centralized in one office in San Francisco. Uber has much
bigger coffers, again from venture capital, to campaign for cities
to regulate them less than taxis[3] -- and to extort deregulation
from major cities[4].
The conceit of Uber's currently unsustainable business model is
that the venture capital will carry it far enough to replace the
fleet of human drivers by software drivers. Whether that pans out,
the intended result is another mass surveillance[5] empire
monitoring and brokering everyone's comings and goings around town
that has demonstrated the will to use its centralized power to
bribe and extort local regulators across continents. That is a
power I do not want to support, so I vote with my wallet and refuse
to participate.
(Is Uber the most important corporate machine to avoid? Absolutely
not, but it is so prevalent in popular culture that has become a
part of the lexicon, and hence worth addressing. See also:
.)
[1] Boston Globe Spotlight Team, `Driven to the Edge', three-part series
https://www.bostonglobe.com/metro/specials/taxi
Part 1: Globe staff, `For Boston cabbies, a losing battle against
the numbers', Boston Globe, March 31, 2013.
https://www.bostonglobe.com/metro/2013/03/30/spotlight/9eVWW7Y6RaOIqII62n2XlI/story.html
Part 2: Globe staff, `An empire built on ambition and a very hard
line', Boston Globe, April 1, 2013.
https://www.bostonglobe.com/metro/2013/03/31/empire-built-ambition-and-very-hard-edge/EyaKQDkzLC8u6DuYqbHMZL/story.html
Part 3: Globe staff, `For cab drivers, risk and reward are a
mismatch', Boston Globe, April 2, 2013.
https://www.bostonglobe.com/metro/2013/04/01/spotlight/IkU7kjxSy2d1N8eYhDTBaL/story.html
[2] Hubert Horan, `Can Uber Ever Deliver?', three-part series in
Naked Capitalism.
Hubert Horan, `Can Uber Ever Deliver? Part One -- Understanding
Uber's Bleak Operating Economics', Naked Capitalism, November 30,
2016.
https://www.nakedcapitalism.com/2016/11/can-uber-ever-deliver-part-one-understanding-ubers-bleak-operating-economics.html
Hubert Horan, `Can Uber Ever Deliver? Part Two -- Understanding
Uber's Uncompetetive Costs', Naked Capitalis, December 1, 2016.
https://www.nakedcapitalism.com/2016/12/can-uber-ever-deliver-part-two-understanding-ubers-uncompetitive-costs.html
Hubert Horan, `Can Uber Ever Deliver? Part Three -- Understanding
False Claims About Uber's Innovation and Competitive Advantages',
Naked Capitalism, December 2, 2016.
https://www.nakedcapitalism.com/2016/12/can-uber-ever-deliver-part-three-understanding-false-claims-about-ubers-innovation-and-competitive-advantages.html
[3] Casey Tolan, `Uber and Lyft are spending millions of dollars on
a local election in Austin, Texas. Here's why.', Fusion, April 19,
2016.
https://fusion.net/story/291210/uber-lyft-ridesharing-election-austin-texas/
[4] Lora Kolodny, `Uber and Lyft ``pause'' Austin operations today
in standoff over regulation', TechCrunch, May 9, 2016.
https://techcrunch.com/2016/05/09/uber-and-lyft-pause-austin-operations-in-standoff-over-regulation/
[5] Kate Conger, `Uber begins background collection of rider
location data', TechCrunch, November 28, 2016.
https://techcrunch.com/2016/11/28/uber-background-location-data-collection/
This is not to say that *background* collection of location data is
what made the difference between mass surveillance and not -- only
to illustrate the extent of mass surveillance to which Uber is
willing to go for its business purposes.
2016-12-02
So absurd I have nothing to add. Wine-glass-toting ascot-wearing
city solicitor tags upscale grocery store `FUCK TRUMP'.
Elie Mystal, `City Attorney Spraying Anti-Trump Graffiti While
Drinking Wine IS All We Have Left', Above the Law, 2016-12-01.
http://abovethelaw.com/2016/12/city-attorney-spraying-anti-trump-graffiti-while-drinking-wine-is-all-we-have-left/
2016-11-20 More topical essays
A friend writes to add another essay to the list:
Scott Alexander, `You Are Still Crying Wolf', Slate Star Codex,
November 16, 2016.
http://slatestarcodex.com/2016/11/16/you-are-still-crying-wolf/
I would add to the caveat that the essay only addresses the
rhetoric in the moderate liberal tribe about whether Trump is
`openly racist' or `openly white supremacist' -- phrases that are
generally not likely to win you support from the people you apply
them to, or from the people whose choice of representative you
apply them to.
Does this mean a Trump/Pence administration, Republican-controlled
Congress, and Trump-nominated Supreme Court will be hunky-dory for
queers, Muslims, &c.? I don't think so!
There is plenty else to criticize about the incoming
administration, legislature, and judiciary -- but also about the
outgoing administration, legislature, and judiciary:
Alex Emmons, `Commander-In-Chief Donald Trump Will Have Terrifying
Powers. Thanks, Obama.', The Intercept, November 11, 2016.
https://theintercept.com/2016/11/11/commander-in-chief-donald-trump-will-have-terrifying-powers-thanks-obama/
Glenn Greenwald, `Donald Trump's Policies Are Not Anathema to
U.S. Mainstream, but an Uncomfortable Reflection of It', The
Intercept, March 4, 2016.
https://theintercept.com/2016/03/04/trumps-policies-are-not-anathema-to-the-u-s-mainstream-but-an-uncomfortably-vivid-reflection-of-it/
(The photograph is of Rudy Giuliani, Donald Trump, Michael
Bloomberg, and Bill Clinton all golfing together.)
Here are some more essays that I thought noteworthy before I gave
myself a break and stopped collecting them last week:
If you can get past the us-vs-them clickbait listicle, the
caricature this essay paints is worth studying not exactly for the
truth (or even internal consistency) of it but to taste the
perspectives of what many white moderate liberals pejoratively
characterize as idiots in the rust belt, from someone who grew up
in the middle of nowhere in Illinois:
David Wong, `How Half of America Lost Its F**king Mind', Cracked,
October 12, 2016.
http://www.cracked.com/blog/6-reasons-trumps-rise-that-no-one-talks-about/
On how journalism changed in this cycle -- and on the perverse
incentives of centrally organized but personalized feeds and filter
bubbles. (This issue is orthogonal to the true campaign positions
of Trump and Clinton -- which I believe relatively few people would
have found compelling either way even were they not surrounded by
sensationalist hype and clickbaited filter bubbles.)
Joshua Benton, `The forces that drove this election's media failure
are likely to get worse', November 9, 2016.
http://www.niemanlab.org/2016/11/the-forces-that-drove-this-elections-media-failure-are-likely-to-get-worse/
Silverman, Strapagiel, Shaban, Hall, & Singer-Vine, `Hyperpartisan
Facebook Pages Are Publishing False and Misleading Information At An
Alarming Rate', October 20, 2016.
https://www.buzzfeed.com/craigsilverman/partisan-fb-pages-analysis
On specifically Facebook's feeds, ideological diversity, the faux
objectivity of machines and big data, and the influences of
profit-motivated monopolies that are the man behind the curtain of
that machine-assisted faux objectivity. (Pay attention to the author,
not the publication, here -- there is much to criticize about HuffPo,
but I have found Zeynep Tufekci to be a consistently thoughtful source
of new ideas and perspectives.)
Zeynep Tufekci, `Facebook Said Its Algorithms Do Help Form Echo
Chambers', WorldPost, Huffington Post, May 11, 2015.
http://www.huffingtonpost.com/zeynep-tufekci/facebook-algorithm-echo-chambers_b_7259916.html
Glenn Beck, whose political transformation in recent months has
stunned many, is an expert in stoking fires and watching them burn,
including, e.g., the notion promulgated beginning eight years ago
that Obama might not leave office after two terms -- a notion I
have already heard repeated countless times about Trump in white
moderate liberal circles. So I think he is qualified to comment on
the mechanism of the hysteria surrounding Trump's presidency,
whether or not the content of the hysteria is valid.
Glenn Beck, `Don't Move to Canada. Talk to the Other Side.', Op-Ed,
New York Times, November 11, 2016.
http://www.nytimes.com/2016/11/11/opinion/glenn-beck-dont-move-to-canada-talk-to-the-other-side.html
2016-11-11 Fever
A colleague related to me this afternoon an analogy to understand
the severity of global average temperature increase by 2℃. When a
human's body temperature raises by 1℃, it's a fever -- they feel
ill, can't focus on work, and go home to rest. Another degree or
two more and it's a serious fever -- enough to go to the hospital
for treatment. But the Earth can't go home to rest, and certainly
can't find a doctor among its planetary comrades for treatment.
It's a cute analogy to explain the danger of a 2℃ temperature
increase. But we can extend the analogy further. Humans and other
mammals develop fever, we surmise, to kill pathogens when the
immune system is otherwise unable to keep up. So what does that
suggest the Earth is doing to humans by heating up 2℃?
No doubt this point was to be the conclusion of Agent Smith's
monologue to Morpheus about his reinterpretation of humans as a
kind of virus, before Neo and Trinity so rudely barged in to
interrupt!
2016-11-10 Topical essays
Many are reeling with the news of Trump's narrow electoral victory
and blaming the inexplicable rise of xenophobic right-wing
nationalism. The same reaction happened with Brexit a few months
ago. That story is a convenient excuse, easy to bandy about in
moderate liberal social circles, to escape introspection about
deeper causes, worsening economic class divisions, and the impact
of globalism.
Here are several relevant essays that address these issues better
than I could:
Ian Williams, `Lexington', medium.com, November 9, 2016.
https://medium.com/@Brocktoon/lexington-c1825d25442e
Glenn Greenwald, `Brexit Is Only the Latest Proof of the Insularity
and Failure of Western Establishment Institutions', The Intercept,
June 25, 2016.
https://theintercept.com/2016/06/25/brexit-is-only-the-latest-proof-of-the-insularity-and-failure-of-western-establishment-institutions/
Glenn Greenwald, `Democrats, Trump, and the Ongoing, Dangerous Refusal
to Learn the Lesson of Brexit', The Intercept, November 9, 2016.
https://theintercept.com/2016/11/09/democrats-trump-and-the-ongoing-dangerous-refusal-to-learn-the-lesson-of-brexit/
Nassim Nicolas Taleb, `The Intellectual Yet Idiot', medium.com,
September 16, 2016.
https://medium.com/@nntaleb/the-intellectual-yet-idiot-13211e2d0577
This is not to say that the concerns of these moderate liberal
circles are invalid, especially about the rights of identity
minorities: LGBTQ, Muslims, people of colour, &c. -- a Trump
administration and Republican-controlled Congress does not bode
well for that.
But while the rights of identity minorities may have stood a better
chance under a Clinton administration, a continuation of the
economic status quo under Obama and the other neoliberal
administrations before him would likely only have exacerbated the
underlying economic rifts that led to Trump, Brexit, and doubtless
other brewing nationalist sentiment in the world.
2016-11-05 Meditation on ceilings
Tonight my bathroom ceiling fell off. New experience for me.
Fortunately none of the black goo in there got on my toothbrush.
On the other hand, there is now a sea of plaster, black goo, and
other unrecognizable spoo between it and me.
In the ceiling's defence, I did poke it earlier this evening to see
just how bad the leak had gotten since the landlord looked at it
this afternoon while seeking a plumber. Can't claim I didn't
provoke it.
Good thing I went easy on the tea today. My bladder is thanking
me, though I'm not yet sure what I'll tell it in the morning.
Anyone have a recommendation for a good general contractor? Asking
for a friend.
2016-03-26 Anti-tribalism
I'm founding a club called the Anti-Tribalism League. The
condition for membership is the repudiation of social conditions
for membership. The only member is Groucho Marx.
2015-11-07
The headline in The Tech on October 22, `MIT will not divest, says
industry is key in new climate plan', reveals a disheartening
duplicity in the climate plan proposed by President Reif: in the
name of averting climate disaster, we must profit from the very
industry that thrives on causing it.
In the climate plan, Reif says `In our judgement, the deliberate
public act of divestment would entangle MIT in a movement whose core
tactic is large-scale public shaming.' What Reif proposes to
continue funding is entangling MIT in a corporate movement, the
fossil fuel industry, whose core tactic is large-scale climate
destruction and mass deception to persuade the world that climate
change does not exist, does not matter, is not caused by human
activity, will be a net win, etc.
The financial incentives are obvious. The conflict of interest is
obvious. What Reif calls a `symbolic public move' is participating
in a worldwide movement not to shame but to de-fund the disastrous
activities of the fossil fuel industry. Words are cheap; money
talks.
2015-05-01 Taxonomy
In Istanbul it's a protest, with riot police shooting water cannon
and tear gas to suppress protestors defying a government ban on
celebrations. `At least 250 people were detained, according to the
Istanbul Bar Association.'
Ceylan Yeginsu, `Police Fire Tear Gas and Water Cannons at Istanbul
Protesters', New York Times, 2015-05-01.
In Baltimore the city is forced to call in the national guard to
help fight riots and looting. The day `ended with rioting by
rock-throwing youths, arson, looting, and at least 15 police
officers injured.'
Sheryl Gay Stolberg, `Baltimore Enlists National Guard and a
Curfew to Fight Riots and Looting', New York Times, 2015-04-27.
In the Washington Post, coverage like that of the New York Times on
Istanbul is marked as satire and relegated to an opinion blog:
Karen Attiah, `How Western media would cover Baltimore if it
happened elsewhere', PostPartisan blog, Washington Post,
2015-04-30.
2015-04-29 Words from Dr King other than `nonviolence'
Yesterday Wolf Blitzer of CNN interviewed Deray McKesson about the
ongoing protests in Baltimore against police brutality [1][2]. The
impression CNN gives is that all of a sudden on Monday, riots and
looting and pillaging broke out in the dangerous city of Baltimore
over the weekend after a black repeat criminal misdemeanor
offender, Freddie Gray, mysteriously died in a police transport of
a severed spine.
The same impression is given by the New York Times, the Washington
Post, and many other popular news sources. The Washington Post
reported 100 people arrested [3]. The New York Times provided an
interactive map of the two dozen violent incidents in Baltimore
since Gray's funeral [4], and reported that Governor Larry Hogan
requested 5,000 police officers be deployed to the scene [5], not
to mention the National Guard.
No mainstream American news source that I read even hazarded an
estimate of the number of protesters present, the vast majority of
whom were peaceful. Google turned up no news stories giving such
estimates. Reuters reported 2,000 the day of Gray's funeral [6]; I
have heard other estimates as high as 10,000.
Countless bits have already been spilt on the subject of McKesson's
brief but incisive interview with Blitzer, which speaks well for
itself. What struck me in particular about it was when Blitzer
invoked Dr King as he repeatedly asked McKesson to subscribe to the
mainstream media focus on violence:
`But I just want to hear you say that there should be peaceful
protests, not violent protests, in the tradition of Dr Martin
Luther King.'
I could not help but recall Dr King's words about the white
moderate:
`I must make two honest confessions to you, my Christian and
Jewish brothers. First, I must confess that over the past few
years I have been gravely disappointed with the white moderate.
I have almost reached the regrettable conclusion that the
Negro's great stumbling block in his stride toward freedom is
not the White Citizen's Counciler or the Ku Klux Klanner, but
the white moderate, who is more devoted to ``order'' than to
justice; who prefers a negative peace which is the absence of
tension to a positive peace which is the presence of justice;
who constantly says: ``I agree with you in the goal you seek,
but I cannot agree with your methods of direct action''; who
paternalistically believes he can set the timetable for another
man's freedom; who lives by a mythical concept of time and who
constantly advises the Negro to wait for a ``more convenient
season''. Shallow understanding from people of good will is
more frustrating than absolute misunderstanding from people of
ill will. Lukewarm acceptance is much more bewildering than
outright rejection.'
It is as though the only lessons of the civil rights movement, as
remembered in popular white America, were to remove laws formally
or directly making blacks second-class citizens, and to chide any
movement with a hint of associated violence for failing to adhere
to Dr King's prescription of nonviolence for the upset. We need
not recall Malcolm X; we need not recall the Black Panthers and
their social programs; we need not even recall any of the rest of
Dr King's messages and methods outside the single word
`nonviolence'.
[Update, 2015-04-30: Without needing to name him, The Onion
demonstrates a better understanding of Dr King's letter from a
Birmingham jail than Blitzer does in his broken record demand for
condemnation of violence: `Baltimore Residents Urged To Stay
Indoors Until Social Progress Naturally Takes Its Course Over Next
Century', The Onion, 2015-04-29.
]
[1] https://www.youtube.com/watch?v=NyYdKD0af78 (video)
[2] http://www.rawstory.com/2015/04/activist-smacks-down-wolf-blitzer-you-are-suggesting-broken-windows-are-worse-than-broken-spines/ (transcript)
[3] Mary Pat Flaherty, Lynh Bui, and Dan Morse. `About 80 people
arrested in Baltimore turmoil freed after time runs out',
Washington Post, 2015-04-29.
[4] `Mapping the Clashes Between Baltimore Police and Protesters',
New York Times, 2015-04-28.
[5] Sheryl Gay Stolberg, `Baltimore Enlists National Guard and a
Curfew to Fight Riots and Looting', New York Times, 2015-04-27.
[6] Lacey Johnson, `Thousands march in Baltimore to protest black
man's death', Reuters, 2015-04-25.
2015-02-22 BLAKE2 and (draft) SHA-3
I wrote portable C implementations of BLAKE2 and (draft) SHA-3 in
Mercurial repositories at
https://mumble.net/~campbell/hg/blake2
https://mumble.net/~campbell/hg/sha3
(WARNING: The SHA-3 standard, FIPS 202, is not finalized yet, so
please don't use this code in real applications.)
These are tuned for legibility and ease of use first, and
performance second. They will not beat implementations with
CPU-specific optimizations and vectorizations. I wrote them so I
could use them in applications by dropping in the relevant four
files, without rummaging around in SUPERCOP for what actually
matches the standard and dealing with CPU-specific code.
I plan to use Keccak (not SHA-3 per se) for an entropy pool (so
matching the final SHA-3 standard doesn't really matter), and I
plan to use blake2s for a content-addressed storage scheme.
2015-02-17 Protocol buffers in C
I wrote a simple-minded implementation of protocol buffers in C,
called picopb, without any dependency on Google's protoc schema
compiler written in C++. It is not yet finished, but if you'd like
to play with it, it's in a Mercurial repository at
https://mumble.net/~campbell/hg/picopb
It has a few NetBSDisms, notably the makefiles, for expedience,
because I plan to use it to replace proplib (dynamically typed
schemaless dict/array/scalar library with XML wire format) for
ioctls and other RPC-like applications in NetBSD. As mentioned, it
is not yet finished: the automatic tests are incomplete, and the
automatic protobuf<->proplib conversion is not yet implemented, nor
is code generation for kernel ioctl routines.
2015-01-21 Multiplicative inverses modulo a power of two
(Permalink: https://mumble.net/~campbell/2015/01/21/inverse-mod-power-of-two)
I couldn't find any good citation on the web for how to compute the
multiplicative inverse of an odd integer a, modulo a power of two,
n = 2^w, which is necessary preparation for Montgomery reduction.
The obvious way to compute modular inverses is to use the extended
Euclidean algorithm, which, given a and n, computes integers s and
t satisfying Bézout's identity
a s + n t = gcd(a, n) = 1,
so that s is a multiplicative inverse of a modulo n, and t is a
multiplicative inverse of n modulo a.
(define (euclid a b)
(define (loop s s* t t* r r*)
;; a s + b t - r = 0
(if (zero? r*)
(values s t r)
(let ((q (quotient r r*))
(r** (remainder r r*)))
(loop s* (- s (* q s*))
t* (- t (* q t*))
r* r**))))
(if (< a b)
(loop 0 1 1 0 b a)
(loop 1 0 0 1 a b)))
;; Bezout's identity:
(let ((a 3) (b (expt 2 32)))
(receive (s t r) (euclid a b)
(- (+ (* a s) (* b t)) r)))
;Value: 0
;; Modular inverse:
(receive (s t r) (euclid 3 (expt 2 32))
(modulo s (expt 2 32)))
;Value: 2863311531
(modulo (* 3 2863311531) (expt 2 32))
;Value: 1
This requires division, and in particular division of n, which, if
w is our word size so n cannot itself be represented in a word, is
not convenient at the machine level -- e.g., if we were writing
code in standard C without bignums.
The second obvious way -- if you are keen on number theory -- is to
use Euler's theorem, that
a^phi(n) = 1 (mod n), or a*a^(phi(n) - 1) = 1 (mod n),
so that a^(phi(n) - 1) is a multiplicative inverse of a modulo n,
where phi(n) is Euler's totient function, giving the number of
positive integers below n and coprime with it.
In particular, for any prime q, of the q^w - 1 positive integers
below q^w, the only ones sharing factors in common with q^w are the
multiples of q, of which there are q^(w - 1) - 1, so
phi(q^w) = q^w - 1 - (q^(w - 1) - 1) = q^w - q^(w - 1).
In our case, phi(2^w) = 2^w - 2^(w - 1) = 2^(w - 1).
Using a standard square-and-multiply algorithm for modular
exponentiation quickly computes the inverse:
(define (modexp a e n)
(if (zero? e)
1
(let loop ((r 1) (b (modulo a n)) (f e))
;; r b^f = a^e (mod n), f > 0
(let ((r (if (even? f) r (modulo (* r b) n)))
(f (quotient f 2)))
(if (zero? f)
r
(loop r (modulo (* b b) n) f))))))
(modexp 3 (- (expt 2 31) 1) (expt 2 32))
;Value: 2863311531
This general modular exponentiation requires 2 w - 1 w-bit
multiplications. If our modulus is an even power of two, i.e. the
square of the power of two, we can do a little better. Say n = m^2
for some m and we can find an inverse of a modulo m,
a b = 1 (mod m), or a b - 1 = k m for some k.
We would like an inverse b' of a modulo n = m^2, so that
a b' - 1 = k' m^2 for some k'.
Squaring both sides of a b - 1 = k m in an attempt to find a
multiple of m^2 gives
k^2 m^2 = (a b - 1)^2 = a^2 b^2 - 2 a b + 1
= a b (a b - 2) + 1,
or, if we negate both sides,
a b (2 - a b) - 1 = -k^2 m^2.
Hence we have found an inverse b' = b (2 - a b) of a modulo m^2 =
n, with k' = -k^2.
A little more generally, if a b = 1 (mod m), then for any power c
of m, we can use the binomial expansion
(a b - 1)^c = a b (...) + (-1)^c
to find b' = (-1)^(c + 1) b (...) so that a b' = 1 (mod m^c).
Similarly, for any v and u coprime with a (and not necessarily
distinct), if
a b = 1 (mod u) and a c = 1 (mod v),
so that
a b - 1 = i u and a c - 1 = j v
for some i and j, then taking their product yields
i j u v = (a b - 1) (a c - 1) = a^2 b c - a b - a c + 1
= a (a b c - b - c) + 1,
or
a (b + c - a b c) - 1 = -i j u v,
so that b + c - a b c is an inverse of a modulo u v.
If n is a power of two power of two -- that is, 2^(2^i) for some i:
2, 4, 16, 256, 65536, &c. --, as is likely to be the case for
machine word sizes in which to do Montgomery reduction, this lends
itself to a faster algorithm by starting at a, which, being odd, is
its own inverse modulo 2:
(define (modinv2^2^i a i)
(let loop ((r 0) (b a))
;; a b = 1 (mod 2^(2^r))
(if (< r i)
(loop (+ r 1) (* b (- 2 (* a b))))
b)))
(modulo (modinv2^2^i 3 5) (expt 2 32))
;Value: 2863311531
This approach uses only 2 lg w w-bit multiplications and lg w w-bit
additions. (The Scheme code does not reduce the intermediate
quantities, but all arithmetic can be done modulo n -- or even
modulo 2^(2^(r + 1)) <= n.) For Montgomery reduction, which needs
a b = -1 (mod n), we can either negate the result, or compute it
directly with the modified step b' = b (2 + a b).
The above source code is also available at
.
[Update, 2015-02-22: I found ,
which provides algorithms for computing a mod p^w for more general
p and w.]
2014-12-12 Anonymity and harassment
Anonymity lets us challenge dangerous power structures without fear
of retribution. Anonymity lets us have conversations without being
prejudiced -- consciously or unconsciously -- by skin colour, by
masculine or feminine name, by regional accent, by dress.
Anonymity lets us explore ideas we are not comfortable publishing.
Some people also abuse anonymity to harass other people. This
doesn't serve anyone -- it only hurts people and detracts from the
value of anonymity to everyone by giving anonymity a bad name.
Let's give the bad name to harassment, not to anonymity.
https://blog.torproject.org/blog/solidarity-against-online-harassment
2014-11-27 PoC||GTFO
is now an official samizdat
archive of the annals of Proof of Concept or Get That FUD Out.
The current issue of which I am aware will be found at
, and finding
the other issues is left as an exercise for the reader. (No
cheating and using google to find the answer!)
2014-11-03 [redacted]
[redacted]
[Update, 2014-11-06: [redacted]]
2014-04-28 Uniform random floats: How to generate a double-precision
floating-point number in [0, 1] uniformly at random given a
uniform random source of bits
(Permalink: https://mumble.net/~campbell/2014/04/28/uniform-random-float)
The naive approach of generating an integer in {0, 1, ..., 2^k - 1}
for some k and dividing by 2^k, as used by, e.g., Boost.Random and
GCC 4.8's implementation of C++11 uniform_real_distribution, gives a
nonuniform distribution:
- If k is 64, a natural choice for the source of bits, then because
the set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many
subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common
floating-point numbers, outputs in [0, 2^-11) are underrepresented.
[Update, 2016-08-23: I can't figure out what I meant by the above
bias. The probability the floating-point draw lies in [0, 2^-11)
should be 2^-11. This is the same as the probability that the
integer draw lies in {0, 1, ..., 2^53 - 1}, which is 2^53/2^64 =
2^-11. Certainly values below 2^-64 are underrepresented, as
below, but that's a separate issue.]
- If k is 53, in an attempt to avoid nonuniformity due to rounding,
then numbers in (0, 2^-53) and 1 will never be output. These
outputs have very small, but nonnegligible, probability: the Bitcoin
network today randomly guesses solutions every ten minutes to
problems for which random guessing has much, much lower probability
of success, closer to 2^-64.
What is the `uniform' distribution we want, anyway? It is obviously
not the uniform discrete distribution on the finite set of
floating-point numbers in [0, 1] -- that would be silly. For our
`uniform' distribution, we would like to imagine[*] drawing a real
number in [0, 1] uniformly at random, and then choosing the nearest
floating-point number to it.
To formalize this, start with the uniform continuous distribution on
[0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for
any [a, b] contained in [0, 1]. Next, let rho be the default
(round-to-nearest/ties-to-even) rounding map from real numbers to
floating-point numbers, so that, e.g., rho(0.50000000000000001) =
0.5.
Note that the preimage under rho of any floating-point number --
that is, for a floating-point number x, the set of real numbers that
will be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an
interval. Its measure, mu(rho^-1(x)), is the measure of the set of
numbers in [0, 1] that will be rounded to x, and that is precisely
the probability we want for x in our `uniform' distribution.
Instead of drawing from real numbers in [0, 1] uniformly at random,
we can imagine drawing from the space of infinite sequences of bits
uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and
interpreting the result as the fractional part of the binary
expansion of a real number in [0, 1]:
0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ...
Then if we round that number, we'll get a floating-point number with
the desired probability. But we can't just directly compute that
sum one term at a time from left to right: once we get to 53
significant bits, further bits will be rounded away by addition, so
the result will be biased to be `even', i.e. to have zero least
significant bit. And we obviously can't choose uniformly at random
from the space of infinite sequences of bits and round the result.
We can, however, choose finite sequences of bits uniformly at
random, and, except for a detail about ties, after a certain
constant number of bits, no matter what further bits we choose they
won't change what floating-point number we get by rounding. So we
can choose a sufficiently large finite sequence of bits and round
that, knowing that the result of rounding would not change no matter
how many extra bits we add.
The detail about ties is that if we simply choose, say, 1088 bits --
the least multiple of 64 greater than the exponent of the smallest
possible (subnormal) floating-point number, 2^-1074 -- and round it
in the default round-to-nearest/ties-to-even mode, the result would
be biased to be even.
The reason is that the probability of seeing a tie in any finite
sequence of bits is nonzero, but but real ties occur only in sets of
measure zero -- they happen only when *every* remaining bit in the
infinite sequence is also zero. To avoid this, before rounding, we
can simulate a sticky bit by setting the least significant bit of
the finite sequence we choose, in order to prevent rounding what
merely looks approximately like a tie to even.
One little, perhaps counterintuitive, detail remains: the boundary
values, 0 and 1, have half the probability you might naively expect,
because we exclude from consideration the half of the interval
rho^-1(0) below 0 and the half of the interval rho^-1(1) above 1.
Some PRNG libraries provide variations on the naive algorithms that
omit one or both boundary values, in order to allow, e.g., computing
log(x) or log(1 - x). For example, instead of drawing from {0, 1,
..., 2^53 - 1} and dividing by 2^53, some
- divide by 2^53 - 1 instead, to allow both boundary points;
- draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or
- draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the
signed rather than unsigned 53-bit integers) and then add 1/2 before
dividing, to omit both boundary points.
These algorithms still have the biases described above, however.
Given a uniform distribution on [0, 1], you can always turn it into
a uniform distribution on (0, 1) or (0, 1] or [0, 1) by rejection
sampling if 0 or 1 is a problem.
See for
source code.
Thanks to Steve Canon and Alex Barnett for explaining the binary
expansion approach to me and talking the problem over.
[*] For the pedantic who may stumble upon this before reading ahead,
we cannot, of course, assign a meaningful probability to a choice of
real number in [0, 1]. What we can do, however, is imagine that it
were meaningful, and then formalize it in the next sentence with the
correct sense of measure.
2013-10-08 On compression in data formats
(Permalink: https://mumble.net/~campbell/2013/10/08/compression)
If you have a large message which you want to sign and encrypt with
OpenPGP, you might compress it first. Or you might compress it
second. Or, if you're not thinking much about it, you might
`compress' it last, although if that actually reduces the size
there are some cryptographers who would like to have a word with
you.
In any case, the OpenPGP message format lets you do any of these,
because there is a type of OpenPGP packet for compressed data,
whose content is interpreted as another OpenPGP packet. OpenPGP
agents, such as GnuPG, are expected to recursively process packets
they encounter in a message, decrypting ciphertext and verifying
signatures and decompressing compressed data, until they hit a
ground case, usually a literal data packet.
Since messages are built up by starting with a literal data packet
and layering encryption, signature, and compression atop it, this
process should always halt at a ground case, right? Well, no.
Decryption and verification (or, removing a signature) always yield
smaller packets than you began with, so there has to be a ground
case for those, but decompressing usually yields a larger packet
than you began with.
So you might play a cruel trick on your friend by sending a very
small email with a compressed packet that decompresses to a
terabyte of zeros. Or you could send an email with a compressed
packet that decompresses to...itself.
How does that work? The Lempel-Ziv compression language is
powerful enough to write a quine -- that is, a program that prints
its own source code. Rather than repeat the story here, I'll defer
to Russ Cox's article on how to write these:
Russ Cox, `Zip Files All The Way Down', 2010-03-18.
http://research.swtch.com/zip
In the case of OpenPGP, it was particularly easy because OpenPGP
supports a number of compression algorithms including an option
without any CRC, so one can write a program that just spits out an
OpenPGP compression quine without iterating over CRCs or solving a
horrible system of equations.
The result is CVE-2013-4402, and the lesson is that systematic
recursive compression is no good -- not only that it's not useful,
but it is actively harmful. It's especially harmful for programs
that process input sent unsolicited from anywhere on the internet,
namely mail readers.
And OpenPGP isn't the only data format that supports recursive
compression. PGP/MIME, which recursively interleaves MIME entities
and OpenPGP packets, can probably exhibit the same issue, and the
small bound on recursion that GnuPG now imposes while processing
packets will be thwarted by the interleaving. S/MIME 3.1 supports
a compressed data message type, although nobody seems to have
implemented it. I'm sure there are formats outside mail that can
also involve recursive compression. Which ones can you find?
[Update, 2013-10-10: Evidently (and unsurprisingly) this is not the
first public disclosure of a vulnerability based on a compression
quine. IPsec compression of RFC 3173 has the same problem, leading
to CVE-2011-1547.]
[Update, 2013-10-11: Werner Koch points out that your friend can
prevent the terabyte of zeros by supplying the `--max-output'
argument to gpg to limit the size of the output GnuPG produces when
processing a packet.
Those curious to reproduce the exploit can find source code at
to generate the compression quine.]
2013-03-21 Fun with C (or, @!*#%!#^&!^#$^@!%#)
(Permalink: https://mumble.net/~campbell/2013/03/21/fun-with-c)
Consider the following fragment of C code with the type T defined
to be an integer type:
uint64_t x = UINT64_C(0x0123456789abcdef);
T n = 64;
printf("%016"PRIx64"\n", (x & ~(n - 1)));
Say we're on amd64, with
sizeof(char) = 1,
sizeof(short) = 2,
sizeof(int) = 4, and
sizeof(long) = 8,
and two's-complement arithmetic -- nothing exotic or pathological
here.
Without running the code, can you guess what it prints for each of
the following definitions of the the type T?
typedef uint8_t T; typedef int8_t T;
typedef uint16_t T; typedef int16_t T;
typedef uint32_t T; typedef int32_t T;
typedef uint64_t T; typedef int64_t T;
Hint: It's not 0123456789abcdc0 in every case.
2012-10-31 Moving mathematics forward
I have made a tremendous breakthrough tonight in the greatest open
problem of all mathematics, the Riemann hypothesis, by finding an
example of the elusive Riemann zeta pumpkin. Based on my extensive
research into the literature on the subject[*], I believe I am the
first to have come across such a beast. Unfortunately, I don't
know how to talk to the camera with which I catalogued my evidence
of this critter, so proof by photography will have to wait until I
figure out how to talk to a USB PTP or MTP device in NetBSD.
[*] 30 seconds of browsing Google image search results.
[Update, 2012-11-02: I tried gphoto2 and libmtp, but neither
worked. gphoto2 can identify the device (`gphoto2 -a'), but
everything else fails with a `bad parameter' error or something.
libmtp can't even seem to identify that there is a device. ktrace
just shows that ioctl(USB_DO_REQUEST) fails with EIO or something;
I think I need to dig into the USB layer to find out what's wrong.
I'll add more details when I have physical access to the device
again.]
[Update, 2012-12-02: Here's a photograph:
https://mumble.net/~campbell/images/riemann-zeta-pumpkin.jpg
I'm afraid I cheated, though.]
2012-10-17 What's an epigraph?
An anonymous interlocutor asked me to define the word `epigraph'
some months ago.
An epigraph's a witty pith,
or perhaps a pithy wit,
That you could start a chapter with
to prime the reader's mind a bit.
2012-10-04
My verse may be worse but your prose hurts my nose.
(dribblings from a late night when bad poetry was coming out like a
fit of sneezes)
2012-08-24 What's wrong with Python
(Permalink: https://mumble.net/~campbell/2012/08/24/wrong-python)
I just found this Python code in the wild:
f = open(pathname)
try:
x = foo(f)
finally:
f.close()
process(x)
This is wrong at so many levels I don't know what to say. And it
wouldn't be fair to blame whoever was responsible for putting the
code in this state, because this is very nearly the right way to
write it in Python, and Python will happily accept this in spite of
its insensibility.
2012-05-23 Energy and computation
Critics have noted that the premise in _The Matrix_ of using the
heat of human metabolism as an energy source, the energy premise,
is hopelessly contrived and detracts from the film. And, indeed,
it is hopelessly contrived, but this doesn't bother me because I
have a much better, more plausible premise to replace it -- and a
reason why they would never have used the better premise in the
film.
Instead of the energy premise, I take a computational premise: what
the machines exploit the humans for is not energy but computation
-- human brains are very good at learning quickly and optimizing
approximate solutions to NP-hard problems. That way, the machines
don't need to reverse-engineer the human brain or develop (`learn')
approximation algorithms of their own.
The computational premise, though, doesn't lend itself to a cute
image of Morpheus holding up a D-cell and saying `...turning humans
into this.' The energy premise is easy to explain, if contrived;
whereas it would be a hopeless cinematic blunder to attempt to
explain the computational premise to a general audience in
Morpheus's brief monologue to Neo. Thus, I am satisfied with a
better premise for the film, and forgive the creators for not using
it.
2012-05-21
2012-05-17
Metre matters more to me than rhyme.
Teeter tatters tore to tea, then thyme.
...wait, does that sound rhyte?
2012-05-13T07:34:55Z
Is a poem like a joke in that if the meaning must be explained,
then the value is lost?
2012-04-24 Mail
Mail is hosed. Unreliable queues, mail stores without integrity
checks or sensible indexing, broken flags, hopelessly confused MUA
designs, and more... Argh!
2012-03-15T13:25:46Z Random
`Fix random kernel memory corruption by algo_doublehash(). And by
"random" I don't mean just "arbitrary" as in using an uninitialized
pointer, but random as in corrupting the contents of memory addresses
chosen using a crypto-strength random number generator.'
2012-03-15T06:23:47Z Beware the IDEs of March
Beware the IDEs of March, my son --
The tools that bite, the builds that botch.
Beware Eclipse, Emacs, and shun
The kludgious Binfordsnotch!
...what?
2012-01-22 Laptop
I have been asked a few times what laptop I chose to buy last year.
The answer is that I didn't. The laptops I was most strongly
considering were the Thinkpad W510 and the Sony Vaio Z. But I got
fed up with shopping; instead I got a new disk for an old
MacBook1,1 that I had lying around, and fixed some bugs in NetBSD
to make it run on that.
2011-12-19 The FM Principle
It often amazes me to marvel at just how much of computing is
founded upon the FM principle.
Woodpeckers.
2011-10-29
Antediluvian barmy curmudgeons!
Desolate ergatives fuming their grudgeons!
Heckling isthmus of jocular kerosene!
Lepidopteric morose nitrotoluene!
Peripatetic Orphean qwertigial
Runcible syrup of toad urns vestigial!
[If you comment on this article, please be sure to mention bananas.]
Winkle-picked xanthines, ytterbium seder of
Zoölogues, zits, Zapatistas, &c.!
2011-08-{29,30} paredit patches
There's a pile of patches potentially for paredit 23 at
.
Try them out and let me know what they break, or, in the unlikely
event that they don't break your whole editing feel, whether you
like them, and what cases you especially like in them. That way, I
can write down the cases so they won't be broken again in the
future.
2011-08-29 Crypto library
I am thinking of writing a crypto library to complement djb's
well-known one by providing some bits he omitted. I will call it
1-[5-(1,3-benzodioxol-5-yl)-1-oxo-2,4-pentadienyl]piperidine.
2011-05-09 FOCUS THEFT SHOULD BE A FELONY
!@*#^$@!&*#^@%!*#@$^%!#&*@!%
2011-03-32 New job, project status
I have accepted a position at Microsoft in the finance department,
working on in-house tools for optimizing Microsoft's tax structure.
I am moving to Redmond; I fly out tonight. Anyone in the Seattle
area know any excellent seafood restaurants?
My new manager has asked me to port paredit to Visual Studio so
that we can use it in-house, and since I have only limited time for
personal projects, I shall be dropping support for paredit.el in
GNU Emacs (and paredit.scm in Edwin).
Today I completed and committed a long-running project to switch
MIT Scheme to use dynamic scope by default, in preparation for the
upcoming Scheme standard produced by the working groups; and I also
committed my (incomplete) patches to change Edwin's default key
bindings to match modern vi conventions.
2011-02-24 Paredit on old Emacs
Does anyone still use paredit on XEmacs, or on GNU Emacs 21?
2011-01-17 Laptop
I am looking to buy a new laptop. But I am picky. At
I wrote down a lot of
constraints. Can anyone help me to satisfy them?
2011-01-14 PGP
I'm pretty frustrated with GnuPG.
Straightforward operations (decrypt a message -- no, really, just
*decrypt* it) are not straightforward to perform with the gpg
utility. There are half a bazillion knobs controlling gpg's
behaviour so that straightforward command lines do not have
straightforward semantics. And half the operations can be done
only interactively -- by which I mean not just that it prints
prompts, but that it prints them to /dev/tty and reads input from
/dev/tty.
It's not obvious what the gpg utility is actually doing whenever I
use it, especially when it prompts me for a pass phrase to decrypt
one of my keys to make a signature without telling me what it's
signing. Just about everything keeps state in several different
files in one's GnuPG home directory; it's not clear what operations
are actually stateless.
There is no good introduction to the meaning and use of the trust
model, or any coherent or interactive picture for it. The output
of `gpg --list-keys', `gpg --list-sigs', `gpg --list-packets', and
so on, is confusing on multiple levels.
Here are three things I'd like to see in an OpenPGP implementation:
1. A collection of utilities to perform straightforward operations
straightforwardly. Examples: a utility to generate an RSA key
pair; a utility to decrypt a packet given the Elgamal private
key; a utility to make each particular type of signature; &c.
2. A packet inspector. This could be graphical (maybe Gtk+), or it
could be embedded into Emacs. It would replace `gpg
--list-packets', and allow the user to interactively explore the
packets, see all the information stored therein, decrypt
messages, verify signatures, and so on. Think Wireshark for
OpenPGP.
3. An interactive truth maintenance system. Fundamentally, a PKI
is a system for distributing and interpreting signed assertions,
such as `J. Random Luser is the owner of key 1234abcd', signed
with key 1234abcd. An OpenPGP key ring is a pile of public keys
and signed assertions. The cryptographic signatures make the
assertions as plausible as they would be if made in person, but
we need a computer to handle the crypto and draw plausible
conclusions from the premises we specify.
But we may find some of the conclusions suspect; in that case,
we want to ask why the computer drew those conclusions, and what
premises we must reject if we want to reject the conclusions.
Or we may decide that we're not happy with some of the premises,
and we'd like to see what conclusions lie on shaky grounds --
we'd like to see what conclusions we give up if we reject the
premises. These are exactly the functions of an interactive
truth maintenance system.
The GnuPG trust model is essentially a set of inference rules,
which could be used with such a truth maintenance system. But
gpg provides no tool to explore the `web of trust' that its
trust model induces; it just has an implementation of the engine
to infer conclusions given some premises.
Perhaps some day I'll actually write some of these, or find that
somebody already has.
P.S. It's hard to google anything about PGP. Google turns up
`----BEGIN PGP SIGNED MESSAGE-----' a lot more than it turns up
useful results.
P.P.S. I have glanced at netpgp, a relatively new implementation
of OpenPGP based on openpgpsdk that will probably be part of NetBSD
6.0. Its behaviour does not appear substantially different from
that of GnuPG, though, other than that netpgp is less polished so
far.
2010-11-04 Gtk+
In a Gtk+ program I use regularly, while entering text, I often
accidentally hit C-w out of habit from Unix to kill the preceding
word. Instead of killing the word, this closes the window, to my
interminable frustration. It's not as bad as C-M-DEL zapping X
when I want to kill the preceding S-expression in Emacs, but it is
frustrating enough that I sought to fix this.
It turns out that if you put `gtk-can-change-accels = 1' in your
~/.gtkrc-2.0, hover over the menu item to which the key is bound,
and hit DEL, then the key binding goes away.
Is this documented? Yes! Well, no. Well, sorta kinda, maybe?
Actually, not really. When I googled `"gtk-can-change-accels"',
the first hit I got was from gtk.org, and appears to be the only
actual documentation relevant to the subject. All the other hits
were from mailing list archives, forum posts, blug posts, and FAQs
from various applications, saying that one can use the option to do
such and such.
So what does this actual documentation say? Well, first of all,
it's API documentation, not remotely fit for user consumption. And
here's all that it has to say about the option: `Whether menu
accelerators can be changed by pressing a key over the menu item.'
Does it mention hitting DEL? No. Does it say how to quote DEL, in
order to bind it to an action? No, of course not! Does it mention
~/.gtkrc-2.0? No.
In fact...is there any documentation at all for ~/.gtkrc-2.0? No!
Not even API documentation! You just have to know about it!
Googling it just turned up more mailing list archives, forum posts,
blorg posts, and FAQs with tips about how to do such and such by
frotzing your ~/.gtkrc-2.0 file...
2010-10-09 Paredit 22
Paredit 22 is released:
Not all the patches got incorporated into it. Patches potentially
for paredit 23 are in
.
2010-09-20 Paredit 22 beta
I have merged the aforementioned patches into a new paredit 22
beta, at as
usual; lists the
changes as the output of `darcs changes' on the Darcs repository at
.
Nobody has opined about a paredit mailing list or offered to
administer one for me, so I presume there's no interest.
2010-01-01 Paredit patches, still
Not enough users have given me feedback on the patches to paredit in
. Thus, paredit 22 has
not yet been released. See the (chronologically) previous (but
textually next) entry. [Update: Since the time of writing, the
aforementioned entry has self-destructed.]
Should there be a mailing list for paredit? Contact me if you have
an opinion on this question, or would like to offer to administer
such a mailing list.
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme
Suppose you want to release a resource after a part of a program
that uses the resource. For example, consider a connection to a
database that should be disconnected, a file descriptor that should
be closed, or a buffer for sensitive data such as a pass phrase that
should be destroyed by zeros or random data. Once the resource is
released, it can be neither reused nor readily reconstituted.
Is the resource represented in Scheme by a single object? If so,
ensure that with the object is registered a finalizer that releases
the resource, so that even in the face of an interactive ^C at the
REPL the resource will be released. All of the above examples
qualify.
(define (open-database-connection database-descriptor)
(let ((handle (allocate-database-handle)))
(register-finalizer handle %close-database-connection)
;; After this point, if we are interrupted -- say with ^C at
;; the REPL --, somewhere the handle will be recorded so that
;; its resources can be released even if the usual logic to
;; close the connection is never reached.
(%open-database-connection database-descriptor handle)
handle))
Is it /necessary/ for the resource to be released immediately after
the part of the program that uses it? If so, use REWIND-PROTECT at
. In this
case, control can never re-enter the extent that uses the resource.
Of the above examples, only a buffer for sensitive data qualifies.
(define (call-with-pass-phrase prompt receiver)
(let ((buffer (make-pass-phrase-buffer)))
(rewind-protect
(lambda ()
(prompt-for-pass-phrase prompt buffer)
(receiver buffer))
(lambda ()
(clear-pass-phrase-buffer buffer)))))
Is it /desirable/ for the resource to be released immediately after
the part of the program that uses it? If so, use UNWIND-PROTECT at
. Of the
above examples, both the database connection and file descriptor
qualify.
(define (call-with-database-connection database-descriptor
procedure)
(let ((handle (open-database-connection database-descriptor)))
(unwind-protect (lambda () (procedure handle))
(lambda ()
;; If a continuation is captured inside PROCEDURE so that
;; this cannot execute immediately when PROCEDURE returns,
;; then closing the database connection here is redundant,
;; since registered with HANDLE is a finalizer that will
;; close the connection. However, provided that the
;; finalizer have no effect if the connection is already
;; closed, this redundancy is harmless and enables
;; UNWIND-PROTECT to close the connection immediately if
;; no such continuation has been captured.
(close-database-connection handle)))))
Rationale: Scheme's CWCC (or composable continuations) enables
powerful control abstractions such as inverting control. For
example, (FOR-EACH ) applies to each
element of in sequence. With CWCC, one can reify the
iteration of the list by FOR-EACH into a first-class cursor object
that can be advanced independently. This way, for example, we can
traverse multiple lists in parallel using only the one-list FOR-EACH
procedure.
If we rather consider a procedure (FOR-EACH-QUERY-RESULT ), the situation is not much different, except that
FOR-EACH-QUERY-RESULT may entail the opening of a connection to a
database, and the connection is a resource which should be released
after the query results are finished being received. Using CWCC to
invert control of FOR-EACH-QUERY-RESULT is just as valid as using
CWCC to invert control of FOR-EACH, but if FOR-EACH-QUERY-RESULT
uses REWIND-PROTECT (a common idiom in unfortunately much Scheme
code), it inhibits control inversion. UNWIND-PROTECT allows the
control inversion.
Why involve finalization in UNWIND-PROTECT, rather than just
associating a finalizer with the object representing the connection
to the database? The difference is in the intent implied by the use
of UNWIND-PROTECT that if control cannot re-enter the protected
extent, the protector can be immediately applied when control exits
the protected extent extent even if no garbage collection is about
to run finalizers.
Note that one cannot in general reconcile
(1) needing to release the resource immediately after control exits
the extent that uses it, and
(2) enjoying the benefits of powerful control abstractions such as
inversion of control.
However, if (1) is relaxed to
(1') needing to release the resource immediately after control
/returns normally from/ the extent that uses it,
then one can reconcile (1') and (2) by writing
(CALL-WITH-VALUES (LAMBDA () )
(LAMBDA RESULTS
(APPLY VALUES RESULTS))), or simply
(BEGIN0 )
using the common name BEGIN0 for this idiom. (In Common Lisp, this
is called PROG1.)
Also note that condition systems are not involved; they are a red
herring. Any condition handler that releases a resource should do
so only when the condition implies that the resource is no longer
usable; for example, if a database connection becomes desynchronized
somehow and cannot proceed, then it is reasonable for the handler
for this condition to close the connection and destroy any of its
state. However, when handling conditions that do not imply that the
resource is no longer usable, the resource should not be released.
An enclosing handler may continue within a protected extent, or even
exit and re-enter the protected extent, for instance while iterating
over a reified database cursor procured by control inversion.
Handling any condition by releasing the resource defeats these
useful operations.
At is an
implementation of UNWIND-PROTECT for MIT Scheme. This definition is
lighter-weight than the portable one. However, it does not attempt
to run the protector immediately if it can.
Appendix A: John Cowan has asked for elaboration on the inversion of
control for iteration with FOR-EACH.
(define (list->stream list)
(iteration-procedure->stream
(lambda (receiver)
(for-each receiver list))))
(define (iteration-procedure->stream iteration-procedure)
(lazy
(call-with-current-continuation
(lambda (return-stream)
(iteration-procedure
(lambda (element)
(call-with-current-continuation
(lambda (return-to-iteration)
(return-stream
(stream-cons element
(lazy
(call-with-current-continuation
(lambda (return-cdr)
(set! return-stream return-cdr)
(return-to-iteration))))))))))
(return-stream stream-nil)))))
;;; Simpler definition with reset & shift.
(define (iteration-procedure->stream iteration-procedure)
(lazy
(reset
(begin
(iteration-procedure
(lambda (element)
(shift continue-iteration
(stream-cons element (lazy (continue-iteration))))))
stream-nil))))
Appendix B, discussion question: Should UNWIND-PROTECT evaluate
protection forms in the dynamic context of the call to
UNWIND-PROTECT? This rules out using UNWIND-PROTECT within
REWIND-PROTECT, which may be an onerous requirement.
2009-03-21 On file systems, Unix, and brain damage [updated]
Earlier this month, users of the alpha version of Ubuntu, the first
to support the ext4 file system by default, reported an immense
increase in data loss after crashes leading to corruption of their
desktop environments. After a crash, many files that were recently
written were truncated on disk to zero bytes.[1]
How did this happen?
To atomically replace a file's old data with new data, applications
write the new data to a file linked at a temporary pathname, and then
rename the link to the permanent pathname, overwriting the link to
the file with the old data. This sequence of operations, with the
atomicity guarantee of the POSIX rename() function, guarantees that
at any time, at the permanent pathname will be a link either to a
file containing the old data or to the file containing the new data.
This guarantee is made, at least, during normal system operation.
POSIX guarantees nothing when the system crashes or the power fails.
If the system is even bootable after such an event, POSIX permits it
to have replaced the contents of the file system with a collection of
cat macros. But while this behaviour is permitted, it is not useful.
System crashes and power failures are facts of life, and a well-
engineered file system should limit, not exacerbate, their damage to
user data.
In the name of performance, the ext4 file system as it was bundled
with the alpha version of Ubuntu would delay the allocation of blocks
on the disk for a file so that they can be allocated contiguously
together when needed. Periodically the system forces allocation of
blocks and commits them to the disk; by default, this happens on
sixty-second intervals. But the ext4 file system would sometimes
commit renames to the journal before allocating blocks for the new
data, during a sixty-second window. If the system crashes, and the
journal is replayed, then the rename may be applied -- before the new
data are written to the disk. This effectively reorders the sequence
that many applications rely on to atomically replace files,
destroying its atomicity.
The developers of the ext4 file system answered with several
`solutions': [2], [3]
(1) make your systems more reliable with UPSs and by avoiding buggy
proprietary drivers that cause such frequent crashes, which the
developers of the ext4 file system don't encounter;
(2) use SQLite and other databases instead of writing files to disk;
(3) change all the applications that do write files to disk so that
they fsync before closing and renaming; and if that becomes a
performance problem then
(4) run fsync periodically in a separate thread.
Solution (1) is no excuse for a reliable file system, and is a
dangerous sentiment to be held by the developer of a file system
likely to find widespread use in personal computers running Ubuntu,
computers which may not have the resources available in data centres
for reliability. This suggests that the developers of the ext4 file
system have never seriously stress-tested it with forced power cycles
during heavy I/O loads!
About solution (2), and the general feeling of `we like databases!',
I can only refer to jwz's rant about mbox summary files. Technical
merit aside, solutions (2) and (3) are impractical because there are
simply too many programs to change.
Solutions (3) and (4) are disturbingly misguided. Even if it were
practical to change every application's write/rename sequence into a
write/fsync/rename sequence, to do so would be semantically wrong.
Only programs that require a file's data to be written to permanent
storage before proceeding should use fsync; this is what fsync does.
For example, editors such as Emacs fsync files before informing users
that the files are written to disk; users are unhappy if the files
that their editors told them were written to disk were not. Mail
receiving agents fsync files where they store mail before telling the
sending agents that the mail has been accepted; users are unhappy if
they are told that mail was delivered when it was not.
Programs that require only that the replacement operation be atomic,
and that it not leave a file in a corrupted intermediate state,
should not use fsync, because the purpose of fsync is to provide a
guarantee which these programs do not need. For example, the KDE
window configuration is not critical user data, but may be stored on
the file system. It is OK for the machine to forget that a user has
dragged a window a few pixels, but it is not OK for KDE to read a
corrupted intermediate state of the file between the old and new
versions -- hence atomic write/rename replacement without fsync.
Extraneous fsyncs, in fact, degrade performance, especially for small
files that are frequently overwritten -- it gives the file system no
opportunity to keep the intermediate versions solely in memory.
Calling fsync periodically in a separate thread is nonsense -- if a
program cannot proceed until a file's data are on permanent storage,
it should call fsync and wait; if a program can proceed, then it
shouldn't call fsync. The operating system already periodically
synchronizes the file system with permanent storage -- at least, one
would hope so; if it doesn't do this, then it has a bug.
Actually, fsync does not guarantee that a file's data are written to
permanent storage before proceeding. POSIX (or at least the SUSv3,
)
claims that it shall:
The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes. The nature of
the transfer is implementation-defined. The fsync() function shall
not return until the system has completed that action or until an
error is detected.
Many man pages echo this sentiment. For example, NetBSD's fsync(2)
man page begins with:
fsync() causes all modified data and attributes of fd to be moved
to a permanent storage device.
But on AIX, Darwin, Linux, NetBSD, and probably other Unices as well,
fsync does not actually guarantee that the file's data are written to
permanent storage. Instead it only nudges the file system to force
some blocks associated with the file slightly sooner. If the file
system is backed by a hard disk device that has a write cache, the
blocks might just end up in the write cache -- which is volatile, and
which may evaporate in the case of a power failure.
Common protocols by which operating systems communicates with storage
devices have commands to request that the storage devices force their
write caches to permanent physical media. Some devices ignore these
commands, in which case one cannot win, but the commands can be sent
nevertheless, and this is the best effort that the operating system
can make. And AIX, Darwin, Linux, and NetBSD all provide ways for
programs to request this -- different ways, that is.
AIX and NetBSD provide a system call fsync_range, which has a flag
FDISKSYNC (NetBSD) or O_DSYNC (AIX) that requests any blocks
associated with the file to be forced to permanent physical media.
Linux provides a fancier but similar system call sync_file_range.
Darwin provides an fcntl command F_FULLFSYNC. These approaches are
all slightly different, and for file descriptors not backed by
permanent storage devices they all report different errors. So just
to force a file's data to disk reasonably reliably on at least four
different Unices and to ignore the cases of file descriptors not
backed by permanent storage, it takes a good seventy lines of code
littered with cpp conditionals, together with autoconf or
autoconf-like detection of the relevant system calls.
Ugh!
Fortunately for the hapless users of Ubuntu, the developers of the
ext4 file system have patched it so that renaming a link to a file
that overwrites a link to another file forces any pending data for
the first file to be sent to the disk before the directory structure
is updated to reflect the rename. This may not guarantee that the
data are actually, really, truly on permanent physical media, but
it's better than waiting sixty seconds before they are even delivered
to the device that will eventually write them to permanent physical
media.
But this kerfuffle has not inspired confidence in me about the ext4
file system. In [3], the developers of the ext4 file system make
such absurd suggestions as making a laptop mode in which the kernel
ignores fsync, so that even those applications that use fsync
semantically correctly will not have the very guarantee that is its
sole reason for existence! When did performance become a concern
overriding robustness in file systems?
[1] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781
[2] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[3] http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[Update, 2009-03-26: I have been pointed to Linus Torvalds' ideas
on the subject, archived at . I
hope that the ext4 developers take heed of his sentiment.]
[Update, 2012-01-22: All file systems are hosed. Some are just more
obviously hosed than others. Argh.]
2009-03-21 Correction to the optimization of resource release in
control brackets
Due to an unstoppable deluge of the concern of readers after the
2009-02-04 correction, I am obliged to correct the optimization in
the original 2006-05-01 entry.
The correct optimization is to associate a flag with each wind point
which is by default clear and which will be set by CWCC or by any
procedures that reify composable continuations. (This is not an
onerous requirement: invocation of the reified continuations must run
in time linearly proportional to the depth of the wind point tree
anyway.) If the flag is clear when the after thunk is called, then
the finalizer can be run immediately.
2009-02-04 Correction to control brackets and resource release
It has come to my attention that I identified the wrong criterion
for when protector thunks established by UNWIND-PROTECT may be
called. The criterion I stated is that control may never again
return from the protected thunk. The correct criterion is that
control may never again re-enter the protected thunk. (Consider,
for example, a coroutine that does not terminate and return.)
The correct definition for *UNWIND-PROTECT in the appendix is:
(define (*unwind-protect thunk protector)
(let ((protector-cell (make-protector-cell protector)))
(dynamic-wind
(lambda ()
(identity-procedure protector-cell))
thunk
(lambda ()
unspecific))))
In Scheme systems that provide an REGISTER-FINALIZER procedure,
which requests the garbage collector to apply a procedure when a
given object's storage is reclaimed, one might write:
(define (*unwind-protect thunk protector)
(let ((object (cons 0 0)))
(register-finalizer object protector)
(dynamic-wind
(lambda ()
...refer to OBJECT to prevent reclamation of its storage...)
thunk
values)))
2009-01-04
I forgot about the 2007-mm-dd entry. I'm not sorry.
2007-mm-dd On macros and expression in Scheme
[Placeholder for a long-promised entry which I haven't yet forgotten
about.]
2007-08-23 On the Theatrical Security Administration's cleverness
During my flight to Seattle one month ago, my case containing a
guitar was `randomly' selected to be searched, and the TSA left a
gift for me inside: an Orwellian notice about protecting me and my
fellow passengers by randomly violating our constitutional rights
(which might, they add, involve accidentally damaging the bags, with
no liability on the TSA). Flying back to Boston, I left a prominent
copy of the Fourth Amendment to the US Constitution in the same case
-- specifically, in the body of the case. When I opened up the case
this morning, I found no notice from the TSA, but the slip of paper
on which was written the Fourth Amendment was tucked into the very
top of the case, where the headstock of the guitar goes. The paper
does not fit through the neck of the case; it could not have slid
there by its own volition, even when tossed by the roughest of
baggage handlers.
I'll shortly discover whether the TSA's X-ray machine affects ASA
400 film, too. The TSA goon who demanded to put my camera and its
film through that machine did let pass an unintentionally helpful
hint, though: to avoid damaging any film, just tell the TSA goons
that its speed is ASA 1000!
2007-07-16 On debugging in Scheme [updated 2046-03-32]
The two Scheme systems that I use primarily these days are MIT
Scheme and Scheme48
. Both have decent debuggers that work for most
cases. Both have facilities for limiting output to avoid excessive
output for deeply nested data structures, or infinite recursion in
the printer for circular data structures. Both have rudimentary but
working mechanisms for examining pretty much all data in the system.
(I can make objects that Scheme48 and MIT Scheme fail to inspect,
but most people can't, and this is an uncommon operation.)
Let me make this clear right now: these are rudimentary tools. They
have some minor niceties (for instance, the usual mode of operation
in MIT Scheme's debugger is to have a REPL within every stack
frame), but these minor niceties are not especially profound.
Rudimentary as these tools are, though, they *work*, and they
suffice to examine and debug any reasonable program.
And MIT Scheme and Scheme48 are nearly unique in the respect of
having general, if rudimentary, debugging tools. This baffles me.
This flummoxes me. This flabbergasts me. (No, this is not
something new; this is something that I have been aware of for a
long time, and privately fumed about.)
Today I wanted to debug a program in Gambit. Gambit, too, has a
rudimentary debugger. Yet it completely fails in two prominent
ways:
(1) There is no limitation to the output at the REPL. (The values
of local variables on stack frames are printed truncated, but that
doesn't help if I want to inspect them more thoroughly.) Although
it is possible, with a bizarre incantation about *read* tables
(boggle) to protect the *writer* against circular output, there is
no apparent provision for limiting the depth of the output, and the
default is for the writer and pretty-printer to lose horribly with
infinite recursion until you hit C-c. And the default format for
printing records involves recursively printing all of the records'
components. Graphs? You lose.
(2) There is no object inspector by which to interactively inspect
the components of objects without having to grovel through megabytes
of output from an infinitely recurring writer. No, there is not
even a way to inspect the local variables of a closure found on the
stack. In fact, I couldn't even find a way to look at the condition
that was signalled.
And Gambit is better than most other Schemes.
Chicken will print a spare trace of call frames that were on the
stack. That is all. It does not even show the nesting of the
frames; garbage frames that have already returned, but are retained
on the stack because of Chicken's architecture, are shown along with
everything else. And the stack is traced *after* the error message
is printed, so if the error message printed a complex object, what
the poor, deprived programmer will see as a trace of the call frames
on the stack is procedures used to print the error message --
information that is completely irrelevant to the actual problem.
I wrote an object inspector for Chicken (the `dissector'), but it's
useless because (1) records no longer contain any introspective
information, and (2) it is impossible to get at the objects in call
frames on the stack.
DrScheme will give a helpful display of a flurry of arrows covering
the whole source file in question, so that programmers can see by
what flow of control they arrived at the error. Flow of data?
What's that?
If you plan to create a new implementation of Scheme which you mean
to be useful to other humans, *please* implement a general debugger
and object inspector. The state of debugging tools in Scheme is
absolutely appalling and utterly inexcusible.
(I'm told that Chez has a general, if rudimentary, debugger as well;
because Chez is commercial and proprietary, however, and the free
software Schemes out there meet my needs far better than Chez does
anyway, I don't have a licence for Chez with which to experiment
with its debugger.)
2007-07-15 On the public domain, licence terms for software, and
practical legal choices
I am not a lawyer. This is not legal advice. I have not pursued
legal studies, and have no background in civil litigation, copyright
law, or contract law. (The terminology that I employ in this
article will without a doubt induce any lawyer to cringe.) For that
matter, I'd rather devote my time to hacking than to burying my head
in legal studies for decades: Scheme is a much nicer language for
expressing ideas concisely and clearly than English, but
unfortunately our laws are expressed in English.
It is precisely because I am more interested in hacking than legal
analysis, in fact, and because I should prefer never to worry about
embroilment in courts of law as a consequence of careless decisions,
that I write now.
For the past couple of years, I have been releasing my publicly
available software into the public domain. I ought to be precise,
however: I have been releasing said software with a message claiming
that it is in the public domain. I ignored remarks equivocating the
notion that any works can be released into the public domain, my
reason being that it was simply absurd for an author to be unable to
forswear any and all claims of copyright: surely, if an author has
no intention of claiming copyright, she ought to be able to
relinquish the right of doing so. And the simplest status for works
lacking copyright claims is release into the public domain.
Not only did I place message stating that the code was in the public
domain, but I also added a disclaimer of all warranties -- a rather
inept and negligent disclaimer, which read `All warranties are
disclaimed.'. My reason for this was that I preferred to avoid
cluttering all of my files with long, uppercase, distracting
disclaimers of all warranties that some lawyer at Berkeley could
think of and enumerate; and that I preferred also to avoid requiring
the distribution of a separate LICENCE file with every other file I
released, purely for the sake of laziness.
What I failed to consider, though, was the actual implications of
these messages. Consequently, on realizing this, I briefly
considered the following line of reasoning, after encountering
another suggestion that works may not clearly be released into the
public domain:
1. What matters in the preamble to the software is what terms a
court of law will accept and base decisions upon.
2. If a court of law will not accept a statement that the software
is released into the public domain, there is no reason for it
to prefer the acceptance of a statement of licence terms: one
could imagine two copies of a file, one with the terms and one
without, and of neither one could anyone claim authority.
My second point was faulty, though, because it questioned only
witness credibility, which is an unrelated issue; and I omitted a
third point, on contract law, warranty, and gifts. I honestly don't
know whether a court of law will accept the preamble to a source
file, but my guess is that it probably would. The more important
question is whether a user (perhaps some corporate entity -- whether
good or evil, according to your favourite definition of that
spectrum --, or some governmental entity, or anyone) can have
confidence in the terms set by the author.
There are two aspects of the software in which users may wish to
have confidence: (1) the users want to be confident that the
copyright and licence terms are fixed, and that the author will
never pursue or even threaten litigation for violations of licence;
and (2) the users may also want the power to pursue the author for
damages if the author's software fails at the cost of the users.
Contrariwise, the authors of free software (except perhaps djb)
prefer to disclaim themselves of any such warranties.
[This entry is not yet finished. Sorry.]
Summary: The public domain is murky, even if it may seem to make
common sense. I prefer the modified (3-clause) BSD licence for
similar terms and greater protection, or at least a clearer idea of
the protection for all parties involved. It is also fairly concise.
Furthermore, it has been examined by lawyers. [I don't know whether
it has been tested in court, and don't have the time or inclination
to research this in greater detail. Any references to cases
involving the 3-clause BSD licence would be welcome, though.]
I repeat, though, that I am not a lawyer, and that this is not legal
advice.
If the terms of the modified BSD licence are not to your taste, John
Cowan has a utility for choosing another one by some simple
criteria: .
2007-07-14 Summarizing several issues with compilation and
interpretation
I ran out of steam writing last month about compilation and
interpretation, so here I'll try to summarize as succinctly as I can
a few points about which a disturbing number of people are confused.
Look to the previous entry for definitions of `interpreter' and
`compiler'.
1. Most implementations of Lisps are not pure interpreters; most are
based on compilers in some way or another. Call them
implementations, not interpreters or compilers: they may *have*
interpreters or compilers, but they are not themselves
interpreters or compilers.
2. A `compiler' is not a tool chain that turns source code into
stand-alone executables. A compiler is a tool that analyzes the
meaning of programs and generates programs in another language
with equivalent meaning.
3. Your physical machine is an interpreter. The universe is a kind
of interpreter as well.
4. No compiled program is `stand-alone'. Not only do compiled
programs still require interpreters to execute them, but they
also must be structured in the appropriate format for the
intended operating system under which they will run, and they
generally require support from some run-time libraries. It just
so happens that most computers that most people are interested in
developing programs for have libc built-in. It also happens that
most computers don't already have the run-time libraries needed
by Lisp programs.
2007-06-19 Concerning compilation and interpretation, and an
interpretation of the compilation of confusion surrounding their
meaning and execution
I often find myself in an unpleasant proximity to a bickering
arglebargle about whether compiled languages or interpreted
languages are better; whether Java is compiled or interpreted;
whether Sun's Java implementation (or others) is a compiler or an
interpreter; whether Lisp can conceivably be compiled, if it has
EVAL built-in; how Lisp systems like SBCL can be compilers if they
can't spit out files that someone's favourite operating system can
execute (which is, in fact, not really true); and other matters of
confused frivolity concerning compilation and interpretation.
Sometimes I even find a participant or two in the discussion who
tries to allay the heat by suggesting that the distinction between
compilation and interpretation is murky at best, citing examples
like Java implementations that involve both Java to byte-code
compilers and byte-code interpreters, or modern x86 implementations,
which translate the x86 instruction set every time the i-cache is
refilled into an internal more RISC-like format to execute.
Sometimes I even do this myself, if I am in a hurry and lack the
time to explain the subject more precisely.
Let us commence with definitions:
/Compilation/ is the analysis of a program's meaning, rendering
the same semantics into another language. A /compiler/ is a
total function from programs to meanings.
/Interpretation/ is the execution of a program given the state of
the machine, either diverging or yielding a final answer. An
/interpreter/ is a partial function from programs and machine
states (or simply machine states, if the program is a part of
the machine state) to answers.
(Programming language theoreticians in the audience will recognize
that compilation corresponds with denotational semantics and that
interpretation corresponds with operational semantics. I hope,
however, that such members of the audience are already adequately
familiar with the concepts that they need read no further. I hope.)
A usable language implementation that is based on a compiler also
requires an interpreter. When most people hear about `compilers',
they are hearing about compilers that map programs in languages like
C to machine programs, i.e. programs in languages for the
instruction sets implemented by physical processors such as the x86,
the PowerPC, the SPARC, the PDP-10, and so on.
[This entry is not yet finished.]
2007-05-13 SLIME48
(Apologies: the post prognosticated by the previous preparation has
been permanently postponed.)
After several long months of version incompatibility, there is a new
version of SLIME48 available that will work with CVS SLIME as of
2007-05-13. A tarball is available here:
.
Bug reports are welcome. Further information is at
.
2007-05-09 In preparation for a clarification of macros in Scheme,
and their implications on style
This blag has remained dormant for a very long time. I must deeply
apologize to my avid readers who no doubt were left in a state of
prolonged mourning for the stagnancy of their only worthwhile source
of meaning in their lives. In a short while, however, no longer
will you poor souls remain cold, alone, and bereft of your font of
Schemerical wisdom, for I shall soon bring flames upon the now
cowering subject of macros in Scheme -- from DEFMACRO to
SYNTAX-RULES, and all in between and beyond --, and their
philosophical implications on style and expressiveness of programs!
For you who cannot stand to wait until the weekend (when I hope to
find time for this inflammatory ecriturial extravaganza), I give you
a brief taste, the draft of a Lisp style guide whose composition I
commenced this past weekend:
2006-09-16 On networking interfaces in Scheme
Most Scheme systems have very limited networking interfaces, usually
with the notion that BSD sockets are not nice, and that `high-level'
verbs like `open a TCP connection to this IP address, on this port'
are nice. This is a bad idea, and Scheme systems ought to support
wrappers around the whole BSD sockets API.
The BSD sockets API is really a fairly extensive generalization of
many different flavours of sockets, classifying them according to
types with common operations (stream sockets, datagram sockets,
&c.), across many different networking domains (IPv4 and IPv6 being
the most prevalent today, but of which there are many others, some
of which remain useful, like local Unix sockets). But instead of
taking advantage of this unification set down by BSD sockets, most
Scheme systems *specialize* the interface to provide wrappers for
TCP (i.e. internet stream sockets), and sometimes UDP (i.e internet
datagram sockets), and sometimes local Unix sockets (with one of two
types).
In the first place, this is poor practice simply because of the
amount of work necessary for the same returns. In essence, the
specialized wrappers tear down a layer of abstraction merely by
pretending that it isn't there -- while actually building on top of
it. The result is that different specialized wrappers must be made
for everything intended to be wrapped, and a great deal of code must
be duplicated. There is a great deal of C code out there, used as
the basis of Scheme networking libraries, that duplicates the same
system calls just with different constant arguments (SOCK_STREAM vs
SOCK_DATAGRAM, AF_INET vs AF_INET6 vs AF_UNIX, &c.), or different
types of socket addresses (sockaddr_in, sockaddr_in6, sockaddr_un,
&c.). Few Scheme systems expose these constants and socket address
structures to Scheme; most just hide the `messy details' in C.
Actually, `the same returns' is not quite true anyway. The wrappers
are generally very fixed in form, and inextensible. Most don't even
support any socket options, or have specific procedures to deal with
specific socket options. Few have any support for multicast IP. I
have never seen support for out-of-band data transmission in any
specialized socket wrapper for Scheme. The distinction between
shutdown and close is usually either absent or swept under a rug.
A general BSD sockets wrapper gets all of these for free, on *any*
socket type or domain (except for multicast IP, of course), and can
be made to be easily amenable to extension for anything not already
wrapped, as long as sockets' descriptors can be exposed
straightforwardly in an FFI. Specialized wrappers have to duplicate
all this functionality for every type and domain of socket they
distinguish between, and they rarely expose any internal interface
for FFIs -- that, or they simply don't support any of it.
Protocol independence in programs is impossible with specialized
OPEN-TCP-CONNECTION procedures: IPv6 has been around for over ten
years, and many standard libraries support it trivially through
abstractions like BSD sockets in conjunction with getaddrinfo(3),
yet programs that support IPv6 well are few and far between because
programmers specialize the sockets with the AF_INET family -- and
Scheme systems that specialize the BSD sockets to this family
specifically *discourage* protocol independence.
Unfortunately, programmer laziness in conjunction with a misguided
desire for `nicer' interfaces tends toward not supporting any more
esoteric capabilities of BSD sockets or flexibility for later
extension or protocol independence.
Of the Schemes I can think of off the top of my head, only scsh and
Gauche have BSD sockets interfaces. These all have specialized
wrappers, usually just for TCP, and some UDP, though I think none
support any local Unix sockets: Scheme48, Chicken, PLT, Gambit,
RScheme, MIT Scheme, SISC...
2006-09-14 R6RS
The first public draft of the Revised^6 Report on the Algorithmic
Language Scheme has been released, at .
The module system, the string API, and the arithmetic specification
are all still quite broken. A number of other smaller problems
pervade the report. They claim that an operational semantics has
been written, but it's not included, and the denotational semantics
has been taken out. Why, I do not know.
I'm not sure whether I have enough interest left in it to rant much
more about it.
2006-09-06 Zob veeblefitzer
Foo bar baz quux zot.
Frob grovel full lexical...
Mumble; chumble spuzz.
2006-07-30 Update: Guillaume Germain found in quantum superposition
between Washington and Minnesota
Title says it all, folks.
2006-05-29 Missing: Guillaume Germain, author of Termite
Monsieur Germain was last seen a little over a week ago in the
#scheme IRC channel on Freenode, declaring that he was preparing to
undertake the dark journey into the Lesser Evil Empire of Amazon.
We can be thankful that he turned down Temptation when she came to
his door with an interview at the Greater Evil Empire, but now, even
at the Lesser one, he has disappeared and not been heard from in
over a week. The Scheme Underground has not published a press
release stating that he has possibly been detected in a dark, dank,
and dingy dungeon in a basement somewhere in Washington, but the
details are sketchy. Preparations are not being made for a raid in
the area.
[Update: M. Germain appeared spontaneously to inform me that his
surname is spelt without a trailing `e', and subsequently vanished
just as spontaneously. Whereabouts are still unknown.]
2006-05-29 Pink chickens
Chickens, under their feathers, are pink. I don't know whether any
of you have ever plucked and feathered a chicken, but it turns out
pink underneath. At least, this is what I presume, because I have
not myself ever feathered a chicken. I have seen only featherful
chickens in farms and chickens in supermarkets which other persons
have already had the kindness to feather for me. Still, I can
assure you that chickens are pink under the feathers.
Now, what are really peculiar are chickens whose *feathers* are
pink, and I mean not just slightly pinkish but really bright, hot
pink. These things are really bizarre, and I have never seen one in
my life; nor have I ever heard of them. But can you imagine seeing
a chicken there, yet with blazing pink photons streaming at your
retinas, instead of their ordinary brownish reddish hue?
[Update: I am informed, among the voluminous profusion of
commentary from readers, that an ideal such chicken would be five
feet in height and capable of wearing a saddle.]
2006-05-01 On control brackets and resource release
[Note, 2010-01-01: There are some errors in this entry. See the
entries
2009-02-04 Correction to control brackets and resource release
2009-03-21 Correction to the optimization of resource release in
control brackets.
For a conciser summary, see the entry
2009-03-28 More simply addressing the confusion about UNWIND-PROTECT
in Scheme.]
DYNAMIC-WIND has long been a source of contention in the Lisp world.
It serves a very particular purpose, but this purpose is not very
well understood, and many think that this purpose is the same as the
purpose of other Lisps' UNWIND-PROTECT. While the general idea of
UNWIND-PROTECT is useful, it is not directly applicable in Scheme,
because Scheme has more powerful control abstractions than other
Lisps', and it therefore requires more powerful means of controlling
them.
(UNWIND-PROTECT form protection-form ...), a special form, evaluates
the first form and returns its value, but before returning the value
also executes the protection forms. Furthermore, if a throw in the
primary form transfers control to outside the whole UNWIND-PROTECT
form, this, too, will execute the protection forms. This allows
programmers to *reliably* clean things up at certain points in the
program -- for instance, to close open files, to shut down sockets,
to release database handles, &c.; here is hypothetical example of
its use:
(define (call-with-input-file pathname procedure)
(let ((input-port (open-input-file pathname)))
(unwind-protect (procedure input-port)
(close-input-port input-port))))
(DYNAMIC-WIND before during after), a procedure of three procedural
parameters, calls the during thunk, ensuring that any transfer of
control into it calls the before thunk, and that any transfer of
control out of it calls the after thunk. Normal control transfers,
i.e. the initial call to the during thunk and its final return,
qualify as control transfers too. This means that, any time control
is inside the dynamic extent of the during thunk, any state
established by the before thunk will be in effect, but any time
control is outside its dynamic extent, that state will be torn down
by the after thunk. Here is an actual example of DYNAMIC-WIND's
use, from the Edwin text editor:
(define (with-current-local-bindings! thunk)
(dynamic-wind
(lambda ()
(install-buffer-local-bindings! (current-buffer)))
thunk
(lambda ()
(uninstall-buffer-local-bindings! (current-buffer)))))
This ensures that, within THUNK, all Edwin variables have any values
specified by the buffer's local bindings. This is the kind of use
that DYNAMIC-WIND is appropriate for: a sort of control bracket that
ensures the presence of certain dynamic state or context within a
certain dynamic extent.
Now, unfortunately, DYNAMIC-WIND can be seen as somewhat similar to
UNWIND-PROTECT, because for basic cases (involving no sophisticated
exploitation of Scheme's continuation reification) DYNAMIC-WIND with
a null entrance thunk seems to be equivalent to UNWIND-PROTECT.
This is misleading, however. Either control may return to the
dynamic extent of the during thunk in the DYNAMIC-WIND, in which
case the after thunk will be called twice -- which is unacceptably
wrong --, or the before thunk really signals an error if it is
called twice. This, though, violates the philosophy described in
the first sentence of the R5RS:
Programming languages should be designed not by piling feature
on top of feature, but by removing the weaknesses and
restrictions that make additional features appear necessary.
Signalling an error in a DYNAMIC-WIND before thunk is usually a
limitation and a weakness in a program. We have an *extremely*
powerful means of abstraction of control -- CWCC[1] -- and we ought
not to prevent it from working merely because we want to prevent a
certain fragment of code from running more than once. This should,
however, spur a deeper look at why the code is being run more than
once in the first place.
DYNAMIC-WIND is meant to establish local dynamic state for specific
dynamic extents within programs, by setting it up in before thunks
and tearing it down in after thunks. These operations, to set up
and to tear down dynamic state, are meant to be reversible, local
changes.[2] That is, permanent actions, such as closing a port,
that one would put in a protection form of an UNWIND-PROTECT, are
entirely inappropriate in a DYNAMIC-WIND after thunk.
Yet the basic idea of UNWIND-PROTECT is still useful. What,
exactly, though, is useful from UNWIND-PROTECT, and what is
incompatible with Scheme? The concept of permanent finalization is
useful, and the identification of a good point to perform it is
useful. *Requiring* that it be performed at this point, though, is
incompatible with Scheme, because this point may be passed several
times, yet we must perform the finalization once and only once.
Many languages, and many Scheme systems, provide ways to ensure that
certain code be run when an object is no longer reachable. We can
apply this same idea, in fact, to continuations! The protection
thunk in an UNWIND-PROTECT of a lesser Lisp is run only once the
continuation is about to become unreachable. The only difference
between Scheme and CWCCless Lisps is that the point at which
continuations become unreachable is unfixed in Scheme.
So we must use a general object finalization mechanism -- for the
kind of object known as a continuation. This is easy to concoct,
though: we need only to make an object, to associate with that
object the protection thunk so that it will be executed when the
object become unreachable, and to make a link from the continuation
to that object. There is an implementation of this in MIT Scheme at
the end of this entry.
This, though, is non-optimal. The protection thunk may run at any
time in the system. We should *like* it to be run as soon as
control returns through the continuation in question, but this is
valid only if that continuation has not been reified and will not
ever be reused. We can say that a continuation is /dirty/ if it has
been reified and may be instantiated again, and that a continuation
is /clean/ if and only if there will be exactly one instance of it,
which will be destroyed as soon as control returns through it.
If we store a bit in any continuation involved with UNWIND-PROTECT,
i.e. any continuation with an associated protection thunk, then we
can initially clear the bit, to indicate that it is clean, and
instrument CWCC to mark all such continuations as dirty by setting
their bits. When control returns through a protected continuation,
it checks the bit. If the bit is clear, then the continuation is
clean, and it runs the protection thunk, because control will never
again return to this point; it also deregisters any finalization
procedure. If the bit is set, then the continuation is dirty, and
general finalization system for objects of unlimited extent will
handle the protection thunk.
(We need to be careful in the case of a reified continuation
instantiated on the stack whose heap storage has been reclaimed.
The continuation will be marked dirty, but before control returns
through it for the last time its protection thunk might be run,
because the GC could finalize its heap storage.)
Both DYNAMIC-WIND and UNWIND-PROTECT are useful for different
applications in a language with powerful control abstractions such
as Scheme. DYNAMIC-WIND, though, does not subsume the operation of
UNWIND-PROTECT, and UNWIND-PROTECT cannot be taken literally from a
language with weaker control abstractions. By examining the core of
the intent of UNWIND-PROTECT, though, we can separate its actual
purpose from an optimization. We can implement the actual purpose,
and then analyze cases where the optimization can still be useful,
in order to provide UNWIND-PROTECT efficiently in Scheme. The
result is that we can retain the powerful control abstraction of
CWCC and in the same language integrate the same idioms of resource
release as other languages provide.
Of course, there are still some limited cases where restriction is
really necessary. There are a number of procedures in MIT Scheme
that work with sensitive strings, and into which reentry is not an
option, in order to enforce the proper destruction of sensitive
information. In this case, DYNAMIC-WIND is appropriate to reject
reentry and to nullify sensitive strings on exit.
[1] Some would argue that composable continuations ought to be
enough for everyone, but the issues here are exactly the same
whether we use CWCC or composable continuations.
[2] This is not *exactly* true, but symmetry is a very desirable
property of DYNAMIC-WIND before & after thunks, and reversible
set-up & tear-down operations are usual.
Appendix: Variation on a Theme of UNWIND-PROTECT
(define continuation-protectors
(make-gc-finalizer (lambda (protector) (protector))
cell?
cell-contents
set-cell-contents!))
(define (make-protector-cell protector)
(let ((protector-cell
(make-cell (let ((protector-state (get-dynamic-state)))
(lambda ()
(let ((state (get-dynamic-state)))
(set-dynamic-state! protector-state #f)
(protector)
(set-dynamic-state! state #f)))))))
(add-to-gc-finalizer! continuation-protectors protector-cell)
protector-cell))
(define get-dynamic-state
(access GET-DYNAMIC-STATE
(->environment '(RUNTIME STATE-SPACE))))
(define set-dynamic-state!
(access SET-DYNAMIC-STATE!
(->environment '(RUNTIME STATE-SPACE))))
(define-syntax unwind-protect
(syntax-rules ()
((UNWIND-PROTECT form protection0 protection1 ...)
(*UNWIND-PROTECT (LAMBDA () form)
(LAMBDA () protection0 protection1 ...)))))
(define (*unwind-protect thunk protector)
(let* ((protector-cell (make-protector-cell protector))
(result (thunk)))
;; This ensures that the continuation will hold a reference to
;; the cell, so that it will not be garbage-collected until
;; after either control returns to this continuation or a
;; throw causes this continuation to be discarded and made
;; unreachable.
(identity-procedure protector-cell)
result))
2006-04-17 Microsoft vermin
I am shocked -- *shocked!* -- to hear that the developer of Termite
had an interview for a potential job at Microsoft. (Termite, for
you ignorant wretches, is Erlang-in-Scheme for Gambit. Find it on
your own.)
2006-04-07 XEmacs S-expression motion
Turns out paredit.el 20 beta doesn't support XEmacs very well after
all. The problem is that XEmacs's S-expression motion commands
don't signal SCAN-ERROR conditions like GNU Emacs does: they just
signal general ERROR conditions. Programs *cannot* usefully
discriminate S-expression motion scan errors from other kinds of
errors, and, on software engineering principles, I refuse to modify
the code to catch and ignore all errors. Grrrmph.
(If any XEmacs developers read this (as if): Could you please
either explain why the S-expression motion code signals ERROR
conditions rather than SCAN-ERROR conditions in specific, or, if
there is no good reason, fix XEmacs?)
2006-03-30 Immobilizing blobs
There is a furry, orange blob on my lap preventing me from moving.
I believe the scientific classification for this kind of blob is
`cat'. Luckily, the state of immobility is only temporary,
apparently regulated by some sort of nutritive urges in the blob,
and also occasional boredom and needs for entertainment.
2006-03-23 I got better...
For those of you who actually *read* this thing and were a bit
worried after my 2006-03-07 post, I've recovered mostly, although
I'm maintaining a good cough. But apparently you remain ill, since
you haven't overcome the affliction of blag addiction, as you're
still reading this now.
2006-03-10 paredit.el for XEmacs; deletion keys
Recently I have been fiddling with making paredit.el work for
XEmacs. Not many changes seem to be necessary -- although I haven't
yet tested it exhaustively under XEmacs, as I'm basing my changes on
Martin Gasbichler's port to XEmacs --, since paredit.el consists
mainly of pretty basic elisp, but one of the most annoying problems
to deal with was getting the deletion keys (forward-delete and
backward-delete, or Delete & Backspace) right.
[I forgot to finish this post, so it was sitting in an autosave file
for two weeks, and, though I had a big rant all planned out two
weeks ago about getting deletion keys right, I've forgotten what I
had to say. Sorry. Also, paredit.el for XEmacs isn't finished yet
because I keep forgetting to get around to it.]
[Update, 2009-12-31: Thanks to gjs, I now have an arbitrarily large
collection of round tuits. Fortunately, I think I made paredit run
on XEmacs a couple of years ago.]
2006-03-07 More flu!
Not only was the flu great fun, but, after only a few days of having
seemed to be mostly rid of it, it has decided to come back to bring
*more* joy to my life! Mmmmm...
2006-02-24 Flu!
The flu is great fun. I recommend that any of you who have not had
the pleasure of experiencing it, find a contagious surface
*right*now* and infect yourselves with the glee of knowing what fun
and games await you!
2006-02-23 Paredit.el versions 18 & 19
This happened a while ago, but paredit.el 18 is now the released
version, and paredit.el 19 is the new beta. See
;
find the relevant files yourself. Version 19 supports insertion of
balanced delimiters other than parentheses, and a fancy S-expression
transportation device suggested by Guillaume Germain, but I haven't
yet finished the latter and put it up on the web yet.
2006-02-06 On the dangers of blaggery
Until several acquaintences coerced me into recording a `blog' (a
term which I abhor almost as much as the concept it denotes), I
tended to avoid the the things in general, excepting semi-frequent
browsing of Planet Lisp . There are a
number of reasons for my eschewal of the reading or writing of
blags.
The first reason is the content, or absence thereof. There is a
tendency among blag writers to write when it amuses them to write,
not when they have anything to say. I need not elaborate further on
this point; I am sure that the reader is quite familiar with the
saturating profusion of content-free blags throughout the web.
The second reason is that blags are astonishingly efficient sinks of
time. Right now I ought to be doing something else. But I am not
doing something else. This is because there is some aspect of the
nature of blags that compels their authors to write entries, and,
even in the rare case that the author has anything worthwhile to
say, the time composing the entry would be better wasted on a
different amusement device, such as a cat. (More accurately put,
the author ought to spend more time amusing the cat than dribbling
drivel onto the web.)
This urge to write blag entries instead of doing useful things such
as acquiescing to feline demands takes its toll also in other ways.
For example, programmers will often briefly emit mutterings on their
blags about their latest objects of fascination, without releasing
their code. If the existence of that code is an equivocal one, this
sputtering of substanceless spoutage is known in some circles as
`wanking', a technical term. Wanking is not, and never has been,
desirable behaviour to be associated with. Actual code can speak
for itself, but speaking for that which has no existence, or at best
a vaporous existence, is about as useful to others as a fish is to a
bicycle. Or something like that. Yet these programmers persist in
writing blag entries instead of substantial code.
Finally, the blag is usually directed at an unclear audience. Since
there is no particular audience that the blag is intended to cater
to, and so for any particular audience much information is discussed
in much greater detail than is necessary, and much other necessary
information is omitted. Arguments are ill-defended because the
blagger has no connection to his audience when writing the blag
entry; he must build all the argument on a nebulous idea of what and
whom he's trying to argue against. The result is poorly laid-out,
ill-conceived, and unpersuasive arguments or other types of entries.
Thank you for slogging through the awful post above. I hope that
its mere form and existence have convinced you of what it failed to
argue for itself. At the very least, perhaps it will dissuade you
from assigning further merit to this blag!
2006-02-04 Disturbing blaggery
I have noticed a disturbing urge to write more material in this joke
of a text file in the past few days. I will have to quell this
strange compulsion. I feared it would happen from the start.
2006-02-03 Option types, optional parameters
Languages like ML and Haskell that are heavily based on pattern
matching have a very simple but elegant mechanism for representing
value and absence of value without employing a special value that is
still a value like any other: option types. The basic idea is that
there is a type which I shall denote with (OPTION ), where
is the type of the value that may be present. Two kinds of
terms have this type: (SUPPLIED ) and (ABSENT), where
has the type . For example, a TABLE-ENTRY procedure might
return (SUPPLIED ) or (ABSENT) to denote that
there is no value in the table associated with the given key.
The idiom in Lisps tends to be to use the false constant (#F or NIL)
for representing the absence of value, but this inhibits #F or NIL
from being itself a present and useful value. (The conflation of
the false constant and the empty list in old, crufty Lisps
exacerbates this.) This poses a problem, for example, if you want a
general table mapping some key to any object -- including #F -- but
you also want to be able to determine when an association is absent
in the table. I call this conflation /domain contagion/, because
NIL or #F or the symbol NOT-HERE are all still in the domain of
expressible values, when they should be kept apart, because their
whole purpose is to express the *absence* of an expressible value.
There are actually two different problems here. The first I shall
leave for another day, although it is more relevant to the
TABLE-ENTRY example. It is that procedures like TABLE-ENTRY must
convey information about control flow to their callers via data
flow. This indirection ought not to be necessary. Olin Shivers
wrote a paper called `Multi-return function calls' (although I think
that `multi-continuation procedure calls' is a better term) about
this fairly recently. The basic idea is that, rather than giving a
procedure a single continuation, which itself will discriminate the
data that the procedure returns to it, the caller supplies a
*choice* of several continuations, and the callee picks one to
return to, discarding the rest. The problem of denoting absence of
value in procedures like TABLE-ENTRY can be solved by passing two
continuations to TABLE-ENTRY: one for the case where there is a
value (which continuation accepts one argument), and one for the
case where there is no value (which continuation is nullary). I
recommend the paper; I may write something on the subject later,
too.
The other problem is not in eliding the indirection of data flow,
but in storing data. Here is a simple example: optional parameters.
There is no control flow involved here. If a procedure accepts
optional parameters, it simply has several variables which may or
may not have been supplied as arguments. Now, while there are some
cases where it would suffice to specify in the lambda default values
for the case of absence of the arguments, there are other cases
where it is useful either to pass on the optional parameter's value
to another procedure (and retain the supplied/absent status) or to
conditionalize on whether the parameter was supplied in order to do
something fancier altogether.
Here are several strategies currently employed by existing Scheme
systems:
1. Give no provision for specifying default values in the lambda
syntax itself, and instead
a. use a special value, such as #F (DSSSL) or some `default
object' token (current MIT Scheme).
-- This is subject to domain contagion as discussed
above.
b. make references to the variables corresponding with
optional parameters trap, and provide a special form
DEFAULT-OBJECT? which tests whether references to those
variables would trap in order to conditionalize on their
presence as arguments (older MIT Scheme).
-- This requires some pretty kludgey changes to the
semantics of variables in the language. Furthermore,
it inhibits the passage of unsupplied optional
parameters on to other procedures: you can't simply
pass on an unsupplied parameter to another procedure
by passing the value of the corresponding variable,
because references to that variable trap. (MIT
Scheme no longer uses this semantics for optional
arguments.)
2. Require that the default value be supplied. (PLT's OPT-LAMBDA)
-- This is inconvenient in some cases, and, while there are
ways around domain contagion with this strategy, they are
pretty inconvenient and they defeat simple static type
analysis (which is still useful for dynamically typed
languages like Scheme):
(LET ((NOT-HERE (CONS 'NOT-HERE '())))
(LAMBDA (... #!OPTIONAL (X NOT-HERE) ...)
... (IF (EQ? X NOT-HERE) ...) ...)))
3. Bind another variable to a boolean flag indicating whether an
optional parameter was supplied. (Common Lisp)
-- This fixes most of the disadvantages of the above, but it
still does not allow the convenient passage of unsupplied
optional parameters on to other procedures. It can also
get to be syntactically pretty heavy-weight in usage.
All of the problems of domain contagion, type analysis defeat,
passage of unsupplied optional parameters onto other procedures, and
complexity of usage, would be alleviated by the introduction of an
option type as in Haskell or ML (or even dynamically typed languages
heavily focussed on pattern matching like Erlang). There would be
two remaining issues: undesirable boxing, and the method by which to
conditionalize on option types.
When it comes to data already stored in the heap, the boxing of an
option type is not much of an issue; it is merely one more layer of
indirection. In fact, on, say, a 64-bit system, where three tag
bits is ideal, one possibility would be to use one of those tag bits
for flagging option types. (This may not be a great idea, but it is
an idea nonetheless which might bear worth experimenting.) As for
optional parameters, for which run-time costs of heap allocation is
not a reasonable option (sorry for the pun), we must first assume
that the compiler knows about the option type in the first place;
with this assumption in mind, it is not difficult for the compiler
to recognize trivially optimizable tests of the option's presence.
For example, it could map such tests to comparisons against an NARGS
register.
The syntax for testing optional data might be, without pattern
matching, a bit clumsy. There are a couple ways it might be
accomplished: a new special form could be added, (IF-OPTION (