This is Taylor R. Campbell's blag. -*- outline -*- Copyright (c) 2006--2020, Taylor R. Campbell. Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved. 2020-11-04 How to election (Permalink: https://mumble.net/~campbell/2020/11/04/how-to-election) (Content warning: US-centric perspective) 1. Months before the election, sign up to be a poll worker, and do whatever training or organizing is needed in your precinct. 2. In the weeks leading up to the election, assiduously ignore all newspapering and twittering and cablenewsing and other forms of vapid blathering about elections as horse races. (See also Zeynep Tufekci's commentary on the pernicious nature of projections in `news' coverage leading up to elections.) 3. On election eve, do some activity to tire yourself out, turn in early, and get a good night's sleep. 4. Wake up at 5am on the day of the election -- enough time to get a bagel for breakfast before you hit the polling station at 6am to help set it up before the polls open at 7am. For, I hope, just 2020: Bring a thermos of hot tea if it's cold, so you can keep warm while the windows are open to ventilate the aerosolized covids out of the place. 5. Spend all day working to make everyone's vote as smooth as it can be. Work to ensure they get to cast a ballot, even if it means getting on the phone with city hall to double-check an irregularity, even if it's a provisional one pending resolution of an irregularity. Sometimes someone forgot to update their voter registration when they moved; sometimes someone misread a `9' as a `4' when entering a registration into the database. If the waiting time to vote is longer than a few minutes, something has gone wrong -- and it's probably not the poll workers' fault. Voting is a human right, not a luxury for residents of affluent precincts. 6. When the job is done and the ballots are safely recorded and delivered, go home, turn off the internet, and go to bed. 7. Spend the day after the election offline, relaxing in blissful ignorance of the state- or country-wide results, which won't be officially certified for days or weeks anyway. Curl up in a comfy chair with a good book, a cat, and a cup of tea or hot chocolate. 2018-01-26 Court watch (Permalink: https://mumble.net/~campbell/2018/01/26/court-watch) (Twitter: https://twitter.com/Riastradh_/status/956953665155321857) From a court watch this morning: an object lesson in why the @MassBailFund objects to putting pretrial services—to remind defendants of upcoming court dates—under the management of the probation department. Probation officer: She's been in court three times this week, she let the battery deplete on her GPS monitor, she ignored the flashing warning on it, and she didn't answer her phone when probation tried to contact her about it. So please revoke bail and put her in detention. Defense attorney: She had two scheduled court dates this week, she waited at probation office for battery to charge, and she came in a third time because the battery ran out. And the only reason she's on GPS is that she defaulted last year, but obviously she's showing up at court now! To be clear: probation officer asked court to hold indigent defendant responsible for state's defective device by jailing her—after she showed up in court to fix the device she's wearing only to persuade her to come to court. Her child was already taken away. 2017-10-17 On _Blade Runner_ (Permalink: https://mumble.net/~campbell/2017/10/17/on-blade-runner) (Content warning: sexual exploitation) [SPOILER ALERT: I don't think _Blade Runner_ really has spoilers, so I made no attempt to conceal anything from its plot. When I first saw it years ago, it made almost no impression on me; I thought it was just a flat atmospheric sci-fi noir. Now, having seen it again recently prompted by the hubbub around _Blade Runner 2049_, it seems to me one of the deeper films I've seen, and readily withstands repeat viewing.] Was Deckard a replicant? Every aficionado of _Blade Runner_, it seems, is obsessed with this question, whether or not it is ever answerable, or perhaps partly because it is not. What is the difference between a replicant and a human? Tyrell Corporation's motto is `more human than human'. The best we can say is that replicants are, on average, stronger and more agile or sometimes sexier, but worse at empathy. And they die after a predictable duration. The only reason we consider replicants to be _inhuman_ is that we are asked to as a stated premise of the film. And that premise is supposed to justify killing them in the spot. It doesn't take much imagination to formulate an alternative hypothesis: The Tyrell Corporation breeds and raises humans in a controlled, abusive environment -- conditioning or selecting them for strength and agility, desocializing them, and infecting them with a virus that kills them on a reliable schedule -- and then sells them as slaves. Thus they are prevented from organizing resistance while being raised in captivity, and any resistance they may attempt to organize if they escape is thwarted after a modest time. The abuse of desocialization leaves them with impaired empathy, but it is commercially useful for slaves to empathize with their masters in order to serve them better, so it is in Tyrell's economic interest to mitigate the damage of the abusive environment, perhaps with some kind of therapy to implant false memories. The Voight-Kampff test is apparently a glorified polygraph test. This pseudoscientific contraption serves not to distinguish lies from truths as the polygraph test does, but to distinguish the slave race from the human race. The idea of making that distinction is not new -- there is a long history of `scientific racism' inventing such distinctions to serve the convenient purpose of retroactively rationalizing the subjugation of entire peoples into slavery, an idea explored further by Karen E. Fields and Barbara J. Fields in their book _Racecraft_. We first meet our protagonist Deckard as he attempts to live his life freely, only to be arrested by the police and denied the opportunity to finish his sushi. The police want him to come out of retirement to `retire' some replicants. Bryant, his sometime police supervisor, can't simply ask him -- Bryant knows he wants nothing to do with it. So Bryant has abused a position of power to arrest Deckard and compel him to again serve as an agent of the system of slavery, enforcing it by killing the escapees. Deckard fancies himself to be not a racist -- a kind of enlightened white moderate liberal. In the voiceover (omitted in some cuts) he derisively compares Bryant's use of the slur `skin job' to the use of `nigger' by cops in history books. And he's willing to sleep with a replicant, and persuade himself it is for love. He even seems to have a thought or two kicking around in his head about systems of exploitation: when he tries to gain access to Zhora's dressing room -- whether to administer a Voight-Kampff test, or just to kill her more discreetly to avoid upsetting passers-by -- the badly concealed ruse he invents on the spot is as a bureaucrat studying the exploitation of burlesque artists by theatre managers. Zhora's response is understandable impatience at the clueless idiot badgering her about a world that is so painfully obviously rife with exploitation far beyond the theatre. Her weary incredulity at Deckard's cluelessness concisely reflects many of the sentiments that are circulating today in response to those who are baffled by how long Harvey Weinstein perpetuated his own personal system of sexual exploitation, three and a half decades after _Blade Runner_ was written. But whatever thoughts Deckard has about exploitation, they fall short of serious reflection on his own station. Two short scenes later, Deckard is under attack by the replicant Leon. Rachael, Tyrell's personal replicant who escaped, saves Deckard's life by killing Leon herself. At this point, the threats to Deckard's existence are -- perhaps -- the two remaining escaped replicants Roy and Pris, if they are even aware of him. The threats to Rachael's existence? Both Deckard personally, whose job it is to kill every last replicant, and the entire institution of the state enforcing a system of slavery. Rachael, in desperation, begs Deckard for mercy. Deckard reads it as a transaction: Rachael saved his life, so he'll save hers. Beyond that transaction, to any neutral observer, the imbalance of power is palpable. To Deckard, infatuated with Rachael's sexiness after scrutinizing her as a subject under an unusually long Voight-Kampff test, it is suddenly convenient. So he rapes her. And he rationalizes to himself that the sexual slavery he has just coerced her into -- as a condition for his protection from death by the state of which he is an agent -- is love. All the replicants are desperate. All they want is to live a normal life like everyone else, but they are hunted like prey. Whether one thinks violence is ever _justified_, it should come as no _surprise_ that Leon, in the circumstances unimaginably far beyond his control, had lashed out at and tried to kill Deckard. Roy is smarter, and has become crueller. In the agony of learning from Tyrell himself that nothing can be done to reverse the predetermined, premature death imposed on him as a replicant, Roy sadistically murders Tyrell -- Tyrell, who was too obsessed with the physique of his commercial product to empathize with the terror of their imposed early mortality, and can't even formulate an apology for the curse he brought upon them. Cruel and sadistic as he is, Roy demonstrates empathy and a regard for human life beyond anyone else in the film when he saves the life of his killer, Deckard. Thus his killer can listen to his appeal to empathy for life as a slave -- and can watch him at long last escape the system of slavery in the only way it allows: in the pyrrhic freedom of death, as the white dove that flies from his dying body. Was Deckard a replicant? To focus on this question after seeing _Blade Runner_'s meditation on the depravity of the institution of slavery, and on the monsters it turns everyone into -- masters, slaves, and slave traders alike -- to focus on this question strikes me like focussing on the question of whether Hitler had Jewish ancestry after watching _Schindler's List_. It may be an academic curiosity, but it's a fundamentally confused distraction from the elephant in the room -- a twisted fixation of a society desperately trying not to see itself in the mirror. 2017-06-23 Friday-night tweet-size crypto: Rabin/Blum/Williams/Shoup-style public-key key encapsulation (Permalink: https://mumble.net/~campbell/2017/06/23/fntsc-rbws-kem) (Twitter: https://twitter.com/Riastradh_/status/878486389100355586) (Secure? Maybe. Don't try this at home, kids!) - Keygen: Pick 1024-bit primes p < q uniformly with 2^2047 < p q < 2^2048 and p = q = 3 (mod 4); publish n = p q. - Encrypt: Pick x uniformly in (Z/nZ)^*; compute y = x^2 (mod n), z = y^2 (mod n); use H(y) as secret key; transmit z. - Decrypt: Compute r = z^{(p + 1)/4} (mod p), s = z^{(q + 1)/4} (mod q); use CRT to find y with y = r (mod p) and y = s (mod q). - Correctness 1: If z = y^2 in Z/mZ for prime m = 3 (mod 4), then y^{m - 1} = 1, so z = y^2 = y^{m + 1} = z^{(m + 1)/2} = (z^{(m + 1)/4})^2. - Correctness 2: Square-modulo-n is a permutation of squares modulo n: exactly one square y = x^2 (mod n) solves z = y^2 (mod n). - Hashless variant? On decrypt, use CRT to find z' with z' = r^2 (mod p) and z' = s^2 (mod q); reject if z =/= z' (mod n). --> Nope, hash (as usual) is critical for security: attack works by learning square y' =/= +/-y from z = y^2 when random y turns out nonsquare. - Reduction to factoring: Let y = f(z) solve z = y^2 (mod n). If f(y^2) = y' =/= +/-y for some chosen nonsquare y, then gcd(y+/-y', n) | n. 2017-06-16 Friday-night tweet-size crypto: ECDSA over secp256k1 (Permalink: https://mumble.net/~campbell/2017/06/16/fntsc-ecdsa-secp256k1) (Twitter: https://twitter.com/Riastradh_/status/875921073409282048) (For amusement purposes only!) - Parameters: Let p = 2^256 - 2^32 - 978, a prime; E, the Weierstrass curve y^2 = x^3 + 7 over Z/pZ of order l; B, a point; H, SHA256. - Keygen: Pick a uniformly in Z/lZ; publish A = a B. - Verify: Given message m, coordinate x(R) in Z/pZ, and scalar s in Z/lZ, test x(H(m) s^-1 B + x(R) s^-1 A) = x(R). - Sign: Pick r uniformly in Z/lZ; compute R = r B, s = r^-1 [H(m) + x(R) a] (mod l), reveal (x(R), s). - Correctness: x(H(m) s^-1 B + x(R) s^-1 A) = x([H(m) + x(R) a] s^-1 B) = x(r B) = x(R). - Nwice attack: If (m, x(R), s) and (m', x(R), s') are signatures under public key A = a B, then a = [s (H(m) - H(m'))/(s - s') - H(m)]/x(R). - Signature malleability: If (m, x(R), s) is a valid signature, then so is (m, x(R), -s), since x(-R) = x(R). [Update, 2017-10-09: Fix confusion of field size and curve order. Correct the format: a signature contains only the x coordinate of R, not a full encoding of R. Unfortunately, I cannot fix these small but critical mistakes on Twitter.] 2017-05-15 Fake news (Permalink: https://mumble.net/~campbell/2017/05/15/fake-news) `MP3 is dead!', proudly proclaims Fraunhofer, the German company whose legal authority for a decades-long scheme of extortion over sharing culture expired last month. The news was loudly trumpeted -- or, shall I say, greatly exaggerated -- by loyal corporate mouthpieces all around the world: Andrew Flanagan, `The MP3 Is Officially Dead, According To Its Creators', National Public Radio, May 11, 2017. http://www.npr.org/sections/therecord/2017/05/11/527829909/the-mp3-is-officially-dead-according-to-its-creators David Lumb, `MP3 is dead, long live AAC', Engadget, May 12, 2017. https://www.engadget.com/2017/05/12/mp3-is-dead-long-live-aac/ Karen Gilchrist, `The MP3 is dead, say creators after terminating licensing', CNBC, May 15, 2017. http://www.cnbc.com/2017/05/15/mp3-dead-say-creators-after-terminating-licensing.html `MP3 is officially dead as founders terminate licensing program', ABC, May 15, 2017. http://www.abc.net.au/news/2017-05-15/mp3-founders-terminate-licensing-program/8526292 Matthew Field, `Creators of the MP3 declare it dead', The Telegraph, May 15, 2017. http://www.telegraph.co.uk/technology/2017/05/15/creators-mp3-declare-dead/ Rhett Jones, `Developers of the MP3 Have Officially Killed It', Gizmodo, May 14, 2017. https://gizmodo.com/developers-of-the-mp3-have-officially-killed-it-1795205540 No, Rhett. The developers of the MP3 haven't killed it. *You* killed it by blithely copying corporate press release bullshit in the tarnished name of journalism and propagating the commercial interests of Fraunhofer as if they were truth for your readers to learn. Actually, you didn't even accomplish that, because unlike the reputations of shills in the skins of journalists, well-documented, widely implemented technical ideas don't really die. 2016-12-31 One Gooshball (Permalink: https://mumble.net/~campbell/2016/12/31/one-gooshball) Am E7 A little cat paced up and down E7 Am She found an empty bowl on the ground Am7/G Dm/F She looked the cupboard through and through E7 E7 To see what feast she'd like to chew. Am Am7/G F7 E7 One goosh- ball, Am Am7/G F7 E7 One goosh- ball, Am Am7/G F7 E7 She'd like to eat just one goosh- ball. (repeat) She told the human near at hand, The simple dinner she had planned. The cats were startled one and all To hear that human loudly call no goosh-ball! No goosh- ball, No goosh- ball, You've had your dinner: no goosh-ball! The little cat felt ill at ease. She said, `I'm starving-- feed me, please!' The human hollered down the hall You're much too fat for one goosh-ball! One goosh- ball, One goosh- ball, You're much too fat for one goosh-ball! (Scat-sung verse and refrain -- or cat-sung, if you like.) The little cat felt very bad, Crunchy-food was all she had. And in her dreams she hears that call You're much too fat for one goosh-ball! One goosh- ball, One goosh- ball, You're much too fat for one goosh-ball! (With apologies to Dave Van Ronk, whose version is the best; to Professor George Martin Lane of Harvard who wrote this ersatz traditional folk tune in the first place; and to Miette, who attained the nickname Miche, and who feels very bad right now.) 2016-12-16 neveragain.tech (Permalink: https://mumble.net/~campbell/2016/12/16/neveragain.tech) [Nothing in this post is novel. I am writing it as a personal addendum to the pledge to amplify the meaning of my signature.] After some deliberation, and with reservations, I signed the pledge at that has been circulating the internet[1], text quoted below in full, because it is a starting point. Why reservations? It is *only* a starting point. We have already built the apparatus that the signatories of this pledge refuse to build. I have long stated, quite seriously, that Mark Zuckerberg pulled off with Facebook the gigantic trove of dossiers on everyone that the NSA, the Stasi, the Gestapo, and countless other institutions of mass surveillance and coercion could only dream of creating. Ten years ago that sounded to my colleagues, friends, and family like a joke. It was not and is not a joke. And it's not just Facebook. I am optimistic that mass surveillance is not so intricately woven into the fabric of society that we can't meaningfully resist it. But it has embedded itself deeply -- not just in social media, but also in all your searches for information on the web at Google, all your commerce at Amazon, all your purchases with a credit card[2]. It is embedded into most of the press[3], our primary medium for criticism of institutions of power, who are consequently unable to contemplate rejecting mass surveillance[4]. Why reservations? Not everyone has the financial security to commit to resigning when they recognize the hints of what they are being asked to do -- which is no individual fault of their own if our society fails to support the interests of its people adequately against the interests of corporations. And with no consequences for transgression, a signature on this pledge may be only an intrinsically meaningless instrument of social capital to make its signatories feel morally superior and enhance their image to colleagues. Why reservations? It is not only about Muslims or immigrants -- mass surveillance is a general instrument of control on its own, whether or not it is used to target an ethnic group. Why reservations? The impetus for the pledge now is obviously Trump and the incoming administration. But a Clinton administration would doubtless only have expanded, like the Obama administration did, the powers that the moderate liberal tribe is afraid Trump might use, such as indefinite detention[5][6], mass immigrant family detention[7], general warrants[8], and assassination[9]. My signature of this pledge is emphatically nonpartisan. Prior to November 8th, I began to plan a visit D.C. on January 20th to protest whichever of the bad options on the ballot was chosen. Bruce Schneier wrote yesterday that his personal list of priorities for the next four years would have been the same under Trump and Clinton[10]. This is not because Trump and Clinton are equivalent (though they do share a common propensity for weighing the interests of multinational corporations over people, such as in their practically identical plans for an enormous corporate tax holiday[11]), but because the premise that the current qualified leader will use powers such as assassination responsibly has led all three branches of the US federal government to irresponsibly develop those powers, and those powers persist beyond the president under whom they were instated. I hope that by signing the pledge I will remind my future self to be vigilant, and perhaps I will in at least a small way embolden those who respect me to do the same -- by making it, if only a little bit, more socially acceptable to resist. [Update, 2016-12-16: Shortly after I wrote this, I became aware of Mike Perry's much better response written at the Tor Project blog: Mike Perry, `Technology in Hostile States: Ten Principles for User Protection', Tor Project Blog, December 16, 2016. https://blog.torproject.org/blog/technology-hostile-states-ten-principles-user-protection] Text of the pledge: We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their Constitutionally-protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable. We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out. We see how IBM collaborated to digitize and streamline the Holocaust, contributing to the deaths of six million Jews and millions of others. We recall the internment of Japanese Americans during the Second World War. We recognize that mass deportations precipitated the very atrocity the word genocide was created to describe: the murder of 1.5 million Armenians in Turkey. We acknowledge that genocides are not merely a relic of the distant past—among others, Tutsi Rwandans and Bosnian Muslims have been victims in our lifetimes. Today we stand together to say: not on our watch, and never again. We commit to the following actions: - We refuse to participate in the creation of databases of identifying information for the United States government to target individuals based on race, religion, or national origin. - We will advocate within our organizations: . to minimize the collection and retention of data that would facilitate ethnic or religious targeting. . to scale back existing datasets with unnecessary racial, ethnic, and national origin data. . to responsibly destroy high-risk datasets and backups. . to implement security and privacy best practices, in particular, for end-to-end encryption to be the default wherever possible. . to demand appropriate legal process should the government request that we turn over user data collected by our organization, even in small amounts. - If we discover misuse of data that we consider illegal or unethical in our organizations: . We will work with our colleagues and leaders to correct it. . If we cannot stop these practices, we will exercise our rights and responsibilities to speak out publicly and engage in responsible whistleblowing without endangering users. . If we have the authority to do so, we will use all available legal defenses to stop these practices. . If we do not have such authority, and our organizations force us to engage in such misuse, we will resign from our positions rather than comply. . We will raise awareness and ask critical questions about the responsible and fair use of data and algorithms beyond our organization and our industry. Signed, among one thousand eight-hundred forty-two others as of this post, Taylor `Riastradh' Campbell, Research Engineer, Massachusetts Institute of Technology [1] I signed by email, so it may take some time for my signature to appear on the page. [2] You think it is the secrecy of your credit card number that keeps your credit card secure? No -- the security of the credit card system rests first and foremost on the profiles the credit card companies build of your activity to detect unusual behaviour. It is not merely possible for credit card companies to monitor your purchasing behaviour -- it is essential to the functioning of the credit card system that they continuously monitor and profile it. [3] Quinn Norton, `The Hypocrisy of the Internet Journalist', The Message at medium.com, May 29, 2015. https://medium.com/message/the-hypocrisy-of-the-internet-journalist-587d33f6279e [4] Editorial Board, `Boston police enter era of cyber community policing', Boston Globe, December 2, 2016. https://www.bostonglobe.com/opinion/editorials/2016/12/02/boston-police-enter-era-cyber-community-policing/DtZRX7y7x6VsQFYYBE6yyO/story.html [5] D. Parvaz, `US lawmakers legalise indefinite detention', Al-Jazeera, December 16, 2011. http://www.aljazeera.com/indepth/features/2011/12/201112773810926474.html [6] Chris McGreal, `Military given go-ahead to detain US terrorist suspects without trial', The Guardian, December 14, 2011. https://www.theguardian.com/world/2011/dec/15/americans-face-guantanamo-detention-obama [7] Claudia Morales, `Families crossing the border: ``We are not criminals'' ', CNN, November 2, 2016. http://www.cnn.com/2016/11/02/us/family-immigration-detention-centers/ [8] Jeff John Roberts, `FBI's New Hacking Powers Take Effect This Week', Fortune, November 30, 2016. https://fortune.com/2016/11/30/rule-41/ [9] Jeremy Scahill, `The Assassination Complex', The Drone Papers, The Intercept, October 15, 2015. https://theintercept.com/drone-papers/the-assassination-complex/ [10] Bruce Schneier, `My Priorities for the Next Four Years', Schneier on Security, December 15, 2016. https://www.schneier.com/blog/archives/2016/12/my_priorities_f.html [11] David Dayen, `The Huge Corporate Tax Cut Hillary Clinton Doesn't Talk About', New Republic, October 21, 2016. https://newrepublic.com/article/138023/huge-corporate-tax-cut-hillary-clinton-doesnt-talk 2016-12-08 On journalism, credibility, and fake news I echo to my echo chamber everything said in this essay: Nathan J. Robinson, `The Necessity of Credibility', Current Affairs, December 3, 2016. https://www.currentaffairs.org/2016/12/the-necessity-of-credibility 2016-12-03 On Uber and power (Permalink: https://mumble.net/~campbell/2016/12/03/on-uber-and-power) I do not use Uber. I primarily ride public transit, and bicycle. On the rare occasions when hiring a car is the only expedient option -- nine times in the past year, by my records -- I ride a taxi. Why? As everyone knows, taxis are more expensive than Uber. As many are quick to point out, the taxi medallion systems are rampant with corruption[1]. Isn't it better to use the cheaper system that fights age-old corruption in the political machine? Uber is cheaper mainly because their business model is to unsunstainably undercut the competition with their billions of dollars of venture capital until the competition is dead[2]. The corruption of taxi medallion systems is distributed across cities, not centralized in one office in San Francisco. Uber has much bigger coffers, again from venture capital, to campaign for cities to regulate them less than taxis[3] -- and to extort deregulation from major cities[4]. The conceit of Uber's currently unsustainable business model is that the venture capital will carry it far enough to replace the fleet of human drivers by software drivers. Whether that pans out, the intended result is another mass surveillance[5] empire monitoring and brokering everyone's comings and goings around town that has demonstrated the will to use its centralized power to bribe and extort local regulators across continents. That is a power I do not want to support, so I vote with my wallet and refuse to participate. (Is Uber the most important corporate machine to avoid? Absolutely not, but it is so prevalent in popular culture that has become a part of the lexicon, and hence worth addressing. See also: .) [1] Boston Globe Spotlight Team, `Driven to the Edge', three-part series https://www.bostonglobe.com/metro/specials/taxi Part 1: Globe staff, `For Boston cabbies, a losing battle against the numbers', Boston Globe, March 31, 2013. https://www.bostonglobe.com/metro/2013/03/30/spotlight/9eVWW7Y6RaOIqII62n2XlI/story.html Part 2: Globe staff, `An empire built on ambition and a very hard line', Boston Globe, April 1, 2013. https://www.bostonglobe.com/metro/2013/03/31/empire-built-ambition-and-very-hard-edge/EyaKQDkzLC8u6DuYqbHMZL/story.html Part 3: Globe staff, `For cab drivers, risk and reward are a mismatch', Boston Globe, April 2, 2013. https://www.bostonglobe.com/metro/2013/04/01/spotlight/IkU7kjxSy2d1N8eYhDTBaL/story.html [2] Hubert Horan, `Can Uber Ever Deliver?', three-part series in Naked Capitalism. Hubert Horan, `Can Uber Ever Deliver? Part One -- Understanding Uber's Bleak Operating Economics', Naked Capitalism, November 30, 2016. https://www.nakedcapitalism.com/2016/11/can-uber-ever-deliver-part-one-understanding-ubers-bleak-operating-economics.html Hubert Horan, `Can Uber Ever Deliver? Part Two -- Understanding Uber's Uncompetetive Costs', Naked Capitalis, December 1, 2016. https://www.nakedcapitalism.com/2016/12/can-uber-ever-deliver-part-two-understanding-ubers-uncompetitive-costs.html Hubert Horan, `Can Uber Ever Deliver? Part Three -- Understanding False Claims About Uber's Innovation and Competitive Advantages', Naked Capitalism, December 2, 2016. https://www.nakedcapitalism.com/2016/12/can-uber-ever-deliver-part-three-understanding-false-claims-about-ubers-innovation-and-competitive-advantages.html [3] Casey Tolan, `Uber and Lyft are spending millions of dollars on a local election in Austin, Texas. Here's why.', Fusion, April 19, 2016. https://fusion.net/story/291210/uber-lyft-ridesharing-election-austin-texas/ [4] Lora Kolodny, `Uber and Lyft ``pause'' Austin operations today in standoff over regulation', TechCrunch, May 9, 2016. https://techcrunch.com/2016/05/09/uber-and-lyft-pause-austin-operations-in-standoff-over-regulation/ [5] Kate Conger, `Uber begins background collection of rider location data', TechCrunch, November 28, 2016. https://techcrunch.com/2016/11/28/uber-background-location-data-collection/ This is not to say that *background* collection of location data is what made the difference between mass surveillance and not -- only to illustrate the extent of mass surveillance to which Uber is willing to go for its business purposes. 2016-12-02 So absurd I have nothing to add. Wine-glass-toting ascot-wearing city solicitor tags upscale grocery store `FUCK TRUMP'. Elie Mystal, `City Attorney Spraying Anti-Trump Graffiti While Drinking Wine IS All We Have Left', Above the Law, 2016-12-01. http://abovethelaw.com/2016/12/city-attorney-spraying-anti-trump-graffiti-while-drinking-wine-is-all-we-have-left/ 2016-11-20 More topical essays A friend writes to add another essay to the list: Scott Alexander, `You Are Still Crying Wolf', Slate Star Codex, November 16, 2016. http://slatestarcodex.com/2016/11/16/you-are-still-crying-wolf/ I would add to the caveat that the essay only addresses the rhetoric in the moderate liberal tribe about whether Trump is `openly racist' or `openly white supremacist' -- phrases that are generally not likely to win you support from the people you apply them to, or from the people whose choice of representative you apply them to. Does this mean a Trump/Pence administration, Republican-controlled Congress, and Trump-nominated Supreme Court will be hunky-dory for queers, Muslims, &c.? I don't think so! There is plenty else to criticize about the incoming administration, legislature, and judiciary -- but also about the outgoing administration, legislature, and judiciary: Alex Emmons, `Commander-In-Chief Donald Trump Will Have Terrifying Powers. Thanks, Obama.', The Intercept, November 11, 2016. https://theintercept.com/2016/11/11/commander-in-chief-donald-trump-will-have-terrifying-powers-thanks-obama/ Glenn Greenwald, `Donald Trump's Policies Are Not Anathema to U.S. Mainstream, but an Uncomfortable Reflection of It', The Intercept, March 4, 2016. https://theintercept.com/2016/03/04/trumps-policies-are-not-anathema-to-the-u-s-mainstream-but-an-uncomfortably-vivid-reflection-of-it/ (The photograph is of Rudy Giuliani, Donald Trump, Michael Bloomberg, and Bill Clinton all golfing together.) Here are some more essays that I thought noteworthy before I gave myself a break and stopped collecting them last week: If you can get past the us-vs-them clickbait listicle, the caricature this essay paints is worth studying not exactly for the truth (or even internal consistency) of it but to taste the perspectives of what many white moderate liberals pejoratively characterize as idiots in the rust belt, from someone who grew up in the middle of nowhere in Illinois: David Wong, `How Half of America Lost Its F**king Mind', Cracked, October 12, 2016. http://www.cracked.com/blog/6-reasons-trumps-rise-that-no-one-talks-about/ On how journalism changed in this cycle -- and on the perverse incentives of centrally organized but personalized feeds and filter bubbles. (This issue is orthogonal to the true campaign positions of Trump and Clinton -- which I believe relatively few people would have found compelling either way even were they not surrounded by sensationalist hype and clickbaited filter bubbles.) Joshua Benton, `The forces that drove this election's media failure are likely to get worse', November 9, 2016. http://www.niemanlab.org/2016/11/the-forces-that-drove-this-elections-media-failure-are-likely-to-get-worse/ Silverman, Strapagiel, Shaban, Hall, & Singer-Vine, `Hyperpartisan Facebook Pages Are Publishing False and Misleading Information At An Alarming Rate', October 20, 2016. https://www.buzzfeed.com/craigsilverman/partisan-fb-pages-analysis On specifically Facebook's feeds, ideological diversity, the faux objectivity of machines and big data, and the influences of profit-motivated monopolies that are the man behind the curtain of that machine-assisted faux objectivity. (Pay attention to the author, not the publication, here -- there is much to criticize about HuffPo, but I have found Zeynep Tufekci to be a consistently thoughtful source of new ideas and perspectives.) Zeynep Tufekci, `Facebook Said Its Algorithms Do Help Form Echo Chambers', WorldPost, Huffington Post, May 11, 2015. http://www.huffingtonpost.com/zeynep-tufekci/facebook-algorithm-echo-chambers_b_7259916.html Glenn Beck, whose political transformation in recent months has stunned many, is an expert in stoking fires and watching them burn, including, e.g., the notion promulgated beginning eight years ago that Obama might not leave office after two terms -- a notion I have already heard repeated countless times about Trump in white moderate liberal circles. So I think he is qualified to comment on the mechanism of the hysteria surrounding Trump's presidency, whether or not the content of the hysteria is valid. Glenn Beck, `Don't Move to Canada. Talk to the Other Side.', Op-Ed, New York Times, November 11, 2016. http://www.nytimes.com/2016/11/11/opinion/glenn-beck-dont-move-to-canada-talk-to-the-other-side.html 2016-11-11 Fever A colleague related to me this afternoon an analogy to understand the severity of global average temperature increase by 2℃. When a human's body temperature raises by 1℃, it's a fever -- they feel ill, can't focus on work, and go home to rest. Another degree or two more and it's a serious fever -- enough to go to the hospital for treatment. But the Earth can't go home to rest, and certainly can't find a doctor among its planetary comrades for treatment. It's a cute analogy to explain the danger of a 2℃ temperature increase. But we can extend the analogy further. Humans and other mammals develop fever, we surmise, to kill pathogens when the immune system is otherwise unable to keep up. So what does that suggest the Earth is doing to humans by heating up 2℃? No doubt this point was to be the conclusion of Agent Smith's monologue to Morpheus about his reinterpretation of humans as a kind of virus, before Neo and Trinity so rudely barged in to interrupt! 2016-11-10 Topical essays Many are reeling with the news of Trump's narrow electoral victory and blaming the inexplicable rise of xenophobic right-wing nationalism. The same reaction happened with Brexit a few months ago. That story is a convenient excuse, easy to bandy about in moderate liberal social circles, to escape introspection about deeper causes, worsening economic class divisions, and the impact of globalism. Here are several relevant essays that address these issues better than I could: Ian Williams, `Lexington', medium.com, November 9, 2016. https://medium.com/@Brocktoon/lexington-c1825d25442e Glenn Greenwald, `Brexit Is Only the Latest Proof of the Insularity and Failure of Western Establishment Institutions', The Intercept, June 25, 2016. https://theintercept.com/2016/06/25/brexit-is-only-the-latest-proof-of-the-insularity-and-failure-of-western-establishment-institutions/ Glenn Greenwald, `Democrats, Trump, and the Ongoing, Dangerous Refusal to Learn the Lesson of Brexit', The Intercept, November 9, 2016. https://theintercept.com/2016/11/09/democrats-trump-and-the-ongoing-dangerous-refusal-to-learn-the-lesson-of-brexit/ Nassim Nicolas Taleb, `The Intellectual Yet Idiot', medium.com, September 16, 2016. https://medium.com/@nntaleb/the-intellectual-yet-idiot-13211e2d0577 This is not to say that the concerns of these moderate liberal circles are invalid, especially about the rights of identity minorities: LGBTQ, Muslims, people of colour, &c. -- a Trump administration and Republican-controlled Congress does not bode well for that. But while the rights of identity minorities may have stood a better chance under a Clinton administration, a continuation of the economic status quo under Obama and the other neoliberal administrations before him would likely only have exacerbated the underlying economic rifts that led to Trump, Brexit, and doubtless other brewing nationalist sentiment in the world. 2016-11-05 Meditation on ceilings Tonight my bathroom ceiling fell off. New experience for me. Fortunately none of the black goo in there got on my toothbrush. On the other hand, there is now a sea of plaster, black goo, and other unrecognizable spoo between it and me. In the ceiling's defence, I did poke it earlier this evening to see just how bad the leak had gotten since the landlord looked at it this afternoon while seeking a plumber. Can't claim I didn't provoke it. Good thing I went easy on the tea today. My bladder is thanking me, though I'm not yet sure what I'll tell it in the morning. Anyone have a recommendation for a good general contractor? Asking for a friend. 2016-03-26 Anti-tribalism I'm founding a club called the Anti-Tribalism League. The condition for membership is the repudiation of social conditions for membership. The only member is Groucho Marx. 2015-11-07 The headline in The Tech on October 22, `MIT will not divest, says industry is key in new climate plan', reveals a disheartening duplicity in the climate plan proposed by President Reif: in the name of averting climate disaster, we must profit from the very industry that thrives on causing it. In the climate plan, Reif says `In our judgement, the deliberate public act of divestment would entangle MIT in a movement whose core tactic is large-scale public shaming.' What Reif proposes to continue funding is entangling MIT in a corporate movement, the fossil fuel industry, whose core tactic is large-scale climate destruction and mass deception to persuade the world that climate change does not exist, does not matter, is not caused by human activity, will be a net win, etc. The financial incentives are obvious. The conflict of interest is obvious. What Reif calls a `symbolic public move' is participating in a worldwide movement not to shame but to de-fund the disastrous activities of the fossil fuel industry. Words are cheap; money talks. 2015-05-01 Taxonomy In Istanbul it's a protest, with riot police shooting water cannon and tear gas to suppress protestors defying a government ban on celebrations. `At least 250 people were detained, according to the Istanbul Bar Association.' Ceylan Yeginsu, `Police Fire Tear Gas and Water Cannons at Istanbul Protesters', New York Times, 2015-05-01. In Baltimore the city is forced to call in the national guard to help fight riots and looting. The day `ended with rioting by rock-throwing youths, arson, looting, and at least 15 police officers injured.' Sheryl Gay Stolberg, `Baltimore Enlists National Guard and a Curfew to Fight Riots and Looting', New York Times, 2015-04-27. In the Washington Post, coverage like that of the New York Times on Istanbul is marked as satire and relegated to an opinion blog: Karen Attiah, `How Western media would cover Baltimore if it happened elsewhere', PostPartisan blog, Washington Post, 2015-04-30. 2015-04-29 Words from Dr King other than `nonviolence' Yesterday Wolf Blitzer of CNN interviewed Deray McKesson about the ongoing protests in Baltimore against police brutality [1][2]. The impression CNN gives is that all of a sudden on Monday, riots and looting and pillaging broke out in the dangerous city of Baltimore over the weekend after a black repeat criminal misdemeanor offender, Freddie Gray, mysteriously died in a police transport of a severed spine. The same impression is given by the New York Times, the Washington Post, and many other popular news sources. The Washington Post reported 100 people arrested [3]. The New York Times provided an interactive map of the two dozen violent incidents in Baltimore since Gray's funeral [4], and reported that Governor Larry Hogan requested 5,000 police officers be deployed to the scene [5], not to mention the National Guard. No mainstream American news source that I read even hazarded an estimate of the number of protesters present, the vast majority of whom were peaceful. Google turned up no news stories giving such estimates. Reuters reported 2,000 the day of Gray's funeral [6]; I have heard other estimates as high as 10,000. Countless bits have already been spilt on the subject of McKesson's brief but incisive interview with Blitzer, which speaks well for itself. What struck me in particular about it was when Blitzer invoked Dr King as he repeatedly asked McKesson to subscribe to the mainstream media focus on violence: `But I just want to hear you say that there should be peaceful protests, not violent protests, in the tradition of Dr Martin Luther King.' I could not help but recall Dr King's words about the white moderate: `I must make two honest confessions to you, my Christian and Jewish brothers. First, I must confess that over the past few years I have been gravely disappointed with the white moderate. I have almost reached the regrettable conclusion that the Negro's great stumbling block in his stride toward freedom is not the White Citizen's Counciler or the Ku Klux Klanner, but the white moderate, who is more devoted to ``order'' than to justice; who prefers a negative peace which is the absence of tension to a positive peace which is the presence of justice; who constantly says: ``I agree with you in the goal you seek, but I cannot agree with your methods of direct action''; who paternalistically believes he can set the timetable for another man's freedom; who lives by a mythical concept of time and who constantly advises the Negro to wait for a ``more convenient season''. Shallow understanding from people of good will is more frustrating than absolute misunderstanding from people of ill will. Lukewarm acceptance is much more bewildering than outright rejection.' It is as though the only lessons of the civil rights movement, as remembered in popular white America, were to remove laws formally or directly making blacks second-class citizens, and to chide any movement with a hint of associated violence for failing to adhere to Dr King's prescription of nonviolence for the upset. We need not recall Malcolm X; we need not recall the Black Panthers and their social programs; we need not even recall any of the rest of Dr King's messages and methods outside the single word `nonviolence'. [Update, 2015-04-30: Without needing to name him, The Onion demonstrates a better understanding of Dr King's letter from a Birmingham jail than Blitzer does in his broken record demand for condemnation of violence: `Baltimore Residents Urged To Stay Indoors Until Social Progress Naturally Takes Its Course Over Next Century', The Onion, 2015-04-29. ] [1] https://www.youtube.com/watch?v=NyYdKD0af78 (video) [2] http://www.rawstory.com/2015/04/activist-smacks-down-wolf-blitzer-you-are-suggesting-broken-windows-are-worse-than-broken-spines/ (transcript) [3] Mary Pat Flaherty, Lynh Bui, and Dan Morse. `About 80 people arrested in Baltimore turmoil freed after time runs out', Washington Post, 2015-04-29. [4] `Mapping the Clashes Between Baltimore Police and Protesters', New York Times, 2015-04-28. [5] Sheryl Gay Stolberg, `Baltimore Enlists National Guard and a Curfew to Fight Riots and Looting', New York Times, 2015-04-27. [6] Lacey Johnson, `Thousands march in Baltimore to protest black man's death', Reuters, 2015-04-25. 2015-02-22 BLAKE2 and (draft) SHA-3 I wrote portable C implementations of BLAKE2 and (draft) SHA-3 in Mercurial repositories at https://mumble.net/~campbell/hg/blake2 https://mumble.net/~campbell/hg/sha3 (WARNING: The SHA-3 standard, FIPS 202, is not finalized yet, so please don't use this code in real applications.) These are tuned for legibility and ease of use first, and performance second. They will not beat implementations with CPU-specific optimizations and vectorizations. I wrote them so I could use them in applications by dropping in the relevant four files, without rummaging around in SUPERCOP for what actually matches the standard and dealing with CPU-specific code. I plan to use Keccak (not SHA-3 per se) for an entropy pool (so matching the final SHA-3 standard doesn't really matter), and I plan to use blake2s for a content-addressed storage scheme. 2015-02-17 Protocol buffers in C I wrote a simple-minded implementation of protocol buffers in C, called picopb, without any dependency on Google's protoc schema compiler written in C++. It is not yet finished, but if you'd like to play with it, it's in a Mercurial repository at https://mumble.net/~campbell/hg/picopb It has a few NetBSDisms, notably the makefiles, for expedience, because I plan to use it to replace proplib (dynamically typed schemaless dict/array/scalar library with XML wire format) for ioctls and other RPC-like applications in NetBSD. As mentioned, it is not yet finished: the automatic tests are incomplete, and the automatic protobuf<->proplib conversion is not yet implemented, nor is code generation for kernel ioctl routines. 2015-01-21 Multiplicative inverses modulo a power of two (Permalink: https://mumble.net/~campbell/2015/01/21/inverse-mod-power-of-two) I couldn't find any good citation on the web for how to compute the multiplicative inverse of an odd integer a, modulo a power of two, n = 2^w, which is necessary preparation for Montgomery reduction. The obvious way to compute modular inverses is to use the extended Euclidean algorithm, which, given a and n, computes integers s and t satisfying Bézout's identity a s + n t = gcd(a, n) = 1, so that s is a multiplicative inverse of a modulo n, and t is a multiplicative inverse of n modulo a. (define (euclid a b) (define (loop s s* t t* r r*) ;; a s + b t - r = 0 (if (zero? r*) (values s t r) (let ((q (quotient r r*)) (r** (remainder r r*))) (loop s* (- s (* q s*)) t* (- t (* q t*)) r* r**)))) (if (< a b) (loop 0 1 1 0 b a) (loop 1 0 0 1 a b))) ;; Bezout's identity: (let ((a 3) (b (expt 2 32))) (receive (s t r) (euclid a b) (- (+ (* a s) (* b t)) r))) ;Value: 0 ;; Modular inverse: (receive (s t r) (euclid 3 (expt 2 32)) (modulo s (expt 2 32))) ;Value: 2863311531 (modulo (* 3 2863311531) (expt 2 32)) ;Value: 1 This requires division, and in particular division of n, which, if w is our word size so n cannot itself be represented in a word, is not convenient at the machine level -- e.g., if we were writing code in standard C without bignums. The second obvious way -- if you are keen on number theory -- is to use Euler's theorem, that a^phi(n) = 1 (mod n), or a*a^(phi(n) - 1) = 1 (mod n), so that a^(phi(n) - 1) is a multiplicative inverse of a modulo n, where phi(n) is Euler's totient function, giving the number of positive integers below n and coprime with it. In particular, for any prime q, of the q^w - 1 positive integers below q^w, the only ones sharing factors in common with q^w are the multiples of q, of which there are q^(w - 1) - 1, so phi(q^w) = q^w - 1 - (q^(w - 1) - 1) = q^w - q^(w - 1). In our case, phi(2^w) = 2^w - 2^(w - 1) = 2^(w - 1). Using a standard square-and-multiply algorithm for modular exponentiation quickly computes the inverse: (define (modexp a e n) (if (zero? e) 1 (let loop ((r 1) (b (modulo a n)) (f e)) ;; r b^f = a^e (mod n), f > 0 (let ((r (if (even? f) r (modulo (* r b) n))) (f (quotient f 2))) (if (zero? f) r (loop r (modulo (* b b) n) f)))))) (modexp 3 (- (expt 2 31) 1) (expt 2 32)) ;Value: 2863311531 This general modular exponentiation requires 2 w - 1 w-bit multiplications. If our modulus is an even power of two, i.e. the square of the power of two, we can do a little better. Say n = m^2 for some m and we can find an inverse of a modulo m, a b = 1 (mod m), or a b - 1 = k m for some k. We would like an inverse b' of a modulo n = m^2, so that a b' - 1 = k' m^2 for some k'. Squaring both sides of a b - 1 = k m in an attempt to find a multiple of m^2 gives k^2 m^2 = (a b - 1)^2 = a^2 b^2 - 2 a b + 1 = a b (a b - 2) + 1, or, if we negate both sides, a b (2 - a b) - 1 = -k^2 m^2. Hence we have found an inverse b' = b (2 - a b) of a modulo m^2 = n, with k' = -k^2. A little more generally, if a b = 1 (mod m), then for any power c of m, we can use the binomial expansion (a b - 1)^c = a b (...) + (-1)^c to find b' = (-1)^(c + 1) b (...) so that a b' = 1 (mod m^c). Similarly, for any v and u coprime with a (and not necessarily distinct), if a b = 1 (mod u) and a c = 1 (mod v), so that a b - 1 = i u and a c - 1 = j v for some i and j, then taking their product yields i j u v = (a b - 1) (a c - 1) = a^2 b c - a b - a c + 1 = a (a b c - b - c) + 1, or a (b + c - a b c) - 1 = -i j u v, so that b + c - a b c is an inverse of a modulo u v. If n is a power of two power of two -- that is, 2^(2^i) for some i: 2, 4, 16, 256, 65536, &c. --, as is likely to be the case for machine word sizes in which to do Montgomery reduction, this lends itself to a faster algorithm by starting at a, which, being odd, is its own inverse modulo 2: (define (modinv2^2^i a i) (let loop ((r 0) (b a)) ;; a b = 1 (mod 2^(2^r)) (if (< r i) (loop (+ r 1) (* b (- 2 (* a b)))) b))) (modulo (modinv2^2^i 3 5) (expt 2 32)) ;Value: 2863311531 This approach uses only 2 lg w w-bit multiplications and lg w w-bit additions. (The Scheme code does not reduce the intermediate quantities, but all arithmetic can be done modulo n -- or even modulo 2^(2^(r + 1)) <= n.) For Montgomery reduction, which needs a b = -1 (mod n), we can either negate the result, or compute it directly with the modified step b' = b (2 + a b). The above source code is also available at . [Update, 2015-02-22: I found , which provides algorithms for computing a mod p^w for more general p and w.] 2014-12-12 Anonymity and harassment Anonymity lets us challenge dangerous power structures without fear of retribution. Anonymity lets us have conversations without being prejudiced -- consciously or unconsciously -- by skin colour, by masculine or feminine name, by regional accent, by dress. Anonymity lets us explore ideas we are not comfortable publishing. Some people also abuse anonymity to harass other people. This doesn't serve anyone -- it only hurts people and detracts from the value of anonymity to everyone by giving anonymity a bad name. Let's give the bad name to harassment, not to anonymity. https://blog.torproject.org/blog/solidarity-against-online-harassment 2014-11-27 PoC||GTFO is now an official samizdat archive of the annals of Proof of Concept or Get That FUD Out. The current issue of which I am aware will be found at , and finding the other issues is left as an exercise for the reader. (No cheating and using google to find the answer!) 2014-11-03 [redacted] [redacted] [Update, 2014-11-06: [redacted]] 2014-04-28 Uniform random floats: How to generate a double-precision floating-point number in [0, 1] uniformly at random given a uniform random source of bits (Permalink: https://mumble.net/~campbell/2014/04/28/uniform-random-float) The naive approach of generating an integer in {0, 1, ..., 2^k - 1} for some k and dividing by 2^k, as used by, e.g., Boost.Random and GCC 4.8's implementation of C++11 uniform_real_distribution, gives a nonuniform distribution: - If k is 64, a natural choice for the source of bits, then because the set {0, 1, ..., 2^53 - 1} is represented exactly, whereas many subsets of {2^53, 2^53 + 1, ..., 2^64 - 1} are rounded to common floating-point numbers, outputs in [0, 2^-11) are underrepresented. [Update, 2016-08-23: I can't figure out what I meant by the above bias. The probability the floating-point draw lies in [0, 2^-11) should be 2^-11. This is the same as the probability that the integer draw lies in {0, 1, ..., 2^53 - 1}, which is 2^53/2^64 = 2^-11. Certainly values below 2^-64 are underrepresented, as below, but that's a separate issue.] - If k is 53, in an attempt to avoid nonuniformity due to rounding, then numbers in (0, 2^-53) and 1 will never be output. These outputs have very small, but nonnegligible, probability: the Bitcoin network today randomly guesses solutions every ten minutes to problems for which random guessing has much, much lower probability of success, closer to 2^-64. What is the `uniform' distribution we want, anyway? It is obviously not the uniform discrete distribution on the finite set of floating-point numbers in [0, 1] -- that would be silly. For our `uniform' distribution, we would like to imagine[*] drawing a real number in [0, 1] uniformly at random, and then choosing the nearest floating-point number to it. To formalize this, start with the uniform continuous distribution on [0, 1] in the real numbers, whose measure is mu([a, b]) = b - a for any [a, b] contained in [0, 1]. Next, let rho be the default (round-to-nearest/ties-to-even) rounding map from real numbers to floating-point numbers, so that, e.g., rho(0.50000000000000001) = 0.5. Note that the preimage under rho of any floating-point number -- that is, for a floating-point number x, the set of real numbers that will be rounded to x, or { u \in [0, 1] : rho(u) = x } -- is an interval. Its measure, mu(rho^-1(x)), is the measure of the set of numbers in [0, 1] that will be rounded to x, and that is precisely the probability we want for x in our `uniform' distribution. Instead of drawing from real numbers in [0, 1] uniformly at random, we can imagine drawing from the space of infinite sequences of bits uniformly at random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, ...), and interpreting the result as the fractional part of the binary expansion of a real number in [0, 1]: 0*(1/2) + 0*(1/4) + 0*(1/8) + 1*(1/16) + 0*(1/32) + 1*(1/64) + ... Then if we round that number, we'll get a floating-point number with the desired probability. But we can't just directly compute that sum one term at a time from left to right: once we get to 53 significant bits, further bits will be rounded away by addition, so the result will be biased to be `even', i.e. to have zero least significant bit. And we obviously can't choose uniformly at random from the space of infinite sequences of bits and round the result. We can, however, choose finite sequences of bits uniformly at random, and, except for a detail about ties, after a certain constant number of bits, no matter what further bits we choose they won't change what floating-point number we get by rounding. So we can choose a sufficiently large finite sequence of bits and round that, knowing that the result of rounding would not change no matter how many extra bits we add. The detail about ties is that if we simply choose, say, 1088 bits -- the least multiple of 64 greater than the exponent of the smallest possible (subnormal) floating-point number, 2^-1074 -- and round it in the default round-to-nearest/ties-to-even mode, the result would be biased to be even. The reason is that the probability of seeing a tie in any finite sequence of bits is nonzero, but but real ties occur only in sets of measure zero -- they happen only when *every* remaining bit in the infinite sequence is also zero. To avoid this, before rounding, we can simulate a sticky bit by setting the least significant bit of the finite sequence we choose, in order to prevent rounding what merely looks approximately like a tie to even. One little, perhaps counterintuitive, detail remains: the boundary values, 0 and 1, have half the probability you might naively expect, because we exclude from consideration the half of the interval rho^-1(0) below 0 and the half of the interval rho^-1(1) above 1. Some PRNG libraries provide variations on the naive algorithms that omit one or both boundary values, in order to allow, e.g., computing log(x) or log(1 - x). For example, instead of drawing from {0, 1, ..., 2^53 - 1} and dividing by 2^53, some - divide by 2^53 - 1 instead, to allow both boundary points; - draw from {1, 2, ..., 2^53} instead, to omit 0 but allow 1; or - draw from {-2^52, -2^52 + 1, ..., 2^52 - 2, 2^52 - 1} (i.e., the signed rather than unsigned 53-bit integers) and then add 1/2 before dividing, to omit both boundary points. These algorithms still have the biases described above, however. Given a uniform distribution on [0, 1], you can always turn it into a uniform distribution on (0, 1) or (0, 1] or [0, 1) by rejection sampling if 0 or 1 is a problem. See for source code. Thanks to Steve Canon and Alex Barnett for explaining the binary expansion approach to me and talking the problem over. [*] For the pedantic who may stumble upon this before reading ahead, we cannot, of course, assign a meaningful probability to a choice of real number in [0, 1]. What we can do, however, is imagine that it were meaningful, and then formalize it in the next sentence with the correct sense of measure. 2013-10-08 On compression in data formats (Permalink: https://mumble.net/~campbell/2013/10/08/compression) If you have a large message which you want to sign and encrypt with OpenPGP, you might compress it first. Or you might compress it second. Or, if you're not thinking much about it, you might `compress' it last, although if that actually reduces the size there are some cryptographers who would like to have a word with you. In any case, the OpenPGP message format lets you do any of these, because there is a type of OpenPGP packet for compressed data, whose content is interpreted as another OpenPGP packet. OpenPGP agents, such as GnuPG, are expected to recursively process packets they encounter in a message, decrypting ciphertext and verifying signatures and decompressing compressed data, until they hit a ground case, usually a literal data packet. Since messages are built up by starting with a literal data packet and layering encryption, signature, and compression atop it, this process should always halt at a ground case, right? Well, no. Decryption and verification (or, removing a signature) always yield smaller packets than you began with, so there has to be a ground case for those, but decompressing usually yields a larger packet than you began with. So you might play a cruel trick on your friend by sending a very small email with a compressed packet that decompresses to a terabyte of zeros. Or you could send an email with a compressed packet that decompresses to...itself. How does that work? The Lempel-Ziv compression language is powerful enough to write a quine -- that is, a program that prints its own source code. Rather than repeat the story here, I'll defer to Russ Cox's article on how to write these: Russ Cox, `Zip Files All The Way Down', 2010-03-18. http://research.swtch.com/zip In the case of OpenPGP, it was particularly easy because OpenPGP supports a number of compression algorithms including an option without any CRC, so one can write a program that just spits out an OpenPGP compression quine without iterating over CRCs or solving a horrible system of equations. The result is CVE-2013-4402, and the lesson is that systematic recursive compression is no good -- not only that it's not useful, but it is actively harmful. It's especially harmful for programs that process input sent unsolicited from anywhere on the internet, namely mail readers. And OpenPGP isn't the only data format that supports recursive compression. PGP/MIME, which recursively interleaves MIME entities and OpenPGP packets, can probably exhibit the same issue, and the small bound on recursion that GnuPG now imposes while processing packets will be thwarted by the interleaving. S/MIME 3.1 supports a compressed data message type, although nobody seems to have implemented it. I'm sure there are formats outside mail that can also involve recursive compression. Which ones can you find? [Update, 2013-10-10: Evidently (and unsurprisingly) this is not the first public disclosure of a vulnerability based on a compression quine. IPsec compression of RFC 3173 has the same problem, leading to CVE-2011-1547.] [Update, 2013-10-11: Werner Koch points out that your friend can prevent the terabyte of zeros by supplying the `--max-output' argument to gpg to limit the size of the output GnuPG produces when processing a packet. Those curious to reproduce the exploit can find source code at to generate the compression quine.] 2013-03-21 Fun with C (or, @!*#%!#^&!^#$^@!%#) (Permalink: https://mumble.net/~campbell/2013/03/21/fun-with-c) Consider the following fragment of C code with the type T defined to be an integer type: uint64_t x = UINT64_C(0x0123456789abcdef); T n = 64; printf("%016"PRIx64"\n", (x & ~(n - 1))); Say we're on amd64, with sizeof(char) = 1, sizeof(short) = 2, sizeof(int) = 4, and sizeof(long) = 8, and two's-complement arithmetic -- nothing exotic or pathological here. Without running the code, can you guess what it prints for each of the following definitions of the the type T? typedef uint8_t T; typedef int8_t T; typedef uint16_t T; typedef int16_t T; typedef uint32_t T; typedef int32_t T; typedef uint64_t T; typedef int64_t T; Hint: It's not 0123456789abcdc0 in every case. 2012-10-31 Moving mathematics forward I have made a tremendous breakthrough tonight in the greatest open problem of all mathematics, the Riemann hypothesis, by finding an example of the elusive Riemann zeta pumpkin. Based on my extensive research into the literature on the subject[*], I believe I am the first to have come across such a beast. Unfortunately, I don't know how to talk to the camera with which I catalogued my evidence of this critter, so proof by photography will have to wait until I figure out how to talk to a USB PTP or MTP device in NetBSD. [*] 30 seconds of browsing Google image search results. [Update, 2012-11-02: I tried gphoto2 and libmtp, but neither worked. gphoto2 can identify the device (`gphoto2 -a'), but everything else fails with a `bad parameter' error or something. libmtp can't even seem to identify that there is a device. ktrace just shows that ioctl(USB_DO_REQUEST) fails with EIO or something; I think I need to dig into the USB layer to find out what's wrong. I'll add more details when I have physical access to the device again.] [Update, 2012-12-02: Here's a photograph: https://mumble.net/~campbell/images/riemann-zeta-pumpkin.jpg I'm afraid I cheated, though.] 2012-10-17 What's an epigraph? An anonymous interlocutor asked me to define the word `epigraph' some months ago. An epigraph's a witty pith, or perhaps a pithy wit, That you could start a chapter with to prime the reader's mind a bit. 2012-10-04 My verse may be worse but your prose hurts my nose. (dribblings from a late night when bad poetry was coming out like a fit of sneezes) 2012-08-24 What's wrong with Python (Permalink: https://mumble.net/~campbell/2012/08/24/wrong-python) I just found this Python code in the wild: f = open(pathname) try: x = foo(f) finally: f.close() process(x) This is wrong at so many levels I don't know what to say. And it wouldn't be fair to blame whoever was responsible for putting the code in this state, because this is very nearly the right way to write it in Python, and Python will happily accept this in spite of its insensibility. 2012-05-23 Energy and computation Critics have noted that the premise in _The Matrix_ of using the heat of human metabolism as an energy source, the energy premise, is hopelessly contrived and detracts from the film. And, indeed, it is hopelessly contrived, but this doesn't bother me because I have a much better, more plausible premise to replace it -- and a reason why they would never have used the better premise in the film. Instead of the energy premise, I take a computational premise: what the machines exploit the humans for is not energy but computation -- human brains are very good at learning quickly and optimizing approximate solutions to NP-hard problems. That way, the machines don't need to reverse-engineer the human brain or develop (`learn') approximation algorithms of their own. The computational premise, though, doesn't lend itself to a cute image of Morpheus holding up a D-cell and saying `...turning humans into this.' The energy premise is easy to explain, if contrived; whereas it would be a hopeless cinematic blunder to attempt to explain the computational premise to a general audience in Morpheus's brief monologue to Neo. Thus, I am satisfied with a better premise for the film, and forgive the creators for not using it. 2012-05-21 2012-05-17 Metre matters more to me than rhyme. Teeter tatters tore to tea, then thyme. ...wait, does that sound rhyte? 2012-05-13T07:34:55Z Is a poem like a joke in that if the meaning must be explained, then the value is lost? 2012-04-24 Mail Mail is hosed. Unreliable queues, mail stores without integrity checks or sensible indexing, broken flags, hopelessly confused MUA designs, and more... Argh! 2012-03-15T13:25:46Z Random `Fix random kernel memory corruption by algo_doublehash(). And by "random" I don't mean just "arbitrary" as in using an uninitialized pointer, but random as in corrupting the contents of memory addresses chosen using a crypto-strength random number generator.' 2012-03-15T06:23:47Z Beware the IDEs of March Beware the IDEs of March, my son -- The tools that bite, the builds that botch. Beware Eclipse, Emacs, and shun The kludgious Binfordsnotch! ...what? 2012-01-22 Laptop I have been asked a few times what laptop I chose to buy last year. The answer is that I didn't. The laptops I was most strongly considering were the Thinkpad W510 and the Sony Vaio Z. But I got fed up with shopping; instead I got a new disk for an old MacBook1,1 that I had lying around, and fixed some bugs in NetBSD to make it run on that. 2011-12-19 The FM Principle It often amazes me to marvel at just how much of computing is founded upon the FM principle. Woodpeckers. 2011-10-29 Antediluvian barmy curmudgeons! Desolate ergatives fuming their grudgeons! Heckling isthmus of jocular kerosene! Lepidopteric morose nitrotoluene! Peripatetic Orphean qwertigial Runcible syrup of toad urns vestigial! [If you comment on this article, please be sure to mention bananas.] Winkle-picked xanthines, ytterbium seder of Zoölogues, zits, Zapatistas, &c.! 2011-08-{29,30} paredit patches There's a pile of patches potentially for paredit 23 at . Try them out and let me know what they break, or, in the unlikely event that they don't break your whole editing feel, whether you like them, and what cases you especially like in them. That way, I can write down the cases so they won't be broken again in the future. 2011-08-29 Crypto library I am thinking of writing a crypto library to complement djb's well-known one by providing some bits he omitted. I will call it 1-[5-(1,3-benzodioxol-5-yl)-1-oxo-2,4-pentadienyl]piperidine. 2011-05-09 FOCUS THEFT SHOULD BE A FELONY !@*#^$@!&*#^@%!*#@$^%!#&*@!% 2011-03-32 New job, project status I have accepted a position at Microsoft in the finance department, working on in-house tools for optimizing Microsoft's tax structure. I am moving to Redmond; I fly out tonight. Anyone in the Seattle area know any excellent seafood restaurants? My new manager has asked me to port paredit to Visual Studio so that we can use it in-house, and since I have only limited time for personal projects, I shall be dropping support for paredit.el in GNU Emacs (and paredit.scm in Edwin). Today I completed and committed a long-running project to switch MIT Scheme to use dynamic scope by default, in preparation for the upcoming Scheme standard produced by the working groups; and I also committed my (incomplete) patches to change Edwin's default key bindings to match modern vi conventions. 2011-02-24 Paredit on old Emacs Does anyone still use paredit on XEmacs, or on GNU Emacs 21? 2011-01-17 Laptop I am looking to buy a new laptop. But I am picky. At I wrote down a lot of constraints. Can anyone help me to satisfy them? 2011-01-14 PGP I'm pretty frustrated with GnuPG. Straightforward operations (decrypt a message -- no, really, just *decrypt* it) are not straightforward to perform with the gpg utility. There are half a bazillion knobs controlling gpg's behaviour so that straightforward command lines do not have straightforward semantics. And half the operations can be done only interactively -- by which I mean not just that it prints prompts, but that it prints them to /dev/tty and reads input from /dev/tty. It's not obvious what the gpg utility is actually doing whenever I use it, especially when it prompts me for a pass phrase to decrypt one of my keys to make a signature without telling me what it's signing. Just about everything keeps state in several different files in one's GnuPG home directory; it's not clear what operations are actually stateless. There is no good introduction to the meaning and use of the trust model, or any coherent or interactive picture for it. The output of `gpg --list-keys', `gpg --list-sigs', `gpg --list-packets', and so on, is confusing on multiple levels. Here are three things I'd like to see in an OpenPGP implementation: 1. A collection of utilities to perform straightforward operations straightforwardly. Examples: a utility to generate an RSA key pair; a utility to decrypt a packet given the Elgamal private key; a utility to make each particular type of signature; &c. 2. A packet inspector. This could be graphical (maybe Gtk+), or it could be embedded into Emacs. It would replace `gpg --list-packets', and allow the user to interactively explore the packets, see all the information stored therein, decrypt messages, verify signatures, and so on. Think Wireshark for OpenPGP. 3. An interactive truth maintenance system. Fundamentally, a PKI is a system for distributing and interpreting signed assertions, such as `J. Random Luser is the owner of key 1234abcd', signed with key 1234abcd. An OpenPGP key ring is a pile of public keys and signed assertions. The cryptographic signatures make the assertions as plausible as they would be if made in person, but we need a computer to handle the crypto and draw plausible conclusions from the premises we specify. But we may find some of the conclusions suspect; in that case, we want to ask why the computer drew those conclusions, and what premises we must reject if we want to reject the conclusions. Or we may decide that we're not happy with some of the premises, and we'd like to see what conclusions lie on shaky grounds -- we'd like to see what conclusions we give up if we reject the premises. These are exactly the functions of an interactive truth maintenance system. The GnuPG trust model is essentially a set of inference rules, which could be used with such a truth maintenance system. But gpg provides no tool to explore the `web of trust' that its trust model induces; it just has an implementation of the engine to infer conclusions given some premises. Perhaps some day I'll actually write some of these, or find that somebody already has. P.S. It's hard to google anything about PGP. Google turns up `----BEGIN PGP SIGNED MESSAGE-----' a lot more than it turns up useful results. P.P.S. I have glanced at netpgp, a relatively new implementation of OpenPGP based on openpgpsdk that will probably be part of NetBSD 6.0. Its behaviour does not appear substantially different from that of GnuPG, though, other than that netpgp is less polished so far. 2010-11-04 Gtk+ In a Gtk+ program I use regularly, while entering text, I often accidentally hit C-w out of habit from Unix to kill the preceding word. Instead of killing the word, this closes the window, to my interminable frustration. It's not as bad as C-M-DEL zapping X when I want to kill the preceding S-expression in Emacs, but it is frustrating enough that I sought to fix this. It turns out that if you put `gtk-can-change-accels = 1' in your ~/.gtkrc-2.0, hover over the menu item to which the key is bound, and hit DEL, then the key binding goes away. Is this documented? Yes! Well, no. Well, sorta kinda, maybe? Actually, not really. When I googled `"gtk-can-change-accels"', the first hit I got was from gtk.org, and appears to be the only actual documentation relevant to the subject. All the other hits were from mailing list archives, forum posts, blug posts, and FAQs from various applications, saying that one can use the option to do such and such. So what does this actual documentation say? Well, first of all, it's API documentation, not remotely fit for user consumption. And here's all that it has to say about the option: `Whether menu accelerators can be changed by pressing a key over the menu item.' Does it mention hitting DEL? No. Does it say how to quote DEL, in order to bind it to an action? No, of course not! Does it mention ~/.gtkrc-2.0? No. In fact...is there any documentation at all for ~/.gtkrc-2.0? No! Not even API documentation! You just have to know about it! Googling it just turned up more mailing list archives, forum posts, blorg posts, and FAQs with tips about how to do such and such by frotzing your ~/.gtkrc-2.0 file... 2010-10-09 Paredit 22 Paredit 22 is released: Not all the patches got incorporated into it. Patches potentially for paredit 23 are in . 2010-09-20 Paredit 22 beta I have merged the aforementioned patches into a new paredit 22 beta, at as usual; lists the changes as the output of `darcs changes' on the Darcs repository at . Nobody has opined about a paredit mailing list or offered to administer one for me, so I presume there's no interest. 2010-01-01 Paredit patches, still Not enough users have given me feedback on the patches to paredit in . Thus, paredit 22 has not yet been released. See the (chronologically) previous (but textually next) entry. [Update: Since the time of writing, the aforementioned entry has self-destructed.] Should there be a mailing list for paredit? Contact me if you have an opinion on this question, or would like to offer to administer such a mailing list. 2009-03-28 More simply addressing the confusion about UNWIND-PROTECT in Scheme Suppose you want to release a resource after a part of a program that uses the resource. For example, consider a connection to a database that should be disconnected, a file descriptor that should be closed, or a buffer for sensitive data such as a pass phrase that should be destroyed by zeros or random data. Once the resource is released, it can be neither reused nor readily reconstituted. Is the resource represented in Scheme by a single object? If so, ensure that with the object is registered a finalizer that releases the resource, so that even in the face of an interactive ^C at the REPL the resource will be released. All of the above examples qualify. (define (open-database-connection database-descriptor) (let ((handle (allocate-database-handle))) (register-finalizer handle %close-database-connection) ;; After this point, if we are interrupted -- say with ^C at ;; the REPL --, somewhere the handle will be recorded so that ;; its resources can be released even if the usual logic to ;; close the connection is never reached. (%open-database-connection database-descriptor handle) handle)) Is it /necessary/ for the resource to be released immediately after the part of the program that uses it? If so, use REWIND-PROTECT at . In this case, control can never re-enter the extent that uses the resource. Of the above examples, only a buffer for sensitive data qualifies. (define (call-with-pass-phrase prompt receiver) (let ((buffer (make-pass-phrase-buffer))) (rewind-protect (lambda () (prompt-for-pass-phrase prompt buffer) (receiver buffer)) (lambda () (clear-pass-phrase-buffer buffer))))) Is it /desirable/ for the resource to be released immediately after the part of the program that uses it? If so, use UNWIND-PROTECT at . Of the above examples, both the database connection and file descriptor qualify. (define (call-with-database-connection database-descriptor procedure) (let ((handle (open-database-connection database-descriptor))) (unwind-protect (lambda () (procedure handle)) (lambda () ;; If a continuation is captured inside PROCEDURE so that ;; this cannot execute immediately when PROCEDURE returns, ;; then closing the database connection here is redundant, ;; since registered with HANDLE is a finalizer that will ;; close the connection. However, provided that the ;; finalizer have no effect if the connection is already ;; closed, this redundancy is harmless and enables ;; UNWIND-PROTECT to close the connection immediately if ;; no such continuation has been captured. (close-database-connection handle))))) Rationale: Scheme's CWCC (or composable continuations) enables powerful control abstractions such as inverting control. For example, (FOR-EACH ) applies to each element of in sequence. With CWCC, one can reify the iteration of the list by FOR-EACH into a first-class cursor object that can be advanced independently. This way, for example, we can traverse multiple lists in parallel using only the one-list FOR-EACH procedure. If we rather consider a procedure (FOR-EACH-QUERY-RESULT ), the situation is not much different, except that FOR-EACH-QUERY-RESULT may entail the opening of a connection to a database, and the connection is a resource which should be released after the query results are finished being received. Using CWCC to invert control of FOR-EACH-QUERY-RESULT is just as valid as using CWCC to invert control of FOR-EACH, but if FOR-EACH-QUERY-RESULT uses REWIND-PROTECT (a common idiom in unfortunately much Scheme code), it inhibits control inversion. UNWIND-PROTECT allows the control inversion. Why involve finalization in UNWIND-PROTECT, rather than just associating a finalizer with the object representing the connection to the database? The difference is in the intent implied by the use of UNWIND-PROTECT that if control cannot re-enter the protected extent, the protector can be immediately applied when control exits the protected extent extent even if no garbage collection is about to run finalizers. Note that one cannot in general reconcile (1) needing to release the resource immediately after control exits the extent that uses it, and (2) enjoying the benefits of powerful control abstractions such as inversion of control. However, if (1) is relaxed to (1') needing to release the resource immediately after control /returns normally from/ the extent that uses it, then one can reconcile (1') and (2) by writing (CALL-WITH-VALUES (LAMBDA () ) (LAMBDA RESULTS (APPLY VALUES RESULTS))), or simply (BEGIN0 ) using the common name BEGIN0 for this idiom. (In Common Lisp, this is called PROG1.) Also note that condition systems are not involved; they are a red herring. Any condition handler that releases a resource should do so only when the condition implies that the resource is no longer usable; for example, if a database connection becomes desynchronized somehow and cannot proceed, then it is reasonable for the handler for this condition to close the connection and destroy any of its state. However, when handling conditions that do not imply that the resource is no longer usable, the resource should not be released. An enclosing handler may continue within a protected extent, or even exit and re-enter the protected extent, for instance while iterating over a reified database cursor procured by control inversion. Handling any condition by releasing the resource defeats these useful operations. At is an implementation of UNWIND-PROTECT for MIT Scheme. This definition is lighter-weight than the portable one. However, it does not attempt to run the protector immediately if it can. Appendix A: John Cowan has asked for elaboration on the inversion of control for iteration with FOR-EACH. (define (list->stream list) (iteration-procedure->stream (lambda (receiver) (for-each receiver list)))) (define (iteration-procedure->stream iteration-procedure) (lazy (call-with-current-continuation (lambda (return-stream) (iteration-procedure (lambda (element) (call-with-current-continuation (lambda (return-to-iteration) (return-stream (stream-cons element (lazy (call-with-current-continuation (lambda (return-cdr) (set! return-stream return-cdr) (return-to-iteration)))))))))) (return-stream stream-nil))))) ;;; Simpler definition with reset & shift. (define (iteration-procedure->stream iteration-procedure) (lazy (reset (begin (iteration-procedure (lambda (element) (shift continue-iteration (stream-cons element (lazy (continue-iteration)))))) stream-nil)))) Appendix B, discussion question: Should UNWIND-PROTECT evaluate protection forms in the dynamic context of the call to UNWIND-PROTECT? This rules out using UNWIND-PROTECT within REWIND-PROTECT, which may be an onerous requirement. 2009-03-21 On file systems, Unix, and brain damage [updated] Earlier this month, users of the alpha version of Ubuntu, the first to support the ext4 file system by default, reported an immense increase in data loss after crashes leading to corruption of their desktop environments. After a crash, many files that were recently written were truncated on disk to zero bytes.[1] How did this happen? To atomically replace a file's old data with new data, applications write the new data to a file linked at a temporary pathname, and then rename the link to the permanent pathname, overwriting the link to the file with the old data. This sequence of operations, with the atomicity guarantee of the POSIX rename() function, guarantees that at any time, at the permanent pathname will be a link either to a file containing the old data or to the file containing the new data. This guarantee is made, at least, during normal system operation. POSIX guarantees nothing when the system crashes or the power fails. If the system is even bootable after such an event, POSIX permits it to have replaced the contents of the file system with a collection of cat macros. But while this behaviour is permitted, it is not useful. System crashes and power failures are facts of life, and a well- engineered file system should limit, not exacerbate, their damage to user data. In the name of performance, the ext4 file system as it was bundled with the alpha version of Ubuntu would delay the allocation of blocks on the disk for a file so that they can be allocated contiguously together when needed. Periodically the system forces allocation of blocks and commits them to the disk; by default, this happens on sixty-second intervals. But the ext4 file system would sometimes commit renames to the journal before allocating blocks for the new data, during a sixty-second window. If the system crashes, and the journal is replayed, then the rename may be applied -- before the new data are written to the disk. This effectively reorders the sequence that many applications rely on to atomically replace files, destroying its atomicity. The developers of the ext4 file system answered with several `solutions': [2], [3] (1) make your systems more reliable with UPSs and by avoiding buggy proprietary drivers that cause such frequent crashes, which the developers of the ext4 file system don't encounter; (2) use SQLite and other databases instead of writing files to disk; (3) change all the applications that do write files to disk so that they fsync before closing and renaming; and if that becomes a performance problem then (4) run fsync periodically in a separate thread. Solution (1) is no excuse for a reliable file system, and is a dangerous sentiment to be held by the developer of a file system likely to find widespread use in personal computers running Ubuntu, computers which may not have the resources available in data centres for reliability. This suggests that the developers of the ext4 file system have never seriously stress-tested it with forced power cycles during heavy I/O loads! About solution (2), and the general feeling of `we like databases!', I can only refer to jwz's rant about mbox summary files. Technical merit aside, solutions (2) and (3) are impractical because there are simply too many programs to change. Solutions (3) and (4) are disturbingly misguided. Even if it were practical to change every application's write/rename sequence into a write/fsync/rename sequence, to do so would be semantically wrong. Only programs that require a file's data to be written to permanent storage before proceeding should use fsync; this is what fsync does. For example, editors such as Emacs fsync files before informing users that the files are written to disk; users are unhappy if the files that their editors told them were written to disk were not. Mail receiving agents fsync files where they store mail before telling the sending agents that the mail has been accepted; users are unhappy if they are told that mail was delivered when it was not. Programs that require only that the replacement operation be atomic, and that it not leave a file in a corrupted intermediate state, should not use fsync, because the purpose of fsync is to provide a guarantee which these programs do not need. For example, the KDE window configuration is not critical user data, but may be stored on the file system. It is OK for the machine to forget that a user has dragged a window a few pixels, but it is not OK for KDE to read a corrupted intermediate state of the file between the old and new versions -- hence atomic write/rename replacement without fsync. Extraneous fsyncs, in fact, degrade performance, especially for small files that are frequently overwritten -- it gives the file system no opportunity to keep the intermediate versions solely in memory. Calling fsync periodically in a separate thread is nonsense -- if a program cannot proceed until a file's data are on permanent storage, it should call fsync and wait; if a program can proceed, then it shouldn't call fsync. The operating system already periodically synchronizes the file system with permanent storage -- at least, one would hope so; if it doesn't do this, then it has a bug. Actually, fsync does not guarantee that a file's data are written to permanent storage before proceeding. POSIX (or at least the SUSv3, ) claims that it shall: The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected. Many man pages echo this sentiment. For example, NetBSD's fsync(2) man page begins with: fsync() causes all modified data and attributes of fd to be moved to a permanent storage device. But on AIX, Darwin, Linux, NetBSD, and probably other Unices as well, fsync does not actually guarantee that the file's data are written to permanent storage. Instead it only nudges the file system to force some blocks associated with the file slightly sooner. If the file system is backed by a hard disk device that has a write cache, the blocks might just end up in the write cache -- which is volatile, and which may evaporate in the case of a power failure. Common protocols by which operating systems communicates with storage devices have commands to request that the storage devices force their write caches to permanent physical media. Some devices ignore these commands, in which case one cannot win, but the commands can be sent nevertheless, and this is the best effort that the operating system can make. And AIX, Darwin, Linux, and NetBSD all provide ways for programs to request this -- different ways, that is. AIX and NetBSD provide a system call fsync_range, which has a flag FDISKSYNC (NetBSD) or O_DSYNC (AIX) that requests any blocks associated with the file to be forced to permanent physical media. Linux provides a fancier but similar system call sync_file_range. Darwin provides an fcntl command F_FULLFSYNC. These approaches are all slightly different, and for file descriptors not backed by permanent storage devices they all report different errors. So just to force a file's data to disk reasonably reliably on at least four different Unices and to ignore the cases of file descriptors not backed by permanent storage, it takes a good seventy lines of code littered with cpp conditionals, together with autoconf or autoconf-like detection of the relevant system calls. Ugh! Fortunately for the hapless users of Ubuntu, the developers of the ext4 file system have patched it so that renaming a link to a file that overwrites a link to another file forces any pending data for the first file to be sent to the disk before the directory structure is updated to reflect the rename. This may not guarantee that the data are actually, really, truly on permanent physical media, but it's better than waiting sixty seconds before they are even delivered to the device that will eventually write them to permanent physical media. But this kerfuffle has not inspired confidence in me about the ext4 file system. In [3], the developers of the ext4 file system make such absurd suggestions as making a laptop mode in which the kernel ignores fsync, so that even those applications that use fsync semantically correctly will not have the very guarantee that is its sole reason for existence! When did performance become a concern overriding robustness in file systems? [1] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781 [2] http://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45 [3] http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [Update, 2009-03-26: I have been pointed to Linus Torvalds' ideas on the subject, archived at . I hope that the ext4 developers take heed of his sentiment.] [Update, 2012-01-22: All file systems are hosed. Some are just more obviously hosed than others. Argh.] 2009-03-21 Correction to the optimization of resource release in control brackets Due to an unstoppable deluge of the concern of readers after the 2009-02-04 correction, I am obliged to correct the optimization in the original 2006-05-01 entry. The correct optimization is to associate a flag with each wind point which is by default clear and which will be set by CWCC or by any procedures that reify composable continuations. (This is not an onerous requirement: invocation of the reified continuations must run in time linearly proportional to the depth of the wind point tree anyway.) If the flag is clear when the after thunk is called, then the finalizer can be run immediately. 2009-02-04 Correction to control brackets and resource release It has come to my attention that I identified the wrong criterion for when protector thunks established by UNWIND-PROTECT may be called. The criterion I stated is that control may never again return from the protected thunk. The correct criterion is that control may never again re-enter the protected thunk. (Consider, for example, a coroutine that does not terminate and return.) The correct definition for *UNWIND-PROTECT in the appendix is: (define (*unwind-protect thunk protector) (let ((protector-cell (make-protector-cell protector))) (dynamic-wind (lambda () (identity-procedure protector-cell)) thunk (lambda () unspecific)))) In Scheme systems that provide an REGISTER-FINALIZER procedure, which requests the garbage collector to apply a procedure when a given object's storage is reclaimed, one might write: (define (*unwind-protect thunk protector) (let ((object (cons 0 0))) (register-finalizer object protector) (dynamic-wind (lambda () ...refer to OBJECT to prevent reclamation of its storage...) thunk values))) 2009-01-04 I forgot about the 2007-mm-dd entry. I'm not sorry. 2007-mm-dd On macros and expression in Scheme [Placeholder for a long-promised entry which I haven't yet forgotten about.] 2007-08-23 On the Theatrical Security Administration's cleverness During my flight to Seattle one month ago, my case containing a guitar was `randomly' selected to be searched, and the TSA left a gift for me inside: an Orwellian notice about protecting me and my fellow passengers by randomly violating our constitutional rights (which might, they add, involve accidentally damaging the bags, with no liability on the TSA). Flying back to Boston, I left a prominent copy of the Fourth Amendment to the US Constitution in the same case -- specifically, in the body of the case. When I opened up the case this morning, I found no notice from the TSA, but the slip of paper on which was written the Fourth Amendment was tucked into the very top of the case, where the headstock of the guitar goes. The paper does not fit through the neck of the case; it could not have slid there by its own volition, even when tossed by the roughest of baggage handlers. I'll shortly discover whether the TSA's X-ray machine affects ASA 400 film, too. The TSA goon who demanded to put my camera and its film through that machine did let pass an unintentionally helpful hint, though: to avoid damaging any film, just tell the TSA goons that its speed is ASA 1000! 2007-07-16 On debugging in Scheme [updated 2046-03-32] The two Scheme systems that I use primarily these days are MIT Scheme and Scheme48 . Both have decent debuggers that work for most cases. Both have facilities for limiting output to avoid excessive output for deeply nested data structures, or infinite recursion in the printer for circular data structures. Both have rudimentary but working mechanisms for examining pretty much all data in the system. (I can make objects that Scheme48 and MIT Scheme fail to inspect, but most people can't, and this is an uncommon operation.) Let me make this clear right now: these are rudimentary tools. They have some minor niceties (for instance, the usual mode of operation in MIT Scheme's debugger is to have a REPL within every stack frame), but these minor niceties are not especially profound. Rudimentary as these tools are, though, they *work*, and they suffice to examine and debug any reasonable program. And MIT Scheme and Scheme48 are nearly unique in the respect of having general, if rudimentary, debugging tools. This baffles me. This flummoxes me. This flabbergasts me. (No, this is not something new; this is something that I have been aware of for a long time, and privately fumed about.) Today I wanted to debug a program in Gambit. Gambit, too, has a rudimentary debugger. Yet it completely fails in two prominent ways: (1) There is no limitation to the output at the REPL. (The values of local variables on stack frames are printed truncated, but that doesn't help if I want to inspect them more thoroughly.) Although it is possible, with a bizarre incantation about *read* tables (boggle) to protect the *writer* against circular output, there is no apparent provision for limiting the depth of the output, and the default is for the writer and pretty-printer to lose horribly with infinite recursion until you hit C-c. And the default format for printing records involves recursively printing all of the records' components. Graphs? You lose. (2) There is no object inspector by which to interactively inspect the components of objects without having to grovel through megabytes of output from an infinitely recurring writer. No, there is not even a way to inspect the local variables of a closure found on the stack. In fact, I couldn't even find a way to look at the condition that was signalled. And Gambit is better than most other Schemes. Chicken will print a spare trace of call frames that were on the stack. That is all. It does not even show the nesting of the frames; garbage frames that have already returned, but are retained on the stack because of Chicken's architecture, are shown along with everything else. And the stack is traced *after* the error message is printed, so if the error message printed a complex object, what the poor, deprived programmer will see as a trace of the call frames on the stack is procedures used to print the error message -- information that is completely irrelevant to the actual problem. I wrote an object inspector for Chicken (the `dissector'), but it's useless because (1) records no longer contain any introspective information, and (2) it is impossible to get at the objects in call frames on the stack. DrScheme will give a helpful display of a flurry of arrows covering the whole source file in question, so that programmers can see by what flow of control they arrived at the error. Flow of data? What's that? If you plan to create a new implementation of Scheme which you mean to be useful to other humans, *please* implement a general debugger and object inspector. The state of debugging tools in Scheme is absolutely appalling and utterly inexcusible. (I'm told that Chez has a general, if rudimentary, debugger as well; because Chez is commercial and proprietary, however, and the free software Schemes out there meet my needs far better than Chez does anyway, I don't have a licence for Chez with which to experiment with its debugger.) 2007-07-15 On the public domain, licence terms for software, and practical legal choices I am not a lawyer. This is not legal advice. I have not pursued legal studies, and have no background in civil litigation, copyright law, or contract law. (The terminology that I employ in this article will without a doubt induce any lawyer to cringe.) For that matter, I'd rather devote my time to hacking than to burying my head in legal studies for decades: Scheme is a much nicer language for expressing ideas concisely and clearly than English, but unfortunately our laws are expressed in English. It is precisely because I am more interested in hacking than legal analysis, in fact, and because I should prefer never to worry about embroilment in courts of law as a consequence of careless decisions, that I write now. For the past couple of years, I have been releasing my publicly available software into the public domain. I ought to be precise, however: I have been releasing said software with a message claiming that it is in the public domain. I ignored remarks equivocating the notion that any works can be released into the public domain, my reason being that it was simply absurd for an author to be unable to forswear any and all claims of copyright: surely, if an author has no intention of claiming copyright, she ought to be able to relinquish the right of doing so. And the simplest status for works lacking copyright claims is release into the public domain. Not only did I place message stating that the code was in the public domain, but I also added a disclaimer of all warranties -- a rather inept and negligent disclaimer, which read `All warranties are disclaimed.'. My reason for this was that I preferred to avoid cluttering all of my files with long, uppercase, distracting disclaimers of all warranties that some lawyer at Berkeley could think of and enumerate; and that I preferred also to avoid requiring the distribution of a separate LICENCE file with every other file I released, purely for the sake of laziness. What I failed to consider, though, was the actual implications of these messages. Consequently, on realizing this, I briefly considered the following line of reasoning, after encountering another suggestion that works may not clearly be released into the public domain: 1. What matters in the preamble to the software is what terms a court of law will accept and base decisions upon. 2. If a court of law will not accept a statement that the software is released into the public domain, there is no reason for it to prefer the acceptance of a statement of licence terms: one could imagine two copies of a file, one with the terms and one without, and of neither one could anyone claim authority. My second point was faulty, though, because it questioned only witness credibility, which is an unrelated issue; and I omitted a third point, on contract law, warranty, and gifts. I honestly don't know whether a court of law will accept the preamble to a source file, but my guess is that it probably would. The more important question is whether a user (perhaps some corporate entity -- whether good or evil, according to your favourite definition of that spectrum --, or some governmental entity, or anyone) can have confidence in the terms set by the author. There are two aspects of the software in which users may wish to have confidence: (1) the users want to be confident that the copyright and licence terms are fixed, and that the author will never pursue or even threaten litigation for violations of licence; and (2) the users may also want the power to pursue the author for damages if the author's software fails at the cost of the users. Contrariwise, the authors of free software (except perhaps djb) prefer to disclaim themselves of any such warranties. [This entry is not yet finished. Sorry.] Summary: The public domain is murky, even if it may seem to make common sense. I prefer the modified (3-clause) BSD licence for similar terms and greater protection, or at least a clearer idea of the protection for all parties involved. It is also fairly concise. Furthermore, it has been examined by lawyers. [I don't know whether it has been tested in court, and don't have the time or inclination to research this in greater detail. Any references to cases involving the 3-clause BSD licence would be welcome, though.] I repeat, though, that I am not a lawyer, and that this is not legal advice. If the terms of the modified BSD licence are not to your taste, John Cowan has a utility for choosing another one by some simple criteria: . 2007-07-14 Summarizing several issues with compilation and interpretation I ran out of steam writing last month about compilation and interpretation, so here I'll try to summarize as succinctly as I can a few points about which a disturbing number of people are confused. Look to the previous entry for definitions of `interpreter' and `compiler'. 1. Most implementations of Lisps are not pure interpreters; most are based on compilers in some way or another. Call them implementations, not interpreters or compilers: they may *have* interpreters or compilers, but they are not themselves interpreters or compilers. 2. A `compiler' is not a tool chain that turns source code into stand-alone executables. A compiler is a tool that analyzes the meaning of programs and generates programs in another language with equivalent meaning. 3. Your physical machine is an interpreter. The universe is a kind of interpreter as well. 4. No compiled program is `stand-alone'. Not only do compiled programs still require interpreters to execute them, but they also must be structured in the appropriate format for the intended operating system under which they will run, and they generally require support from some run-time libraries. It just so happens that most computers that most people are interested in developing programs for have libc built-in. It also happens that most computers don't already have the run-time libraries needed by Lisp programs. 2007-06-19 Concerning compilation and interpretation, and an interpretation of the compilation of confusion surrounding their meaning and execution I often find myself in an unpleasant proximity to a bickering arglebargle about whether compiled languages or interpreted languages are better; whether Java is compiled or interpreted; whether Sun's Java implementation (or others) is a compiler or an interpreter; whether Lisp can conceivably be compiled, if it has EVAL built-in; how Lisp systems like SBCL can be compilers if they can't spit out files that someone's favourite operating system can execute (which is, in fact, not really true); and other matters of confused frivolity concerning compilation and interpretation. Sometimes I even find a participant or two in the discussion who tries to allay the heat by suggesting that the distinction between compilation and interpretation is murky at best, citing examples like Java implementations that involve both Java to byte-code compilers and byte-code interpreters, or modern x86 implementations, which translate the x86 instruction set every time the i-cache is refilled into an internal more RISC-like format to execute. Sometimes I even do this myself, if I am in a hurry and lack the time to explain the subject more precisely. Let us commence with definitions: /Compilation/ is the analysis of a program's meaning, rendering the same semantics into another language. A /compiler/ is a total function from programs to meanings. /Interpretation/ is the execution of a program given the state of the machine, either diverging or yielding a final answer. An /interpreter/ is a partial function from programs and machine states (or simply machine states, if the program is a part of the machine state) to answers. (Programming language theoreticians in the audience will recognize that compilation corresponds with denotational semantics and that interpretation corresponds with operational semantics. I hope, however, that such members of the audience are already adequately familiar with the concepts that they need read no further. I hope.) A usable language implementation that is based on a compiler also requires an interpreter. When most people hear about `compilers', they are hearing about compilers that map programs in languages like C to machine programs, i.e. programs in languages for the instruction sets implemented by physical processors such as the x86, the PowerPC, the SPARC, the PDP-10, and so on. [This entry is not yet finished.] 2007-05-13 SLIME48 (Apologies: the post prognosticated by the previous preparation has been permanently postponed.) After several long months of version incompatibility, there is a new version of SLIME48 available that will work with CVS SLIME as of 2007-05-13. A tarball is available here: . Bug reports are welcome. Further information is at . 2007-05-09 In preparation for a clarification of macros in Scheme, and their implications on style This blag has remained dormant for a very long time. I must deeply apologize to my avid readers who no doubt were left in a state of prolonged mourning for the stagnancy of their only worthwhile source of meaning in their lives. In a short while, however, no longer will you poor souls remain cold, alone, and bereft of your font of Schemerical wisdom, for I shall soon bring flames upon the now cowering subject of macros in Scheme -- from DEFMACRO to SYNTAX-RULES, and all in between and beyond --, and their philosophical implications on style and expressiveness of programs! For you who cannot stand to wait until the weekend (when I hope to find time for this inflammatory ecriturial extravaganza), I give you a brief taste, the draft of a Lisp style guide whose composition I commenced this past weekend: 2006-09-16 On networking interfaces in Scheme Most Scheme systems have very limited networking interfaces, usually with the notion that BSD sockets are not nice, and that `high-level' verbs like `open a TCP connection to this IP address, on this port' are nice. This is a bad idea, and Scheme systems ought to support wrappers around the whole BSD sockets API. The BSD sockets API is really a fairly extensive generalization of many different flavours of sockets, classifying them according to types with common operations (stream sockets, datagram sockets, &c.), across many different networking domains (IPv4 and IPv6 being the most prevalent today, but of which there are many others, some of which remain useful, like local Unix sockets). But instead of taking advantage of this unification set down by BSD sockets, most Scheme systems *specialize* the interface to provide wrappers for TCP (i.e. internet stream sockets), and sometimes UDP (i.e internet datagram sockets), and sometimes local Unix sockets (with one of two types). In the first place, this is poor practice simply because of the amount of work necessary for the same returns. In essence, the specialized wrappers tear down a layer of abstraction merely by pretending that it isn't there -- while actually building on top of it. The result is that different specialized wrappers must be made for everything intended to be wrapped, and a great deal of code must be duplicated. There is a great deal of C code out there, used as the basis of Scheme networking libraries, that duplicates the same system calls just with different constant arguments (SOCK_STREAM vs SOCK_DATAGRAM, AF_INET vs AF_INET6 vs AF_UNIX, &c.), or different types of socket addresses (sockaddr_in, sockaddr_in6, sockaddr_un, &c.). Few Scheme systems expose these constants and socket address structures to Scheme; most just hide the `messy details' in C. Actually, `the same returns' is not quite true anyway. The wrappers are generally very fixed in form, and inextensible. Most don't even support any socket options, or have specific procedures to deal with specific socket options. Few have any support for multicast IP. I have never seen support for out-of-band data transmission in any specialized socket wrapper for Scheme. The distinction between shutdown and close is usually either absent or swept under a rug. A general BSD sockets wrapper gets all of these for free, on *any* socket type or domain (except for multicast IP, of course), and can be made to be easily amenable to extension for anything not already wrapped, as long as sockets' descriptors can be exposed straightforwardly in an FFI. Specialized wrappers have to duplicate all this functionality for every type and domain of socket they distinguish between, and they rarely expose any internal interface for FFIs -- that, or they simply don't support any of it. Protocol independence in programs is impossible with specialized OPEN-TCP-CONNECTION procedures: IPv6 has been around for over ten years, and many standard libraries support it trivially through abstractions like BSD sockets in conjunction with getaddrinfo(3), yet programs that support IPv6 well are few and far between because programmers specialize the sockets with the AF_INET family -- and Scheme systems that specialize the BSD sockets to this family specifically *discourage* protocol independence. Unfortunately, programmer laziness in conjunction with a misguided desire for `nicer' interfaces tends toward not supporting any more esoteric capabilities of BSD sockets or flexibility for later extension or protocol independence. Of the Schemes I can think of off the top of my head, only scsh and Gauche have BSD sockets interfaces. These all have specialized wrappers, usually just for TCP, and some UDP, though I think none support any local Unix sockets: Scheme48, Chicken, PLT, Gambit, RScheme, MIT Scheme, SISC... 2006-09-14 R6RS The first public draft of the Revised^6 Report on the Algorithmic Language Scheme has been released, at . The module system, the string API, and the arithmetic specification are all still quite broken. A number of other smaller problems pervade the report. They claim that an operational semantics has been written, but it's not included, and the denotational semantics has been taken out. Why, I do not know. I'm not sure whether I have enough interest left in it to rant much more about it. 2006-09-06 Zob veeblefitzer Foo bar baz quux zot. Frob grovel full lexical... Mumble; chumble spuzz. 2006-07-30 Update: Guillaume Germain found in quantum superposition between Washington and Minnesota Title says it all, folks. 2006-05-29 Missing: Guillaume Germain, author of Termite Monsieur Germain was last seen a little over a week ago in the #scheme IRC channel on Freenode, declaring that he was preparing to undertake the dark journey into the Lesser Evil Empire of Amazon. We can be thankful that he turned down Temptation when she came to his door with an interview at the Greater Evil Empire, but now, even at the Lesser one, he has disappeared and not been heard from in over a week. The Scheme Underground has not published a press release stating that he has possibly been detected in a dark, dank, and dingy dungeon in a basement somewhere in Washington, but the details are sketchy. Preparations are not being made for a raid in the area. [Update: M. Germain appeared spontaneously to inform me that his surname is spelt without a trailing `e', and subsequently vanished just as spontaneously. Whereabouts are still unknown.] 2006-05-29 Pink chickens Chickens, under their feathers, are pink. I don't know whether any of you have ever plucked and feathered a chicken, but it turns out pink underneath. At least, this is what I presume, because I have not myself ever feathered a chicken. I have seen only featherful chickens in farms and chickens in supermarkets which other persons have already had the kindness to feather for me. Still, I can assure you that chickens are pink under the feathers. Now, what are really peculiar are chickens whose *feathers* are pink, and I mean not just slightly pinkish but really bright, hot pink. These things are really bizarre, and I have never seen one in my life; nor have I ever heard of them. But can you imagine seeing a chicken there, yet with blazing pink photons streaming at your retinas, instead of their ordinary brownish reddish hue? [Update: I am informed, among the voluminous profusion of commentary from readers, that an ideal such chicken would be five feet in height and capable of wearing a saddle.] 2006-05-01 On control brackets and resource release [Note, 2010-01-01: There are some errors in this entry. See the entries 2009-02-04 Correction to control brackets and resource release 2009-03-21 Correction to the optimization of resource release in control brackets. For a conciser summary, see the entry 2009-03-28 More simply addressing the confusion about UNWIND-PROTECT in Scheme.] DYNAMIC-WIND has long been a source of contention in the Lisp world. It serves a very particular purpose, but this purpose is not very well understood, and many think that this purpose is the same as the purpose of other Lisps' UNWIND-PROTECT. While the general idea of UNWIND-PROTECT is useful, it is not directly applicable in Scheme, because Scheme has more powerful control abstractions than other Lisps', and it therefore requires more powerful means of controlling them. (UNWIND-PROTECT form protection-form ...), a special form, evaluates the first form and returns its value, but before returning the value also executes the protection forms. Furthermore, if a throw in the primary form transfers control to outside the whole UNWIND-PROTECT form, this, too, will execute the protection forms. This allows programmers to *reliably* clean things up at certain points in the program -- for instance, to close open files, to shut down sockets, to release database handles, &c.; here is hypothetical example of its use: (define (call-with-input-file pathname procedure) (let ((input-port (open-input-file pathname))) (unwind-protect (procedure input-port) (close-input-port input-port)))) (DYNAMIC-WIND before during after), a procedure of three procedural parameters, calls the during thunk, ensuring that any transfer of control into it calls the before thunk, and that any transfer of control out of it calls the after thunk. Normal control transfers, i.e. the initial call to the during thunk and its final return, qualify as control transfers too. This means that, any time control is inside the dynamic extent of the during thunk, any state established by the before thunk will be in effect, but any time control is outside its dynamic extent, that state will be torn down by the after thunk. Here is an actual example of DYNAMIC-WIND's use, from the Edwin text editor: (define (with-current-local-bindings! thunk) (dynamic-wind (lambda () (install-buffer-local-bindings! (current-buffer))) thunk (lambda () (uninstall-buffer-local-bindings! (current-buffer))))) This ensures that, within THUNK, all Edwin variables have any values specified by the buffer's local bindings. This is the kind of use that DYNAMIC-WIND is appropriate for: a sort of control bracket that ensures the presence of certain dynamic state or context within a certain dynamic extent. Now, unfortunately, DYNAMIC-WIND can be seen as somewhat similar to UNWIND-PROTECT, because for basic cases (involving no sophisticated exploitation of Scheme's continuation reification) DYNAMIC-WIND with a null entrance thunk seems to be equivalent to UNWIND-PROTECT. This is misleading, however. Either control may return to the dynamic extent of the during thunk in the DYNAMIC-WIND, in which case the after thunk will be called twice -- which is unacceptably wrong --, or the before thunk really signals an error if it is called twice. This, though, violates the philosophy described in the first sentence of the R5RS: Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Signalling an error in a DYNAMIC-WIND before thunk is usually a limitation and a weakness in a program. We have an *extremely* powerful means of abstraction of control -- CWCC[1] -- and we ought not to prevent it from working merely because we want to prevent a certain fragment of code from running more than once. This should, however, spur a deeper look at why the code is being run more than once in the first place. DYNAMIC-WIND is meant to establish local dynamic state for specific dynamic extents within programs, by setting it up in before thunks and tearing it down in after thunks. These operations, to set up and to tear down dynamic state, are meant to be reversible, local changes.[2] That is, permanent actions, such as closing a port, that one would put in a protection form of an UNWIND-PROTECT, are entirely inappropriate in a DYNAMIC-WIND after thunk. Yet the basic idea of UNWIND-PROTECT is still useful. What, exactly, though, is useful from UNWIND-PROTECT, and what is incompatible with Scheme? The concept of permanent finalization is useful, and the identification of a good point to perform it is useful. *Requiring* that it be performed at this point, though, is incompatible with Scheme, because this point may be passed several times, yet we must perform the finalization once and only once. Many languages, and many Scheme systems, provide ways to ensure that certain code be run when an object is no longer reachable. We can apply this same idea, in fact, to continuations! The protection thunk in an UNWIND-PROTECT of a lesser Lisp is run only once the continuation is about to become unreachable. The only difference between Scheme and CWCCless Lisps is that the point at which continuations become unreachable is unfixed in Scheme. So we must use a general object finalization mechanism -- for the kind of object known as a continuation. This is easy to concoct, though: we need only to make an object, to associate with that object the protection thunk so that it will be executed when the object become unreachable, and to make a link from the continuation to that object. There is an implementation of this in MIT Scheme at the end of this entry. This, though, is non-optimal. The protection thunk may run at any time in the system. We should *like* it to be run as soon as control returns through the continuation in question, but this is valid only if that continuation has not been reified and will not ever be reused. We can say that a continuation is /dirty/ if it has been reified and may be instantiated again, and that a continuation is /clean/ if and only if there will be exactly one instance of it, which will be destroyed as soon as control returns through it. If we store a bit in any continuation involved with UNWIND-PROTECT, i.e. any continuation with an associated protection thunk, then we can initially clear the bit, to indicate that it is clean, and instrument CWCC to mark all such continuations as dirty by setting their bits. When control returns through a protected continuation, it checks the bit. If the bit is clear, then the continuation is clean, and it runs the protection thunk, because control will never again return to this point; it also deregisters any finalization procedure. If the bit is set, then the continuation is dirty, and general finalization system for objects of unlimited extent will handle the protection thunk. (We need to be careful in the case of a reified continuation instantiated on the stack whose heap storage has been reclaimed. The continuation will be marked dirty, but before control returns through it for the last time its protection thunk might be run, because the GC could finalize its heap storage.) Both DYNAMIC-WIND and UNWIND-PROTECT are useful for different applications in a language with powerful control abstractions such as Scheme. DYNAMIC-WIND, though, does not subsume the operation of UNWIND-PROTECT, and UNWIND-PROTECT cannot be taken literally from a language with weaker control abstractions. By examining the core of the intent of UNWIND-PROTECT, though, we can separate its actual purpose from an optimization. We can implement the actual purpose, and then analyze cases where the optimization can still be useful, in order to provide UNWIND-PROTECT efficiently in Scheme. The result is that we can retain the powerful control abstraction of CWCC and in the same language integrate the same idioms of resource release as other languages provide. Of course, there are still some limited cases where restriction is really necessary. There are a number of procedures in MIT Scheme that work with sensitive strings, and into which reentry is not an option, in order to enforce the proper destruction of sensitive information. In this case, DYNAMIC-WIND is appropriate to reject reentry and to nullify sensitive strings on exit. [1] Some would argue that composable continuations ought to be enough for everyone, but the issues here are exactly the same whether we use CWCC or composable continuations. [2] This is not *exactly* true, but symmetry is a very desirable property of DYNAMIC-WIND before & after thunks, and reversible set-up & tear-down operations are usual. Appendix: Variation on a Theme of UNWIND-PROTECT (define continuation-protectors (make-gc-finalizer (lambda (protector) (protector)) cell? cell-contents set-cell-contents!)) (define (make-protector-cell protector) (let ((protector-cell (make-cell (let ((protector-state (get-dynamic-state))) (lambda () (let ((state (get-dynamic-state))) (set-dynamic-state! protector-state #f) (protector) (set-dynamic-state! state #f))))))) (add-to-gc-finalizer! continuation-protectors protector-cell) protector-cell)) (define get-dynamic-state (access GET-DYNAMIC-STATE (->environment '(RUNTIME STATE-SPACE)))) (define set-dynamic-state! (access SET-DYNAMIC-STATE! (->environment '(RUNTIME STATE-SPACE)))) (define-syntax unwind-protect (syntax-rules () ((UNWIND-PROTECT form protection0 protection1 ...) (*UNWIND-PROTECT (LAMBDA () form) (LAMBDA () protection0 protection1 ...))))) (define (*unwind-protect thunk protector) (let* ((protector-cell (make-protector-cell protector)) (result (thunk))) ;; This ensures that the continuation will hold a reference to ;; the cell, so that it will not be garbage-collected until ;; after either control returns to this continuation or a ;; throw causes this continuation to be discarded and made ;; unreachable. (identity-procedure protector-cell) result)) 2006-04-17 Microsoft vermin I am shocked -- *shocked!* -- to hear that the developer of Termite had an interview for a potential job at Microsoft. (Termite, for you ignorant wretches, is Erlang-in-Scheme for Gambit. Find it on your own.) 2006-04-07 XEmacs S-expression motion Turns out paredit.el 20 beta doesn't support XEmacs very well after all. The problem is that XEmacs's S-expression motion commands don't signal SCAN-ERROR conditions like GNU Emacs does: they just signal general ERROR conditions. Programs *cannot* usefully discriminate S-expression motion scan errors from other kinds of errors, and, on software engineering principles, I refuse to modify the code to catch and ignore all errors. Grrrmph. (If any XEmacs developers read this (as if): Could you please either explain why the S-expression motion code signals ERROR conditions rather than SCAN-ERROR conditions in specific, or, if there is no good reason, fix XEmacs?) 2006-03-30 Immobilizing blobs There is a furry, orange blob on my lap preventing me from moving. I believe the scientific classification for this kind of blob is `cat'. Luckily, the state of immobility is only temporary, apparently regulated by some sort of nutritive urges in the blob, and also occasional boredom and needs for entertainment. 2006-03-23 I got better... For those of you who actually *read* this thing and were a bit worried after my 2006-03-07 post, I've recovered mostly, although I'm maintaining a good cough. But apparently you remain ill, since you haven't overcome the affliction of blag addiction, as you're still reading this now. 2006-03-10 paredit.el for XEmacs; deletion keys Recently I have been fiddling with making paredit.el work for XEmacs. Not many changes seem to be necessary -- although I haven't yet tested it exhaustively under XEmacs, as I'm basing my changes on Martin Gasbichler's port to XEmacs --, since paredit.el consists mainly of pretty basic elisp, but one of the most annoying problems to deal with was getting the deletion keys (forward-delete and backward-delete, or Delete & Backspace) right. [I forgot to finish this post, so it was sitting in an autosave file for two weeks, and, though I had a big rant all planned out two weeks ago about getting deletion keys right, I've forgotten what I had to say. Sorry. Also, paredit.el for XEmacs isn't finished yet because I keep forgetting to get around to it.] [Update, 2009-12-31: Thanks to gjs, I now have an arbitrarily large collection of round tuits. Fortunately, I think I made paredit run on XEmacs a couple of years ago.] 2006-03-07 More flu! Not only was the flu great fun, but, after only a few days of having seemed to be mostly rid of it, it has decided to come back to bring *more* joy to my life! Mmmmm... 2006-02-24 Flu! The flu is great fun. I recommend that any of you who have not had the pleasure of experiencing it, find a contagious surface *right*now* and infect yourselves with the glee of knowing what fun and games await you! 2006-02-23 Paredit.el versions 18 & 19 This happened a while ago, but paredit.el 18 is now the released version, and paredit.el 19 is the new beta. See ; find the relevant files yourself. Version 19 supports insertion of balanced delimiters other than parentheses, and a fancy S-expression transportation device suggested by Guillaume Germain, but I haven't yet finished the latter and put it up on the web yet. 2006-02-06 On the dangers of blaggery Until several acquaintences coerced me into recording a `blog' (a term which I abhor almost as much as the concept it denotes), I tended to avoid the the things in general, excepting semi-frequent browsing of Planet Lisp . There are a number of reasons for my eschewal of the reading or writing of blags. The first reason is the content, or absence thereof. There is a tendency among blag writers to write when it amuses them to write, not when they have anything to say. I need not elaborate further on this point; I am sure that the reader is quite familiar with the saturating profusion of content-free blags throughout the web. The second reason is that blags are astonishingly efficient sinks of time. Right now I ought to be doing something else. But I am not doing something else. This is because there is some aspect of the nature of blags that compels their authors to write entries, and, even in the rare case that the author has anything worthwhile to say, the time composing the entry would be better wasted on a different amusement device, such as a cat. (More accurately put, the author ought to spend more time amusing the cat than dribbling drivel onto the web.) This urge to write blag entries instead of doing useful things such as acquiescing to feline demands takes its toll also in other ways. For example, programmers will often briefly emit mutterings on their blags about their latest objects of fascination, without releasing their code. If the existence of that code is an equivocal one, this sputtering of substanceless spoutage is known in some circles as `wanking', a technical term. Wanking is not, and never has been, desirable behaviour to be associated with. Actual code can speak for itself, but speaking for that which has no existence, or at best a vaporous existence, is about as useful to others as a fish is to a bicycle. Or something like that. Yet these programmers persist in writing blag entries instead of substantial code. Finally, the blag is usually directed at an unclear audience. Since there is no particular audience that the blag is intended to cater to, and so for any particular audience much information is discussed in much greater detail than is necessary, and much other necessary information is omitted. Arguments are ill-defended because the blagger has no connection to his audience when writing the blag entry; he must build all the argument on a nebulous idea of what and whom he's trying to argue against. The result is poorly laid-out, ill-conceived, and unpersuasive arguments or other types of entries. Thank you for slogging through the awful post above. I hope that its mere form and existence have convinced you of what it failed to argue for itself. At the very least, perhaps it will dissuade you from assigning further merit to this blag! 2006-02-04 Disturbing blaggery I have noticed a disturbing urge to write more material in this joke of a text file in the past few days. I will have to quell this strange compulsion. I feared it would happen from the start. 2006-02-03 Option types, optional parameters Languages like ML and Haskell that are heavily based on pattern matching have a very simple but elegant mechanism for representing value and absence of value without employing a special value that is still a value like any other: option types. The basic idea is that there is a type which I shall denote with (OPTION ), where is the type of the value that may be present. Two kinds of terms have this type: (SUPPLIED ) and (ABSENT), where has the type . For example, a TABLE-ENTRY procedure might return (SUPPLIED ) or (ABSENT) to denote that there is no value in the table associated with the given key. The idiom in Lisps tends to be to use the false constant (#F or NIL) for representing the absence of value, but this inhibits #F or NIL from being itself a present and useful value. (The conflation of the false constant and the empty list in old, crufty Lisps exacerbates this.) This poses a problem, for example, if you want a general table mapping some key to any object -- including #F -- but you also want to be able to determine when an association is absent in the table. I call this conflation /domain contagion/, because NIL or #F or the symbol NOT-HERE are all still in the domain of expressible values, when they should be kept apart, because their whole purpose is to express the *absence* of an expressible value. There are actually two different problems here. The first I shall leave for another day, although it is more relevant to the TABLE-ENTRY example. It is that procedures like TABLE-ENTRY must convey information about control flow to their callers via data flow. This indirection ought not to be necessary. Olin Shivers wrote a paper called `Multi-return function calls' (although I think that `multi-continuation procedure calls' is a better term) about this fairly recently. The basic idea is that, rather than giving a procedure a single continuation, which itself will discriminate the data that the procedure returns to it, the caller supplies a *choice* of several continuations, and the callee picks one to return to, discarding the rest. The problem of denoting absence of value in procedures like TABLE-ENTRY can be solved by passing two continuations to TABLE-ENTRY: one for the case where there is a value (which continuation accepts one argument), and one for the case where there is no value (which continuation is nullary). I recommend the paper; I may write something on the subject later, too. The other problem is not in eliding the indirection of data flow, but in storing data. Here is a simple example: optional parameters. There is no control flow involved here. If a procedure accepts optional parameters, it simply has several variables which may or may not have been supplied as arguments. Now, while there are some cases where it would suffice to specify in the lambda default values for the case of absence of the arguments, there are other cases where it is useful either to pass on the optional parameter's value to another procedure (and retain the supplied/absent status) or to conditionalize on whether the parameter was supplied in order to do something fancier altogether. Here are several strategies currently employed by existing Scheme systems: 1. Give no provision for specifying default values in the lambda syntax itself, and instead a. use a special value, such as #F (DSSSL) or some `default object' token (current MIT Scheme). -- This is subject to domain contagion as discussed above. b. make references to the variables corresponding with optional parameters trap, and provide a special form DEFAULT-OBJECT? which tests whether references to those variables would trap in order to conditionalize on their presence as arguments (older MIT Scheme). -- This requires some pretty kludgey changes to the semantics of variables in the language. Furthermore, it inhibits the passage of unsupplied optional parameters on to other procedures: you can't simply pass on an unsupplied parameter to another procedure by passing the value of the corresponding variable, because references to that variable trap. (MIT Scheme no longer uses this semantics for optional arguments.) 2. Require that the default value be supplied. (PLT's OPT-LAMBDA) -- This is inconvenient in some cases, and, while there are ways around domain contagion with this strategy, they are pretty inconvenient and they defeat simple static type analysis (which is still useful for dynamically typed languages like Scheme): (LET ((NOT-HERE (CONS 'NOT-HERE '()))) (LAMBDA (... #!OPTIONAL (X NOT-HERE) ...) ... (IF (EQ? X NOT-HERE) ...) ...))) 3. Bind another variable to a boolean flag indicating whether an optional parameter was supplied. (Common Lisp) -- This fixes most of the disadvantages of the above, but it still does not allow the convenient passage of unsupplied optional parameters on to other procedures. It can also get to be syntactically pretty heavy-weight in usage. All of the problems of domain contagion, type analysis defeat, passage of unsupplied optional parameters onto other procedures, and complexity of usage, would be alleviated by the introduction of an option type as in Haskell or ML (or even dynamically typed languages heavily focussed on pattern matching like Erlang). There would be two remaining issues: undesirable boxing, and the method by which to conditionalize on option types. When it comes to data already stored in the heap, the boxing of an option type is not much of an issue; it is merely one more layer of indirection. In fact, on, say, a 64-bit system, where three tag bits is ideal, one possibility would be to use one of those tag bits for flagging option types. (This may not be a great idea, but it is an idea nonetheless which might bear worth experimenting.) As for optional parameters, for which run-time costs of heap allocation is not a reasonable option (sorry for the pun), we must first assume that the compiler knows about the option type in the first place; with this assumption in mind, it is not difficult for the compiler to recognize trivially optimizable tests of the option's presence. For example, it could map such tests to comparisons against an NARGS register. The syntax for testing optional data might be, without pattern matching, a bit clumsy. There are a couple ways it might be accomplished: a new special form could be added, (IF-OPTION (