A New Text API for Scheme -*- outline -*- * Copying Copyright (C) 2007 Taylor R Campbell. All rights reserved. This document is a draft. Please do not copy or distribute it, except to identify it by URI. It will remain indefinitely at . * Issues ** Binary Formats Procedures are needed for encoding and decoding the binary representations of text to and from octet vectors of some nature. ** Bounds on Running Time of Indexing Will Clinger wants stronger guarantees for the running times of the indexing operations. I am not willing to prohibit space-efficient internal representations, like UTF-8, for texts by offering these guarantees. This will be an issue of some contention, I predict. ** Counting Code Points Should there be a TEXT-LENGTH procedure, which returns the number of code points in the text? I hope for a better name, if so. ** Character Data Type John Cowan wants to use the data type that Scheme calls `characters' as simply a wrapper for code points (and consequently to specify the domain of characters to be that of the Unicode code points (or scalar values)). I am not sure whether I want this, or whether I'd rather simply use exact, non-negative integers in a particular range. ** `Code Point' versus `Scalar Value' Throughout this document I use the term `code point', not `scalar value'. The Unicode standard uses `scalar value' only when discussing Unicode encoding forms, and nowhere else. I do not, however, want texts to be permitted to contain surrogate code points exposed to programs. (Internal representations based on UTF-16 are permissible, however, as long as TEXT-FORWARD skips over two code units if they are surrogates, and TEXT-REF always returns the composition of a surrogate pair, not either of the two surrogate code points.) ** Grapheme Cluster Motion There should be procedures for grapheme cluster motion of positions (TEXT-FORWARD-DGC and TEXT-BACKWARD-DGC). ** Input Input of text from ports, and text input ports (analogous to SRFI 6: Basic String Ports), should be specified. ** Comparison and Binary versus Canonical Equivalence Text comparison is currently defined by binary equivalence. I am passionately ambivalent about this: on the one hand, I want programmers to be able to write programs that reliably compare text for canonical equivalence; on the other hand, binary equivalence is useful and more primitive. What I absolutely *don't* want to do is to leave it unspecified whether the text comparison routines work with canonical equivalence or not. However, I am not sure I want to require programmers to construct intermediate canonical decompositions of texts that might be very large only to compare them. Avoiding this requires many operations to be duplicated for binary versus canonical equivalence. ** Read Syntax We need a read syntax for texts. #"...", perhaps? In this syntax, we also need syntax for code point escapes. I think that R6RS's -- \xHEXDIGITS; -- is probably worth using; e.g., a CRLF text would be #"\xC;\xA;", and `foo' and `bar' in separate paragraphs would be #"foo\x2029;bar". ** Existing Data Types Perhaps there should be procedures for converting to and from Scheme strings. ** Normalization Normalization is a major issue. It will have to wait until the next draft of this document for an explanation of all the gruesome, subtle details. For these reasons which I have yet to explain, there are not presently any TEXT-NORMALIZE-* routines (for * in {NFD, NFC, NFKD, NFKD}). ** Collation Should anything regarding collation be included? (Real collation, that is, not just the locale-independent lexicographic ordering procedures that we already have here.) ** Implementation The implementation should be finished, and possibly written in a more portable way. (Doing so is not hard given an octet vector abstraction.) ** Naming Issues - POSITION-IN-TEXT? reads well, but sets of all sorts of inconsistency bells in my head. - TEXT-REF is very concise, but I am worried that it is too close to STRING-REF and VECTOR-REF. Also, it does not indicate that it obtains the code point *after* the position; perhaps there ought to be another operation for obtaining the code point before the position, and the two ought to be given symmetric names. Symmetric names would probably be much less concise than TEXT-REF, but then TEXT-REF may be too terse. - TEXT-FORWARD and TEXT-BACKWARD are not very descriptive. My intent was that they would be basic operations that everyone would learn and not need to remember by giving them long names, so concision and flexibility for appending higher-level units to the name -- e.g., TEXT-FORWARD-DGC (default grapheme cluster), TEXT-FORWARD-WORD, &c. -- would be more important, but I'm not sure whether this is a good idea. - TEXT-FORWARD-SEARCH versus TEXT-SEARCH-FORWARD. MIT Scheme uses the latter (STRING-SEARCH-FORWARD). The former makes it more consistent with the convention I imagined for text position motion, but it's not actually a text motion operation. Also, MIT Scheme puts the pattern first in the arguments; at the moment, the procedures in this draft put it second, and the string to search through first. - I have never been able to think of a pattern by which to name the case-mapping routines so that mapping to upper, lower, and title case follows the same pattern as folding the case, without giving a clumsy name to either the case-folding routine or the other case-mapping routines. Here are some possibilities, the first of which follows R6RS's naming scheme (which Will Clinger recommends): TEXT-UPCASE TEXT-DOWNCASE TEXT-TITLECASE TEXT-FOLDCASE TEXT-UPPERCASE TEXT-LOWERCASE TEXT-TITLECASE TEXT-FOLDEDCASE TEXT-UPPER-CASE TEXT-LOWER-CASE TEXT-TITLE-CASE TEXT-FOLDED-CASE TEXT-CASE-UP TEXT-CASE-DOWN TEXT-CASE-TITLE TEXT-CASE-FOLD TEXT-UPPERCASE? TEXT-LOWERCASE? TEXT-TITLECASE? TEXT-FOLDEDCASE? TEXT-UPPER-CASE? TEXT-LOWER-CASE? TEXT-TITLE-CASE? TEXT-FOLDED-CASE? TEXT-UPPER-CASE? TEXT-LOWER-CASE? TEXT-TITLE-CASE? TEXT-CASE-FOLDED? * Specification Two new data types are introduced: - A /text/ is an object that contains text represented as sequences of discrete /code points/. Texts are immutable, and disjoint from all existing data types in Scheme. - A /text position/ is an object that identifies the position between two code points in text, or one at the start or the end of text. Text positions are *not* necessarily disjoint from existing data types in Scheme; they may be identical to exact, non-negative integers, for example. Internally, code points in texts may not be laid out in uniform-width cells in memory. Consequently, random access to code points by their natural number in the sequence may not run in constant time. Positions in texts are therefore identified by objects whose representation is unspecified. Following the convention of text editors, positions identify not the sequence numbers of code points, but positions between code points, or before or after all code points in the text. The latter positions are known as the start and end positions of the text. Texts may be `sliced up' and seen in parts with the TEXT-SLICE procedure or its derivatives. The only distinction between a text that is a slice of another text is the start and end positions; if a text is derived by taking a slice of a larger text, each refers to the same underlying textual structure, but simply narrower or broader views of it. Because operating on a slice of a text is such a common operation, the running time of TEXT-SLICE is guaranteed to be sublinear in the number of code points within that slice of text; programmers are encouraged to use TEXT-SLICE. Any positions in the larger text that fall between the slice's start and end positions will identify the analogous positions in the slice, and any identify positions in the slice will also identify analogous positions in the larger text. A text position is said to be /valid in a text/ if and only if the position was derived from a slice of the text, or the position was derived from a larger text of which the text in question is a slice, and falls between the text's start and end positions. Any text position is valid in an arbitrary number of related texts, all of which are slices of one another; there is therefore no unique text associated with a text position, and no specified way to obtain the text from which a text position was derived. The consequences of using a text position with a text in which it is not valid are unspecified. Some implementations may associate with every text a common identifying datum, such as a pointer to the internal storage that a set of texts are slices of, and associate this with every text position derived from any of these texts. Other implementations may represent text positions as tagged pointers into the contents of the common internal storage. These implementations can detect misuses of text positions with texts in which they are invalid, at the expense of some performance because text positions require heap storage and extra time to fetch information from them, or at the expense of some implementation complexity. Still other implementations may not associate this information with text positions, so they may not be able to detect misuses text positions with texts in which they are invalid; instead they may simply yield unpredictable results. Such implementations are allowed to represent text positions as exact, non-negative integers denoting byte offsets from the start of some internal storage, which may fit in machine registers and require no heap space, and which consequently may operate much faster. For example, a text might be a triple of a pointer to a byte vector, a starting byte offset, and an ending byte offset, and a text position might be a byte offset from the start of that byte vector. (This list of implementation strategies is neither exhaustive nor normative; it is only illustrative of the certain pragmatic implications of different strategies, and may be useful to implementors.) The space of code points in this interface is a subset of the set of Unicode code points. Implementations must support at least the US-ASCII subset of code points, and are encouraged, but not required, to support the full space of Unicode code points. Programs should not be written to assume that random access to code points by natural number index will run in constant time (using TEXT-POSITION->INDEX and INDEX->TEXT-POSITION), as may be the case in implementations that support only US-ASCII, or an 8-bit superset thereof; these programs may perform badly in implementations that support Unicode and use variable-width encodings internally for text, such as UTF-8 or UTF-16. ** Procedures *** Texts (TEXT? ) -> boolean Disjoint type predicate for text objects. (TEXT-EMPTY? ) -> boolean Returns true if there are no code points in , and false if otherwise. This is equivalent to (TEXT-POSITION=? (TEXT-START-POSITION ) (TEXT-END-POSITION ) ), (TEXT-POSITION-AT-START? (TEXT-END-POSITION ) ), and (TEXT-POSITION-AT-END? (TEXT-START-POSITION ) ). (EMPTY-TEXT) -> text Returns a text containing no code points. It is not specified whether this yields an identical object for each invocation. (let ((text (empty-text))) (text-position=? (text-start-position text) (text-end-position text) text)) ;Value: #T (TEXT-COPY ) -> text Returns a copy of that shares no internal structure with , and no extra, invisible internal storage occupied by any sibling slices of . Positions that were valid in are not valid in copies of it. In implementations where text is represented without any invisible internal storage, such as implementations based on trees of content, this may simply return . (TEXT-START-POSITION ) -> position Returns the first position in , before which there are no code points. (TEXT-END-POSITION ) -> position Returns the last position in , after which there are no code points. *** Text Positions (POSITION-IN-TEXT? ) -> boolean Returns true if is a valid position in , and false if not. must be a text position in either case. This procedure might not precisely discriminate validity: positions may appear valid in some text from which they were not derived, and the value may vary from implementation to implementation. (TEXT-POSITION=? ) -> boolean Returns true if and are the same position in . Returns false if they are different positions. It is an error if either of or is not a valid position in . (TEXT-POSITION ) -> boolean (TEXT-POSITION>=? ) -> boolean (TEXT-POSITION>? ) -> boolean (TEXT-POSITION<=? ) -> boolean Procedures imposing a total ordering on text positions. It is an error if and are not valid positions in . (TEXT-POSITION-AT-START? ) -> boolean Returns true if is the position before all code points in , or false if not. It is an error if is not a valid position in . (define (text-position-at-start? position text) (text-position=? position (text-start-position text))) (TEXT-POSITION-AT-END? ) -> boolean Returns true if is the position after all code points in , or false if not. It is an error if is not a valid position in . (define (text-position-at-end? position text) (text-position=? position (text-end-position text))) (TEXT-REF ) -> code point Returns the first code point in after . Signals an error if there are no code points after . It is an error if is not a valid position in . TEXT-REF runs in time sublinear in the size of . (TEXT-FORWARD []) -> position or false Returns a text position in that is code points after , or false if there are fewer than code points after in . If is not supplied, its default value is 1. It is an error if is not a valid position in . TEXT-FORWARD runs in time sublinear in the size of . (TEXT-BACKWARD []) -> position or false Returns a text position in that is code points before , or false if there are fewer than code points before in . If is not supplied, its default value is 1. It is an error if is not a valid position in . TEXT-BACKWARD runs in time sublinear in the size of . *** Text Slices (TEXT-SLICE ) -> text Returns a text that contains the sequence of code points in between and . This slice of the text may refer to storage shared by . The running time of TEXT-SLICE on average must be sublinear in the number of code points between and . For example, TEXT-SLICE may run in (amortized) constant time, if a text is a triple of internal storage, a start bound, and an end bound, and TEXT-SLICE need only construct a new triple with tighter bounds; or TEXT-SLICE may run in logarithmic time, if a text is structured as a tree of content. TEXT-SLICE may *not* simply copy all of the code points in ; programmers may rely on its performance, and should not be afraid to use it frequently. (TEXT-PREFIX ) -> text (TEXT-SUFFIX ) -> text TEXT-PREFIX returns a slice of that contains the sequence of code points before . TEXT-SUFFIX returns a slice of that contains the sequence of code points after . (define (text-prefix text end-position) (text-slice text (text-start-position text) end-position)) (define (text-suffix text start-position) (text-slice text start-position (text-end-position text))) (TEXT-SPLIT-AT ) -> [text text] Returns two values: the prefix of before and the suffix of after . (define (text-split-at text position) (values (text-prefix text position) (text-suffix text position))) *** Code Point Indexing Because text is composed of a sequence of code points, each code point in the sequence may be assigned a natural number index starting from 0. There is an isomorphism between these code point indices and text positions. The following procedures deal with this isomorphism, where indices are exact, non-negative integers. (TEXT-POSITION->INDEX ) -> index Returns the number of code points in that precede , or the number of iterated applications of TEXT-FORWARD to the start position of necessary to find a position equal to . This operation may run in time proportional to the value of the index. It is an error if is not a valid position in . (define (text-position->index position text) (do ((index 0 (+ index 1)) (position* (text-start-position text) (text-forward position* text))) ((text-position=? position* position) index))) (INDEX->TEXT-POSITION ) -> position Returns a text position after the th code point in , or iteratively applies TEXT-FORWARD times to the start position of . This operation may run in time proportional to the value of the index. It is an error if exceeds the number of code points in . (define (index->text-position index text) (do ((index index (- index 1)) (position (text-start-position text) (text-forward position text))) ((zero? index) position))) (TEXT-POSITION-DIFFERENCE ) -> integer difference Returns the number of code points after and before . If precedes in , this is equal to the additive inverse of the number of code points after and before , i.e. (- (TEXT-POSITION-DIFFERENCE )). (define (text-position-difference position-a position-b text) (- (text-position->index position-a text) (text-position->index position-b text))) (define (text-position-difference position-a position-b text) (let ((distance (text-position-distance position-a position-b text))) (if (text-position ) -> absolute value Returns the number of code points between and in . This is the absolute value of the difference between and . (define (text-position-distance position-a position-b text) (abs (text-position-difference position-a position-b text))) *** Text Comparison (TEXT=? ...) -> boolean Returns true if each of the texts supplied as arguments contain identical sequences of code points; returns false if otherwise. (define (binary-text=? text-a text-b) (receive (start end) (text-search-forward text-a text-b) (and start end (text-position=? start (text-start-position text-a) text-a) (text-position=? end (text-end-position text-a) text-a)))) (TEXT-CI=? ...) -> boolean Returns true if each of the texts supplied as arguments contain identical case-folded sequences of code points; returns false if otherwise. This is equivalent to (TEXT=? (TEXT-CASE-FOLD ) ...). (define (binary-text-ci=? text-a text-b) (receive (start end) (text-search-forward/ci text-a text-b) (and start end (text-position=? start (text-start-position text-a) text-a) (text-position=? end (text-end-position text-a) text-a)))) (TEXT-PREFIX? ) -> boolean (TEXT-SUFFIX? ) -> boolean These return true if or contains a sequence of code points identical to some prefix or suffix, respectively, of the sequence of code points in . , , and must all be texts. (define (text-prefix? prefix text) (receive (start end) (text-search-forward text prefix) (and start end (text-position=? start (text-start-position text) text)))) (TEXT-PREFIX-CI? ) -> boolean (TEXT-SUFFIX-CI? ) -> boolean Case-insensitive variants of the above. (define (text-prefix-ci? prefix text) (text-prefix? (text-case-fold prefix) (text-case-fold text))) (TEXT-MISMATCH ) -> [position-a position-b] or [#F #F] Compares the code points in and . If the two texts contain identical sequences of code points, returns the two values #F and #F; otherwise, returns two positions in and , respectively, preceding the first code point that differs between the two texts, or after which one text's sequence of code points ends but the other's does not. (TEXT-MISMATCH-CI ) -> [position-a position-b] or [#F #F] Case-insensitive variant of the above procedure. **** Lexicographic Ordering (TEXT ...) -> boolean (TEXT>=? ...) -> boolean (TEXT>? ...) -> boolean (TEXT<=? ...) -> boolean These impose a total lexicographic ordering on texts by the sequence of code points in them. (TEXT-CI ...) -> boolean (TEXT-CI>=? ...) -> boolean (TEXT-CI>? ...) -> boolean (TEXT-CI<=? ...) -> boolean These impose a total lexicographic ordering on texts by the sequence of code points in them, insensitive to case differences. These are equivalent to passing the results of case-folding the input texts to the case-sensitive routines above. (TEXT-COMPARE ) -> values Compares the sequence of code points in and , and calls either , , or in a tail position with respect to the call to TEXT-COMPARE. TEXT-COMPARE calls with zero arguments if the two texts contain identical sequences of code points. If the two texts contain distinct sequences of code points, then TEXT-COMPARE calls either or with two arguments: two positions in and , respectively, preceding the first code point that differs between the two texts, or after which one text ends but the other does not. TEXT-COMPARE will call if both texts have more code points and the first code point in that differs from the corresponding code point in is numerically less than its its counterpart in , or if is shorter than ; and TEXT-COMPARE will call otherwise. (define (text-mismtach text-a text-b) (define (if-match) (values #f #f)) (define (if-mismatch position-a position-b) (values position-a position-b)) (text-compare text-a text-b if-match if-mismatch if-mismatch)) (define (text-compare text-a text-b if= if< if>) (receive (position-a position-b) (text-mismatch text-a text-b) (if (not (and position-a position-b)) (if=) ((cond ((text-position-at-end? position-a) if<) ((text-position-at-end? position-b) if>) (else (if (< (text-ref text-a position-a) (text-ref text-b position-b)) if< if>))) position-a position-b)))) (define (binary-text) -> text (TEXT-LOWER-CASE ) -> text (TEXT-TITLE-CASE ) -> text (TEXT-CASE-FOLD ) -> text These map to their upper-case, lower-case, title-case, or case-folded equivalents, respectively, according to the Unicode Standard, Chapter 3, Section 13, `Default Case Operations', `Case Conversion of Strings'. (TEXT-UPPER-CASE? ) -> boolean (TEXT-LOWER-CASE? ) -> boolean (TEXT-TITLE-CASE? ) -> boolean (TEXT-CASE-FOLDED? ) -> boolean These test whether is entirely upper-case, lower-case, title-case, or case-folded, respectively, according to the Unicode Standard, Chapter 3, Section 13, `Default Case Operations', `Case Detection for Strings'. *** Text Search (TEXT-SEARCH-FORWARD ) -> [start-position end-position] or [#F #F] (TEXT-SEARCH-BACKWARD ) -> [start-position end-position] or [#F #F] TEXT-SEARCH-FORWARD searches forward through for the first occurrence of the pattern; TEXT-SEARCH-BACKWARD searches backward through for the last occurrence of . If an occurrence is found, returns the start and end positions of the occurrence of in ; if no occurrence is found, returns (VALUES #F #F). may be a text or a cache as returned by TEXT-FORWARD-SEARCH-CACHE or TEXT-BACKWARD-SEARCH-CACHE, respectively. (TEXT-SEARCH-FORWARD* ) -> occurrence list (TEXT-SEARCH-BACKWARD* ) -> occurrence list TEXT-SEARCH-FORWARD* searches forward through for all occurrences of , from start to end; TEXT-SEARCH-BACKWARD* searches backward through for all occurrences of , from end to start. The occurrences are returned as a list of pairs whose cars are the starting positions of the occurrences and whose cdrs are the ending positions of the occurrences. Occurrences may overlap. may be a text or a cache as returned by TEXT-FORWARD-SEARCH-CACHE or TEXT-BACKWARD-SEARCH-CACHE, respectively. (define (text-search-forward* text pattern) (let ((cache (text-forward-search-cache pattern))) (let loop ((text text) (matches '())) (receive (start end) (text-search-forward text cache) (if (and start end) ;; To implement non-overlapping patterns, we could ;; write (TEXT-SUFFIX TEXT END), rather than ;; (TEXT-SUFFIX TEXT (TEXT-FORWARD START TEXT)) ;; here. (loop (text-suffix text (text-forward start text)) (cons (cons start end) matches)) (reverse matches)))))) (TEXT-FORWARD-SEARCH-CACHE ) -> search cache (TEXT-BACKWARD-SEARCH-CACHE ) -> search cache If is a search cache as returned by one of these procedures, they return . If is a text, they perform any useful calculations to search for in texts, and return an object caching the results of these calculations, suitable for passing to any of the above text search routines. For example, an implementation of text searching that uses the KMP algorithm might compute a restart vector for the pattern. TEXT-FORWARD-SEARCH-CACHE may be used only for the forward search routines; TEXT-BACKWARD-SEARCH-CACHE, for the backward search routines. These procedures must be idempotent. This allows programs to insert a sort of optional optimization annotation when they will use a pattern multiple times for search only, while permitting their callers to use the same annotations. See the above sample definition of TEXT-SEARCH-FORWARD* for an example. Text search caches are independent of the instance of searching; they may be reused arbitrarily many times with the above search procedures. The precise nature of the object returned by these procedures is unspecified. The following is a perfectly valid, though discouraged, implementation of these procedures: (define (text-forward-search-cache pattern) pattern) (define (text-backward-search-cache pattern) pattern) (TEXT-SEARCH-FORWARD/CI ) -> [start-position end-position] or [#F #F] (TEXT-SEARCH-BACKWARD/CI ) -> [start-position end-position] or [#F #F] (TEXT-SEARCH-FORWARD*/CI ) -> occurrence list (TEXT-SEARCH-BACKWARD*/CI ) -> occurrence list (TEXT-FORWARD-SEARCH-CACHE/CI ) -> search cache (TEXT-BACKWARD-SEARCH-CACHE/CI ) -> search cache Case-insensitive variants of the preceding six procedures. *** Text Editing: Concatenation, Replacement, Insertion, and Deletion (TEXT-APPEND ...) -> text (TEXT-CONCATENATE ) -> text (TEXT-JOIN ) -> text These return concatenations of text. TEXT-APPEND returns the concatenation of each . TEXT-CONCATENATE returns the concatenation of each element of . TEXT-JOIN returns a text composed by concatenating , each of the elements of with between each element, and . If is empty, the result will be that of (TEXT-APPEND ); any other behaviour must be written explicitly by the programmer. Text positionss that were valid in the input texts are not valid in the output texts. Implementations are permitted to return the sole argument passed to TEXT-APPEND, or the sole element of unary lists passed to TEXT-CONCATENATE, but programs may not rely on this. (define (text-append . texts) (text-concatenate texts)) (define (text-concatenate texts) (text-join texts (empty-text) (empty-text) (empty-text))) (TEXT-REPLACE ) -> text Returns a text composed of the sequence of code points in before , followed by the sequence of code points in , followed by the sequence of code points in after . It is an error if either of and is not a valid position in . (define (text-replace text start-position end-position replacement) (receive (text* position*) (text-delete* text start-position end-position) (text-insert text* position* replacement)) (define (text-substitute text pattern substitution) (receive (start end) (text-search-forward text pattern) (if (and start end) (text-replace text start end substitution) text))) ;;; The following definition of TEXT-SUBSTITUTE* may perform ;;; well in an implementation of text based on balanced trees, ;;; where TEXT-REPLACE can run in logarithmic time, but is ;;; unlikely to perform well in an implementation based on ;;; pointers into contiguous blocks of memory where TEXT-REPLACE ;;; runs in time linearly proportional to the size of the text. ;;; ;;; Further below, there is another definition, which uses ports ;;; to accumulate the output text, and which is likely to ;;; perform well in all systems. (define (text-substitute* text pattern substitution) (let ((cache (text-forward-search-cache text pattern))) (let loop ((text text) (position (text-start-position text))) (receive (start end) (text-search-forward (text-suffix text position) cache) (if (not (and start end)) text (loop (text-replace text start end substitution) end)))))) (TEXT-INSERT ) -> text Returns a text composed of the sequence of code points in before , followed by the sequence of code points in , followed by the sequence of code points in after . It is an error if is not a valid position in . (define (text-insert text position insertion) (text-replace text position position insertion)) (TEXT-DELETE ) -> text Returns a text composed of the sequence of code points in before followed by the sequence of code points in after . It is an error if either of and is not a valid position in . (define (text-delete text start-position end-position) (text-replace text start-position end-position (empty-text))) (TEXT-DELETE* ) -> [text position] Returns two values: a text composed by deleting the sequence of code points in between and , as if with TEXT-DELETE; and the position in the resultant text where the first text was deleted. (TEXT-EXTRACT-AND-DELETE ) -> [text extraction position] Returns three values: a text composed by deleting the sequence of code points in between and , as if with TEXT-DELETE; a text composed of the sequence of code points between and in , i.e. the text that was deleted; and the position in the first text where the second text was deleted, as with TEXT-DELETE*. *** Text I/O (WRITE-TEXT ) Writes the sequence of code points in to , according to the particular encoding associated with . may impose limitations on the code points that it can encode, and consequently WRITE-TEXT may signal an error if contains code points outside those limitations. (OPEN-OUTPUT-TEXT) -> output-text-port Returns an output port that accumulates the text written to it. The accumulated text may be subsequently retrieved by passing the port to GET-OUTPUT-TEXT or GET-OUTPUT-TEXT!. There are no limitations on the text that the port can accumulate. (GET-OUTPUT-TEXT ) -> text Returns the text accumulated by so far. This does not destroy ; that is, it may continue to accumulate text. (GET-OUTPUT-TEXT! ) -> text Returns the text accumulated by and destroys ; the consequences of writing further data to it, or passing it to GET-OUTPUT-TEXT or GET-OUTPUT-TEXT! again, are unspecified. (define (call-with-output-text procedure) (let ((output-port (open-output-text))) (procedure output-port) (get-output-text! output-port))) (define (text-substitute* text pattern substitution) (let ((cache (text-forward-search-cache pattern))) (receive (start end) (text-search-forward text cache) (if (not (and start end)) text (call-with-output-text (lambda (output-port) (let loop ((text text) (match-start start) (match-end end)) (write-text (text-prefix text match-start) output-port) (write-text substitution output-port) (let ((text (text-suffix text match-end))) (receive (start end) (text-search-forward text cache) (if (and start end) (loop text start end)))))))))))