The making of BLUEPHRASE

Doppelmarks and Builders

Advanced features for organizing your writing

by Joe Honton Sep 22, 2019

This is the story behind doppelmarks and landmark builders which support the dynamic reorganization of an evolving manuscript.

For several decades I wrote technical manuscripts as part of my work. Each time, I'd go through the same process: sketch out a rough outline, write a few of the core sections, then discover that I needed to back up and write a few preparatory sections, then reorganize my original composition, then find more details that I hadn't adequately explained, or find places where I had covered the same topic twice. The path was never a straight line.

I reached the conclusion that writing is really all about organizing your thoughts.

Remember those English assignments you had all those years ago, the ones where you had to turn in your story's outline as well as your final composition? Where the instructor was trying to teach you to approach your writing logically? Yeah, right. That's not how it works. Writing is fluid, iterative, serendipitous, and anything but logical.

This was problematic for me. Every time I added preparatory material, or split sections into parts, or rearranged their ordering, I had to adjust my table of contents (OK, no big deal). I had to renumber sections and subsections (tedious, but just part of the process). I had to find and fix all the references to tables and illustrations (error-prone and aggravating). I had to keep track of references and citations without forgetting any (more work). I had to wait until the very last minute to create the topical index — and then hope that the pages wouldn't change their numbering when I did the final proofreading and correcting.

Working on technical documents with others only worsened the problem. Each co-author, editor, reviewer and graphic artist was waiting for someone else to do their part, each wanting to be last.

Remember my original goal for BLUEPHRASE? Way back when I was trying to convert my word processing document into an ePub? Well, a lot has happened since then, but I never lost sight of that goal. I still wanted to be able to build an ePub with the same ease as building a printed and bound book.

While I was designing the syntax for shorthand notation, term-marks and graynotes, I was also designing doppelmarks. In keeping with the philosophy that writing should be a hands-on-the-keyboard affair, and reading should be distraction-free, I settled on the use of doppelmarks for internal organization. The word itself was a neologism that combined the German word for duplicate with bluemark, one of the early working codenames for the BLUEPHRASE project. Doppelmarks are matched pairs of delimiters that are used to enclose content that needs to appear in two places. It's the generic term I use to describe term-marks, listmarks, citemarks, notemarks, glossmarks and indexmarks.

After using term-marks for a few months (the marks << ... >> that surround inline phrases such as italics), I found that typing two consecutive characters was ergonomically pleasant. It was light on the fingers and easy to read. So using doubled parentheses, doubled curly braces, and doubled square brackets was an easy decision for me.

Doppelmarks would become part of my solution to the organizational problems that I was chronically facing.

Listmarks

My first use of doppelmarks was a solution to the table of contents problem. As I was previously bemoaning, the constant shifts to the organizational structure of the work-in-progress were regularly triggering the need to renumber and reorder the document's outline. Of course I could wait until the last minute to retroactively create a table of contents based on the final composition, but that wasn't what I wanted. I wanted to see the current state of the document, in table of contents form, at every stage of my writing. I wanted to move sections around, split them into parts, rename them, and so forth, and automatically have a table of contents that reflected that structure.

Enclosing section headers in listmarks gave me this ability. The way it worked was like this. I would mark, with listmark syntax, any section header that I wanted to appear in the table of contents. Then I would place a !build-list pragma somewhere near the beginning of my manuscript. When the manuscript was compiled, the headers wrapped in listmarks would be coalesced and written to the final document as a unified table of contents, with hyperlinks leading to the right place in the document. So the headers wrapped in listmarks would appear in two places: in the table of contents, and in the place where they were typed.

I quickly found the need to enhance this. Sometimes I wanted to be able to use a slight variant of the section header in the table of contents, either for brevity or for clarity. This meant that I needed the listmark to potentially have two textual phrases. I referred to the one that would appear in the section header as the interscribed expression and the one that would appear in the table of contents as the adjunct expression. If they were identical, there was no need to specify both. The syntax that I settled on was to enclose the interscribed expression in apostrophes '...' and to enclose the adjunct expression in quotation marks "...". Both of these would be typed into the manuscript where the sectional header was to appear, and the whole would be wrapped in parenthetical listmarks ((...)).
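The listmark mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BLUEPHRASE compiler: the function names `parse_listmark` and `build_list` are hypothetical, a listmark with no quoted expressions is assumed to use the same text in both places, and hyperlink generation is left out.

```python
import re

# Assumed surface forms: ((plain header)) or (('interscribed' "adjunct")).
LISTMARK = re.compile(r"\(\((.*?)\)\)")
INTERSCRIBED = re.compile(r"'([^']*)'")
ADJUNCT = re.compile(r'"([^"]*)"')

def parse_listmark(body):
    """Return (header_text, toc_text) for the text found between (( and ))."""
    inter = INTERSCRIBED.search(body)
    adj = ADJUNCT.search(body)
    if inter is None and adj is None:
        # Assumption: an unquoted listmark uses the same text in both places.
        text = body.strip()
        return text, text
    header = inter.group(1) if inter else adj.group(1)
    entry = adj.group(1) if adj else header
    return header, entry

def build_list(manuscript):
    """Collect table-of-contents entries and leave each header in place."""
    toc = []

    def replace(match):
        header, entry = parse_listmark(match.group(1))
        toc.append(entry)   # the adjunct expression goes to the table of contents
        return header       # the interscribed expression stays where it was typed

    body = LISTMARK.sub(replace, manuscript)
    return toc, body
```

Calling `build_list` on a manuscript returns the collected entries in document order together with the manuscript text with the listmark delimiters removed, which mirrors the idea of a header appearing in two places.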

This worked out just fine. Later I would use this same interscribed/adjunct pairing in the other doppelmarks that I developed.

The concept of an interscribed expression and an adjunct expression influenced the similar concepts and syntax for glossaries, citations and footnotes. It was important to me to establish a syntax that didn't have irregularities, so my design for each doppelmark carefully considered the design requirements of the others.

Glossmarks

One of the books I wrote contained a lot of Japanese words, which I knew would be unfamiliar to my readership. I chose to put their definitions at the back of the book in a glossary, so that the flow of my words wouldn't be interrupted. When I wrote the original manuscript, I manually built the glossary, carefully reviewing the entire manuscript to make sure that every foreign word had an entry, and that its definition was appropriate to the context where it was used. Just before press time, I checked that everything was in alphabetical order, and that there were no accidental duplicates.

When I converted that manuscript into an ePub, I put a lot of effort into making sure that the reader could quickly look up definitions using hyperlinks that led to the glossary entry. At the same time, I added back-links from the glossary entry to the word's occurrence in the manuscript, so that readers would never get lost.

I wanted to simplify that work for others who faced the same task. This was the genesis of glossmarks and the !build-glossary pragma.

Syntactically, I followed the same formalism that I had developed for listmarks: an interscribed term would be enclosed in apostrophes '...' and an adjunct definition would be enclosed in quotation marks "...". Both of these would be typed into the sentence where it made its first occurrence in the manuscript, and the combination of term and definition would be wrapped in vertical-bar glossmarks ||...||.

When the manuscript was compiled, the words wrapped in glossmarks would be coalesced and written to the final document as an alphabetized glossary, with hyperlinks leading from the in-situ term to its definition, and with back-links established from the definition to the place where the term first occurred.
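As a rough illustration of that coalescing step, here is a minimal Python sketch. It assumes a glossmark is written as `||'term' "definition"||`; the function name `build_glossary` is hypothetical, and the hyperlinks and back-links are omitted to keep the example short.

```python
import re

# Assumed surface form: ||'term' "definition"||
GLOSSMARK = re.compile(r"""\|\|\s*'([^']*)'\s*"([^"]*)"\s*\|\|""")

def build_glossary(manuscript):
    """Return (alphabetized glossary entries, manuscript with terms left in place)."""
    entries = {}

    def replace(match):
        term, definition = match.group(1), match.group(2)
        # Keep the definition given at the term's first occurrence.
        entries.setdefault(term, definition)
        return term

    body = GLOSSMARK.sub(replace, manuscript)
    glossary = sorted(entries.items(), key=lambda item: item[0].lower())
    return glossary, body
```

The term stays in the sentence where it was typed, while the definition is carried off to the alphabetized list.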

Citemarks

My first two books were co-authored with a PhD ecologist. She provided citations throughout her work, as her profession required. There are several ways to make citations in scholarly works. For those two books we surrounded the author's name and the year of publication with square brackets, placing that reference mark in the body of the article close to the matter that was being explained.

Keeping references and citations organized was the target of the third doppelmark that I designed. In keeping with the syntax that I designed for listmarks and glossmarks, I chose to encapsulate the abbreviated reference mark and the full citation within citemarks. Syntactically, I required the interscribed reference mark to be enclosed in apostrophes '...' and the full adjunct citation to be enclosed in quotation marks "...". Both of these would be typed into the article close to the appropriate subject matter, and they would be wrapped in curly-brace citemarks {{...}}.

The !build-citations pragma would scan the article looking for citemarks, coalescing the adjunct citations into a cohesive references section, which could be organized either alphabetically or in sequence according to their order of appearance within the article. Each interscribed reference mark would be kept in the article where the author placed it.
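The two orderings can be sketched in Python as follows. This is an illustration only: the function name `build_citations` and its `order` parameter are hypothetical, and the citemark is assumed to be written as `{{'reference mark' "full citation"}}`.

```python
import re

# Assumed surface form: {{'[Author Year]' "full citation"}}
CITEMARK = re.compile(r"""\{\{\s*'([^']*)'\s*"([^"]*)"\s*\}\}""")

def build_citations(article, order="alphabetical"):
    """Return (list of full citations, article with reference marks left in place)."""
    citations = []

    def replace(match):
        mark, full = match.group(1), match.group(2)
        if full not in citations:   # skip repeated citations of the same work
            citations.append(full)
        return mark                 # the interscribed mark stays in the article

    body = CITEMARK.sub(replace, article)
    if order == "alphabetical":
        citations = sorted(citations, key=str.lower)
    # Any other value keeps the order of first appearance.
    return citations, body
```

Passing `order="appearance"` (or any value other than `"alphabetical"`) keeps the citations in the sequence in which they were first cited.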

I was somewhat ambivalent about this design after I had implemented it. It felt right from a consistency point of view: using doppelmarks, interscribed expressions, adjunct expressions, and a builder pragma. But from a readability standpoint, it cluttered up the writing. Citations are ugly: all those abbreviations, punctuation marks and italics make them easy to stumble over. Still, I have resisted the urge to break the pattern, and have kept them as originally designed.

Notemarks

I enjoy reading history. One of the fun aspects of reading history is learning about one topic, only to discover that something else of significance was happening elsewhere at the same time. Authors have a tough choice when they encounter these situations. Should they stick with the main topic, or should they digress and provide their readers with that extra tidbit? Does it really support the main story?

For a long time, footnotes were the solution. When the author's exposition ran through an important cross topic, he would insert an asterisk at the end of the sentence, and continue on with the main topic. The footnote at the bottom of the page would contain the digression. It would be simultaneously out of the way, and readily accessible.

Today of course, the Internet is full of digressions in the form of hyperlinks, and we merrily jump from topic to topic without much thought. Footnotes are an anachronism in the Internet age.

Still, I felt that it was premature to give up on footnotes completely, and I wasn't the one to be making that decision anyway. So despite the fact that HTML provides no support for footnotes, I felt that it was my responsibility to honor tradition and provide some means for keeping them viable.

I decided to roll my own. Notemarks and the !build-notes pragma were my solution to this conundrum. In keeping with the tradition of footnotes, I chose to use doppelmarks made with asterisks **...** as the syntactic device for notemarks. But in a departure from the previously designed doppelmarks, I found no need for the interscribed/adjunct distinction. Instead, the author would simply type the digression between the opening and closing pair of notemarks. Everything in between would be moved, by the notes builder, to the footnote area. The notes builder algorithm would generate and insert the footnote marker into the composition itself, at its point of reference.

Footnotes traditionally use asterisks *, daggers † and double-daggers ‡ for the first three footnotes on a page. I kept with that tradition, and automatically generated those marks.

Many authors in the social sciences use endnotes in a manner that is nearly identical to footnotes, only they appear at the end of the article rather than at the bottom of the page. This type of usage nearly always results in more than three notes. The tradition in these cases is to use numbered superscripts instead of the asterisks and daggers. I provided support for this alternate notation in the notes builder with the format attribute, which could be specified as either endnotes or footnotes.
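The marker generation can be sketched in Python like this. It is a simplified illustration, not the actual notes builder: the function name `build_notes` is hypothetical, the whole text is treated as a single page, and falling back to numbers after the third footnote is an assumption made only so the example never runs out of symbols.

```python
import re

NOTEMARK = re.compile(r"\*\*(.+?)\*\*", re.DOTALL)
FOOTNOTE_SYMBOLS = ["*", "†", "‡"]

def build_notes(text, fmt="footnotes"):
    """Move each **digression** out of the text and leave a marker behind."""
    notes = []

    def replace(match):
        index = len(notes)
        if fmt == "footnotes" and index < len(FOOTNOTE_SYMBOLS):
            marker = FOOTNOTE_SYMBOLS[index]
        else:
            # Endnotes are numbered; numbering a fourth footnote is an assumption.
            marker = str(index + 1)
        notes.append((marker, match.group(1).strip()))
        return marker

    body = NOTEMARK.sub(replace, text)
    return notes, body
```

With `fmt="endnotes"` the same digressions are numbered from 1 instead of receiving asterisks and daggers.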

Coalescers

Before moving on to the last of the doppelmarks, which uses a slightly different pattern, let me explain the idea behind coalescers, something that I mentioned above but didn't explain.

For each type of doppelmark, there is a corresponding builder pragma, whose role is to assemble the doppelmark instances into a cohesive group. For example, the list builder algorithm does this for listmarks, assembling them into a table of contents. But many books have other types of lists, such as a list of figures, or a list of illustrations.

Initially I considered having some type of syntactic code to allow each listmark to be tagged as being a member of one of these. I even got to the point of defining the keywords for it: toc for table of contents, lof for list of figures, etc. But before getting carried away with it I stepped back and decided to leave it entirely open-ended, allowing each author to decide whether they needed this feature and what the keywords would be.

I called these arbitrarily defined keywords coalescers. Syntactically, the builder algorithm would recognize them if they appeared within the doppelmark enclosure in the first position, before the interscribed/adjunct expressions.

Glossmarks could usefully put this same concept to work as well. For example, if a book had two foreign languages, or two separate sets of jargon, a coalescer could be applied and used to build two distinct glossaries. Journals that are composed entirely of scholarly articles would need this feature for citemarks in order to keep each article's citations separate. And notemarks would need this feature to allow endnotes to be properly generated when they were separately placed at the end of each chapter.

So that there was no ambiguity, I required these keywords to be prefaced by a full-stop (.). Indexmarks, which I will describe next, do not need this concept since there is only ever one index per book. Instead, indexmarks need a topic, which for consistency with the others I chose to be identified with this same full-stop syntax.
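To make the grouping concrete, here is a small Python sketch that sorts listmarks into separate lists by coalescer. It is an illustration under assumptions: the function name `group_by_coalescer` is hypothetical, the keywords `toc` and `lof` are just sample author-chosen names, and listmarks without a coalescer are collected under an arbitrary `"(untagged)"` key.

```python
import re
from collections import defaultdict

# Assumed surface form: ((.keyword header text)) with the keyword optional.
LISTMARK = re.compile(r"\(\(\s*(?:\.([\w-]+)\s+)?(.*?)\s*\)\)")

def group_by_coalescer(manuscript, default="(untagged)"):
    """Group listmark texts by the coalescer found in the first position."""
    groups = defaultdict(list)
    for match in LISTMARK.finditer(manuscript):
        key = match.group(1) or default
        groups[key].append(match.group(2))
    return dict(groups)
```

Each resulting group could then be handed to its own builder, so that one manuscript yields, say, both a table of contents and a list of figures.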

So, in the BLUEPHRASE language at large, the full-stop is used several ways: as a shorthand symbol for CSS classnames, as a coalescer, and as an index topic. I did not have any hesitancy over this multiplicity of uses. After all, the English language relies on full-stops for a variety of purposes, not just as a sentence terminator. Consider numbered lists, decimal points, abbreviations, and ellipses. Besides their ubiquity, they are also very easy on the eye, more so than any other punctuation mark. I considered that a plus.

Indexmarks

Finally, let's consider the index. This gem is one of the most difficult to create, yet one of the most valuable to have. Catalogers and librarians consider books that have indexes to be more valuable than their counterparts that don't have indexes.

Even in the Internet age where Google search seems to be magic, indexes still play an important role. Professionals, whose job it is to create indexes, go way beyond simple keywords: they consider context and relevancy. They carefully note the defining instance of a topic. And they recognize that ambiguity needs to be addressed, by giving confusing terms a nod with "see also" entries.

In 2006, when I was finishing work on a lengthy book, I spent nearly a week combing through the final draft finding topics and building an index. Then I submitted it for review. When it came back and final corrections and additions were made, all of the page numbering was thrown off. I needed to painstakingly go through the whole index, renumbering everything.

I wanted to design a way, within the BLUEPHRASE language, to alleviate this tedium while addressing all of the other indexing needs. It was important to me that indexing wasn't an afterthought and wasn't delayed until the very end of the project. (I suspected that indexes were often dropped from consideration only because publishing deadlines couldn't wait any longer.)

The indexmark syntax that I settled on was intentionally designed to be as light as possible, so that reading a manuscript with lots of indexmarks wouldn't be unpleasant. When I designed the other doppelmarks, I purposely set aside square brackets for use with indexmarks, because I felt that visually they were less obnoxious than asterisks or curly braces or vertical bars or any other doubled-up punctuation mark. In terms of simple readability, indexmarks consisting of opening and closing brackets [[...]] were a good choice.

Between each indexmark pair I needed to provide support for four distinct types of entries which the builder algorithm would recognize. These were the defining entry, unmarked entries, secondary entries, and cross-reference entries.

Defining entries would appear as a top-level topic, alphabetically ordered within the final index. The other three types of entries would each be subordinate to it. The syntax consisted of the topic identifier (prefixed with a full-stop), and the word or short phrase for the topic as it would appear in both the body of the book and in the index. What made this a defining entry was that the word or short phrase was enclosed in apostrophes '...'.

Unmarked entries only needed to have a topic identifier between the indexmarks. The builder would create hyperlinks for each unmarked entry from the index to the place in the book where the indexmark occurred.

Secondary entries would also be subordinate to the defining entry, but they would have their own word or phrase which refined the meaning or context of the topic. The builder would recognize a secondary entry by the appearance of the word or phrase enclosed in parentheses (...).

Cross-reference entries would appear in the final index as a "see also" entry. They consisted of two topic identifiers placed between the indexmarks, with no other word or phrase being needed.
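The four entry types can be told apart by shape alone, as this Python sketch shows. It is an illustration rather than the real index builder: the function names `classify_indexmark` and `find_indexmarks` are hypothetical, the exact spacing rules are assumed, and which of the two topics in a cross-reference points to the other is not specified above, so the sketch simply returns them in the order typed.

```python
import re

INDEXMARK = re.compile(r"\[\[(.*?)\]\]")
TOPIC = r"\.([\w-]+)"

def classify_indexmark(body):
    """Return (kind, topic, detail) for the text found between [[ and ]]."""
    body = body.strip()
    m = re.fullmatch(TOPIC + r"\s+'([^']*)'", body)
    if m:
        return ("defining", m.group(1), m.group(2))
    m = re.fullmatch(TOPIC + r"\s+\(([^)]*)\)", body)
    if m:
        return ("secondary", m.group(1), m.group(2))
    m = re.fullmatch(TOPIC + r"\s+" + TOPIC, body)
    if m:
        return ("cross-reference", m.group(1), m.group(2))
    m = re.fullmatch(TOPIC, body)
    if m:
        return ("unmarked", m.group(1), None)
    raise ValueError(f"unrecognized indexmark: {body!r}")

def find_indexmarks(text):
    """Classify every indexmark in the text, in order of appearance."""
    return [classify_indexmark(m.group(1)) for m in INDEXMARK.finditer(text)]
```

A builder could then sort the defining entries alphabetically and attach the other three kinds beneath the topic they name.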

I consider the work I put into these five doppelmarks to be a major step forward for writers. They make it easy to assemble landmark sections such as glossaries, references cited, endnotes, and indexes. They remove much of the tedium surrounding the ordering and numbering and hyperlinking needed to create those landmark sections. And they allow the author to deal with these details when they are fresh, rather than putting them into a pending state, and keeping their assembly in suspense.