The making of BLUEPHRASE

The Case of the Missing Tag

Plugging HTML's biggest gap

by Joe Honton Sep 20, 2019

This is the story behind the meta commands for organizing and assembling BLUEPHRASE compositions.

I have never understood why HTML doesn't have the ability to organize pages into discrete parts. Why can't I include sections of markup from an external file? Why doesn't HTML have any facility to let me assemble a page by pulling in standard headers and footers and menus?

OK, I'm lying. I do understand why. Let me explain what's going on.

The first version of HTML, the one written by Tim Berners-Lee in 1990, comprised just 27 tags, with a third of them being unrecognizable to us today. Ever heard of <isindex>, <nextid> or <plaintext>? The language was focused on the problem of hyperlinking, and the rudimentary nature of it meant that lots of things we take for granted today were missing.

The next two versions of HTML were both working proposals that never gained traction beyond the separate teams who worked on them in a competitive mood, each team having its own stubborn vision, each pulling in a different direction. It wasn't until 1997 that the major players could finally play nicely in a community-backed sandbox. That version improved on the visual aspects of documents, but not on the organizational needs of authors.

The first version of HTML that tried to solve this organizational problem was HTML 4, which introduced the tags <frameset> and <frame>. Their role was to allow a single document to be composed from other documents. It was an ill-conceived solution, reviled by users, and dropped completely from the language in its next version with nary a whimper. A third tag was also added to HTML 4, the <iframe> tag, which is still in use today, but which solves an entirely different problem.

It's fair to say that the big names in the browser industry were busy doing other things. I'll give them a pass. It's also fair to say that the W3C standards body was trying to herd cats. So they can be forgiven too.

But clearly the need was there. Everyone was churning out page after page in explosive fashion. And everyone needed to solve this same problem: how to maintain uniformity across a collection of documents. Server-side scripting came to the rescue. The popularity of PHP and all the content management software it spawned (WordPress, MediaWiki, etc.) was in large part due to their ability to work around HTML's omission.

To be fair, there was another factor at play here — security. Bad actors did what they do, and began acting badly. Including a file from an outside source is inherently risky business. But today browsers are hardened against the perils of web pages pulling images and data and scripts from multiple sources. Under the covers, today's web pages are assembled through an interactive discourse that often includes hundreds of back and forth exchanges with servers from all over the planet. Today's sophisticated online apps rely on the assurances provided by browsers that the assembled page won't be hijacked for nefarious purposes. So I can't accept the argument that security is to blame for HTML's omission.

So it's still a disappointment to me that the HTML language doesn't provide this capability, and that I have to rely on JavaScript or server-side scripting to fill this need.

The include pragma

As an author, I wanted to be able to organize my composition into discrete parts, and to assemble the parts into a whole when the time was right. For books this organization was one chapter/one file. Simple. For web pages it was one file for each functional aspect of the page.

The solution I implemented within BLUEPHRASE was the programming concept of including an external file, an idiomatic expression first created in 1973 by Dennis M. Ritchie for the C language preprocessor. The syntax that I settled on was an exclamation point ! followed by the keyword include and the path to the file. Syntactically it was a rip-off of Ritchie's invention.

This was the first pragma I implemented. A pragma is a command that's not really part of the language. It acts as an immediate instruction to the compiler, telling it how to process the work at hand. Over time, many more pragmas would eventually be added to BLUEPHRASE.

Under the covers, what I devised was not the same as its predecessors. Ritchie's preprocessor worked as a separate stand-alone step prior to compilation, injecting text into the programmer's source code. PHP's solution was not too dissimilar from Ritchie's: it injected strings of text containing HTML tags and blocks of scripts. Unfortunately it suffered from the serious problem that opening tags might be in one file while matched closing tags might be in another; it was error-prone and just plain ugly.

The modern approach, used by most websites today, fetches text via the network's request/response protocol. When properly crafted it can download and inject components formatted as HTML elements into the web page, so it works after a fashion. Just not to my satisfaction as an author.

In contrast to its predecessors, the BLUEPHRASE solution is employed at compilation time. Here's what happens.

When the manuscript is being tokenized and built into nested memory structures, any !include pragma is treated as a recursive tokenization step. This means that the content of the included file is not injected as an amorphous block of text; rather, the content is treated as more BLUEPHRASE, with all its syntactic rules enforced. This neatly solves the problem of mismatched tags that plagues PHP. It also requires no interactive scripting or network exchanges the way the modern approach does. Because it's declarative, it's simple and easy to use.
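
The difference between recursive tokenization and text injection can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the BLUEPHRASE compiler: the brace notation, the in-memory FILES table and the parse function are all hypothetical stand-ins. What it shows is that each included file is parsed as a complete unit, so an element opened in one file can never be closed in another.

```python
# Toy model only: the notation, FILES and parse() are hypothetical,
# chosen to illustrate the idea rather than to mirror BLUEPHRASE itself.
FILES = {
    "page": ["body {", "!include menu", "p Hello", "}"],
    "menu": ["nav {", "a Home", "}"],
    "broken": ["div {", "p never closed"],
    "bad_page": ["body {", "!include broken", "}"],
}

def parse(name, files, active=()):
    """Tokenize one file into a nested list, recursing into each include."""
    if name in active:
        raise ValueError(f"circular include: {name}")
    root = []
    stack = [root]  # stack[-1] is the element currently being filled
    for line in files[name]:
        line = line.strip()
        if line.startswith("!include "):
            # Recursive tokenization: the included file is parsed on its own
            # and must be balanced before its nodes are attached here.
            target = line.split(None, 1)[1]
            stack[-1].extend(parse(target, files, active + (name,)))
        elif line.endswith("{"):
            node = [line[:-1].strip()]  # an element is [tag, child, child, ...]
            stack[-1].append(node)
            stack.append(node)
        elif line == "}":
            if len(stack) == 1:
                raise SyntaxError(f"{name}: unmatched closing brace")
            stack.pop()
        elif line:
            stack[-1].append(line)
    if len(stack) != 1:
        raise SyntaxError(f"{name}: unclosed element")
    return root
```

In this sketch, parse("page", FILES) returns [["body", ["nav", "a Home"], "p Hello"]], while parse("bad_page", FILES) raises a SyntaxError that names the file "broken", because the mismatch is caught inside the included file rather than leaking into the file that includes it.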

The enclosure pragma

Shortly after perfecting the code to safely recurse BLUEPHRASE manuscripts, I designed a second pragma with capabilities that were the inverse of the !include pragma. It got to the heart of the issue with headers and footers and other boilerplate stuff.

It worked like this. Instead of including the external file into the current manuscript, it would do the opposite: it would inject the contents of the current manuscript into the external file. I called this an enclosure. Syntactically it was declarative, like its sibling, and just as easy to use. The pragma only needed one new attribute to make it work. I called this attribute the selector, because it mimicked the behavior of CSS selectors. It could be in any of three forms: a classname, an identifier, or an HTML tag name.

I designed this new feature to be used in two similar scenarios. First, it could be used as a decorator. For example, every image on a page could be wrapped with a <figure> that had a <figcaption>. The enclosure would contain a BLUEPHRASE fragment with those elements, and the compiler would invoke its inclusion every time an img was encountered in the manuscript. This would keep the manuscript tidy, and provide consistency to the finished document.

The second scenario was for enclosing an entire document with a page template. For example, the template might include standard layout elements such as <html>, <head>, <meta>, <link>, <body>, <header>, <nav> and <footer>. The manuscript file would contain only the composition itself, relying upon the enclosure to wrap itself around the composition.

To use this new feature, the author would declare an !enclosure pragma at the beginning of a manuscript, specifying a selector and a target filename. The compiler would monitor the tokenization process and take care of the rest, triggering the enclosure algorithm whenever a selector was matched.
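
The decorator scenario can be sketched in Python as follows. This is an illustration under assumptions, not the BLUEPHRASE implementation: the dictionary-based node format, the SLOT marker and the helper names (el, matches, fill, enclose) are all hypothetical. The sketch supports the three selector forms described above and wraps every matching element in a copy of the enclosure template.

```python
import copy

SLOT = "<<slot>>"  # hypothetical marker: where the matched element is placed

def el(tag, *children, id=None, classes=()):
    """Build a toy element node."""
    return {"tag": tag, "id": id, "classes": tuple(classes),
            "children": list(children)}

def matches(node, selector):
    """Three selector forms: .classname, #identifier, or a bare tag name."""
    if selector.startswith("."):
        return selector[1:] in node["classes"]
    if selector.startswith("#"):
        return node["id"] == selector[1:]
    return node["tag"] == selector

def fill(template, node):
    """Return a copy of the template with its slot replaced by node."""
    out = copy.deepcopy(template)
    def visit(parent):
        for i, child in enumerate(parent["children"]):
            if child == SLOT:
                parent["children"][i] = node
            elif isinstance(child, dict):
                visit(child)
    visit(out)
    return out

def enclose(node, selector, template):
    """Wrap every element matching selector, working from the leaves upward."""
    if not isinstance(node, dict):
        return node  # plain text is left untouched
    children = [enclose(c, selector, template) for c in node["children"]]
    node = dict(node, children=children)
    return fill(template, node) if matches(node, selector) else node

# Usage: wrap each img in a figure that carries a figcaption.
figure = el("figure", SLOT, el("figcaption", "caption text"))
doc = el("body", el("p", "intro"), el("img", classes=["photo"]))
wrapped = enclose(doc, "img", figure)
```

After this runs, the second child of the body is a figure whose first child is the original img. The page-template scenario is the same operation with a selector that matches the manuscript's outermost element and a template holding the standard layout elements.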

This feature quickly became my favorite, and today I use it everywhere.

The "use" pragma

Much later I discovered a related need for a third pragma. This happened when I began pushing the limits of variables and clones. The problem was this: I needed to declare variables and define clone templates without having them appear in the manuscript where they were declared. They needed to be invisible at that point in the process.

I considered other ways to accomplish this, with one way being the introduction of a new tag whose purpose was to be invisible. Ultimately I rejected that idea because I felt that I had no business adding new tags to HTML. Recall that BLUEPHRASE began as a round-trip lossless expression of HTML.

What I finally chose was a third pragma, which would access an external file containing valid BLUEPHRASE, but whose contents would not be written to the output document. Its contents could be used but not displayed. I designated this to be the !use pragma. Syntactically it looked identical to the !include pragma.
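
The contrast between the two pragmas can be shown with one more toy Python model. Everything in it is an assumption made for illustration: the line-oriented notation, the $name = value form for declaring a variable, and the compile_file function are hypothetical and are not taken from the BLUEPHRASE compiler. The point is only that a used file contributes its definitions while its own output is discarded, whereas an included file contributes both.

```python
import re

# Hypothetical in-memory "files"; the notation is a toy, not real BLUEPHRASE.
FILES = {
    "defs": ["$site = Example Press", "p Shared definitions"],
    "page": ["!use defs", "h1 Welcome to $site"],
    "page2": ["!include defs", "h1 Welcome to $site"],
}

def compile_file(name, files, variables=None):
    """Return the output lines for one file, sharing one variable table."""
    variables = {} if variables is None else variables
    output = []
    for line in files[name]:
        if line.startswith("!use "):
            # Process the file for its definitions, then discard its output.
            compile_file(line[5:].strip(), files, variables)
        elif line.startswith("!include "):
            # Process the file and keep its output.
            output.extend(compile_file(line[9:].strip(), files, variables))
        elif decl := re.fullmatch(r"\$(\w+)\s*=\s*(.*)", line):
            variables[decl.group(1)] = decl.group(2)
        else:
            # Substitute known variables; unknown ones are left as written.
            output.append(re.sub(
                r"\$(\w+)",
                lambda ref: variables.get(ref.group(1), ref.group(0)),
                line))
    return output
```

Here compile_file("page", FILES) returns ["h1 Welcome to Example Press"], while compile_file("page2", FILES) returns ["p Shared definitions", "h1 Welcome to Example Press"]. The variable resolves in both cases, but only the included file leaves a visible trace in the output.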

With these three pragmas in place, my organizational needs were largely met. I didn't break any HTML standards by adding new tags. I didn't have to troubleshoot unmatched tag errors. And I had the powerful new enclosure pattern at my disposal.