As I will describe in a later post, I am making another programming language (hopefully more seriously this time). Part of this process is writing a manual of sorts on how to use the language, and I decided to write it in Markdown. Markdown is a simplified markup language that gets converted to HTML, and if you want to, you can still use (mostly) plain HTML inside it. Many Markdown renderers also offer convenient syntax highlighting for code blocks.

I started to write the manual, but quickly realised that I could not get syntax highlighting for code in my own language, since Markdown renderers have no idea the language exists.
I quickly settled on a solution: since Markdown supports HTML, I made a website that takes code as a URL query string parameter and formats it into HTML, in a similar style to that on the site. It does this with regular expressions that substitute HTML span elements directly into the text wherever a special keyword marks something to highlight. Regex has a nifty feature for things like this: in a replacement string, `$&` represents the matched string. This is extremely useful, especially if you want to wrap something around both sides of a match. Initially I chose backslashes to mark highlighting keywords, but had to switch after realising that strange things happened in URLs containing double slashes and the like.
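To illustrate the `$&` trick, here is a minimal sketch in JavaScript. The post doesn't give the real keyword-marker syntax or class names, so this version simply matches a hypothetical keyword list and number pattern directly; the class names `kw` and `num` are also made up for the example.

```javascript
// Hypothetical keywords and CSS class names; the real language's aren't given in the post.
const KEYWORDS = /\b(?:let|fn|return|if|else)\b/g;
const NUMBERS = /\b\d+(?:\.\d+)?\b/g;

// Escape characters that the browser would otherwise treat as HTML markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wrap each match in a span; in the replacement string, `$&` is the whole matched text.
function highlight(code) {
  return escapeHtml(code)
    .replace(NUMBERS, '<span class="num">$&</span>')
    .replace(KEYWORDS, '<span class="kw">$&</span>');
}

console.log(highlight('let x = 42'));
// <span class="kw">let</span> x = <span class="num">42</span>
```

Escaping first matters because the input comes straight from a query string; otherwise a `<` in the code would be parsed as the start of a tag.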

Now that I had a site I could use to dynamically serve coloured HTML based on text, it was time to put it to the test in Markdown. However, I soon realised I had made a blunder: for security reasons, Markdown renderers don't render or execute `script` or `iframe` elements (and, weirdly, `style` elements too, though this could be due to `@import` in CSS or just plain lack of support).
I decided the most practical way to write such a manual would be in plain HTML, given the control it offers. I could have left it there, but chose not to. At this point, I lost sight of the original goal and decided to see how far I could take this.
My idea was this: if I wrote a server script that takes a query string as a parameter, loads my highlighting site with it, renders the resulting page to an image and serves that image back, I could use Markdown's image syntax to display my code as an image. However, because my highlighting site uses JavaScript, simply fetching the page won't work, as the JavaScript won't be run. (With the benefit of hindsight, I realise I could have just generated the image on the server using different software and served it back.)
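On the Markdown side, the image reference just needs the code packed into the URL. A small sketch, assuming a hypothetical endpoint `https://example.com/render` that reads a `code` parameter (the post doesn't give the real address or parameter name). `URLSearchParams` percent-encodes characters such as `/`, `\` and `=`, which is one way to stop them being misread as part of the URL itself:

```javascript
// RENDER_ENDPOINT and the `code` parameter name are hypothetical.
const RENDER_ENDPOINT = 'https://example.com/render';

// Build a Markdown image reference whose URL carries the code in its query string.
function markdownImageFor(code, alt = 'highlighted code') {
  // URLSearchParams encodes spaces as '+' and percent-encodes '/', '=', '&', etc.
  const query = new URLSearchParams({ code }).toString();
  return `![${alt}](${RENDER_ENDPOINT}?${query})`;
}

console.log(markdownImageFor('let x = 1'));
// ![highlighted code](https://example.com/render?code=let+x+%3D+1)
```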
Enter Puppeteer! Puppeteer is an excellent Node.js library that lets you control a headless Chrome or Chromium browser from server-side code. It has several easy-to-use abstractions for loading and screenshotting a page. For example:
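```javascript
const puppeteer = require('puppeteer');

// this is an async function as puppeteer methods are async too
async function main() {
  // initialise browser - don't worry about the settings
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  try {
    // navigate to our page (google.com is just a stand-in URL)
    const page = await browser.newPage();
    await page.goto("https://www.google.com/");

    // measure the element holding the output; fall back to the whole
    // document when the page has no #output element (as on google.com)
    const resultSize = await page.evaluate(() => {
      const x = document.getElementById("output") || document.documentElement;
      return [x.offsetWidth, x.offsetHeight];
    });

    // shrink the viewport to the element's size, rendering at 5x for a sharp image;
    // this crops correctly only if the element sits at the top-left of the page
    await page.setViewport({
      width: resultSize[0],
      height: resultSize[1],
      deviceScaleFactor: 5
    });

    // save the viewport as a PNG with a transparent background
    await page.screenshot({ path: 'example.png', omitBackground: true });
  } finally {
    // close the browser even if a step above throws
    await browser.close();
  }
}

main().catch(console.error);
```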
Now that I had a way to execute the JavaScript that generates the highlighting, all that was left was to serve the image in response to a GET request. However, for some reason I couldn't get any HTTP libraries to work in this project, even though they work fine in others. My guess is that Puppeteer was occupying resources they needed.
While investigating, I made a simple web server in Python in the same project, and it worked. At this point I had all the components; they just needed to be joined together. I did this by having the Python web server write the requested string to a file on each request. The Node.js script watches this file for changes and, whenever it changes, uses Puppeteer to generate and save an image. Finally, the web server responds to the user with the saved image file.
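The Node.js half of that file-based handoff might look roughly like the sketch below (the Python server isn't shown). The file names are hypothetical, and `render(code, outPath)` is any async function that writes the PNG, such as Puppeteer code like the example above. One design choice worth noting: `fs.watch` can fire several events for a single write, so instead of starting overlapping renders, the sketch re-renders once more with the latest file contents if a change arrives mid-render.

```javascript
const fs = require('fs');

// Hypothetical file names; the post doesn't say what the files were called.
const REQUEST_FILE = 'request.txt'; // written by the web server on each request
const IMAGE_FILE = 'code.png';      // read back by the web server and sent to the user

// Re-render whenever the request file changes. Returns the FSWatcher so it can be closed.
function watchRequests(render, requestFile = REQUEST_FILE, imageFile = IMAGE_FILE) {
  let running = false;
  let pending = false;

  async function renderLatest() {
    if (running) { pending = true; return; } // a render is in progress: redo once it finishes
    running = true;
    try {
      do {
        pending = false;
        const code = fs.readFileSync(requestFile, 'utf8');
        await render(code, imageFile);
      } while (pending);
    } catch (err) {
      console.error(err);
    } finally {
      running = false;
    }
  }

  return fs.watch(requestFile, (eventType) => {
    if (eventType === 'change') renderLatest();
  });
}
```

This sketch doesn't tell the web server when the new image is ready; some signal (or a short wait) is still needed on that side before it sends the file back.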
The result was satisfying, even if convoluted 🙂
