Puppeteering

As I will describe in a later post, I am making another language (hopefully more seriously this time). Part of this process is writing a manual of sorts on how to use the language, and I decided to start writing it in markdown. Markdown is like a simplified HTML, and if you want to you can still use (mostly) plain HTML inside it. It also has a convenient code highlighting feature.

Highlighting in action with a C example

I started to write the manual, but quickly realised that I could not specify code highlighting for my own code, as markdown would not know of the language.

I quickly decided on a solution – since markdown supports HTML, I made a website that takes a URL query string as a parameter and formats it into HTML, in a style similar to the highlighting on the site. I did this using regular expressions to substitute HTML span elements directly into the text wherever a special keyword marker appears. Regex replacement has a nifty feature for things like this – in the replacement string, $& represents the matched string. This is extremely useful, especially if you want to wrap something around either side of a match. Initially, I chose backslashes to mark highlighting keywords, but had to switch after realising that weird stuff happened in URLs with double slashes etc.
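As a rough illustration of the $& trick, here is a minimal sketch in JavaScript. The keyword list, the kw class name and the highlight function are made up for this example and are not taken from the actual site, which used its own marker syntax.

```javascript
// Hypothetical keyword list and CSS class, chosen only for illustration.
const KEYWORDS = /\b(?:let|if|return)\b/g;

// Escape HTML special characters so the source text cannot inject markup.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Wrap every keyword in a span; $& is replaced by the matched text itself.
function highlight(code) {
    return escapeHtml(code).replace(KEYWORDS, '<span class="kw">$&</span>');
}

console.log(highlight('if x < 2 return x'));
```

Because $& stands for whatever was matched, one replacement string covers every keyword in the pattern without needing a callback.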

Much better

Now that I had a site I could use to dynamically serve coloured HTML based on text, it was time to put it to the test in markdown. However, I soon realised I had made a blunder – markdown, for security reasons, doesn’t render or execute script or iframe elements (and weirdly, style elements too – but this could be due to @import in CSS or just a plain lack of support).

I decided the most practical way to write such a manual would be in HTML, due to the control it gives. I could have left it there, but chose not to. At this point, I lost sight of the original goal and decided to see how far I could take this.

If I made a script for a server that takes a query string as a parameter, queries my site, captures the resulting HTML as an image and then serves it back, I might be able to use the markdown image element to display my code as an image. However, because my highlighting site uses JavaScript, a simple fetch won’t work, as the JavaScript won’t be run. (With the benefit of hindsight, I realise I could have just created the image on the server using different software and then served it back.)
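To make the idea concrete, here is a small sketch of how the markdown image element for a snippet could be built. The endpoint https://example.com/render and the code parameter name are placeholders, not the real service. Parentheses are encoded by hand because encodeURIComponent leaves them alone, and a stray ")" could end the markdown link early.

```javascript
// Percent-encode text for a query string, including the parentheses that
// encodeURIComponent leaves alone but that can cut a markdown link short.
function encodeForMarkdownUrl(text) {
    return encodeURIComponent(text)
        .replace(/\(/g, '%28')
        .replace(/\)/g, '%29');
}

// Build a markdown image element pointing at a (hypothetical) render endpoint.
function markdownImage(endpoint, code) {
    return `![highlighted code](${endpoint}?code=${encodeForMarkdownUrl(code)})`;
}

console.log(markdownImage('https://example.com/render', 'print(1 + 2)'));
```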

Enter Puppeteer! Puppeteer is an amazing library that lets us drive a headless browser from server-side code. It has several easy-to-use abstractions for loading and screenshotting a page – for example:

const puppeteer = require('puppeteer');

//this is an async function as puppeteer methods are async too
async function main() {
    //initialise browser - don't worry about the settings
    const browser = await puppeteer.launch({
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox'
        ]
    });

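    //navigate to our page (google.com is just a stand-in URL here)
    const page = await browser.newPage();
    await page.goto("https://www.google.com/");

    //measure the element that holds the highlighted output
    let resultSize = await page.evaluate(() => {
        let x = document.getElementById("output");
        //fall back to the whole document if there is no #output element
        if (!x) x = document.documentElement;
        return [x.offsetWidth, x.offsetHeight];
    });

    //shrink the viewport to fit the output, at 5x pixel density
    await page.setViewport({
        width: resultSize[0],
        height: resultSize[1],
        deviceScaleFactor: 5
    });

    //save a PNG with a transparent background
    await page.screenshot({
        path: 'example.png',
        omitBackground: true
    });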

    await browser.close();
}

//log any failure instead of leaving an unhandled promise rejection
main().catch(console.error);

Now that I had a way to execute the JavaScript to generate the highlighting, all that was left was to serve it upon a GET request. However, for some reason I couldn’t get any HTTP libraries to work in this project, even though they work fine in others. I suspect, without having confirmed it, that Puppeteer was tying up resources they needed.

While testing for issues, I made a simple web server in the same project in Python – and it worked. At this point I had all of the components; they just needed to be joined together. I did this by having the Python web server write the requested string to a file on each request. The Node.js script watches this file for changes and uses Puppeteer to generate and save an image whenever it changes. Finally, the web server responds to the user with the saved image file.
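The Node.js half of that hand-off could look roughly like the sketch below. It assumes Node's built-in fs.watch, which may not be what the original script used, and the names readRequest, watchRequests, request.txt and renderImage are all placeholders for illustration.

```javascript
const fs = require('fs');

// Read the pending request text; returns null when the file is empty
// (for example, if the watcher fires while the writer is mid-write).
function readRequest(file) {
    const text = fs.readFileSync(file, 'utf8').trim();
    return text.length > 0 ? text : null;
}

// Call onRequest(text) whenever the file changes. fs.watch may emit several
// events for one write, so events are debounced with a short timer.
function watchRequests(file, onRequest, delayMs = 100) {
    let timer = null;
    return fs.watch(file, () => {
        clearTimeout(timer);
        timer = setTimeout(() => {
            const text = readRequest(file);
            if (text !== null) onRequest(text);
        }, delayMs);
    });
}

// Usage (request.txt and renderImage are placeholders):
// watchRequests('request.txt', text => renderImage(text));
```

Passing the request through a file keeps the two processes decoupled, at the cost of a small delay and no protection against two requests arriving at once.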

The result was satisfying, even if convoluted 🙂

View project
