Generating and parsing data URIs in Ruby

An article, posted 10 days ago filed in uri, data, url, data-url, ruby, link, browser, html, json & api.

I’m fond of data URIs (MDN link). Twelve years ago I repurposed a tool that stored a web page with its related resources in a Microsoft-specific format, and rewrote it to store normal HTML in which the related resources were encoded as data URIs. Recently the topic came up again on a project I was working on, where microservices are still a thing. While discussing it with colleagues, it seemed that this quite useful URI scheme wasn’t on everyone’s mind. Instead, the original idea was to upload the resource to S3, pass the link, download the resource from S3 at the receiving end, and then have some policy that takes care of deleting it… nah…

Data URIs: the basics

This is the simplest data URI:

data:,Hello%2C%20World%21

You can open it in your browser.

The example above is percent-encoded (URL-encoded). For binary data such as images, or for more complex documents, base64 encoding is used:

data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==

The above results in the exact same output; open it in your browser.

Next, how to do this with Ruby.

Encoding with Ruby

Encoding is pretty simple. We take the content type (the MIME type), the data (binary or string), and specify whether base64 should be used for encoding.

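require "base64"
require "erb"

# Builds a data URI from a MIME type and a payload (binary or string).
# Pass "base64" as encoding for binary data; otherwise the payload is percent-encoded.
def encode_data_uri(content_type, data, encoding = nil)
  metadata = content_type.to_s
  metadata += ";#{encoding}" if encoding

  data = if encoding == 'base64'
    # strict_encode64 emits no line breaks; encode64 wraps its output every 60 characters
    Base64.strict_encode64(data)
  else
    # URI.encode was removed in Ruby 3.0; url_encode escapes a space as %20
    ERB::Util.url_encode(data)
  end

  "data:#{metadata},#{data}"
end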
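
One detail that is easy to miss when generating base64 data URIs: Base64.encode64 wraps its output with a newline after every 60 characters, which is unwelcome inside a URI, while Base64.strict_encode64 keeps everything on one line. A minimal sketch to see the difference; the helper name base64_data_uri is made up for this example and is not part of any library:

```ruby
require "base64"

# Hypothetical helper: build a base64 data URI on a single line.
def base64_data_uri(content_type, bytes)
  "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
end

payload = "Hello, World! " * 10 # 140 bytes, long enough to trigger wrapping

puts Base64.encode64(payload).include?("\n")               # true
puts base64_data_uri("text/plain", payload).include?("\n") # false
```

Both variants decode back to the same bytes, since Base64.decode64 skips the newlines; the single-line form is simply the safer thing to put in a URI.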

Decoding with Ruby

Decoding needs to be a bit more robust. This script is a minimal implementation and could be improved with proper character-encoding handling for text. For now we hard-assume UTF-8 for all text (Ruby itself defaults to UTF-8), but ideally the charset is retrieved from the URI’s charset parameter or from the document data (e.g. meta headers). That’s left as a follow-up exercise.

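require "base64"

def parse_data_uri(uri)
  return false unless uri.start_with?("data:")

  # Remove the "data:" prefix & split the metadata and data parts
  metadata, data = uri[5..-1].split(',', 2)
  return false if data.nil? # no comma found, so this is not a valid data URI

  # Extract the content type and parameters; "base64", if present, comes last
  # (e.g. "text/plain;charset=utf-8;base64")
  content_type, *params = metadata.split(';')
  encoding = 'base64' if params.last == 'base64'

  # An omitted media type means text/plain (RFC 2397)
  content_type = 'text/plain' if content_type.nil? || content_type.empty?

  # Percent-decode by hand: URI.decode was removed in Ruby 3.0, and the form
  # decoders would turn "+" (a valid base64 character) into a space
  data = data.b.gsub(/%\h\h/) { |m| m[1, 2].hex.chr }

  # Decode the data part based on the encoding
  data = Base64.decode64(data) if encoding == 'base64'

  if content_type.start_with?('text/') || content_type.end_with?('xml')
    data = data.force_encoding('UTF-8')
  end

  {
    src: uri,
    content_type: content_type,
    encoding: encoding,
    data: data
  }
end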
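
As a starting point for that follow-up exercise, here is a rough sketch of reading the charset parameter from the metadata instead of hard-assuming UTF-8. The helpers charset_from_metadata and apply_charset are hypothetical names of my own; the US-ASCII fallback is the default that RFC 2397 specifies when no charset is given.

```ruby
# Hypothetical helper: pick the charset parameter out of the metadata part
# of a data URI (everything between "data:" and the first comma).
def charset_from_metadata(metadata, default: "US-ASCII")
  params = metadata.to_s.split(";").drop(1) # skip the media type
  pair = params.map { |param| param.split("=", 2) }
               .find { |key, _value| key.to_s.strip.downcase == "charset" }
  pair && pair[1] ? pair[1].strip : default
end

# Hypothetical helper: label the bytes with that charset, falling back to
# binary when Ruby does not know the encoding name.
def apply_charset(bytes, charset)
  bytes.dup.force_encoding(Encoding.find(charset))
rescue ArgumentError
  bytes.b
end

latin1 = apply_charset("caf\xE9".b, charset_from_metadata("text/plain;charset=ISO-8859-1"))
puts latin1.encode("UTF-8") # café
```

Note that force_encoding only relabels the bytes; the explicit encode("UTF-8") is what actually converts them.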

Wrapping it up

As shown above, a simple data URI encoding and decoding mechanism takes only a few lines of Ruby. Data URIs are plain text, so they can even be used as data carriers in JSON. You could of course agree on base64-encoding the contents directly, but a data URI is a nice, agreed-upon scheme that also carries the MIME type. Or, as I did before, you can use them to store an HTML file with all its JS, images and other resources embedded.
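
To make the JSON use case a bit more concrete, here is a small sketch of wrapping a binary attachment in a JSON message and unpacking it again. The message shape (the "name" and "file" keys) and both helper names are assumptions for illustration only.

```ruby
require "base64"
require "json"

# Hypothetical message shape: {"name": ..., "file": "<data URI>"}
def wrap_attachment(name, content_type, bytes)
  uri = "data:#{content_type};base64,#{Base64.strict_encode64(bytes)}"
  JSON.generate({ "name" => name, "file" => uri })
end

# Assumes the "file" value is a base64 data URI produced by wrap_attachment.
def unwrap_attachment(json)
  message = JSON.parse(json)
  metadata, encoded = message.fetch("file").delete_prefix("data:").split(",", 2)
  {
    name: message["name"],
    content_type: metadata.split(";").first,
    data: Base64.strict_decode64(encoded)
  }
end

png_header = "\x89PNG\r\n\x1A\n".b # the first 8 bytes of any PNG file
json = wrap_attachment("pixel.png", "image/png", png_header)
p unwrap_attachment(json)[:data] == png_header # true
```

The receiving service gets the MIME type and the bytes in a single string field, with no extra storage or cleanup policy involved.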
