Pretty Persistent IDentifiers (PPID)

An article, posted 5 months ago filed in pid, links, isbn, science, archive, url, uri, web & semantic.

If you’re into archival stuff, you’ve probably come across the concept of PIDs (persistent identifiers). PIDs help organisations attribute data to consistently identified objects. There are different PID schemes. Books can be persistently identified by their ISBN. In science, DOIs are popular for identifying scientific articles. And there are plenty of other persistent identifiers.

What most of them share is the following: they need registration. And while that could be a good thing, I’ve seen well-meant attempts at creating a PID where the central entity went rogue, links depended on some centralised resolver, and it all fell apart.

The requirements

When I was tasked with creating a long-lasting QR label, the requirements were clear:

Exit: DOI, DAI, ARK, ISBN, ISSN, ORCID, ROR and UUID. And even PURL. But the basic design of PURL is what I am aiming for…

Introducing PPID

A PPID is a URL that always redirects to another URL, which can be less permanent.

(I’m not sure why I’m writing this down, but the plethora of systems that have failed me in the past made me realise it needs to be this simple.)

To create a PPID, create a ppid subdomain. Don’t do this on a personal domain, which might be discarded when you die; use a domain that is owned by an organisation. Even though domain names are cheap, they do need to be renewed. For example:

https://ppid.example.com

Then, create redirect rules. You can start with a simple one:

The PID that you print as a QR code, or refer to, will be the PPID link, but it will swiftly redirect to your management tool. Your management tool should have a receiver that takes the URL path as input. I suggest splitting it into a collection and an object_id to make it easier to divide responsibilities, but it will work either way.

https://ppid.example.com/1232/ABC-123321
=307 (temporary) redirect=> https://managementsystem.example.com/resolver?q=1232/ABC-123321
=30x (permanent or temporary) redirect=> https://managementsystem.example.com/collections/1232/works/2321
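The first hop of that chain can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the function names `split_ppid_path` and `first_hop` are made up for this example, and the resolver URL is the one from the example above. It assumes every PPID path has the form /collection/object_id.

```python
from urllib.parse import quote

# Resolver endpoint taken from the example chain above.
RESOLVER = "https://managementsystem.example.com/resolver"


def split_ppid_path(path: str) -> tuple[str, str]:
    """Split '/1232/ABC-123321' into ('1232', 'ABC-123321')."""
    collection, sep, object_id = path.strip("/").partition("/")
    if not collection or not sep or not object_id:
        raise ValueError(f"expected /<collection>/<object_id>, got {path!r}")
    return collection, object_id


def first_hop(path: str) -> tuple[int, str]:
    """Return the (status code, Location header) for the first redirect."""
    collection, object_id = split_ppid_path(path)
    # safe="/" keeps the slash readable, as in the example URL.
    query = quote(f"{collection}/{object_id}", safe="/")
    # 307 = temporary, so clients keep asking the PPID domain every time.
    return 307, f"{RESOLVER}?q={query}"
```

Calling `first_hop("/1232/ABC-123321")` gives the 307 status and the resolver URL from the chain above. Everything after that is up to the management system.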

Shouldn’t these be registered by an independent party?

Why? A few things can happen: you can cease to exist. They can cease to exist. If you cease to exist, someone may still take over all domains to make sure the identifiers keep working (see below). If a third party you are relying on ceases to exist (e.g. a project funded by government funds and not set up to be as lean as possible), good luck. Maybe archive.org will have a copy, maybe you have copied the PIDs into your own database, but still: chaos. Worst case: these numbers might mean nothing anymore to anyone.

But mutations!

One day (a part of) the collection might move. If you manage distinct collections, it is advisable to compartmentalise them by grouping them under collection ids. But it will work without.

When some or all of your collection moves to a different management system, add new redirect rules. You may have to call your IT service desk to make this change for you, but because it is a redirect, they will know the drill. In my specific case we have clients, so we can easily redirect everything from a certain client to a system we don’t know. Because making the redirect is cheap, we can add it and forget about it. It will be maintainable enough to work forever.
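To make the idea of per-collection rules concrete, here is a minimal sketch. The rule table, the `resolve` function and the other-system.example.org address are all hypothetical; in practice these rules would more likely live in your web server or registrar configuration than in application code.

```python
# Default target: the management system from the earlier example.
DEFAULT_TARGET = "https://managementsystem.example.com/resolver?q="

# Hypothetical rule table: collection id -> URL prefix of the system
# that now holds that collection.
MOVED_COLLECTIONS = {
    "1232": "https://other-system.example.org/lookup/",
}


def resolve(path: str) -> str:
    """Return the redirect target for a PPID path such as '/1232/ABC-123321'."""
    identifier = path.strip("/")
    collection = identifier.split("/", 1)[0]
    # Collections without a rule of their own fall back to the default.
    prefix = MOVED_COLLECTIONS.get(collection, DEFAULT_TARGET)
    return prefix + identifier
```

Moving a collection then means adding one line to the table. The printed QR codes stay exactly the same.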

But really big mutations (changing hosts)

Heck, even if you change providers, you will be able to set up the redirects in no time, even when changing web server software.

But changing the domain (well, don’t)

Look, domain names are cheap. If you’ve invested time in labeling works, the €5 to €10 per year cannot be the issue. Some domain registrars don’t require you to have a server to set up redirects. Maybe you decide to change your company’s name. Just keep the old URL (it is also good to keep some of that link juice, you know, SEO; ask the marketing department for funding). Link it to the new server; ask your IT expert, it is REALLY simple for them. You can even do ppid.example.com => ppid.example.org and do your redirects from there.

All the objects went somewhere else

Worst case: you end up with a list of redirects on your ppid domain that is as long as the number of original objects. It will be stable forever, because it is out of your control. Suggestion: you can even link it to another PPID domain before the link ends up at a more volatile domain.

Different representations

I’ve mentioned that PPIDs can deep link into your collection management system. But it is flexible. You can choose to support a suffix, e.g. .public, and link that to a public website. Or, before you redirect anywhere, you can put a very minimal app in between that lets the user choose where they want to see the work. But all of that is already outside the scope of what a PPID is.
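A suffix rule could look roughly like the sketch below. Only the .public suffix comes from the text above; the `resolve_representation` function and the www.example.com target are assumptions made for the sake of the example.

```python
# Hypothetical targets per suffix; "" is the default (no suffix).
TARGETS = {
    "": "https://managementsystem.example.com/resolver?q=",
    ".public": "https://www.example.com/works/",
}


def resolve_representation(path: str) -> str:
    """Pick a redirect target based on an optional suffix such as '.public'."""
    identifier = path.strip("/")
    for suffix, prefix in TARGETS.items():
        if suffix and identifier.endswith(suffix):
            # Strip the suffix so the target receives the bare identifier.
            return prefix + identifier[: -len(suffix)]
    return TARGETS[""] + identifier
```

One caveat with this approach: an object id that itself happens to end in .public would be misread, so pick suffixes that cannot occur in your own identifiers.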
