Note on Choosing and Using URIs

DRAFT 2007-10-21

Current version:
http://sw.neurocommons.org/2007/uri-note/
This version:
http://sw.neurocommons.org/2007/uri-note/uri-note-2007-10-21.html
One of many previous versions:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/ShorterSputnikDraft?action=recall&rev=20
Editors:
Jonathan Rees, Science Commons
David Booth, Hewlett-Packard

Abstract

[TBD]

How to comment on this draft

Please put your comments on the DraftTalk page, or if commenting on a "major issue," join the fray at one of the pages reserved for this purpose - see the list here. I will attempt to address all concerns and record dissenting views fairly.

Status of this document

This is an editors' draft with no official standing.

I am planning to ask HCLS at an early November teleconference for approval to publish some form of this note on w3.org. The note will still be just an editors' draft at that time.

Before publication on w3.org, the copyright notice will be replaced by the standard W3C copyright notice, and the CC version of the document will be attributed.

Changes since the previous version:
10/19 simplified and reworked the section on web documents
10/21 changed 'definition' to 'DEED' (see endnote: what-to-call-these-things).
10/21 moving toward W3C IG-NOTE document style

[Insert W3C disclaimers re nonendorsement and patent notice here]

Introduction

This document is a request to the community of readers and writers of RDF to follow certain practices around the use and definition of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF, and the proposed solutions are meant to be robust in the face of the passage of time. It is hoped that much of the document will be advice that makes intuitive sense. However, certain novel aspects will likely be challenging, and these are meant to stimulate debate.

The focus is the use of URIs as names, or terms that denote. Terms that denote may be used sensibly in declarative statements, and RDF statements in particular. This is in contrast with the conventional use of many URIs as specifying communication endpoints. Sometimes a URI is used both ways, in which case there is either a pun or overloading.

The intended audience is workers in the life sciences and health care, but it is hoped that the document will be useful to others.

Most of the prescriptions may be followed without additional infrastructure, but some of the advice (support for alternative URI resolution) requires a URI resolution rule ontology. This should be consulted as necessary.

The document discusses first the choice and use of existing terms, then when and how to establish new terms. The exposition ends with the special case of web documents. To make the main line of the document more concise, details on a number of topics are relegated to endnotes.

Definitions

term
a name; specifically, a URI, in the role of something that can denote something else.
denote
to name, refer to, or designate.
referent (of a term)
the thing (individual, class, property, etc.) denoted by the term.
DEED (to a term)
a document that says what a term means (denotes); more precisely, an RDF graph giving necessary and sufficient conditions for use of the term (see endnote: what-to-call-these-things). The graph may include prose that is also to be treated as constraining.
spelling (of a term)
the term taken as a string (i.e. stripped of its role of referring); a URI.
locator
a URI whose scheme is http, https, or ftp
nose-follow (a URI)
a heuristic method for finding DEEDs (see endnote: how-to-nose-follow)
URI owner
someone who is a "URI owner" according to rules described here
See endnote: why-new-terminology
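
As an illustration of the "locator" definition above, here is a minimal sketch in Python. The function name is_locator and the example LSID are hypothetical and chosen only for illustration; the only thing taken from this document is the list of schemes.

```python
from urllib.parse import urlsplit

# Schemes that make a URI a "locator" in the sense defined above.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def is_locator(uri: str) -> bool:
    """Return True if the URI's scheme is http, https, or ftp."""
    # urlsplit already lower-cases the scheme; .lower() is kept for clarity.
    return urlsplit(uri).scheme.lower() in LOCATOR_SCHEMES


if __name__ == "__main__":
    print(is_locator("http://sw.neurocommons.org/2007/uri-note/"))
    print(is_locator("urn:lsid:example.org:taxa:42"))  # hypothetical LSID
```

Anything with another scheme (urn:, tag:, info:, and so on) is treated as a nonlocator by this sketch.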

[Need an overview: more details on DEEDs, explain that we're not talking logic here, some examples of each technical-term defined]

Le mot juste: Choosing a term

For each term that is used in an RDF document:

Humptydumptyism: Establishing a new term

Establishing a new term requires these steps:

  1. Justify any decision not to use an extant term
  2. Invent the term (i.e. its spelling - the way it's written)
  3. Compose a DEED that says how the term is supposed to be used
  4. Publish the DEED so that others will know what it means

In what follows, by "the URI" I mean "the URI that is the spelling of the term".

Minting a term

Composing a DEED

Publishing a DEED

Grandfathering the Web

Loosely speaking, the WWW is grandfathered into the semantic web (as described here) by considering web documents to be named by their URIs. If a document is obtained when dereferencing the term (or rather its spelling), then by convention the term is taken to refer to the document. The document is not a DEED to the term, but it serves a similar role. While DEEDs for such terms may be helpful, they are not required.

This only works if responses are consistent with one another (not in a formal sense, but in terms of their use or implications). Server responses vary over time and over variation in request details such as requested language or format. Any particular server response should be taken to say that the term denotes what the response communicates, not any particular byte sequence or sequences. By using the term in RDF we express an expectation of consistency - that all responses communicate the same thing, whatever that is.

If you must denote a resource (in the sense of RFC 2616) that behaves inconsistently, or that is of interest for anything other than what it communicates, denote it by a term (other than its own URI) that has a DEED and associated 303-response server behavior.

This convention only applies to non-# URIs.
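
The convention of this section can be sketched as a small check. This is only an illustration under stated assumptions: the function name is_grandfathered is hypothetical, and reading "a document is obtained" as "the server answered with a 2xx status code" is my simplification, not something this note has settled (see endnote: response-codes-explained).

```python
def is_grandfathered(uri: str, status_code: int) -> bool:
    """Return True if, under the convention sketched here, the term may be
    taken to denote the document obtained by dereferencing it.

    Assumption: a 2xx status code stands for "a document was obtained".
    """
    has_fragment = "#" in uri  # the convention only covers non-# URIs
    document_obtained = 200 <= status_code < 300
    return document_obtained and not has_fragment


if __name__ == "__main__":
    print(is_grandfathered("http://sw.neurocommons.org/2007/uri-note/", 200))
    print(is_grandfathered("http://example.org/2007/terms#Neuron", 200))
    print(is_grandfathered("http://example.org/2007/terms/Neuron", 303))
```

A 303 response falls outside the convention in this sketch; such a term is expected to have a DEED instead.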

Endnotes

These endnotes are meant to provide motivation, examples, and further discussion.

Endnote: what-to-call-these-things

I'm still searching for a term I feel comfortable with. I have used "definition," "defining description," "defining document," "declaration," "declaration document," and many variations. The idea is almost the same as "declaration page" in Booth's article, except that here it is required to be RDF, and it is really more of a document-essence than a "page". Currently I am using the pseudo-acronym DEED, which, if it had to, might stand for "a DEclaring Essence of a Document".

Endnote: why-new-terminology

Here is why, in some cases, we didn't reuse existing terms from RFC 2616 or Architecture of the WWW:

term
(instead of "URI") - I really do just mean a URI, syntactically, but "term" helps to evoke the right problems and solutions better. Also in this context URIs don't necessarily "identify" anything nor are they restricted to use in association with "resources".
thing
(instead of "resource") - I really mean anything, not just the resources considered in RFC 2616
denote
(instead of "identify") - talk to Pat Hayes
locator
(instead of "URL") - URL is defined quite broadly; I mean to restrict it to the least common denominator among deployed web agents
what is communicated
(instead of "information resource") - in Architecture of the WWW IR means something whose "essence" can be communicated; we need to talk about that essence itself. "Message" is a good alternative word here
DEED
(instead of Booth's "declaration page") - see endnote: what-to-call-these-things.

Endnote: how-to-nose-follow

Nose-following is a heuristic process, similar to dereference, for obtaining a DEED to a URI. There are various ways to nose-follow depending on the syntax of the URI.

[Compare DereferenceURI.]

Endnote: neo-DEEDs

In the unlikely event a term is in wide use but its DEED is unpublished or only ephemerally published - for example, if you know it only from use, but not from a DEED - and the URI owner cannot or will not publish a DEED, an expert might compose and publish a DEED if they believe it to correspond to community practice. This neo-DEED has no "authority" according to the URI ownership rule, but may be of use to the community. The neo-DEED might be publicized using a URI resolution rule.

[Is this "eminent domain" for URIs?] [David B isn't comfortable calling such a document a DEED. Eric P is.]

Endnote: minting-nonlocators

Locators have the advantage over nonlocators that they are more likely to lead to documents. Clients that do not understand a nonlocator natively, and that either do not understand URI resolution rules or lack URI resolution rules that lead to a DEED or other document, may still be able to access the document if the URI is a locator. (Of course this is of no help if the link is broken.)

Rather than mint a non-locator URI, you can use a proxy service prefix to create a locator from the non-locator URI. Arrange, somehow, for everyone performing this transformation to use the same proxy prefix. State an equivalence (e.g. owl:sameAs) in case anyone uses the bare non-locator URI in RDF - or as a way of defining what you mean by the proxy-relative form.
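
The proxy-prefix transformation might look like the following sketch. The prefix http://proxy.example.org/resolve/, the example LSID, and both function names are hypothetical; simple concatenation is an assumption about how the proxy expects the non-locator URI to be appended. The equivalence is written out as a single N-Triples statement using owl:sameAs.

```python
# Hypothetical proxy prefix; everyone performing the transformation
# would have to agree on the same one.
PROXY_PREFIX = "http://proxy.example.org/resolve/"

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_locator(nonlocator_uri: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """Create a locator by prepending the shared proxy prefix."""
    return proxy_prefix + nonlocator_uri


def same_as_triple(nonlocator_uri: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """State, as one N-Triples line, that both spellings denote the same thing."""
    locator = to_locator(nonlocator_uri, proxy_prefix)
    return f"<{locator}> <{OWL_SAME_AS}> <{nonlocator_uri}> ."


if __name__ == "__main__":
    lsid = "urn:lsid:example.org:taxa:42"  # hypothetical LSID
    print(to_locator(lsid))
    print(same_as_triple(lsid))
```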

For a concrete case study see TDWG Life Sciences Identifiers Applicability Statement.

For a summary of this issue see AttitudeTowardNonlocators.

Endnote: NCName-pragmatics

We encourage NCName suffixes, or at least SPARQL-liberalized-NCName suffixes, for all terms. This helps make Turtle and SPARQL queries more concise. [explain] [explain bug in the RDF/XML spec, SPARQL's extension, etc.]
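
A rough way to check a term's suffix is sketched below. It is deliberately conservative: the regular expression accepts only an ASCII subset of NCName (a letter or underscore, then letters, digits, '.', '-' or '_'), and splitting the URI at its last '#' or '/' is my assumption about where the namespace ends. The function names are hypothetical.

```python
import re

# ASCII-only approximation of an XML NCName; the real production also
# admits many non-ASCII characters, which this sketch rejects.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def suffix_of(uri: str) -> str:
    """Return the part of the URI after its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def has_ncname_suffix(uri: str) -> bool:
    """True if the suffix matches the ASCII NCName approximation."""
    return ASCII_NCNAME.fullmatch(suffix_of(uri)) is not None


if __name__ == "__main__":
    print(has_ncname_suffix("http://example.org/2007/terms#Neuron"))
    print(has_ncname_suffix("http://example.org/2007/terms/12345"))
```

A suffix that passes this check can be written as a prefixed name (e.g. ex:Neuron) rather than as a full URI in angle brackets.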

Endnote: what-is-RDF

For the purposes of this document, "RDF" means either Turtle or an established RDF standard.

Do RDFa documents qualify as RDF documents? I.e. should we recommend using them as DEEDs? Problems: (1) they don't have their own MIME types, so can't be recognized or requested, and (2) they don't work with # URIs.

Endnote: assertions-that-are-not-definitional

If definitions are supposed to be precise, where do we put RDF statements about a term's referent that are not supposed to be restrictive on the referent of the term?

See issue DefinitionDelineation, to which an answer is given here.

Endnote: about-inconsistency

Assume that anything that can go wrong, will.

We've suggested various sources of DEEDs:

  1. Via URI resolution rules, when available
  2. Nose-following, when it works
  3. Other sources (e.g. documents found via search)

DEEDs can be in conflict only if someone has introduced an inconsistency - that is, the present guidelines haven't been followed. There is no formulaic way to resolve such conflicts. Priority goes to the original URI owner's first version published for nose-following, or some other published version for nonlocators, but there are rare circumstances in which it is preferable to use a revised DEED. On the other hand, a DEED found via a resolution rule included in RDF may be more likely to reflect the author's intent. Ultimately it is up to the community of users of the term to figure this out.

Endnote: response-codes-explained

[TBD: explanation of 2xx and 303 response codes, after httpRange-14]

The following only apply to servers that observe the present recommendations:

Endnote: how-to-get-persistence

[Issues AttitudeTowardMigration and Purls]

A way to help protect against accidental collisions over time (publication of an inconsistent document by a future URI owner) is to have the path component of the URI contain site version information in the form of a year or more precise date. Future owners following this convention will either use no date or will put a different date in the URIs they mint. See [cite RFC 4151 tag: URI] for further information on this convention.
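
The dating convention can be sketched as follows. The function name mint_dated_uri and the authority example.org are hypothetical, and putting only the year in the first path segment is one possible choice among the "year or more precise date" options mentioned above.

```python
import datetime


def mint_dated_uri(authority: str, minted_on: datetime.date, local_name: str) -> str:
    """Mint an http URI whose path begins with the year of minting."""
    return f"http://{authority}/{minted_on.year}/{local_name}"


if __name__ == "__main__":
    # A later owner of example.org following the same convention would
    # use a different year, so the two spellings would not collide.
    print(mint_dated_uri("example.org", datetime.date(2007, 10, 21), "uri-note"))
```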

Endnote: convert-nonlocators-to-locators

Tools that care about accessing things (endpoints, DEEDs, etc.) should understand use of URI resolution rules, so that they can properly implement relocation and redundant sourcing.

In particular, there is often occasion to display RDF in a web browser or other user-facing interface. When arranging this, be prepared to link to a DEED or other appropriate document using a browser-friendly URI, e.g. by routing through a proxy. The term's spelling may be an inadequate locator for most browsers (e.g. urn:lsid:) or it may not lead to the correct DEED. Use URI resolution rules to obtain a locator that can be used for hyperlinking.

Endnote: versioning

[TBD: A complete versioning story: database records, databases, ontologies. This is made critical by the requirement that document content remain stable.]

Endnote: comparison-with-AWWW

[TBD: thorough comparison with Architecture of the WWW]

Acknowledgments

Help most recently received from: (reverse chronologically) Eric Prud'hommeaux, Bijan Parsia, Chimezie Ogbuji, Kaitlin Thaney, Alan Ruttenberg, Pat Hayes, Jake Beal. Thank you.