Copyright © 2007 Creative Commons Corporation. Licensed under a Creative Commons Attribution 3.0 License except where otherwise noted.
[TBD]
This is an editors' draft with no official standing.
I am planning to ask HCLS at an early November teleconference for approval to publish some form of this note on w3.org. The note will still be just an editors' draft at that time.
Before publication on w3.org, the copyright notice will be replaced by the standard W3C copyright notice, and the CC version of the document will be attributed.
Changes since the previous version:
10/19 simplified and reworked the section on web documents
10/21 changed 'definition' to 'DEED' (see endnote: what-to-call-these-things).
10/21 moved toward W3C IG-NOTE document style
[Insert W3C disclaimers re nonendorsement and patent notice here]
This document is a request to the community of readers and writers of RDF to follow certain practices around the use and definition of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF, and the proposed solutions are meant to be robust in the face of the passage of time. It is hoped that much of the document will be advice that makes intuitive sense. However, certain novel aspects will likely be challenging, and these are meant to stimulate debate.
The focus is the use of URIs as names, or terms that denote. Terms that denote may be used sensibly in declarative statements, and RDF statements in particular. This is in contrast with the conventional use of many URIs as specifying communication endpoints. Sometimes a URI is used both ways, in which case there is either a pun or overloading.
The intended audience is workers in the life sciences and health care, but it is hoped that the document will be useful to others as well.
Most of the prescriptions may be followed without additional infrastructure, but some of the advice (support for alternative URI resolution) requires a URI resolution rule ontology. That ontology should be consulted as necessary.
The document discusses first the choice and use of existing terms, then when and how to establish new terms. The exposition ends with the special case of web documents. To make the main line of the document more concise, details on a number of topics are relegated to endnotes.
[Need an overview: more details on DEEDs, explain that we're not talking logic here, some examples of each technical-term defined]
For each term that is used in an RDF document:
Establishing a new term requires these steps:
In what follows, by "the URI" I mean "the URI that is the spelling of the term".
Loosely speaking, the WWW is grandfathered into the semantic web (as described here) by considering web documents to be named by their URIs. If a document is obtained when dereferencing the term (or rather its spelling), then by convention the term is taken to refer to the document. The document is not a DEED for the term, but it serves a similar role. While DEEDs for such terms may be helpful, they are not required.
This only works if responses are consistent with one another (not in a formal sense, but in terms of their use or implications). Server responses vary over time and with request details such as the requested language or format. Any particular server response should be taken to say that the term denotes what the response communicates, not any particular byte sequence or sequences. By using the term in RDF we express an expectation of consistency - that all responses communicate the same thing, whatever that is.
To denote a resource (in the sense of RFC 2616) that behaves inconsistently, or that is of interest for anything other than what it communicates, use a term (other than the resource's own URI) that has a DEED and associated 303-response server behavior.
This convention only applies to non-# URIs.
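The web-document convention and its 303 alternative can be summarized as a small decision function. This is a minimal sketch, assuming the caller has already obtained the HTTP status code for the URI without following redirects; the function name `interpret` and its return labels are hypothetical, and status codes other than 2xx and 303 are deliberately left uninterpreted.

```python
def interpret(uri: str, status: int) -> str:
    """Classify what a term denotes under the web-document convention.

    `status` is the HTTP status code obtained by dereferencing the term's
    spelling without following redirects (an assumption of this sketch).
    """
    if "#" in uri:
        # The web-document convention applies only to non-# URIs.
        return "not covered: # URI"
    if 200 <= status < 300:
        # A document was obtained: by convention the term denotes it.
        return "document"
    if status == 303:
        # See Other: the Location header should lead to the term's DEED.
        return "see DEED"
    # Other codes (e.g. 301, 404) are not interpreted in this sketch.
    return "unknown"
```

For example, `interpret("http://example.org/gene/1", 303)` yields `"see DEED"`, while the same URI answering 200 would be taken to name the document returned.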
I'm still searching for a term I feel comfortable with. I have used "definition," "defining description," "defining document," "declaration," "declaration document," and many variations. The idea is almost the same as "declaration page" in Booth's article, except that here it is required to be RDF, and it is really more of a document-essence than a "page". Currently I am using the pseudo-acronym DEED, which, if it had to, might stand for "a DEclaring Essence of a Document".
Endnote: why-new-terminology
Here is why, in some cases, we didn't reuse existing terms from RFC 2616 or Architecture of the WWW:
Nose-following is a heuristic process, similar to dereference, for obtaining a DEED for a URI. There are various ways to nose-follow, depending on the syntax of the URI.
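Since the note does not fix the nose-following rules, the following is only one plausible dispatch on URI syntax, offered as an illustration. The assumptions are: http(s) URIs are dereferenced (with any fragment removed first, since a # URI is looked up in the document named by its non-fragment part), and anything else is treated as a nonlocator that needs a URI resolution rule. The function name `nose_follow_plan` and its labels are hypothetical.

```python
from urllib.parse import urldefrag


def nose_follow_plan(uri: str) -> tuple[str, str]:
    """Suggest how to look for a DEED for `uri`, based only on its syntax.

    Hypothetical dispatch for illustration; not a normative procedure.
    """
    if uri.startswith(("http://", "https://")):
        if "#" in uri:
            # For a # URI, look in the document named by the URI
            # with its fragment removed.
            return ("dereference", urldefrag(uri).url)
        # Non-# locator: dereference it; a 303 response points at the DEED.
        return ("dereference", uri)
    # Nonlocators (e.g. urn:lsid:) need a URI resolution rule or a proxy.
    return ("resolution-rule", uri)
```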
In the unlikely event that a term is in wide use but its DEED is unpublished or only ephemerally published - for example, if you know the term only from its use, not from a DEED - and the URI owner cannot or will not publish a DEED, an expert might compose and publish a DEED that they believe corresponds to community practice. This neo-DEED has no "authority" according to the URI ownership rule, but may be of use to the community. The neo-DEED might be publicized using a URI resolution rule.
[Is this "eminent domain" for URIs?]
[David B isn't comfortable calling such a document a DEED. Eric P is.]
Endnote: minting-nonlocators
Locators have an advantage over nonlocators in that they are more likely to lead to documents. Clients that do not understand a nonlocator natively, and that either do not understand URI resolution rules or lack a resolution rule leading to a DEED or other document, may still be able to access the document if the URI is a locator. (Of course this is of no help if the link is broken.)
Rather than mint a non-locator URI, you can use a proxy service prefix to create a locator from the non-locator URI. Arrange, somehow, for everyone performing this transformation to use the same proxy prefix. State an equivalence (e.g. owl:sameAs) in case anyone uses the bare non-locator URI in RDF, or as a way of defining what you mean by the proxy-relative form.
For a concrete case study see TDWG Life Sciences Identifiers Applicability Statement.
For a summary of this issue see AttitudeTowardNonlocators.
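The proxy-prefix transformation and the accompanying equivalence statement can be sketched as follows. The prefix `http://proxy.example.org/` and the LSID are placeholders, not a real service; the sketch also assumes the nonlocator contains only characters that are legal in an http URI path, so it is appended without escaping. The output is one line of Turtle (also valid N-Triples).

```python
# Hypothetical shared proxy prefix; everyone performing the
# transformation must agree on the same one.
PROXY_PREFIX = "http://proxy.example.org/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_locator(nonlocator: str, prefix: str = PROXY_PREFIX) -> str:
    """Create a locator by prepending the shared proxy prefix."""
    return prefix + nonlocator


def same_as_statement(nonlocator: str, prefix: str = PROXY_PREFIX) -> str:
    """Return a triple stating that the proxied and bare forms co-refer."""
    return f"<{to_locator(nonlocator, prefix)}> <{OWL_SAME_AS}> <{nonlocator}> ."
```

Publishing the `owl:sameAs` line alongside the proxied term lets RDF written with either spelling be merged.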
We encourage NCName suffixes, or at least SPARQL-liberalized-NCName suffixes, for all terms. This helps make Turtle and SPARQL queries more concise. [explain] [explain bug in the RDF/XML spec, SPARQL's extension, etc.]
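A rough check of whether a term's suffix is an NCName, or at least a SPARQL local name, might look like this. The regular expressions are ASCII-only approximations of the XML NCName production and the SPARQL 1.0 `PN_LOCAL` production (the real grammars also admit many non-ASCII characters), and splitting at the last `#` or `/` is an assumption about where the namespace ends.

```python
import re

# ASCII approximation of XML NCName: letter or '_', then letters,
# digits, '.', '-', '_'.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")
# ASCII approximation of SPARQL 1.0 PN_LOCAL: a leading digit is also
# allowed, but the name may not end with '.'.
SPARQL_LOCAL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9._-]*[A-Za-z0-9_-])?")


def split_term(uri: str) -> tuple[str, str]:
    """Split a URI into (namespace, suffix) after its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def is_ncname(suffix: str) -> bool:
    return NCNAME.fullmatch(suffix) is not None


def is_sparql_local(suffix: str) -> bool:
    return SPARQL_LOCAL.fullmatch(suffix) is not None
```

For `http://example.org/gene/1234` the suffix `1234` passes the SPARQL check but not the NCName check, which is the kind of gap the "SPARQL-liberalized" wording refers to.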
For the purposes of this document, "RDF" means either Turtle or an established RDF standard.
Do RDFa documents qualify as RDF documents? That is, should we recommend using them as DEEDs? Problems: (1) they don't have their own MIME types, so they can't be recognized or requested as such, and (2) they don't work with # URIs.
Endnote: assertions-that-are-not-definitional
If definitions are supposed to be precise, where do we put RDF statements about a term's referent that are not meant to restrict what the term refers to?
See issue DefinitionDelineation, to which an answer is given here.
Endnote: about-inconsistency
Assume that anything that can go wrong, will.
We've suggested various sources of DEEDs:
DEEDs can be in conflict only if someone has
introduced an inconsistency - that is, the present guidelines
haven't been followed. There is no formulaic way to resolve such
conflicts. Priority goes to the original URI owner's first version
published for nose-following, or some other published version for
nonlocators, but there are rare circumstances in which it is
preferable to use a revised DEED. On the other hand, a
DEED found via a resolution rule included in RDF may be more
likely to reflect the author's intent. Ultimately it is up to the
community of users of the term to figure this out.
Endnote: response-codes-explained
[TBD: explanation of 2xx and 303 response codes, after httpRange-14]
The following only apply to servers that observe the present recommendations:
A way to help protect against accidental collisions over time (publication of an inconsistent document by a future URI owner) is to have the path component of the URI contain site version information in the form of a year or more precise date. Future owners following this convention will either use no date or will put a different date in the URIs they mint. See [cite RFC 4151 tag: URI] for further information on this convention.
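Minting dated URIs can be sketched as below. The `/{date}/{local}` path layout for http URIs is an assumption (the text only asks that the path contain a year or more precise date), and `example.org` is a placeholder authority. The tag form follows the `tag:authority,date:specific` shape of RFC 4151. The date check is a shape check only; calendar validity is not verified.

```python
import re

# Year, year-month, or full date (shape check only).
DATE = re.compile(r"\d{4}(?:-\d{2}(?:-\d{2})?)?")


def _check_date(minted: str) -> None:
    if not DATE.fullmatch(minted):
        raise ValueError(f"expected YYYY, YYYY-MM or YYYY-MM-DD, got {minted!r}")


def mint_dated_http_uri(authority: str, minted: str, local: str) -> str:
    """Mint an http URI whose path begins with the date of minting."""
    _check_date(minted)
    return f"http://{authority}/{minted}/{local}"


def mint_tag_uri(authority: str, minted: str, specific: str) -> str:
    """Mint a tag: URI of the form tag:authority,date:specific."""
    _check_date(minted)
    return f"tag:{authority},{minted}:{specific}"
```

A later owner of `example.org` who follows the same convention would mint under a different date, so `http://example.org/2007/gene/1234` cannot be reused by accident.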
Tools that care about accessing things (endpoints, DEEDs, etc.) should understand URI resolution rules, so that they can properly implement relocation and redundant sourcing.
In particular, there is often occasion to display RDF in a web browser
or other user-facing interface. When arranging this, be prepared to
link to a DEED or other appropriate document using
a browser-friendly URI, e.g. by routing through a proxy. The term's
spelling may be an inadequate locator for most browsers
(e.g. urn:lsid:) or it may not lead to the correct DEED. Use
URI resolution rules to obtain a locator that can be used for
hyperlinking.
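The following sketch shows how a display tool might apply resolution rules to get a browser-friendly hyperlink. The rule representation (a URI prefix mapped to an ordered list of replacement prefixes) is a hypothetical stand-in for the URI resolution rule ontology, and all hosts are placeholders. Several replacements for one prefix model redundant sourcing; rewriting an old prefix to a new one models relocation.

```python
from html import escape

# Hypothetical resolution rules: (prefix, replacement prefixes in order
# of preference).
RULES = [
    ("urn:lsid:", ["http://lsid.example.org/urn:lsid:",
                   "http://mirror.example.net/lsid/urn:lsid:"]),
    ("http://old.example.org/", ["http://new.example.org/"]),
]


def candidate_locators(term: str, rules=RULES) -> list[str]:
    """Return locators to try for a term, most preferred first."""
    out = []
    for prefix, replacements in rules:
        if term.startswith(prefix):
            rest = term[len(prefix):]
            out.extend(r + rest for r in replacements)
    # A term that is already an http(s) locator is kept as a last resort.
    if term.startswith(("http://", "https://")) and term not in out:
        out.append(term)
    return out


def browser_link(term: str, rules=RULES) -> str:
    """Render a term as an HTML hyperlink via a browser-friendly locator."""
    for loc in candidate_locators(term, rules):
        if loc.startswith(("http://", "https://")):
            return f'<a href="{escape(loc)}">{escape(term)}</a>'
    # No usable locator: show the term's spelling as plain text.
    return escape(term)
```

The link text keeps the term's own spelling, so the reader sees the name used in the RDF while the browser follows the rewritten locator.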
Endnote: versioning
[TBD: A complete versioning story: database records, databases, ontologies.
This is made critical by the requirement that document content remain stable.]
Endnote: comparison-with-AWWW
[TBD: thorough comparison with Architecture of the WWW]
Help most recently received from: (reverse chronologically) Eric Prud'hommeaux, Bijan Parsia, Chimezie Ogbuji, Kaitlin Thaney, Alan Ruttenberg, Pat Hayes, Jake Beal. Thank you.