Using URIs in RDF

DRAFT 35, 2007-11-01 11:14

Current version:
http://sw.neurocommons.org/2007/uri-note/
This version:
http://sw.neurocommons.org/2007/uri-note/35/
Previous version:
http://sw.neurocommons.org/2007/uri-note/34/
Authors:
Jonathan Rees, Science Commons
Alan Ruttenberg, Science Commons

Abstract

This document is a request to the community of producers and consumers of RDF to follow certain practices around the use and resolution of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF artifacts and to help protect the meanings of these artifacts against the ravages of time.

The intended audience of this note is workers in the areas of life science research and health care who are building semantic web resources and tools, but it is hoped that it will be useful to others as well.

Status of this document

This is an editor's draft with no official standing.

The title attached to this draft is provisional.

This is a draft of a document written in response to needs expressed by the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLS). It is intended for publication as an Interest Group Note on w3.org. Before publication there, it will be made to conform to standard W3C document conventions and policies.

Recent changes:
10/31 (35) change 'URI owner' to 'naming authority'
10/30 (33) more explanation of why you want resolution rules
10/30 (33) further de-emphasize resolution rules in main text
10/27 (31) replace figure series with single omnigraffle figure
10/27 (31) rework resolution appendix
10/27 (28) rework section on documents
10/25 (26) put all resolution stuff in appendix
10/25 (25) whimsical title (was "Note on Choosing and Using URIs")
10/25 (25) "term" changed to "name"
10/25 (25) fixes to terminology around "resolve" and "dereference"
10/25 (25) "nose-follow" changed to "meta-dereference"

URI Note home:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations

How to comment on this draft:
Please put your comments on the DraftTalk wiki page, or if commenting on a "major issue," join the fray at one of the pages reserved for this purpose - see the list here. I will attempt to address all concerns and record dissenting views fairly.

Introduction

This document is a request to the community of producers and consumers of RDF (see note: {what is RDF}) to follow certain practices around the use and resolution of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF, and the proposed solutions are meant to protect investments made in composing RDF by enabling a sustainable Semantic Web infrastructure.

[JW: give a non-technical intro here. how science is done, how semantic web/RDF can help.]

The focus is the use of URIs, as names of particular things or as otherwise used technically. Names that denote individuals, relationships, and classes may be used sensibly in statements. This is in contrast with the conventional use of many URIs as specifying communication endpoints. Sometimes a URI is used both ways - it dereferences to what it denotes, presumably a document-like thing - in which case there is a pun. [consider reworking last sentence - not quite right]

Failure to handle URIs wisely leads to errors, inconsistencies, and lost opportunities. The specific sources of these problems include:

Most of the advice given here may be followed without the addition of new technical infrastructure. However, obtaining adequate generality and durability requires that publishers provide resolution information and that applications understand and make use of it. One approach to doing this is described in an appendix.

The document starts by describing an approach to specifying the intended usage of names that builds on current practice. The next two sections give a sort of "protocol" for finding usage specs for names and for establishing usage specs for new names. Finally, a treatment of the case of web documents is given. To make the outline of the argument easier to follow, details on a number of topics are relegated to endnotes.

Specifying usage

The organizing principle proposed here is that for each name in use, there is a document [was: an RDF graph] designated as specifying correct usage of the name, somewhere.

A usage spec for a name is simply a graph that is designated as one that specifies when the name should and shouldn't be used. The usage spec contains descriptive statements that use the name to refer to the name's intended referent. The description is given in prose and/or RDF assertions. If someone uses the name, their use should be consistent with what its usage spec says.

Example:

  specimen:S05-100_A_1_2.3 
    a dicom:Specimen ;
    dicom:patient patient:65536 ;
    dicom:machine dicom:AVUTRIX_MULTIPLE_B7792 ;
    dicom:date_collected "2007-08-07"^^xsd:date .
could serve as a usage spec for the name specimen:S05-100_A_1_2.3 : we would be saying that the name should be used to denote the intended specimen.

Prose description and RDF description are intermixed by placing the prose in a literal string related via rdfs:comment.
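
As a rough illustration of how prose and formal statements can sit side by side, the sketch below emits a small usage spec as N-Triples, attaching the prose through rdfs:comment. The helper names, the example.org URIs, and the wording of the comment are assumptions made for this illustration; only the rdf:type and rdfs:comment property URIs are standard.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def escape_literal(text):
    """Escape a string for use inside an N-Triples double-quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def usage_spec_ntriples(name, type_uri, prose):
    """Return two N-Triples lines: a type statement and a prose comment."""
    return "\n".join([
        f"<{name}> <{RDF_TYPE}> <{type_uri}> .",
        f'<{name}> <{RDFS_COMMENT}> "{escape_literal(prose)}" .',
    ])


# Hypothetical name and class URIs, used only for illustration.
spec = usage_spec_ntriples(
    "http://example.org/specimen/S05-100_A_1_2.3",
    "http://example.org/dicom/Specimen",
    'The tissue specimen labelled "S05-100_A_1_2.3" in the collection log.',
)
print(spec)
```

Keeping the prose in the same graph as the formal statements means that anyone who obtains the usage spec obtains both together.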

Underspecification - a usage spec that is not very specific - is to be discouraged, as it may easily lead to confusion.

A graph that merely uses a name to describe the name's referent is not necessarily a usage spec. Whether a graph is a usage spec depends on whether the naming authority has said that it is. [but see note: {neo-specifications}]

A usage spec will itself use a number of other names, and a full specification of the name would in principle require an understanding of those names.

Because the name's usage spec is the arbiter of what the name means and how to use it, the problem of finding usage specs, which are distributed around the network, is quite important. Stability of a usage spec is also important: changing a usage spec is a recipe for confusion, as different users of a name may rely on different versions of the usage spec without being aware that the change has occurred.

Sustainability principles

In order to enable RDF graphs that are meaningful and useful to both humans and machines, and that can be meaningfully combined with other, independently developed, graphs, the names that the graph uses must meet certain quality benchmarks. We define a name to be sustainable when it obeys the following principles:

  1. a usage spec for the name can be found by anyone reading RDF that uses the name,
  2. there are no other usage specs for the name that might mislead readers,
  3. the (effective) usage spec is consistent and clear, and
  4. the usage spec specifies the referent as precisely as necessary for intended applications.
These principles are intentionally imprecise; they specify general requirements, not particular solutions. Suggestions for particular solutions will be given in later sections of this document.

Using an existing name

  1. Do some research to track down names that are already in use that might be useful to you, and their usage specs.
  2. Use an existing name when it has the correct meaning for your application. Make sure that fundamental aspects such as type, domain, and range will work for your application.
  3. However, use an existing name only when it satisfies the sustainability principles (above).
  4. If a name refers to a document or database record, do not use the name to refer to the thing described by the document or record, or vice versa. (A potato can't have an author, but a document describing a potato can.) If the thing and the document or record both need to have names, the names must be different.
  5. When there is a choice among existing appropriate names, choose the name that
    • best matches your intended use
    • is in widest use
    • has the most easily located usage spec
    • has the best (clearest, most accurate, most consistent) usage spec
    These criteria are often in conflict, so balancing these criteria may require judgment. Seek advice from the communities of users of the names if you're not sure.
  6. If you've done your best to find a name that serves, and failed, establish a new name (next section).

Minting and establishing new names

A new name should be established for a new meaning. Establishing a name consists of 'minting' it (deciding what its spelling should be), composing a usage spec, and publishing the usage spec. The overall requirement is to establish a name that satisfies the sustainability principles (above [section?]).

Should a new name be needed, establishing it requires these steps:

  1. Justify any decision not to use an extant name
  2. Decide what the name should be (i.e. its spelling - the way it's written)
  3. Compose a graph that is to become the name's usage spec
  4. Publish the graph in such a way that others will know that it's the name's usage spec

Names that are to denote documents may be established simply by publishing on the web; see next section.

In the below, by "the URI" I mean "the URI that is the spelling of the name". [?? is this too labored ?? "spelling" is awkward ??]

Reuse or mint?

[examples subclass, union, restriction -- not all agents can deduce via these -- but we should insist] [special note on sameAs -- extremely strong -- note: {equivalence discussion}]

Minting a name

  1. Do not mint or declare a name unless you are its naming authority (in the sense of Architecture of the WWW). (But see note: {neo-specifications})
  2. Don't redefine [?word?] a name that you or any previous naming authority of the URI has previously established. (See note: {collision_avoidance_hack} for one collision-avoidance tactic.)
  3. As the HTTP protocol is widely deployed, you are advised to mint names that are http: or https: URIs. See note: {minting nonlocators}. Doing so is not to be taken to imply that the name denotes a document.
  4. Because of the overhead of accessing large files, a racine should not be shared (i.e. the same racine used with many fragment ids) among a large number of terms. Moreover, sharing a racine to any extent at all makes delineation of separate usage specs for the names difficult, if not impossible, so this is discouraged as well. (RacineSharing)
  5. To permit use of the RDF/XML form of RDF, if the name is to denote a property, the URI must end in an XML NCName (roughly speaking, a sequence of characters from the set {letters | digits | "_" | "." | "-"} that begins with a letter or with "_"). (See note: {NCName pragmatics} for related practices.)
  6. Choose names that steer users away from incorrect interpretations. For example, if a name is to denote a property, choose the name so that there can be no confusion about the direction of the relationship: ex:hasCapital (or ex:has_capital) or ex:isCapitalOf (or ex:is_capital_of), but not ex:capital.
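
Some of the minting advice above is mechanical enough to check automatically. The following is a minimal sketch, not a complete validator: the function name and warning texts are invented for this illustration, the NCName test is an ASCII-only approximation of the XML rule, and the "has"/"is" prefix test is only a crude heuristic for whether a property name signals its direction.

```python
import re
from urllib.parse import urlsplit

# ASCII-only approximation of an NCName at the end of a string: starts with
# a letter or "_", then letters, digits, "_", "." or "-" up to the end.
NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*$")


def check_minted_name(uri, is_property=False):
    """Return a list of warnings for a candidate name; empty means no complaints."""
    warnings = []
    if urlsplit(uri).scheme.lower() not in ("http", "https"):
        warnings.append("not an http: or https: URI")
    if is_property:
        match = NCNAME_SUFFIX.search(uri)
        if match is None:
            warnings.append("does not end in an NCName (needed for RDF/XML)")
        # Heuristic (an assumption of this sketch): a leading "has" or "is"
        # followed by a capital or "_" hints at the direction of the relation.
        elif not re.match(r"(?:has|is)[A-Z_]", match.group(0)):
            warnings.append("local name does not indicate direction")
    return warnings


print(check_minted_name("http://example.org/hasCapital", is_property=True))
print(check_minted_name("http://example.org/capital", is_property=True))
```

A check like this can flag candidates for a second look, but it cannot judge whether a name actually steers users toward the intended interpretation.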

Composing a usage spec

  1. Compose RDF statements that provide clear guidance on use. The RDF should specify a single, particular usage. (See note: {what is RDF}.)
  2. A usage spec should not be in conflict with the semantics of an established RDF-based language. [[rework this.]]
  3. The declaration should be specific enough to rule out incorrect usage, but not so specific that it overcommits and fosters inconsistency or discourages reuse.
  4. In addition to (or if necessary instead of) any formal assertions, provision of constraining prose in an rdfs:comment property is strongly encouraged, and usually necessary if the name's relation to the extralogical world is novel. In lieu of prose use a well-justified formal method such as OWL statements.
  5. Usage specs should be thought of as irrevocable (see note: {about inconsistency} and note: {versioning}). Compose the RDF defensively; correct use by others (according to this account) will be determined by what the usage spec says, not by what you might say later.
  6. Simply naming a name is never a substitute for articulating a usage spec. It is not self-evident that the URI http://example.org/mbl-lillie-building refers to the Lillie Building at MBL.
  7. Take a stand on time dependence. [need example, e.g. age, height] Anchor statements at a particular time (permitting statements that assume that time) or clarify that the name only denotes whatever is time-invariant about something. [talk about occurrents and continuants?? or cite.]
  8. Every usage spec should establish a nontrivial rdf:type for the name's referent using appropriate RDF statements. (E.g. owl:Thing is not informative.)
  9. rdfs:label assertions are a courtesy to user- and developer-facing interfaces. We suggest the use of short textual labels that can be placed in (for example) lists and menus in user interfaces.
  10. [move or flush? move to intro?] A property should be given a nontrivial domain and range, when these exist.
  11. [move or flush?] A class should be asserted to be a subclass of some nontrivial class, when one exists.
  12. [When declaring a name, ] Also publish statements relating its reference to other things (see note: {assertions that are not constraining}).
  13. [example of usage is very helpful. cf. obo minimal metadata]
  14. [citation]
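
A few of these points can be checked mechanically before a usage spec is published. The sketch below assumes the spec is available as a list of (subject, predicate, object) tuples written with prefixed names; that representation, the function name, and the set of "trivial" types are assumptions of this illustration rather than anything this note prescribes.

```python
RDF_TYPE = "rdf:type"
RDFS_COMMENT = "rdfs:comment"
RDFS_LABEL = "rdfs:label"
# Types treated as too general to be informative (an assumption of this sketch).
TRIVIAL_TYPES = {"owl:Thing", "rdfs:Resource"}


def lint_usage_spec(name, triples):
    """Return a list of problems found in a candidate usage spec for `name`."""
    about_name = [(p, o) for s, p, o in triples if s == name]
    types = {o for p, o in about_name if p == RDF_TYPE}
    problems = []
    if not about_name:
        problems.append("no statements use the name as subject")
    if not (types - TRIVIAL_TYPES):
        problems.append("no nontrivial rdf:type")
    if not any(p == RDFS_COMMENT for p, _ in about_name):
        problems.append("no rdfs:comment prose")
    if not any(p == RDFS_LABEL for p, _ in about_name):
        problems.append("no rdfs:label")
    return problems


# Hypothetical spec for the Lillie Building example; ex:Building is invented.
spec = [
    ("ex:mbl-lillie-building", "rdf:type", "ex:Building"),
    ("ex:mbl-lillie-building", "rdfs:comment", "The Lillie Building at MBL."),
    ("ex:mbl-lillie-building", "rdfs:label", "Lillie Building"),
]
print(lint_usage_spec("ex:mbl-lillie-building", spec))
```

An empty result only means that the mechanical checks passed; whether the prose and the assertions actually pin down the intended referent remains a matter of judgment.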

Publishing a usage spec

  1. Publish the usage spec in such a way that its specifying nature is implied and so that meta-dereference will work.
    • If the term has a fragment id (is a # URI), publish the usage spec at the "racine" (the "racine" is the URI formed by truncating the URI at the #, that is, dropping the # and everything after it)
    • If the name is a locator, publish the graph to permit "meta-resolution" to the usage spec (see note: {where do usage specs go?}).
    • If the name is some other kind of URI, use a protocol-specific method or make a best effort to make it known that the graph specifies the term's usage.
    [cite httpRange-14 and other discussion about this]
  2. Make best efforts to ensure that the name's usage spec is accessible for the lifetime of any RDF that uses the name. (See note: {how to get persistence}.)
  3. Once you publish the graph, others will start to depend on what it says, so the name with your given meaning effectively becomes community property. Never redeclare a name in a way that might break or confuse others' use of the name. (See note: {versioning}).
  4. Avoid mixing merely descriptive statements with the essential statements that specify a name's usage. Instead the usage spec should link to a second document containing the non-constraining statements [make sure the seeAlso can be nonconstraining; see note: {assertions that are not constraining}].

Talking about documents

Loosely speaking, the "web of documents" is grandfathered into the semantic web, as described here, by considering web documents to be named by their URIs. If a document is obtained when interpreting a name, then by convention the name is taken to refer to the document. The document is not a usage spec for the name: a usage spec is part of the discourse being conducted in RDF, while a document is merely one more thing that one might talk about. In contrast to a usage spec, a document that is merely mentioned carries no expectation that its contents are to be believed.

While establishment of usage specs for document-denoting names would often be helpful - one could state type, title, authorship, revision status and so on ("metadata") - this is difficult, at least using HTTP, and one might be forgiven for not providing one. See appendix A, below, for a hack for simultaneously publishing data and metadata using HTTP.

In order to be considered true, a statement involving the name must apply not just to what has been observed at the time the statement was written, but also to what will be observed when someone else is trying to understand or use the statement. Server responses vary with request details, such as the requested language, as well as over time.

Excluding time variation, any particular server response should be taken to say that the name denotes what the response communicates, not any particular byte sequence or sequences. That is, statements about documents (or at least those not varying in time) talk about what the document says irrespective of representation details.

To articulate the allowable inferences about documents that can be made based on server response, it is proposed here to classify documents according to consistency criteria:

fixed document
all server responses are identical (particular bytes + their metadata; LSID "piece of data"; similar to "representation" sensu Architecture of the WWW)
stable document
responses vary only over representation details, not underlying content (they all say the same thing)
document
responses may vary over time, but are consistent enough to permit something to be said (author, topic, etc.) (a kind of "continuant" sensu Barry Smith)
endpoint
a "network data object or service" (quoting RFC 2616); no a priori consistency guarantees of any kind; requires an independent explanation when used in RDF
Examples: (don't just follow the links - think about what the URIs will denote over time and across HTTP header variation)
  1. fixed document: http://www.biomedcentral.com/content/pdf/1471-2105-8-S3-S2.pdf - a particular PDF file with fixed length, checksum, etc.
  2. stable document: info:hdl/10.1186/1471-2105-8-S3-S2 - a particular unchanging document available in several representations
  3. stable document: http://www.w3.org/TR/2003/PR-rdf-concepts-20031215/ - similarly
  4. document: http://www.w3.org/TR/rdf-concepts/ - a document that says different things at different times, but that has a consistent topic and social origin
  5. endpoint: http://sw.neurocommons.org/2007/strange-resource - a network resource that does not handle GET requests
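
Observed responses can never prove that a name denotes a fixed or stable document, but they can rule a class out. The sketch below is one hedged way to express that: given several observations of the same name, it reports the strongest class the observations do not refute. The Observation record, and in particular its `says` field (the observer's own judgment of what a response communicates), are assumptions of this illustration, and response metadata is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    octets: bytes  # the bytes of one server response
    says: str      # the observer's judgment of what the response communicates


def strongest_unrefuted_class(observations):
    """Return the strongest consistency class the observations do not rule out."""
    if not observations:
        raise ValueError("need at least one observation")
    if len({o.octets for o in observations}) == 1:
        return "fixed document"        # identical bytes every time
    if len({o.says for o in observations}) == 1:
        return "stable document"       # bytes differ, content does not
    return "document or endpoint"      # telling these apart needs other evidence


seen = [
    Observation(b"<p>Hello</p>", "greeting"),
    Observation(b"Hello\n", "greeting"),
]
print(strongest_unrefuted_class(seen))  # stable document
```

Assurance that a stronger class really holds has to come from the publisher, since no finite set of observations can establish it.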

If one of these types is given as the domain and/or range of properties, then not only can one state the assumptions under which statements are made, but having communicated those assumptions, further inference is enabled from inspection of the document. For example, if we know that x has a length, and that only fixed documents have a length, then we can infer that x is a fixed document and therefore that it also has a checksum, which can be computed by reading the octets of x.
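
The inference in the preceding paragraph can be sketched in a few lines. The property and class names below (ex:hasLength, ex:hasChecksum, ex:FixedDocument) are hypothetical, since this note does not define a vocabulary, and the use of SHA-256 as the checksum is likewise an arbitrary choice made for this illustration.

```python
import hashlib

# Hypothetical vocabulary: both properties are declared to have
# ex:FixedDocument as their domain.
DOMAINS = {
    "ex:hasLength": "ex:FixedDocument",
    "ex:hasChecksum": "ex:FixedDocument",
}


def infer_types(statements):
    """Infer types for each subject from the domains of the properties it uses."""
    inferred = {}
    for subject, prop, _ in statements:
        if prop in DOMAINS:
            inferred.setdefault(subject, set()).add(DOMAINS[prop])
    return inferred


def checksum(octets):
    """Compute a checksum by reading the octets of a fixed document."""
    return hashlib.sha256(octets).hexdigest()


# x has a length, so x is inferred to be a fixed document ...
types = infer_types([("ex:x", "ex:hasLength", 11)])
# ... and therefore it makes sense to compute its checksum from its octets.
if "ex:FixedDocument" in types.get("ex:x", set()):
    print(checksum(b"hello world"))
```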

Advice around documents:

  1. For every use of a document-denoting name in RDF, make sure that a consumer will be able to understand the assumptions under which that statement was made.
  2. Provide the strongest possible consistency assurances, in order to enable the greatest number of inferences.

Glossary

name
a URI; used with the connotation of having the potential for meaningful use in RDF, and without the suggestion that one "identifies" something or that what's named is a "resource" (see note: {why new terminology?}); also a way to sidestep the URI vs. IRI switch
locator
an http: or https: URI that has no fragment id (#); defined technically and syntactically for the purposes of this document
denote
to name, refer to, or designate something (see RDF semantics)
referent (of a name)
the thing (individual, class, property, etc.) denoted by the name
graph
a set of statements written in RDF
usage spec (for a name)
a graph that is used to specify the intended usage of the name.
well-established
[tbd]
spelling (of a name)
the name taken as a string (i.e. stripped of its role of referring); a URI.
naming authority
someone who has the right to specify a name's usage (see note: {naming authority})
resolve (a name)
obtain the document denoted by the name
meta-resolve (a name)
obtain a usage spec for the name
dereference (a name)
attempt to obtain the name's referent using the protocol implied by its URI scheme
meta-dereference (a name)
attempt to obtain a usage spec for the name using the protocol implied by its URI scheme
property
a two-place predicate; roughly speaking, a transitive verb. This term is a misnomer - being green is a property, not having color (specify color) - but is firmly established in RDF lore.


Notes

Notes are in alphabetical order.

Note: {about inconsistency}

Assume that anything that can go wrong, will.

We've suggested various ways to find usage specs (Appendix A). If a naming authority has published more than one usage spec over time or in different places, or if someone has taken it upon themselves to change a usage spec and pass it off as correct (perhaps using a resolution rule), then there is a possibility of disagreement between usage specs.

Usage specs that differ in inconsequential ways - that is, that neither broaden nor narrow the applicability of the name - are not in conflict. For example, a newer usage spec may give more examples or explanation than another, or provide statements (such as rdfs:seeAlso statements) that would not affect the correct use of the name.

There is no formulaic way to solve true conflicts. Other things being equal, priority should go to the first usage spec for the name published by whomever was the naming authority at the time, or to a usage spec compatible with it; other published usage specs may be inconsistent with published use and should be examined with skepticism.

However, there may be rare circumstances in which it is preferable to use a revised usage spec - for example, a usage spec may be internally inconsistent in a nonobvious but easily fixed way. On the other hand, if there is inconsistency, a usage spec found via a resolution rule included in an author's RDF may be more likely to reflect the author's intent than one that was not so cited. Ultimately it is up to the community of users of the name to determine how to solve conflicts.

Note: {assertions that are not constraining}

When you establish a name (or after), some of what you say is meant to be constraining on all uses, while some of what you say is incidental: either advisory, hypothetical, or unimportant. A discovery that incidental information was incorrect would not force a retraction of a usage spec.

Where do we put RDF statements about a name's referent that are not supposed to be constraining on the use of the name? We have no syntactic marker in RDF that can separate one set of statements from another.

One solution is to grandfather existing ontologies by saying that this separation is an informal process or is simply not specified by this note; look elsewhere for guidance.

Another approach is to take all statements as constraining. The non-constraining statements should be placed in a separate document and a relation placed in the usage spec relating the usage spec to the secondary description via a predicate such as rdfs:seeAlso. [This is roughly the answer given here. See issue DefinitionDelineation.]

Note: {collision_avoidance_hack}

A way to help protect against accidental collisions over time (publication of an inconsistent usage spec or other document by a future site administrator or naming authority) is to have the path component of the URI contain "site version" information in the form YYYY, YYYY-MM, or YYYY-MM-DD (example: http://www.w3.org/2001/XMLSchema). Future administrators following this convention will either use no date or will put a different date in the URIs they mint. See [cite RFC 4151 tag: URI] for further information on this convention. [this practice must be detailed somewhere, but where?]
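
A minimal sketch of this convention follows. The function name, the `granularity` parameter, and the example.org site are invented for illustration; the only thing taken from the text above is the idea of a YYYY, YYYY-MM, or YYYY-MM-DD path component.

```python
from datetime import date


def mint_dated_uri(site, minted_on, local_name, granularity="year"):
    """Build an http: URI whose path begins with the date the name was minted."""
    formats = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}
    stamp = minted_on.strftime(formats[granularity])
    return f"http://{site}/{stamp}/{local_name}"


print(mint_dated_uri("example.org", date(2007, 11, 1), "uri-note"))
# http://example.org/2007/uri-note
```

The date records when the name was minted, not when the referent was created or last changed, so it should never be updated after the name is published.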

Note: {convert nonlocators to locators}

[no reference in text]

Tools that care about accessing things (endpoints, usage specs, etc.) should understand use of resolution rules, so that they can properly implement relocation and redundant sourcing.

In particular, there is often occasion to present names in a web browser or other user-facing interface. When arranging this, be prepared to link to a usage spec or other appropriate document using a browser-friendly URI, e.g. by routing through a proxy. The name's spelling may be an inadequate locator for many browsers (e.g. urn:lsid:, info:) or it may not lead to the correct usage spec. Observe resolution rules that will help generate a locator that can be used for hyperlinking. [details - presentation is not same as usage spec]

[Compare ARKs, handle proxies, etc.]

Note: {equivalence discussion}

[TBD. Not linked from text yet. When to use/not use sameAs, equivalentClass, etc. Use in constraining/nonconstraining situations. When one of these constitutes a correct usage spec. The idea of hypothetical sameAses as a way to modulate precision and recall. blah.]

Note: {how to get persistence}

By persistence we mean the ability for a name to resolve to its referent (if a document) and meta-resolve to a usage spec over the potential lifetime of the name. This could be anywhere from seconds to decades, although it is the latter that we usually have in mind.

Persistence has two aspects:

  1. apparatus for the name to (meta-)resolve over time, e.g. using a forwarding service such as purl.org [for discussion of purl.org see Purls]
  2. the accessibility and consistency of associated document(s)
These aspects are independent. A forwarding service may know about the name, but may not have a valid current address; while a document may be perfectly accessible on the network, but lack a persistent name (consider a hypothetical archiving service that changes its URIs every few years, or a highly replicated document whose replicas are each short-lived). The two services may be provided by different organizations.

Because persistence implies possibly outliving any individual or organization involved in establishing the name, and perhaps even their interest in keeping it resolvable, persistence requires long-term institutional commitment to identifiers and accessibility.

The importance of persistence hinges on your attitude toward mechanisms such as resolution rules. If you believe that peer-to-peer resolution rules (or any similar mechanism) will be understood, then a persistence service becomes less important. If you believe that consumers you care about either will not understand resolution rules or will not have adequate rules, then a persistence service is more important than it would be otherwise. [See AttitudeTowardMigration.]

Note: {minting nonlocators}

Locators have an advantage over nonlocators in that they are more likely to lead to documents. Clients that do not understand a nonlocator natively, and that either do not understand resolution rules or have no resolution rules that lead to a usage spec or other document, may still be able to access the document if the URI is a locator. (Of course this is of no help if the link is broken.)

Rather than mint a non-locator URI, you can use a proxy service prefix to create a locator from the non-locator URI. Arrange, somehow, for everyone performing this transformation to use the same proxy prefix. State an equivalence (e.g. owl:sameAs) in case anyone uses the bare non-locator URI in RDF - or as a way of specifying what you mean by the proxy-relative form.
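
The transformation might look like the sketch below. The proxy prefix is hypothetical, and the choice to percent-encode everything except ":" in the embedded URI is an assumption of this illustration; an actual proxy service will define its own syntax, which should be followed instead.

```python
from urllib.parse import quote

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical proxy prefix; everyone performing the transformation
# needs to agree on the same one.
PROXY_PREFIX = "http://proxy.example.org/resolve/"


def proxied_locator(nonlocator_uri, prefix=PROXY_PREFIX):
    """Form an http: locator by appending the encoded URI to the proxy prefix."""
    return prefix + quote(nonlocator_uri, safe=":")


def equivalence_triple(nonlocator_uri, prefix=PROXY_PREFIX):
    """N-Triples statement equating the proxied form with the bare URI."""
    locator = proxied_locator(nonlocator_uri, prefix)
    return f"<{locator}> <{OWL_SAMEAS}> <{nonlocator_uri}> ."


print(proxied_locator("urn:lsid:example.org:specimen:42"))
print(equivalence_triple("urn:lsid:example.org:specimen:42"))
```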

For a concrete case study see TDWG Life Sciences Identifiers Applicability Statement.

[For summary of this issue see AttitudeTowardNonlocators; also see AttitudeTowardMigration.]

Note: {naming authority}

For purposes of this document the naming authority of a name is defined to be the entity that has the "right" to say how the name ought to be used. For locators, the naming authority is the entity who is allowed to determine HTTP server behavior at the designated location: 200 OK (names a document, generally speaking), 301 (Moved Permanently), 303 (See Other), or some other response. Naming authority coincides with the concept of "URI owner" as specified in section 2.2.2.1 of Architecture of the WWW (which should be consulted in conjunction with other URI schemes), but "URI owner" has bred some confusion around exactly what rights are conferred and how permanent those rights are.

Note: {NCName pragmatics}

We encourage NCName suffixes, or at least SPARQL-liberalized-NCName suffixes, for all names. This helps make Turtle and SPARQL queries more concise. [explain] [explain bug in the RDF/XML spec, SPARQL's extension, etc.]

Note: {neo-specifications}

In the unlikely event a name is in wide use but its usage spec is unpublished, lost, or only ephemerally published - for example, if it is known only from use - and the naming authority cannot or will not establish a new usage spec, an expert might compose and publish a graph that they believe to correspond to community practice, and attempt to get the community to accept the graph as a specification to be followed. This neo-usage spec has no naming authority, but may be of use to the community. The neo-usage spec might be publicized using a resolution rule.

[hypothetical situation, should I flush this note? important illustration of how community process should trump priority in extreme circumstances & how there are no rigorous rules governing this process. Better approach: mint a new term, then assert that the new one and old one are equivalent as names. ]

Note: {versioning}

TBD: A versioning story: database records, databases, ontologies, usage specs. Why this is critical:

  1. The requirement that documents and deeds remain stable means we will need to make a growing sequence of stable versions.
  2. So is there any way to talk about "the latest version"? What about "what is common to all versions"? (that would have to do with classes.)
  3. How to deal with an unstable document such as a catalog of known versions of something (the LSID mutable metadata story)?
  4. Is it worth giving (names for) relations relating RDF graphs, such as entailment?

Look at continuant/occurrent theory, DAV, etc.

Note: {what is RDF}

For the purposes of this document, "RDF" means either Turtle or an established RDF standard.

Do RDFa documents qualify as RDF documents? I.e. should we recommend using them as usage specs? Problems: (1) they don't have their own MIME types, so can't be recognized or requested, and (2) they don't work with # URIs.

Note: {where do usage specs go?}

The location for finding a usage spec is a problem because the HTTP protocol has no native way to provide it. Often the usage spec (or similar document) is made available via simple dereference, and while this may be OK for access by humans, it leaves open the question of whether what you get when you get an OK is a usage spec or the denotation (the document) and makes reliable processing by machine difficult.

Two solutions have emerged for use with HTTP, and we recommend their use. In both cases one obtains a second URI that we call here the usage-spec-name for the term; the usage-spec-name may then be resolved to the usage spec.

  1. #-truncation - the usage-spec-name is the URI with the part including and following the # dropped
  2. 303 See Other response - the usage-spec-name is the value of the Location: header in a See Other HTTP response.

Although these conventions are not in universal or exclusive use, they are of value when you know that one of the conventions is in use, or when the agent is forgiving enough to tolerate situations where the putative usage-spec-name doesn't, or isn't known to, lead to a usage spec.
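
The two conventions can be combined into a single lookup, sketched below. The function makes no network request: `status` and `location` are assumed to come from an HTTP response that the caller has already obtained for the name, and the function name and signature are inventions of this illustration.

```python
from urllib.parse import urldefrag, urljoin


def usage_spec_name(uri, status=None, location=None):
    """Return a candidate usage-spec-name for `uri`, or None if neither convention applies.

    `status` is the HTTP status code and `location` the Location: header
    value of a response already received for `uri`.
    """
    if "#" in uri:
        return urldefrag(uri).url          # #-truncation
    if status == 303 and location:
        return urljoin(uri, location)      # 303 See Other; Location may be relative
    return None


print(usage_spec_name("http://example.org/onto#Gene"))
# http://example.org/onto
print(usage_spec_name("http://example.org/gene/7", 303, "/doc/gene/7"))
# http://example.org/doc/gene/7
```

A result of None does not mean that no usage spec exists, only that neither convention yields a candidate; and a non-None result is only a putative usage-spec-name until what it leads to has been examined.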

Note: {why new terminology?}

Here are some excuses for not reusing certain terms from RFC 2616 or Architecture of the WWW.

thing (instead of "resource") - I really mean anything, not just the resources considered in RFC 2616. Alternative: "entity" (this is being argued on the www-tag list)

locator (instead of "URL") - "URL" is defined quite broadly in various RFCs; I mean to restrict it to the least common denominator among deployed web agents

usage spec - I'm still searching for a term for definition-like things that I feel comfortable with. I have used "definition," "defining description," "defining document," "declaration," "declaration document," "correct use specification" (CUSP), "correct use recommendation", "normative description", "agreement for use", "license to use", "deed", "statement of applicability", "recommendation for use", and many variations. The idea is almost the same as "declaration page" in Booth's article, except that here it is required to be RDF, and in the terms of Architecture of the WWW it is really more of an information-resource-essence than a "page".

Future work

[Not on this note, that is - work to be left until after the note is done.]


Appendix A: URI Resolution

A name may be associated with either of two kinds of document:

  1. the document that is the name's referent, if the name denotes a document
  2. a graph that is to be used to specify appropriate use of the name (usage spec), if any
Finding the first of these is called simply "resolution" or "URI resolution," while I'll call finding the second "meta-resolution".

To resolve a name, a set of applicable asserted resolution rules is found (perhaps via query). Rules are meant to resolve names to their referents or names to their usage specs. Often this is done by replacing the name with another name: either a synonym, or, in the case of meta-resolution, a second name that denotes the first name's usage spec (a "usage-spec-name").

One standard resolution rule expresses the common treatment of # URIs: the URI's racine (the part before the #) is specified to be a usage-spec-name for the URI (that is, a name for its usage spec).

The default (when no rule applies) is to attempt to dereference (or meta-dereference) the URI. This means using standard protocols (cf. IANA URI scheme registry) guided by the spelling of the name. Some URI schemes, such as ftp: and data:, only specify how to dereference, while others may give separate methods for dereference and meta-dereference. An important third case is that of the HTTP protocol, where the distinction has been overlaid on existing practices. (A protocol designed with the usage spec/denotation distinction in mind would have simply provided two different access methods for the two cases. You know who you are.) With HTTP you can't say ahead of time which document you're looking for; you have to use the single operation (GET) provided to retrieve one of the two, and the HTTP response code lets you check to see whether what you got is what you wanted [cite httpRange-14]:

(A document denoted by two names can both resolve and meta-resolve: one name dereferences to the document (200) while the other meta-resolves. The synonym relationship can be established using a resolution rule. -- enough to make you want to invent a new protocol that fixes this problem, huh?)

These two strategies failing, a search (manual or automated) might be mounted using a search engine or a plea sent to an individual or community that might know how to resolve the name. As this is likely to be a bit of work, any resolution information that turns up ought to be passed along to anyone receiving communication from you that uses the name.

Summary of resolution tactics:

Situation                 To get usage-spec-name                      To get usage spec                  To get referent
                          (meta-resolve)                              (meta-resolve)                     (resolve)
1. resolution rules
   redirection            redirect rules, then usage-spec-name rules  get usage-spec-name, then resolve  redirect rules, then resolve
   other tactics          TBD
2. dereference
   # URI                  #-truncate                                  get usage-spec-name, then resolve  N/A
   http, https schemes    GET to 303                                  get usage-spec-name, then resolve  GET to 200
   other schemes          per protocol
3. cast a wide net
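
The order in which the tactics are tried can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the dictionary of asserted usage-spec-names, and the fetch callable (a stand-in for a real HTTP client that returns a status code and a Location value) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for an HTTP client: maps a URI to (status, Location).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def find_usage_spec_name(
    name: str, asserted: Dict[str, str], fetch: Fetch
) -> Tuple[str, Optional[str]]:
    """Return (tactic used, usage-spec-name or None)."""
    # 1. Resolution rules: an asserted usage-spec-name is tried first.
    if name in asserted:
        return ("resolution rule", asserted[name])
    # 2. Dereference, "# URI" case: drop the '#' and what follows it.
    head, sep, _ = name.partition("#")
    if sep:
        return ("#-truncation", head)
    # 2. Dereference, "http, https schemes" case: GET and look for a 303.
    if name.startswith(("http://", "https://")):
        status, location = fetch(name)
        if status == 303 and location:
            return ("303 See Other", location)
    # 3. Nothing applied: search, or ask someone who might know.
    return ("cast a wide net", None)


def fake_fetch(uri: str) -> Tuple[int, Optional[str]]:
    """Canned responses so the sketch runs without network access."""
    if uri == "http://example.com/id/thing":
        return (303, "http://example.com/doc/thing")
    return (200, None)


print(find_usage_spec_name("http://example.com/id/thing", {}, fake_fetch))
print(find_usage_spec_name("tag:example.com,2007:thing", {}, fake_fetch))
```

A 200 response is deliberately not treated as yielding a usage-spec-name, since in that case the GET has retrieved the referent itself.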

Summary of resolution rules

The purpose of resolution rules is essentially to deal with the "broken link" problem on the client side. They act as an insurance policy that protects against a situation where a document (including a usage spec) is available, but not by presenting the name to an HTTP client module. This can happen when content moves, when mirrored content is unavailable at its primary location, or when someone decided (against the advice of this document) to mint a non-HTTP URI.

A broken link on the "document web" leads to inconvenience to the human reader during navigation. Broken links are generally repaired quickly because the server operator is usually motivated to make the site content work well for visitors. The operator learns of a broken link either automatically through validation and error reporting, or through complaints lodged by readers.

With the expansion of the use of URIs from navigation to use in meaningful assertions, a broken link becomes a threat to any kind of interpretation of the page, and therefore jeopardizes the value of the document per se. At the same time, the demand for shared meaningful names will lead to the use of names whose accessibility is not highly reliable or durable.

The worry is not so much over content loss as over loss of opportunity: the failure to connect a name in use with information that will make it meaningful. It is essential therefore that uses of an unresolvable name be connected somehow to documents found in secondary locations. This must be done in a way that does not require involvement of the original publisher, who may be defunct or may simply not care.

A related purpose of resolution rules is to allow the use of RDF written using non-locators by "low-tech" client software that only understands HTTP. This problem reduces to the first, as we may treat challenging URIs such as tag: URIs as we would broken links.

Resolution rules are used simply by providing assertions giving the locations of usage specs and referent documents either specifically (one URI at a time) or generically (by URI string match and replacement). A producer of RDF includes in an RDF document a resolution rule for any names whose usage spec may be difficult for a consumer to find, and a consumer makes use of resolution rules using logic inserted at the point where any name is to be dereferenced.

We seek answers to the following questions:

  1. What URI may I use to access this document?
  2. What document is the usage spec for a given name?

Answers are written using the relations

  1. (document) tns:isDenotedBy (URI)
  2. (usage spec) tns:specifiesUsageFor (URI)

Trivial examples:

<http://www.w3.org/TR/rdf-concepts/> tns:isDenotedBy
   "http://www.w3.org/TR/rdf-concepts/"^^xsd:anyURI .
<http://www.w3.org/TR/rdf-schema/> tns:specifiesUsageFor
   "http://www.w3.org/TR/rdf-schema/type"^^xsd:anyURI .
The first just says that the document denoted by the name may be resolved by way of the URI that is the name. The second says that the /type URI usage is specified in the indicated graph; it doesn't say how that graph is to be found, which would require resolution.
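
A consumer could hold such assertions in a small in-memory table and consult it to answer the two questions. The Python sketch below is a hypothetical illustration, not part of any proposed vocabulary or tool: it stores each assertion as a (subject, relation, URI string) triple, uses the two trivial examples as data, and the function names are made up for this example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, URI string)

ASSERTIONS: List[Triple] = [
    ("http://www.w3.org/TR/rdf-concepts/", "tns:isDenotedBy",
     "http://www.w3.org/TR/rdf-concepts/"),
    ("http://www.w3.org/TR/rdf-schema/", "tns:specifiesUsageFor",
     "http://www.w3.org/TR/rdf-schema/type"),
]


def access_uris(document: str, assertions: List[Triple]) -> List[str]:
    """Question 1: which URIs may be used to access this document?"""
    return [o for s, p, o in assertions
            if s == document and p == "tns:isDenotedBy"]


def usage_specs(name: str, assertions: List[Triple]) -> List[str]:
    """Question 2: which documents are usage specs for this name?"""
    return [s for s, p, o in assertions
            if p == "tns:specifiesUsageFor" and o == name]


print(access_uris("http://www.w3.org/TR/rdf-concepts/", ASSERTIONS))
print(usage_specs("http://www.w3.org/TR/rdf-schema/type", ASSERTIONS))
```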

Schematic rewrite rules permit the expression of rules that map names to names. There are two kinds of rewrite rules:

  1. Replacement for the purpose of further resolution
  2. Finding a usage-spec-name for the name (a name for the name's usage spec)

A rule is an instance of one of the classes above, with two string-valued properties giving the input pattern and output template for the rule. Deductions about new ways to find denotations and usage specs can be made by instantiating the pattern and template at a particular URI.

For example, the rule

[] a tns:RedirectRule;
  tns:hasPattern  "http://stale.example.com/{more}";
  tns:hasTemplate "http://current.example.com/{more}".
says that any URI matching the pattern denotes the same thing as the corresponding URI matching the template (assuming either denotes something), and permits the inference
<http://stale.example.com/bland.png>
  tns:isDenotedBy
  "http://current.example.com/bland.png"^^xsd:anyURI .
which justifies the use of HTTP with the 'current' URI to obtain the document (or whatever) named by the 'stale' URI.
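
To make the instantiation step concrete, here is a minimal Python sketch of matching a URI against a pattern and filling in the template. The helper names are hypothetical, and two choices are assumptions of this sketch rather than anything specified above: a `{variable}` matches any run of characters, and matching is greedy.

```python
import re
from typing import Optional

# Matches a placeholder such as {more} in a pattern or template.
_VAR = re.compile(r"\{(\w+)\}")


def compile_pattern(pattern: str):
    """Turn 'http://stale.example.com/{more}' into a regular expression."""
    parts = []
    pos = 0
    for var in _VAR.finditer(pattern):
        # Literal text is escaped; each placeholder becomes a named group.
        parts.append(re.escape(pattern[pos:var.start()]))
        parts.append("(?P<%s>.*)" % var.group(1))
        pos = var.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def apply_rule(pattern: str, template: str, uri: str) -> Optional[str]:
    """Return the template instantiated at uri, or None if uri does not match."""
    match = compile_pattern(pattern).fullmatch(uri)
    if match is None:
        return None
    # Every template variable is assumed to appear in the pattern as well.
    return _VAR.sub(lambda var: match.group(var.group(1)), template)


print(apply_rule(
    "http://stale.example.com/{more}",
    "http://current.example.com/{more}",
    "http://stale.example.com/bland.png",
))
```

A URI that does not match the pattern yields None, in which case the rule licenses no inference and the consumer falls through to its other tactics.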

As an example of name to usage-spec-name conversion,

[] a tns:MetaTermRule;
  tns:hasPattern  "{schemepath}#{frag}";
  tns:hasTemplate "{schemepath}".
permits inference (assuming the URI denotes anything at all) of
<http://example.com/hashola>
  tns:specifiesUsageFor
  "http://example.com/hashola#"^^xsd:anyURI .
which justifies the use of http://example.com/hashola as a name for the usage spec of http://example.com/hashola#. (This name can then be further resolved to get the usage spec itself.) [Probably inaccurate syntactically since the # might be in a query string, etc. - do we need more powerful matching?]


Figure legend:

  • Black arcs represent relationships that hold in the absence of any resolution rules.
  • Introducing the redirection from the first name to the second (solid red arc) permits the inference that the second name denotes the same thing as the first (broken red arc), or, equivalently, that one might access the thing using the second name if the first name doesn't work. Ordinarily one would assert the 'denotes' relationship directly, but a redirection might be induced by a generic rule or a 301 ("Moved Permanently") HTTP response.
  • Introducing the usage-spec-name relationship (blue arc) establishes that the graph denoted by the usage-spec-name is a usage spec for the name (broken blue arcs). Ordinarily one would assert the 'tns:specifiesUsageFor' relationship directly, but a usage-spec-name relationship might be induced by a generic rule or posited from a 303 ("See Other") HTTP response.
  • Given that the graph specifies usage for the first name and that the second name may be substituted for the first, we conclude that the graph specifies usage for the second name (purple arc).
  • If the thing is a document, then the name will resolve to it, via either standard protocol, another name for the same thing deduced using resolution rules, or some other procedure.
  • Redirection is stronger than owl:sameAs, which has no imputation that the terms have the same usage spec. However owl:sameAs can be used for alternative resolution by denoting the thing in the owl:sameAs assertion using two different names.
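
The inference described for the purple arc can be sketched as a small closure computation. The Python below is a hypothetical illustration only: it represents 'specifies usage for' facts as (graph, name) pairs and redirections as (first name, second name) pairs, and the function name is made up for this example.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def propagate_usage(specifies_usage_for: Set[Pair], redirects: Set[Pair]) -> Set[Pair]:
    """Extend (graph, name) facts along (first name, second name) redirections."""
    inferred = set(specifies_usage_for)
    changed = True
    while changed:
        changed = False
        for graph, name in list(inferred):
            for first, second in redirects:
                # If the graph specifies usage for the first name and the
                # second may be substituted for it, it covers the second too.
                if first == name and (graph, second) not in inferred:
                    inferred.add((graph, second))
                    changed = True
    return inferred


facts = {("http://example.com/spec", "http://stale.example.com/thing")}
redirs = {("http://stale.example.com/thing", "http://current.example.com/thing")}
print(sorted(propagate_usage(facts, redirs)))
```

The loop repeats until no new pair is added, so chains of redirections are followed as well.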

Acknowledgments

Special thanks to David Booth for help with document organization and technical issues.

The following people commented on drafts: (reverse chronologically) Dan Corwin, David Booth, Alan Bawden, Sankar Virdhagriswaran, Gerald Jay Sussman, Jake Beal, Eric Prud'hommeaux, Bijan Parsia, Mark Tobenken, Chimezie Ogbuji, Kaitlin Thaney. Thank you.


TBD

  1. DB: Make clear the difference between 'establish' and 'mint' at the outset of that section. -- DONE
  2. similar predicate -- suggest rdfs:comment or a subproperty (but not OWL DL) vagueness OK (suggested further down.) --- fixed OK?
  3. although development of the web began with web documents, to get a good account we'll start with other things, then we'll do documents at the end.
  4. 'locator' -- not a good term -- racine? fragmentless http uri (FHU)?
  5. specification example relies on the other names having been specified well enough
  6. revisit the underspecification discussion
  7. that someone has said it's a spec is necessary but not sufficient. you don't have the 'authority' to expect people to believe nonsense.
  8. choose the more general thing and subclass it, rather than using the most common similar name. Or use union or intersection. Or...
  9. define a notion of SW statement... SW stmts are made in terms of growing set of languages... not in conflict ...
  10. allow usage specs that permit revocation.
  11. create specific properties for: example of usage, citation
  12. haven't mentioned sparql endpoint as publication strategy (but how would one know where to go, and what the graph was? select ?s ?v ?o from , to get spec for g?)
  13. best effort usage lifetime -- subject to contract
  14. Include a non-http use case (LSID and/or other), how to do redirects.
  15. DB prefers 'uri' to 'name'.
  16. If you know where something is, tell people.
  17. If you, the publisher of some RDF (such as a US), own existing non-http URIs, redirect them to your http uris for the same thing.
  18. Remove the 'resolves to' arrows from the diagram
  19. Fixed doc, stable doc, unstable doc, endpoint are all continuants. continuant = something that's fully present at any point in time
  20. qualities are (dependent) continuants.
  21. "generic document" is in use; consider using it somehow
  22. More prominence to link between statements in RDF and statements about documents. (Metadata)
  23. EricP: mine EARL (Evaluation and Repair Language) for vocabulary
  24. We don't explicitly connect RDF documents to stuff that's interpreted
  25. Make a note that you need to take care -- some statements are about RDF documents, etc.
  26. Say up front that we're going to get the things case right first, then do documents.
  27. Dan Corwin: talk about 'named graphs'; talk about how class definition gets you modularity in an individual definition; talk about how usage spec of a doc is a kind of metadata for the doc
  28. Avoid using <> or other relative URIrefs
  29. Title: "Naming Things" or something that has "URI" in title; Establishing URIs as Names for Things; "How to Name Things"; "Naming Practices for a Semantic Web for Science"; "Naming Practices for a Sustainable Semantic Web"; "Sustainable Semantic Web Naming Practices" ...
  30. Eric P wants to write an HTTP-focussed version ... push through
  31. Emphasize informal nature of definitions
  32. {Note Expensive Procedures}
  33. Things that challenged Jake: the #-truncation and 303 diagrams final slide that reverts to document instead of thing
  34. figure out how/whether to talk about layered protocol idea including whether there are any signals that say the protocol is being followed
  35. 2xx does not imply document - may be just an endpoint - or may (ugh) be the name's usage spec (e.g. Pat Hayes's famous page)
  36. 'statement of acceptable use' 'statement of allowed use' 'recommendation for use' RFU (of a term)
  37. Draft: update policy versioning story
  38. comparison with AWWW.
  39. compare usage spec to Boothian declarations - in a note