Science Commons

SWHCLSIG Note on Sustainable Naming

Authors' Draft 40, 2007-11-06 04:56

Current version:
http://sw.neurocommons.org/2007/uri-note/
This version:
http://sw.neurocommons.org/2007/uri-note/40/
Previous version:
http://sw.neurocommons.org/2007/uri-note/39/
Authors:
Jonathan Rees, Science Commons
Alan Ruttenberg, Science Commons

Abstract

This document is a request to the community of producers and consumers of RDF to follow certain practices around the use and resolution of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF artifacts and to help protect the meanings of these artifacts against the ravages of time.

The intended audience of this note is workers in the areas of life science research and health care who are building semantic web resources and tools, but it is hoped that it will be useful to others as well.

Status of this document

This is an authors' draft with no official standing.

The title attached to this draft is provisional.

This is a draft of a document written in response to needs expressed by the W3C Semantic Web Health Care and Life Sciences Interest Group (HCLS). It is intended for publication as an Interest Group Note on w3.org. Before publication there, it will undergo a round of revision and will be made to conform to standard W3C document conventions and policies. We anticipate a few more authors' drafts before a final version is published.

How to comment on this draft:
Please put your comments, accompanied by the draft number (this is number 40), on the DraftTalk wiki page, or send email. We will do our best to address all concerns and record dissenting views fairly.

URI Note home:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations

Recent changes:
11/06 (40) tweak text about "errors in terminology"
11/06 (40) make sure title lies within first 512 bytes
11/06 (40) explain name vs. URI/IRI
11/06 (40) pushed it through the html validator
10/31 (37) change 'naming authority' to 'minting authority'
10/31 (35) change 'URI owner' to 'naming authority'
10/30 (33) more explanation of why you want resolution rules
10/30 (33) further de-emphasize resolution rules in main text
10/27 (31) replace figure series with single omnigraffle figure
10/27 (31) rework resolution appendix
10/27 (28) rework section on documents

Introduction

This document is a request to the community of producers and consumers of RDF (see note: {what is RDF?}) in science and medicine to follow certain practices around the use and resolution of URIs. The advice is formulated with the goal of promoting meaningful exchange and recombination of RDF, and the proposed solutions are meant to protect investments made in composing RDF by enabling a sustainable Semantic Web infrastructure. Much published RDF will be essentially static because incentives for ongoing maintenance are often missing and because provenance and trust require stability of content. Therefore it is important that RDF be published so that consumers will be able to use it unchanged for as long as the material is of any potential interest.

[?46 Give a non-technical intro here. how science is done, how semantic web/RDF can help. -JW]

The focus is the use of URIs, as names of particular things or as otherwise used technically. Names that denote particular things (individuals, relationships, and so on) may be used sensibly in statements.

Failure to handle URIs carefully leads to errors, inconsistencies, and lost opportunities. The specific sources of these problems include:

The use of URIs to denote is in contrast with the conventional use of URIs as specifying communication endpoints. This can be confusing because sometimes there is harmony between the two - a communication endpoint can deliver a document, and we can use the URI to talk about that document - and sometimes there isn't, as when a URI names a star or a chemical element. When a URI is used both ways, but the endpoint delivers a document while the URI is used to denote a non-endpoint, we have a kind of pun and special care must be taken not to confuse the two senses. A tissue culture is not an HTML document.

The WWW is, among other things, a rich, universally deployed network of documents. The Semantic Web movement leverages the foundation of hyperlinking into a knowledge representation infrastructure, with URIs used not just for navigation but as terms that allow RDF to be used as a language of discourse. URIs retain their use in linking, however, allowing a single package (the URI) to bundle two different kinds of reference: to a thing and to a communication endpoint that is about the thing.

Sustainability principles

In order to enable RDF graphs that are meaningful and useful to both humans and machines, and that can be meaningfully combined with other, independently developed, graphs, the names that the graph uses must meet certain quality benchmarks. We say a name is sustainable when it obeys the following principles:

  1. a specification of intended usage of the name ("usage spec") can be found by an agent reading RDF that uses the name,
  2. there are no other usage specs for the name that might mislead readers,
  3. the usage spec is consistent and clear, and
  4. the usage spec specifies the name's referent.

These principles are intentionally imprecise; they specify general requirements, not particular solutions. This document aspires to contribute to practice that achieves these principles.

Structure of this note

The document starts by describing an approach to specifying the intended usage of names that builds on current practice. The following two sections give a sort of "protocol" for finding usage specs for names and for establishing usage specs for new names. Finally, a treatment of the case of web documents is given. To make the outline of the argument easier to follow, details on a number of topics are relegated to endnotes.

An appendix presents a simple idea that would help to preserve deterministic access to documents when they disappear from their original location, or in other circumstances where the URI is not an effective locator on its own.

Specifying usage

The central idea proposed here is that associated with every name in use there is a document that specifies when the name should and shouldn't be used. This usage spec establishes the ground rules that enable clear communication using the name, and therefore enables substantial statements about the name's referent that can contribute to discourse. If someone uses the name, the use should, if one wants to follow present advice, be consistent with what the name's usage spec says. If the usage spec is not acceptable, a name possessing a more suitable usage spec should be used.

An analogy to traditional scientific literature may be helpful. Because a term used in a document may have a variety of meanings, it is good practice to provide a reference to a publication that defines the term in the way in which one would like to use it. On the semantic web, a URI bundles both functions, use in discourse and reference to definition, into a single string.

A complete usage spec consists of the following:

Example:

  specimen:S05-100_A_1_2.3 
    a dicom:Specimen ;
    dicom:from_patient patient:65536 ;
    dicom:collected_using_machine dicom:AVUTRIX_MULTIPLE_B7792 ;
    dicom:collected_on "2007-08-07"^^xsd:date ;
    rdfs:label "S05-100_A_1_2.3" ;
    rdfs:comment "skin puncture from left elbow" .

could serve as a usage spec for the name specimen:S05-100_A_1_2.3. By saying that this is a usage spec, we would be saying that the name should be used to denote the intended specimen. (Note, by the way, that none of the other names mentioned in this usage spec denote documents.) [?80 this has no example] [source]

To understand the difference between a usage spec and other discourse involving the name, it is helpful to consider what happens if a statement involving the name turns out not to be true. In ordinary discourse, which one would hope to be the bulk of the RDF written, a false statement means that someone has made a mistake about the subject matter. Statements in usage specs, on the other hand, are true "by definition." A judgment that a statement in the usage spec is false, if based on independent information expressed using the name, probably means that the independent information is using the name contrary to the usage spec, and therefore reflects an underlying confusion of terminology, not of fact.

The formal component may be taken as "universally true" statements about the referent, i.e. axioms that are supposed to be assumed true in any context where the name is used. [-MD] If policy statements are absent, the usage spec should be assumed to be stable (the meaning won't change) and durable (meaningful into the indefinite future).

[?82 need an example of a nonlogical usage spec -MD]

Prose description and RDF description are intermixed by placing the prose in a literal string related via rdfs:comment.

A usage spec will itself use a number of other names, and a full specification of the name would in principle require an understanding of those names.

Underspecification - a usage spec that is not very specific - is to be discouraged, as it may easily lead to confusion.

Being a usage spec for a name is a special relationship. A usage spec sets the ground rules for any other use of the name; a graph that is about the name's referent may be very interesting, but it does not necessarily specify usage. Whether a graph is a usage spec generally depends on whether the minting authority has said or implied that it is (see note: {minting authority} for details). Later we will suggest particular publication methods for usage specs that establish their special role with respect to the name.

Because the name's usage spec determines what the name means and how to use it, the problem of finding usage specs, which are distributed around the network, is quite important. Stability of a usage spec is also important: changing a usage spec is a recipe for confusion, as different users of a name may rely on different versions of the usage spec without being aware that the change has occurred.

[?49 transition missing]

Using an existing name

When attempting to express something in RDF, it is necessary to choose the right terms to use.

  1. Do some research to track down names that are already in use that might be useful to you, and their usage specs.
  2. Use an existing name when it has the right meaning. Make sure that fundamental aspects such as type, domain, and range are defined as appropriate.
  3. However, use an existing name only when it satisfies the sustainability principles (above).
  4. If a name refers to a document or database record, do not use the name to refer to the thing described by the document or record, or vice versa. (A potato can't have an author, but a document describing a potato can.) If the thing and the document or record both need to have names, the names must be different.
  5. When there is a choice among existing appropriate names, choose the name that
    • best matches your intended use
    • is in widest use
    • has the most easily located usage spec
    • has the best (clearest, most accurate, most consistent) usage spec
    These criteria are often in conflict, so balancing them may require judgment. Seek advice from the communities of users of the names if you're not sure.
  6. If you've done your best to find a name that serves, and failed, establish a new name (next section).

Minting and establishing new names

A new name should be established for a new meaning. Establishing a term consists of 'minting' it (deciding what its spelling should be), composing a usage spec, and publishing the usage spec. The overall requirement is to establish a name that satisfies the sustainability principles (above [section#]).

If a new name is needed, establishing it requires these steps:

  1. Justify any decision not to use an extant name by appealing to the sustainability principles.
  2. Decide what the name should be (i.e. its spelling - the way it's written)
  3. Compose a graph that is to become the name's usage spec
  4. Publish the graph in such a way that others will know that it's the name's usage spec

Names that are to denote documents may be established simply by publishing on the web; see next section.

Reuse or mint?

[?51 examples subclass, union, restriction -- not all agents can deduce via these -- but we should insist]
[?52 special note on sameAs -- extremely strong -- note: {equivalence discussion}]

Minting a name

In principle, the spellings of names do not matter beyond being different from one another, and in many naming schemes (e.g. Genbank) names are simple accession numbers. Additional influences on spelling, here, come from (a) the desire to resolve and (b) the fact that humans sometimes come across names and need a mnemonic or hint as to what they're for.

  1. In order to avoid accidental collisions, mint only names that fall within some region of URI space that you control; for example, URIs specifying a domain that you own. See note: {minting authority} for details.
  2. Don't reuse a name that you, or any previous minting authority of your region of URI space, previously established. (See (collision avoidance hack) for one collision-avoidance tactic.)
  3. As the HTTP protocol is widely deployed, you are advised to mint names that are http: or https: URIs. See note: {minting nonlocators}. Doing so is not to be taken to imply that the name denotes a document.
  4. Because of the overhead of accessing large files, a racine should not be shared (i.e. the same racine used with many fragment ids) among a large number of terms. Moreover, sharing a racine to any extent at all makes delineation of separate usage specs for the names difficult, if not impossible, so this is discouraged as well. [See wiki discussion RacineSharing]
  5. To permit use of the RDF/XML form of RDF, if the name is to denote a property, the URI must end in an XML NCName (roughly speaking, a sequence of characters from the set {letters | digits | "_" | "." | "-"} that begins with a letter or with "_"). (See note: {NCName pragmatics} for related practices.)
  6. Choose names that steer users away from unintended interpretations. For example, if a name is to denote a property, choose the name so that there can be no confusion about the direction of the relationship: ex:hasCapital (or ex:has_capital) or ex:isCapitalOf (or ex:is_capital_of), but not ex:capital. [?54 This is apparently controversial... TimBL suggests that one can say 'capital' in the rdfs:label, and then it doesn't matter (to him) what the URI is. -- Find a reference to a document that advocates for this other position.] [?55 explain why the name is important even though the name isn't important - readable Turtle]
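
Some of the minting advice above is mechanical enough to check automatically. The following is a minimal sketch, not a definitive validator: the function name, the `controlled_hosts` parameter, and the example URIs are hypothetical, and the regular expression is only an ASCII approximation of the informal NCName description given in item 5.

```python
import re
from urllib.parse import urlsplit

# Rough ASCII-only approximation of "ends in an NCName": letters, digits,
# "_", ".", "-", beginning with a letter or "_". (Assumption: the real
# NCName production also admits many non-ASCII characters.)
NCNAME_TAIL = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*$")

def check_minted_name(uri, controlled_hosts, is_property=False):
    """Return a list of warnings for a candidate name (empty if none found)."""
    warnings = []
    parts = urlsplit(uri)
    # Item 3: prefer http: or https: URIs.
    if parts.scheme not in ("http", "https"):
        warnings.append("not an http: or https: URI")
    # Item 1: stay inside a region of URI space that you control.
    if parts.hostname not in controlled_hosts:
        warnings.append("host is outside the URI space you control")
    # Item 5: property names must end in an NCName for RDF/XML.
    if is_property and not NCNAME_TAIL.search(uri):
        warnings.append("property URI does not end in an NCName")
    return warnings

# Hypothetical usage: example.org stands in for a domain you own.
print(check_minted_name("http://example.org/2007/terms/isCapitalOf",
                        {"example.org"}, is_property=True))
print(check_minted_name("urn:lsid:example.org:terms:42",
                        {"example.org"}, is_property=True))
```

Checks of this kind cannot cover item 2 (never reusing a name) or item 6 (avoiding misleading spellings); those still require records of past minting and human judgment.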

Composing a usage spec

  1. Compose RDF statements that provide clear guidance on use. The statements should specify single and particular usage. (See note: {what is RDF?}.)
  2. A usage spec should not be in conflict with the semantics of an established RDF-based language. [?56 rework this. provide positive and negative examples.]
  3. The declaration should be specific enough to rule out unintended usage, but not so specific that it overcommits and fosters inconsistency or discourages reuse.
  4. Provide prose in an rdfs:comment property. Doing so is necessary except under unusual circumstances.
  5. Provide a formal definition, wherever possible, that would allow a deduction system to discover inconsistencies or the lack thereof.
  6. Usage specs should be considered irrevocable (see note: {about inconsistency} and note: {versioning}) unless otherwise specified. Compose the RDF defensively; proper use by others (according to this account) will be determined by what the usage spec says, not by what you might say later.
  7. Simply naming a name is never a substitute for articulating a usage spec. It is not self-evident that the URI http://example.org/mbl-lillie-building refers to the Lillie Building at MBL; the denotation must be established with a usage spec. [?57 specifically?]
  8. Take a stand on time dependence. [?58 need example, e.g. age, height] Anchor statements at a particular time (permitting statements that assume that time) or clarify that the name only denotes whatever is time-invariant about something.
  9. rdfs:label assertions are a courtesy to user- and developer-facing interfaces. We suggest the use of short textual labels that can be placed in (for example) lists and menus in user interfaces.
  10. Every usage spec for an individual should establish an informative rdf:type for the name's referent using appropriate RDF statements. (owl:Thing is not informative.)
  11. [?60 move or flush? move to intro? discuss in note.] A property should be given a nontrivial domain and range.
  12. [?61 move or flush?] A class should be asserted to be a subclass of some nontrivial class, when one exists.
  13. [?62 When declaring a name, ] Also publish statements relating its referent to other things (see note: {assertions that are not constraining}).
  14. [?63 example of usage is very helpful. cf. obo minimal metadata]
  15. [?64 citation]
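
A few of the items above (prose in rdfs:comment, a short rdfs:label, an informative rdf:type for individuals) can be checked by inspection of the graph. The sketch below assumes a deliberately simple, hypothetical representation: a graph is a set of (subject, predicate, object) tuples written with prefixed names as plain strings. The function name is illustrative, and the data reuses the specimen example from earlier in this note.

```python
def review_usage_spec(name, triples):
    """List checklist items that a candidate usage spec for `name` misses.

    `triples` is a set of (subject, predicate, object) string tuples;
    this representation is an assumption made for the sketch.
    """
    about = [(p, o) for (s, p, o) in triples if s == name]
    types = [o for (p, o) in about if p == "rdf:type"]
    problems = []
    if not any(p == "rdfs:comment" for p, _ in about):
        problems.append("no prose description (rdfs:comment)")
    if not any(p == "rdfs:label" for p, _ in about):
        problems.append("no short label (rdfs:label)")
    # owl:Thing alone is not considered informative.
    if not any(t != "owl:Thing" for t in types):
        problems.append("no informative rdf:type")
    return problems

specimen = "specimen:S05-100_A_1_2.3"
spec = {
    (specimen, "rdf:type", "dicom:Specimen"),
    (specimen, "dicom:from_patient", "patient:65536"),
    (specimen, "rdfs:label", "S05-100_A_1_2.3"),
    (specimen, "rdfs:comment", "skin puncture from left elbow"),
}
print(review_usage_spec(specimen, spec))
```

A check like this only detects missing pieces; whether the statements give clear guidance on use, or are neither too loose nor too specific, remains a matter for the author and the community.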

Publishing a usage spec

  1. Publish the usage spec in such a way that one can unambiguously determine that it is a usage spec for the appropriate term.
    • If the term has a fragment id (is a # URI), publish the usage spec at the "racine" (the 'racine' is the URI formed by truncating the URI starting at the #)
    • If the name is a locator, publish the graph to permit "meta-resolution" to the usage spec (see note: {where do usage specs go?}).
    [cite httpRange-14 and other discussion about this]
  2. Make your best effort to ensure that the name's usage spec is accessible for the lifetime of any RDF that uses the name, unless duration of applicability is explicitly limited in the usage spec. Unfortunately this is difficult; see note: {how to get persistence}.
  3. Once you publish the graph, others will start to depend on what it says, so the name with your given meaning effectively becomes community property. Never redeclare a name in a way that might break or confuse others' use of the name. (See note: {versioning}).
  4. Avoid chatter [?81] that has a chance of being not true -- mixing mere descriptive statements with the essential statements that specify a name's usage weakens the usage spec. Instead the usage spec should link to a second document containing the non-constraining statements [?66 check that the rdfs:seeAlso can be nonconstraining; see note: {assertions that are not constraining}].
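
The two publication cases in item 1 depend only on the syntax of the name, so they can be sketched in a few lines. This is an illustration under stated assumptions, not a resolution implementation: the function names and example URIs are hypothetical, and names that are neither # URIs nor locators are simply reported as not covered.

```python
from urllib.parse import urldefrag, urlsplit

def racine(name):
    """Return the URI truncated at the '#' (the 'racine')."""
    return urldefrag(name).url

def is_locator(name):
    """Glossary definition: an http: or https: URI with no fragment id."""
    return urlsplit(name).scheme in ("http", "https") and "#" not in name

def usage_spec_location(name):
    """Say where the advice above would put the usage spec for `name`."""
    if "#" in name:
        return ("racine", racine(name))
    if is_locator(name):
        return ("meta-resolution", name)
    return ("not covered by this sketch", name)

print(usage_spec_location("http://example.org/2007/ontology#Specimen"))
print(usage_spec_location("http://example.org/2007/specimen/42"))
```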

Talking about documents

Loosely speaking, the "web of documents" is grandfathered into the semantic web, as described here, by considering web documents to be named by their URIs. If a document is obtained when interpreting a name, then by convention the name is taken to refer to the document. The document is not a usage spec for the term: a usage spec is part of the discourse being conducted in RDF, while a document is merely one more thing that one might talk about. In contrast to usage specs, by mentioning a document there is no expectation that its contents are supposed to be believed.

While establishment of usage specs for document-denoting names would often be helpful - one could state type, title, authorship, revision status and so on ("metadata") - this is difficult, at least using HTTP, and one might be forgiven for not providing one. See appendix A, below, for a hack for simultaneously publishing a document and its RDF-encoded metadata using HTTP.

In order to be considered true, a statement involving the name must apply not just to what has been observed at the time the statement was written, but also to what will be observed when someone else is trying to understand or use the statement. Server responses vary with request details, such as the requested language, as well as with time.

Excluding time variation, any particular server response should be taken to say that the name denotes what the response communicates, not any particular byte sequence or sequences. That is, statements about documents (or at least those not varying in time) talk about what the document says irrespective of representation details.

To articulate the allowable inferences about documents that can be made based on server response, it is proposed here to classify documents according to consistency criteria:

fixed document
all server responses are identical (particular bytes + their metadata; LSID "piece of data"; referent of a data: URI; similar to "representation" sensu Architecture of the WWW)
stable document
responses vary only over representation details, not underlying content (they all say the same thing) (e.g. http://www.w3.org/TR/2003/PR-rdf-concepts-20031215/)
document
responses may vary over time, but are consistent enough to permit something to be said (author, topic, etc.) (e.g. http://www.w3.org/TR/rdf-concepts/)
endpoint
a "network data object or service" (quoting RFC 2616); no a priori consistency guarantees of any kind; may not even handle GET at all; requires an independent explanation when used in RDF

If one of these types is given as the domain and/or range of properties, then not only can one state the assumptions under which statements are made, but having communicated those assumptions, further inference is enabled from inspection of the document. For example, if we know that x has a length, and that only fixed documents have a length, then we can infer that x is a fixed document and therefore that it also has a checksum, which can be computed by reading the octets of x.
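
The inference described above can be sketched as follows. Everything named here is an assumption made for illustration: the note does not define a vocabulary, so ex:length, ex:checksum, and ex:FixedDocument are hypothetical, and SHA-256 is just one possible choice of checksum.

```python
import hashlib

# Hypothetical properties whose domain is "fixed document".
DOMAIN = {
    "ex:length": "ex:FixedDocument",
    "ex:checksum": "ex:FixedDocument",
}

def infer_types(triples):
    """Infer a type for each subject from the domain of the properties used."""
    return {s: DOMAIN[p] for (s, p, o) in triples if p in DOMAIN}

def checksum(octets):
    """Compute a checksum by reading the octets (SHA-256 is an assumption)."""
    return hashlib.sha256(octets).hexdigest()

# "x has a length" lets us conclude that x is a fixed document ...
statements = {("ex:x", "ex:length", 11)}
print(infer_types(statements))

# ... and a fixed document has one byte sequence, so a checksum is well defined.
print(checksum(b"hello world"))
```

The same reasoning would not be sound for a stable document or a plain document, since their octets may differ from one response to the next.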

Advice around documents:

  1. For every use of a document-denoting name in RDF, make sure that a consumer will be able to understand the assumptions under which that statement was made.
  2. Provide the strongest possible consistency assurances, in order to enable the greatest number of inferences.

Glossary

name
a URI; used with the connotation of having the potential for meaningful use in RDF, and without the suggestion that one "identifies" something or that what's named is a "resource" (see note: {why new terminology?}); also a way to sidestep the URI vs. IRI switch
locator
an http: or https: URI that has no fragment id (#); defined technically and syntactically for the purposes of this document
denote
to name, refer to, or designate something (see RDF semantics)
referent (of a name)
the thing (individual, class, property, etc.) denoted by the name
graph
a set of statements written in RDF
usage spec (for a name)
a graph that is used to specify the intended usage of a name.
well-established
[?67 tbd]
spelling (of a name)
the name taken as a string (i.e. stripped of its role of referring); a URI.
minting authority
someone who has the right to specify a name's usage (see note: {minting authority})
resolve (a name)
obtain the document denoted by the name
meta-resolve (a name)
obtain a usage spec for the name
dereference (a name)
attempt to obtain the name's referent using the protocol implied by its URI scheme
meta-dereference (a name)
attempt to obtain a usage spec for the name using the protocol implied by its URI scheme
property
a two-place predicate; roughly speaking, a transitive verb. This term is a misnomer - being green is a property, but the has-color relationship isn't - but is firmly established in RDF lore.

Notes

Notes are in alphabetical order.

Note: {about inconsistency}

Assume that anything that can go wrong, will.

We've suggested various ways to find usage specs (Appendix A). If a minting authority has published more than one usage spec over time or in different places, or if someone has taken it upon themselves to change a usage spec and pass it off as correct (perhaps using a resolution rule), then disagreement among uses is likely, and confusion will ensue.

Usage specs that differ in inconsequential ways - that is, that neither broaden nor narrow the applicability of the name - are not in conflict. For example, a newer usage spec may give more examples or explanation than another, or provide statements (such as rdfs:seeAlso statements) that would not affect the correct use of the name.

There is no formulaic way to solve true conflicts. Other things being equal, priority should go to the first usage spec for the name published by whoever was the minting authority at the time, or to a usage spec compatible with it. Subsequent published usage specs may be inconsistent with published use and are disruptive. They should be examined with skepticism.

However, there may be rare circumstances in which it is preferable to use a revised usage spec - for example, a usage spec may be internally inconsistent in a nonobvious but easily fixed way.

If there is inconsistency, a usage spec found via a resolution rule included in an author's RDF - leading to, say, the particular version of the usage spec consulted in composing the RDF - is more likely to reflect the author's intent than one that was not so cited. Ultimately it is up to the community of users of the name to determine how to solve conflicts.

Note: {assertions that are not constraining}

When you establish a name (or after), some of what you say is meant to be constraining on all uses, while some of what you say is incidental: either advisory, hypothetical, or unimportant. A discovery that incidental information was incorrect would not force a retraction of a usage spec.

(Austen's novel Persuasion is about persuasion and uses the word 'persuasion' to denote persuasion, but it doesn't specify how others are supposed to use the word 'persuasion'.)

Where do we put RDF statements about a name's referent that are not supposed to be constraining on the use of the name? We have no syntactic marker in RDF that can separate one set of statements from another.

One solution is to grandfather existing ontologies by saying that this separation is an informal process or is simply not specified by this note; look elsewhere for guidance.

Another approach is to take all statements as constraining. The non-constraining statements should be placed in a separate document and a relation placed in the usage spec relating the usage spec to the secondary description via a predicate such as rdfs:seeAlso. (This is roughly the answer given here. [See wiki discussion DefinitionDelineation.])

Note: {collision avoidance hack}

A way to help protect against accidental collisions over time (accidental publication of an inconsistent usage spec or other document, under a given name, by a future site administrator or minting authority) is to have the path component of the URI contain "site version" information in the form YYYY, YYYY-MM, or YYYY-MM-DD (example: http://www.w3.org/2001/XMLSchema). Future administrators following this convention will either use no date or will put a different date in the URIs they mint. See [cite RFC 4151 tag: URI] for further information on this convention. [?68 this practice must be detailed somewhere, but where?]
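
As a minimal sketch of this convention, the following builds a URI whose path carries the date on which the minting authority was active. The function name, its parameters, and the example.org base are hypothetical; the three granularities mirror the YYYY, YYYY-MM, and YYYY-MM-DD forms mentioned above.

```python
from datetime import date

# Date formats for the three granularities named in the note.
FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}

def mint_uri(base, local_name, minted_on, granularity="year"):
    """Build a URI with a "site version" date component in its path."""
    stamp = minted_on.strftime(FORMATS[granularity])
    return f"{base.rstrip('/')}/{stamp}/{local_name}"

# Hypothetical usage: example.org stands in for a domain you control.
print(mint_uri("http://example.org", "specimen-vocabulary", date(2007, 11, 6)))
print(mint_uri("http://example.org/", "specimen-vocabulary",
               date(2007, 11, 6), granularity="day"))
```

The date is part of the spelling only; it protects against later collisions and should not be read as a statement about when the referent came into being.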

Note: {convert nonlocators to locators}

[?69 no reference in text]

Tools that care about accessing things (endpoints, usage specs, etc.) should understand use of resolution rules, so that they can properly implement relocation and redundant sourcing.

In particular, there is often occasion to present names in a web browser or other user-facing interface. When arranging this, be prepared to link to a usage spec or other appropriate document using a browser-friendly URI, e.g. by routing through a proxy. The name's spelling may be an inadequate locator for many browsers (e.g. urn:lsid:, info:) or it may not lead to the correct usage spec. Observe resolution rules that will help generate a locator that can be used for hyperlinking. [?70 details - presentation is not same as usage spec]

[?71 Compare ARKs, handle proxies, etc.]

Note: {equivalence discussion}

[?72 TBD. Not linked from text yet. When to use/not use sameAs, equivalentClass, etc. Use in constraining/nonconstraining situations. When one of these constitutes a correct usage spec. The idea of hypothetical sameAses as a way to modulate precision and recall. blah.]

Note: {how to get persistence}

By persistence we mean the ability for a name to resolve to its referent (if a document) and meta-resolve to a usage spec over the potential lifetime of the name. This could be anywhere from seconds to decades, although it is the latter that we usually have in mind.

Persistence has two aspects:

  1. apparatus for the name to (meta-)resolve over time, e.g. using a forwarding service such as purl.org [?73 for discussion of purl.org see wiki discussion Purls]
  2. the accessibility and consistency of associated document(s)

These aspects are independent. A forwarding service may know about the name, but may not have a valid current address; while a document may be perfectly accessible on the network, but lack a persistent name (consider a hypothetical archiving service that changes its URIs every few years, or a highly replicated document whose replicas are each short-lived). The two services may be provided by different organizations.

Because persistence implies possibly outliving any individual or organization involved in establishing the name, and perhaps even their interest in keeping it resolvable, persistence requires long-term institutional commitment to identifiers and accessibility.

Sadly, there is no good formula for persistence at present. Persistent resolution is possible via purl.org, for example, but the name will only resolve if there is a place to forward to and someone to update the redirection when the document's location changes. The publishing industry and universities have infrastructure for persistence, but by design this infrastructure is exclusive and not conducive to spontaneity. Providing long-term persistence support to the scientific community -- permitting consistent resolution for at least as long as there are readers and writers who would like to understand the name -- is a social and administrative challenge that is waiting for a solution. In the meantime, we must do something, and this is why we say to make a "best effort" to make the usage spec resolve persistently.

The importance of persistence hinges on your attitude toward mechanisms such as resolution rules. If you believe that peer-to-peer resolution rules (or any similar mechanism) will be understood, then a persistence service becomes less important. If you believe that consumers you care about either will not understand resolution rules or will not have adequate rules, then a persistence service is more important than it would be otherwise. [See wiki discussion AttitudeTowardMigration.]

Note: {minting nonlocators}

Locators have the advantage over nonlocators in that they are more likely to lead to documents. Clients that do not understand a nonlocator natively, and that either do not understand resolution rules or have no resolution rules that lead to a usage spec or other document, may still be able to access the document if the URI is a locator. (Of course this is of no help if the link is broken.)

Rather than mint a non-locator URI, you can use a proxy service prefix to create a locator from the non-locator URI. Arrange, somehow, for everyone performing this transformation to use the same proxy prefix. State an equivalence (e.g. owl:sameAs) in case anyone uses the bare non-locator URI in RDF - or as a way of specifying what you mean by the proxy-relative form.
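
The proxy transformation just described might look like the sketch below. The proxy prefix, the function names, and the example LSID are all hypothetical; a real deployment would need community agreement on a single prefix, as noted above.

```python
from urllib.parse import quote

# Hypothetical proxy prefix; everyone performing the transformation
# would have to agree on the same one.
PROXY_PREFIX = "http://proxy.example.org/resolve/"

def to_locator(name, proxy_prefix=PROXY_PREFIX):
    """Turn a non-http name into an http: URI by prepending a proxy prefix."""
    if name.startswith(("http://", "https://")):
        return name  # already an http(s) URI; leave it unchanged
    # Keep ":" readable; percent-encode anything else that is unsafe.
    return proxy_prefix + quote(name, safe=":")

def equivalence_statement(name, proxy_prefix=PROXY_PREFIX):
    """State, in Turtle syntax, that the proxy form and the bare form are the same."""
    return f"<{to_locator(name, proxy_prefix)}> owl:sameAs <{name}> ."

lsid = "urn:lsid:example.org:specimen:42"
print(to_locator(lsid))
print(equivalence_statement(lsid))
```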

For a concrete case study see TDWG Life Sciences Identifiers Applicability Statement.

[For summary of this issue see wiki discussion AttitudeTowardNonlocators; also see wiki discussion AttitudeTowardMigration.]

Note: {minting authority}

For purposes of this document the minting authority of some portion of the URI namespace is defined to be the entity designated as being allowed to establish new names in that part of the namespace. For locators, the minting authority is the entity who happens, at that particular time, to be allowed to determine HTTP server behavior in that part of the namespace. This is determined by a delegation chain that starts with IANA [cite]; a common case is that the namespace region is that specifying a particular DNS host name and the minting authority is whoever is domain owner at that time. Minting authority coincides with the concept of "URI owner" as specified in section 2.2.2.1 of Architecture of the WWW, but here we are referring only to ability to publish a usage spec or document for a newly minted name. Once a name is in use it is community property, and in the unlikely event of a disagreement about correct use of the name, resolution must be reached by community process. The minting authority must be treated with respect, but the power of revision (through publication of a replacement web document) should not be used to the detriment of stability.

In the event a name is in use but its usage spec is unpublished, lost, or only ephemerally published - for example, if it is known only from use - and a new usage spec cannot be published for dereference via the term, a new graph might be composed to correspond to community practice, and one might attempt to get the community to accept the graph as a specification to be followed. This new usage spec does not carry the "authority" that a statement from the original minting authority might, but it may be of use to the community. The new usage spec might be publicized using a resolution rule.

[?76 it's just a hypothetical situation, so should this idea be flushed? it's here because it's an illustration of how community process should trump priority in extreme circumstances and how there are no rigorous rules governing this process. DB's alternative: mint a new term having the new usage spec, then assert that the new one and old one are equivalent as names. ]

Note: {NCName pragmatics}

We encourage NCName suffixes, or at least SPARQL-liberalized-NCName suffixes, for all names. This helps make Turtle and SPARQL queries more concise. [?74 explain] [?75 explain bug in the RDF/XML spec, SPARQL's extension, etc.]
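
As a rough aid, the check below tests whether a URI's suffix could be written as the local part of a prefixed name. It is a sketch under stated assumptions: the regular expressions are ASCII-only approximations of the XML NCName and SPARQL local-name grammars (the real grammars also admit many Unicode characters), and "suffix" is taken here to mean whatever follows the last # or /, which is an interpretation rather than a definition from this note.

```python
import re

# ASCII-only approximation of an XML NCName: starts with a letter or '_'.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# ASCII-only approximation of a SPARQL local name: a leading digit is
# also allowed, but the name may not end with '.'.
SPARQL_LOCAL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?")


def suffix(uri: str) -> str:
    """Return the part after the last '#' or '/' (assumed meaning of 'suffix')."""
    return re.split(r"[#/]", uri)[-1]


def is_ncname_suffix(uri: str) -> bool:
    return NCNAME.fullmatch(suffix(uri)) is not None


def is_sparql_suffix(uri: str) -> bool:
    return SPARQL_LOCAL.fullmatch(suffix(uri)) is not None


if __name__ == "__main__":
    for u in ("http://example.org/terms#Protein",
              "http://example.org/go/0008150",
              "http://example.org/x?id=5"):
        print(u, is_ncname_suffix(u), is_sparql_suffix(u))
```

A purely numeric suffix such as 0008150 fails the NCName test but passes the liberalized SPARQL test, which is the case the "SPARQL-liberalized" wording is meant to cover.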

Note: {versioning}

TBD: A versioning story: database records, databases, ontologies, usage specs. It is not required to posit a solution but the topic needs to be discussed in the note. Why this is critical:

  1. The requirement that documents, especially usage specs, remain stable means we will need to be able to describe and make a growing sequence of named stable versions.
  2. So is there any way to talk about "the latest version"? What about "what is common to all versions"? (that would have to do with classes.)
  3. How to deal with an unstable document such as a catalog of known versions of something (the LSID mutable metadata story)?
  4. Satisfy needs of would-be LSID users who want to defect to HTTP
  5. Is it worth giving (names for) relations relating RDF graphs, such as entailment?

Look at continuant/occurrent theory, DAV, etc.

See also the discussion of citing consulted versions in note: {about inconsistency}.

Note: {what is RDF?}

For the purposes of this document, "RDF" means any of a growing family of declarative languages with mostly consistent syntax (mainly Turtle and RDF/XML, as of this writing) and a variety of deduction and entailment systems (RDF, OWL).

Do RDFa documents qualify as RDF documents? I.e. should we recommend using them as usage specs? Problems: (1) they don't have their own MIME types, so can't be recognized or requested, and (2) they don't work with # URIs.

Note: {where do usage specs go?}

The location for finding a usage spec is a problem because the HTTP protocol has no native way to provide it. Often the usage spec (or similar document) is made available via simple dereference, and while this may be OK for access by humans, it leaves open the question of whether what you get when you get an OK is a usage spec or the denotation (the document) and makes reliable processing by machine difficult.

Two solutions have emerged for use with HTTP, and we recommend their use. In both cases one obtains a second URI that we call here the usage-spec-name for the term; the usage-spec-name may then be resolved to the usage spec.

  1. #-truncation - the usage-spec-name is the URI with the part including and following the # dropped
  2. 303 See Other response - the usage-spec-name is the value of the Location: header in a See Other HTTP response.

Although these conventions are not in universal or exclusive use, they are of value when you know that one of the conventions is in use, or when the agent is forgiving enough to tolerate situations where the putative usage-spec-name doesn't, or isn't known to, lead to a usage spec.
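
The two conventions can be folded into one small helper, sketched here for illustration. The function name and parameters are invented for this example. The caller is assumed to have already issued the HTTP GET (without following redirects) and to pass in the status code and Location: header, so the sketch itself performs no network access.

```python
from typing import Optional
from urllib.parse import urljoin


def usage_spec_name(uri: str,
                    status: Optional[int] = None,
                    location: Optional[str] = None) -> Optional[str]:
    """Return the putative usage-spec-name for uri, or None if neither
    convention applies."""
    # Convention 1: #-truncation. Drop the '#' and everything after it.
    if "#" in uri:
        return uri.split("#", 1)[0]
    # Convention 2: a 303 See Other response. The Location: value may be
    # relative, so resolve it against the requested URI.
    if status == 303 and location:
        return urljoin(uri, location)
    # A 200 (or anything else) gives no usage-spec-name under these conventions.
    return None


if __name__ == "__main__":
    print(usage_spec_name("http://example.com/hashola#frag"))
    print(usage_spec_name("http://example.org/id/x", 303, "/doc/x"))
    print(usage_spec_name("http://example.org/page", 200))
```

The result is only putative: as noted above, an agent still has to tolerate the case where the name it gets back does not lead to a usage spec.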

Note: {why new terminology?}

Here are some excuses for not reusing certain terms from RFC 2616 or Architecture of the WWW.

name (instead of "URI" or "IRI") - really a URI, syntactically, but "name" helps to better evoke problems and solutions appropriate to a semantic web context. "Uniform resource identifier" is not appropriate because (1) it is the naming scheme, not the identifiers per se, that is uniform; (2) they aren't restricted to identifying "resources" - they can name anything at all; (3) it's not clear that they in general "identify" much of anything. An additional benefit is that "name" helps to sidestep the URI/IRI switch. Alternatives: "term," "word," "RDF word," "SW-word," "swet" (semantic web term), "wwword," "term of art," "meaningful URI," "worm," ...

thing (instead of "resource") - anything, not just the resources considered in RFC 2616. Alternative: "entity" (this is being argued on the www-tag list)

locator (instead of "URL") - "URL" is defined quite broadly in various RFCs; I mean to restrict it to the least common denominator among deployed web agents

usage spec - I'm still searching for a term for definition-like things that I feel comfortable with. I have used "definition," "defining description," "defining document," "declaration," "declaration document," "correct use specification" (CUSP), "correct use recommendation", "normative description", "agreement for use", "license to use", "deed", "statement of applicability", "recommendation for use", and many variations. The idea is almost the same as "declaration page" in Booth's article, except that here it is required to be RDF, and in the terms of Architecture of the WWW it is really more of an information-resource-essence than a "page".

Future work

[Not on this note, that is - work to be left until after the note is done.]


Appendix A: URI Resolution

A name may be associated with either of two kinds of document:

  1. the document that is the name's referent, if the name denotes a document
  2. a graph that is to be used to specify appropriate use of the name (usage spec), if any

The first of these is called simply "resolution" or "URI resolution," while I'll call the second "meta-resolution".

To resolve a name, a set of applicable asserted resolution rules is found (perhaps via query). Rules are meant to resolve names to their referents or names to their usage specs. Often this is done by replacing the name with another name: either a synonym, or, in the case of meta-resolution, a second name that denotes the first name's usage spec (a "usage-spec-name").

One standard resolution rule expresses the common treatment of # URIs: The URI's racine (the part before the #) is specified to be a usage-spec-name (a name for the name's usage spec).

The default (when no rule applies) is to attempt to dereference (or meta-dereference) the URI. This means using standard protocols (cf. IANA URI scheme registry) guided by the spelling of the name. Some URI schemes, such as ftp: and data:, only specify how to dereference, while others may give separate methods for dereference and meta-dereference. An important third case is that of the HTTP protocol, where the distinction has been overlaid on existing practices. (A protocol designed with the usage spec/denotation distinction in mind would have simply provided two different access methods for the two cases. You know who you are.) With HTTP you can't say ahead of time which document you're looking for; you have to use the single operation (GET) provided to retrieve one of the two, and the HTTP response code lets you check to see whether what you got is what you wanted [cite httpRange-14]:

(A document denoted by two names can both resolve and meta-resolve: one name dereferences to the document (200) while the other meta-resolves. The synonym relationship can be established using a resolution rule. -- enough to make you want to invent a new protocol that fixes this problem, huh?)

These two strategies failing, a search (manual or automated) might be mounted using a search engine or a plea sent to an individual or community that might know how to resolve the name. As this is likely to be a bit of work, any resolution information that turns up ought to be passed along to anyone receiving communication from you that uses the name.

Summary of resolution tactics:

Tactic/situation    To get usage-spec-name    To get usage spec       To get referent
                                              (meta-resolve)          (resolve)
1. resolution rules
   redirection      redirect rules, then      get usage-spec-name,    redirect rules, then
                    usage-spec-name rules     then resolve it         further resolve
   other kinds      TBD
2. dereference
   # URI            #-truncate                get usage-spec-name,    N/A
                                              then resolve it
   http:, https:    GET to 303                get usage-spec-name,    GET to 200
                                              then resolve it
   urn:lsid:        N/A                       getMetadata             getData
   other schemes    per protocol
3. cast a wide net

Summary of resolution rules

The purpose of resolution rules is essentially to deal with the "broken link" problem on the client side. They act as an insurance policy to protect against the situation where a document (including a usage spec) is available, but not by direct presentation to an HTTP client module. This can happen when content moves, when mirrored content is unavailable at its primary location, or when someone decided (against the advice of this document) to mint a non-HTTP URI.

A broken link on the "document web" leads to inconvenience to the human reader during navigation. Broken links are generally repaired quickly because the server operator is usually motivated to make the site content work well for visitors. The operator learns of a broken link either automatically through validation and error reporting, or through complaints lodged by readers.

With the expansion of the use of URIs from navigation to use in meaningful assertions, a broken link becomes a threat to any kind of interpretation of the page, and therefore jeopardizes the value of the document per se. At the same time, the demand for shared meaningful names will lead to the use of names whose accessibility is not highly reliable or durable.

The worry is not so much over content loss as over loss of opportunity: the failure to connect a name in use with information that will make it meaningful. It is essential therefore that uses of an unresolvable name be connected somehow to documents found in secondary locations. This must be done in a way that does not require involvement of the original publisher, who may be defunct or may simply not care.

A related purpose of resolution rules is to allow the use of RDF that is written using non-locators by "low-tech" client software that only understands HTTP. This problem reduces to the first, as we may treat challenging URIs such as tag: URIs as we would broken links.

Resolution rules are used simply by providing assertions giving the locations of usage specs and referent documents either specifically (one URI at a time) or generically (by URI string match and replacement). A producer of RDF includes in an RDF document a resolution rule for any names whose usage spec may be difficult for a consumer to find, and a consumer makes use of resolution rules using logic inserted at the point where any name is to be dereferenced.

We seek answers to the following questions:

  1. What URI may I use to access this document?
  2. What document is the usage spec for a given name?

Answers are written using the relations

  1. (document) tns:isDenotedBy (URI)
  2. (usage spec) tns:specifiesUsageFor (URI)

Trivial examples:

<http://www.w3.org/TR/rdf-concepts/> tns:isDenotedBy
   "http://www.w3.org/TR/rdf-concepts/"^^xsd:anyURI .
<http://www.w3.org/TR/rdf-schema/> tns:specifiesUsageFor
   "http://www.w3.org/TR/rdf-schema/type"^^xsd:anyURI .

The first just says that the document denoted by the name may be resolved by way of the URI that is the name. The second says that the /type URI usage is specified in the indicated graph; it doesn't say how that graph is to be found, which would require resolution.

Schematic rewrite rules permit the expression of rules that map terms to terms. There are two kinds of rewrite rules:

  1. Replacement for the purpose of further resolution
  2. Finding a usage-spec-name for the name (a name for the name's usage spec)

A rule is an instance of one of the above classes, with two string-valued properties giving the input pattern and output template for the rule. Deductions about new ways to find denotations and usage specs can be made by instantiating the pattern and template at a particular URI.

For example, the rule

[] a tns:RedirectRule;
  tns:hasPattern  "http://stale.example.com/{more}";
  tns:hasTemplate "http://current.example.com/{more}".

says that any URI matching the pattern denotes the same thing as the corresponding URI matching the template (assuming either denotes something), and permits the inference

<http://stale.example.com/bland.png>
  tns:isDenotedBy
  "http://current.example.com/bland.png"^^xsd:anyURI .

which justifies the use of HTTP with the 'current' URI to obtain the document (or whatever) named by the 'stale' URI.

As an example of name to usage-spec-name conversion,

[] a tns:MetaNameRule;
  tns:hasPattern  "{schemepath}#{frag}";
  tns:hasTemplate "{schemepath}".

permits inference (assuming the URI denotes anything at all) of

<http://example.com/hashola>
  tns:specifiesUsageFor
  "http://example.com/hashola#"^^xsd:anyURI .

which justifies the use of http://example.com/hashola as a name for http://example.com/hashola#'s usage spec. (This name can then be further resolved to get the usage spec itself.) [?77 Probably inaccurate syntactically since the # might be in a query string, etc.] [?83 I think we need more powerful matching. Use whatever regexps POWDER uses.]
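
Instantiating a pattern and template at a particular URI might be implemented along the following lines. This is a minimal sketch, not a specification of rule semantics: the function names are invented here, and each {variable} is assumed to match any run of characters, as short as possible, which is a simplification of whatever matching language is eventually adopted.

```python
import re
from typing import Optional


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn a pattern such as 'http://stale.example.com/{more}' into a regex."""
    # re.split with a capturing group yields literal text and variable
    # names alternately: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))       # literal text, matched exactly
        else:
            pieces.append(f"(?P<{part}>.*?)")    # {name} becomes a named group
    return re.compile("".join(pieces))


def apply_rule(pattern: str, template: str, uri: str) -> Optional[str]:
    """Return the template instantiated at uri, or None if uri does not match."""
    match = compile_pattern(pattern).fullmatch(uri)
    if match is None:
        return None
    # Variables that the template does not mention are simply ignored.
    return template.format(**match.groupdict())


if __name__ == "__main__":
    print(apply_rule("http://stale.example.com/{more}",
                     "http://current.example.com/{more}",
                     "http://stale.example.com/bland.png"))
    print(apply_rule("{schemepath}#{frag}", "{schemepath}",
                     "http://example.com/hashola#"))
```

Applied to the two example rules, the sketch yields http://current.example.com/bland.png and http://example.com/hashola respectively. It splits at the first # in the URI, so it does not address the concern raised in [?77].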

[Figure: a diagram consisting of shapes and arcs. If you read the legend you'll probably be able to understand what it's about.]
Figure legend:


Acknowledgments

Special thanks to David Booth for help with document organization and technical issues.

The following people commented on drafts: (reverse chronologically) Michel Dumontier, Tim Berners-Lee, Dan Corwin, David Booth, Alan Bawden, Sankar Virdhagriswaran, Gerald Jay Sussman, Jake Beal, Eric Prud'hommeaux, Bijan Parsia, Mark Tobenken, Chimezie Ogbuji, Kaitlin Thaney. Thank you.


TBD

Not everything on this list will get done. Vote early and often.

The numbering will remain stable across drafts, even as the issue list gets reorganized.

Triage: Before HCLS F2F 2007-11-09

Triage: Before W3C authors' draft publication

Triage: Before publication as HCLS-blessed IG Note

Triage: Nice to have

Triage: DONE