Friday, November 21, 2008

XML Namespaces

When I was first learning XML namespaces, one of the things that confused me most was the use of URLs (e.g. xmlns:sca="http://docs.oasis-open.org/ns/opencsa/sca/200712"). For some reason, I was convinced that this is how an XML document pointed to its XML schema for validation. It took me a while to realize that the fact the namespace is [usually] a URL makes NO DIFFERENCE whatsoever to the XML document. "anyStringWhatsoever" would work just as well, so long as it is universally unique. In fact, the concepts of XML namespaces and XML Schema aren't even defined by the same spec (http://www.w3.org/TR/xml-names/ vs http://www.w3.org/TR/xmlschema-1/).
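To make the point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names (composite, component) are just illustrative choices, not taken from any particular schema. The parser treats the namespace name as an opaque string and never tries to fetch it.

```python
import xml.etree.ElementTree as ET

# The namespace name here is deliberately not a URL.
doc = ('<sca:composite xmlns:sca="anyStringWhatsoever">'
       '<sca:component/></sca:composite>')
root = ET.fromstring(doc)

# ElementTree reports tags in "{namespace}localname" form.
print(root.tag)  # {anyStringWhatsoever}composite

# Swapping in a URL changes only the string; parsing works the same way.
url_ns = "http://docs.oasis-open.org/ns/opencsa/sca/200712"
root2 = ET.fromstring(doc.replace("anyStringWhatsoever", url_ns))
print(root2.tag)
```

Either way, no network request is made and no schema is consulted.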

So how do XML instance documents identify their XML Schema?
From www.w3.org/TR/xmlschema-1:

Schema Representation Constraint: Schema Document Location Strategy
Given a namespace name (or none) and (optionally) a URI reference from xsi:schemaLocation or xsi:noNamespaceSchemaLocation, schema-aware processors may implement any combination of the following strategies, in any order:
1 Do nothing, for instance because a schema containing components for the given namespace name is already known to be available, or because it is known in advance that no efforts to locate schema documents will be successful (for example in embedded systems);
2 Based on the location URI, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
3 Based on the namespace name, identify an existing schema document, either as a resource which is an XML document or a <schema> element information item, in some local schema repository;
4 Attempt to resolve the location URI, to locate a resource on the web which is or contains or references a <schema> element;
5 Attempt to resolve the namespace name to locate such a resource.
Whenever possible configuration and/or invocation options for selecting and/or ordering the implemented strategies should be provided.
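As a rough illustration of the strategies quoted above, the sketch below reads the xsi:schemaLocation hints from a document and then tries strategies 2 and 3 against a local repository. The function names, the sca-core.xsd file name, and the dict standing in for a "local schema repository" are all hypothetical; a real schema-aware processor is free to do this differently.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_location_hints(xml_text):
    """Return {namespace name: location URI} from xsi:schemaLocation on the root."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # The attribute value is a whitespace-separated list of
    # (namespace name, location URI) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))

def locate_schema(namespace, hints, local_repo):
    """Try strategy 2, then strategy 3 (this order is just one possible choice)."""
    location = hints.get(namespace)
    if location in local_repo:   # strategy 2: local lookup by location URI
        return local_repo[location]
    if namespace in local_repo:  # strategy 3: local lookup by namespace name
        return local_repo[namespace]
    return None                  # strategy 1: do nothing (4 and 5 need the network)

SCA_NS = "http://docs.oasis-open.org/ns/opencsa/sca/200712"
doc = """<composite xmlns="http://docs.oasis-open.org/ns/opencsa/sca/200712"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://docs.oasis-open.org/ns/opencsa/sca/200712 sca-core.xsd"/>"""

hints = schema_location_hints(doc)
local_repo = {"sca-core.xsd": "schemas/sca-core.xsd"}  # hypothetical local repository
print(locate_schema(SCA_NS, hints, local_repo))
```

Note that the namespace name only comes into play as a lookup key; the location hint travels in a separate attribute.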


A couple of observations. First, why do they use an ordered list here if the order is not supposed to matter? Second, at least my preconceived notion of the link between namespaces and schemas made the list (coming in at #5).

So why is it that just about every namespace in use is defined by a URL?

http://www.w3.org/TR/uri-clarification/

The Namespaces in XML spec actually defines the namespace name to be a "Uniform Resource Identifier" (URI) reference. URIs are traditionally divided into 2 types: Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). The reason most (certainly not all) namespaces use URLs is that they are dereferenceable (i.e. you can go there). Basically, a dereferenceable URI gives us 2 main benefits:
1) Creating universally unique identifiers is hard...especially ones that have a shot of being meaningful/informative to human readers. The web's address system (IP, DNS, and whatnot) is a scalable, performant, and proven solution in this space.
2) Providing a URL gives the document reader a place (docs.oasis-open.org/ns/opencsa/sca/200712) and method (http) to check for more information...or even look for a schema (e.g. #5 above).
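A small sketch of the difference, using only Python's standard library: a random UUID expressed as a URN is unique but gives the reader nothing to go on, while the URL carries both a method (the scheme) and a place (the host). The UUID here is generated on the spot purely for illustration.

```python
import uuid
from urllib.parse import urlsplit

url_ns = "http://docs.oasis-open.org/ns/opencsa/sca/200712"
urn_ns = uuid.uuid4().urn  # a random "urn:uuid:..." string, for illustration only

for ns in (url_ns, urn_ns):
    parts = urlsplit(ns)
    # A URL has a host to contact; a URN does not.
    print(parts.scheme, parts.netloc or "(no host to contact)")
```

Both strings are equally valid as namespace names; only the URL hints at where to look next.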

For this reason, if you are defining your own namespace, please use a URL, preferably in a domain you control, and pretty please put a page up at that location to describe the namespace (links or text describing what it is and what it's intended to be used for). Dumb users like me will probably go there to find the schema.

http://www.xml.com/pub/a/2005/04/13/namespace-uris.html
