One of the more significant problems in producing standard metadata revolves around domain specificity — the more precisely your metadata describes the resources in your library, domain, or system, the less likely it is that another system, even one in a similar domain, will be able to understand it. Most if not all organizations and systems that produce and maintain metadata work very hard at balancing metadata quantity and quality against the expense of producing it. One way to do this is to produce less, but more precise, metadata. Another is to produce relatively imprecise, crappy metadata, usually with some form of automated generator.
So how do you create a standard methodology that supports both approaches? A dozen or so years ago, Dublin Core provided a particularly elegant solution: a ‘simple’ set of ‘core’ metadata elements that could be ‘qualified’ by a second set of elements — extensions of the simple 15 that could still be understood by systems that didn’t want to mess with the extensions, using the unfortunately named DC principle of ‘dumb-down’.
This two-layer standard makes it possible for systems that can’t handle the more complex qualified DC to continue to interoperate with systems that use the richer extensions. It also lets catalogers indulge their understandable desire for perfection and completeness while allowing relatively simple-minded systems, like search engines, to use their data with minimal conversion. Although such conversions inevitably produce less precise metadata, the semantic integrity of the metadata is maintained, because the people designing the standard thought hard about whether each qualified statement would retain its semantics in a less semantically rich (dumber) environment. It also makes it easy to convert domain-specific simple DC to qualified DC by applying a few simple rules, meaning that systems that require qualified DC can still make use of simple DC.
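As a rough sketch of what dumb-down looks like in practice, here is a minimal Python illustration. The mapping table is a small, illustrative subset of the real DC refinement hierarchy, and the function name is my own invention:

```python
# Sketch of the 'dumb-down' principle: each qualified Dublin Core
# refinement maps back to the simple element it refines, so a consumer
# that only understands the simple 15 can still use the record.

# Illustrative (not exhaustive) mapping of refinements to parent elements.
DUMB_DOWN = {
    "dcterms:created":     "dc:date",
    "dcterms:issued":      "dc:date",
    "dcterms:abstract":    "dc:description",
    "dcterms:alternative": "dc:title",
}

def dumb_down(record):
    """Rewrite qualified properties as their simple-DC parent elements."""
    simple = {}
    for prop, values in record.items():
        parent = DUMB_DOWN.get(prop, prop)  # unqualified terms pass through
        simple.setdefault(parent, []).extend(values)
    return simple

qualified = {
    "dc:title":         ["Metadata and Interoperability"],
    "dcterms:issued":   ["2009-04-01"],
    "dcterms:abstract": ["On balancing precision against reuse."],
}
print(dumb_down(qualified))
```

The conversion loses precision (an issue date becomes just a date) but, as described above, keeps the statement semantically true.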
So how does this relate to RDA? Well, every community trying to define an RDF-based standard has to deal with legacy data and systems that have traditionally stored property values as literals (usually strings), even when identifying the property value as a resource may be far more desirable in RDF. One of the more important considerations is whether the ‘range’ of an RDF statement (a single triple) — what an RDF reasoner may logically infer the object of the statement to be — will be a ‘resource’ (always a URI), a ‘literal’ (never a URI), or no defined range at all. It’s an important distinction because many systems, when trying to build a complete graph from a set of RDF statements, will attempt to ‘look up’ (dereference) URIs, and won’t do that if the URI is the object of a statement whose range is defined as a literal. The converse problem arises when a system tries to dereference a literal because the range definition forces it to infer that the object is a URI.
One solution to the problem is to not define a range at all, or to define multiple ranges. This throws any system trying to understand how to process a triple a mild curve — the answer to whether the object of a statement is a URI has to be ‘maybe it is, maybe it isn’t’ rather than a simple yes or no. This means the system will be forced to test the object to determine the answer, or simply not care — different reasoners take different approaches.
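A consumer facing that ‘maybe it is, maybe it isn’t’ answer might decide whether to dereference along these lines — a minimal Python sketch, with hypothetical property names and a deliberately naive URI test:

```python
# Sketch: deciding whether to dereference the object of a triple,
# given the property's declared range (if any). The ex: properties
# and the prefix-based URI test are illustrative assumptions.

RANGES = {
    "ex:homepage": "resource",  # object is always a URI
    "ex:note":     "literal",   # object is never a URI
    # "ex:creator" deliberately has no declared range
}

def looks_like_uri(value):
    """Naive syntactic test; real systems may be more thorough."""
    return value.startswith(("http://", "https://", "urn:"))

def should_dereference(prop, value):
    declared = RANGES.get(prop)
    if declared == "resource":
        return True
    if declared == "literal":
        return False  # even a URI-shaped string stays a literal
    # No declared range: 'maybe it is, maybe it isn't' -- test the value.
    return looks_like_uri(value)

print(should_dereference("ex:homepage", "http://example.org/"))  # True
print(should_dereference("ex:note", "http://example.org/"))      # False
print(should_dereference("ex:creator", "Jane Smith"))            # False
```

Note that the undeclared-range case pushes a per-value test onto the consumer, which is exactly the extra burden the next paragraph describes.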
But what happens to interoperability when every producer of metadata is free to put anything they want in the object of that statement? It gets much tougher. It’s still possible, but, as with simple DC, making life simpler for producers of the statements puts more burden on the consumers.
RDF provides a relatively painless way out of this: a property can be defined as a sub-property of another property, in much the same way that a simple ‘dumb’ DC element definition provides the ‘semantic floor’ for the more complex qualified DC statements that are sub-properties of it. So, to solve the ‘Is the object a resource or a literal?’ problem, might it work to define the base property with no range (or multiple ranges) and then define two sub-properties of it, one with a resource range and one with a literal range? A reasoner would always be able to infer that data recorded with either sub-property is the semantic equivalent of the base property, while each system producing the metadata could maintain its data in whatever domain-specific way it needed — literals or resources.
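Here is a rough Python sketch of that pattern, with all property names invented for illustration: a base ex:creator property with no declared range, two sub-properties with literal and resource ranges, and a one-rule reasoner applying the RDFS sub-property entailment (rule rdfs7):

```python
# Sketch of the proposed pattern. ex:creator has no range; the two
# sub-properties carry the range commitment. The toy reasoner applies
# RDFS entailment rule rdfs7: (s p o) + (p subPropertyOf q) => (s q o).
# All ex: names are hypothetical.

SUB_PROPERTY_OF = {
    "ex:creatorLit": "ex:creator",  # range: literal (a name string)
    "ex:creatorRes": "ex:creator",  # range: resource (a URI)
}

def infer_base_triples(triples):
    """Return the input triples plus the entailed base-property triples."""
    inferred = set(triples)
    for s, p, o in triples:
        base = SUB_PROPERTY_OF.get(p)
        if base:
            inferred.add((s, base, o))
    return inferred

data = {
    ("ex:book1", "ex:creatorLit", "Jane Smith"),
    ("ex:book2", "ex:creatorRes", "http://example.org/people/jsmith"),
}
for triple in sorted(infer_base_triples(data)):
    print(triple)
```

Either producer’s data ends up queryable as ex:creator, while each sub-property keeps an unambiguous yes/no answer to the ‘is the object a URI?’ question.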
I’ve watched from the sidelines for years while the DCMI community chewed on this. It’s an especially significant problem for them since they’re not in a position to actually redefine the standard — any decision they make that doesn’t involve starting over means that somebody’s system breaks. This is one reason why, in extending the specificity of domains and ranges to the original 15 elements, DCMI chose to copy the original 15 into the DCTERMS namespace — giving implementers the option of more precisely specified elements without breaking the applications of those already using the Simple DC set without ranges. They didn’t have the luxury of adding yet another layer to the standard. The RDA effort doesn’t appear to be handicapped in this particular way.
Please note that I’m a person who has spent a significant amount of professional time moving data from system to system, and no time generating the metadata in the first place.