I need to identify people and organizations in different roles

Overview


Connecting people and organizations to metadata is critical for many discovery, use, and understanding use cases. ISO 19115 supports people and organizations in many roles. ISO 19115-1 improves on this by cleanly encapsulating people and organizations independently of their roles, which improves re-use of information about them. The CI_ResponsibleParty object describes people and organizations that are related to a resource, along with their roles. It is used throughout the ISO Standards to describe and provide contact information for people and organizations. This page reviews the structure and usage options for the CI_ResponsibleParty object.

 

Recommendations for ISO 19115 and 19115-1


The ISO metadata standards support identification of individuals and organizations in many roles. Existing NASA metadata standards include mechanisms for identifying individuals and organizations in several important roles.

Metadata Authors - Identifying the authors or points of contact for the metadata content is important so that users who discover errors in the metadata know whom to contact. These metadata contacts are included in the contact for the base metadata object (gmd:MD_Metadata, gmi:MI_Metadata, mdb:MD_Metadata, or mdb:MI_Metadata).

ISO 19115/*/gmd:contact/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode='pointOfContact']
ISO 19115-1/*/mdb:contact/cit:CI_Responsibility[cit:role/cit:CI_RoleCode='pointOfContact']


Technical Contacts - Technical contacts are individuals or organizations that can respond to technical or scientific questions that users have about resources. These contacts should be included in both the identification and distribution sections of the metadata.  

ISO 19115/*/gmd:identificationInfo/*/gmd:pointOfContact/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode='pointOfContact']
ISO 19115-1/*/mdb:identificationInfo/*/mri:pointOfContact/cit:CI_Responsibility[cit:role/cit:CI_RoleCode='pointOfContact']

and

ISO 19115/*/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode='pointOfContact']
ISO 19115-1/*/mdb:distributionInfo/mrd:MD_Distribution/mrd:distributor/mrd:MD_Distributor/mrd:distributorContact/cit:CI_Responsibility[cit:role/cit:CI_RoleCode='pointOfContact']

 

Investigators - Investigators are members of the science team that should be included in the citation for the resource.

ISO 19115/*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty[gmd:role/gmd:CI_RoleCode='originator' or gmd:role/gmd:CI_RoleCode='principalInvestigator']
ISO 19115-1/*/mdb:identificationInfo/*/mri:citation/cit:CI_Citation/cit:citedResponsibleParty/cit:CI_Responsibility[cit:role/cit:CI_RoleCode='originator' or cit:role/cit:CI_RoleCode='principalInvestigator']
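
As a rough illustration of the investigator selection above, the following Python sketch applies the ISO 19115 path to a small record. The sample record, the names in it, and the placeholder codeList value "#CI_RoleCode" are made up for this example. The standard-library ElementTree module supports only a subset of XPath, so the role test is done in Python rather than in a predicate.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:citedResponsibleParty>
            <gmd:CI_ResponsibleParty>
              <gmd:individualName>
                <gco:CharacterString>A. Scientist</gco:CharacterString>
              </gmd:individualName>
              <gmd:role>
                <gmd:CI_RoleCode codeList="#CI_RoleCode" codeListValue="principalInvestigator">principalInvestigator</gmd:CI_RoleCode>
              </gmd:role>
            </gmd:CI_ResponsibleParty>
          </gmd:citedResponsibleParty>
          <gmd:citedResponsibleParty>
            <gmd:CI_ResponsibleParty>
              <gmd:organisationName>
                <gco:CharacterString>Example Press</gco:CharacterString>
              </gmd:organisationName>
              <gmd:role>
                <gmd:CI_RoleCode codeList="#CI_RoleCode" codeListValue="publisher">publisher</gmd:CI_RoleCode>
              </gmd:role>
            </gmd:CI_ResponsibleParty>
          </gmd:citedResponsibleParty>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""

INVESTIGATOR_ROLES = {"originator", "principalInvestigator"}


def investigators(root):
    """Return (name, role) pairs for cited parties in an investigator role."""
    # ElementTree paths are relative to the root element, so the leading /*/
    # of the XPath is implied by starting at root.
    path = ("gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/"
            "gmd:citedResponsibleParty/gmd:CI_ResponsibleParty")
    found = []
    for party in root.findall(path, NS):
        role = party.findtext("gmd:role/gmd:CI_RoleCode", default="", namespaces=NS).strip()
        if role in INVESTIGATOR_ROLES:
            name = (party.findtext("gmd:individualName/gco:CharacterString", namespaces=NS)
                    or party.findtext("gmd:organisationName/gco:CharacterString", namespaces=NS))
            found.append((name, role))
    return found


print(investigators(ET.fromstring(SAMPLE)))
```

An XPath 1.0 engine that supports full predicates could evaluate the expression directly. Filtering in Python is only a workaround for the standard library's limited path syntax.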

Conceptual Model (UML)


The simple UML for the object is shown here. It includes one required element and four optional elements each of which can occur once.  The CI_Contact includes information about physical and electronic addresses as well as a CI_OnlineResource as part of the contact information. 

ISO Citations can include any number of organizations or people (citedResponsibleParties), each with one of the following roles: resourceProvider, custodian, owner, user, distributor, originator, pointOfContact, principalInvestigator, processor, publisher, or author (see Figure). For example, the principal citation for a metadata record, in the MD_Identification section, can include an author, a publisher, and any number of principal investigators. This is very different from the FGDC approach, where the idinfo section has a citation that can include, but not differentiate roles for, many originators and a single point of contact with no clear role definition.

Roles

CI_RoleCode

ISO 19115

+ resourceProvider
+ custodian
+ owner
+ user
+ distributor
+ originator
+ pointOfContact
+ principalInvestigator
+ processor
+ publisher
+ author

Added in ISO 19115-1

+ sponsor
+ coAuthor
+ collaborator
+ editor
+ mediator
+ rightsHolder
+ contributor
+ funder
+ stakeholder
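
A minimal sketch of how the two role lists above might be used to check role values before writing metadata. The set names and the is_known_role helper are hypothetical and not part of any ISO tooling. The values are copied from the lists on this page.

```python
# Role codes listed on this page for ISO 19115.
ISO_19115_ROLES = frozenset({
    "resourceProvider", "custodian", "owner", "user", "distributor",
    "originator", "pointOfContact", "principalInvestigator", "processor",
    "publisher", "author",
})

# Role codes listed on this page as added in ISO 19115-1.
ISO_19115_1_ADDED_ROLES = frozenset({
    "sponsor", "coAuthor", "collaborator", "editor", "mediator",
    "rightsHolder", "contributor", "funder", "stakeholder",
})


def is_known_role(code, standard="19115"):
    """Return True if code appears in the role list for the given standard."""
    if standard == "19115":
        return code in ISO_19115_ROLES
    if standard == "19115-1":
        return code in ISO_19115_ROLES | ISO_19115_1_ADDED_ROLES
    raise ValueError(f"unknown standard: {standard!r}")


print(is_known_role("funder"))
print(is_known_role("funder", standard="19115-1"))
```

The comparison is case-sensitive on purpose: code list values such as principalInvestigator are exact tokens, not free text.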

Implementation (XML)


 The ISO dialect combines people and organizations into the CI_ResponsibleParty object, a flexible structure that supports many combinations of organizations and people. Most objects that include associated responsible parties can have any number, so, for example, a citation can have people identified in any or all of the roles listed in the CI_RoleCode code list.

The structure of the CI_ResponsibleParty is:

<gmd:CI_ResponsibleParty>
  <gmd:individualName/>
  <gmd:organisationName/>
  <gmd:positionName/>
  <gmd:contactInfo>
    <gmd:CI_Contact>
      <gmd:phone/>
      <gmd:address>
        <gmd:CI_Address>
          <gmd:deliveryPoint/>
          <gmd:city/>
          <gmd:administrativeArea/>
          <gmd:postalCode/>
          <gmd:country/>
          <gmd:electronicMailAddress/>
        </gmd:CI_Address>
      </gmd:address>
      <gmd:onlineResource/>
      <gmd:hoursOfService/>
      <gmd:contactInstructions/>
    </gmd:CI_Contact>
  </gmd:contactInfo>
  <gmd:role/>
</gmd:CI_ResponsibleParty>
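
The structure above can be generated programmatically. The sketch below builds a small CI_ResponsibleParty with Python's standard ElementTree module. The helper names, the example organization, and the placeholder codeList value "#CI_RoleCode" are assumptions for illustration. It emits only the name elements and the role, in the same order as the structure shown above, and wraps text in gco:CharacterString as ISO 19139 XML does.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def text_element(parent, name, value):
    """Append <gmd:name><gco:CharacterString>value</...></...> to parent."""
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{name}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = value
    return wrapper


def responsible_party(role, individual=None, organisation=None, position=None):
    """Build a CI_ResponsibleParty holding optional names and a role."""
    party = ET.Element(f"{{{GMD}}}CI_ResponsibleParty")
    # Keep the same element order as the structure listing.
    for name, value in (("individualName", individual),
                        ("organisationName", organisation),
                        ("positionName", position)):
        if value:
            text_element(party, name, value)
    role_el = ET.SubElement(party, f"{{{GMD}}}role")
    # "#CI_RoleCode" is a placeholder; real records point at a code list catalog.
    code = ET.SubElement(role_el, f"{{{GMD}}}CI_RoleCode",
                         codeList="#CI_RoleCode", codeListValue=role)
    code.text = role
    return party


party = responsible_party("pointOfContact", organisation="Example Data Center")
print(ET.tostring(party, encoding="unicode"))
```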

Usage


Where are ResponsibleParty objects?  People can be connected to ISO metadata records in eight places, and in each CI_Citation. Those locations are shown in this Figure. In some cases the roles of the people are determined by where they are in the standard. In other cases, they are determined by the role code. See the Roles-by-Position vs. Roles-by-Code discussion below for more information.

 

Usage    Description and XPath

Citation


The ISO CI_Citation object is used to refer to a variety of resources that are not included in a metadata record. It is modeled after a bibliographic reference and can include any number of organizations or people (responsibleParties) in any roles. Typically a CI_Citation includes originators or authors and a publisher.

//gmd:CI_Citation/gmd:citedResponsibleParty

Metadata Contact

The metadataContact is a person that creates and manages metadata for resources and services. This person generally has expertise in documentation standards and has enough experience and understanding of the resource to document it in partnership with the originator or resource contact. This responsibleParty generally has role = "custodian" or "pointOfContact".

/gmi:MI_Metadata/gmd:contact

Resource Contact


The CI_ResponsibleParty in MD_Identification objects identifies the pointOfContact for the resource, defined as "identification of, and means of communication with, person(s) and organization(s) associated with the resource(s)".

In many cases this person or organization is the Data Manager or the Data Center that preserves the data. These people serve as contacts when the originator of the dataset is no longer available or interested in dealing with questions about the dataset. This person has scientific expertise or experience but may not be a good source for information on data access or data order processing. This responsibleParty generally has role = "pointOfContact".

/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact

User Contact


The CI_ResponsibleParty in MD_Usage objects identifies people that use the data. This CI_ResponsibleParty generally has the role = "pointOfContact".

/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_Identification/gmd:resourceSpecificUsage/gmd:MD_Usage/gmd:userContactInfo

Processor


The CI_ResponsibleParty in LI_ProcessStep objects (or the LE_ProcessStep extension) identifies people that are responsible for processing the data. This CI_ResponsibleParty generally has role = "processor".

/gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmd:LI_ProcessStep/gmd:processor

Resource or Metadata Maintenance Contact


The CI_ResponsibleParty in MD_MaintenanceInformation objects identifies the people that are responsible for maintaining the resource or the metadata.

/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_Identification/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:contact
or
/gmi:MI_Metadata/gmd:metadataMaintenance/gmd:MD_MaintenanceInformation/gmd:contact

Distributor


The CI_ResponsibleParty in MD_Distributor objects identifies the people that manage orders and data access at a Data Center. These people have expertise in data access systems but may not be a good source for more scientific information on the resource. This CI_ResponsibleParty generally has role = "distributor".

/gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact

Extension Contact


The CI_ResponsibleParty in MD_ExtendedElementInformation objects identifies people that are responsible for creating and maintaining community-specific extensions to the standard. This CI_ResponsibleParty generally has role = "pointOfContact".

/gmi:MI_Metadata/gmd:metadataExtensionInfo/gmd:MD_MetadataExtensionInformation/gmd:extendedElementInformation/gmd:MD_ExtendedElementInformation/gmd:source

 

 

Notes


CodeLists

Codelists are shared vocabularies used throughout the ISO Standards to provide a (usually small) set of choices for the value of an element. In many cases they provide a standard set of tags that can be used for classifying an object. They can be identified in the UML because their types end with “Code”, e.g. CI_RoleCode.

All codeLists share codeList and codeListValue attributes that give the location of the codeList and the value from the codeList being used in a particular case. Multiple codelists can be stored in a single codeListCatalog (see http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/gmxCodelists.xml for an example), so the location usually includes a URL and an anchor for the specific codeList. The codeList values are given in the attribute and as the value of the codeList element: <ns:codeListName codeList="URL#codeListName" codeListValue="value">value</ns:codeListName>
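
The codeList pattern described above can be sketched in Python as follows. The function names are hypothetical. The catalog URL is the one quoted above, and the anchor is assumed to be the code list name, as in the pattern shown.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
# Catalog location quoted in the text above.
CODELIST_CATALOG = ("http://standards.iso.org/ittf/PubliclyAvailableStandards/"
                    "ISO_19139_Schemas/resources/codelist/gmxCodelists.xml")


def codelist_element(name, value, catalog=CODELIST_CATALOG):
    """Build a code list element carrying the value in attribute and text."""
    element = ET.Element(f"{{{GMD}}}{name}", {
        "codeList": f"{catalog}#{name}",  # catalog URL plus anchor
        "codeListValue": value,
    })
    element.text = value
    return element


def split_codelist(element):
    """Return (catalog URL, anchor, value) from a code list element."""
    catalog, _, anchor = element.get("codeList", "").partition("#")
    return catalog, anchor, element.get("codeListValue")


role = codelist_element("CI_RoleCode", "author")
print(split_codelist(role))
```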

See CodeLists for a list of all ISO CodeLists. 

Roles-by-Position vs. Roles-by-Code

People can play many different roles in the life-cycle of scientific datasets. There are two ways that those roles can be reflected in a metadata structure: by position and by code. Many people are familiar with the roles by position approach because that is the approach used in the FGDC CSDGM. The person referenced from the metadata section is the metadata contact, the person referenced from the distribution section is the distributor, and so on. Using this approach means that the object that holds information about people does not need any role indicator. That information is supplied by the position of the person in the structure.

The ISO Standards combine the roles-by-position approach with the roles-by-code approach. Roles can generally be inferred from the positions of CI_ResponsibleParty objects in the structure, but flexibility is increased by adding a code for role to each object. This is helpful when citing a dataset that involves people in multiple roles (principal investigator, publisher, author, resourceProvider) or when specifying the point of contact.

The roles-by-code approach allows the roles of the people involved with a dataset to be known even when they are accessed separately from their position in the structure. For example, a specific xPath can be used if one is interested in the metadata contact for a resource (/gmi:MI_Metadata/gmd:contact), but a general xPath (//gmd:CI_ResponsibleParty) can be used to answer the general question “what people or organizations are associated with this dataset”. In the latter case, the role code provides information about roles even though the people are being accessed independently.
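
The difference between the two lookups can be sketched in Python. The sample record and the parties_with_roles helper are hypothetical. The general query collects every CI_ResponsibleParty and reports both the element it sits under (its position) and its role code.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Hypothetical, abbreviated record: one metadata contact, one resource contact.
SAMPLE = """
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:role><gmd:CI_RoleCode>custodian</gmd:CI_RoleCode></gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:role><gmd:CI_RoleCode>pointOfContact</gmd:CI_RoleCode></gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmi:MI_Metadata>
"""


def parties_with_roles(root):
    """Return (position, role code) for every CI_ResponsibleParty in root."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for party in root.iterfind(".//gmd:CI_ResponsibleParty", NS):
        position = parent_of[party].tag.split("}")[-1]
        role = party.findtext("gmd:role/gmd:CI_RoleCode", default="", namespaces=NS)
        results.append((position, role))
    return results


root = ET.fromstring(SAMPLE)
# Specific lookup: the role is implied by position (the metadata contact).
print(len(root.findall("gmd:contact/gmd:CI_ResponsibleParty", NS)))
# General lookup: the role code travels with each party.
print(parties_with_roles(root))
```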

Multiple CI_ResponsibleParties can be included in almost all ISO objects that can include CI_ResponsibleParties. In those cases, roleCodes can be used to associate appropriate roles with particular people if necessary. For example, the ISO CI_Citation object is used to refer to a variety of resources that are not included in a metadata record. It is modeled after a bibliographic reference and can include any number of organizations or people (CI_ResponsibleParties) in any roles. Typically a CI_Citation includes originators or authors and a publisher. 

Schema vs. Schematron

The only required element in the CI_ResponsibleParty object is the role. As in the case of the CI_OnlineResource, a CI_ResponsibleParty with only the required field(s) is not very useful. In this case, however, no reasonable solution can be achieved by requiring individualName or organisationName or positionName individually. The solution is to constrain the object by requiring that the count of individualName + organisationName + positionName be greater than or equal to one. In other words, at least one of these three elements must exist.

There are two techniques that can be used to test the “validity” of ISO metadata in XML. The first is to use the XML schema which defines the structure and types of the elements and the number of times they can occur. The schema rules are expressed as the cardinality in the UML descriptions used in this wiki. The CI_ResponsibleParty constraint described above cannot be specified in an XML schema document and so cannot be tested using simple schema validation. Instead, a tool called Schematron can be used to test constraints or business rules that are included in the UML. Many times these rules involve multiple elements, as in the CI_ResponsibleParty case. In some cases an organization can specify several sets of schematron rules to test conformance at different levels.
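
The name constraint can be expressed as a simple count. The following is a Python stand-in for that rule, not Schematron itself. The function name and sample fragments are hypothetical. It only illustrates the kind of multi-element check that schema validation alone cannot express.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}
NAME_ELEMENTS = ("individualName", "organisationName", "positionName")


def satisfies_name_constraint(party):
    """True if the party has at least one of the three name elements."""
    count = sum(len(party.findall(f"gmd:{name}", NS)) for name in NAME_ELEMENTS)
    return count >= 1


# Schema-valid but not useful: only the required role is present.
ROLE_ONLY = ET.fromstring("""
<gmd:CI_ResponsibleParty xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:role/>
</gmd:CI_ResponsibleParty>
""")

# Passes the constraint: a positionName is present alongside the role.
WITH_POSITION = ET.fromstring("""
<gmd:CI_ResponsibleParty xmlns:gmd="http://www.isotc211.org/2005/gmd"
                         xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:positionName>
    <gco:CharacterString>Data Manager</gco:CharacterString>
  </gmd:positionName>
  <gmd:role/>
</gmd:CI_ResponsibleParty>
""")

print(satisfies_name_constraint(ROLE_ONLY))
print(satisfies_name_constraint(WITH_POSITION))
```

This check counts elements only. A stricter rule could also require that the elements contain non-empty text.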

 

Crosswalks

This table reflects the MENDS Phase 3 voting results for 5.x items pertaining to the mapping of ECHO and ISO roles.

ISO    DIF    ECS    ECHO
pointOfContact (/*/gmd:contact)    DIF AUTHOR

DIF AUTHOR      TECHNICAL CONTACT

originator (/*/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty)  Data Originator
Producer
distributor (/*/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact) or
pointOfContact (xPath)
TECHNICAL CONTACT    User Services,
Distributor Archive

Data Center Contact Distributor
DATA CENTER CONTACT
ORNL DAAC User Services
GHRC USER SERVICES
User Services           Archive/Archiver

principalInvestigator (/gmi:MI_Metadata/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty) INVESTIGATOR

Investigator

Data Originator

Producer

 

Investigator  INVESTIGATOR

custodian (/gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact)  Data Manager
pointOfContact (/mdb:MD_Metadata/mdb:dataQualityInfo/mdq:DQ_DataQuality/mdq:report/*/mdq:evaluation/mdq:DQ_FullInspection/mdq:evaluationProcedure/cit:CI_Citation/cit:citedResponsibleParty) Quality Assessment 
pointOfContact (/*/mdb:acquisitionInformation/mac:MI_AcquisitionInformation/mac:instrument/mac:MI_Instrument/mac:citation/cit:CI_Citation/cit:citedResponsibleParty) Instrument 

xPath Note: The xPaths included in this table use several wildcards. // means any path, so //gmd:CI_ResponsibleParty indicates a gmd:CI_ResponsibleParty anywhere in an XML file. /*/ indicates a single level with several possible elements. This usually indicates one of several concrete realizations of an abstract object. For example, /*/gmd:identificationInfo could be gmd:MD_Metadata/gmd:identificationInfo or gmi:MI_Metadata/gmd:identificationInfo, and gmd:identificationInfo/*/gmd:descriptiveKeywords could be gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords or gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:descriptiveKeywords.
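
A short Python sketch of the two wildcards, using two made-up records with different root elements and different identification objects. In the standard ElementTree module, paths are evaluated relative to the root element, so the leading /*/ of the xPaths corresponds to starting at the root, and // corresponds to ".//".

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Hypothetical records: a dataset description and a service description.
DATASET = ET.fromstring("""
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:descriptiveKeywords/>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
""")

SERVICE = ET.fromstring("""
<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv">
  <gmd:identificationInfo>
    <srv:SV_ServiceIdentification>
      <gmd:descriptiveKeywords/>
      <gmd:descriptiveKeywords/>
    </srv:SV_ServiceIdentification>
  </gmd:identificationInfo>
</gmi:MI_Metadata>
""")


def keyword_blocks(root):
    """Single-level wildcard: any identification object under identificationInfo."""
    return root.findall("gmd:identificationInfo/*/gmd:descriptiveKeywords", NS)


def keyword_blocks_anywhere(root):
    """Any-depth wildcard: descriptiveKeywords anywhere below the root."""
    return root.findall(".//gmd:descriptiveKeywords", NS)


for record in (DATASET, SERVICE):
    print(len(keyword_blocks(record)), len(keyword_blocks_anywhere(record)))
```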

2 Comments

  1. Looked at this a little more. There are 3384 (out of 3389 total) collections with at least one Contact. There is one collection, NSIDC-0169 from NSIDCV0 that has 20!

    Total Roles values summary for any Contact:

    /Collection/Contacts/Contact/Role: (7443)

    DATA CENTER CONTACT : 1652
    Archive : 826
    archiving data center : 554
    internal data center : 568
    technical contact : 518
    GHRC USER SERVICES : 287
    DIF AUTHOR : 427
    Investigator : 214
    Producer : 325
    User Services : 238
    Data Manager : 182
    TECHNICAL CONTACT : 236
    INVESTIGATOR : 520
    DIF AUTHOR, TECHNICAL CONTACT : 5
    Data Originator : 89
    Technical Contact : 17
    author : 323
    investigator : 351
    Archiver : 76
    compiler : 2
    TECHNICAL CONTACT, DIF AUTHOR : 27
    Distributor : 2
    metadata author : 2
    INVESTIGATOR, TECHNICAL CONTACT : 2

     

    Some Role Counts:

    /Collection/Contacts/Contact/Role: (7443)

    /Collection/Contacts/Contact[1]/Role: (3384)

    /Collection/Contacts/Contact[2]/Role: (1218)

    /Collection/Contacts/Contact[3]/Role: (816)

    /Collection/Contacts/Contact[4]/Role: (562)

    /Collection/Contacts/Contact[5]/Role: (414)

    /Collection/Contacts/Contact[6]/Role: (330)

    /Collection/Contacts/Contact[7]/Role: (162)

    /Collection/Contacts/Contact[8]/Role: (141)

    /Collection/Contacts/Contact[9]/Role: (108) (All Collections from NSIDCV0)

    /Collection/Contacts/Contact[10]/Role: (106)

    /Collection/Contacts/Contact[11]/Role: (55)

    /Collection/Contacts/Contact[12]/Role: (54)

    /Collection/Contacts/Contact[13]/Role: (43)

    ... (and so on)

     

    1. These data are more complete than the ones I presented during MENDS. They do clearly identify the consistency problems we noted. For example, is it technical contact or TECHNICAL CONTACT or Technical Contact? Or, what is the difference between Archive (826) and Archiver (76)? These kinds of inconsistencies are a problem for facet searches and should probably be cleaned up in ECHO.