Metadata Compatibility Score

What is metadata? Metadata is information that describes data. In the Discovery Portal, metadata is used to describe each resource, including its source, the health condition(s) it focuses on, the host species from which it was derived, the variables measured within it, etc.

What is the metadata compatibility score?

The metadata compatibility score is a quantitative measure of how completely the metadata associated with a resource has been filled in. When contributors upload data or tools to a source repository, how thoroughly they complete the metadata fields varies. The compatibility score evaluates whether the relevant and necessary metadata elements have been provided for a given resource, such as funding source, conditions of access, pathogens that are the focus of the resource, and the resource's creation date. A high compatibility score indicates that the resource's metadata is comprehensive, which makes the resource easier for users to discover and understand.

What can it tell me?

The metadata compatibility score indicates how thoroughly the metadata associated with a resource has been filled out. A higher score suggests a more comprehensive set of metadata elements, offering richer context for understanding the resource. A lower score may indicate gaps in information, which can make the data or tool harder to interpret and use correctly. Users can consult the score to judge whether the available metadata meets their needs before investing time in accessing or using the resource.

Why are there two tiers of metadata?

The Discovery Portal distinguishes between “Fundamental” metadata elements and “Recommended” metadata elements. Fundamental metadata are considered essential, and Recommended metadata are advisable for data/tool providers to include but not mandatory. The Fundamental tier sets a minimum standard for metadata compatibility. It establishes the core set of metadata elements that are deemed crucial for understanding and using the resource effectively. This system of tiers acknowledges that while certain metadata is indispensable, additional details may enhance the resource’s value.
[Figure: Metadata compatibility score visualization]

How is the metadata compatibility score calculated?

The metadata compatibility score is the number of Fundamental and Recommended metadata fields that are present for a resource. Users can hover over the metadata compatibility score badge to see a breakdown of the total number of fields present and how many of them are Fundamental vs. Recommended. The tooltip also indicates which metadata fields have been “augmented,” that is, fields that are not present in the original metadata record but have been generated by the Discovery Portal. Read about how the Discovery Portal augments metadata here.
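The counting described above can be sketched as follows. This is a minimal illustration, not the Discovery Portal's code: it assumes the score is a plain count of non-empty fields drawn from the Dataset Fundamental and Recommended lists given later on this page, that a record is a Python dict, and that `augmented` is a hypothetical set naming the fields the portal generated. The function and variable names are illustrative.

```python
# Hypothetical sketch: field lists copied from the Dataset tiers on this page.
DATASET_FUNDAMENTAL = [
    "name", "author", "includedInDataCatalog", "funding", "url",
    "description", "measurementTechnique", "distribution", "date",
]
DATASET_RECOMMENDED = [
    "infectiousAgent", "dateCreated", "citation", "doi", "healthCondition",
    "dateModified", "citedBy", "species", "conditionsOfAccess",
    "datePublished", "isBasedOn", "license", "variableMeasured",
    "topicCategory", "sdPublisher", "spatialCoverage", "keywords",
    "identifier", "temporalCoverage", "interactionStatistic", "usageInfo",
]


def is_present(value):
    """Treat None, empty strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, (str, list, dict, tuple, set)):
        return len(value) > 0
    return True


def compatibility_score(record, augmented=frozenset()):
    """Count present Fundamental and Recommended fields (assumed scoring rule).

    `augmented` is a hypothetical set of field names generated by the portal
    rather than supplied in the original source record.
    """
    fundamental = [f for f in DATASET_FUNDAMENTAL if is_present(record.get(f))]
    recommended = [f for f in DATASET_RECOMMENDED if is_present(record.get(f))]
    return {
        "score": len(fundamental) + len(recommended),
        "fundamental": len(fundamental),
        "recommended": len(recommended),
        # Present fields that the portal added, mirroring the tooltip.
        "augmented": sorted(f for f in fundamental + recommended if f in augmented),
    }


# Illustrative record; "description" is empty, so it does not count.
record = {
    "name": "Example dataset",
    "author": [{"name": "A. Researcher"}],
    "description": "",
    "url": "https://example.org/dataset",
    "species": ["Homo sapiens"],
    "infectiousAgent": [{"name": "Influenza A virus"}],
}
print(compatibility_score(record, augmented={"species"}))
# {'score': 5, 'fundamental': 3, 'recommended': 2, 'augmented': ['species']}
```

Under this assumption, a Dataset that fills every listed field would score 30 (9 Fundamental + 21 Recommended).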

What are the Fundamental metadata elements for resources and how are they used?

All Fundamental elements provide useful information that can be displayed on the Discovery Portal. The metadata fields differ by resource type.

The Fundamental metadata elements for a Dataset are:

  • name
  • author
  • includedInDataCatalog
  • funding
  • url
  • description
  • measurementTechnique
  • distribution
  • date (hierarchically derived from datePublished, dateModified, or dateCreated)
[Figure: Fundamental metadata elements]
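The Dataset `date` field is described as hierarchically derived from three source fields. The sketch below assumes that the order listed (datePublished, then dateModified, then dateCreated) is the order of precedence and that the first non-empty value wins; the page does not spell out the exact rule, and `derive_date` is a hypothetical name.

```python
# Assumed precedence: the order in which the fields are listed on this page.
DATE_PRECEDENCE = ("datePublished", "dateModified", "dateCreated")


def derive_date(record):
    """Return the first non-empty date field in precedence order, else None."""
    for field in DATE_PRECEDENCE:
        value = record.get(field)
        if value:
            return value
    return None


print(derive_date({"dateCreated": "2021-03-01", "dateModified": "2022-06-15"}))
# 2022-06-15
print(derive_date({"name": "No dates here"}))
# None
```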

The Fundamental metadata elements for a Computational Tool are:

  • date
  • includedInDataCatalog
  • funding
  • author
  • description
  • name
[Figure: Computational Tool highlighted with metadata elements]
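Because the Fundamental tier sets a minimum standard, a contributor might check a Computational Tool record against it before submission. The sketch below is a hypothetical helper, assuming a dict-based record and treating a field as missing when it is absent or empty; the field names come from the Computational Tool list above, while the function names and example record are illustrative.

```python
# Fundamental fields for a Computational Tool, as listed on this page.
TOOL_FUNDAMENTAL = (
    "date", "includedInDataCatalog", "funding", "author", "description", "name",
)


def missing_fundamental(record):
    """Return Fundamental fields that are absent or empty, in list order."""
    return [f for f in TOOL_FUNDAMENTAL if not record.get(f)]


def meets_minimum_standard(record):
    """True when every Fundamental field is present (assumed criterion)."""
    return not missing_fundamental(record)


tool = {
    "name": "example-aligner",
    "description": "A hypothetical sequence alignment tool.",
    "author": [{"name": "A. Developer"}],
    "date": "2023-01-10",
}
print(missing_fundamental(tool))     # ['includedInDataCatalog', 'funding']
print(meets_minimum_standard(tool))  # False
```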

Recommended metadata elements include both elements that are currently displayed in the Discovery Portal (in bold) and elements that the Discovery Portal aims to display in the future (in italics). The Recommended metadata elements for Datasets include:

  • infectiousAgent
  • dateCreated
  • citation
  • doi
  • healthCondition
  • dateModified
  • citedBy
  • species
  • conditionsOfAccess
  • datePublished
  • isBasedOn
  • license
  • variableMeasured
  • topicCategory
  • sdPublisher
  • spatialCoverage
  • keywords
  • identifier
  • temporalCoverage
  • interactionStatistic
  • usageInfo

The Recommended metadata elements for Computational Tools include:

  • citedBy
  • doi
  • topicCategory
  • codeRepository
  • programmingLanguage
  • applicationCategory
  • applicationSubCategory
  • softwareRequirements
  • softwareVersion
  • citation
  • conditionsOfAccess
  • dateModified
  • interactionStatistic
  • license
  • identifier
  • url

Where can I find the complete list of metadata elements and their definitions?

The schema for Datasets in the NIAID Data Ecosystem can be found in the Schema Registry of the Data Discovery Engine: https://discovery.biothings.io/ns/nde/nde:Dataset

Note that the metadata elements for a Dataset can differ from those of other resource types available in the NIAID Data Ecosystem, such as ResourceCatalogs (https://discovery.biothings.io/ns/nde/nde:ResourceCatalog) and ComputationalTools (https://discovery.biothings.io/ns/nde/nde:ComputationalTool).

How is the metadata compatibility score used to deliver search results?

The “best match” ranking of search results depends largely (>75%) on relevance as measured by the Elasticsearch score and only slightly (<25%) on the metadata compatibility score. As a result, Datasets with more compatible metadata get a slight “boost” in the Discovery Portal’s search rankings. First, the ratio of completed metadata fields to total metadata fields is calculated. This metadata compatibility ratio is then integrated into the Elasticsearch score, which Elasticsearch's scoring algorithm produces to measure how relevant each resource is to the search query. Applying the ratio to the Elasticsearch score slightly boosts resources with more compatible metadata during ranking.
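The boosting step can be illustrated with a small re-ranking sketch. The exact formula the Discovery Portal uses is not given here, so this is only one plausible reading: the compatibility ratio scales the Elasticsearch score multiplicatively, with a hypothetical weight of 0.2 chosen to stay under the stated <25% share. The resource IDs, scores, and field counts are made-up illustrative values.

```python
# Hypothetical weight; the portal's real weighting is not stated on this page.
COMPATIBILITY_WEIGHT = 0.2


def compatibility_ratio(completed_fields, total_fields):
    """Ratio of completed metadata fields to total metadata fields."""
    return completed_fields / total_fields if total_fields else 0.0


def boosted_score(es_score, ratio, weight=COMPATIBILITY_WEIGHT):
    """Scale the Elasticsearch score so compatibility can change it by at
    most `weight`; a ratio of 1.0 leaves the score unchanged."""
    return es_score * ((1 - weight) + weight * ratio)


# (resource id, Elasticsearch relevance score, completed fields, total fields)
results = [
    ("dataset-a", 10.0, 12, 30),
    ("dataset-b", 9.5, 28, 30),
    ("dataset-c", 4.0, 30, 30),
]
ranked = sorted(
    results,
    key=lambda r: boosted_score(r[1], compatibility_ratio(r[2], r[3])),
    reverse=True,
)
print([r[0] for r in ranked])
# ['dataset-b', 'dataset-a', 'dataset-c']
```

In this sketch the well-documented dataset-b overtakes the slightly more relevant dataset-a, while dataset-c stays last despite complete metadata, because relevance still dominates. In Elasticsearch itself, this kind of adjustment is commonly expressed with a `function_score` query; whether the portal does so is not stated here.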
