
Metadata Ontology

shortname:      META
name: Metadata Ontology
type: Standard
status: Valid
version: 0.2
editor: Aitor Argomaniz <aitor@nevermined.io>
contributors:

Motivation

Every asset (dataset, algorithm) in Nevermined has an associated Decentralized Identifier (DID) and DID document / DID Descriptor Object (DDO). Assets without proper descriptive metadata have poor visibility and discoverability.

See DID SPEC for information about the overall structure of Nevermined DDOs and DIDs. This SPEC is about one particular part of Nevermined DDOs: the asset metadata, a JSON object with information about the asset.

This SPEC defines the asset metadata ontology, i.e. the schema for the asset metadata. It's based on the public schema.org Dataset schema.

This SPEC doesn't detail the exact method of registering assets on-chain or storing DDOs.

The main motivations of this SPEC are to:

  • Specify the common attributes that MUST be included in any asset metadata stored in Nevermined networks
  • Normalize the attributes to use in any curation process, to provide a common structure to sort and filter the DDOs
  • Identify the recommended additional attributes that SHOULD be included in a DDO to facilitate asset search
  • Provide an example of an asset metadata object and additional links for reference

Life Cycle of Metadata

Local Metadata

Metadata is first created by the publisher of the asset. The publisher has knowledge of the file URLs, and they are stored in plaintext in the files object. This initial metadata is the local metadata.

Remote Metadata

A publisher publishes (registers) an asset using Nevermined SDKs, which might be running on their local machine or remotely. When they do, the local metadata is passed to the SDK, which makes some changes and additions to the metadata, puts it into a DDO, and sends that DDO to the Metadata API.

The Metadata API may also make some changes and additions to the metadata, such as the datePublished or parts of the curation object. The metadata that finally gets stored by the Metadata API is the remote metadata.
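
The local-to-remote transformation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SDK or Metadata API implementation: the function name to_remote_metadata is hypothetical, the encrypted_files value is treated as an opaque string produced elsewhere, and the curation defaults are the ones listed under Curation Attributes.

```python
import copy
from datetime import datetime, timezone


def to_remote_metadata(local, encrypted_files, now=None):
    """Return a remote-style copy of a local metadata object (illustrative only)."""
    remote = copy.deepcopy(local)  # never mutate the publisher's local copy

    # File URLs are omitted from the remote metadata.
    for file_obj in remote["main"].get("files", []):
        file_obj.pop("url", None)

    # The encrypted form of main.files is supplied by the caller (opaque here).
    remote["encryptedFiles"] = encrypted_files

    # datePublished is set when the DDO is registered in the metadata store.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    remote["main"]["datePublished"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Curation defaults: rating 0 and numVotes 0.
    remote.setdefault("curation", {"rating": 0, "numVotes": 0})
    return remote
```

Working on a deep copy keeps the plaintext URLs available to the publisher while guaranteeing they never leak into the object sent to the metadata store.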

A marketplace can also act as a publisher. SPEC ACCESS describes the publishing flow in more detail.

Metadata Attributes

An asset is the representation of different types of resources in Nevermined. Typically an asset is one of the following types:

  • Dataset. An asset representing a dataset or data resource. It could be, for example, a CSV file or multiple JPG files.
  • Algorithm. An asset representing a piece of software. It could be a Python script using TensorFlow, a Spark job, etc.

Each kind of asset requires a different subset of metadata attributes. The type of asset (dataset, algorithm) is given by the attribute DDO.services["metadata"].main.type.
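
As a minimal sketch, the asset type can be read from a DDO like this. It assumes the layout used by the examples later in this document (a service array whose metadata entry carries attributes.main.type); the helper name get_asset_type is hypothetical.

```python
def get_asset_type(ddo):
    """Return main.type from the metadata service of a DDO, or None if absent."""
    for service in ddo.get("service", []):
        if service.get("type") == "metadata":
            return service.get("attributes", {}).get("main", {}).get("type")
    return None
```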

A metadata object has the following attributes, all of which are objects.

| Attribute | Required | Description |
|---|---|---|
| main | Yes | Main attributes used to calculate the service checksum |
| curation | (remote) | Curation attributes |
| additionalInformation | No | Optional attributes |
| encryptedFiles | (remote) | Encrypted string of the attributes.main.files object. |
| encryptedServices | (remote) | Encrypted string of the attributes.main.services object. |

The main, curation and additionalInformation attributes are independent of the asset type; all assets have those metadata sections.

Main Attributes

These attributes can't be modified after creation, because they are considered the essence of the asset's metadata. This information is used to calculate the unique checksum of the asset. If any of the following attributes needs to change, a new asset derived from the existing one must be created.
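
The following sketch shows why a change to any main attribute leads to a different asset. This document does not define the checksum algorithm, so the scheme used here (SHA-256 over a canonical JSON encoding) is purely an assumption for illustration, and illustrative_checksum is a hypothetical name.

```python
import hashlib
import json


def illustrative_checksum(main):
    """Hash a canonical JSON encoding of `main` (assumed scheme, not the SPEC's)."""
    canonical = json.dumps(main, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"name": "Madrid Weather forecast", "license": "Public Domain"}
edited = dict(original, license="CC-BY")

# Any change to a main attribute yields a different checksum,
# which is why a modified asset has to be registered as a new one.
print(illustrative_checksum(original) == illustrative_checksum(edited))  # False
```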

The main object has the following attributes; not all are required. Some are required only by the metadata store (remote) and others are mandatory for local metadata only (local). Attributes that are required, or not required, in both cases are marked with Yes or No in the Required column.

| Attribute | Type | Required | Description |
|---|---|---|---|
| name | Text | Yes | Descriptive name or title of the asset. |
| type | Text | Yes | Type of the asset. Helps to filter by the type of asset. It could be, for example, "dataset" or "algorithm". |
| dateCreated | DateTime | Yes | The date on which the asset was created by the originator. ISO 8601 format, Coordinated Universal Time, e.g. 2019-01-31T08:38:32Z. |
| datePublished | DateTime | (remote) | The date on which the asset DDO is registered into the metadata store (Metadata API). |
| author | Text | Yes | Name of the entity generating this data (e.g. Tfl, Disney Corp, etc.). |
| license | Text | Yes | Short name referencing the license of the asset (e.g. Public Domain, CC-0, CC-BY, No License Specified, etc.). If it's not specified, the following value will be added: "No License Specified". |
| files | Array of files object | Yes | Array of File objects including the encrypted file URLs. Further metadata about each file is stored; see File Attributes. |
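
A publisher could check a main object against the Required column above before registering an asset. The sketch below is one possible check, not part of any Nevermined SDK; validate_main and REQUIRED_MAIN are hypothetical names.

```python
from datetime import datetime

REQUIRED_MAIN = ("name", "type", "dateCreated", "author", "license", "files")


def validate_main(main, remote=False):
    """Return a list of problems found in a `main` object (empty if none)."""
    problems = [f"missing attribute: {key}" for key in REQUIRED_MAIN if key not in main]
    if remote and "datePublished" not in main:
        problems.append("missing attribute: datePublished")

    created = main.get("dateCreated")
    if created is not None:
        try:
            # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
            # so normalise it to an explicit UTC offset first.
            datetime.fromisoformat(created.replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            problems.append("dateCreated is not ISO 8601")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every missing attribute in a single pass.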

File Attributes

File attributes are a subset of the main section.

A file object has the following attributes, with the details necessary to consume and validate the data.

| Attribute | Required | Description |
|---|---|---|
| url | (local) | Content URL. Omitted from the remote metadata. Supports http(s):// and ipfs:// URLs. |
| name | no | File name. |
| index | yes | Index number of the file, starting from 0. |
| contentType | yes | File format. |
| checksum | no | Checksum of the file using your preferred format (e.g. MD5). The format is specified in checksumType. If it is not provided, it can't be verified that the file was not modified after registration. |
| checksumType | no | Format of the provided checksum. Can vary according to the server (e.g. Amazon vs. Azure). |
| contentLength | no | Size of the file in bytes. |
| encoding | no | File encoding (e.g. UTF-8). |
| compression | no | File compression (e.g. no, gzip, bzip2, etc.). |
| resourceId | no | Remote identifier of the file in the external provider. It is typically the remote id in the cloud provider. |
| attributes | no | Key-value hash map with additional attributes describing the asset file. It could include details like the Amazon S3 bucket, region, etc. |
| encryption | no | Type: Text. |
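
As an illustration of how the checksum and size attributes might be filled in, the sketch below builds a file object from raw bytes. The helper name describe_file and the "MD5" value for checksumType are assumptions; contentLength is written as a string because the examples in this document do so.

```python
import hashlib


def describe_file(data, index, content_type, url=None):
    """Build a file object for `data` (bytes), filling checksum and size."""
    file_obj = {
        "index": index,
        "contentType": content_type,
        "checksum": hashlib.md5(data).hexdigest(),
        "checksumType": "MD5",  # assumed label for the checksum format
        "contentLength": str(len(data)),
    }
    if url is not None:
        file_obj["url"] = url  # local metadata only
    return file_obj
```

A consumer can later recompute the same digest over the downloaded bytes and compare it with checksum to confirm the file was not modified after registration.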

Additional Attributes

All the additional information will be stored as part of the additionalInformation section.

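| Attribute | Type | Required |
|---|---|---|
| categories | Array of Text | No |
| tags | Array of Text | No |
| description | Text | No |
| copyrightHolder | Text | No |
| workExample | Text | No |
| links | Array of Link | No |
| poseidonHash | Text | No |
| providerKey.x | Text | No |
| providerKey.y | Text | No |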

Other Suggested Additional Attributes

These are examples of attributes that can enhance the discoverability of a resource:

| Attribute | Description |
|---|---|
| sla | Service Level Agreement. |
| industry | |
| updateFrequency | An indication of update latency, i.e. how often updates are expected (seldom, annually, quarterly, etc.), or whether the resource is static and never expected to be updated. |
| termsOfService | |
| privacy | |
| keyword | A list of keywords/tags describing a dataset. |
| structuredMarkup | A link to machine-readable structured markup (such as ttl/json-ld/rdf) describing the dataset. |

The publisher of a DDO MAY add additional attributes or change the above object definition.

Curation Attributes

A curation object has the following attributes.

| Attribute | Type | Required | Description |
|---|---|---|---|
| rating | Number (decimal) | Yes | Decimal value between 0 and 1. 0 is the default value. |
| numVotes | Integer | Yes | Number of votes. 0 is the default value. |
| schema | Text | No | Schema applied to calculate the rating. |
| isListed | Boolean | No | Flag unsuitable content. False by default. If it's true, the content must not be returned. |

Example of Local Metadata

{
  "main": {
    "name": "Madrid Weather forecast",
    "dateCreated": "2019-05-16T12:36:14.535Z",
    "author": "Norwegian Meteorological Institute",
    "type": "dataset",
    "license": "Public Domain",
    "files": [
      {
        "index": 0,
        "url": "https://example-url.net/weather/forecast/madrid/350750305731.xml",
        "contentLength": "0",
        "contentType": "text/xml",
        "compression": "none"
      }
    ]
  },
  "additionalInformation": {
    "description": "Weather forecast of Europe/Madrid in XML format",
    "copyrightHolder": "Norwegian Meteorological Institute",
    "categories": ["Other"],
    "links": [],
    "tags": [],
    "updateFrequency": null,
    "structuredMarkup": []
  }
}

Example of Remote Metadata

Similarly, this is how the metadata would look in a response from the Metadata API (remote metadata). Note that url is removed from all objects in the files array, and encryptedFiles and curation are added.

{
  "service": [
    {
      "index": 0,
      "serviceEndpoint": "http://metadata:5000/api/v1/metadata/assets/ddo/{did}",
      "immutableServiceEndpoint": "cid://QmVT3wfySvZJqAvkBCyxoz3EvD3yeLqf3cvAssFDpFFXNm",
      "type": "metadata",
      "attributes": {
        "main": {
          "type": "dataset",
          "name": "Madrid Weather forecast",
          "dateCreated": "2019-05-16T12:36:14.535Z",
          "author": "Norwegian Meteorological Institute",
          "license": "Public Domain",
          "files": [
            {
              "contentLength": "0",
              "contentType": "text/xml",
              "compression": "none",
              "index": 0
            }
          ],
          "datePublished": "2019-05-16T12:41:01Z"
        },
        "encryptedFiles": "0x7a0d1c66ae861…df43aa9",
        "curation": {
          "rating": 1,
          "numVotes": 7,
          "schema": "BINARY",
          "isListed": true
        },
        "additionalInformation": {
          "description": "Weather forecast of Europe/Madrid in XML format",
          "copyrightHolder": "Norwegian Meteorological Institute",
          "categories": ["Other"],
          "links": [],
          "tags": [],
          "updateFrequency": null,
          "structuredMarkup": []
        }
      }
    }
  ]
}

Specific attributes per Service Type

A DDO includes different services attached to the asset. They are offered by the asset owner/provider to the rest of the users. Each of these services can have some metadata describing the specifics of that service, for example its price. These attributes go in each individual service rather than in the metadata attributes sections because they apply only to a specific service and not to the whole asset.

The main object has the following attributes; not all are required.

| Attribute | Type | Related to Service | Required | Description |
|---|---|---|---|---|
| price | Text | nft-sales & access | No | Price of the service. The token to use and the distribution of payments (receivers and amounts) will be part of the condition parameters. |
| timeout | Text | all | Yes | The number of blocks for which the service is valid. Defaults to 0 if not given. |
| nftType | String | nft-sales & nft-access | No | If the service refers to an NFT (ERC-721 or ERC-1155), this stores the value erc-721 or erc-1155. It is open to additional NFT standards or variations of the current ERCs. |

Specific attributes per Asset Type

Depending on the asset type (dataset, algorithm), there are different metadata attributes supported:

Algorithm attributes

An asset of type algorithm has the following additional attributes under main.algorithm:

| Attribute | Type | Required | Description |
|---|---|---|---|
| language | string | no | Language used to implement the software. |
| format | string | no | Packaging format of the software. |
| version | string | no | Version of the software. |
| container | Object | yes | Object describing the Docker container image. |

The container object has the following attributes:

| Attribute | Type | Required | Description |
|---|---|---|---|
| entrypoint | string | yes | The command to execute, or script to run inside the Docker image. |
| image | string | yes | Name of the Docker image. |
| tag | string | yes | Tag of the Docker image. |

{
  "index": 0,
  "serviceEndpoint": "http://localhost:5000/api/v1/metadata/assets/ddo/{did}",
  "immutableServiceEndpoint": "cid://QmVT3wfySvZJqAvkBCyxoz3EvD3yeLqf3cvAssFDpFFXNm",
  "type": "metadata",
  "attributes": {
    "main": {
      "author": "John Doe",
      "dateCreated": "2019-02-08T08:13:49Z",
      "license": "CC-BY",
      "name": "My super algorithm",
      "type": "algorithm",
      "algorithm": {
        "language": "scala",
        "format": "docker-image",
        "version": "0.1",
        "container": {
          "entrypoint": "node $ALGO",
          "image": "node",
          "tag": "10"
        }
      },
      "files": [
        {
          "name": "build_model",
          "url": "https://raw.githubusercontent.com/keyko-io/test-algorithm/master/javascript/algo.js",
          "index": 0,
          "checksum": "efb2c764274b745f5fc37f97c6b0e761",
          "contentLength": "4535431",
          "contentType": "text/plain",
          "encoding": "UTF-8",
          "compression": "zip"
        }
      ]
    },
    "additionalInformation": {
      "description": "Workflow to aggregate weather information",
      "tags": [
        "weather",
        "uk",
        "2011",
        "workflow",
        "aggregation"
      ],
      "copyrightHolder": "John Doe"
    }
  }
}
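
A consumer of an algorithm asset needs the Docker image reference described by the container object. The sketch below checks the three required container attributes and joins image and tag in Docker's usual image:tag form; container_image_ref is a hypothetical helper, not an SDK function.

```python
def container_image_ref(algorithm):
    """Return "image:tag" for an algorithm's container, checking required fields."""
    container = algorithm.get("container")
    if container is None:
        raise ValueError("algorithm.container is required")
    missing = [k for k in ("entrypoint", "image", "tag") if k not in container]
    if missing:
        raise ValueError(f"container is missing: {', '.join(missing)}")
    return f"{container['image']}:{container['tag']}"
```

For the algorithm object in the example above, this returns "node:10".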


References

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet. Data types use the Schema.org primitive data types.