
SPASE: Wikis



From Wikipedia, the free encyclopedia

The SPASE (Space Physics Archive Search and Extract) Data Model is a set of terms and values along with the relationships between them that allow descriptions of all the resources in a heliophysics data environment. It is the product of years of effort by an international team of collaborators (see spase-group.org website) to unify and improve existing Space and Solar Physics data models. The intent of this data model is to provide the means to describe resources, most importantly, scientifically useful data products, in a uniform way so they may be easily registered, found, accessed, and used.

The SPASE data model divides the heliophysics data environment into a limited set of resource types. An example of this is numerical data, which is a key resource type. This type of resource typically consists of a set of files containing values of one or more physical variables, which differ from each other only by time span. Fully describing a numerical data resource requires other types of resources, namely observatory, instrument, person, and repository, each of which has its own set of attributes.

Often, numerical data are presented in prepared images (GIF or JPEG), and such presentations are referred to as Display Data resources. Other data-related resource types are Catalog, which lists events; Annotation, which enables expert comments on data products; and Granule, which describes individual files within another resource (i.e., numerical data, display data, or catalog). Other types of resources include Document, which can contain narratives or supporting information; Service, which provides software to use data resources; Repository, for storage locations; and Registry, for metadata collections. Resource descriptions, and the links in them, are intended to make the Resource useful to scientific users.


History of Development

The data model presented here was conceived in 2002 and became formalized in regular teleconferences between data providers. The participants included the scientific and technical representatives of some of the largest data holdings in the US, Europe, and Japan. The original impetus occurred at an ISTP (International Solar-Terrestrial Physics Science Initiative) meeting in 1998, where a resolution was passed calling for data to be made more accessible. As efforts to provide seamless distributed data proceeded, it became clear that the creation of a central data model was essential.

Interoperability test beds were constructed in 2001, and in 2002 a grassroots effort was undertaken to define the needs of the community. March 2003 saw the beginning of the construction of the data model in earnest. The initial effort involved collecting terms from CDPP (Centre de Données de la Physique des Plasmas), SWRI (Southwest Research Institute), NSSDC (National Space Science Data Center), ISTP, and other organizations to form a starting point. The SPASE Data Model was developed using an iterative process in which additions were made when unaddressed needs were discovered. Two years of teleconferences, e-mailed revisions, and face-to-face efforts, along with the application of the terms to specific cases, led to the release of version 1.0 of the data model in November 2005.

Following the release, many existing data products were described, which led to further improvements of the data model. These culminated in version 1.1, released in August 2006. At this time, NASA established the Heliophysics VxOs (where VO stands for Virtual Observatory and x stands for any discipline within the heliophysics and space physics field). After an extended period of use and improvement, version 1.2.2 was released in August 2008. The version of the data model described in this document is an extension of this earlier release.

Intended Purpose

The design of the SPASE data model is based on a core set of principles related to the intended purpose of descriptive information (metadata), the data environment, and the operational environment. The overall goal of SPASE is to be able to describe resources using a taxonomy of terms familiar to the heliophysics domain. This taxonomy should provide sufficient scientific context and data content information for an individual to assess the applicability of the resource (data and metadata) to a research question. A data model is the cornerstone of an information system, and one purpose for SPASE is to enable the creation of "Virtual Observatories" that will link the broad range of heliophysics resources available in a loosely coupled distributed environment. Additional goals of the data model are to:

  1. Provide a way of registering data products using a standard set of terms that allow the data products to be found with simple searches and described so that users can determine their utility for a specific purpose;
  2. Allow searching for data products containing particular physical quantities (e.g., magnetic field; spectral irradiance) that are variously represented in a diverse array of data products;
  3. Facilitate a means of mapping comparable variables from many data products onto a common set of terms so that visualization, analysis, and higher-order query tools and services can be used without regard to the origin of the data.

The content of a resource description based on the data model should enable services (either at the provider or in a VxO) to discover and access individual resources. The service layer can contain services for a variety of purposes, but the basic functionality is to provide the links necessary to connect user applications and search-and-retrieval front ends to data repositories. Ultimately, the data environment based on the data model will involve a number of software tools and services linked together as an Internet-based environment. The data, along with software tools and documentation associated with products, will be directly accessible using standard web protocols (http, ftp). This model has the potential to provide capabilities that can aid all users of a particular dataset (e.g., on-the-fly coordinate transformations, the ability to merge datasets from different instruments, easy reference to related indices or other data), in addition to providing the broad access needed to investigate emerging questions in heliophysics.

Design Principles

The design of the SPASE data model begins with the following basic principles.

1. Data is self-documented. Data resources have an internal schema or structure for storing values. The physical structure is determined by the storage format. Each retrievable entity in the format is assigned a key or tag which can be used to retrieve the entity.
   The SPASE Data Model does not attempt to describe the physical storage of the parameters (i.e., the byte offsets, record format, or data encoding in the data resource). Instead, the SPASE Data Model describes the scientific attributes of the parameter and links this to the parameter by a key or tag used by the storage format. Applications can use the SPASE descriptions to locate a parameter and the appropriate format-specific reader to extract parameters.
   Not all data in the heliophysics data environment are stored in self-documented formats (e.g., data stored as ASCII tables). The method of assigning a key or tag name for each field in the ASCII table is external to the SPASE data model. This method must be part of a "format" specification, which may be as simple as the first row of the table containing the tag name of the field.
2. Resources are distributed. There are many providers of resources, and these providers can be located anywhere in the world.
   Each provider operates independently and activities are not necessarily coordinated. The SPASE data model assumes that providers have local autonomy and may operate under local rules or jurisdictions.
3. Online resources have Uniform Resource Locators (URLs). If a resource is online, it can be accessed and retrieved using its URL.
4. The data environment is continuously evolving. New resources are actively generated, either as part of an ongoing experiment or as a result of analysis and assessment.
   These new resources may be directly related to other resources. As new resources are generated or new associations defined, the networks or collections formed will expand over time.
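
The first principle can be illustrated with a minimal Python sketch. It assumes the simplest "format" specification mentioned above, an ASCII table whose first row holds the tag name of each field. The table contents, the tag names (Time, Bx, By), the units, and the PARAMETER_DESCRIPTIONS dictionary are all hypothetical and are not taken from any real SPASE description.

```python
import csv
import io

# Hypothetical ASCII table: the first row carries the tag name of each field.
ASCII_TABLE = """\
Time,Bx,By
2004-07-29T12:30:00,1.2,-0.4
2004-07-29T12:31:00,1.3,-0.5
"""

# Hypothetical description fragment: scientific attributes keyed by the
# same tag names that the storage format uses.
PARAMETER_DESCRIPTIONS = {
    "Bx": {"name": "Magnetic field, X component", "units": "nT"},
    "By": {"name": "Magnetic field, Y component", "units": "nT"},
}


def read_parameter(table_text, key):
    """Return every value of the column whose header tag equals `key`."""
    reader = csv.DictReader(io.StringIO(table_text))
    if key not in (reader.fieldnames or []):
        raise KeyError(f"no field tagged {key!r} in table")
    return [row[key] for row in reader]
```

Here the description supplies the scientific meaning of "Bx", while the format-specific reader (csv in this sketch) only needs the tag to extract the values.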

Conceptual System Environment

Figure 1: The SPASE concept. Two blue circles with a yellow ring surrounding them represent Gateways. They point to red boxes labeled Repositories and orange boxes labeled Application. The Gateways also point to blue circles labeled Access Points. The Access Points point to smaller red boxes clustered together to represent the Community.

SPASE is intended to enable the sharing of knowledge through the exchange of queries and responses between systems. The environment in which this occurs is the current Internet, where systems and users are loosely coupled and highly distributed. Services or portals may collect the SPASE descriptions from multiple sources to create an enriched capability for the user. For example, a search engine may provide a comprehensive search for a particular scientific discipline. The NASA website gives a guide to many currently active projects, along with detailed background information. Of particular interest is the document entitled "A Framework for Space and Solar Physics Virtual Observatories."

Figure 1 illustrates a conceptual architecture in a distributed environment. In this environment, multiple communities have resources to share. The storage location of a resource is called a repository. Some of these repositories (boxes) have local SPASE descriptions which are available through a local registry service (green circles). The contents of other repositories are described at external, possibly independent, locations which make the descriptions available through remote registries. Gateways (yellow rings) can harvest and aggregate the resources from multiple registries or perform federated searches which provide a single access point to multiple registries. Applications access the registries to discover resources, determine their location and retrieve them from the repositories.
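
The two gateway behaviours described above, harvesting and federated search, can be sketched roughly as follows. This is only an illustration of the idea: the Registry and Gateway classes, their method names, and the substring search are assumptions made for the example and do not correspond to any actual SPASE software interface.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry: descriptions keyed by resource identifier."""
    name: str
    descriptions: dict = field(default_factory=dict)

    def search(self, term):
        # Naive case-insensitive substring match on the description text.
        term = term.lower()
        return [rid for rid, text in self.descriptions.items()
                if term in text.lower()]


@dataclass
class Gateway:
    """Hypothetical gateway: a single access point to several registries."""
    registries: list

    def federated_search(self, term):
        # Ask every registry at query time and merge the hits in order.
        hits = []
        for registry in self.registries:
            for rid in registry.search(term):
                if rid not in hits:
                    hits.append(rid)
        return hits

    def harvest(self):
        # Copy all descriptions into one aggregated registry.
        merged = Registry(name="aggregate")
        for registry in self.registries:
            merged.descriptions.update(registry.descriptions)
        return merged
```

An application would then query the gateway instead of each registry separately, and use the returned identifiers to locate the resources in their repositories.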

Resource Types

The top level entity in the SPASE data model is a Resource. There are 12 different types of resources. Each consists of a set of attributes that characterize the resource. The resource types can be divided into three categories: Data, Origination and Infrastructure.


Data Resources

Data Resources describe individual data products or data product sets. Data products can be images (Display Data), sample or observation values (Numerical Data), and event lists (Catalog). Also included in the Data Resource category are the resources used to describe individual files that are part of data product sets (Granule) and assessments of a resource (Annotation). The following is a complete list of the Data Resources:

Numerical Data
Display Data
Catalog
Granule
Annotation

Origination Resources

Origination Resources describe the generators or sources of data. This information will be included in a Data Resource description, and will refer to one or more Origination Resources. The complete list of Origination Resources contains:

Observatory
Instrument
Person
Document

Infrastructure Resources

Infrastructure Resources describe system components that are part of the exchange and use of data. This includes storage locations for the data (Repository), metadata (Registry) and functions (Service).
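
The three categories and the twelve resource types named in this section can be collected in a small lookup table. The following Python sketch is only a restatement of the lists above; the names RESOURCE_TYPES and category_of are chosen for the example.

```python
# Resource types grouped by category, as listed in this section.
RESOURCE_TYPES = {
    "Data": ["Numerical Data", "Display Data", "Catalog",
             "Granule", "Annotation"],
    "Origination": ["Observatory", "Instrument", "Person", "Document"],
    "Infrastructure": ["Repository", "Registry", "Service"],
}


def category_of(resource_type):
    """Return the category that a resource type belongs to."""
    for category, types in RESOURCE_TYPES.items():
        if resource_type in types:
            return category
    raise ValueError(f"unknown resource type: {resource_type!r}")
```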

Ontology

Figure 2: Associations in SPASE. A series of grey boxes. The sides of the figure are framed by three different arrows. On the left the arrow is labeled Data; on the right, Origination; and along the bottom, Infrastructure. On the left, the grey boxes represent a number of different types of data. On the right side, the boxes are labeled Person, Instrument, and Observatory, representing the origin of the data. Along the bottom, they are labeled Repository, Service, and Registry.

In the SPASE data model, there can be associations between pairs of resources. Some associations are specific and required in order to fully describe a resource. For example, an Instrument resource is always paired with an Observatory resource. The specific associations form an ontology, illustrated in Figure 2. The SPASE data model also allows free associations between resources, which are not constrained by any ontology.


Resource Identifiers

Every resource has a unique identifier so that it can be tracked and referenced within a system. This identifier is defined by the naming authority for the resource. The entity which acts as the naming authority is determined by the agency or group that provides the resource. Each resource identifier is a URL that has the form:

scheme://authority/path

where scheme is "spase" for those resources administered through the SPASE framework; authority is the unique identifier for the naming authority within the data environment, and path is the unique local identifier of the resource within the context of the authority. The resource ID must be unique within the data environment.

To illustrate the definition of a resource identifier, consider one registered authority called SMWG (SPASE Metadata Working Group), which maintains information for the spacecraft (Observatory) resources. One such spacecraft is GOES8. The registered authority (SMWG) decides the path to GOES8 should include the Resource Type as part of the path. Therefore, the resource identifier would be:

spase://SMWG/Observatory/GOES8
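
As an illustration, an identifier of this form can be split into its three parts with a few lines of Python. This is a minimal sketch; the names ResourceID and parse_resource_id are invented for the example, and the only validation performed is that all three parts are present.

```python
from typing import NamedTuple


class ResourceID(NamedTuple):
    scheme: str
    authority: str
    path: str


def parse_resource_id(identifier):
    """Split 'scheme://authority/path' into its three components."""
    scheme, separator, rest = identifier.partition("://")
    if not separator or not scheme:
        raise ValueError(f"missing scheme: {identifier!r}")
    authority, _, path = rest.partition("/")
    if not authority or not path:
        raise ValueError(f"missing authority or path: {identifier!r}")
    return ResourceID(scheme, authority, path)
```

For the example above, parse_resource_id("spase://SMWG/Observatory/GOES8") yields the scheme "spase", the authority "SMWG", and the path "Observatory/GOES8".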

The Resource ID is used to associate one resource with another, either formally or informally. For example, an Instrument resource must be formally associated with an Observatory resource, while a Numerical Data resource may be formally associated with an Instrument resource and informally associated with other Numerical Data resources. The free association of resources allows networks or collections to be formed, and allows new associations to be added as needed without affecting existing associations.

Core Attributes

With the exception of Granule and Person, every resource has a common set of core attributes. The core attributes provide textual descriptions of the resource and the capability to reference external sources of information (Information URL). They also describe the context of the resource in the larger data environment. This context consists of associations with other resources (Association) and with previous versions (Prior ID). These attributes are grouped in a Resource Header, which consists of:

Resource Name
Alternate Name
Release Date
Expiration Date
Description
Acknowledgement
Contact
Information URL
Association
Prior ID
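
The Resource Header can be pictured as a simple record type. The Python sketch below mirrors the attribute names listed above. Which attributes are required, which may repeat, and the Python types used are assumptions made for illustration; the text above does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceHeader:
    # Field names follow the attribute list above. Which fields are
    # required or repeatable is an assumption of this sketch.
    resource_name: str
    release_date: str                      # a DateTime string
    description: str
    alternate_name: List[str] = field(default_factory=list)
    expiration_date: Optional[str] = None  # a DateTime string
    acknowledgement: Optional[str] = None
    contact: List[str] = field(default_factory=list)
    information_url: List[str] = field(default_factory=list)
    association: List[str] = field(default_factory=list)  # resource IDs
    prior_id: List[str] = field(default_factory=list)     # resource IDs
```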

Extensions

The SPASE Data Model allows for additional metadata to be embedded within a SPASE description. Every Resource type has an "Extension" element which can contain metadata compliant with other data models. The "Extension" element has a SPASE data model type of "Text", but is not limited to alphanumeric characters and may contain tagged information.

Element Data Types

Each element in the SPASE Data Model has a data type. One design feature of the SPASE data model is that an element can contain either a value or other elements. Mixed content (elements and values) is not allowed. This allows the data model to be implemented in a wider range of metadata languages. The following data types are supported:

Container A container of other elements.

Count A whole number.

DateTime A value given in the ISO 8601 recommended primary standard notation: YYYY-MM-DD, where YYYY is the year in the usual Gregorian calendar, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 31. It may also have an optional time portion given in the ISO 8601 recommended primary standard notation: HH:MM:SS.sss, where HH is the number of complete hours that have passed since midnight (00-24), MM is the number of complete minutes that have passed since the start of the hour (00-59), SS is the number of complete seconds since the start of the minute (00-60), and sss is the number of milliseconds that have passed since the start of the second (000-999). Time zones are not allowed, so all times are in Universal Time. The time portion must follow the date portion, with the two portions separated by a "T". For example, "2004-07-29" is July 29, 2004 and "2004-07-29T12:30:00" is precisely 12:30 on July 29, 2004.

Duration A duration of time, given in the ISO 8601 recommended primary standard notation: PTHH:MM:SS.sss, where PT are tokens indicating that the time value is a duration, HH is the number of complete hours (00-24), MM is the number of complete minutes (00-59), SS is the number of complete seconds (00-60), and sss is the number of milliseconds (000-999).

Enumeration A value selected from a list of allowed values. The name of the list is an additional attribute of the element. Lists may be externally controlled, in which case the location of the list is indicated in the textual definition of the element.

Item An element which is a value for an enumerated list.

Numeric A fractional number which can be expressed in scientific notation. The string "NaN" represents not-a-number (flag) values, the string "INF" represents an infinitely large value, and the string "-INF" represents an infinitely large negative value.

Sequence A list of whole number values where the order of the values is fixed. A space separates each value. For example, "1 2 3".

Text A string of alphanumeric characters. See the Text Mark-up section for details.

URL Uniform Resource Locator.
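
Several of these value types are easy to check or convert mechanically. The following Python sketch shows one possible reading of the DateTime, Duration, Numeric, and Sequence definitions above. The function names are invented for the example, the fractional-second part is treated as optional (as the "2004-07-29T12:30:00" example suggests), and the DateTime check tests only the stated field ranges, not calendar validity such as the number of days in a given month.

```python
import math
import re
from datetime import timedelta

_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"                      # YYYY-MM-DD
    r"(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{3}))?)?"  # optional THH:MM:SS.sss
)
_DURATION = re.compile(r"PT(\d{2}):(\d{2}):(\d{2})(?:\.(\d{3}))?")


def is_valid_datetime(value):
    """Check the DateTime notation and the field ranges given above."""
    match = _DATETIME.fullmatch(value)
    if match is None:
        return False  # wrong shape, or a time zone suffix was present
    _year, month, day, hour, minute, second, _ms = (
        int(group) if group is not None else 0 for group in match.groups()
    )
    return (1 <= month <= 12 and 1 <= day <= 31
            and hour <= 24 and minute <= 59 and second <= 60)


def parse_duration(value):
    """Convert 'PTHH:MM:SS.sss' into a datetime.timedelta."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Duration: {value!r}")
    hours, minutes, seconds, ms = (
        int(group) if group is not None else 0 for group in match.groups()
    )
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=ms)


def parse_numeric(value):
    """Convert a Numeric string, including the three special strings."""
    special = {"NaN": math.nan, "INF": math.inf, "-INF": -math.inf}
    if value in special:
        return special[value]
    return float(value)


def parse_sequence(value):
    """Convert a space-separated Sequence into a list of integers."""
    return [int(token) for token in value.split()]
```

Note that Python's float() accepts a few spellings beyond those listed here, so parse_numeric is more permissive than the definition above.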

Text Mark-up

While descriptive text may be brief, some formatting may be necessary to convey the required information, for example multiple paragraphs or nested lists. To ensure system portability, text values in SPASE are sequences of alphanumeric one-byte UTF-8 (US-ASCII) characters with white space preserved. However, in some applications, a strict preservation of white space may not result in a desirable presentation. A web browser is the best example of where this practice may fail.

To allow an author to express a preferred layout for the text, a special set of text mark-up rules is defined. The layout can then be determined by normalizing the text and applying a simple set of interpretation rules.

Text Normalization Rules

To aid in determining the layout or structural intent of the author, the following rules are to be applied to text to create a normalized form:

1. All lines are to end with a newline character.
2. The entire text is left justified. No line has leading whitespace.
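
The two normalization rules can be applied in a single pass over the lines of the text. The following is a minimal Python sketch; the function name normalize_text is chosen for the example.

```python
def normalize_text(text):
    """Apply the two normalization rules to a text value."""
    # Rule 2: strip leading whitespace so that every line is left justified.
    # Rule 1: terminate every line, including the last, with a newline.
    return "".join(line.lstrip() + "\n" for line in text.splitlines())
```

A line that contains only whitespace becomes an empty line, so it still acts as a blank line after normalization.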

Text Interpretation Rules

After normalization of the text, the following rules can be used to interpret the layout intent of the author.

1. Blank lines indicate paragraph breaks.
2. Lists
   a. Must be preceded by a blank line.
   b. Items are indicated by a line beginning with a reserved character followed by a space. Three levels of lists are supported. The reserved characters are:
      ("*"): First level list.
      ("-"): Second level list (must appear within a first level context).
      ("."): Third level list (must appear within a second level context).
   c. End with a blank line.
3. Tables
   a. Must begin and end with a line that starts with ("+--").
   b. The first "row" of a table is the field headings.
   c. Fields in a table are separated with a vertical bar ("|").
   d. Visual row separators are lines which begin with ("|--").
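
The interpretation rules can be turned into a small block parser. The Python sketch below assumes the text has already been normalized, and returns paragraphs, lists, and tables as plain tuples. The function name, the output structure, and the treatment of continuation lines are choices made for this example. The sketch does not enforce the nesting constraints on second and third level items, and it discards a table that is never closed.

```python
MARKERS = {"* ": 1, "- ": 2, ". ": 3}  # reserved character plus a space


def _cells(line):
    # Split a table row on vertical bars and trim each field.
    return [cell.strip() for cell in line.strip("|").split("|")]


def interpret(normalized):
    """Group normalized text into paragraph, list, and table blocks."""
    blocks, chunk, table = [], [], None

    def flush():
        # Close the run of lines collected since the last blank line.
        if not chunk:
            return
        if chunk[0][:2] in MARKERS:
            items = []
            for line in chunk:
                if line[:2] in MARKERS:
                    items.append([MARKERS[line[:2]], line[2:]])
                else:
                    items[-1][1] += " " + line  # continuation of an item
            blocks.append(("list", [tuple(item) for item in items]))
        else:
            blocks.append(("paragraph", " ".join(chunk)))
        chunk.clear()

    for line in normalized.splitlines():
        if table is not None:
            if line.startswith("+--"):        # closing line of the table
                headings = table[0] if table else []
                blocks.append(("table", headings, table[1:]))
                table = None
            elif not line.startswith("|--"):  # skip visual separators
                table.append(_cells(line))
        elif line.startswith("+--"):          # opening line of a table
            flush()
            table = []
        elif line == "":                      # blank line ends a block
            flush()
        else:
            chunk.append(line)
    flush()
    return blocks
```

Because a list is recognized only at the start of a block, a marker character that appears in the middle of a paragraph is treated as ordinary text, which matches the requirement that a list be preceded by a blank line.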
