Document info

  • Status: Draft
  • Audience: Developers and Science Coordinators


Overview

Users want to run the same kinds of searches on search.earthdata.nasa.gov that they can run on https://data.noaa.gov/onestop/collections. On that site, users can enter year offsets either counting backwards from 1950 (larger numbers being further in the past) or in CE (positive after year 0000). Ranges can also be searched by specifying a keyword that points to an interval.

Solution

Schema Changes

For DIF 10:

Change the Paleo Start and Stop Dates from free text to number and unit fields by changing the type of the dates and adding unit fields backed by an enum. (Note: don't change DIF 10.)

<xs:complexType name="PaleoDateTimeType">
	<xs:sequence>
		<xs:element name="Paleo_Start_Date" type="xs:decimal" minOccurs="1"/>
		<xs:element name="Paleo_Start_Date_Unit" type="<new-paleo-unit-enum>" minOccurs="1"/>
		<xs:element name="Paleo_Stop_Date" type="xs:decimal" minOccurs="1"/>
		<xs:element name="Paleo_Stop_Date_Unit" type="<new-paleo-unit-enum>" minOccurs="1"/>
		<xs:element name="Chronostratigraphic_Unit" type="ChronostratigraphicUnitType" minOccurs="0" maxOccurs="unbounded"/>
	</xs:sequence>
</xs:complexType>

Enum Units may be:

  • Ga (billions of years before present),
  • Ma (millions of years before present),
  • ka (thousands of years before present),
  • ybp (years before present, also known as BP), or
  • CE (full year number relative to the Common Era reference point, negative for years before it). This is the preferred format for indexing.
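
As a rough illustration of how these units relate to the CE format, the sketch below converts a value and unit pair into a CE year. It assumes "present" means 1950, as described in the Overview, and that each unit is a plain power-of-ten multiple of years. The function name `to_ce_year` and the constants are hypothetical and not part of any schema.

```python
from decimal import Decimal

# Assumption: "present" is the year 1950, as described in the Overview.
PRESENT_YEAR = 1950

# Years represented by one unit of each "before present" enum value.
YEARS_PER_UNIT = {"Ga": 10**9, "Ma": 10**6, "ka": 10**3, "ybp": 1}


def to_ce_year(value, unit):
    """Convert a (value, unit) pair to a whole CE year (hypothetical helper)."""
    amount = Decimal(str(value))  # Decimal avoids float rounding on inputs like 2.6
    if unit == "CE":
        return int(amount)  # already a CE year
    years_before_present = amount * YEARS_PER_UNIT[unit]
    return PRESENT_YEAR - int(years_before_present)


print(to_ce_year("2.6", "Ma"))   # -2598050
print(to_ce_year(11700, "ybp"))  # -9750
```

Both results line up with boundaries in the NOAA preset ranges listed under Search Interface (the end of the Pliocene and the start of the Holocene).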

For UMM-C:

PaleoTemporalCoverageType
	AnyOf: (one or both)
		ChronostratigraphicUnitType 0..*
		PaleoDateRangeType 0..1

PaleoDateRangeType:
	Epoch: Enum [CE, BP], default is CE
	StartDate: Number (Long), required - further in the past
	StartDateUnit: Enum [Ga, Ma, ka, ybp]
	EndDate: Number (Long), required - closest to now
	EndDateUnit: Enum [Ga, Ma, ka-CE, ybp]
 

The end date is larger than the start date.


Parsing legacy data

Steps:

  1. Parse the raw value from strings, looking for a value and unit with a regex-like pattern: (.* [0-9]+ .* [unit-list] | unit name .*) OR (.* keyword .*)
    1. Log bad matches for QA reports.
  2. Create an index in Elastic under a key like "paleo_start_date_normalized" holding the full CE offset, where 1 = 1 year (which seems to be NOAA's preferred value).
  3. Find which date range under Chronostratigraphic Units the paleo date falls in and create an index for each level, allowing textual searches:
    1. paleo_start_date_keywords [Archaean, Eoarchean]
    2. paleo_end_date_keywords [Archaean, Meso...]
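
Step 1 above could look something like the following sketch. The regular expression, the unit aliases, and the logger name `paleo_qa` are all assumptions made for illustration; real legacy strings will likely need a broader pattern, and keyword matching is left out here.

```python
import logging
import re
from decimal import Decimal

# Hypothetical logger used to collect unparsed values for QA reports.
log = logging.getLogger("paleo_qa")

# Assumed spellings found in legacy text, mapped to the enum units.
UNIT_ALIASES = {"ga": "Ga", "ma": "Ma", "ka": "ka", "ybp": "ybp", "bp": "ybp"}

# A number (optionally with thousands separators or a fraction) followed by a unit.
PATTERN = re.compile(
    r"(?P<value>\d[\d,]*(?:\.\d+)?)\s*(?P<unit>ga|ma|ka|ybp|bp)\b",
    re.IGNORECASE,
)


def parse_legacy(raw):
    """Return (Decimal value, unit) from a free-text date, or None if unparsed."""
    match = PATTERN.search(raw)
    if match is None:
        log.warning("unparsed paleo date: %r", raw)  # bad match, keep for QA
        return None
    value = Decimal(match.group("value").replace(",", ""))
    return value, UNIT_ALIASES[match.group("unit").lower()]


print(parse_legacy("approx 2.6 Ma"))
print(parse_legacy("11,700 ybp"))
print(parse_legacy("unknown"))
```

The first two calls return a value and unit pair, while the last one returns None and emits a warning that a QA report could pick up.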


Code:

  • Schema (yea, not code)
    • new restriction on start and end date content to require value, unit pairs.
  • Ingesting
    • function: Date Parser: raw -> date CE value (a very large (64-bit) negative number)
    • data: keyword range map, contains start and end CE dates for each keyword
    • function: CE value -> keywords
  • Searching - can search by range, preset range, keyword, filtering by intersect rules
    • function: keyword to ce range
    • function: range generator (given start date, end date, filtering rules) -> elastic range queries
    • function: range generator (given preset-name, filtering rules ) -> elastic range queries
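
To make the keyword functions in this list more concrete, here is a minimal sketch of the keyword range map and the two lookups. The map holds only two entries, borrowed from the NOAA preset ranges quoted under Search Interface, as a stand-in for a real Chronostratigraphic Unit table. The function names are hypothetical.

```python
# Sample keyword range map: name -> (start_ce, end_ce).
# Assumption: two NOAA preset ranges stand in for a real unit table.
KEYWORD_RANGES = {
    "Last Glacial Period": (-113050, -9750),
    "Last Deglaciation": (-17050, -9750),
}


def keyword_to_ce_range(keyword, ranges=KEYWORD_RANGES):
    """Return (start_ce, end_ce) for a keyword, ignoring case."""
    for name, bounds in ranges.items():
        if name.lower() == keyword.lower():
            return bounds
    raise KeyError(keyword)


def ce_value_to_keywords(ce_year, ranges=KEYWORD_RANGES):
    """Return every keyword whose range includes ce_year (bounds inclusive)."""
    return [name for name, (start, end) in ranges.items() if start <= ce_year <= end]


print(ce_value_to_keywords(-15000))   # ['Last Glacial Period', 'Last Deglaciation']
print(ce_value_to_keywords(-100000))  # ['Last Glacial Period']
print(keyword_to_ce_range("last deglaciation"))  # (-17050, -9750)
```

Because ranges can nest, one CE value may map to several keywords, which fits the idea of indexing a keyword list per date.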

Elastic Design

Data is stored in Elasticsearch as signed longs (-2^63 to 2^63-1). Actual data requires values down to about -2^33, making an int too small and a long more than adequate. Lists of dates cannot be queried without using the "nested" document feature, which is not common in CMR. As such, all paleo dates will be combined into one larger start and stop range. This larger range will then be indexed.
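
A small sketch of the two points above: collapsing several paleo date ranges into one enclosing range, and checking that the values involved do not fit in a 32-bit int. The helper names are hypothetical, and ranges are assumed to be (start, end) pairs of CE years.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def combine_ranges(ranges):
    """Collapse (start_ce, end_ce) pairs into one enclosing range."""
    starts, ends = zip(*ranges)
    return min(starts), max(ends)


def fits_in_int32(value):
    """True if value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


print(combine_ranges([(-128050, -113050), (-17050, -9750)]))  # (-128050, -9750)
print(fits_in_int32(-2**33))  # False
```

One consequence of this design is that a search falling in a gap between the original ranges (for example around -50000 here) would still match the combined range.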

All values are to be stored internally in Common Era notation, for example "-2598050" (that is, -2,598,050).

Two Elastic range queries are then used to find records, one for the start date and another for the end date. While 'range' supports "relation" for intersects, contains, and within, that option does not work across multiple fields, so more complex queries will need to be built to express rules like "intersect" or "contains". Bounding values like gte and lte can be dropped as needed to produce the effect.

All queries should use the "e" (equal) option (gte and lte rather than gt and lt), on the logic that if a date is provided by the user, it is of significance to the user and should be part of the query.

NOTE: Lists of objects will not work without using nested documents as stated by


{"track_total_hits":true,
    "query": {
        "bool": {
            "must": [
                {"range": {"numeric-paleo-start": {"gte": 3400000000, "lt": 3500000001}}},
                {"range": {"numeric-paleo-stop": {"gte": 3000000000, "lt": 3000000001}}}
            ]
        }
    }
}

Search Interface

The search interface should accept paleo date ranges as well as "preset ranges", which are date ranges with a "humanized" name. On NOAA OneStop these preset ranges are in CE format:

  • Holocene: from -9750 to now (CE)
  • Last Deglaciation: from -17050 to -9750 (CE)
  • Last Glacial Period: from -113050 to -9750 (CE)
  • Last Interglacial: from -128050 to -113050 (CE)
  • Pliocene: from -5298050 to -2598050 (CE)
  • Paleocene-Eocene Thermal Maximum (PETM): from -55998050 to -54998050 (CE)

Note: the oldest date above is -55,998,050, nearly 56 million years before CE.

Query Parameters:

  • &paleo_start_date=-4,000,000&paleo_stop_date=-3,000,000
    • &paleo_filter=disjointed
  • &paleo_Interval=Holocene
    • &paleo_filter=disjointed
  • &chronostratigraphic=proterozoic
    • &paleo_filter=disjointed
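
The sketch below shows one way these parameters might be turned into a CE range plus a filter name. It is only an illustration: the preset table copies the NOAA values listed above, "now" is represented as None (open-ended), the default filter "intersects" is an assumption, and the chronostratigraphic parameter is not handled. The function name `parse_paleo_query` is hypothetical.

```python
from urllib.parse import parse_qs

# Preset name -> (start_ce, end_ce); None means "now" (assumed open-ended).
PRESETS = {
    "holocene": (-9750, None),
    "last deglaciation": (-17050, -9750),
    "last glacial period": (-113050, -9750),
    "last interglacial": (-128050, -113050),
    "pliocene": (-5298050, -2598050),
    "paleocene-eocene thermal maximum (petm)": (-55998050, -54998050),
}


def parse_paleo_query(query_string):
    """Return (start_ce, end_ce, filter_name) from a query string."""
    raw = parse_qs(query_string.lstrip("?&"))
    # Parameter names are matched case-insensitively; the first value wins.
    params = {key.lower(): values[0] for key, values in raw.items()}
    paleo_filter = params.get("paleo_filter", "intersects")  # assumed default
    if "paleo_interval" in params:
        start, end = PRESETS[params["paleo_interval"].lower()]
    else:
        # Thousands separators are tolerated, as in the examples above.
        start = int(params["paleo_start_date"].replace(",", ""))
        end = int(params["paleo_stop_date"].replace(",", ""))
    return start, end, paleo_filter


print(parse_paleo_query(
    "&paleo_start_date=-4,000,000&paleo_stop_date=-3,000,000&paleo_filter=disjointed"
))  # (-4000000, -3000000, 'disjointed')
print(parse_paleo_query("&paleo_Interval=Holocene"))  # (-9750, None, 'intersects')
```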

Search Options

Records can also be filtered with

  • intersects - at least one date is inside the range, or the two dates span it; that is, the record is not disjoint from the range - elastic intersects
  • fully contains - the search criteria are inside the record's date range - elastic contains
  • is fully within - both start and end date are within the search range - elastic within
  • is disjoint from - not in the range - elastic: not contains?
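
Since the start and stop dates live in two separate fields, each filter above has to be assembled from per-field range clauses. The following is a hedged sketch of how that could be done, not a settled design. It reuses the field names from the example query in Elastic Design, assumes all values are CE years with start less than or equal to end, and treats "disjoint" as the negation of "intersects", which is one possible answer to the open question in the last bullet. The function name `build_paleo_query` is hypothetical.

```python
import json

START_FIELD = "numeric-paleo-start"
STOP_FIELD = "numeric-paleo-stop"


def _range(field, **bounds):
    """Build a single Elasticsearch range clause."""
    return {"range": {field: bounds}}


def build_paleo_query(search_start, search_end, paleo_filter="intersects"):
    """Build a bool query implementing one filter rule over two fields."""
    if paleo_filter in ("intersects", "disjoint", "disjointed"):
        # Overlap: record starts before the search ends and ends after it starts.
        clauses = [_range(START_FIELD, lte=search_end),
                   _range(STOP_FIELD, gte=search_start)]
    elif paleo_filter == "contains":
        # Record range encloses the search range.
        clauses = [_range(START_FIELD, lte=search_start),
                   _range(STOP_FIELD, gte=search_end)]
    elif paleo_filter == "within":
        # Record range lies inside the search range.
        clauses = [_range(START_FIELD, gte=search_start),
                   _range(STOP_FIELD, lte=search_end)]
    else:
        raise ValueError(f"unknown paleo filter: {paleo_filter}")
    query = {"bool": {"must": clauses}}
    if paleo_filter in ("disjoint", "disjointed"):
        # Assumption: disjoint is "does not intersect".
        query = {"bool": {"must_not": [query]}}
    return {"track_total_hits": True, "query": query}


print(json.dumps(build_paleo_query(-17050, -9750, "within"), indent=2))
```

All bounds use gte and lte, so a date supplied by the user is always included in the match.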




1 Comment

  1. NOAA ISO Paleo temporal example (https://www.ncei.noaa.gov/pub/data/metadata/published/paleo/iso/xml/noaa-icecore-27950.xml):

    <gmd:temporalElement>
        <gmd:EX_TemporalExtent id="boundingTemporalExtent">
            <gmd:extent>
                <gml:TimePeriod gml:id="K1">
                    <gml:beginPosition>-138110</gml:beginPosition>
                    <gml:endPosition>1912</gml:endPosition>
                </gml:TimePeriod>
            </gmd:extent>
        </gmd:EX_TemporalExtent>
    </gmd:temporalElement>