For generic outage communication guidance, see: Communicating outages to the customer

Overview

When a degraded service event (DSE) or planned maintenance impacts Earthdata Search users, we are required to notify the user community through three platforms:

  1. Slack
  2. Email
  3. Status App

A degraded service event also needs to be reported on the Degraded Service Events Wiki page using the template at the top of that page. When a DSE impacts multiple applications, coordinate with the other teams' operations staff so that only one DSE report is written for each event.


Slack

Earthdata Search relies on two Slack channels to communicate to users:

  • #edsc: This is a "public" channel. Messages related to Earthdata Search PROD and UAT should be reported here.
  • #edsc-dev: This channel is made up primarily of internal users. If a DSE or maintenance event only impacts Earthdata Search SIT, it can be posted here alone. Most users in this channel are also in the #edsc channel, so messages don't need to be posted in both channels.

Because these are internal channels, messages can be more technical and explain the reason for a DSE, impacted or dependent applications, or expected resolution times. Post a message at the beginning of a DSE, whenever relevant new information is obtained, and once the DSE has been resolved. If an application that Earthdata Search depends on (such as CMR or Earthdata Login) is experiencing a DSE, any messages from those applications' Slack channels can be shared in the #edsc or #edsc-dev channels.
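The channel rule described above can be sketched as a small helper. This is only an illustration, assuming the environments are named "PROD", "UAT", and "SIT"; the function name slack_channel_for is hypothetical and is not part of any existing tooling.

```python
VALID_ENVIRONMENTS = {"PROD", "UAT", "SIT"}


def slack_channel_for(environments):
    """Return the Slack channel for a DSE or maintenance message.

    `environments` is an iterable of affected environment names
    (hypothetical labels: "PROD", "UAT", "SIT").
    """
    affected = {env.upper() for env in environments}
    if not affected:
        raise ValueError("at least one affected environment is required")
    unknown = affected - VALID_ENVIRONMENTS
    if unknown:
        raise ValueError(f"unknown environment(s): {sorted(unknown)}")
    # Anything touching PROD or UAT goes to the "public" channel.
    if affected & {"PROD", "UAT"}:
        return "#edsc"
    # Only SIT is affected, so the dev channel is sufficient.
    return "#edsc-dev"
```

Because most #edsc-dev members are also in #edsc, the helper returns a single channel instead of both.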

Example Message (Start of Issue): Partners, CMR PROD is currently experiencing degraded service, affecting the ability to retrieve collections in Earthdata Search PROD. We will update this channel when we have more information or an expected resolution time. We apologize for the inconvenience.

Example Message (Resolved): Partners, the issues in CMR PROD have been resolved and Earthdata Search operations have returned to normal. We apologize for the inconvenience.

Example Message (NGAP Maintenance): Partners: A reminder that NGAP will be deploying to each of the EDSC environments this morning and you may experience a short disruption in service during each deployment. Thank you for your patience.

Email

A message must also be sent to the earthdata-status-internal listserv (earthdata-status-internal@lists.nasa.gov) using the template below. This listserv is also made up of the internal user community, but messages sent to it should have more technical empathy than messages posted through Slack. The message should contain only very high-level information notifying users that they can expect some sort of impact to Earthdata Search.

Emails to the listserv must be approved by the listserv admins. The owners who can currently approve messages are:

  • Catalino Cuadrado
  • John Teague
  • Mark Schmele
  • Srinivasa Tummala

Example Message (DSE): Earthdata Search is currently experiencing issues related to the Earthdata Login test scheduled for this morning, 9/16/21, between 8:00 AM - 12:00 PM ET. Users may experience issues when trying to load granules. We are investigating and will have a fix deployed shortly. We will then resume the Earthdata Login fail open test with login disabled in Earthdata Search. When the test is complete, we will re-enable login in Earthdata Search.

Example Message (DSE): Earthdata Search is currently experiencing issues with temporal filtering. Users may see an error banner when trying to filter collections or granules with temporal filters. We are investigating and apologize for the inconvenience. 

Example Message (Planned Maintenance): Earthdata Login is conducting a test today, 9/16/21 between 8:00 AM - 12:00 PM ET for approximately 1 hour. During this time, login for Earthdata Search will be disabled. Users will still be able to download single granules, but will be unable to download collections and granules via the project page. Options for direct download, stage for delivery, or customized orders will not be available during this test. We will remove this alert once the test is complete.

The email must follow a specific template (file link here):

Once the DSE or planned maintenance is resolved, a follow-up email should be sent to the listserv notifying them that it is complete. Reply to the previous email that was sent to the listserv, but include "RESOLVED" in the subject line and update the template:

Status App

See the Status App section on the Search Team Operations Task Transfer page for information on getting approval to post notifications.

The Status App is used across Earthdata applications to post notifications to the user community. The alerts appear under the bell icon at the top of the Earthdata Search page and are visible to all users who visit the website. Notifications can be posted by any approved user of the Status App with permissions to the Earthdata Search pages. Check the existing notifications before posting a new message so that it does not duplicate one that is already active.

There are three types of notifications that can be posted with the Status App:

  1. Message: Displayed in blue. Use to provide general information.
  2. Alert: Displayed in yellow. Use for planned maintenance notifications or DSEs with minor impact.
  3. Outage: Displayed in red and will automatically pop up outside of the alarm bell for all users who visit the Earthdata Search page. Use for major DSEs or impacts from planned maintenance.
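The selection criteria above can be sketched as a simple decision function. This is a minimal illustration, not the Status App's own logic: the event labels ("info", "maintenance", "dse") and the major_impact flag are hypothetical, and the sketch reads "impacts from planned maintenance" as meaning maintenance with major user impact.

```python
from enum import Enum


class NotificationType(Enum):
    MESSAGE = "Message"  # displayed in blue
    ALERT = "Alert"      # displayed in yellow
    OUTAGE = "Outage"    # displayed in red, pops up automatically


def choose_notification_type(event, major_impact=False):
    """Map an event to a Status App notification type.

    `event` is one of the hypothetical labels "info", "maintenance", "dse".
    """
    if event == "info":
        # General information is always a plain Message.
        return NotificationType.MESSAGE
    if event in ("maintenance", "dse"):
        # Assumption: major impact escalates to Outage, otherwise Alert.
        if major_impact:
            return NotificationType.OUTAGE
        return NotificationType.ALERT
    raise ValueError(f"unknown event kind: {event!r}")
```

For example, a DSE that blocks login for all users would be passed as choose_notification_type("dse", major_impact=True), while a short maintenance window with limited impact would use the default.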

Because notifications posted here will be available to all users who visit the Earthdata Search page, the verbiage used should have more technical empathy than messages posted through Slack (similar to the email listserv). The message should only contain very high level information notifying users they can expect some sort of impact to Earthdata Search.

Example Notification (Planned Maintenance - ALERT): Earthdata Login is conducting a test today, 9/16/21 between 8:00 AM - 12:00 PM ET for approximately 1 hour. During this time, login for Earthdata Search will be disabled. Users will still be able to download single granules, but will be unable to download collections and granules via the project page. Options for direct download, stage for delivery, or customized orders will not be available during this test. We will remove this alert once the test is complete.

Example Notification (DSE - OUTAGE): We have detected an issue with Earthdata Login that may impact users' ability to log in. We are currently investigating the issue and apologize for the inconvenience.

Example Notification (DSE - ALERT): The service used by Earthdata Search to process shapefiles is currently down. Users will not be able to upload or process shapefiles for spatial filtering. We apologize for the inconvenience.

Steps to Post a Status App Message:

  1. Log in to the Status App at https://status.earthdata.nasa.gov/.
  2. Click 'Create New Notification'.
  3. Select the 'NOTIFICATION TYPE' based on the criteria mentioned above.
  4. Set the 'START DATE' to a couple of minutes after the time you expect to finish writing the notification. Do not enter an 'END DATE'.
  5. Fill out the 'NOTIFICATION MESSAGE', remembering to use technical empathy.
  6. Select the appropriate applications. A notification can be posted to multiple applications at a time. If planned maintenance will impact Earthdata Search PROD, UAT, and/or SIT, make sure to select all relevant applications.
  7. Click 'Post Notification'.
  8. Visit the Earthdata Search application(s) that were selected to confirm the notification is posted.

Steps to Remove a Status App Message:

  1. Once the DSE or planned maintenance is resolved, return to the Status App and find the notification, either by selecting the 'Applications' drop-down or by paging through the 'Active Notifications'. Click 'Edit' underneath the notification.
  2. Set the 'END DATE' field to a time at least one minute in the future, then click 'Update Notification'.
  3. Visit the Earthdata Search application(s) to confirm the notification has been removed.
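The timing rules in the post and remove steps above (a start date a couple of minutes out, an end date at least one minute out) can be sketched with standard datetime arithmetic. The function names and the default of two minutes are assumptions for illustration. The time zone the Status App form expects is not stated here, so the sketch simply preserves whatever time zone the caller passes in.

```python
from datetime import datetime, timedelta, timezone


def suggested_start(now, writing_minutes=2):
    """Start date: a couple of minutes after `now`, leaving time to finish writing."""
    if writing_minutes < 0:
        raise ValueError("writing_minutes must not be negative")
    return now + timedelta(minutes=writing_minutes)


def suggested_end(now, lead_minutes=1):
    """End date for removal: at least one minute after `now`."""
    if lead_minutes < 1:
        raise ValueError("the end date must be at least one minute out")
    return now + timedelta(minutes=lead_minutes)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("START DATE:", suggested_start(now).isoformat(timespec="minutes"))
    print("END DATE:  ", suggested_end(now).isoformat(timespec="minutes"))
```

The values still have to be entered manually in the 'START DATE' and 'END DATE' fields; the sketch only shows how the offsets are computed.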