The goal of this meeting is to learn what options are available to handle larger amounts of data (e.g., TerraFusion) and to make transformation & serving faster through clustering on AWS.

Date: 2020.08.26 (Wed)

Attendees

Meeting Agenda

Here are some questions for Esri:

  • Scaling options and licensing
    • Is it allowed to put an AWS Marketplace AMI into an auto-scaling group? Will the same active license work if a new instance is created by the load balancer?
      • Yes. Some customers have run this scenario successfully. The license will be authorized automatically. It's good to set the scaling threshold conservatively (e.g., CPU load = 0.6) because of the boot-up time of a new AWS instance.
    • Will the new server instance federate automatically with the portal?
    • Can Portals be put under an auto-scaling group, too?
      • Doable, but not recommended. It's better to have one robust machine (a 16-core machine can serve 1,000 users).
    • How about Notebook Servers? Can they form a Dask cluster automatically?
      • This is on the to-do list for the Notebook team.
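The conservative-threshold advice above can be sketched with a target-tracking scaling policy. This is a minimal sketch, assuming the Marketplace-AMI server instances already sit in an auto-scaling group; the group name `arcgis-server-asg` is a hypothetical placeholder.

```python
def cpu_target_policy(asg_name: str, target_pct: float = 60.0) -> dict:
    """Build kwargs for an EC2 Auto Scaling target-tracking policy that
    keeps average CPU near target_pct, leaving headroom so a new
    instance has time to boot (and authorize its license) before the
    running servers saturate."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-{int(target_pct)}",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # Conservative target (~0.6 load), per Esri's advice above.
            "TargetValue": target_pct,
        },
    }

policy = cpu_target_policy("arcgis-server-asg")  # hypothetical ASG name
# With AWS credentials configured, this would be applied via:
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy)
```

The actual `put_scaling_policy` call is left commented out since it needs credentials and a real auto-scaling group.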
  • How to make service respond faster
    • Can the PostgreSQL DB become a bottleneck if multiple instances make requests to a DB that holds mosaic datasets? Do you recommend using a DB cluster?
      • Splitting and archiving large data into a separate geospatial DB (2 TB) is a possibility.
      • Keeping only use-case-based, high-demand data in PostgreSQL is more sensible.
      • A Databricks/Snowflake-style connector & streaming is possible.
    • What's the best way to distribute traffic based on service and region?
      • For example, if a user comes from the east, serve the user from an EC2 instance in the east region.
      • If a user, regardless of the user's region, asks for the "A" image service, serve the user from an EC2 instance in the west region; if the "B" service, from an EC2 instance in the east.
        • Use Route 53 plus CloudFront as the CDN.
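The region-based case above maps to Route 53 latency-based routing: two records share one name, each tagged with a region, and Route 53 answers with whichever is closest to the user. A minimal sketch of the change batch; the hostname and IPs are hypothetical placeholders.

```python
def latency_record(name: str, region: str, ip: str) -> dict:
    """One UPSERT change for a latency-routed A record. Records sharing
    the same Name but different SetIdentifier/Region form one latency
    routing group."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{name}-{region}",
            "Region": region,  # key that drives latency-based routing
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
        },
    }

change_batch = {
    "Comment": "Serve users from the lowest-latency region",
    "Changes": [
        latency_record("maps.example.com", "us-east-1", "203.0.113.10"),
        latency_record("maps.example.com", "us-west-2", "203.0.113.20"),
    ],
}
# With credentials and a hosted zone ID, this would be applied via:
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z...", ChangeBatch=change_batch)
```

The service-based case ("A" always from west, "B" always from east) would instead use a separate hostname per service, each pointing at a single region, with CloudFront in front for caching.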
  • How to make service handle bigger data
    • Can image / feature services benefit from the big data store? Will it be faster than using a DB for creating or serving mosaic datasets?
    • Is it better to store data as Parquet in a cluster environment, or is CRF (Cloud Raster Format) already optimized for clusters?

Action Items for Next Week


Task List