Link to First Stage of Root Cause Analysis for CMR-9094

Document Overview

This performance analysis is based on replaying a 2021 production workload of search requests, examining only those requests whose target format is UMM-JSON. The search requests were run using LogMiner (which parses a set of production logs and fires the same search requests it reads), with default pacing in the full workload environment, against a modified version of CMR in which key functions were wrapped with a 'time-execution' function that reports how long each call takes, along with extra logging to record those times. The logs captured the time in milliseconds (ms) that a request spends in 2 key areas:

...
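As a rough illustration of the 'time-execution' wrapping described in the overview, the sketch below shows one way a function can be wrapped so that its duration in milliseconds is logged. It is written in Python purely for illustration and is not the code used in the modified CMR; the names `time_execution`, `transform`, and the label `"transform-stage"` are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("timing_sketch")  # hypothetical logger name


def time_execution(label):
    """Return a decorator that logs how long the wrapped function takes, in ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Logged even if the wrapped function raises.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                logger.info("%s took %.1f ms", label, elapsed_ms)
        return wrapper
    return decorator


@time_execution("transform-stage")  # hypothetical label
def transform(records):
    """Stand-in for a key function whose duration we want to measure."""
    return [r.upper() for r in records]
```

Because the wrapper returns the wrapped function's result unchanged, timing can be added to a function without altering its behavior; only the extra log line is new.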

  • It is standard advice to use the median instead of the avg (mean/average) when looking at data that is highly variable with many extreme outliers, as we have here. At the same time, outliers are of some interest, because the pressing question is more "how can we improve those requests which are slow" than "how can we improve all requests". For that reason, avg is used alongside median where appropriate.
  • 95th percentiles and 75th percentiles are shown alongside median where appropriate, to give a sense of distribution/variance without showing the highest outliers in these skewed distributions.
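To make the reasoning about skewed data concrete, here is a minimal sketch using Python's standard `statistics` module. The durations are made-up values chosen only to show the effect of two slow outliers; they are not measurements from this analysis.

```python
import statistics

# Hypothetical request durations in ms: mostly fast, with two slow outliers.
durations_ms = [12, 15, 14, 13, 16, 15, 14, 13, 900, 2500]

mean_ms = statistics.mean(durations_ms)
median_ms = statistics.median(durations_ms)

# quantiles(n=100) returns 99 cut points; index 74 is the 75th percentile
# and index 94 is the 95th percentile.
cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
p75_ms = cuts[74]
p95_ms = cuts[94]

print(f"mean={mean_ms} median={median_ms} p75={p75_ms} p95={p95_ms}")
```

In this toy data the two outliers pull the mean far above the median, while the 75th percentile stays close to the typical request and the 95th percentile reflects the slow tail without reporting the single worst value.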

Change log

  • 6/8/23 – Did not update section 'Comparing UMM-JSON transformation step(s) vs. total Transform Stage', but otherwise:
    • Changed charts to use binning, which A) provides labels on x axes, and B) 'smooths out' the presence of outliers. Thus, mentions of outliers have been crossed out.
    • Added date to all Splunk queries (so they work as-is without selecting a date)
    • Changed use of average and max to 95th percentile and 75th percentile, to lessen impact of outliers and because that's generally better for skewed distributions
    • Updated all descriptive text referencing the above, as appropriate
    • Major updates to section 2: added more analysis to the migrate umm step section (Table D replaced with Charts D1, D2); corrected errors in, and added analysis to, the umm lib step section (updated Table E; added Charts F1, F2, F3 by metadata length and Tables G1, G2 by provider)

...