Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • It is standard advice to use median instead of avg (mean/average) when looking at data that is highly variable with many outliers that have extreme values, as we have here.
  • At the same time, to some extent outliers are of interest, because the pressing question is moreso "how can we improve those requests which are slow" over improving all requests. For that reason, avg is used alongside median where appropriate.

Change log

  • 6/8/23 – Did not update section 'Comparing UMM-JSON transformation step(s) vs. total Transform Stage', but otherwise:
    • Changed charts to use binning, which A) provides labels on x axes, and B) 'smooths out' the presence of outliers. Thus, mentions of outliers have been crossed out.
    • Added date to all splunk queries (so they work as is without selecting date)
    • Changed use of average and max to 95th percentile and 75th percentile, to lessen impact of outliers and because that's generally better for skewed distributions
    • All descriptive text referencing the above, changed as appropriate
    • Major updates to section 2: added more analysis to migrate umm step section (Table D replaced with Charts D1, D2), and corrected errors as well as added analysis to umm lib step section (updated Table E, added Charts F1, F2, F3)

Section 1 -- Fetch stage vs Transform stage per request (batch)

...