Doug noticed a new format in the logs coming from 1.1. In doing so, he realized that all queries needed to be updated to keep the current metric reporting working.
That's a bit of a beatdown, given that the hope was to just update sourcetypes and have everything work.
This works for EDSC with some manual work.
The biggest concern for right now is triaging issues.
Applications are just providing text. Splunk has some logic/black magic that identifies events as such and formats them in certain ways.
Some more notes on what Docker/NGAP 1.1 is doing to set up application logs:
We use the Splunk log driver that's built into Docker, configured with the Splunk URL and HEC token that Docker uses to send events (one event per line of the container's output). The events Docker sends package the line of text in a JSON object containing metadata about the event, container, ECS service, and instance; this info is critical for debugging any issues that come up, as it's the only way we can track a log event down to the specific ECS service, task, and container that produced it. Example event:
(screenshot: example event)
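For reference, the Docker docs' example of an inline-format event has this shape (the `line`, `source`, `tag`, and `attrs` fields are standard; the actual attributes carrying our ECS/instance metadata depend on the labels/env configured on the container):

```json
{
  "line": "my message",
  "source": "stdout",
  "tag": "MyImage/MyContainer",
  "attrs": {
    "env1": "val1",
    "label1": "label1"
  }
}
```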
Extracting just the line field can be done through queries such as `... | table line`. Searching within a specific field can be done like: `... | search line="*error*"`.
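Putting those together, a typical triage query might look like this (the index name is an assumption; use whatever index the HEC token routes to):

```
index=main
| search line="*error*"
| table _time, line
```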
These events use the Splunk log driver's default "inline" format. There are also "json" and "raw" options (see the example driver config after these notes):
"json" - assumes the text (line) is JSON, and parses it (falls back to `inline` if it can't be parsed). The parsed JSON is still, as with the `inline` format, passed under the `line` field; just as a structured JSON object, rather than as a string.
"raw" - Prefixes each line of text w/ all of the attributes and tags. The text of the line is otherwise unchanged. This still leaves alot for consumers to wade through, but possibly easier to read/trim than the JSON-formatted output--as long as we don't lose all the container metadata (or preserve it in event attributes rather than in the text of the event). Example from docs:
MyImage/MyContainer env1=val1 label1=label1 my message
Since the "raw" format adds metadata at the beginning of each line, it's likely to still break whatever logic Splunk has for concatenating lines of a stack trace.
All three formats treat a single line of output as an event.
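For concreteness, a sketch of how the format gets selected when configuring the driver; the URL, token, and tag template here are placeholders, not our actual config:

```sh
# Placeholder URL and HEC token; splunk-format selects "inline" (default),
# "json", or "raw".
docker run --rm \
  --log-driver=splunk \
  --log-opt splunk-url=https://splunk.example.com:8088 \
  --log-opt splunk-token=00000000-0000-0000-0000-000000000000 \
  --log-opt splunk-format=raw \
  --log-opt tag="{{.ImageName}}/{{.Name}}" \
  my-app-image
```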
NGAP also has a CloudWatch-to-Splunk forwarder that's used for, e.g., getting Lambda logs into Splunk. We're starting to rely on this more and more, as CloudWatch is the one logging mechanism that's (most likely to be) built into any given AWS service. Docker also has a CloudWatch log driver that we could use instead of the Splunk driver, to get the logs to where they'd be picked up by the CloudWatch-to-Splunk forwarder. The CloudWatch driver uses the log group and log stream to provide container-specific metadata, rather than embedding it in the log text:
(screenshot; query: `index=main log_group::/ngap/ecs-prod/app/nclark4-prod-test-app`)
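For comparison, a sketch of what switching to the CloudWatch (awslogs) driver would look like; the region, group, and stream names here are assumptions modeled on the query above:

```sh
# Region, group, and stream names are assumptions modeled on the query
# above; awslogs-* are the standard CloudWatch log driver options.
docker run --rm \
  --log-driver=awslogs \
  --log-opt awslogs-region=us-east-1 \
  --log-opt awslogs-group=/ngap/ecs-prod/app/nclark4-prod-test-app \
  --log-opt awslogs-stream=nclark4-prod-test-container \
  my-app-image
```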
This doesn't solve the problem of concatenating stack-trace lines, though, unless Splunk has built-in line-merging logic that can recognize and handle the traces now that the event text is just the raw output, without the additional metadata.
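If Splunk does have such logic, it would live in the line-merging settings in props.conf; a sketch of the kind of stanza that keeps stack traces together, assuming a hypothetical sourcetype name and timestamp-prefixed log lines:

```ini
# Hypothetical sourcetype; merges lines that don't start with a
# timestamp into the previous event, so stack traces stay together.
[ngap:ecs:app]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
MAX_EVENTS = 512
TRUNCATE = 10000
```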