Content provided by Christoph Neumann and Nate Jones. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Christoph Neumann and Nate Jones or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal
Ep 029: Problem Unknown: Log Lines
Nate is dropped in the middle of a huge log file and hunts for the source of the errors.
- We dig into the world of DonutGram.
- "We are the devops."
- Problem: we start seeing errors in the log.
- "The kind of error that makes total sense to whoever wrote the log line."
- (04:10) We want to use Clojure to characterize the errors.
- Why not grep?
- Well... we grep out the lines and count them: the error shows up 10,351 times. Is that a lot? Hard to say. We need more info.
- (05:50) Developers think it's safe to ignore, but it's bugging us.
- We want to come up with an incident count by user.
- "Important thing we do a lot in the devops area: we see data and then we have to figure out, what does it mean?"
- "What story is the data telling us?"
- The distinction between "forensic data" and "operating data" (Ep 022, 027)
- "With forensic data, you're trying to save the state of the imprint of the known universe over time."
- (07:40) The log file is the most common forensic data for applications.
- To characterize, we need to manipulate the data at a higher level than grep.
- First goal: make the data useful in Clojure
- Get a nice sample from the production log, put it in a file, load up the REPL, and start writing code to parse it.
- Need to parse out the data from the log file and turn it into something more structured.
- (12:05) Let's make a function, `parse-line`, that separates out all the common elements for each line:
  - timestamp
  - log level
  - thread name
  - package name
  - freeform "message"
- "A message has arrived from the developers, lovingly bundled in a long log line."
- (13:10) We can parse every line generically, but we need to parse the message and make sense of that.
- "We want to lift up the unstructured data into structured data that we can map, filter, and reduce on."
- We will have different kinds of log lines, so we need to detect them and parse appropriately.
- We want to amplify the map to include the details for the specific kind of log line.
- We'll use a `kind` field to identify which kind of log line it is.
- There are two steps:
  - recognize which kind of log line the "message" is for
  - parse the data out of that message
- "Maybe we'll have a podcast that's constantly about `constantly`. It's not just `juxt`."
- (18:45) How do we do this concisely?
- Let's use `cond`:
  - flatten out all the cases
  - code to detect kind (the condition)
  - code to parse the details (the expression)
- Can use `includes?` in the condition to detect the kind and `re-matches` to parse.
- (20:00) Why not use the regex to detect the kind too?
- Can avoid writing the regex twice by using a `def`.
- It's less code, but now we're running the regex twice: once in the condition and once to parse.
- We can't capture the result of the test in `cond`. Clojure has no built-in `cond-let` macro.
- We could write our own `cond-let` macro, but should we?
- "When you feel like you should write a macro, you should step back and assess your current state of being. Am I tired? Did I have a bad day? Do I really need this?"
- (24:05) New goals for our problem:
  - one regex literal
  - only run the regex once
  - keep the code that uses the matches close to the regex
- This is similar to "routes" for web endpoints: we want each route definition next to the code that runs with that route's data.
- Feels like an index or table of options.
- (25:20) Let's make a "table": a vector of regex and handler-code pairs.
- We need a coffee mug: "Clojure. It's just a list."
- The code can be a function literal that takes the matches and returns a map of parsed data.
- Write a function, `parse-details`, which goes through each regex until one matches and then invokes the handler for that one. (See below.)
- (30:15) Once we have higher-level data, it's straightforward to `filter` and `group-by` to get our user count.
- Once again, the goal is to take unstructured data and turn it into structured data.
- "You have just up-leveled unstructured information into a sequence of structured information."
- We can `slurp`, `split`, `map`, `filter`, and then aggregate.
- (32:10) What happens when we try to open a 10 GB log file?
- Sounds like a problem for next week.
- "Fixing production is always a problem for right now."
- "If a server falls over and no one outside of the devops team knows about it, did the server really fall over?"
- "Who watches the watchers?"
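The (18:45) `cond` approach from the notes, detecting the kind with `includes?` in each condition and parsing with `re-matches` in each expression, might look roughly like the sketch below. The namespace, the function name, and the second message kind (`:user-login`, with its message text) are hypothetical, added only so `cond` has more than one case to flatten.

```clojure
(ns devops.cond-sketch
  (:require [clojure.string :as string]))

(defn parse-message-details
  "Returns a map of kind-specific details for a log message, or nil if
  the message is not a recognized kind."
  [message]
  (cond
    ;; Condition: detect the kind with a cheap substring check.
    (string/includes? message "code 357")
    ;; Expression: parse the details out with re-matches.
    (let [[_whole user] (re-matches #"transaction failed while updating user ([^:]+): code 357"
                                    message)]
      {:kind :code-357
       :code-357/user user})

    ;; Hypothetical second kind, so cond has more than one case.
    (string/includes? message "logged in")
    (let [[_whole user] (re-matches #"user (\S+) logged in" message)]
      {:kind :user-login
       :user-login/user user})

    :else nil))
```

One weakness this makes visible: the substring check and the regex can disagree. A message that contains "code 357" but has some other shape still produces `:kind :code-357`, just with a `nil` user, because `re-matches` returns `nil` and destructuring `nil` binds nothing.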
Clojure in this episode:
- `#""`
- `re-matches`
- `case`
- `cond`
- `if-let`
- `->>`
- `slurp`
- `filter`
- `group-by`
- `def`
- `constantly`
- `juxt`
- `clojure.string/`
  - `includes?`
  - `split`
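Two functions in this list, `constantly` and `juxt`, don't appear in the code sample. The sketch below shows one plausible use of each with this episode's parsed log entries; how the episode itself used them isn't recorded in these notes, so the `:heartbeat` kind and the names `heartbeat-handler`, `timestamp`, and `chronological` are assumptions.

```clojure
(ns devops.helpers-sketch)

;; `constantly` returns a function that ignores its arguments and always
;; returns the same value. As a table handler it would ignore the regex
;; matches; the :heartbeat kind is hypothetical.
(def heartbeat-handler
  (constantly {:kind :heartbeat}))

;; `juxt` combines several functions into one that returns a vector of
;; their results, here the date and time of a parsed entry.
(def timestamp
  (juxt :log/date :log/time))

(defn chronological
  "Sorts parsed entries by date, then time. Vectors of the same length
  compare element by element, and zero-padded ISO dates and times sort
  correctly as strings."
  [entries]
  (sort-by timestamp entries))
```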
Code sample from this episode:
```clojure
(ns devops.week-01
  (:require [clojure.java.io :as io]
            [clojure.string :as string]))

;; General parsing

(def general-re
  #"(\d\d\d\d-\d\d-\d\d)\s+(\d\d:\d\d:\d\d)\s+\|\s+(\S+)\s+\|\s+(\S+)\s+\|\s+(\S+)\s+\|\s(.*)")

(defn parse-line
  "Splits a raw log line into its common fields, or returns nil if the
  line doesn't match the general format."
  [line]
  (when-let [[whole dt tm thread-name level log-ns message] (re-matches general-re line)]
    {:raw/line      whole
     :log/date      dt
     :log/time      tm
     :log/thread    thread-name
     :log/level     level
     :log/namespace log-ns
     :log/message   message}))

(defn general-parse [lines]
  (->> lines
       (map parse-line)
       (filter some?)))

;; Detailed parsing

;; A "table" of [regex handler] pairs. Each handler receives the vector
;; returned by re-matches and returns a map of kind-specific details.
(def detail-specs
  [[#"transaction failed while updating user ([^:]+): code 357"
    (fn [[_whole user]]
      {:kind :code-357
       :code-357/user user})]])

(defn try-detail-spec [message [regex handler]]
  (when-some [matches (re-matches regex message)]
    (handler matches)))

(defn parse-details [entry]
  (let [{:keys [log/message]} entry]
    (if-some [extra (->> detail-specs
                         (map (partial try-detail-spec message))
                         (filter some?)
                         (first))]
      (merge entry extra)
      entry)))

;; Log analysis

(defn lines [filename]
  (->> (slurp filename)
       (string/split-lines)))

(defn summarize [filename calc]
  (->> (lines filename)
       (general-parse)
       (map parse-details)
       (calc)))

;; Data summarizing

(defn code-357-by-user [entries]
  (->> entries
       (filter #(= :code-357 (:kind %)))
       (map :code-357/user)
       (frequencies)))

(comment
  ;; With the three sample log lines in sample.log: {"joe" 2, "sally" 1}
  (summarize "sample.log" code-357-by-user))
```
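The table in `detail-specs` grows by one `[regex handler]` pair per new kind of log line. The sketch below adds a second, hypothetical kind (`:queue-full`, with an invented message format) and looks up details with `some` instead of `map`, `filter`, and `first`; the namespace and `details-for` are likewise hypothetical names.

```clojure
(ns devops.table-sketch)

;; Same [regex handler] table shape as detail-specs in the code sample.
;; The second pair, for a :queue-full message, is hypothetical.
(def detail-specs
  [[#"transaction failed while updating user ([^:]+): code 357"
    (fn [[_whole user]]
      {:kind :code-357
       :code-357/user user})]
   [#"queue (\S+) is full \((\d+) items\)"
    (fn [[_whole queue size]]
      {:kind :queue-full
       :queue-full/queue queue
       :queue-full/size (Long/parseLong size)})]])

(defn details-for
  "Runs each regex in turn against `message` and returns the first
  handler result, or nil if no regex matches."
  [message]
  (some (fn [[regex handler]]
          (when-some [matches (re-matches regex message)]
            (handler matches)))
        detail-specs))
```

`some` stops at the first handler that returns a map. In `parse-details`, mapping over a vector produces a chunked lazy sequence, so `first` may realize up to 32 results at once, running regexes that come after the one that matched. With a single spec the difference doesn't show, but it starts to matter as the table grows.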
Log file sample:
```
2019-05-14 16:48:55 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user joe: code 357
2019-05-14 16:48:56 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user sally: code 357
2019-05-14 16:48:57 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user joe: code 357
```
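The (30:15) notes describe the user count in terms of `filter` and `group-by`, while `code-357-by-user` in the code sample uses `frequencies`. A `group-by` variant, sketched below against hand-written entries shaped like what `parse-details` would return for the three sample lines, gives the same counts; the namespace and var names are hypothetical.

```clojure
(ns devops.group-by-sketch)

;; Hand-written entries standing in for parse-details output on the three
;; sample lines; only the keys this function reads are included.
(def sample-entries
  [{:kind :code-357 :code-357/user "joe"}
   {:kind :code-357 :code-357/user "sally"}
   {:kind :code-357 :code-357/user "joe"}])

(defn code-357-by-user-grouped
  "Counts :code-357 entries per user: group them by user, then count
  each group."
  [entries]
  (->> entries
       (filter #(= :code-357 (:kind %)))
       (group-by :code-357/user)
       (map (fn [[user group]] [user (count group)]))
       (into {})))
```

`frequencies` is the shorter choice when only the counts matter; `group-by` keeps each user's entries around, which helps when the next question is about timestamps or threads.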
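Because every sample line separates its fields with ` | `, `clojure.string/split` offers a lighter alternative to `general-re` for the generic parse. This is a swapped-in technique, not the episode's method, and `parse-line-split` is a hypothetical name; it returns the same keys as `parse-line`.

```clojure
(ns devops.split-sketch
  (:require [clojure.string :as string]))

(defn parse-line-split
  "Hypothetical alternative to the regex-based parse-line: splits a log
  line on its ` | ` separators. Returns nil when a line does not have
  all five fields."
  [line]
  ;; A limit of 5 stops splitting after the fourth separator, so any
  ;; ` | ` inside the freeform message stays part of the message.
  (let [fields (string/split line #"\s+\|\s+" 5)]
    (when (= 5 (count fields))
      (let [[stamp thread-name level log-ns message] fields
            [dt tm] (string/split stamp #"\s+")]
        {:raw/line      line
         :log/date      dt
         :log/time      tm
         :log/thread    thread-name
         :log/level     level
         :log/namespace log-ns
         :log/message   message}))))
```

Unlike `general-re`, this doesn't check that the first field looks like a date and time, and it trims all whitespace around each separator.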