Leaping into Log Analysis

If you’re like me, sifting through tons of data by hand is one of the most boring and tedious processes known to man. Thankfully, we have Bash (the Unix shell command language) to streamline our search. I’ll walk you through the fundamentals and show you how you can put them together to get the answers you need.

Background

For those of you who are not already familiar with them, “a log, in a computing context, is the automatically produced and time-stamped documentation of events relevant to a particular system. Virtually all software applications and systems produce log files.”1 Out in the wild, log analysis is often under-appreciated, but it becomes very important when you’re trying to identify the source of a breach.

You want to know which IP address connected to your server and downloaded files? Logs can tell you that. You want to know how many different users are trying to log into your system without proper credentials? You can find that too.

Commands

| Command | Use | Example | Common Flags |
| --- | --- | --- | --- |
| `cat` | Outputting text | `cat output2.txt` (takes the text of the file output2.txt and prints it to the terminal window) | |
| `\|` | Piping | `cat output2.txt \| [2nd command]` (takes the text of output2.txt and uses it as the input for the next command) | N/A |
| `grep` | Pattern matching | `grep -i 'example'` (prints all lines containing 'example') | `-i`, `-v` |
| `awk` | Everything | `awk '{print $3}'` (prints the 3rd column of text) | |
| `wc` | Word count | `cat file.txt \| wc -l` (counts the lines in file.txt) | `-l`, `-c`, `-w` |
| `sort` | Sorts output | `cat file.txt \| awk '{print $7}' \| sort -d` (takes the output from awk and sorts it in dictionary order) | `-d`, `-n` |
| `uniq` | Removes duplicates | `cat file.txt \| sort \| uniq` | `-c` |

cat

The most fundamental of commands. cat is not just a cute animal that people make memes about; it is also a command that prints the full text of a file to your terminal. “Now WebWitch”, you might say, “I can read the file in my GUI, why do I need to print it out to the terminal?” Though cat might seem rather useless on its own, if you feed its output into the input of another command, you can sift through for exactly the information you need.
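A minimal sketch of cat on its own (the filename and contents here are made up for illustration):

```shell
# demo.txt and its contents are invented for this example
printf 'first line\nsecond line\n' > demo.txt

# print the file's contents to the terminal
cat demo.txt
```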

|

This vertical bar (generally found on the right side of your keyboard, above the Enter key) is called a pipe. Like its namesake, it directs the output of one command into the input of another.
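For instance, piping cat's output into wc to count lines (the filename is hypothetical):

```shell
# three sample lines; the pipe feeds cat's output to wc
printf 'one\ntwo\nthree\n' > demo.txt
cat demo.txt | wc -l    # prints 3
```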

grep

This is a basic pattern-matching tool, mostly used to find specific words in a piece of text. If I wanted to find all the lines in the log that say CONNECT, I would pipe the log into grep CONNECT.
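A sketch of that search (access.log and its line format are invented for illustration):

```shell
# sample log entries; the format is an assumption
printf '10.0.0.1 CONNECT server\n10.0.0.2 LOGIN server\n' > access.log

# keep only the lines containing CONNECT
cat access.log | grep CONNECT
```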

Notice how we use cat to feed the contents of the file into the input of our grep command.

By default, grep searches for text that matches the case of whatever you put in. If I typed grep connect, it wouldn’t return any line with CONNECT, Connect, etc. If you want the search to ignore case, pass in the -i flag.
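For example, with two sample lines that differ only in case:

```shell
# sample data invented for illustration
printf 'alice connect ok\nbob CONNECT ok\n' > access.log

cat access.log | grep -i connect    # matches both lines regardless of case
```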

If you need grep to search for a short phrase, encapsulate the phrase in quotes.
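The quotes make grep treat the whole phrase, space included, as one pattern (sample data below is invented):

```shell
printf 'FAIL LOGIN alice\nFAIL DOWNLOAD bob\n' > access.log

# without the quotes, LOGIN would be treated as a filename, not part of the pattern
cat access.log | grep 'FAIL LOGIN'
```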

If you’re in a situation where you’re not quite sure what you need out of the log file, but you’re very certain of what you don’t need, you can use the -v flag to match everything except what you put in the quotes.
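A sketch of the inverted match (again on invented sample lines):

```shell
printf 'FAIL LOGIN alice\nOK LOGIN bob\n' > access.log

# -v inverts the match: keep every line that does NOT contain the phrase
cat access.log | grep -v 'FAIL LOGIN'
```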

Note: -v still matches case exactly, whether you pass a single word or a phrase; combine it with -i if you need a case-insensitive inverted match.

This is not all grep can do! Grep has even more flags you can pass in for different functionality — check the man page for details.

wc

Now that we have all of the lines where a user is trying to CONNECT to the server, how do we determine how many connections there are? wc can help! This command counts the number of words, lines, or characters of the input you give it. In the NCL, it’s most commonly used with the -l flag after a grep command to count the number of lines that match the conditions you’ve searched for. If I needed to find how many failed logins there were across all users, I would grep for the failed-login lines and count them with wc -l.
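Putting grep and wc together (the log format and phrasing are assumptions):

```shell
# three sample entries; only two are failed logins
printf 'FAIL LOGIN alice\nfail login bob\nOK LOGIN carol\n' > access.log

cat access.log | grep -i 'fail login' | wc -l    # prints 2
```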

Note: -l gives number of lines, -w gives number of words, and -m gives number of total characters.

awk

This is an extremely powerful command, and we will only be scratching the surface of it. The function I used it most often for is its ability to print data out by individual columns, based on a separator you define; by default, it splits on whitespace. Let’s say you’ve identified all the lines where users are connecting to the server in our example log. If you only want their IP addresses, tell awk to print just the column that holds them.
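A sketch of pulling out the IP column. In this invented format the IP happens to be the first column; always check where it sits in your own log:

```shell
printf '10.0.0.1 CONNECT server\n10.0.0.2 CONNECT server\n' > access.log

# $1 is the first whitespace-separated column of each line
cat access.log | grep CONNECT | awk '{print $1}'
```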

$0 prints out the whole line, and $NF gives you the last column of each row. There are thousands of uses for awk, but going over all of them would take a blog *series*. I highly suggest looking up some of its other uses on your own; you might find them very helpful.

sort and uniq

These two commands are the cherry on top of your filtered data. Many NCL questions run along the lines of “What is the name of the user who failed to connect to the server the 3rd-most times?” or “How many different IP addresses tried to connect to the server?” When you have a long list of potential answers, these two commands are your friend. sort, as you may assume, sorts all input lines alphabetically. uniq removes duplicate lines that are next to each other, which is why you usually sort first. Passing the -c flag to uniq also counts how many duplicates existed in the data. You can combine these two commands in different orders to suit the needs of the question at hand.
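A common pattern is to count occurrences and then rank them (the file of IPs below is invented):

```shell
# a column of IPs with one duplicate
printf '10.0.0.2\n10.0.0.1\n10.0.0.2\n' > ips.txt

# sort so duplicates are adjacent, collapse them with a count,
# then sort numerically so the highest count ends up last
cat ips.txt | sort | uniq -c | sort -n
```

To answer "how many different IPs", pipe the result of `sort | uniq` into `wc -l` instead.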

Pro Tips:

  1. Not all log files look alike. One of the biggest hurdles you’ll face is first understanding the structure of each entry. Spend a bit of time getting to know the data you’re looking at before diving in; it will save you a lot of wrong answers later.
  2. Make sure you understand the output you’re getting from each command you write. You might think you’re only grabbing lines that say FAIL LOGIN when you search for FAIL, but some lines with FAIL DOWNLOAD might have sneaked in there without you noticing, throwing off the final count of whatever you were originally looking for.
  3. If you’re having difficulty deciphering exactly what information is in the log file, the title of the module may provide some insight. You may be able to find other examples online with that same log structure that are explained in more detail.
  4. If the question asks how many bytes were uploaded onto the server, it is probably smart to grep for the lines with “upload” and see what information the log gives you right off the bat. You can always whittle it down or expand your search from there.
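Putting the last tip together with awk, here is one way such a byte count might be sketched. The log format and the byte column ($3) are pure assumptions; inspect your own log before trusting any column number:

```shell
# invented transfer log: user, action, bytes
printf 'alice upload 512\nbob upload 2048\ncarol download 100\n' > xfer.log

# keep the upload lines, then sum the byte column
cat xfer.log | grep -i upload | awk '{sum += $3} END {print sum}'    # prints 2560
```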

With all of that, I bid you bon voyage on your journey through the sea of logs!

© 2019-2020 WebWitch | Security Consultant | Assistant Chief Player Ambassador, National Cyber League