Nginx

Prompt

Analyze an nginx access log and answer questions about what happened.

access.log (13.1 KB)

Tutorial Video

Walk-Through

Video tutorial: Cyber Skyline NCL Summer Live - Log Analysis 1 - July 8 2021

This challenge involves analyzing an nginx access log. The questions can be solved through manual inspection of the file and by using basic Linux commands to parse the log.

Looking through the first few lines of the log, it is apparent that the IP addresses are the first field in each line:

[screenshot: the first few lines of access.log, with the IP address as the first field]

To answer the first question, the IPs need to be extracted, sorted, deduplicated, and counted. This can be done with cut, sort, uniq, and wc:

[screenshot: the cut/sort/uniq/wc pipeline and its output]
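As a sketch, here is how that pipeline behaves on a tiny hand-made sample (the addresses and requests below are hypothetical, not taken from the real access.log):

```shell
# Build a three-line sample in nginx "combined" log format (hypothetical data).
cat > sample.log <<'EOF'
203.0.113.5 - - [08/Jul/2021:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"
198.51.100.7 - - [08/Jul/2021:10:00:02 +0000] "GET /missing HTTP/1.1" 404 152 "-" "curl/7.68.0"
203.0.113.5 - - [08/Jul/2021:10:00:03 +0000] "POST /login HTTP/1.1" 200 88 "-" "curl/7.68.0"
EOF

# Field 1 (space-delimited) is the client IP; sort groups duplicate addresses
# together so uniq can collapse them, and wc -l counts what remains.
cut -d ' ' -f 1 sample.log | sort | uniq | wc -l   # 2 unique IPs in this sample
```

As an aside, sort -u combines the sort and uniq steps into one command.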

For a more thorough explanation of these commands, refer to the Walkthrough for Log Analysis Challenge Login.

Extracting data from a column in a log file:

Looking at the first screenshot, the HTTP return codes are in the fourth field from the last:

[screenshot: a log line with the HTTP return code highlighted]

The field right before it is enclosed in double quotes, so " can be used as a delimiter with cut. The first field before the " will contain data from the IP address to the timestamp. The second field, starting with GET, is the actual HTTP request. Therefore, the return codes will be the third field when a double quote is used as the delimiter.

💡

Note: the delimiter passed to cut is a single double-quote character enclosed in single quotes, i.e. -d '"'.
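To see how the double-quote delimiter splits a line, here is a sketch on a single hypothetical log line:

```shell
# One sample line in combined log format (hypothetical data).
line='203.0.113.5 - - [08/Jul/2021:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"'

echo "$line" | cut -d '"' -f 1   # IP address through timestamp
echo "$line" | cut -d '"' -f 2   # the HTTP request itself: GET / HTTP/1.1
echo "$line" | cut -d '"' -f 3   # " 200 612 " - the return code lives here
```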

A portion of the output of cat access.log | cut -d '"' -f3 is shown here:

[screenshot: output of cat access.log | cut -d '"' -f3]

To eliminate the second column from this output, the output can be piped through another cut command using a space as the delimiter. It looks like there is a space before the HTTP return codes as well, so the codes will be the second field after the first “space”:

[screenshot: output after the second cut with a space delimiter]

Now that we are working with the HTTP return codes, we can sort and count the code occurrences:

[screenshot: sorted and counted HTTP return codes]
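Putting the two cut steps together with sort and uniq -c, a sketch on hypothetical sample lines:

```shell
# Hypothetical sample log in combined format.
cat > sample.log <<'EOF'
203.0.113.5 - - [08/Jul/2021:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"
198.51.100.7 - - [08/Jul/2021:10:00:02 +0000] "GET /missing HTTP/1.1" 404 152 "-" "curl/7.68.0"
203.0.113.5 - - [08/Jul/2021:10:00:03 +0000] "POST /login HTTP/1.1" 200 88 "-" "curl/7.68.0"
EOF

# Quote-delimited field 3 is " 200 612 "; within it, the space-delimited
# field 2 is the return code. uniq -c prefixes each code with its count,
# and sort -rn puts the most frequent code first.
cut -d '"' -f 3 sample.log | cut -d ' ' -f 2 | sort | uniq -c | sort -rn
# Expected shape:
#   2 200
#   1 404
```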

Matching patterns with grep:

The remaining parts of this challenge require using grep, a tool that searches entries for a keyword. Refer to the Linux: Basic Commands Walkthrough for more information on using grep. The -o flag tells grep to print only the part of the line that matches the pattern, instead of the entire line.

[screenshot: grep -o example output]
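For example, grep -o can pull just a browser version string out of a line. The log line and pattern below are illustrative, not from the real file:

```shell
# Hypothetical log line with a Firefox user agent.
line='203.0.113.5 - - [08/Jul/2021:10:00:04 +0000] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"'

# Without -o, grep prints the whole matching line; with -o it prints
# only the text that the pattern actually matched.
echo "$line" | grep -o 'Firefox/[0-9.]*'   # prints: Firefox/78.0
```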

Extracting columnar data with awk

Before answering the questions about HTTP methods, it may be helpful to learn more about them here. Solving the questions about the HTTP methods used can be approached in two ways: using cut or awk.

cut can be used to extract the field containing the HTTP request (the second field enclosed in double quotes). A second cut then extracts the first space-delimited field of that output, which is the actual HTTP request method. The output of that is sorted and counted using uniq.

sort -rn will list the output in reverse numeric order, so the term with the highest number of occurrences is listed at the top.

[screenshot: the cut-based pipeline and its sorted counts]
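A sketch of the cut-based approach on hypothetical sample lines:

```shell
# Hypothetical sample log in combined format.
cat > sample.log <<'EOF'
203.0.113.5 - - [08/Jul/2021:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"
198.51.100.7 - - [08/Jul/2021:10:00:02 +0000] "GET /missing HTTP/1.1" 404 152 "-" "curl/7.68.0"
203.0.113.5 - - [08/Jul/2021:10:00:03 +0000] "POST /login HTTP/1.1" 200 88 "-" "curl/7.68.0"
EOF

# Quote-delimited field 2 is the request (e.g. GET / HTTP/1.1); its first
# space-delimited field is the method. sort -rn lists the biggest count first.
cut -d '"' -f 2 sample.log | cut -d ' ' -f 1 | sort | uniq -c | sort -rn
# Expected shape:
#   2 GET
#   1 POST
```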

awk can also be used to get the desired output. It is a powerful text processing tool that treats any run of whitespace as a single field separator by default, whereas cut uses only a tab as its default delimiter. Therefore, for awk, the HTTP request method is the 6th field from the left.

[screenshot: the awk pipeline and its sorted counts]
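The awk equivalent, sketched on the same kind of hypothetical sample. Note that with default whitespace splitting, $6 carries the opening double quote along with the method (e.g. "GET), which does not affect the counts:

```shell
# Hypothetical sample log in combined format.
cat > sample.log <<'EOF'
203.0.113.5 - - [08/Jul/2021:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"
198.51.100.7 - - [08/Jul/2021:10:00:02 +0000] "GET /missing HTTP/1.1" 404 152 "-" "curl/7.68.0"
203.0.113.5 - - [08/Jul/2021:10:00:03 +0000] "POST /login HTTP/1.1" 200 88 "-" "curl/7.68.0"
EOF

# Whitespace-delimited fields: $1 IP, $2 and $3 "-", $4 and $5 the timestamp,
# $6 the method (with the leading double quote still attached).
awk '{print $6}' sample.log | sort | uniq -c | sort -rn
```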

Backslash - Escape character:

The last question prompts us to look for a raw byte sequence, written out in the log as literal text such as \x04. A backslash is an escape character in grep's regular expressions, so in a pattern it will be interpreted rather than matched literally. To search for a literal backslash, it must be escaped with another backslash, as in grep '\\x04' access.log. This ensures grep matches the backslash character itself instead of treating it as the start of an escape sequence.
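A sketch of the escaping on a hypothetical line containing the literal text \x04:

```shell
# The log stores the bytes as literal text: backslash, x, 0, 4.
printf '%s\n' '192.0.2.9 - - [08/Jul/2021:10:00:05 +0000] "\x04\x01\x00P" 400 0 "-" "-"' > sample.log

# In the regex, \\ matches one literal backslash, so this finds the sequence.
grep -o '\\x04' sample.log   # prints: \x04
```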

The output of grep '\\x04' access.log:

Questions

1. How many different IP addresses reached the server?

cat access.log | cut -d " " -f 1 | sort | uniq | wc -l
Extract the first field (with the IP addresses), sort the IP addresses, get the unique IP addresses, and then get a line count

2. How many requests yielded a 200 code?

cat access.log | cut -d '"' -f 3 | cut -d ' ' -f 2 | sort | uniq -c | sort -rn
Extract the third quote-delimited field, then the second space-delimited field (the return code), sort the codes, get the unique values with a count of the occurrences of each code, and then sort in descending numeric order

3. How many requests yielded a 400 code?

Same as the question above

4. What IP address rang at the doorbell?

cat access.log | grep "bell"
Search the log for any lines that contain “bell”

5. What version of the Googlebot visited the website?

cat access.log | grep "Googlebot"
Search the log for any lines that contain “Googlebot”

6. Which IP address attempted to exploit the Shellshock vulnerability?

Search online for details about the Shellshock vulnerability. You should be able to find that the presence of this sequence of characters () { :; }; is an indication of an attempted exploitation of this vulnerability.

cat access.log | grep '() { :; };'
Search the log for any lines that contain () { :; };

7. What was the most popular version of Firefox used for browsing the website?

cat access.log | egrep -o "Firefox/.*" | sort | uniq -c
Search the log for “Firefox/” plus the characters that follow it (which make up the version number), sort those values, and then count the occurrences of each version.

8. What is the most common HTTP method used?

cat access.log | awk -F " " '{print $6}' | sort | uniq -c | sort -rn
Extract the 6th field (with the HTTP method), sort, get the unique values with a count of the occurrences of each value, and then sort in descending numeric order.

9. What is the second most common HTTP method used?

Same as the question above

10. How many requests were for \x04\x01\x00P\xC6\xCE\x0Eu0\x00?

cat access.log | grep '\\x04\\x01\\x00P\\xC6\\xCE\\x0Eu0\\x00' | wc -l
Search the log for all lines that contain that sequence of characters and then get a line count. Note that the command requires two backslashes for each original backslash so that each backslash is matched literally rather than treated as an escape.

©️ 2025 Cyber Skyline. All Rights Reserved. Unauthorized reproduction or distribution of this copyrighted work is illegal.