Prompt
We need to analyze a custom log format that we use to monitor data transfers on a secure network. Unfortunately, the parser was not committed to our code repository, but we do have a copy of the file format spec. See if you can use it and help us answer questions about the log file. Provide all date/timestamps as UTC.
File Format
Overview
The SKY log file format was designed to record network traffic data in an efficient manner, utilizing raw binary data to save space compared to traditional text files. All fields in this document are written in Big-Endian notation.
Terms
- An int is 32 bits, or 4 bytes
- A long is 64 bits, or 8 bytes
- A timestamp is a 32-bit Unix Timestamp, or 4 bytes
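The three types above map directly onto fixed-width big-endian reads. The following is a minimal sketch, assuming the values are unsigned (the specification does not say either way); the helper names `read_int`, `read_long`, and `read_timestamp` are made up for illustration.

```python
import struct
from datetime import datetime, timezone


def read_int(buf: bytes, offset: int) -> int:
    """Read a 4-byte big-endian int (assumed unsigned)."""
    return struct.unpack_from(">I", buf, offset)[0]


def read_long(buf: bytes, offset: int) -> int:
    """Read an 8-byte big-endian long (assumed unsigned)."""
    return struct.unpack_from(">Q", buf, offset)[0]


def read_timestamp(buf: bytes, offset: int) -> datetime:
    """Read a 4-byte Unix timestamp and return it as an aware UTC datetime."""
    return datetime.fromtimestamp(read_int(buf, offset), tz=timezone.utc)
```

Returning a timezone-aware datetime keeps every later date in UTC, as the prompt asks.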
Specification
Header
The SKY header begins at offset 0 and is comprised of the following fields in order with no padding.
- Magic Bytes
- Version Number
- Creation Timestamp
- Hostname length
- Hostname
- Flag length
- Flag
- Number of entries
Magic Bytes
The magic bytes field is an 8-byte unique sequence used to identify this file as using the SKY format. All valid SKY files must begin with a single, 8-byte sequence: 0x91534B590D0A1A0A.
Version
The version field is a single byte. This document outlines the specification for version 1. All valid SKYv1 files must have the version field set to 0x01.
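A parser would typically check these two fields before reading anything else. A minimal sketch of that check follows; the function name `is_skyv1` is hypothetical.

```python
# The magic bytes and version values come straight from the specification.
MAGIC = bytes.fromhex("91534B590D0A1A0A")
VERSION = 0x01


def is_skyv1(data: bytes) -> bool:
    """Return True if data starts with the SKY magic bytes followed by version 1."""
    return data[:8] == MAGIC and data[8:9] == bytes([VERSION])
```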
Creation Timestamp
The creation timestamp is a single timestamp used to denote the start of data collection within the log file.
Hostname Length
The hostname length is a single int used to denote how many bytes long the hostname is. This value is used to determine how many bytes to read for the following Hostname field. If no host is specified, this value should be 0x00.
Hostname
The hostname is a dynamic length string used to identify the name of the host the log file was created on. The byte-length of the hostname is the value from the previous Hostname Length field.
Flag Length
The flag length is a single int used to denote how many bytes long the flag is. This value is used to determine how many bytes to read for the following Flag field. If no flag is specified, this value should be 0x00.
Flag
The flag is a dynamic length string used to specify a flag value for the log file. The byte-length of the flag is the value from the previous Flag Length field.
Note: This field can be used to store both encoded/encrypted flags as well as plaintext flags. It is the responsibility of the parsing application to interpret this value and do any necessary conversions.
Number of entries
The number of entries field is a single int used to denote the number of items in the body.
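Because two of the header fields are variable length, the header has to be read sequentially. The sketch below shows one way to do this, assuming unsigned ints and an ASCII hostname; the function name `parse_header` and the dictionary keys are made up for illustration. The flag is left as raw bytes because the specification leaves its interpretation to the parsing application, and the magic bytes and version are skipped rather than validated.

```python
import struct
from datetime import datetime, timezone


def parse_header(data: bytes) -> dict:
    """Parse a SKY header; magic bytes and version are read but not validated."""
    version = data[8]
    # Creation timestamp and hostname length sit back to back at offset 9.
    created, host_len = struct.unpack_from(">II", data, 9)
    pos = 17
    hostname = data[pos:pos + host_len].decode("ascii")
    pos += host_len
    (flag_len,) = struct.unpack_from(">I", data, pos)
    pos += 4
    flag = data[pos:pos + flag_len]  # kept as bytes; it may be encoded
    pos += flag_len
    (entries,) = struct.unpack_from(">I", data, pos)
    pos += 4
    return {
        "version": version,
        "created": datetime.fromtimestamp(created, tz=timezone.utc),
        "hostname": hostname,
        "flag": flag,
        "entries": entries,
        "body_offset": pos,  # first byte of the body
    }
```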
Body
The body is a sequence of items, each with 4 required fields. All items are written in chronological order without any padding between items. Each item contains the following fields:
- Source IP
- Destination IP
- Timestamp
- Bytes transferred
Source IP
The IPv4 address of the sender of the data stream. Represented as an int.
Destination IP
The IPv4 address of the destination of the data stream. Represented as an int.
Timestamp
The timestamp when the data stream was initiated. Represented as a timestamp.
Bytes Transferred
The number of bytes transferred in the data stream. Represented as an int.
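Each item is four 4-byte fields, so one item is 16 bytes. A minimal sketch of decoding the body follows, assuming unsigned ints; the `Entry` type and `parse_body` function are hypothetical names, not part of the specification.

```python
import struct
from datetime import datetime, timezone
from ipaddress import IPv4Address
from typing import NamedTuple


class Entry(NamedTuple):
    src: IPv4Address
    dst: IPv4Address
    when: datetime
    nbytes: int


# Source IP, destination IP, timestamp, bytes transferred: 16 bytes in total.
ENTRY = struct.Struct(">IIII")


def parse_body(body: bytes, count: int) -> list[Entry]:
    """Decode `count` consecutive 16-byte items from the start of `body`."""
    entries = []
    for i in range(count):
        src, dst, ts, nbytes = ENTRY.unpack_from(body, i * ENTRY.size)
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        entries.append(Entry(IPv4Address(src), IPv4Address(dst), when, nbytes))
    return entries
```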
Walk-Through
This challenge involves following a specification to interpret and analyze a binary log file. Solving it requires either a data manipulation tool, such as CyberChef, or a custom script. In CyberChef, the log file can be loaded with the “Open file as input” option.
The file format specification in the prompt contains all the information needed to answer the questions. A quick reference that lists each field with its offset and length makes the questions easier to solve. Build it by stepping through the specification one field at a time: the length of each field comes from the specification, and its offset is the sum of the lengths of all previous fields. The Hostname and Flag lengths (14 and 20) are not fixed by the specification; they are read from the Hostname Length and Flag Length fields of this log file, and the Body length is 16 bytes per entry multiplied by the Number of entries. A version of this quick reference is below:
Field | Offset (bytes) | Length (bytes) |
Magic Bytes | 0 | 8 |
Version | 8 | 1 |
Creation Timestamp | 9 | 4 |
Hostname Length | 13 | 4 |
Hostname | 17 | 14 |
Flag Length | 31 | 4 |
Flag | 35 | 20 |
Number of entries | 55 | 4 |
Body | 59 | 2592 |
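The offsets in the table can be reproduced with a running sum of the field lengths. The sketch below is one way to do that; the function name `layout` is made up for illustration, and the entry count of 162 used in the usage note is inferred from the Body length in the table (2592 / 16), not stated in the walkthrough.

```python
def layout(host_len: int, flag_len: int, entries: int) -> list[tuple[str, int, int]]:
    """Return (field, offset, length) rows for a SKY file with the given sizes."""
    fields = [
        ("Magic Bytes", 8),
        ("Version", 1),
        ("Creation Timestamp", 4),
        ("Hostname Length", 4),
        ("Hostname", host_len),
        ("Flag Length", 4),
        ("Flag", flag_len),
        ("Number of entries", 4),
        ("Body", 16 * entries),  # each entry is four 4-byte fields
    ]
    rows = []
    offset = 0
    for name, length in fields:
        rows.append((name, offset, length))
        offset += length  # the next field starts where this one ends
    return rows
```

Calling `layout(14, 20, 162)` yields the same offsets and lengths as the table.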
The first few questions can be solved entirely using CyberChef. However, the later questions, which involve analysis of the entries within the log file, require additional data processing that is easier with Linux command-line tools. This CyberChef recipe can be used to create a column-formatted hex version of the log. The recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), and then splits those into 4-byte columns. Save this result into a file named hex.log.
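For readers taking the custom-script route, the same column layout can be produced in a few lines. This is a sketch, not the recipe itself: it assumes the columns are separated by single spaces (which is what the `cut -d " "` commands expect) and emits lowercase hex; the function name `hex_columns` is hypothetical.

```python
def hex_columns(body: bytes) -> list[str]:
    """Render each 16-byte entry as four space-separated 4-byte hex columns."""
    lines = []
    usable = len(body) - len(body) % 16  # ignore any trailing partial entry
    for i in range(0, usable, 16):
        entry = body[i:i + 16]
        lines.append(" ".join(entry[j:j + 4].hex() for j in range(0, 16, 4)))
    return lines
```

Here `body` is everything in the file from offset 59 onward; writing the returned lines to a file, one per line, gives a file in the same shape as hex.log.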
Once the log file is in a text format, it becomes easier to convert the data one column at a time and then rejoin the columns into a completed, human-readable log.
Column 1
- Extract the first column from hex.log
- Convert the hex values into IPv4 addresses using CyberChef
- Save the results to a file named col1.log
cut -d " " -f 1 hex.log
Column 2
- Extract the second column from hex.log
- Convert the hex values into IPv4 addresses using CyberChef
- Save the results to a file named col2.log
cut -d " " -f 2 hex.log
Column 3
- Extract the third column from hex.log
- Convert the hex values into dates using CyberChef
- Save the results to a file named col3.log
cut -d " " -f 3 hex.log
Column 4
- Extract the fourth column from hex.log
- Convert the hex values into integers using CyberChef
- Save the results to a file named col4.log
cut -d " " -f 4 hex.log
Combine the logs into a single, human-readable log named merged.log
paste col1.log col2.log col3.log col4.log > merged.log
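The four per-column conversions and the final merge can also be done in one pass over hex.log. The sketch below assumes each input line holds four space-separated hex columns and renders the timestamp as a bare UTC date (YYYY-MM-DD), so that whitespace-splitting tools such as awk still see exactly four fields; the function name `convert_line` is made up for illustration.

```python
from datetime import datetime, timezone
from ipaddress import IPv4Address


def convert_line(hex_line: str) -> str:
    """Turn one hex.log line into a tab-separated, human-readable line."""
    src, dst, ts, nbytes = (int(col, 16) for col in hex_line.split())
    # Date only, in UTC: no embedded spaces, so the line keeps four fields.
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return "\t".join([str(IPv4Address(src)), str(IPv4Address(dst)), day, str(nbytes)])
```

The tab separator matches the default output delimiter of `paste`.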
Questions
What is the hostname of the server?
- Read the length of the hostname, which is stored as a 4-byte integer at offset 13. Use this CyberChef recipe to read the length of the hostname, which is 14.
- Extract the hostname, which is an ASCII string of length 14 at offset 17. Use this CyberChef recipe to extract the hostname.
What is the plaintext flag in the log file?
- Read the length of the flag, which is stored as a 4-byte integer at offset 31. Use this CyberChef recipe to read the length of the flag, which is 20.
- Extract the flag, which is an ASCII string of length 20 at offset 35. Then, perform a base64 decode. Use this CyberChef recipe to extract the flag.
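The decoding step can be sketched in a script as follows. The function name `decode_flag` is hypothetical, and the sketch assumes the decoded flag is ASCII text; the sample value in the usage note is a placeholder, not the real flag.

```python
import base64


def decode_flag(raw: bytes) -> str:
    """Base64-decode the raw flag bytes taken from the header."""
    return base64.b64decode(raw).decode("ascii")
```

For example, `decode_flag(b"ZXhhbXBsZS12YWx1ZQ==")` returns `"example-value"`. A 20-character Base64 string decodes to at most 15 bytes.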
On what date was the file created (in UTC)?
The date is a 4-byte timestamp at offset 9. Use this CyberChef recipe to extract the timestamp.
How many entries are in the log file?
The number of entries is a 4-byte integer at offset 55. Use this CyberChef recipe to extract the number of entries.
How many total transferred bytes were recorded in the log?
The entries are each 16 bytes long, starting at offset 59. The number of bytes transferred for each entry is a 4-byte integer at offset 12 from the start of each entry. You can use this CyberChef recipe to calculate the total transferred bytes. This recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), extracts just the number of bytes transferred as an integer value, and then sums the amounts.
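The same sum can be computed directly from the raw file. This is a minimal sketch assuming unsigned ints; the function name `total_bytes` is made up, and the default body offset of 59 applies only to this particular log file because it depends on the hostname and flag lengths.

```python
import struct


def total_bytes(data: bytes, body_offset: int = 59) -> int:
    """Sum the bytes-transferred field (offset 12 in each 16-byte entry)."""
    body = data[body_offset:]
    # Stepping by 16 and stopping 15 bytes early skips any trailing partial entry.
    return sum(
        struct.unpack_from(">I", body, start + 12)[0]
        for start in range(0, len(body) - 15, 16)
    )
```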
How many unique IP addresses (both senders and receivers) are recorded?
The entries are each 16 bytes long, starting at offset 59. The source and destination IP addresses are 4-byte integers at offsets 0 and 4 from the start of each entry. You can use this CyberChef recipe to calculate the number of unique IP addresses. This recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), extracts the source and destination IP addresses, splits those into 4-byte lines (one line per IP address), converts each line into a human-readable IP address, and then counts the unique IP addresses.
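In a script, a set gives the de-duplication for free. The sketch below assumes `body` contains only whole entries (its length is a multiple of 16), since `struct.iter_unpack` raises an error otherwise; the function name `unique_ips` is hypothetical.

```python
import struct
from ipaddress import IPv4Address


def unique_ips(body: bytes) -> set[str]:
    """Collect every distinct source or destination address in the body."""
    seen = set()
    for src, dst, _ts, _nbytes in struct.iter_unpack(">IIII", body):
        seen.add(str(IPv4Address(src)))
        seen.add(str(IPv4Address(dst)))
    return seen
```

The answer to the question is then `len(unique_ips(body))`.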
Which IP address sent the most amount of data?
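Use awk to parse the human-readable log. The command sums the fourth column (bytes transferred) per source IP address and sorts the totals numerically, so the IP address that sent the most data appears on the last line.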
awk '{sums[$1]+=$4} END {for (ip in sums) print sums[ip], ip}' merged.log | sort -n
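An equivalent of the awk command in Python is sketched below. It assumes each line of merged.log has exactly four whitespace-separated fields and skips lines that do not; the function name `top_sender` is made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def top_sender(lines: Iterable[str]) -> Optional[tuple[str, int]]:
    """Return (source IP, total bytes sent) for the largest sender, or None."""
    sums = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        sums[fields[0]] += int(fields[3])
    return sums.most_common(1)[0] if sums else None
```

The returned pair covers this question and the next one, since it holds both the address and its byte total.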
How many total bytes were sent by the above IP address that sent the most amount of data?
Use the same steps as the previous question. The total is the number printed beside that IP address on the last line of the sorted output.
What was the busiest day (day with the most bytes transferred)?
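Use awk to parse the human-readable log. The command sums the bytes transferred per value of the third column, so the busiest day appears on the last line of the sorted output. This only works if the third column holds a bare date with no time of day or embedded spaces; otherwise entries from the same day get different keys, or the byte count is no longer field 4.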
awk '{sums[$3]+=$4} END {for (date in sums) print sums[date], date}' merged.log | sort -n
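Grouping by day can also be done from the raw entries, which avoids any dependence on how the date column was formatted. This is a sketch under the same assumptions as before (unsigned ints, `body` made up of whole 16-byte entries); the function name `bytes_per_day` is hypothetical.

```python
import struct
from collections import defaultdict
from datetime import datetime, timezone


def bytes_per_day(body: bytes) -> dict[str, int]:
    """Total the bytes transferred for each UTC calendar day."""
    totals = defaultdict(int)
    for _src, _dst, ts, nbytes in struct.iter_unpack(">IIII", body):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        totals[day] += nbytes
    return dict(totals)
```

The busiest day is then `max(totals, key=totals.get)` for the returned dictionary `totals`.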
©️ 2024 Cyber Skyline. All Rights Reserved. Unauthorized reproduction or distribution of this copyrighted work is illegal.