Custom File Format

Prompt

We need to analyze a custom log format that we use to monitor data transfers on a secure network. Unfortunately, the parser was not committed to our code repository, but we do have a copy of the file format spec. See if you can use it and help us answer questions about the log file. Provide all date/timestamps as UTC.

Custom File Format.sky (2.6KB)

File Format

Overview

The SKY log file format was designed to record network traffic data in an efficient manner, utilizing raw binary data to save space compared to traditional text files. All fields in this document are written in Big-Endian notation.

Terms

  • An int is 32 bits, or 4 bytes
  • A long is 64 bits, or 8 bytes
  • A timestamp is a 32-bit Unix Timestamp, or 4 bytes
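
As a rough illustration of these terms, the sketch below maps each one to a big-endian format code from Python's struct module. The spec does not say whether the values are signed, so treating them as unsigned is an assumption, and the sample bytes are made up rather than taken from the log file.

```python
import struct

# Big-endian ('>') format codes for the terms above.
# Unsigned types are an assumption; the spec does not state signedness.
INT = struct.Struct(">I")        # 32-bit int, 4 bytes
LONG = struct.Struct(">Q")       # 64-bit long, 8 bytes
TIMESTAMP = struct.Struct(">I")  # 32-bit Unix timestamp, 4 bytes

# Made-up sample bytes, used only to show the byte order.
(value,) = INT.unpack(bytes.fromhex("0000000e"))
print(value)  # 14
```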

Specification

Header

The SKY header begins at offset 0 and is comprised of the following fields in order with no padding.

  1. Magic Bytes
  2. Version Number
  3. Creation Timestamp
  4. Hostname length
  5. Hostname
  6. Flag length
  7. Flag
  8. Number of entries

Magic Bytes

The magic bytes field is a unique 8-byte sequence used to identify this file as using the SKY format. All valid SKY files must begin with the 8-byte sequence 0x91534B590D0A1A0A.

Version

The version field is a single byte. This document outlines the specification for version 1. All valid SKYv1 files must have the version field set to 0x01.

Creation Timestamp

The creation timestamp is a single timestamp used to denote the start of data collection within the log file.

Hostname Length

The hostname length is a single int used to denote how many bytes long the hostname is. This value is used to determine how many bytes to read for the following Hostname field. If no host is specified, this value should be 0x00.

Hostname

The hostname is a dynamic length string used to identify the name of the host the log file was created on. The byte-length of the hostname is the value from the previous Hostname Length field.

Flag Length

The flag length is a single int used to denote how many bytes long the flag is. This value is used to determine how many bytes to read for the following Flag field. If no flag is specified, this value should be 0x00.

Flag

The flag is a dynamic length string used to specify a flag value for the log file. The byte-length of the flag is the value from the previous Flag Length field.

Note: This field can be used to store both encoded/encrypted flags as well as plaintext flags. It is the responsibility of the parsing application to interpret this value and do any necessary conversions.

Number of entries

The number of entries field is a single int used to denote the number of items in the body.
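
The header fields above can be read in order with a short script. The following is a minimal sketch rather than the original parser: the function name parse_header and the returned dictionary keys are hypothetical, the hostname is assumed to be ASCII, and the sample header is synthetic.

```python
import struct
from datetime import datetime, timezone

MAGIC = bytes.fromhex("91534B590D0A1A0A")

def parse_header(data: bytes) -> dict:
    """Parse a SKY v1 header; also report where the body starts."""
    if data[:8] != MAGIC:
        raise ValueError("bad magic bytes: not a SKY file")
    version = data[8]
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    offset = 9
    created, host_len = struct.unpack_from(">II", data, offset)
    offset += 8
    hostname = data[offset:offset + host_len].decode("ascii")
    offset += host_len
    (flag_len,) = struct.unpack_from(">I", data, offset)
    offset += 4
    flag = data[offset:offset + flag_len]  # left as raw bytes; may be encoded
    offset += flag_len
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    return {
        "version": version,
        "created": datetime.fromtimestamp(created, tz=timezone.utc),
        "hostname": hostname,
        "flag": flag,
        "count": count,
        "body_offset": offset,
    }

# Synthetic header for illustration only (not the challenge file).
sample = (MAGIC + b"\x01" + struct.pack(">I", 0)
          + struct.pack(">I", 4) + b"host"
          + struct.pack(">I", 4) + b"flag"
          + struct.pack(">I", 0))
header = parse_header(sample)
print(header["hostname"], header["body_offset"])  # host 33
```

Tracking a running offset keeps the variable-length hostname and flag fields from shifting later reads.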

Body

The body is a sequence of items, each with 4 required fields. All items are written in chronological order without any padding between items. Each item contains the following fields:

  1. Source IP
  2. Destination IP
  3. Timestamp
  4. Bytes transferred

Source IP

The IPv4 address of the sender of the data stream. Represented as an int.

Destination IP

The IPv4 address of the destination of the data stream. Represented as an int.

Timestamp

The timestamp when the data stream was initiated. Represented as a timestamp.

Bytes Transferred

The number of bytes transferred in the data stream. Represented as an int.
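
Since every item is four 4-byte fields, the body can be decoded 16 bytes at a time. The sketch below shows one way to do that; the name parse_entries is hypothetical, the values are treated as unsigned (an assumption), and the sample entry is made up.

```python
import ipaddress
import struct
from datetime import datetime, timezone

# source IP, destination IP, timestamp, bytes transferred
ENTRY = struct.Struct(">IIII")

def parse_entries(body: bytes, count: int) -> list:
    """Decode `count` 16-byte items from the start of `body`."""
    entries = []
    for i in range(count):
        src, dst, ts, size = ENTRY.unpack_from(body, i * ENTRY.size)
        entries.append((
            str(ipaddress.IPv4Address(src)),
            str(ipaddress.IPv4Address(dst)),
            datetime.fromtimestamp(ts, tz=timezone.utc),
            size,
        ))
    return entries

# Made-up entry: 192.168.0.1 -> 10.0.0.2, one day after the epoch, 1500 bytes.
sample = ENTRY.pack(0xC0A80001, 0x0A000002, 86400, 1500)
entries = parse_entries(sample, 1)
print(entries[0][0], entries[0][1], entries[0][3])  # 192.168.0.1 10.0.0.2 1500
```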

Walk-Through

This challenge involves following a specification to interpret and analyze a binary log file. Solving this challenge requires either using a data manipulation tool, such as cyberchef, or writing a custom script. If using cyberchef, it is possible to load in the log file using the “Open file as input” option.

The file format specification provided in the prompt provides all the necessary information to solve the questions. Using the specification to create a quick reference guide (listing out the different fields, their offsets, and their lengths) can make it easier to solve the questions. This can be created by stepping through the specification one field at a time, taking the length of each field from the specification and calculating its offset as the sum of the lengths of all previous fields. The Hostname and Flag lengths (14 and 20 bytes) are specific to this log file, so every offset after the Hostname field depends on them. A version of this quick reference is below:

Field               Offset  Length
Magic Bytes         0       8
Version             8       1
Creation Timestamp  9       4
Hostname Length     13      4
Hostname            17      14
Flag Length         31      4
Flag                35      20
Number of entries   55      4
Body                59      2592
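
The running-sum calculation behind the quick reference can be sketched in a few lines. The function name header_layout is hypothetical; the lengths 14 and 20 are the hostname and flag lengths reported in this walk-through, and the entry count assumes the 2592-byte body holds only complete 16-byte items.

```python
def header_layout(hostname_len: int, flag_len: int) -> list:
    """Return (field, offset, length) tuples for the SKY header."""
    fields = [
        ("Magic Bytes", 8),
        ("Version", 1),
        ("Creation Timestamp", 4),
        ("Hostname Length", 4),
        ("Hostname", hostname_len),
        ("Flag Length", 4),
        ("Flag", flag_len),
        ("Number of entries", 4),
    ]
    layout = []
    offset = 0
    for name, length in fields:
        layout.append((name, offset, length))
        offset += length  # next field starts where this one ends
    return layout

layout = header_layout(14, 20)
body_offset = layout[-1][1] + layout[-1][2]
entry_count = 2592 // 16  # body length divided by the 16-byte item size

for name, offset, length in layout:
    print(f"{name:<20}{offset:<8}{length}")
print(body_offset, entry_count)  # 59 162
```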

The first few questions can be solved entirely using cyberchef. However, the later questions, which involve analysis of the entries within the log file, require additional data processing that is easier with Linux command line tools. This cyberchef recipe can be used to create a column-formatted hex version of the log. The recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), and then splits those into 4-byte columns. Save this result into a file named hex.log.
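
For readers who prefer a script, the same column-formatted hex output can be approximated as follows. This is a sketch, not the cyberchef recipe itself: the name to_hex_columns is hypothetical, and the sample bytes are made up. For the real file, the input would be the bytes from offset 59 onward.

```python
def to_hex_columns(body: bytes) -> list:
    """Split the body into 16-byte entries, each shown as four 4-byte hex columns."""
    lines = []
    for start in range(0, len(body), 16):
        entry = body[start:start + 16]
        columns = [entry[i:i + 4].hex() for i in range(0, 16, 4)]
        lines.append(" ".join(columns))
    return lines

# Made-up 16-byte entry for illustration.
sample = bytes.fromhex("c0a80001" "0a000002" "00015180" "000005dc")
hex_lines = to_hex_columns(sample)
print(hex_lines[0])  # c0a80001 0a000002 00015180 000005dc
```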

Once the log file is in a text format, it becomes easier to convert the data one column at a time and then rejoin the columns into a completed, human-readable log.

Column 1

  1. Extract the first column from hex.log:
     cat hex.log | cut -d " " -f 1
  2. Convert the hex values into IPv4 addresses using cyberchef
  3. Save the results to a file named col1.log

Column 2

  1. Extract the second column from hex.log:
     cat hex.log | cut -d " " -f 2
  2. Convert the hex values into IPv4 addresses using cyberchef
  3. Save the results to a file named col2.log

Column 3

  1. Extract the third column from hex.log:
     cat hex.log | cut -d " " -f 3
  2. Convert the hex values into dates using cyberchef
  3. Save the results to a file named col3.log

Column 4

  1. Extract the fourth column from hex.log:
     cat hex.log | cut -d " " -f 4
  2. Convert the hex values into integers using cyberchef
  3. Save the results to a file named col4.log

Combine the logs into a single, human-readable log named merged.log

paste col1.log col2.log col3.log col4.log > merged.log
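
The four column conversions and the final merge could also be done in one pass over hex.log. The sketch below converts a single line; the name convert_line is hypothetical, and the YYYY-MM-DD date format is an assumption chosen so that the date column contains no spaces.

```python
import ipaddress
from datetime import datetime, timezone

def convert_line(hex_line: str) -> str:
    """Turn one 'src dst timestamp size' hex line into a tab-separated readable line."""
    src, dst, ts, size = (int(col, 16) for col in hex_line.split())
    date = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return "\t".join([
        str(ipaddress.IPv4Address(src)),
        str(ipaddress.IPv4Address(dst)),
        date,
        str(size),
    ])

# Made-up line in the hex.log layout.
merged_line = convert_line("c0a80001 0a000002 00015180 000005dc")
print(merged_line)
```

Applying convert_line to every line of hex.log and writing the results out would give a file with the same four columns as merged.log.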

Questions

What is the hostname of the server?

  1. Calculate the length of the hostname, which is represented as a 4-byte integer at offset 13. Use this cyberchef recipe to calculate the length of the hostname, which is 14.
  2. Extract the hostname, which is an ASCII string of length 14 at offset 17. Use this cyberchef recipe to extract the hostname.

What is the plaintext flag in the log file?

  1. Calculate the length of the flag, which is represented as a 4-byte integer at offset 31. Use this cyberchef recipe to calculate the length of the flag, which is 20.
  2. Extract the flag, which is an ASCII string of length 20 at offset 35. Then, perform a base64 decode. Use this cyberchef recipe to extract the flag.
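
The decode step can be sketched as below. The flag value here is a made-up stand-in, not the flag from the log file; it was chosen only because its base64 form is 20 bytes long, matching the flag length above.

```python
import base64

# Stand-in for the 20 bytes read from offset 35 (hypothetical value).
encoded = base64.b64encode(b"SKY-EXAMPLE-000")

# Base64-decode the stored flag to recover the plaintext.
plaintext = base64.b64decode(encoded).decode("ascii")
print(len(encoded), plaintext)  # 20 SKY-EXAMPLE-000
```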

On what date was the file created (in UTC)?

The date is a 4-byte timestamp at offset 9. Use this cyberchef recipe to extract the timestamp.
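
Because the prompt asks for UTC, it helps to convert the timestamp with an explicit time zone rather than the local one. A minimal sketch, using made-up bytes in place of the four bytes at offset 9:

```python
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("00015180")  # made-up stand-in for the bytes at offset 9
(seconds,) = struct.unpack(">I", raw)

# tz=timezone.utc avoids an implicit conversion to the local time zone.
created = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(created.isoformat())  # 1970-01-02T00:00:00+00:00
```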

How many entries are in the log file?

The number of entries is a 4-byte integer at offset 55. Use this cyberchef recipe to extract the number of entries.

How many total transferred bytes were recorded in the log?

The entries are each 16-bytes long starting at offset 59. The number of bytes transferred for each entry is a 4-byte integer that is at offset 12 from the start of each entry. You can use this cyberchef recipe to calculate the total transferred bytes. This recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), extracts just the number of bytes transferred as an integer value, and then sums the total amounts.
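
The same sum can be computed in a script. This is a sketch with a hypothetical function name, total_bytes, and made-up entries; for the real file, the input would be the bytes from offset 59 onward.

```python
import struct

def total_bytes(body: bytes) -> int:
    """Sum the bytes-transferred field (offset 12 of each 16-byte entry)."""
    return sum(size for _src, _dst, _ts, size in struct.iter_unpack(">IIII", body))

# Two made-up entries with 1500 and 2500 bytes transferred.
sample = struct.pack(">IIII", 0x0A000001, 0x0A000002, 0, 1500)
sample += struct.pack(">IIII", 0x0A000002, 0x0A000003, 60, 2500)
total = total_bytes(sample)
print(total)  # 4000
```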

How many unique IP addresses (both senders and receivers) are recorded?

The entries are each 16 bytes long, starting at offset 59. The source and destination IP addresses are each 4-byte integers at offsets 0 and 4 from the start of each entry. You can use this cyberchef recipe to calculate the number of unique IP addresses. This recipe extracts the bytes for the entries, splits the data into 16-byte lines (one line per entry), extracts the source and destination IP addresses, splits those into 4-byte lines (one line per IP address), converts each line into a human-readable IP address, and then counts the number of unique IP addresses.
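
A set gives the same de-duplication in a script. The sketch below uses a hypothetical function name, unique_ips, and made-up entries in which one address appears as both a sender and a receiver.

```python
import ipaddress
import struct

def unique_ips(body: bytes) -> set:
    """Collect every source and destination address seen in the body."""
    seen = set()
    for src, dst, _ts, _size in struct.iter_unpack(">IIII", body):
        seen.add(str(ipaddress.IPv4Address(src)))
        seen.add(str(ipaddress.IPv4Address(dst)))
    return seen

# Made-up entries: 10.0.0.1 -> 10.0.0.2 and 10.0.0.2 -> 10.0.0.3.
sample = struct.pack(">IIII", 0x0A000001, 0x0A000002, 0, 100)
sample += struct.pack(">IIII", 0x0A000002, 0x0A000003, 60, 200)
ips = unique_ips(sample)
print(len(ips), sorted(ips))  # 3 ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```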

Which IP address sent the most amount of data?

Use awk to parse the human-readable log.

awk '{sums[$1]+=$4} END {for (ip in sums) print sums[ip], ip}' merged.log  | sort -n

This creates a hashtable where the keys are the source IP addresses (column 1), sums the number of bytes transferred (column 4) for each source IP address, and prints the results. Because sort -n sorts in ascending numeric order, the IP address that sent the most data is on the last line of the output.
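
The same aggregation can be sketched with a Counter. The function name bytes_by_sender is hypothetical and the lines are made up, laid out like merged.log with a date column that contains no spaces.

```python
from collections import Counter

def bytes_by_sender(lines) -> Counter:
    """Total the bytes transferred (column 4) per source IP (column 1)."""
    totals = Counter()
    for line in lines:
        src, _dst, _date, size = line.split()
        totals[src] += int(size)
    return totals

# Made-up lines in the merged.log layout.
sample_lines = [
    "10.0.0.1\t10.0.0.2\t2024-01-01\t1500",
    "10.0.0.3\t10.0.0.1\t2024-01-01\t2000",
    "10.0.0.1\t10.0.0.3\t2024-01-02\t2500",
]
totals = bytes_by_sender(sample_lines)
top_sender, top_bytes = totals.most_common(1)[0]
print(top_sender, top_bytes)  # 10.0.0.1 4000
```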

How many total bytes were sent by the above IP address that sent the most amount of data?

Use the same command as in the previous question. The total is the number printed before the IP address on the last line of the output.

What was the busiest day (day with the most bytes transferred)?

Use awk to parse the human-readable log.

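awk '{sums[$3]+=$4} END {for (date in sums) print sums[date], date}' merged.log  | sort -n

This creates a hashtable where the keys are the dates (column 3), sums the number of bytes transferred (column 4) for each date, and prints the results. Because sort -n sorts in ascending numeric order, the busiest day is on the last line of the output. The command only works if column 3 holds a date with no time of day and no embedded spaces; otherwise awk's default whitespace splitting shifts the columns.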
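
Grouping by UTC calendar day can also be done straight from the raw timestamps, which avoids depending on how the date column was formatted. A minimal sketch with a hypothetical function name, bytes_by_day, and made-up entries:

```python
import struct
from collections import Counter
from datetime import datetime, timezone

def bytes_by_day(body: bytes) -> Counter:
    """Total the bytes transferred per UTC calendar day."""
    totals = Counter()
    for _src, _dst, ts, size in struct.iter_unpack(">IIII", body):
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        totals[day] += size
    return totals

# Made-up entries: one late on 1970-01-01 and two on 1970-01-02.
sample = struct.pack(">IIII", 0x0A000001, 0x0A000002, 86399, 5000)
sample += struct.pack(">IIII", 0x0A000001, 0x0A000002, 86400, 1000)
sample += struct.pack(">IIII", 0x0A000002, 0x0A000001, 90000, 3000)
day_totals = bytes_by_day(sample)
busiest = max(day_totals, key=day_totals.get)
print(busiest, day_totals[busiest])  # 1970-01-01 5000
```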

©️ 2024 Cyber Skyline. All Rights Reserved. Unauthorized reproduction or distribution of this copyrighted work is illegal.