HTTP Background
Understanding the basics of HTTP and web communication will be helpful for understanding some log analysis, networking, and web security challenges. This guide is meant to give you the basic vocabulary and understanding you need to learn more.
URLs
When you visit a website, such as Cyber Skyline, you might type “cyberskyline.com” into the address bar in your browser. Your browser then converts your input into a URL, or Uniform Resource Locator, such as https://cyberskyline.com/. Your browser does this because web requests must be made using the URL format.
The components of a URL are labeled and discussed below:
Protocol: The protocol in a URL is typically HTTP (Hypertext Transfer Protocol) or HTTPS (Hypertext Transfer Protocol Secure). A protocol is the set of rules and guidelines that two devices follow in order to communicate. The server and client need to use the same protocol; otherwise, they will not be able to understand each other.
Host: This identifies which website the request is for. The browser performs a lookup with the Domain Name System (DNS) to get an IP address for the host. Your browser needs the IP address to actually communicate with the server.
Path: The specific page you are trying to obtain.
Query: Contains additional information that is passed to the server with your request. If you are using a search engine, information about the search is included in this portion as key=value pairs separated by &.
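The components above can be pulled apart with Python's standard library. This is a minimal sketch; the URL, including its /search path and query values, is a made-up example and not a real Cyber Skyline page.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL used only for illustration.
url = "https://cyberskyline.com/search?q=http+basics&page=2"

parts = urlsplit(url)
protocol = parts.scheme        # the protocol, "https"
host = parts.hostname          # the host, "cyberskyline.com"
path = parts.path              # the path, "/search"
query = parse_qs(parts.query)  # key=value pairs, split on "&"

print(protocol, host, path)
print(query)  # {'q': ['http basics'], 'page': ['2']}
```

Note that parse_qs returns a list for each key, because the same key may appear more than once in a query.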
HTTP Requests/Responses
When your browser is ready to visit a website, it will send an HTTP request to the server. HTTP is a text-based protocol, so you can actually see what the request looks like:
On the first line of this request you can see GET, which is the HTTP method. There are multiple methods for HTTP, with GET and POST being the most common. GET requests are typically made to request data (e.g., viewing an image), whereas POST requests typically send or upload data to the server (e.g., uploading an image). You can also see /contact, which is the page being requested, and HTTP/1.1, which specifies what version of HTTP is being used.
On the second and third lines are HTTP headers. These are automatically added by the browser to specify additional options for the request. In this case, the browser is specifying that it wants the response to be HTML (Hypertext Markup Language), which is the markup that browsers render to display a webpage. It also specifies that the request is for www.cyberskyline.com, which is provided in case the web server is handling requests for multiple different websites.
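Because HTTP is text-based, a request like the one described can be written out and taken apart as an ordinary string. The sketch below assumes the two headers are Accept and Host, with the values shown; the exact header values a browser sends will differ.

```python
# A request like the one described above, written out as text.
# The exact header values are assumptions for illustration.
raw_request = (
    "GET /contact HTTP/1.1\r\n"
    "Accept: text/html\r\n"
    "Host: www.cyberskyline.com\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw HTTP request into method, path, version, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]   # everything before the blank line
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return method, path, version, headers

method, path, version, headers = parse_request(raw_request)
print(method, path, version)  # GET /contact HTTP/1.1
print(headers["Host"])        # www.cyberskyline.com
```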
The server will send a response in a similar fashion.
In the server response, you can see the first line again confirms the protocol. It also provides a number, 200, which is the HTTP status code for ‘OK’. If an error had occurred, the server would return a different status code, such as 404 for ‘Not Found’.
After the first line, the server provides additional headers with extra information. These are followed by the body, which is the actual HTML code that the browser can render. If this were a request for something different, such as an image, then the body would contain the data for the image.
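A response can be split into the same three parts: status line, headers, and body. The response text below is a hypothetical example written for this sketch, not a capture of what the Cyber Skyline server actually returns.

```python
# A hypothetical response; the header and body are assumptions for illustration.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><h1>Contact us</h1></body></html>"
)

def parse_response(raw):
    """Split a raw HTTP response into version, status code, reason, headers, body."""
    head, body = raw.split("\r\n\r\n", 1)  # a blank line separates headers from body
    lines = head.split("\r\n")
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return version, int(status_code), reason, headers, body

version, status_code, reason, headers, body = parse_response(raw_response)
print(version, status_code, reason)  # HTTP/1.1 200 OK
print(headers["Content-Type"])       # text/html
```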
Cookies
HTTP is a stateless protocol. This means that there is nothing in the protocol that would allow the server to recognize that two separate requests came from the same browser/person. This poses a challenge because if a website has a login page, your browser will send an HTTP request to log in and you will want the server to remember that you are logged in to continue to access the site.
HTTP cookies were developed to help overcome this problem. Cookies are small pieces of information that are stored in the browser. These cookies can store information about your login session (or other information for different purposes).
A typical implementation of session cookies involves generating a unique id (a series of characters/numbers) that references a single login session with the website. When a user logs in, this session id is generated and the server tells the browser (via HTTP headers) to save the session id as a cookie. Then, when the browser makes future requests, it includes that cookie in the HTTP headers. When the server reads the request headers, it conducts a lookup in a database to identify which user that session belongs to. When a user logs out (or if the session expires), the session is deleted and the user will get a new session id when they log back in. This is a typical implementation, but developers can program their websites to work however they want.
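The session flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cookie name session_id, the function names, and the in-memory dictionary standing in for the database are all made up for this example, and real websites may work differently.

```python
import secrets
from http.cookies import SimpleCookie

# In-memory stand-in for the session database (an assumption for this sketch).
sessions = {}

def log_in(username):
    """Create a session and return the value for the Set-Cookie response header."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess id
    sessions[session_id] = username
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie["session_id"].OutputString()

def identify_user(cookie_header):
    """Look up which user a request's Cookie header belongs to, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" not in cookie:
        return None
    return sessions.get(cookie["session_id"].value)

def log_out(cookie_header):
    """Delete the session so the old id no longer identifies anyone."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" in cookie:
        sessions.pop(cookie["session_id"].value, None)

set_cookie = log_in("alice")              # sent to the browser in a response header
cookie_header = set_cookie.split(";")[0]  # what the browser sends back later
print(identify_user(cookie_header))       # alice
log_out(cookie_header)
print(identify_user(cookie_header))       # None
```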
If a website is using HTTP cookies, there are some security implications to consider. It is possible for an attacker to modify their own cookies. This means that if an attacker somehow gets a copy of someone’s session id, the attacker can use that session id to impersonate the victim. Additionally, attackers could manipulate a cookie to do things that the website developers may not have considered. This challenge involves one such example.
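To illustrate why cookie contents cannot be trusted, the sketch below contrasts a check that believes a role stored in the cookie with one that uses the cookie only as a lookup key. The cookie names, the session id, and both functions are hypothetical, and this is not necessarily the weakness used in this challenge.

```python
from http.cookies import SimpleCookie

def is_admin_insecure(cookie_header):
    """Trusts whatever role the browser claims; an attacker can edit this value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return "role" in cookie and cookie["role"].value == "admin"

# Server-side record of roles keyed by session id (hypothetical data).
session_roles = {"3f9a1c": "user"}

def is_admin_safer(cookie_header):
    """Uses the cookie only as a lookup key; the role itself stays on the server."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" not in cookie:
        return False
    return session_roles.get(cookie["session_id"].value) == "admin"

# An attacker adds role=admin to their own cookies.
tampered = "session_id=3f9a1c; role=admin"
print(is_admin_insecure(tampered))  # True
print(is_admin_safer(tampered))     # False
```

The second check is still only as strong as the secrecy of the session id, which is why a stolen session id allows impersonation.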
Learn more about JavaScript Cookies here: https://www.w3schools.com/js/js_cookies.asp
More Information
Websites utilize multiple different languages for different purposes: Hypertext Markup Language (HTML) for building the structure of the page, Cascading Style Sheets (CSS) for styling the page, and JavaScript (JS) for handling logic such as what to do on a button click.
Many website vulnerabilities involve a misconfiguration (e.g., a page that should be password protected but is not) or insecure code (e.g., trusting that user-supplied data is always valid). Be mindful that there are many different ways to attack web servers, and you should explore them further by searching and learning from multiple different sources online. A good starting point is the OWASP Top 10.