Breaking Web Applications for Beginners

The story so far: In the beginning the Internet was created—without any security. This has resulted in countless headaches and been widely regarded as a bad move. ― DOUGLAS ADAMS (MOSTLY)

Prerequisites

While this guide doesn’t require extensive web development knowledge, it does require a basic understanding of HTTP. In particular:

You’ll need to know how to view basic pieces of information such as cookies and headers.
You should be able to read simple HTML.
For some challenges, you’ll need at least a basic understanding of JavaScript.
You should be comfortable researching other common web technologies as they appear in challenges, such as SQL, MongoDB, and regular expressions. You don’t need to have in-depth knowledge of each as long as you’re able to learn as you go.

These are fairly extensive prerequisites for a beginners’ guide. Web application exploitation is difficult to learn without prior exposure to web development; there isn’t an easy way around that. As such, this guide is aimed at people with at least some web development experience, and it doesn’t go in-depth on specific terms or how to view information like cookies and headers.

A Quick History Lesson

A long time ago in a galaxy far, far away, the internet began as a series of interconnected computers at a handful of organizations. These organizations were in search of a better way to communicate with each other and share resources.

Each organization had its own local area network (LAN); communication between networks was achieved by running long wires to connect the various LANs, forming wide area networks (WANs). Gradually, the various independent WANs merged to form the internet we know today.

Various protocols were created to standardize communication; this became increasingly important as more networks joined the internet. One such protocol, HTTP, forms the basis for the web. The basis for HTTP is simple:

I have useful documents that you want.
I put those documents on a computer that I keep online 24/7; this is known as a web server.
Your computer can contact my computer whenever it desires to request documents.
If my computer has the document you’re requesting, it will respond with the document.
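
The exchange described above can be tried on a single machine. The following is a minimal sketch, not how any real web server is built: it starts a toy server on the local loopback address and requests two documents from it. The document name /hello.txt, its contents, and the helper fetch are all made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical documents held by our toy web server.
DOCUMENTS = {"/hello.txt": b"Hello from the web server!\n"}


class DocumentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = DOCUMENTS.get(self.path)
        if body is None:
            # The server does not have the requested document.
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch(port, path):
    """Request one document and return (status code, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, body


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DocumentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

found = fetch(port, "/hello.txt")      # a document the server has
missing = fetch(port, "/missing.txt")  # a document it does not have
print(found)
print(missing[0])

server.shutdown()
server.server_close()
```

The first request comes back with status 200 and the document; the second comes back with status 404 because the server has nothing at that path.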

HTTP initially extended this a bit with some useful features like links. One document could link to another using a URL; a fully-formed (absolute) URL contains the unique name of the web server that holds a document, as well as the path to the document on the web server. By following a link in one document, a web browser could request another document. The web was originally designed to be somewhat similar to Wikipedia: each page was more or less static, and two different people requesting the same document at the same time would receive the same document. Documents wouldn’t change unless someone manually changed them.
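
To make the parts of an absolute URL concrete, here is a small sketch using Python's standard urllib.parse module. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
url = "http://www.example.com/docs/history.html"

parts = urlsplit(url)
print(parts.scheme)    # protocol used to fetch the document
print(parts.hostname)  # name of the web server that holds the document
print(parts.path)      # path to the document on that server
```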

Over time, this protocol was expanded into the web we know today. The basic premise is mostly still the same: your computer requests a document from a web server, and the web server responds with the document. These days, most websites are no longer serving simple, static text documents; they’re serving complicated dynamic content. In other words, the “documents” are generated on the fly, and they don’t really represent documents anymore. Instead, they’re meticulously designed visual masterpieces with interactive components, and two people requesting the same “document” won’t actually see the same content. For example, when I view my cart on Amazon, I’m going to see different content than when you view your cart.

There are many more details to how the internet and web came to be, but, when dealing with security, it’s important to understand the discrepancies between how both were intended to be used and how they are actually used today:

They were designed at a time when information on the internet wasn’t really gated; everyone had access to everything, and there was little concept of security. Access controls and other security features were hacked onto the existing protocols later, often in a less-than-ideal manner.
To this day, most web services still operate on the old request-response model: one computer makes a request for a document, and another responds. While we’ve managed to make it work, this model is better suited to static documents than to dynamic, interactive content.
Web traffic passes across the internet, and, in doing so, through a number of intermediary organizations. Some of these organizations may be using older devices that still treat the web as a means of serving static documents.

Performing web application exploitation

Web application exploitation can be divided into three basic phases:

Reconnaissance
Identifying weak points
Executing the attack

As you solve a challenge, you’ll likely need to iterate over these phases multiple times. For example, if your first attack grants you access to an administrator account, you’ll want to go back to the reconnaissance phase to evaluate the newly exposed attack surface before performing your next attack.

Easy challenges typically only require one or two iterations, while hard challenges often have many steps and require numerous iterations. If you get stuck in one phase—usually that happens on phase 3—don’t be afraid to fall back to a previous phase. Work smarter, not harder.

Cyber Skyline typically doesn’t allow automated tools on web application exploitation challenges. In the real world, automated tools are sometimes used, particularly during the reconnaissance phase, but they have the disadvantage of being noisy and imprecise. Cyber Skyline’s web challenges are designed to be done by hand. Each challenge will make this clear in its description; should you ignore this rule, you’ll quickly find that automated tools won’t help you, and you’ll likely be disqualified.

Phase 1: Reconnaissance

The first step in attacking a website is to evaluate the attack surface and determine what might be vulnerable. The exact process for this can vary depending on the target, but a typical manual process might involve several steps:

Navigate around the website a bit and learn the flow between different pages:

Is it a single page or multiple pages?
Are there any forms you can submit?
If the entire site appears to require authentication, are there any obscure pages you might be able to access without authentication, such as password reset forms or license pages?

Take a look at information that isn’t immediately visible:

An important URI to check on each site is /robots.txt. This file lists pages that crawlers like Googlebot aren’t supposed to crawl, sometimes because they contain sensitive information, such as an administrative login page. If /robots.txt is present, it may contain pages that you didn’t find in your initial browse through the site.
Check the headers that are returned with each response. The Server header often contains the name and version of the public-facing web server, and X-Powered-By often contains information about the backend technology. For example, you might see references to Nginx in the Server header and Express.js in the X-Powered-By header.
Did the site set any cookies in your browser?
Are there hidden links on any of the pages—either commented-out or hidden with CSS?
Are there useful identifiers in hidden form inputs, URLs, or comments?
Are there any usernames listed anywhere?
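
Some of the checks above can be sketched as small helpers that work on text you have already copied by hand from your browser (the body of /robots.txt and a page's source). Nothing here contacts a website, so it is an aid for reading, not an automated scanner. The sample robots.txt, the sample page, and the names disallowed_paths and HiddenContentFinder are all invented for illustration.

```python
from html.parser import HTMLParser


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow lines of a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


class HiddenContentFinder(HTMLParser):
    """Collect HTML comments and hidden form inputs from page source."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []
        self.hidden_inputs: dict[str, str] = {}

    def handle_comment(self, data: str) -> None:
        self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            name = attributes.get("name") or ""
            self.hidden_inputs[name] = attributes.get("value") or ""


# Made-up samples standing in for text copied from a browser.
ROBOTS = """User-agent: *
Disallow: /admin/
Disallow: /backup/   # old exports
Allow: /
"""

PAGE = """<form action="/profile" method="post">
  <!-- TODO: remove link to /debug before launch -->
  <input type="hidden" name="user_id" value="1042">
  <input type="text" name="nickname">
</form>"""

finder = HiddenContentFinder()
finder.feed(PAGE)

print(disallowed_paths(ROBOTS))
print(finder.comments)
print(finder.hidden_inputs)
```

In this sample, the Disallow lines point at /admin/ and /backup/, the comment mentions a /debug page, and the hidden input exposes a numeric user_id. These are the kinds of details worth writing down before moving on.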

If any forms are available, try submitting them, but don’t spend too much time attacking yet—just try to get a feel for how the forms behave.

Focus on sending deliberately invalid data or other data that the server might not expect. For example, try omitting various fields.
Try to get different responses from the servers. In particular, aim for 5XX errors, since those indicate a bug in the logic that processes the form.
Try various forms of injection. SQL is popular on a lot of websites, so plug values like ' and " into forms to see if you can get a different response; those will often break poorly-written SQL. If there are any search forms, try SQL wildcards like % and _.
If there are any cookies or querystrings you can alter, try tinkering with those, with one exception: Cyber Skyline often sets a tid cookie and querystring. That value isn’t part of the challenge itself, and if you tinker with it, the flag you eventually receive will be invalid. It’s also grounds for disqualification, as will be reiterated in bold on each challenge that uses a tid.
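
To see why a stray quote or a wildcard changes the response, it helps to reproduce the problem locally. The sketch below assumes a hypothetical search handler, unsafe_search, that pastes user input straight into a SQL string; the table, its rows, and the query are invented, and SQLite stands in for whatever database a real site might use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("apple",), ("apricot",), ("banana",)],
)


def unsafe_search(term: str) -> list[str]:
    """Hypothetical vulnerable handler: user input is pasted into the SQL."""
    query = f"SELECT name FROM products WHERE name LIKE '{term}' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


print(unsafe_search("apple"))  # an ordinary search
print(unsafe_search("%"))      # % matches any run of characters
print(unsafe_search("_pple"))  # _ matches exactly one character

try:
    unsafe_search("'")         # a lone quote unbalances the string literal
except sqlite3.OperationalError as error:
    print("database error:", error)
```

On a real site, the database error in the last case would typically surface as a 5XX response or an unusual error page, which is the kind of variation worth noting during reconnaissance.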

Phase 2: Identifying weak points

Easy and medium challenges typically have a limited attack surface, and a skilled player will know what their intended target is even if they don’t know how to go about breaking in. Hard challenges will often have a larger attack surface, but it’s still important to narrow down the list of potential weak points.

If there’s more than one question on the challenge, go through them in order. This is very important. The earlier questions are typically meant to guide you toward your ultimate target and ensure that you’re focusing on the correct elements of the challenge. This is particularly important when you’re just getting started with web application exploitation and aren’t able to quickly home in on your target.
If you were able to identify a database engine, there’s a good chance you’re going to need to perform some manner of injection. That’s probably your target.
If there’s a lot of client-side processing in JavaScript—e.g., regular expressions—you can likely tinker with it to achieve a desired result. This will require knowledge of JavaScript, however.
If you’ve found hidden pages, such as an admin login in /robots.txt or in a hidden link, there’s a good chance that those are your targets, though there may be red herrings on harder challenges.
Password reset forms are always ripe for abuse. Even if you can’t break the form itself, it’s not unusual for them to disclose private information, such as whether or not a given user exists in the database, or the email that belongs to a user.
If there are numeric or otherwise predictable IDs with which you can tinker, you’ll likely be able to at least expose new attack surfaces or glean additional information by guessing valid IDs.
If any cookies have been set, they likely play a significant role in the challenge.
Have you found any APIs? Those might be a good starting point.
Are any technologies in use that might have exposed interfaces at specific URLs? For example, sometimes people using git accidentally upload their git configuration to /.git/config.
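
When a cookie looks like a long run of letters and digits, it is worth checking whether it is simply encoded rather than encrypted. The sketch below assumes, purely for illustration, a cookie that holds base64url-encoded JSON; real cookies may use a different format entirely, and the sample value and the helper decode_cookie are made up.

```python
import base64
import json


def decode_cookie(value: str) -> dict:
    """Decode a cookie ASSUMED to hold base64url-encoded JSON."""
    padded = value + "=" * (-len(value) % 4)  # restore any stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


# Build a sample value so the example is self-contained.
sample = (
    base64.urlsafe_b64encode(b'{"user": "guest", "admin": false}')
    .decode()
    .rstrip("=")
)

print(sample)
print(decode_cookie(sample))
```

If a decoded cookie exposes fields like these, it becomes a strong candidate for your list of weak points. Remember that the tid cookie mentioned earlier is off limits.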

Harder challenges tend to be more realistic; the potential attack surface is larger, and completing the challenge often requires multiple exploits. To make matters more difficult, there will often only be one or two questions, meaning that there’s very little guidance. For such challenges, it’s important to spend extra time identifying the best points of attack before spending a lot of time executing attacks. Don’t be afraid to go back to the reconnaissance phase.

Phase 3: Executing the attack

This is the most open-ended phase. The exact procedure varies widely from challenge to challenge and season to season. You should only start this phase once you’re fairly certain you’ve narrowed down your list of weak points to just one or two.

Even though the specifics of this phase are unique to each challenge, there are some basic guidelines you can follow to make the process easier:

If you’re attacking a specific technology, such as SQL or Express.js, spend some time researching it first. For example, if your target is SQL, try running the following queries in a search engine:

OWASP SQL – OWASP often has summaries of common attack vectors for specific technologies.
SQL injection – This isn’t applicable for every technology, but it’s always worth checking.
SQL vulnerabilities – This has a risk of yielding results that are too generic.
SQL security pitfalls – This might reveal articles that are written from the perspective of a web developer; they’re often good places to get additional ideas if you’re not sure how to attack something.
SQL security news – If the challenge is modeled after a real-world vulnerability that was widely publicized, it’s worth checking recent events.

Don’t brute force it. If you’re not at least seeing interesting variation in responses from the web server, move on to a different weak point or fall back to an earlier phase.
If you’re pretty sure you’ve found the correct weak point and know roughly how to attack it, but your attacks aren’t quite working, take a break.
For multi-step challenges, you may be skipping to a weak point that you don’t have enough information to attack yet. If that’s the case, switch to a different weak point, then come back to the current one once you have more knowledge of the system.
If the challenge has multiple questions, ensure your target makes sense in the context of the earlier questions. Multiple questions are often there to guide you in the right direction.
Keep an eye on the headers sent in each response. Even if you don’t get in the way you intended, there might be useful information in the headers, such as new cookies or debug messages.

If your attack was successful and you’re on the final stage of the challenge, congratulations! Otherwise, you likely have new information that you can use for further attacks. Return to the first phase and iterate again.

Published by PressSpace2Hack