Contain Yourself: Staying Undetected Using the Windows Container Isolation Framework
Introduction
This blog is based on a session we presented at DEF CON 2023 on Friday, August 11, 2023, in Las Vegas: Contain Yourself: Staying Undetected Using the Windows Container Isolation Framework
The use of containers is an integral part of any resource-efficient and secure environment. Starting with Windows Server 2016, Microsoft released its own version of this solution, Windows Containers, which offers process and Hyper-V isolation modes. The presentation covered the basics of Windows containers, broke down its file system isolation framework, reverse-engineered its main mini-filter driver, and detailed how it can be utilized and manipulated by a bad actor to bypass EDR products in multiple domains.
Here are the two “isolation modes” a Windows container can run in:
- Process Isolation Mode (also referred to as Windows Server containers): User-mode isolation where the container interacts with the host kernel directly. Each container instance is isolated from the host through namespaces and resource control. Think Linux containers.
- Hyper-V Isolation Mode (also referred to as Hyper-V containers): Kernel-level isolation that provides each container with its own Hyper-V virtual machine. The presence of the virtual machine provides hardware-level isolation between each container as well as the container host.
In both cases, there should be efficient file system separation and each container should be able to access system files and write changes that will not affect the host. Copying the main volume for each container launch would be storage-inefficient and impractical.
This technology caught our attention for several reasons:
- Containers and virtualization solutions are everywhere, and their internal workings are not well documented.
- Bad actors search for ways to escape containers. The idea of intentionally entering one to evade security products has yet to be explored.
- This framework doesn't require any prerequisites and comes as default in every modern Windows image (at least the piece being abused).
- We love reverse engineering!
How Does a Windows Container Work?
Before we dig into the framework internals, let's explore how Windows provides isolation between containers.
Jobs
Job objects have been around since Windows 2000. These objects are designed to group several processes and manage them as one unit. This allows the system to control the attributes of all processes associated with a job, like limiting their CPU usage, I/O bandwidth, virtual memory usage, and network activity. Multi-process applications often use these objects to manage their child processes more easily, and jobs can also be placed inside one another (known as “nested jobs”).
Although they make a good start, jobs themselves are not enough to provide the isolation needed for a container, which is why Microsoft created silos.
Silos
Silos can be considered an extension of jobs (a kind of “super job”). Like traditional jobs, these objects are used for process grouping, with additional features. Containers use a type of silo called a “server silo.” Server silos provide basic job capabilities, as well as redirection of various system resources like the registry, networking, and the object manager.
The Windows kernel detects processes assigned to silos using APIs like PsIsCurrentThreadInServerSilo and PsIsProcessInSilo.
File System Redirection Using Reparse Points
Reparse points are MFT attributes that can be given to files or directories. These attributes store user-defined data that is then parsed by a file system mini-filter driver that intercepts the I/O request and handles it accordingly. Each reparse point also contains a tag that is used to uniquely identify the data it is storing.
A good example of these attributes can be seen in junctions and symbolic links — a directory that functions as a symbolic link to another directory and contains a behind-the-scenes reparse point with the path to the correct destination. The I/O manager handles I/O requests to files/directories containing these tags and redirects them.
As we’ll see, containers use these points to create a division between their dispensable volumes and the host's.
Note: Don’t confuse reparse points with shortcuts (i.e. .lnk files), which work differently.
Containers File System Separation
In this blog post we don’t go in-depth about how containers are initialized and operate while running since this has already been detailed in these great articles by Alex Ilgayev and James Forshaw:
Playing in the “Window” Sandbox
Who Contains the Containers?
Instead, we’ll focus on how the OS separates each container's file system from the host's and avoids duplicating system files.
To avoid an additional copy of the OS files, each container uses a dynamically generated image that points to the original files using reparse points.
The result is images that contain “ghost files,” which store no actual data but point to a different volume on the system. It was at this point that the idea struck me — what if we can use this redirection mechanism to obfuscate our file system operations and confuse security products?
This does not escape the container from within but intentionally uses this feature while executing on the host.
Mini-filters Background
Mini-filter drivers were designed to make the I/O filtering process much easier for developers. Since implementing a legacy filter driver from scratch is difficult, Microsoft provided a solution in the form of its filter manager, a legacy filter that manages other “mini” filter drivers and takes care of all the heavy lifting for them, like inserting them into the device stack, ignoring any irrelevant requests, and supporting multiple platforms. It also exposes a mini-filter-dedicated API that implements the common operations used by these drivers (the Flt API).
Each mini-filter can be attached by the manager to one or more volumes, creating what is called a “mini-filter instance.” Similar to legacy filters, mini-filter instances can intercept the Pre and Post operations of numerous I/O functions like Create, Read, and Write.
Another important concept the filter manager implemented is the mini-filter altitude system. Each mini-filter should specify an altitude — a value between 20000 and 429999 — upon its registration to the manager.
This range is split into groups and each group is associated with a certain type of mini-filter. For example, 320000 - 329999 is the range for security vendors' drivers, while 140000 - 149999 is for encryption-related drivers.
The filter manager invokes its mini-filters' operation callbacks according to their altitudes. A higher-altitude driver will handle the pre-operation before the ones below it and the post-operation after them.
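The ordering rule can be pictured with a small, self-contained model. This is only a sketch of the ordering described above, not filter manager code: the MiniFilter type, the CallbackOrder function, and the filter names are hypothetical, and the altitudes are arbitrary values picked from the ranges mentioned earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a registered mini-filter: a name and an altitude.
struct MiniFilter {
    std::string name;
    unsigned altitude;
};

// Returns the order in which callbacks would fire for a single I/O request:
// pre-operations from the highest altitude down, then post-operations from
// the lowest altitude back up.
std::vector<std::string> CallbackOrder(std::vector<MiniFilter> filters) {
    std::sort(filters.begin(), filters.end(),
              [](const MiniFilter& a, const MiniFilter& b) {
                  return a.altitude > b.altitude;  // highest altitude first
              });

    std::vector<std::string> order;
    for (const MiniFilter& f : filters) {
        order.push_back("pre:" + f.name);
    }
    for (auto it = filters.rbegin(); it != filters.rend(); ++it) {
        order.push_back("post:" + it->name);
    }
    return order;
}
```

Calling CallbackOrder with an antivirus filter at 325000 and an encryption filter at 145000 yields the antivirus pre-operation first and the antivirus post-operation last, with the encryption callbacks in between.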
Wcifs.sys
Note: From here on all the information provided is undocumented by Microsoft and was gathered by reverse-engineering the driver.
The Windows Container Isolation FS (wcifs) mini-filter driver is responsible for the file system separation between Windows containers and their host. This is the driver that handles the ghost files redirection, and it does this by parsing their attached reparse points.
During my research, I was surprised to find that this driver is loaded on every Windows OS starting from Windows 10, including servers, by default. This is true even when the “containers” option is turned off in the Windows features menu.
The main reparse tags associated with this driver are IO_REPARSE_TAG_WCI_1 and IO_REPARSE_TAG_WCI_LINK_1, which according to Windows documentation are, “used by the Windows Container Isolation filter. Server-side interpretation only, not meaningful over the wire.”
The reparse point data structure for these two tags looks like this:
struct WcifsReparseDataBuffer
{
/*0*/ ULONG Version;
/*4*/ ULONG Reserved;
/*8*/ GUID Guid; // hardcoded value (8264f677-40b0-4ca5-bf9a-944ac2da8087)
/*24*/ USHORT PathStringLength;
/*26*/ wchar_t PathStringBuffer[100];
};
struct ReparseDataBuffer
{
/*0*/ ULONG ReparseTag;
/*4*/ USHORT ReparseDataLength;
/*6*/ USHORT UnparsedNameLength;
/*8*/ WcifsReparseDataBuffer InternalBuffer;
};
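To sanity-check the offsets in the comments above, here is a portable mirror of the two structures with compile-time layout checks and a small helper that fills one in. Treat it as a sketch under assumptions: the fixed-width types stand in for the Windows typedefs, BuildWcifsReparseBuffer is a hypothetical helper, the length fields are assumed to be byte counts, and the Version value is left to the caller because the correct value is not stated here. The tag values are the ones Microsoft publishes for these two reparse tags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Published values of the two reparse tags handled by wcifs.
constexpr std::uint32_t kIoReparseTagWci1     = 0x90001018;
constexpr std::uint32_t kIoReparseTagWciLink1 = 0xA0001027;

// Portable stand-in for the Windows GUID type.
struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Mirrors of the structures above, using fixed-width types so the layout
// does not depend on the platform's wchar_t size.
struct WcifsReparseDataBuffer {
    std::uint32_t Version;
    std::uint32_t Reserved;
    Guid          GuidValue;
    std::uint16_t PathStringLength;
    char16_t      PathStringBuffer[100];
};

struct ReparseDataBuffer {
    std::uint32_t ReparseTag;
    std::uint16_t ReparseDataLength;
    std::uint16_t UnparsedNameLength;
    WcifsReparseDataBuffer InternalBuffer;
};

static_assert(offsetof(WcifsReparseDataBuffer, GuidValue) == 8, "Guid offset");
static_assert(offsetof(WcifsReparseDataBuffer, PathStringLength) == 24, "length offset");
static_assert(offsetof(WcifsReparseDataBuffer, PathStringBuffer) == 26, "path offset");
static_assert(offsetof(ReparseDataBuffer, InternalBuffer) == 8, "payload offset");

// Hypothetical helper: fills a buffer for the given tag and path.
// Returns false if the path does not fit into PathStringBuffer.
// ASSUMPTION: both length fields are expressed in bytes.
bool BuildWcifsReparseBuffer(std::uint32_t tag, std::uint32_t version,
                             const std::u16string& path, ReparseDataBuffer* out) {
    if (path.size() > 100) {
        return false;
    }
    std::memset(out, 0, sizeof(*out));
    const std::uint16_t pathBytes =
        static_cast<std::uint16_t>(path.size() * sizeof(char16_t));
    out->ReparseTag = tag;
    out->ReparseDataLength = static_cast<std::uint16_t>(
        offsetof(WcifsReparseDataBuffer, PathStringBuffer) + pathBytes);
    out->InternalBuffer.Version = version;
    // The hardcoded GUID 8264f677-40b0-4ca5-bf9a-944ac2da8087.
    out->InternalBuffer.GuidValue =
        Guid{0x8264f677, 0x40b0, 0x4ca5,
             {0xbf, 0x9a, 0x94, 0x4a, 0xc2, 0xda, 0x80, 0x87}};
    out->InternalBuffer.PathStringLength = pathBytes;
    std::memcpy(out->InternalBuffer.PathStringBuffer, path.data(), pathBytes);
    return true;
}
```

On Windows, a buffer like this would be handed to FSCTL_SET_REPARSE_POINT; the sketch stops at building it so that it stays platform-independent.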
Let’s see how the driver handles these points.
Note: This driver plays a small role in an extensive framework, containing multiple components. We will not research how these tags operate under a traditional container operation, but only this driver’s raw implementation for these particular cases.
Mini-Filters and Reparse Points
The common reparse point parsing flow is as follows:
- The I/O manager builds an IRP_MJ_CREATE request packet that comes down the device stack of the corresponding file system.
- The file is read from the file system and the request comes up the stack in the opposite direction.
- The file system driver recognizes that a file with a reparse point was opened and changes the status of the request to STATUS_REPARSE, leaving it to other drivers up the device stack for further processing.
- The request eventually comes to the filter manager driver, which invokes the POST_CREATE callbacks of its registered mini-filters according to their altitudes, from bottom to top.
- In its POST_CREATE callback, a mini-filter responsible for a reparse tag calls FltFsControlFile with the FSCTL_GET_REPARSE_POINT control code which reads the reparse data itself from the MFT attribute.
- If the reparse tag located in the reparse data header is not one the mini-filter is responsible for, it ignores the request and leaves it to the drivers above it.
- If it is, the mini-filter usually replaces the name in the request’s file object using IoReplaceFileObjectName and marks the callback data as modified with FltSetCallbackDataDirty. This will cause the I/O manager to 'reparse' the name in the file object and pass the request back down with the correct values.
wcifs.sys is no different. In its POST_CREATE the driver parses and handles requests that are returned with STATUS_REPARSE.
Our first step is to attach the mini-filter to the main volume, attempt to open a file with one of its tags, and see how it gets parsed in the POST_CREATE callback. Unfortunately, when debugging this driver, I was unable to invoke this callback at all — even when the driver was correctly attached to the volume.
For the POST_CREATE function to execute, a mini-filter must return either FLT_PREOP_SUCCESS_WITH_CALLBACK or FLT_PREOP_SYNCHRONIZE in its PRE_CREATE function. It seems that some conditions were not met, and the filter decided not to continue handling the request after first inspecting it.
Following a swift look at the PRE function, we discovered the function containing those conditions, called WcUnionsExistsForInstance.
The function fails, which results in the FLT_PREOP_SUCCESS_NO_CALLBACK callback status being returned, causing the filter manager to ignore this driver’s POST_CREATE callback.
Stepping inside, we see two requirements that need to be met. The function checks whether the current thread is associated with the “host silo,” which is equivalent to the host OS. In other words, the driver checks if the current thread is executing in a server silo and will exit otherwise.
Simply executing inside a server silo is not enough, because the second requirement is that this silo has a union context registered in the driver’s internal collections (notice how the check is performed on the file object and not the current thread itself; this behavior is explained in this article).
Context management is another feature provided by the filter manager. A mini-filter can create custom-defined data, known as contexts, and link it to objects like files, instances, and silos using the Flt API.
To recap, for this driver to handle our CreateFile request, we need to achieve the following:
- Create a silo and insert our process into it.
- Inform the driver that our silo is representing a container so it will create a union context and refer to it accordingly.
Registering a Container
The first requirement is pretty straightforward. We need to create a job using CreateJobObjectW, convert it to a silo using SetInformationJobObject with the JobObjectCreateSilo class, and assign our current process to it using AssignProcessToJobObject.
The second one is a bit trickier. User-mode clients communicate with mini-filters via the FilterSendMessage function, which sends the driver a custom data structure. Conveniently, the wcifs driver offers several functions to its clients, and one of them is an option called SetUnion (MessageCode = 0), which registers a container.
The structure the driver expects looks like this:
struct WcifsPortMessageSetUnion
{
/*0*/ DWORD MessageVersionOrCode;
/*4*/ DWORD MessageSize;
/*8*/ DWORD NumberOfUnions;
/*12*/ wchar_t InstanceName[50];
/*112*/ DWORD InstanceNameLength;
/*116*/ DWORD ReparseTag;
/*120*/ DWORD ReparseTagLink;
/*124*/ DWORD Unknown;
/*128*/ HANDLE SiloHandle;
/*136*/ char UnionData[];
};
struct WcifsPortMessage
{
/*0*/ DWORD MessageCode;
/*4*/ DWORD MessageSize;
// While MessageCode=0, MessageData should be WcifsPortMessageSetUnion
/*8*/ char MessageData;
};
The UnionData[] field contains information about the source and destination volumes the container works with:
//The UnionData[] field holds one VolumeUnion & ContainerRootId per volume
struct ContainerRootId
{
/*0*/ USHORT Size;
/*2*/ USHORT Length;
/*4*/ USHORT MaximumLength;
/*6*/ wchar_t Buffer[23];
};
struct VolumeUnion
{
/*0*/ GUID Guid; // hardcoded value (8264f677-40b0-4ca5-bf9a-944ac2da8087)
/*16*/ BOOL IsSourceVolume;
/*20*/ DWORD OffsetOfVolumeName; // This points to a ContainerRootId structure
/*24*/ WORD SizeOfVolumeName;
/*26*/ WORD GuidFlags;
};
If built correctly, the silo will be registered and a silo context storing data about the container will be created, causing the checks at the PRE_CREATE to pass and the POST_CREATE to be invoked.
An example of a valid structure looks like this:
struct WcifsPortMessage
{
/*0*/ DWORD MsgCode = SetUnion; // SetUnion = 0
/*4*/ DWORD MsgSize = sizeof(WcifsPortMessage);
/*8*/ WcifsPortMessageSetUnion Message;
};
struct WcifsPortMessageSetUnion
{
/*0*/ DWORD MessageVersionOrCode = 1;
/*4*/ DWORD MessageSize = sizeof(WcifsPortMessageSetUnion);
/*8*/ DWORD NumberOfUnions = 2;
/*12*/ wchar_t InstanceName[50] = L"wcifs Instance";
/*112*/ DWORD InstanceNameLength;
/*116*/ DWORD ReparseTag = IO_REPARSE_TAG_WCI_1;
/*120*/ DWORD ReparseTagLink = IO_REPARSE_TAG_WCI_LINK_1;
/*124*/ DWORD Unknown;
/*128*/ HANDLE SiloHandle;
/*136*/ VolumeUnion SourceVolumeUnion;
/*164*/ VolumeUnion TargetVolumeUnion;
/*192*/ ContainerRootId SourceVolumeContainerRootId;
/*244*/ ContainerRootId TargetVolumeContainerRootId;
};
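The offsets in this example follow directly from the sizes of the structures defined earlier, which the following sketch verifies at compile time. It is only arithmetic under assumptions: the types are portable stand-ins for the Windows typedefs, HANDLE is modeled as a 64-bit integer (a 64-bit build), and the helper functions are hypothetical. The "all VolumeUnion entries first, then one ContainerRootId per volume" ordering is taken from the two-volume example above and may not hold in general.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for the Windows GUID type.
struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Fixed part of WcifsPortMessageSetUnion (everything before UnionData[]).
// ASSUMPTION: 64-bit build, so HANDLE is modeled as a 64-bit integer.
struct SetUnionHeader {
    std::uint32_t MessageVersionOrCode;
    std::uint32_t MessageSize;
    std::uint32_t NumberOfUnions;
    char16_t      InstanceName[50];
    std::uint32_t InstanceNameLength;
    std::uint32_t ReparseTag;
    std::uint32_t ReparseTagLink;
    std::uint32_t Unknown;
    std::uint64_t SiloHandle;
};

struct VolumeUnion {
    Guid          GuidValue;
    std::int32_t  IsSourceVolume;
    std::uint32_t OffsetOfVolumeName;
    std::uint16_t SizeOfVolumeName;
    std::uint16_t GuidFlags;
};

struct ContainerRootId {
    std::uint16_t Size;
    std::uint16_t Length;
    std::uint16_t MaximumLength;
    char16_t      Buffer[23];
};

static_assert(offsetof(SetUnionHeader, InstanceNameLength) == 112, "name length offset");
static_assert(offsetof(SetUnionHeader, SiloHandle) == 128, "silo handle offset");
static_assert(sizeof(SetUnionHeader) == 136, "UnionData starts at 136");
static_assert(sizeof(VolumeUnion) == 28, "VolumeUnion size");
static_assert(sizeof(ContainerRootId) == 52, "ContainerRootId size");

// Offset of the index-th VolumeUnion entry inside the message.
constexpr std::size_t VolumeUnionOffset(std::size_t index) {
    return sizeof(SetUnionHeader) + index * sizeof(VolumeUnion);
}

// Offset of the index-th ContainerRootId in a message carrying `unions` volumes.
constexpr std::size_t ContainerRootIdOffset(std::size_t unions, std::size_t index) {
    return VolumeUnionOffset(unions) + index * sizeof(ContainerRootId);
}

// Total size of a SetUnion message carrying `unions` volumes.
constexpr std::size_t SetUnionMessageSize(std::size_t unions) {
    return ContainerRootIdOffset(unions, unions);
}
```

For two unions this reproduces the offsets 136, 164, 192, and 244 shown above, and gives a total message size of 296 bytes.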
Now we can look at how the driver parses its tags.
IO_REPARSE_TAG_WCI_LINK_1
As mentioned, one of the tags this driver handles is IO_REPARSE_TAG_WCI_LINK_1. As its name suggests, this tag acts as a regular link between two files. The driver reads the path at the WcifsReparseDataBuffer.PathStringBuffer and redirects to it at the volume the container directs to using the IoReplaceFileObjectName function. For example, if the container redirects from \Device\HarddiskVolume5 to \Device\HarddiskVolume3 and \Device\HarddiskVolume5\source\file.txt holds a reparse point with the IO_REPARSE_TAG_WCI_LINK_1 tag and \dest\file.txt as the destination path, the driver will set the file object name to \Device\HarddiskVolume3\dest\file.txt and this will be the file the returned handle will refer to.
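The name rewrite in this example boils down to joining the container's target volume with the path stored in the reparse data. The sketch below models only that string operation. RedirectedName is a hypothetical helper, and the separator handling is our own assumption rather than something recovered from the driver.

```cpp
#include <cassert>
#include <string>

// Models the name that ends up in the file object for an
// IO_REPARSE_TAG_WCI_LINK_1 reparse point: the container's target volume
// followed by the path stored in the reparse data.
std::wstring RedirectedName(const std::wstring& targetVolume,
                            const std::wstring& storedPath) {
    std::wstring result = targetVolume;
    // Drop trailing separators on the volume so the join never doubles them.
    while (!result.empty() && result.back() == L'\\') {
        result.pop_back();
    }
    // Make sure there is a separator between the volume and the path.
    if (storedPath.empty() || storedPath.front() != L'\\') {
        result.push_back(L'\\');
    }
    result += storedPath;
    return result;
}
```

With the values from the example, RedirectedName(L"\\Device\\HarddiskVolume3", L"\\dest\\file.txt") produces \Device\HarddiskVolume3\dest\file.txt.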
IO_REPARSE_TAG_WCI_1
The second tag we will look at is far more interesting. When encountering the IO_REPARSE_TAG_WCI_1 tag, the driver saves the reparse data in the file object’s context and launches a work item that further handles the request. According to the driver symbols, this work item is responsible for file and directory “expansion.”
“Expansion” is this driver's term for “copy-on-open protection.” When a process inside a container accesses a file with this tag, the driver automatically copies the original file into the source volume (i.e., the container’s “ghost image”) by overwriting the ghost file’s data, so the process edits a copy of the file instead of the original.
This copy is done via FltReadFile and FltWriteFile.
Using our previous example, if we swap the tag on \Device\HarddiskVolume5\source\file.txt to IO_REPARSE_TAG_WCI_1 and try to open it, the contents of \Device\HarddiskVolume3\dest\file.txt will be copied into it by the driver and a handle to the now-copied file will be returned.
Another thing to note about this tag: when the expansion fails because the destination file cannot be found, the driver initiates a new I/O operation using FltPerformSynchronousIo that deletes the source file.
Copying a File
Another feature the driver offers to its clients through the FilterSendMessage function is copying and pasting a file.
By building the following structure and sending it to the driver with MessageCode=4 the driver will read the source file and write to the destination (again, using FltReadFile and FltWriteFile):
struct WcifsPortMessageCopyFileHandler
{
/*0*/ DWORD MessageVersionOrCode;
/*4*/ DWORD MessageSize;
/*8*/ wchar_t InstanceName[50];
/*108*/ DWORD InstanceNameLength;
/*112*/ DWORD ReparseTag;
/*116*/ DWORD OffsetToSourceContainerRootId;
/*120*/ DWORD SizeOfSourceContainerRootId;
/*124*/ DWORD OffsetToTargetContainerRootId;
/*128*/ DWORD SizeOfTargetContainerRootId;
/*132*/ DWORD OffsetToSourceFileRelativePath;
/*136*/ DWORD SizeOfSourceFileRelativePath;
/*140*/ DWORD OffsetToTargetFileRelativePath;
/*144*/ DWORD SizeOfTargetFileRelativePath;
/*148*/ char UnionData[]; // 2*ContainerRootId + source & target relative paths
};
Note: Unlike the copy operation done using the IO_REPARSE_TAG_WCI_1 reparse tag, which requires the target file to exist, here the target file must not be present on the file system (otherwise the operation will fail with STATUS_OBJECT_NAME_COLLISION).
Utilizing the Framework
So, we have a process running inside a fabricated container and a mini-filter that handles our I/O requests in an unusual way. What’s next?
In the previous sections, we’ve seen how the wcifs driver can be abused to create, read, write, and delete files using the following kernel primitives: FltCreateFile, FltReadFile, FltWriteFile, and FltPerformSynchronousIo. It turns out there is a hidden benefit for performing these operations from within the kernel itself due to the way these functions work behind the scenes.
In the MSDN documentation of these four functions, you can see the following remark:
[function] causes the request to be sent to the minifilter driver instances attached below the initiating instance and to the file system. The specified instance and the instances attached above it do not receive the request.
Our driver sits at a lower altitude (189900, which can be even lower if changed manually) than security vendors' drivers (320000 - 329999), meaning we can create, read, write, and delete files on the file system without their callbacks triggering — bingo!
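To make the consequence of that remark concrete, here is a toy model of which mini-filters would observe a request issued by a given instance. It is a sketch of the documented rule only: MiniFilter, FiltersThatSeeRequest, and the filter names are hypothetical, and the security and encryption altitudes are arbitrary picks from the ranges quoted earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a mini-filter instance attached to a volume.
struct MiniFilter {
    std::string name;
    unsigned altitude;
};

// Returns the names of the filters that would see an I/O request issued by
// the instance at `initiatorAltitude`: only instances attached strictly
// below it. The initiator and everything above it are skipped.
std::vector<std::string> FiltersThatSeeRequest(const std::vector<MiniFilter>& stack,
                                               unsigned initiatorAltitude) {
    std::vector<std::string> seen;
    for (const MiniFilter& f : stack) {
        if (f.altitude < initiatorAltitude) {
            seen.push_back(f.name);
        }
    }
    return seen;
}
```

For a stack holding a security filter at 325000, wcifs at 189900, and an encryption filter at 145000, a request initiated by wcifs reaches only the encryption filter.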
Let’s see how security vendors use their mini-filters and what can be detoured:
Ransomware/Wiper Detection Algorithms Bypass
File system write protection is an essential feature any EDR must provide. Ransomware can cripple entire organizations, costing its victims millions, while file wipers have proven to be an effective way to disable vital infrastructure in times of war (as seen in the Russian-Ukrainian conflict).
To combat these threats, security vendors tend to use their own mini-filter drivers to monitor the system’s I/O activity. Algorithms based on this log source look for certain patterns to detect file system-based malware and prevent them before any irreversible damage is done. For example, a process that opens many existing files and writes to them will be classified as ransomware/wiper, depending on the data written.
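As an illustration of the kind of pattern matching described here, the following is a deliberately simplified detector. It is a toy model and not any vendor's algorithm: WriteEvent, OverwriteDetector, and the threshold are all hypothetical, and a real product would also weigh the data being written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// One observed write, as a monitoring mini-filter might log it.
struct WriteEvent {
    std::uint32_t pid;
    std::string path;
    bool fileExistedBefore;
};

// Toy detector: flags a process once it has written to at least `threshold`
// distinct files that already existed.
class OverwriteDetector {
public:
    explicit OverwriteDetector(std::size_t threshold) : threshold_(threshold) {}

    // Records the event and returns true if the process is now flagged.
    bool Observe(const WriteEvent& e) {
        if (e.fileExistedBefore) {
            overwritten_[e.pid].insert(e.path);
        }
        return IsFlagged(e.pid);
    }

    bool IsFlagged(std::uint32_t pid) const {
        auto it = overwritten_.find(pid);
        return it != overwritten_.end() && it->second.size() >= threshold_;
    }

private:
    std::size_t threshold_;
    std::map<std::uint32_t, std::set<std::string>> overwritten_;
};
```

A detector like this can only count the writes its mini-filter actually observes and attributes to the process, and writes to newly created files do not count toward the threshold.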
This is where our driver comes into play. Because we can overwrite files using the IO_REPARSE_TAG_WCI_1 reparse tag without antivirus drivers noticing, their detection algorithms will not receive the whole picture and thus will not trigger.
An example of a simple wiping algorithm using the driver will go like this:
- Create an empty file that will be our target file. Write a buffer of zeros/random data to it.
- Traverse all files on the system and set an IO_REPARSE_TAG_WCI_1 reparse point that will point to our target file.
- Create a silo, assign the current process to it, and register it as a container to wcifs where both source and target volumes are the main one (\Device\HarddiskVolume3).
- Traverse all files on the system again and open each one using CreateFile. The files will be overwritten with the target file data by the wcifs driver.
A ransomware algorithm would be similar:
- Traverse each file on the system and for each:
  - Read its content and encrypt it in memory.
  - Create a target file and write the encrypted data to it. The security mini-filter will ignore this because the data is written to a new file and does not overwrite existing content.
  - Set an IO_REPARSE_TAG_WCI_1 reparse point on the source file that will point to the target file.
- Create a silo, assign the current process to it, and register it as a container to wcifs where both source and target volumes are the main one (\Device\HarddiskVolume3).
- Traverse all files on the system again and open each one using CreateFile. The files will be overwritten with the target file data by the wcifs driver.
DLP Bypass – Write to Read-Only Devices and Directories
Another feature of security vendor products is to block write operations on certain directories/volumes, which can be utilized in several ways. For example, organizations often determine a read-only policy for removable devices to avoid data exfiltration or block file writes to folders containing sensitive data.
This write protection is implemented by (you guessed it) a mini-filter driver. By using wcifs’ copy & paste feature this protection can be bypassed as well.
Besides bypassing mini-filters, there are other side effects of not going the traditional route when performing I/O operations:
ETW-Based Correlations Bypass
ETW (Event Tracing for Windows) is a powerful and efficient logging mechanism built into the Windows operating system. The Windows kernel serves as a crucial log provider that captures a wide range of system operations, including those related to the file system. Security vendors leverage these events to analyze and identify potential threats, often building attack flows by cross-referencing them.
Going back to the IO_REPARSE_TAG_WCI_1 tag override process, the read and write operations occur within a kernel work item. Executing from a work item, which is a kernel thread, will cause the ETW log to attribute these actions to the system process (PID 4) instead of the actual responsible process. This will lead to misinformation for any vendor consuming events number 15 (Read) and 16 (Write) from the Microsoft-Windows-Kernel-File provider, bypassing any threat hunting correlation based on these events.
One example of a built-in Windows feature that is ETW-based is the SACL. Windows offers the capability to establish an auditing policy for file system objects, known as the System Access Control List (SACL). This allows for extensive logging of all I/O operations performed on the specified objects.
ETW-based Windows tools are intentionally designed to disregard logs originating from the system. This approach guarantees that such logs, which are typically irrelevant to a user monitoring the system, are not included to avoid unnecessary overhead.
The result of all of this is that our I/O requests will be absent from the logs altogether.
CreateProcessNotifyRoutine Bypass
The Windows kernel can deliver process creation/destruction notifications to any interested driver. This allows drivers to keep track of processes in the system and, in the case of security products' drivers, scan created processes and verify they do not pose a threat.
The callback routine has the following prototype:
void
PcreateProcessNotifyRoutineEx(
_Inout_ PEPROCESS Process,
_In_ HANDLE ProcessId,
_Inout_opt_ PPS_CREATE_NOTIFY_INFO CreateInfo
);
The CreateInfo parameter contains the image file name of the created process and its command line.
Back to our driver: the kernel offers three different syscalls for creating a process: NtCreateProcess, NtCreateProcessEx, and NtCreateUserProcess. All three are exports of ntdll.dll and can be called directly from any user-mode program. The first two are almost identical and allow the creation of a process using a given section handle, while the third one, NtCreateUserProcess, is a bit different:
NTSTATUS
NTAPI
NtCreateUserProcess(
_Out_ PHANDLE ProcessHandle,
_Out_ PHANDLE ThreadHandle,
_In_ ACCESS_MASK ProcessDesiredAccess,
_In_ ACCESS_MASK ThreadDesiredAccess,
_In_opt_ POBJECT_ATTRIBUTES ProcessObjectAttributes,
_In_opt_ POBJECT_ATTRIBUTES ThreadObjectAttributes,
_In_ ULONG ProcessFlags,
_In_ ULONG ThreadFlags,
_In_ PRTL_USER_PROCESS_PARAMETERS ProcessParameters,
_Inout_ PPS_CREATE_INFO CreateInfo,
_In_ PPS_ATTRIBUTE_LIST AttributeList
);
This function gives us the option to provide the new process' image file path in the ProcessParameters argument, which will then be opened from the kernel itself, instead of an open section handle.
By giving a path to a file that contains an IO_REPARSE_TAG_WCI_LINK_1 reparse point, the CreateInfo parameter sent to all process creation callbacks will hold its path and command line, while the actual file being opened is the one the reparse point redirects to.
The flow of events goes like this:
- Set an IO_REPARSE_TAG_WCI_LINK_1 reparse point on a benign file that points to a malicious one.
- Create a silo, assign the current process to it, and register it as a container to wcifs where both source and target volumes are the main one (\Device\HarddiskVolume3).
- Create a new process using NtCreateUserProcess, with the benign file as the image file path.
- The process creation notification callback for all registered drivers will trigger, containing the image path and command line of the benign file.
- The kernel will open the benign file and wcifs will intercept the reparsed request and redirect it to the malicious file.
- The process will be created using the malicious file image.
Prerequisites and Limitations
- Administrative permissions are required for communicating with the wcifs driver.
- It is not possible to set reparse points on files without write access, meaning system files cannot be altered.
- It is not possible to set a reparse point while inside a silo.
- For the CreateFile + IO_REPARSE_TAG_WCI_1 call to succeed, the process token must hold the SeManageVolumePrivilege privilege.
- While copying a file using wcifs, the target file must not be present on the file system (meaning you cannot overwrite files using this method).
- The driver must be attached to any volume it works with (source or target).
Mitigation
There are several routes security vendors can take to detect malicious usage of wcifs:
- Detect calls to DeviceIoControl + FSCTL_SET_REPARSE_POINT + IO_REPARSE_TAG_WCI_1, and check for the IO_REPARSE_TAG_WCI_1 tag in the PRE_WRITE callback. Scan files with the tag in the PRE_CLEANUP function even if they were not altered.
- Check if wcifs' communication port is opened by processes that are not valid system processes.
- Check if a container is opened where the source volume is equal to the destination volume.
- Check if wcifs is attached by a user process and not the system, or if it is attached when the containers feature is deactivated.
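The third check in this list is simple to express once the source and target volume names of a container registration are available. The sketch below assumes a security product has already extracted both names as strings. NormalizeVolumeName and IsSelfReferencingUnion are hypothetical helpers, and the case-insensitive, trailing-separator-agnostic comparison is our own simplification.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Lower-cases a volume name and strips trailing backslashes so that
// "\Device\HarddiskVolume3" and "\device\harddiskvolume3\" compare equal.
std::wstring NormalizeVolumeName(std::wstring name) {
    while (!name.empty() && name.back() == L'\\') {
        name.pop_back();
    }
    for (wchar_t& c : name) {
        c = static_cast<wchar_t>(std::towlower(c));
    }
    return name;
}

// Returns true when a container registration maps a volume onto itself,
// which is the suspicious configuration described in the list above.
bool IsSelfReferencingUnion(const std::wstring& sourceVolume,
                            const std::wstring& targetVolume) {
    return NormalizeVolumeName(sourceVolume) == NormalizeVolumeName(targetVolume);
}
```

A registration where both volumes are \Device\HarddiskVolume3 would be flagged, while one redirecting \Device\HarddiskVolume5 to \Device\HarddiskVolume3 would not.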
Microsoft’s Response
Microsoft has been informed of this research and responded with the following:
“This has been determined to be a malware detection evasion technique and not a security vulnerability that would be serviced in a security update.”
GitHub Repo
You can access the POC tool source code on this GitHub repo.
Sources
- https://research.checkpoint.com/2021/playing-in-the-windows-sandbox/
- https://googleprojectzero.blogspot.com/2021/04/who-contains-containers.html
- https://unit42.paloaltonetworks.com/what-i-learned-from-reverse-engineering-windows-containers/
- https://habr.com/en/company/acronis/blog/536018/
- https://learn.microsoft.com/en-us/virtualization/windowscontainers/about/
- https://www.amazon.com/Windows-Kernel-Programming-Pavel-Yosifovich/dp/1977593372