Partial Reconfiguration DoS Vulnerability (CVE-2019-11165)

The Denial-of-Service (DoS) vulnerability in Partial Reconfiguration (PR), CVE-2019-11165, exists in the device driver’s PR module and stems from a non-interruptible wait loop that may never terminate. An attacker can exploit it to make cloud FPGAs unavailable to other users. The FPGA remains unavailable until the host is rebooted, and frequent reboots are difficult because the server is shared among multiple users. The resulting idle resources may cause financial losses for the cloud provider.

In Stratix-10, the FPGA fabric is partially reconfigured using Configuration-via-Protocol (CvP) over PCI/PCIe, which can be initiated using the “aocl” utility or the OpenCL APIs. As shown in Figure 1, the system performs the following series of actions during PR.

  1. The host side checks that the PR files are compatible with the base design.
  2. To prepare for partial reconfiguration, PR control signals are asserted, which disable the PR region and all its outputs.
  3. The PR configuration file/bitstream is sent over PCI/PCIe to the Secure Device Manager (SDM), which configures the partial region after its own compatibility check and writes the status to a PCI/PCIe base address register (BAR).
  4. The system waits for the ‘SUCCESS’ status, after which normal operation continues.

Figure 1: Sequence of steps occurring after triggering of PR by the host application or “aocl” utility.
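
The four steps above can be modelled in a few lines of user-space C. This is only a toy model for following the control flow, not driver or firmware code: the struct, the function names, the status values, and the idea of comparing a single "base design ID" are all assumptions made for illustration. The real compatibility checks and the real status encoding are not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; the real register encoding is not given. */
enum model_pr_status { MODEL_PR_IDLE = 0, MODEL_PR_SUCCESS = 1, MODEL_PR_FAIL = 2 };

/* Toy stand-in for the device side of partial reconfiguration. */
struct pr_device {
    uint32_t base_design_id;  /* identifies the loaded base/static design */
    bool     region_enabled;  /* state of the PR region and its outputs   */
    uint32_t status_reg;      /* stands in for the status word in the BAR */
};

/* Step 3: the SDM runs its own check, then configures or rejects. */
void sdm_configure(struct pr_device *dev, uint32_t bitstream_base_id)
{
    if (bitstream_base_id == dev->base_design_id) {
        dev->region_enabled = true;
        dev->status_reg = MODEL_PR_SUCCESS;
    } else {
        /* Incompatible bitstream: a failure status is written instead. */
        dev->status_reg = MODEL_PR_FAIL;
    }
}

/* Steps 1 to 3. Returns false only when the host-side check rejects the
 * file. Step 4 (waiting on status_reg) is left to the caller. */
bool start_pr(struct pr_device *dev, bool host_check_passed,
              uint32_t bitstream_base_id)
{
    assert(dev != NULL);
    if (!host_check_passed)
        return false;                       /* step 1: host-side check   */
    dev->region_enabled = false;            /* step 2: disable PR region */
    dev->status_reg = MODEL_PR_IDLE;
    sdm_configure(dev, bitstream_base_id);  /* step 3: hand over to SDM  */
    return true;
}
```

In this model, a bitstream that passes the host-side check but does not match the base design leaves a failure value in the status word, and a caller that waits only for the success value never sees it.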
Snippet of Vulnerable Code
// Wait for PR complete 
status = ioread32(aclpci->bar[ACL_PRCONTROLLER_BAR]+ACL_PRCONTROLLER_OFFSET+ALT_PR_CSR_OFST);
ACL_DEBUG (KERN_DEBUG "ALT_PR_CSR_OFST status is 0x%08X", (int) status);
while(status != ALT_PR_CSR_STATUS_PR_SUCCESS)
{
 msleep(100);
 status = ioread32(aclpci->bar[ACL_PRCONTROLLER_BAR]+ACL_PRCONTROLLER_OFFSET+ALT_PR_CSR_OFST);
 ACL_DEBUG (KERN_DEBUG "ALT_PR_CSR_OFST status is 0x%08X", (int) status);
};

The default system settings give users no control over the compatibility check, which remains enabled in the compiler and in the SDM’s firmware. However, any vulnerability in the compatibility checker will create functional and security issues in the default mode. The snippet above, taken from the Stratix-10 device driver, can be exploited to cause a DoS. Line 4 of the snippet is a non-interruptible while loop that waits for the ‘SUCCESS’ status on the PCI/PCIe BAR and has no exit path for any other status. If an attacker can deliberately induce a non-success state, the driver never leaves the loop. Because the driver runs in kernel context, the user cannot reclaim processor resources from it, so the driver stays stuck indefinitely and the FPGA becomes inaccessible to other users. The quick take video demonstrates a proof-of-concept live exploitation of the vulnerability.
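
For contrast, a wait loop of this kind is conventionally written with an explicit failure check and an upper bound on the number of polls. The following user-space sketch illustrates that general pattern only; it is not the vendor’s fix, and the status values, the `status_reader` callback, and all function names are hypothetical. In a kernel driver the read would be the `ioread32()` call from the snippet and each iteration would sleep as the original loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status encodings, chosen for this sketch only. */
#define DEMO_PR_SUCCESS 1u
#define DEMO_PR_FAIL    2u

enum pr_wait_result {
    PR_WAIT_OK      =  0,  /* status reached the success value        */
    PR_WAIT_FAILED  = -1,  /* device reported an explicit failure     */
    PR_WAIT_TIMEOUT = -2   /* poll budget exhausted without an answer */
};

/* Callback that returns the current status word (hypothetical
 * stand-in for reading the status register in the BAR). */
typedef uint32_t (*status_reader)(void *ctx);

/* Poll at most max_polls times and stop on success OR failure. */
enum pr_wait_result wait_for_pr(status_reader read_status, void *ctx,
                                unsigned max_polls)
{
    assert(read_status != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t status = read_status(ctx);
        if (status == DEMO_PR_SUCCESS)
            return PR_WAIT_OK;
        if (status == DEMO_PR_FAIL)
            return PR_WAIT_FAILED;
        /* A real driver would sleep between polls here. */
    }
    return PR_WAIT_TIMEOUT;
}

/* Test helper: reports whatever value ctx points to. */
uint32_t fixed_status(void *ctx)
{
    return *(const uint32_t *)ctx;
}
```

With a status source that always reports failure, `wait_for_pr` returns `PR_WAIT_FAILED` on the first poll, and with one that never changes it returns `PR_WAIT_TIMEOUT` after `max_polls` reads. In both cases control returns to the caller instead of spinning.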

To successfully trigger a DoS attack, the adversary must bypass the host-side compatibility check and deliver the incompatible bitstream to the SDM. In our demonstrated attack, we used the vendor-supplied BSP compiled using version 18.1.1 Build 263 and generated our PR bitstream using version 18.1.1 Build 277. We were able to deliver the bitstream to the SDM using the “aocl” utility, bypassing the PR soft checks. Since the PR bitstream is inconsistent with the base/static design, the SDM fails to program the FPGA and writes a FAIL status to the BAR, which triggers the vulnerability and leaves the driver stuck in the non-interruptible loop.
An attacker on the cloud can therefore cause a DoS by programming the FPGA with a PR bitstream that is incompatible with the base design but close enough to bypass the soft checks.

In our test case, we used two versions of the BSP (obtained from the vendor) for the same FPGA device: we developed the PR files with one BSP and programmed a board initialized with the other. This let us build an FPGA accelerator that was close enough to pass the software checks yet failed in the SDM, thus triggering the vulnerability.