Introduction
Cloud computing has changed the way we develop, release and consume software. Cloud computing now extends to High-Performance Computing (HPC) systems, and these systems are becoming increasingly heterogeneous. Graphics Processing Units (GPUs) have been widely available in HPC systems for some time, and the Top500 list of supercomputers includes many systems that benefit from GPU acceleration. Field Programmable Gate Arrays (FPGAs) are another heterogeneous architecture and are emerging as popular accelerators in cloud and HPC systems. FPGAs are more flexible than GPUs and can be adapted by the user to solve a host of different problems. This flexibility often allows them to consume less power than alternative processors, but it also comes at a cost: FPGAs can be more challenging for users to program. Applications where FPGAs are commonly deployed include genomics and molecular dynamics, video and image processing, machine learning and data analytics, and in-network data processing.
Cloud computing offers users computing resources that they can access remotely. These resources are made available following several different models, including Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). With IaaS, the base infrastructure is provided for the user, including servers, storage and networking. PaaS adds middleware and development tools to IaaS. In the SaaS model, software solutions are provided for the user, and in many cases, the user is not aware of what infrastructure is being used to run their workload.
As FPGAs are added to cloud computing systems, it is not yet clear whether they fit into one of these existing cloud computing models or will be offered under a new model, FPGA as a Service (FaaS), and indeed what FaaS would mean.
Background: Existing FPGA Offerings in the Cloud
Two existing offerings of FPGAs by cloud service providers take different approaches to making FPGAs available to users. Amazon Web Services (AWS) provides FPGAs as part of its F1 instances. This follows the PaaS model: users can request F1 instances and run their code directly on these FPGAs using the infrastructure provided by Amazon. Microsoft provides FPGAs in the cloud following the SaaS model as part of Project Catapult. Microsoft Azure applications and Bing searches may be accelerated using FPGAs without the user being aware of what processing units are used. The implementations are done by Microsoft engineers and provided as a service.
Hardware Models for FPGAs in the Cloud
The traditional architecture for computer systems with attached accelerators is a server-centric model, where the accelerator is attached to a host processor, typically over a high-speed interconnect such as PCIe. Such an architecture incurs a high data-movement cost: data must first be transferred to the host, then to the accelerator, and finally back to the host after processing. While the accelerator can significantly reduce processing time, the overhead of data transfer may result in little or no end-to-end improvement for an application.
An emerging alternative is to connect accelerators, including GPUs and FPGAs, directly to the network. In this scenario, the host processor may be responsible for programming the FPGA with a bitstream, but data flows directly to the FPGA for processing. This is part of a larger trend of disaggregation in the data center, where memory and other components are made generally available rather than being tied to a single server. Microsoft uses this model by making the FPGA part of a SmartNIC to accelerate Azure applications \cite{others2018a}. A similar model is available through the OpenCloud Testbed.
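The trade-off between the two attachment models above can be sketched with a simple end-to-end timing model. All bandwidths, data sizes and speedup figures below are illustrative assumptions, not measurements of any real system:

```python
# Illustrative timing model comparing the server-centric and
# network-attached accelerator architectures described above.
# All bandwidths, data sizes and speedups are assumed figures
# chosen for illustration, not measurements of any real system.

def server_centric_s(data_gb, cpu_compute_s, pcie_gbps, speedup):
    """Data is staged through the host: host -> FPGA and FPGA -> host over PCIe."""
    return 2 * data_gb / pcie_gbps + cpu_compute_s / speedup

def network_attached_s(data_gb, cpu_compute_s, net_gbps, speedup):
    """Data flows from the network directly into the FPGA; results are
    consumed downstream, so only the ingest transfer is counted (an assumption)."""
    return data_gb / net_gbps + cpu_compute_s / speedup

data_gb, cpu_s, speedup = 10, 2.0, 20   # 10 GB job, 2 s on the CPU, 20x on the FPGA
print(f"CPU only:         {cpu_s:.2f} s")
print(f"Server-centric:   {server_centric_s(data_gb, cpu_s, 12, speedup):.2f} s")
print(f"Network-attached: {network_attached_s(data_gb, cpu_s, 12.5, speedup):.2f} s")
```

With these assumed figures, the server-centric path barely improves on the CPU because two PCIe transfers dominate, while the network-attached path, with a single ingest transfer, retains most of the accelerator's benefit.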
The OpenCloud Testbed
The Massachusetts Green High Performance Computing Center (MGHPCC) is a facility in Holyoke, MA, for housing research computing systems from a five-university consortium (University of Massachusetts, Harvard University, Boston University, Northeastern University, and the Massachusetts Institute of Technology). MGHPCC provides the space, power, and cooling capacity for approximately 750 racks of computing equipment on a single shared floor.
The Mass Open Cloud (MOC) is a best-effort cloud developed by a partnership of the same five academic institutions with government (Mass Tech Collaborative, USAF), and industry (Red Hat, Intel, Two Sigma, NetApp, Cisco). The existing MOC physical infrastructure includes around 2200 cores of commodity Intel compute, 160 Power9 cores, 40 GPUs, and 1.2PB of storage. The MOC was designed as both a research and a production cloud, where researchers can obtain metrics from running cloud applications and use the information to enhance their tools and modeling for cloud workflows.
CloudLab \cite{al2019} is particularly aimed at cloud researchers, and provides them with control and visibility all the way down to bare metal. A researcher can provision an entire cloud inside of CloudLab. Most CloudLab resources provide hard isolation from other users, so it can support hundreds of simultaneous "slices," with each getting an artifact-free environment suitable for scientific experimentation with new cloud architectures. A researcher can run standard cloud software stacks such as OpenStack and Hadoop, or build their own from the ground up.