PhD Dissertation Proposal: Zhao Zhang
Title: “Transparently Utilizing Cross-Domain Resources in Modern Infrastructure”
Modern data centers and cloud providers may have an enormous total quantity of a specific type of resource (e.g., CPU or memory), but that resource is usually spread across many domains, where a domain is a physical or logical space that contains a number of resources. For example, an individual host may have at most several hundred CPU cores and several terabytes of memory installed in its hardware domain. An application can naturally access resources in its encapsulating domain (e.g., CPUs on the same machine), but accessing cross-domain resources may incur extra overhead or effort and may require substantial changes such as rewriting the application. This gap between domains causes inefficient and unbalanced resource utilization because it prevents resources from being logically aggregated into one large, unified pool that can be freely allocated and hot-modified based on real-time requirements.
This thesis proposal explores the possibility of transparently utilizing cross-domain core resources, including CPU, memory, and network, to solve three problems: (1) transparently utilizing CPU resources across physical machines to mitigate cluster-scale imbalanced resource utilization, (2) enabling a virtualized application inside a virtual machine to directly access physical CPU resources (rather than through virtualized CPUs) to mitigate the double-scheduling problem, and (3) utilizing cross-domain network resources to solve the server-side Tor blocking problem.
In this thesis proposal, we introduce process diffusion to utilize cross-machine CPU resources transparently. Process diffusion is the ability to roam the execution of unmodified multi-threaded programs across machines based on real-time load and a pre-configured roaming policy. We describe DIFFUSED, an implementation of process diffusion. DIFFUSED is a runtime system for Linux that enables process diffusion for monolithic programs that are unable to scale out. It lets the target program employ remote CPU resources in a lightweight, seamless, and transparent manner. Unlike the closest existing solution, process migration, DIFFUSED focuses on roaming a program’s execution flow rather than the entirety of its environment, which allows much more flexible and performant program movement. DIFFUSED requires no kernel modification or special hardware.
We then introduce process promotion, which allows a virtualized application to utilize the host machine’s CPU resources more efficiently by addressing the double-scheduling problem. The double-scheduling problem degrades the performance of applications running inside virtual machines, especially in common over-provisioning scenarios. Process promotion targets virtualized applications that are affected by this problem: it allows such an application to be temporarily promoted to the host space, where it runs on host-space physical CPUs and is scheduled directly by the host scheduler. Process promotion effectively reduces the number of active threads inside the guest VM and thereby mitigates the double-scheduling problem. We implement PROMOTEDKVM to achieve process promotion. PROMOTEDKVM improves the performance of virtualized applications in over-provisioned scenarios and is compatible with existing schedulers.
Finally, we explore the possibility of utilizing cross-domain network resources to solve the server-side blocking problem of Tor. Server-side blocking is an existential threat to Tor, as an increasing number of websites discriminate against users who arrive via the anonymity network. We introduce exit bridges to mitigate the server-side blocking problem. Exit bridges are short-lived nodes on cloud service providers or even regular users’ personal computers that can serve as alternative egress points in Tor circuits. We implement two systems: EEBT (Ephemeral Exit Bridge for Tor) and HEBTOR (Hidden Exit Bridge for Tor). EEBT utilizes cross-domain IP address resources from public cloud domains and focuses on providing ephemerality. HEBTOR significantly extends the possible domains of candidate IP address resources for solving the Tor blocking problem by recognizing and utilizing the idle IP address resources from the Internet where Tor is accessible.
In summary, this thesis proposal explores the possibilities of transparently utilizing cross-domain resources to solve real-world problems. It shows that DIFFUSED improves the performance of monolithic programs and the CPU utilization of a cluster, that PROMOTEDKVM helps mitigate the double-scheduling problem of VMs, and that EEBT and HEBTOR help mitigate the Tor blocking problem effectively.
Committee members:
Micah Sherr (co-adviser)
Ben Ujcich (co-adviser)
Clay Shields
Wenchao Zhou (Alibaba Group)