2019 August |
Zooming in on Wide-area Latencies to a Global Cloud Provider.
Proceedings of the ACM SIGCOMM Conference (SIGCOMM). |
2019 August |
iPipe: A Framework for Building Distributed Applications on Multicore SoC SmartNICs.
Proceedings of the ACM SIGCOMM Conference (SIGCOMM). |
2019 April |
TCP Acceleration as an OS Service.
Proceedings of the European Conference on Computer Systems (Eurosys). |
2019 March |
Teaching Rigorous Distributed Systems With Efficient Model Checking.
Proceedings of the European Conference on Computer Systems (Eurosys). abstract pdf slides
Writing correct distributed systems code is difficult, especially for novice programmers. The inherent asynchrony and need for fault-tolerance make errors almost inevitable. Industrial-strength testing and model checking have been shown to be effective at uncovering bugs, but they come at a cost -- in both time and effort -- that is far beyond what - in both time and effort - that is far beyond what students can afford. To address this, we have developed an efficient model checking framework and visual debugger for distributed systems, with the goal of helping students find and fix bugs in near real-time. We identify two novel techniques for reducing the search state space to more efficiently find bugs in student implementations. We report our experiences using these tools to help over two hundred students build a correct, linearizable, fault-tolerant, dynamically-sharded key--value store.
|
2019 February |
Slim: OS Kernel Support for a Low-Overhead Container Overlay Network.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2019 February |
Stable and Practical AS Relationship Inference with ProbLink.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2018 October |
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). |
2018 August |
Revisiting Network Support for RDMA.
Proceedings of the ACM SIGCOMM Conference (SIGCOMM). |
2018 April |
Towards Causal Datacenter Networks.
Proceedings of the ACM Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC). abstract pdf
Traditionally, distributed systems conservatively assume an asynchronous network. However, recent work on the co-design of networks and distributed systems has shown that stronger ordering properties are achievable in datacenter networks and yield performance improvements for the distributed systems they support. We build on that trend and ask whether it is possible for the datacenter network to order all messages in a protocol-agnostic way. This approach, which we call omnisequencing, would ensure causal delivery of all messages, making consistency a network- level guarantee.
|
2018 April |
MultiNyx: A Multi-level Abstraction Framework for Systematic Analysis of Hypervisors.
Proceedings of the European Conference on Computer Systems (Eurosys). |
2018 April |
Approximating Fair Queueing on Reconfigurable Switches.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2018 April |
Deepview: Virtual Disk Failure Diagnosis and Pattern Detection for Azure.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2017 October |
Hyperkernel: Push-Button Verification of an OS Kernel.
Proceedings of the ACM Symposium on Operating Systems Principles (SOSP). abstract pdf slides video |
2017 October |
Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control.
Proceedings of the ACM Symposium on Operating Systems Principles (SOSP). abstract pdf slides video
Distributed storage systems aim to provide strong consistency and isolation guarantees on an architecture that is partitioned across multiple shards for scalability and replicated for fault tolerance. Traditionally, achieving all of these goals has required an expensive combination of atomic commitment and replication protocols -- introducing extensive coordination overhead. Our system, Eris, takes a different approach. It moves a core piece of concurrency control functionality, which we term multi-sequencing, into the datacenter network itself. This network primitive takes on the responsibility for consistently ordering transactions, and a new lightweight transaction protocol ensures atomicity. The end result is that Eris avoids both replication and transaction coordination overhead: we show that it can process a large class of distributed transactions in a single round-trip from the client to the storage system without any explicit coordination between shards or replicas in the normal case. It provides atomicity, consistency, and fault tolerance with less than 10% overhead -- achieving throughput 3.6--35x higher and latency 72--80% lower than a conventional design on standard benchmarks.
|
2017 October |
Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control (Extended Version).
Technical Report UW-CSE-TR-17-01-01, University of Washington CSE. abstract pdf
Distributed storage systems aim to provide strong consistency and isolation guarantees on an architecture that is partitioned across multiple shards for scalability and replicated for fault tolerance. Traditionally, achieving all of these goals has required an expensive combination of atomic commitment and replication protocols -- introducing extensive coordination overhead. Our system, Eris, takes a different approach. It moves a core piece of concurrency control functionality, which we term multi-sequencing, into the datacenter network itself. This network primitive takes on the responsibility for consistently ordering transactions, and a new lightweight transaction protocol ensures atomicity. The end result is that Eris avoids both replication and transaction coordination overhead: we show that it can process a large class of distributed transactions in a single round-trip from the client to the storage system without any explicit coordination between shards or replicas in the normal case. It provides atomicity, consistency, and fault tolerance with less than 10% overhead -- achieving throughput 3.6--35x higher and latency 72--80% lower than a conventional design on standard benchmarks.
|
2017 October |
Recovering Shared Objects Without Stable Storage.
Proceedings of the International Symposium on Distributed Computing (DISC). abstract pdf |
2017 August |
Recovering Shared Objects Without Stable Storage (Extended Version).
Technical Report UW-CSE-TR-17-08-01, University of Washington CSE. abstract pdf |
2017 July |
Fast Video Classification via Adaptive Cascading of Deep Models.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Spotlight. abstract pdf
Recent advances have enabled "oracle" classifiers that can classify across many classes and input distributions with high accuracy without retraining. However, these classifiers are relatively heavyweight, so that applying them to classify video is costly. We show that day-to-day video exhibits highly skewed class distributions over the short term, and that these distributions can be classified by much simpler models. We formulate the problem of detecting the short-term skews online and exploiting models based on it as a new sequential decision making problem dubbed the Online Bandit Problem, and present a new algorithm to solve it. When applied to recognizing faces in TV shows and movies, we realize end-toend classification speedups of 2.4-7.8x/2.6-11.2x (on GPU/CPU) relative to a state-of-the-art convolutional neural network, at competitive accuracy.
|
2017 April |
IncBricks: Toward In-Network Computation with an In-Network Cache.
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). |
2017 April |
An Empirical Study on the Correctness of Formally Verified Distributed Systems.
Proceedings of the European Conference on Computer Systems (Eurosys). |
2017 March |
RAIL: A Case for Redundant Arrays of Inexpensive Links in Data Center Networks.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2017 March |
Evaluating the Power of Flexible Packet Processing for Network Resource Allocation.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2017 March |
SCL: Simplifying Distributed SDN Control Planes.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2017 March |
Curator: Self-Managing Storage for Enterprise Clusters.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). |
2016 November |
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). abstract pdf slides
Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers. The network orders requests but does not ensure reliable delivery -- using a new primitive we call ordered unreliable multicast (OUM). Implementing this primitive can be achieved with near-zero-cost in the data center. Our new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination. The resulting system not only outperforms both latency- and throughput-optimized protocols on their respective metrics, but also yields throughput within 2% and latency within 16 us of an unreplicated system -- providing replication without the performance cost.
|
2016 November |
Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering (Extended Version).
Technical Report UW-CSE-2016-09-02, University of Washington CSE. abstract pdf
Distributed applications use replication, implemented by protocols like Paxos, to ensure data availability and transparently mask server failures. This paper presents a new approach to achieving replication in the data center without the performance cost of traditional methods. Our work carefully divides replication responsibility between the network and protocol layers. The network orders requests but does not ensure reliable delivery -- using a new primitive we call ordered unreliable multicast (OUM). Implementing this primitive can be achieved with near-zero-cost in the data center. Our new replication protocol, Network-Ordered Paxos (NOPaxos), exploits network ordering to provide strongly consistent replication without coordination. The resulting system not only outperforms both latency- and throughput-optimized protocols on their respective metrics, but also yields throughput within 2% and latency within 16 us of an unreplicated system -- providing replication without the performance cost.
|
2016 November |
Push-Button Verification of File Systems via Crash Refinement.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). Best Paper Award. abstract pdf |
2016 November |
Diamond: Automating Data Management and Storage for Wide-area, Reactive Applications.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). abstract pdf slides
Users of today's popular wide-area apps (e.g., Twitter, Google Docs, and Words with Friends) must no longer save and reload when updating shared data; instead, these applications are reactive, providing the illusion of continuous synchronization across mobile devices and the cloud. Achieving this illusion poses a complex distributed data management problem for programmers. This paper presents the first reactive data management service, called Diamond, which provides persistent cloud storage, reliable synchronization between storage and mobile devices, and automated execution of application code in response to shared data updates. We demonstrate that Diamond greatly simplifies the design of reactive applications, strengthens distributed data sharing guarantees, and supports automated reactivity with low performance overhead.
|
2016 October |
Disciplined Inconsistency with Consistency Types.
Proceedings of the ACM Symposium on Cloud Computing (SoCC). |
2016 October |
Radiatus: Strong User Isolation for Scalable Web Applications.
Proceedings of the ACM Symposium on Cloud Computing (SoCC). |
2016 October |
Characterizing Private Clouds: A Large-Scale Empirical Analysis of Enterprise Clusters.
Proceedings of the ACM Symposium on Cloud Computing (SoCC). |
2016 August |
Providing Stable Storage for the Diskless Crash-Recovery Failure Model.
Technical Report UW-CSE-TR-16-08-02, University of Washington CSE. abstract pdf
Many classic protocols in the fault tolerant distributed computing literature assume a Crash-Fail model in which processes either are up, or have crashed and are permanently down. While this model is useful, it does not fully capture the difficulties many real systems must contend with. In particular, real-world systems are long-lived and must have a recovery mechanism so that crashed processes can rejoin the system and restore its fault-tolerance. When processes are assumed to have access to stable storage that is persistent across failures, the Crash-Recovery model is trivial. However, because disk failures are common and because having a disk on a protocol's critical path is often performance concern, diskless recovery protocols are needed. While such protocols do exist in the state machine replication literature, several well-known protocols have flawed recovery mechanisms. We examine these errors to elucidate the problem of diskless recovery and present our own protocol for providing virtual stable storage, transforming any protocol in the Crash-Recovery with stable storage model into a protocol in the Diskless Crash-Recover model.
|
2016 June |
MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints.
Proceedings of the ACM International Conference on Mobile Systems, Applications, and Services (MobiSys). abstract pdf
We consider applying computer vision to video on cloud-backed mobile devices using Deep Neural Networks (DNNs). The computational demands of DNNs are high enough that, without careful resource management, such applications strain device battery, wireless data, and cloud cost budgets. We pose the corresponding resource management problem, which we call Approximate Model Scheduling, as one of serving a stream of heterogeneous (i.e., solving multiple classification problems) requests under resource constraints. We present the design and implementation of an optimizing compiler and runtime scheduler to address this problem. Going beyond traditional resource allocators, we allow each request to be served approximately, by systematically trading off DNN classification accuracy for resource use, and remotely, by reasoning about on-device/cloud execution trade-offs. To inform the resource allocator, we characterize how several common DNNs, when subjected to state-of-the art optimizations, trade off accuracy for resource use such as memory, computation, and energy. The heterogeneous streaming setting is a novel one for DNN execution, and we introduce two new and powerful DNN optimizations that exploit it. Using the challenging continuous mobile vision domain as a case study, we show that our techniques yield significant reductions in resource usage and perform effectively over a broad range of operating conditions.
|
2016 June |
Satellite: Joint Analysis of CDNs and Network-Level Interference.
Proceedings of the USENIX Annual Technical Conference (ATC). Best Student Paper Award. |
2016 April |
Specifying and Checking File System Crash-Consistency Models.
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). |
2016 April |
High Performance Packet Processing with FlexNIC.
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). abstract pdf
The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed. We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
|
2016 March |
Speeding up Web Page Loads with Shandian.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). abstract pdf
Web page loads are slow due to intrinsic inefficiencies in the page load process. Our study shows that the inefficiencies are attributable not only to the contents and structure of the Web pages (e.g., three-fourths of the CSS resources are not used during the initial page load) but also the way that pages are loaded (e.g., 15% of page load times are spent waiting for parsing-blocking resources to be loaded). To address these inefficiencies, this paper presents Shandian (which means lightening in Chinese) that restructures the page load process to speed up page loads. Shandian exercises control over what portions of the page gets communicated and in what order so that the initial page load is optimized. Unlike previous techniques, Shandian works on demand without requiring a training period, is compatible with existing latency-reducing techniques (e.g., caching and CDNs), supports security features that enforce same-origin policies, and does not impose additional privacy risks. Our evaluations show that Shandian reduces page load times by more than half for both mobile phones and desktops while incurring modest overheads to data usage.
|
2016 March |
When Is Operation Ordering Required in Replicated Transactional Storage?.
IEEE Data Engineering Bulletin 39(1). abstract pdf
Today's replicated transactional storage systems typically have a layered architecture, combining protocols for transaction coordination, consistent replication, and concurrency control. These systems generally require costly strongly-consistent replication protocols like Paxos, which assign a total order to all operations. To avoid this cost, we ask whether all replicated operations in these systems need to be strictly ordered. Recent research has yielded replication protocols that can avoid unnecessary ordering, e.g., by exploiting commutative operations, but it is not clear how to apply these to replicated transaction processing systems. We answer this question by analyzing existing transaction processing designs in terms of which replicated operations require ordering and which simply require fault tolerance. We describe how this analysis leads to our recent work on TAPIR, a transaction protocol that efficiently provides strict serializability by using a new replication protocol that provides fault tolerance but not ordering for most operations.
|
2015 |
Building Consistent Transactions with Inconsistent Replication.
Proceedings of the ACM Symposium on Operating Systems Principles (SOSP). abstract pdf
Application programmers increasingly prefer distributed storage systems with strong consistency and distributed transactions (e.g., Google's Spanner) for their strong guarantees and ease of use. Unfortunately, existing transactional storage systems are expensive to use -- in part because they require costly replication protocols, like Paxos, for fault tolerance. In this paper, we present a new approach that makes transactional storage systems more affordable: we eliminate consistency from the replication protocol while still providing distributed transactions with strong consistency to applications. <p> We present TAPIR -- the Transactional Application Protocol for Inconsistent Replication -- the first transaction protocol to use a novel replication protocol, called inconsistent replication, that provides fault tolerance without consistency. By enforcing strong consistency only in the transaction protocol, TAPIR can commit transactions in a single round-trip and order distributed transactions without centralized coordination. We demonstrate the use of TAPIR in a transactional key-value store, TAPIR-KV. Compared to conventional systems, TAPIR-KV provides better latency and throughput.
|
2015 |
Building Consistent Transactions with Inconsistent Replication (Extended Version).
Technical Report UW-CSE-2014-12-01 v2, University of Washington CSE. abstract pdf
Application programmers increasingly prefer distributed storage systems with strong consistency and distributed transactions (e.g., Google's Spanner) for their strong guarantees and ease of use. Unfortunately, existing transactional storage systems are expensive to use -- in part because they require costly replication protocols, like Paxos, for fault tolerance. In this paper, we present a new approach that makes transactional storage systems more affordable: we eliminate consistency from the replication protocol while still providing distributed transactions with strong consistency to applications. <p> We present TAPIR -- the Transactional Application Protocol for Inconsistent Replication -- the first transaction protocol to use a novel replication protocol, called inconsistent replication, that provides fault tolerance without consistency. By enforcing strong consistency only in the transaction protocol, TAPIR can commit transactions in a single round-trip and order distributed transactions without centralized coordination. We demonstrate the use of TAPIR in a transactional key-value store, TAPIR-KV. Compared to conventional systems, TAPIR-KV provides better latency and throughput.
|
2015 September |
Enhancing mobile apps to use sensor hubs without programmer effort.
Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp). Best Paper Award. abstract pdf
Always-on continuous sensing apps drain the battery quickly because they prevent the main processor from sleeping. Instead, sensor hub hardware, available in many smartphones today, can run continuous sensing at lower power while keeping the main processor idle. However, developers have to divide functionality between the main processor and the sensor hub. We implement MobileHub, a system that automatically rewrites applications to leverage the sensor hub without additional programming effort. MobileHub uses a combination of dynamic taint tracking and machine learning to learn when it is safe to leverage the sensor hub without affecting application semantics. We implement MobileHub in Android and prototype a sensor hub on a 8-bit AVR micro-controller. We experiment with 20 applications from Google Play. Our evaluation shows that MobileHub significantly reduces power consumption for continuous sensing apps.
|
2015 July |
MetaSync: File Synchronization Across Multiple Untrusted Storage Services.
Proceedings of the USENIX Annual Technical Conference (ATC). abstract pdf
Cloud-based file synchronization services, such as Dropbox, are a worldwide resource for many millions of users. However, individual services often have tight resource limits, suffer from temporary outages or even shutdowns, and sometimes silently corrupt or leak user data. We design, implement, and evaluate MetaSync, a secure and reliable file synchronization service that uses multiple cloud synchronization services as untrusted storage providers. To make MetaSync work correctly, we devise a novel variant of Paxos that provides efficient and consistent updates on top of the unmodified APIs exported by existing services. Our system automatically redistributes files upon reconfiguration of providers. Our evaluation shows that MetaSync provides low update latency and high update throughput while being more trustworthy and available. MetaSync outperforms its underlying cloud services by 1.2-10X on three realistic workloads.
|
2015 May |
FlexNIC: Rethinking Network DMA.
Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS). abstract
We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems. Yet even at high speeds, flexibility in packet handling is still important for security, performance isolation. and virtualization. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC DMA controller, where it can be done flexibly and at high speed. We show how FlexNIC can benefit widely used data center server applications, such as key-value stores.
|
2015 May |
Designing Distributed Systems Using Approximate Synchrony in Data Center Networks.
Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI). Best Paper Award. abstract pdf
Distributed systems are traditionally designed independently from the underlying network, making worst-case assumptions (e.g., complete asynchrony) about its behavior. However, many of today's distributed applications are deployed in data centers, where the network is more reliable, predictable, and extensible. In these environments, it is possible to co-design distributed systems with their network layer, and doing so can offer substantial benefits. This paper explores network-level mechanisms for providing Mostly-Ordered Multicast (MOM): a best-effort ordering property for concurrent multicast operations. Using this primitive, we design Speculative Paxos, a state machine replication protocol that relies on the network to order requests in the normal case. This approach leads to substantial performance benefits: under realistic data center conditions, Speculative Paxos can provide 40% lower latency and 2.6x higher throughput than the standard Paxos protocol. It offers lower latency than a latency-optimized protocol (Fast Paxos) with the same throughput as a throughput-optimized protocol (batching).
|
2015 |
Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games.
Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI). abstract pdf
In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing large-scale, empirical based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counter-act toxic behavior.
|
2015 April |
Claret: Using Data Types for Highly Concurrent Distributed Transactions.
Proceedings of the ACM Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC). |
2014 November |
Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency.
Proceedings of the ACM Symposium on Cloud Computing (SoCC). abstract pdf
Interactive services often have large-scale parallel implementations. To deliver fast responses, the median and tail latencies of a service's components must be low. In this paper, we explore the hardware, OS, and application-level sources of poor tail latency in high throughput servers executing on multi-core machines. We model these network services as a queuing system in order to establish the best-achievable latency distribution. Using fine-grained measurements of three different servers (a null RPC service, Memcached, and Nginx) on Linux, we then explore why these servers exhibit significantly worse tail latencies than queuing models alone predict. The underlying causes include interference from background processes, request re-ordering caused by poor scheduling or constrained concurrency models, suboptimal interrupt routing, CPU power saving mechanisms, and NUMA effects. We systematically eliminate these factors and show that Memcached can achieve a median latency of 11 us and a 99.9th percentile latency of 32 us at 80% utilization on a four-core system. In comparison, a naive deployment of Memcached at the same utilization on a single-core system has a median latency of 100 us and a 99.9th percentile latency of 5 ms. Finally, we demonstrate that tradeoffs exist between throughput, energy, and tail latency.
|
2014 October |
Jitk: A Trustworthy In-Kernel Interpreter Infrastructure.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). abstract pdf
Modern operating systems run multiple interpreters in the kernel, which enable user-space applications to add new functionality or specialize system policies. The correctness of such interpreters is critical to the overall system security: bugs in interpreters could allow adversaries to compromise user-space applications and even the kernel. Jitk is a new infrastructure for building in-kernel interpreters that guarantee functional correctness as they compile user-space policies down to native instructions for execution in the kernel. To demonstrate Jitk, we implement two interpreters in the Linux kernel, BPF and INET-DIAG, which are used for network and system call filtering and socket monitoring, respectively. To help application developers write correct filters, we introduce a high-level rule language, along with a proof that Jitk correctly translates high-level rules all the way to native machine code, and demonstrate that this language can be integrated into OpenSSH with tens of lines of code. We built a prototype of Jitk on top of the CompCert verified compiler and integrated it into the Linux kernel. Experimental results show that Jitk is practical, fast, and trustworthy.
|
2014 October |
Customizable and Extensible Deployment for Mobile/Cloud Applications.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). abstract pdf
Modern applications face new challenges in managing today's highly distributed and heterogeneous environment. For example, they must stitch together code that crosses smartphones, tablets, personal devices, and cloud services, connected by variable wide-area networks, such as WiFi and 4G. This paper describes Sapphire, a distributed programming platform that simplifies the programming of today's mobile/cloud applications. Sapphire's key design feature is its distributed runtime system, which supports a flexible and extensible deployment layer for solving complex distributed systems tasks, such as fault-tolerance, code-offloading, and caching. Rather than writing distributed systems code, programmers choose deployment managers that extend Sapphire's kernel to meet their applications' deployment requirements. In this way, each application runs on an underlying platform that is customized for its own distribution needs.
|
2014 October |
Arrakis: The Operating System is the Control Plane.
Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI). Best Paper Award. abstract pdf
Recent device hardware trends enable a new approach to the design of network server operating systems. In a traditional operating system, the kernel mediates access to device hardware by server applications, to enforce process isolation as well as network and disk security. We have designed and implemented a new operating system, Arrakis, that splits the traditional role of the kernel in two. Applications have direct access to virtualized I/O devices, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation. We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing 2-5x end-to-end latency and 9x throughput improvements for a popular persistent NoSQL store relative to a well-tuned Linux implementation.
|
2014 June |
Towards High-Performance Application-Level Storage Management.
Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage). abstract pdf
We propose a radical re-architecture of the traditional operating system storage stack to move the kernel off the data path. Leveraging virtualized I/O hardware for disk and flash storage, most read and write I/O operations go directly to application code. The kernel dynamically allocates extents, manages the virtual to physical binding, and performs name translation. The benefit is to dramatically reduce the CPU overhead of storage operations while improving application flexibility.
|
2014 June |
Machine Fault Tolerance for Reliable Datacenter Systems.
Proceedings of the ACM Asia-Pacific Workshop on Systems (APSys). abstract pdf
Although rare in absolute terms, undetected CPU, memory, and disk errors occur often enough at data center scale to significantly affect overall system reliability and availability. In this paper, we propose a new failure model, called Machine Fault Tolerance, and a new abstraction, a replicated write-once trusted table, to provide improved resilience to these types of failures. Since most machine failures manifest in application server and operating system code, we assume a Byzantine model for those parts of the system. However, by assuming that the hypervisor and network are trustworthy, we are able to reduce the overhead of machine-fault masking to be close to that of non-Byzantine Paxos.
|
2013 November |
Exploring Storage Class Memory with Key Value Stores.
Proceedings of the Workshop on Interactions of NVM/Flash with Operating Systems and Workloads (INFLOW). abstract pdf
In the near future, new storage-class memory (SCM) technologies - such as phase-change memory and memristors - will radically change the nature of long-term storage. These devices will be cheap, non-volatile, byte addressable, and near DRAM density and speed. While SCM offers enormous opportunities, profiting from them will require new storage systems specifically designed for SCM's properties. This paper presents Echo, a persistent key-value storage system designed to leverage the advantages and address the challenges of SCM. The goals of Echo include high performance for both small and large data objects, recoverability after failure, and scalability on multicore systems. Echo achieves its goals through the use of a two-level memory design targeted for memory systems containing both DRAM and SCM, exploitation of SCM's byte addressability for fine-grained transactions in non-volatile memory, and the use of snapshot isolation for concurrency, consistency, and versioning. Our evaluation demonstrates that Echo's SCM-centric design achieves the durability guarantees of the best disk-based stores with the performance characteristics approaching the best in-memory key-value stores.
|
2013 September |
NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing.
Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp). Honorable Mention. abstract pdf
This paper presents the design and implementation of a programming system that enables third-party developers to add spoken natural language (SNL) interfaces to standalone mobile applications. The central challenge is to create statistical recognition models that are accurate and resource-efficient in the face of the variety of natural language, while requiring little specialized knowledge from developers. We show that given a few examples from the developer, it is possible to elicit comprehensive sets of paraphrases of the examples using internet crowds. The exhaustive nature of these paraphrases allows us to use relatively simple, automatically derived statistical models for speech and language understanding that perform well without per-application tuning. We have realized our design fully as an extension to the Visual Studio IDE. Based on a new benchmark dataset with 3500 spoken instances of 27 commands from 20 subjects and a small developer study, we establish the promise of our approach and the impact of various design choices.
|
2013 August |
Expressive Privacy Control with Pseudonyms.
Proceedings of the ACM SIGCOMM Conference (SIGCOMM). abstract pdf
As personal information increases in value, the incentives for remote services to collect as much of it as possible increase as well. In the current Internet, the default assumption is that all behavior can be correlated using a variety of identifying information, not the least of which is a user's IP address. Tools like Tor, Privoxy, and even NATs, are located at the opposite end of the spectrum and prevent any behavior from being linked. Instead, our goal is to provide users with more control over linkability---which activites of the user can be correlated at the remote services---not necessarily more anonymity. We design a cross-layer architecture that provides users with a pseudonym abstraction. To the user, a pseudonym represents a set of activities that the user is fine with linking, and to the outside world, a pseudonym gives the illusion of a single machine. We provide this abstraction by associating each pseudonym with a unique, random address drawn from the IPv6 address space, which is large enough to provide each device with multiple globally-routable addresses. We have implemented and evaluated a prototype that is able to provide unlinkable pseudonyms within the Chrome web browser in order to demonstrate the feasibility, efficacy, and expressiveness of our approach.
|
2013 May |
The Case for Onloading Continuous High-Datarate Perception to the Phone.
Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS). abstract pdf
Much has been said recently on off-loading computations from the phone. In particular, workloads such as speech and visual recognition that involve models based on ``big data'' are thought to be prime candidates for cloud processing. We posit that the next few years will see the arrival of mobile usages that require continuous processing of audio and video data from wearable devices. We argue that these usages are unlikely to flourish unless substantial computation is moved back on to the phone. We outline possible solutions to the problems inherent in such a move. We advocate a close partnership between perception and systems researchers to realize these usages.
|
2011 May |
Operating Systems Implications of Fast, Cheap, Non-Volatile Memory.
Proceedings of the USENIX Workshop on Hot Topics in Operating Systems (HotOS). abstract pdf
The existence of two basic levels of storage (fast/volatile and slow/non-volatile) has been a long-standing premise of most computer systems, influencing the design of OS components, including file systems, virtual memory, scheduling, execution models, and even their APIs. Emerging resistive memory technologies - such as phase-change memory (PCM) and memristors - have the potential to provide large, fast, non-volatile memory systems, changing the assumptions that motivated the design of current operating systems. This paper examines the implications of non-volatile memories on a number of OS mechanisms, functions, and properties.
|