List of Accepted Papers
All papers are available for download on SpringerLink.
Scaling Microblogging Services with Divergent Traffic Demands.
Tianyin Xu, Yang Chen, Lei Jiao, Ben Zhao, Pan Hui and Xiaoming Fu.
Abstract: Today's microblogging services such as Twitter have long outgrown their initial designs as SMS-based social networks. Instead, a massive and steadily-growing user population of more than 100 million is using Twitter for everything from capturing the mood of the country to detecting earthquakes and Internet service failures. It is unsurprising that the traditional centralized client-server architecture has not scaled with user demand, leading to server overload and significant impairment of availability.
In this paper, we argue that the divergence in usage models of microblogging services can be best addressed using two complementary mechanisms: one that provides reliable messaging between friends, and another that delivers events from popular celebrities and media outlets to their thousands or even millions of followers. We present Cuckoo, a new microblogging system that offloads processing and bandwidth costs away from a small centralized server base while ensuring reliable message delivery. We use a 20-day Twitter availability measurement to guide our design, and trace-driven emulation of 30,000 Twitter users to evaluate our Cuckoo prototype. Compared to a centralized approach, Cuckoo achieves 30-50% server bandwidth savings and a 50-60% CPU load reduction, while guaranteeing reliable message delivery.
Contrail: Enabling Decentralized Social Networks on Smartphones.
Patrick Stuedi, Iqbal Mohomed, Mahesh Balakrishnan, Rama Ramasubramanian, Ted Wobber, Doug Terry and Morley Mao.
Abstract: Mobile devices are increasingly used for social networking applications, where data is shared between devices belonging to different users. Today, such applications are implemented as centralized services, forcing users to trust corporations with their personal data. While decentralized designs for such applications can provide privacy, they are difficult to achieve on current devices due to constraints on connectivity, energy and bandwidth. Contrail is a communication platform that allows decentralized social networks to overcome these challenges. In Contrail, a user installs content filters on her friends' devices that express her interests; she subsequently receives new data generated by her friends that match the filters. Both data and filters are exchanged between devices via cloud-based relays in encrypted form, giving the cloud no visibility into either. In addition to providing privacy, Contrail enables applications that are very efficient in terms of energy and bandwidth.
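The filter mechanism summarized above can be illustrated with a minimal Python sketch. It assumes filters are plain predicates over a hypothetical item schema (`kind`, `tags`) and leaves out encryption and cloud relaying entirely; the class and method names are illustrative and are not Contrail's actual filter language or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Item:
    """A piece of content created on the sender's device (hypothetical schema)."""
    kind: str
    tags: set

Filter = Callable[[Item], bool]

@dataclass
class Device:
    owner: str
    # Filters installed on this device by friends: friend name -> predicates.
    installed: Dict[str, List[Filter]] = field(default_factory=dict)

    def install_filter(self, friend: str, f: Filter) -> None:
        self.installed.setdefault(friend, []).append(f)

    def publish(self, item: Item) -> List[str]:
        """Return the friends whose filters match; only they would receive the item."""
        return [friend for friend, fs in self.installed.items()
                if any(f(item) for f in fs)]

alice = Device("alice")
alice.install_filter("bob", lambda it: it.kind == "photo")
alice.install_filter("carol", lambda it: "hiking" in it.tags)
print(alice.publish(Item("photo", {"beach"})))    # ['bob']
print(alice.publish(Item("status", {"hiking"})))  # ['carol']
```

Because the sender evaluates the filters, items that match no filter need not be transmitted at all, which is one plausible source of the energy and bandwidth efficiency the abstract mentions.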
Confidant: Protecting OSN Data without Locking it Up.
Dongtao Liu, Amre Shakimov, Ramón Cáceres, Alexander Varshavsky and Landon Cox.
Abstract: Online social networks (OSNs) are immensely popular, but participants are increasingly uneasy with centralized services’ handling of user data. Decentralized OSNs offer the potential to address users’ anxiety while also enhancing the features and scalability offered by existing, centralized services.
In this paper, we present Confidant, a decentralized OSN designed to support a scalable application framework for OSN data without compromising users’ privacy. Unlike previous decentralized OSNs, which assume that storage servers are untrusted, Confidant replicates a user’s data on servers controlled by her friends. Because data is stored on trusted servers, Confidant allows application code to run directly on these storage servers. The key challenge in realizing this vision is managing access-control policies under weakly-consistent replication. Confidant addresses this challenge by eliminating write conflicts through a lightweight cloud-based state manager and through a simple mechanism for updating the bindings between access policies and replicated data.
We have evaluated Confidant using trace-driven simulation and experiments with a prototype implementation. Simulation results show that typical OSN users should expect read and write success rates of between 99.5% and 100%, while we demonstrate that representative applications can run up to 30 times faster on Confidant than the same tasks implemented with untrusted storage.
Live Deduplication Storage of Virtual Machine Images in an Open-Source Cloud.
Chun-Ho Ng, Mingcao Ma, Tsz-Yeung Wong, Patrick P. C. Lee and John C. S. Lui.
Abstract: Deduplication is an approach to avoid storing data blocks with identical content, and has been shown to effectively reduce the disk space for storing multi-gigabyte virtual machine (VM) images. However, it remains challenging to deploy deduplication in a real system, such as a cloud platform, where VM images are regularly inserted and retrieved. We propose LiveDFS, a live deduplication file system that enables deduplication storage of VM images in an open-source cloud deployed on low-cost commodity hardware with limited memory. LiveDFS has several distinct features, including spatial locality, prefetching of metadata, and journaling. LiveDFS is POSIX-compliant and is implemented as a Linux kernel-space file system. We deploy our LiveDFS prototype as a storage layer in a cloud platform based on OpenStack, and conduct extensive experiments. Compared to an ordinary file system without deduplication, we show that LiveDFS can save at least 40% of space for storing VM images, while achieving comparable throughput in importing and retrieving VM images. Our work justifies the feasibility of deploying LiveDFS in an open-source cloud.
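The core idea of block-level deduplication can be shown with a small in-memory sketch. It assumes fixed-size 4 KiB blocks identified by SHA-256 fingerprints, and it trusts fingerprints without byte-for-byte verification. It is a generic illustration, not LiveDFS's on-disk layout, and it does not model its spatial-locality, metadata-prefetching or journaling features.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed fixed block size for illustration

class DedupStore:
    """Toy block store that keeps exactly one copy of each distinct block."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}     # fingerprint -> block content
        self.files: Dict[str, List[bytes]] = {}  # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).digest()
            # Identical content yields the same fingerprint, so it is stored once.
            self.blocks.setdefault(fp, block)
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        # Reassemble the file from its ordered fingerprint list.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

# Two "VM images" that share their first three blocks.
store = DedupStore()
vm1 = bytes(BLOCK_SIZE) * 3 + b"A" * BLOCK_SIZE
vm2 = bytes(BLOCK_SIZE) * 3 + b"B" * BLOCK_SIZE
store.write("vm1.img", vm1)
store.write("vm2.img", vm2)
print(store.stored_bytes())  # 12288: the zero block, the "A" block and the "B" block
```

The two images total 32 KiB logically but occupy 12 KiB in the store. VM images built from the same base OS tend to share many such blocks.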
Scalable Load Balancing in Cluster Storage Systems.
Gae-Won You, Seung-Won Hwang, Navendu Jain and Hua-Jun Zeng.
Abstract: Enterprise and cloud data centers comprise tens of thousands of servers providing petabytes of storage to a large number of users and applications. At such a scale, these storage systems face two key challenges: (a) hot-spots due to the dynamic popularity of stored objects and (b) high reconfiguration costs of data migration due to bandwidth oversubscription in the data center network. Existing storage solutions, however, are unsuitable to address these challenges because of the large number of servers and data objects. This paper describes the design, implementation, and evaluation of URSA, which scales to a large number of storage nodes and objects and aims to minimize latency and bandwidth costs during system reconfiguration. Toward this goal, URSA formulates an optimization problem that selects a subset of objects from hot-spot servers and performs topology-aware migration to minimize reconfiguration costs. As exact optimization is computationally expensive, we devise scalable approximation techniques for node selection and efficient divide-and-conquer computation. Our evaluation shows URSA achieves cost-effective load balancing while scaling to large systems and is time-responsive in computing placement decisions, e.g., about two minutes for 10k nodes and 10M objects.
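To make "selects a subset of objects from hot-spot servers" concrete, here is a simple greedy heuristic. It assumes each object has a known load and a positive size, and it picks objects with the highest load shed per byte moved until the server falls under a load cap. This is a swapped-in illustration, not URSA's optimization formulation or its approximation techniques, and it ignores topology-aware placement of the moved objects.

```python
from typing import Dict, List, Tuple

def pick_objects_to_migrate(objects: Dict[str, Tuple[float, float]],
                            load_cap: float) -> List[str]:
    """Greedily choose objects to move off an overloaded server.

    objects maps object id -> (load, size), with size > 0. Objects that shed
    the most load per unit of data moved are chosen first, until the
    server's remaining load is at most load_cap.
    """
    current = sum(load for load, _ in objects.values())
    ranked = sorted(objects.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    chosen = []
    for obj_id, (load, _size) in ranked:
        if current <= load_cap:
            break
        chosen.append(obj_id)
        current -= load
    return chosen

hot_server = {"a": (40.0, 10.0), "b": (30.0, 1.0), "c": (20.0, 5.0), "d": (10.0, 1.0)}
print(pick_objects_to_migrate(hot_server, load_cap=60.0))  # ['b', 'd']
```

Ranking by load per byte favors small, hot objects. Moving them relieves the hot-spot while keeping migration traffic low, and migration traffic is the cost the abstract highlights under bandwidth oversubscription.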
Predico: A System for What-if Analysis in Complex Data Center Applications.
Rahul Singh, Prashant Shenoy, Maitreya Natu, Vaishali Sadaphal and Harrick Vin.
Abstract: Modern data center applications are complex distributed systems with tens or hundreds of interacting software components. An important management task in data centers is to predict the impact of a certain workload or reconfiguration change on the performance of the application. Such predictions require the design of "what-if" models of the application that take as input hypothetical changes in the application's workload or environment and estimate their impact on performance.
We present Predico, a workload-based what-if analysis system that uses commonly available monitoring information in large scale systems to enable administrators to ask a variety of workload-based "what-if" queries about the system. Predico uses a network of queues to analytically model the behavior of large distributed applications. It automatically generates node-level queueing models and then uses model composition to build system-wide models. Predico employs a simple what-if query language and an intelligent query execution algorithm that uses on-the-fly model construction and a change propagation algorithm to efficiently answer queries on large scale systems. We have built a prototype of Predico and have used traces from two large production applications from a financial institution as well as real-world synthetic applications to evaluate its what-if modeling framework. Our experimental evaluation validates the accuracy of Predico's node-level resource usage, latency and workload-models and then shows how Predico enables what-if analysis in two different applications.
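As a textbook illustration of answering a workload what-if query with a network of queues, the sketch below composes node-level M/M/1 models into an end-to-end latency estimate. It assumes Poisson arrivals, exponential service, and known per-node visit counts and service times (the `app` values are hypothetical). Predico's own models are derived from monitoring data and may differ substantially.

```python
from typing import Dict, Tuple

def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time of an M/M/1 queue: S / (1 - rho), with rho = lambda * S."""
    rho = arrival_rate * service_time
    if rho >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_time / (1.0 - rho)

def end_to_end_latency(arrival_rate: float, nodes: Dict[str, Tuple[int, float]]) -> float:
    """Compose node models: nodes maps name -> (visits per request, service time in s).

    Each node sees arrival_rate * visits requests/s, and a request spends
    visits * (per-visit response time) at that node.
    """
    return sum(visits * mm1_response_time(arrival_rate * visits, s)
               for visits, s in nodes.values())

# Hypothetical three-tier application; "what if the request rate doubles?"
app = {"web": (1, 0.005), "app": (2, 0.010), "db": (1, 0.020)}
for rate in (10.0, 20.0):
    print(rate, end_to_end_latency(rate, app))
```

Composing per-node models this way means a what-if change to one input, such as the arrival rate, can be propagated through the formulas without re-measuring the whole system.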
GreenWare: Greening Cloud-Scale Data Centers to Maximize the Use of Renewable Energy.
Yanwei Zhang, Yefu Wang and Xiaorui Wang.
Abstract: To reduce the negative environmental implications (e.g., CO2 emission and global warming) caused by the rapidly increasing energy consumption, many Internet service providers have started taking various initiatives to operate their cloud-scale data centers with renewable energy. Unfortunately, due to the intermittent nature of renewable energy sources such as wind turbines and solar panels, renewable energy is currently often more expensive than brown energy that is produced with conventional fossil-based fuel. As a result, utilizing renewable energy may impose considerable pressure on the sometimes stringent operation budgets of Internet service providers. Therefore, two key questions faced by many cloud-service providers are 1) how to dynamically distribute service requests among data centers in different geographical locations, based on the local weather conditions, to maximize the use of renewable energy, and 2) how to do that within their allowed operation budgets.
In this paper, we propose GreenWare, a novel middleware system that conducts dynamic request dispatching to maximize the percentage of renewable energy used to power a network of distributed data centers, subject to the desired cost budget of the Internet service providers. Our solution first explicitly models the intermittent generation of renewable energy, e.g., wind power and solar power, with respect to varying weather conditions in the geographical location of each data center. We then formulate the core objective of GreenWare as a constrained optimization problem and propose an efficient request dispatching algorithm based on linear-fractional programming (LFP). We evaluate GreenWare with real-world weather, electricity price, and workload traces. Our experimental results show that GreenWare can significantly increase the use of renewable energy in cloud-scale data centers without violating the desired cost budget, despite the intermittent supplies of renewable energy in different locations and time-varying electricity prices and workloads.
Resource Provisioning Framework for MapReduce Jobs with Performance Goals.
Abhishek Verma, Ludmila Cherkasova and Roy Campbell.
Abstract: Many companies are increasingly using MapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments is the amount of resources that a user should lease from the service provider. Often, a user targets specific performance goals and the application needs to complete data processing by a certain time deadline. However, currently, the task of estimating required resources to meet application performance goals is solely the users' responsibility. In this work, we introduce a novel framework and technique to address this problem and to offer a new resource sizing and provisioning service in MapReduce environments. For a MapReduce job that needs to be completed within a certain time, we build the job profile by using its past executions or by executing it on a smaller data set. Then, by applying scaling rules combined with a fast and efficient capacity planning model, we generate a set of resource provisioning options. Moreover, we design a model for estimating the impact of node failures on job completion time to evaluate worst-case scenarios. We validate the accuracy of our models using a set of realistic applications. The predicted completion times of generated resource provisioning options are within 10% of the measured times in our 66-node Hadoop cluster.
Resource-aware Adaptive Scheduling for MapReduce Clusters.
Jorda Polo, Claris Castillo, David Carrera, Yolanda Becerra, Ian Whalley, Malgorzata Steinder, Jordi Torres and Eduard Ayguade.
Abstract: We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims at maximizing resource utilization across machines while observing completion time goals. To this end, we introduce the concept of "job slot management" into the core of MapReduce runtimes. Existing MapReduce schedulers define a static number of slots to represent the capacity of a cluster, creating a fixed number of execution slots per machine. These slots are later allocated to individual tasks from the workload. This abstraction is simple and may work for homogeneous workloads, but fails to capture the different resource requirements of individual jobs in multi-user environments, leading to resource under- or over-utilization. Our technique leverages job profiling information to provide dynamic adjustment of the number of slots to be created in each machine, as well as workload placement across them, to maximize the resource utilization of the cluster and reduce the execution makespan of the workload. At the same time, the technique is able to meet user-provided completion time goals for each job. We evaluate the proposed scheduling approach using an experimental implementation built on top of Hadoop, and using a real cluster.
A Content-Based Publish/Subscribe Matching Algorithm for 2D Spatial Objects.
Athanasios Konstantinidis, Antonio Carzaniga and Alexander Wolf.
Abstract: An important concern in the design of a publish/subscribe system is its expressiveness, which is the ability to represent various types of information in publications and to precisely select information of interest through subscriptions. We present an enhancement to existing content-based publish/subscribe systems with support for a 2D spatial data type and eight associated relational operators, including those to reveal overlap, containment, touching, and disjointedness between regions of irregular shape. We describe an algorithm for evaluating spatial relations that is founded on a new dynamic discretization method and region-intersection model. In order to make the data type practical for large-scale applications we provide an indexing structure for accessing spatial constraints and develop a simplification method for eliminating redundant constraints. Finally, we present the results of experiments evaluating the effectiveness and scalability of our approach.
FAIDECS: Fair Decentralized Event Correlation.
Aaron Wilkin, K. R. Jayaram, Patrick Eugster and Ankur Khetrapal.
Abstract: Many distributed applications, including those for core dependability problems such as intrusion detection or fault localization, rely on a form of event correlation. Such applications, when not built as ad-hoc solutions, typically rely on broker overlay networks or centralized correlators. Redundancy in broker overlays may allow overcoming broker failures, but two processes interested in the same correlations can reach different outcomes. Centralized correlators constitute performance bottlenecks and single points of failure; duplicating them straightforwardly leads to nondeterminism like broker overlays.
This paper describes FAIDECS, a generic middleware system for fair decentralized correlation of events multicast among processes. We introduce a generic subset of FAIDECS’ predicate language and show how correlation patterns are expressed in it. We present novel decentralized algorithms based on a distributed hashtable, yielding clear delivery properties in the presence of process failures. Our algorithms are compared under various workloads to naive solutions providing equivalent guarantees, illustrating the benefits of FAIDECS.
AmbiStream: A Middleware for Multimedia Streaming on Heterogeneous Mobile Devices.
Emil Mircea Andriescu, Roberto Speicys Cardoso and Valérie Issarny.
Abstract: Multimedia streaming when smartphones act as both clients and servers is difficult. Indeed, multimedia streaming protocols and associated data formats supported by today's smartphones are highly heterogeneous. At the same time, multimedia processing is resource consuming while smartphones are resource-constrained devices. To overcome this complexity, we present AmbiStream, a lightweight middleware layer solution, which enables applications that run on smartphones to easily handle multimedia streams. Contrary to existing multimedia-oriented middleware that proposes a complete stack for multimedia streaming, our solution leverages the available highly-optimized multimedia software stacks of the smartphones' platforms and complements them with additional, yet resource-efficient, layers to enable interoperability. We introduce the challenges, present our approach and discuss the experimental results obtained when executing AmbiStream on both Android and iOS smartphones. Our results show that it is possible to perform adaptation at run time and still obtain streams with satisfactory quality.
Virtualizing Stream Processing.
Michael Duller, Jan S. Rellermeyer, Gustavo Alonso and Nesime Tatbul.
Abstract: While stream processing systems have evolved into established solutions as standalone engines, they still lack flexibility in terms of large-scale deployment, integration, extensibility, and interoperability. In recent years, a substantial ecosystem of new applications has emerged that can potentially benefit from stream processing but introduces different requirements on how stream processing solutions can be integrated, deployed, extended, and federated. To address these needs, we present an exoengine architecture and the associated ExoP platform. Together, they provide the means for encapsulating components of stream processing systems as well as automating the data exchange between components and their distributed deployment. The proposed solution can be used, e.g., to connect heterogeneous streaming engines, replace operators at run time, and migrate operators across machines. Our experimental evaluation with the Linear Road benchmark indicates that the architecture has negligible overhead, while providing more flexibility in the deployment and management of streaming applications than is possible with current systems.
Leader Election for Replicated Services Using Application Scores.
Diogo Becker, Flavio Junqueira and Marco Serafini.
Abstract: Replicated services often rely on a leader to order client requests and broadcast state updates. In this work, we present POLE, a leader election algorithm that uses scores to select leaders and application-specific functions to determine scores of servers. This flexibility given to the application enables the algorithm to tailor leader election according to metrics that are relevant to practical settings and that have been overlooked by existing approaches. Recovery time and request latency are examples of such metrics. To evaluate POLE, we use ZooKeeper, an open-source replicated service used for coordinating Web-scale applications. Our evaluation over realistic wide-area settings shows that application scores can have a significant impact on performance, and that just optimizing the latency of consensus does not translate into lower latency for clients. An important conclusion from our results is that obtaining a general strategy that satisfies a wide range of requirements is difficult, which implies that configurability is indispensable for practical leader election.
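The central idea of letting an application-specific function score servers and then choosing the highest-scoring one can be sketched in a few lines. The metric names and score functions below are hypothetical. The sketch shows only the selection rule, assuming every replica already sees the same metric values, and omits the distributed agreement a real election protocol such as POLE must perform.

```python
from typing import Callable, Dict, Mapping

Score = Callable[[Mapping[str, float]], float]

def elect_leader(metrics: Dict[str, Mapping[str, float]], score: Score) -> str:
    """Pick the server with the highest application-defined score.

    Ties are broken deterministically by server id, so every replica that
    evaluates the same inputs chooses the same leader.
    """
    return max(metrics, key=lambda sid: (score(metrics[sid]), sid))

servers = {
    "s1": {"client_latency_ms": 80.0, "recovery_s": 2.0},
    "s2": {"client_latency_ms": 35.0, "recovery_s": 9.0},
    "s3": {"client_latency_ms": 50.0, "recovery_s": 1.0},
}
print(elect_leader(servers, lambda m: -m["client_latency_ms"]))  # s2
print(elect_leader(servers, lambda m: -m["recovery_s"]))         # s3
```

The two calls choose different leaders from the same servers, which mirrors the abstract's conclusion that no single strategy fits all requirements and that configurability matters.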
PolyCert: Polymorphic Self-Optimizing Replication for In-Memory Transactional Grids.
Maria Couceiro, Paolo Romano and Luis Rodrigues.
Abstract: In-memory NoSQL transactional data grids are emerging as an attractive alternative to conventional relational distributed databases. In these platforms, replication plays a role of paramount importance, as it represents the key mechanism to ensure data durability. In this work we focus on Atomic Broadcast (AB) based certification replication schemes, which have recently emerged as a much more scalable alternative to classical replication protocols based on active replication or atomic commit protocols. We first show that, among the proposed AB-based certification protocols, no one-size-fits-all solution exists that achieves optimal performance when considering heterogeneous workloads produced by complex transactional applications. Next, we present PolyCert, a polymorphic certification protocol that allows for the concurrent coexistence of different certification protocols and that relies on machine-learning techniques to determine the optimal certification scheme on a per transaction basis. We design and evaluate two alternative oracles, based on parameter-free machine learning techniques that rely both on off-line and on-line training approaches. Our experimental results demonstrate the effectiveness of the proposed approach, highlighting that PolyCert is capable of achieving performance extremely close to that of an optimal non-adaptive certification protocol in the presence of non-heterogeneous workloads, and significantly outperforms any non-adaptive protocol when used with realistic, complex applications that generate heterogeneous workloads.
CacheGenie: A Trigger-Based Middleware Cache for ORMs.
Priya Gupta, Nickolai Zeldovich and Samuel Madden.
Abstract: Caching is an important technique in scaling storage for high-traffic web applications. Usually, building caching mechanisms involves significant effort from the application developer to maintain and invalidate data in the cache. In this work we present CacheGenie, a caching middleware which makes it easy for web application developers to use caching mechanisms in their applications. CacheGenie provides high-level caching abstractions for common query patterns in web applications. Using these abstractions, the developer does not have to worry about managing the cache (e.g., insertion and deletion) or maintaining consistency (e.g., invalidation or updates) when writing application code.
We designed and implemented CacheGenie in the popular Django web application framework, with PostgreSQL as the database backend and memcached as the caching layer. We use triggers inside the database to automatically invalidate or update cached data, as desired by the developer. CacheGenie requires no modifications to PostgreSQL or memcached. To evaluate our prototype, we ported several Pinax web applications to use our caching abstractions. Our results show that it takes little effort for application developers to use CacheGenie, and that caching improves throughput by 2–2.5x for read-mostly workloads.
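The trigger-based invalidation idea can be reproduced on a small scale with Python's standard library. In this sketch, SQLite stands in for PostgreSQL and a plain dict stands in for memcached. The trigger calls a Python function registered with `sqlite3.Connection.create_function`. Table, key and function names are hypothetical, only an INSERT trigger is shown, and none of this is CacheGenie's actual API.

```python
import sqlite3
from typing import Dict, List

cache: Dict[str, List[str]] = {}  # stands in for memcached

def invalidate(user_id: int) -> int:
    # Invoked from inside the database trigger: drop the now-stale cache entry.
    cache.pop(f"posts:{user_id}", None)
    return 0

conn = sqlite3.connect(":memory:")
conn.create_function("invalidate", 1, invalidate)
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    CREATE TRIGGER posts_ins AFTER INSERT ON posts
    BEGIN SELECT invalidate(NEW.user_id); END;
""")

def posts_for(user_id: int) -> List[str]:
    """Read-through cache for one common query pattern."""
    key = f"posts:{user_id}"
    if key not in cache:
        rows = conn.execute("SELECT body FROM posts WHERE user_id = ? ORDER BY id",
                            (user_id,)).fetchall()
        cache[key] = [body for (body,) in rows]
    return cache[key]

conn.execute("INSERT INTO posts (user_id, body) VALUES (1, 'hello')")
print(posts_for(1))  # ['hello'], now cached
conn.execute("INSERT INTO posts (user_id, body) VALUES (1, 'again')")
print(posts_for(1))  # ['hello', 'again']: the trigger dropped the stale entry
```

Application code never invalidates the cache explicitly. The database trigger does it on every write, which is the developer burden the abstract says CacheGenie removes. UPDATE and DELETE triggers would follow the same pattern.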
Deploy, Adjust and Readjust: Supporting Dynamic Reconfiguration of Policy Enforcement.
Gabriela Gheorghe, Bruno Crispo, Lieven Desmet, Wouter Joosen and Roberto Carbone.
Abstract: In the context of large distributed applications, security and performance are two requirements often difficult to satisfy simultaneously. For instance, caching data needed for security decisions can lead to security violations when the data changes faster than the cache can refresh it. Retrieving such fresh data without caching it impacts performance, while combining fresh data with cached data can affect both security and performance. Typically, performance and security are addressed separately, which usually leads to fast systems with security holes, rather than secure systems with poor performance. In this paper, we examine how to dynamically configure an authorisation system for an application that needs to be fast and secure. We look at data caching, attribute retrieval and correlation, and propose a runtime management tool that, with input from a domain expert, can find and enact the configurations that enhance both security and performance needs. We show how such a tool can be integrated in a SOA environment at the middleware level by means of the enterprise service bus.
A Middleware Layer for Flexible and Cost-efficient Multi-Tenant Applications.
Stefan Walraven, Eddy Truyen and Wouter Joosen.
Abstract: Application-level multi-tenancy is an architectural design principle for Software-as-a-Service applications to enable the hosting of multiple customers (or tenants) by a single application instance. Despite the operational cost and maintenance benefits of application-level multi-tenancy, the current middleware component models for multi-tenant application design are inflexible with respect to providing different software variations to different customers.
In this paper we show that this limitation can be solved by a multi-tenancy support layer that combines dependency injection with middleware support for tenant data isolation. Dependency injection enables injecting different software variations on a per tenant basis, while dedicated middleware support facilitates the separation of data and configuration metadata between tenants. We implemented a prototype on top of Google App Engine and show by means of a case study that the improved flexibility of our approach has little impact on operational costs and upfront application engineering costs.
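Per-tenant dependency injection can be pictured as a registry that resolves a feature implementation based on the current tenant and falls back to a default. The sketch below uses a hypothetical pricing feature and a hand-rolled injector. It does not use any Google App Engine API, does not reproduce the authors' support layer, and omits the tenant data isolation half of the design.

```python
from typing import Callable, Dict, Protocol

class PricingStrategy(Protocol):
    def price(self, base: float) -> float: ...

class StandardPricing:
    def price(self, base: float) -> float:
        return base

class DiscountPricing:
    def __init__(self, rate: float) -> None:
        self.rate = rate

    def price(self, base: float) -> float:
        return base * (1 - self.rate)

class TenantInjector:
    """Resolves a feature implementation per tenant, falling back to a default."""

    def __init__(self, default: Callable[[], PricingStrategy]) -> None:
        self.default = default
        self.overrides: Dict[str, Callable[[], PricingStrategy]] = {}

    def bind(self, tenant: str, factory: Callable[[], PricingStrategy]) -> None:
        # Register a tenant-specific variation without touching shared code.
        self.overrides[tenant] = factory

    def get(self, tenant: str) -> PricingStrategy:
        return self.overrides.get(tenant, self.default)()

injector = TenantInjector(default=StandardPricing)
injector.bind("acme", lambda: DiscountPricing(0.25))
print(injector.get("acme").price(100.0))    # 75.0
print(injector.get("globex").price(100.0))  # 100.0
```

A single application instance serves both tenants. Only the binding consulted at resolution time differs, which is what keeps the operational-cost benefits of application-level multi-tenancy.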
Bridging the Interoperability Gap: Overcoming Combined Application and Middleware Heterogeneity.
Yerom-David Bromberg, Paul Grace, Laurent Reveillere and Gordon S. Blair.
Abstract: Interoperability remains a significant challenge in today's distributed systems; it is necessary to quickly compose and connect (often at runtime) previously developed and deployed systems in order to build more complex systems of systems. However, such systems are characterised by heterogeneity at both the application and middleware-level, where application differences are seen in terms of incompatible interface signatures and data content, and at the middleware level in terms of heterogeneous communication protocols. Consider a Flickr client implemented upon the XML-RPC protocol being composed with Picasa's service; here, the Flickr and Picasa APIs differ significantly, and the underlying communication protocols are different. A number of ad-hoc solutions exist to resolve differences at either distinct level, e.g., data translation technologies, service choreography tools, or protocol bridges; however, we argue that middleware solutions to interoperability should support developers in addressing these challenges using a unified framework. For this purpose we present the Starlink framework, which allows an interoperability solution to be specified using domain specific languages that are then used to generate the necessary executable software to enable runtime interoperability. We demonstrate the effectiveness of Starlink using a number of application case-studies and show that it successfully resolves combined application and middleware heterogeneity.
The Role of Ontologies in Emergent Middleware: Supporting Interoperability in Complex Distributed Systems.
Gordon S. Blair, Amel Bennaceur, Nikolaos Georgantas, Paul Grace, Valérie Issarny, Massimo Paolucci and Vatsala Nundloll.
Abstract: Interoperability is a fundamental problem in distributed systems, and an increasingly difficult problem given the level of heterogeneity and dynamism exhibited by contemporary systems. While some progress has been made, we argue that complexity is now at a level such that existing approaches are inadequate and that a major re-think is required to identify principles and associated techniques to achieve this central property of distributed systems. In this paper, we postulate that emergent middleware is the right way forward; that is, middleware which supports the generation of distributed system infrastructure dynamically for the current operating environment and context. In particular, the paper focuses on the key role of ontologies in supporting this process and in providing underlying meaning and associated reasoning capabilities to allow the right run-time choices to be made. The paper presents the CONNECT middleware architecture as an example of emergent middleware and highlights the role of ontologies as a cross-cutting concern throughout this architecture. Three experiments are also described as initial evidence of the potential role of ontologies in middleware architectures. Important remaining challenges are also documented.
Co-Managing Software and Hardware Modules through the Juggle Middleware.
Jan S. Rellermeyer and Ramon Kuepfer.
Abstract: Reprogrammable hardware like Field-Programmable Gate Arrays (FPGAs) is becoming increasingly powerful and affordable. Modern FPGA chips can be reprogrammed at runtime and with low latency, which makes them attractive for use as a dynamic resource in systems. For instance, on mobile devices FPGAs can help to accelerate the performance of critical tasks and at the same time increase the energy-efficiency of the device. The integration of FPGA resources into commodity software, however, is a highly involved task. On the one hand, there is an impedance mismatch between the hardware description languages in which FPGAs are programmed and the high-level languages in which many mobile applications are nowadays developed. On the other hand, the FPGA is a limited and shared resource and as such requires explicit resource management. In this paper, we present the Juggle middleware which leverages the ideas of modularity and service-orientation to facilitate a seamless exchange of hardware and software implementations at runtime. Juggle is built around the well-established OSGi standard for software modules in Java and extends it with support for services implemented in reprogrammable hardware, thereby leveraging the same level of management for both worlds. We show that hardware-accelerated services implemented with Juggle can help to increase the performance of applications and reduce power consumption on mobile devices without requiring any changes to existing program code.
A Generic Solution for Agile Run-time Inspection Middleware.
Wouter De Borger, Bert Lagaisse and Wouter Joosen.
Abstract: Contemporary middleware offers powerful abstractions to construct distributed software systems. However, when inspecting the software at run-time, these abstractions are no longer visible. While inspection, monitoring and management are increasingly important in our always-online world, they are often only possible in terms of the lower-level abstraction of the underlying platform. Due to the complexity of current programming languages and middleware, this low-level information is too complex to handle and/or understand.
This paper presents a run-time inspection system based on dynamic model transformation capabilities that extends run-time entities with higher-level abstract views, in order to allow inspection in terms of the original and most relevant abstractions. Our solution is lightweight in terms of performance overhead and agile in the sense that it can selectively (and on-demand) generate these high-level views.
Our prototype implementation has been applied to inspect distributed applications using RMI. In this case study, we inspect the distributed RMI system using our integrated overview of the collection of distributed objects that interact using remote method invocation.
A Comparison of Secure Multi-tenancy Architectures for Filesystem Storage Clouds.
Anil Kurmus, Moitrayee Gupta, Roman Pletka, Christian Cachin and Robert Haas.
Abstract: A filesystem-level storage cloud offers network-filesystem access to multiple customers at low cost over the Internet. In this paper, we investigate two alternative architectures for achieving multi-tenancy securely and efficiently in such storage cloud services. The first isolates customers in virtual machines at the hypervisor level; the second isolates them through mandatory access-control checks in a single shared operating-system kernel. We compare and discuss the practical security guarantees of these architectures. We have implemented both approaches and report performance measurements comparing them.
SafeWeb: A Middleware for Securing Ruby-based Web Applications.
Petr Hosek, Matteo Migliavacca, Ioannis Papagiannis, David Eyers, David Evans, Brian Shand, Jean Bacon and Peter Pietzuch.
Abstract: Web applications in many domains such as healthcare and finance must process sensitive data while complying with legal policies regarding the release of different classes of data to different parties. Currently, software bugs may lead to irreversible disclosure of confidential data in multi-tier web applications. An open challenge is how developers can guarantee that these web applications only ever release sensitive data to authorised users without costly, recurring security audits.
Our solution is to provide a trusted middleware that acts as a “safety net” to event-based web applications by preventing harmful data disclosure before it happens. We describe the design and implementation of SafeWeb, a Ruby-based middleware that associates data with security labels and transparently tracks their propagation at different granularities across a multi-tier web architecture with storage and complex event processing. For efficiency and ease-of-use, SafeWeb exploits the dynamic features of the Ruby programming language to achieve a low performance overhead and require few code changes in legacy applications. We evaluate SafeWeb by reporting our experience of implementing a web-based cancer treatment application and deploying it as part of the UK National Health Service (NHS).
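The abstract does not describe SafeWeb's actual Ruby interfaces. As a minimal sketch of the general idea of label propagation, assuming a hypothetical `Labeled` wrapper and a hypothetical `release` check (neither is SafeWeb's API), values carry a set of security labels, a value computed from other values inherits the union of their labels, and data leaves the system only if the recipient is cleared for every label it carries:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Labeled:
    """A value tagged with a set of security labels (hypothetical type, not SafeWeb's)."""
    value: object
    labels: FrozenSet[str] = frozenset()

    def combine(self, other: "Labeled", op) -> "Labeled":
        # A derived value carries the union of its inputs' labels, so
        # sensitive data stays labelled as it flows through computation.
        return Labeled(op(self.value, other.value), self.labels | other.labels)


class DisclosureError(Exception):
    """Raised when data would be released to an insufficiently cleared user."""


def release(data: Labeled, user_clearance: FrozenSet[str]) -> object:
    """Hypothetical release check: return the raw value only if the user
    holds clearance for every label attached to it."""
    missing = data.labels - user_clearance
    if missing:
        raise DisclosureError(f"user lacks clearance for {sorted(missing)}")
    return data.value
```

For example, combining a patient name labelled `patient:42` with a diagnosis labelled `patient:42` and `oncology` yields a report carrying both labels; releasing it to a user cleared only for `patient:42` raises `DisclosureError` instead of disclosing the data. Checking at the release boundary, rather than at every access, mirrors the "safety net" idea of stopping harmful disclosure before it happens.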
Industry Short Papers
Scalable Real Time Data Management for Smart Grid.
Abstract: This paper presents GridMW, a scalable and reliable data middleware for smart grids. Smart grids promise to improve the efficiency of power grid systems and reduce greenhouse gas emissions by incorporating power generation from renewable sources and shaping demand to match supply. As a result, power grid systems will become much more dynamic and require constant adjustment, which creates opportunities for analysis and decision-making applications that improve the efficiency and reliability of smart grid systems. However, these applications depend on the data gathered from smart grids. Millions of sensors, including phasor measurement units and smart meters, are being deployed across the smart grid system, and existing data middleware lacks the capability to collect, store, retrieve, and deliver this enormous amount of data to analysis and control applications. We observed that most existing systems are built on high-level APIs for flexibility, so that they can provide general functionality for a range of applications. However, the indirection introduced by these high-level APIs causes high overhead and unpredictability, which in turn prevents real-time operation and high throughput. By tailoring our system specifically to smart grids, we are able to eliminate much of this indirection while keeping the implementation effort reasonable. Using an architecture inspired by log-structured storage, we access the block device layer directly, eliminating much of the indirection incurred by file systems. We also leverage RDMA to eliminate operating-system overhead and copying between the application and network software stacks. Our preliminary results show that we improve performance over existing systems by three orders of magnitude.
Enhancing Traceability and Industrial Process Automation through the VIRTUS Middleware.
Paolo Brizzi, Antonio Lotito, Enrico Ferrera, Davide Conzon, Riccardo Tomasi and Maurizio A. Spirito.
Abstract: Information and Communication Technologies (ICT) are considered a key instrument to improve the efficiency and flexibility of industrial processes. This paper provides an experience report on the application of an ICT-based approach, derived from the Internet-of-Things (IoT) concept, to logistics in industrial manufacturing environments, aimed at enhancing awareness and control of logistic flows. The described solution performs asset management and inbound-outbound monitoring of goods by interconnecting business-process entities and devices that provide physical-world data through an existing IoT-oriented middleware named VIRTUS. The VIRTUS Middleware, based on the open XMPP standard protocol and leveraging the OSGi framework, provides a scalable, agile, event-driven, network-independent tool to manage an ecosystem of heterogeneous interconnected objects. The described solution has been validated in an actual industrial environment consisting of geographically separated production plants.
INSIGHT: Interoperability and Service Management for the Digital Home.
Charbel El Kaed, Loic Petit, Maxime Louvel, Antonin Chazalet, Yves Denneulin and François Gael Ottogali.
Abstract: The emergence of plug-n-play protocols and of multimedia and ubiquitous applications is shaping the human habitat into a digital one. The current proliferation of plug-n-play devices encourages the development of ubiquitous applications that provide the user with a wide set of services to accomplish everyday tasks. Moreover, the user requires smooth and transparent interaction with each of his devices and a high-quality multimedia experience. However, the digital home is more than ever a heterogeneous and complex environment: devices differ in their resources and networking protocols, making the user's requirements almost impossible to meet. This paper presents the INSIGHT middleware, which comprises four modules that reduce the complexity of the digital home. DOXEN provides cross-device interaction by automatically generating adequate proxies. DomVision scans the digital home and processes and stores valuable information for management and troubleshooting. The Service Level Checking module analyzes DomVision's information to provide a personalized service offer for a specific home. Finally, the Resource Manager guarantees end-user QoS. We also show, through a real-world experiment, the ability of INSIGHT to tackle the complexity of the digital home.
Experimental Evaluation of Software Aging Effects on the Eucalyptus Cloud Computing Infrastructure.
Jean Araujo, Rubens Matos, Paulo Maciel, Rivalino Matias and Ibrahim Beicker.
Abstract: The need for reliability and availability has increased in modern applications, which must handle rapidly growing demands while providing uninterrupted service. This work investigates memory-leak and memory-fragmentation aging effects in the Eucalyptus cloud-computing framework under workloads composed of intensive requests addressing different virtual machines. We experimentally show the existence of the investigated aging effects in the cloud environment under study. We also propose a software rejuvenation strategy to mitigate the observed aging effects and evaluate its benefits.
Advanced Adaptive Application (A3) Environment: Initial Experience.
Partha Pal, Richard Schantz, Aaron Paulos, John Regehr and Mike Hibler.
Abstract: This paper introduces the notion of execution-containing, security-focused adaptive middleware that mediates the protected application’s interactions with the environment. We describe the prevention-focused, adaptive middleware-based mechanisms implemented as part of the Advanced Adaptive Applications (A3) Environment, along with initial evaluation results. A3 is a near-application and application-focused cyber-defense technology being developed under the DARPA Clean-slate design of Resilient, Adaptive, Secure Hosts (CRASH) program. The prevention-focused mechanisms represent the progress made in the first year of a four-year project and constitute one of the three pillars of A3; the other two, advanced state management and replay with modification, are currently under development.
An Empirical Analysis of Similarity in Virtual Machine Images.
K. R. Jayaram, Chunyi Peng, Zhe Zhang, Mikyong Kim, Han Chen and Hui Lei.
Abstract: Large virtual machine (VM) image files consume large amounts of storage space and I/O bandwidth during cloud management tasks such as VM image creation and VM instance instantiation. Fortunately, VM images exhibit similarity to one another, primarily because they contain similar operating systems or applications. Cloud management middleware can leverage this similarity to reduce the total amount of image data to be stored, as well as to facilitate content-aware caching of VM images on hypervisor hosts. To design such deduplication and caching mechanisms efficiently, it is essential to understand how much similarity can be found in real-world cloud environments. This paper empirically analyzes the similarity within and between 525 VM images from a production IaaS cloud. Besides presenting the overall level of content similarity, we also uncover interesting insights into multiple factors affecting the similarity pattern, including the image creation time and the location in the image’s address space. Moreover, we find that similarities between pairs of images exhibit high variance, and that an image is very likely to be more similar to a small subset of images than to all other images in the repository. Groups of data chunks often appear together in the same image. These image and chunk “clusters” can help predict future data accesses, and therefore provide important hints for cache placement, eviction, and prefetching.
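The abstract does not specify the chunking scheme or similarity metric used in the study. As a minimal sketch assuming fixed-size chunks (4 KiB by default, an arbitrary choice) fingerprinted with SHA-256, the content similarity between two images can be measured as the Jaccard index of their chunk sets, and the benefit of deduplication as the fraction of chunks that need not be stored again:

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> set:
    """Split an image into fixed-size chunks and return the set of chunk hashes.
    (Fixed-size chunking is an assumption; content-defined chunking is an alternative.)"""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard_similarity(a: bytes, b: bytes, chunk_size: int = 4096) -> float:
    """Fraction of distinct chunks shared by two images (Jaccard index)."""
    fa = chunk_fingerprints(a, chunk_size)
    fb = chunk_fingerprints(b, chunk_size)
    if not fa and not fb:
        return 1.0  # two empty images are treated as identical
    return len(fa & fb) / len(fa | fb)


def dedup_savings(images: list, chunk_size: int = 4096) -> float:
    """Fraction of chunk storage saved by storing each distinct chunk only once
    across all images (covers both within-image and between-image repeats)."""
    total = 0
    unique = set()
    for img in images:
        for i in range(0, len(img), chunk_size):
            total += 1
            unique.add(hashlib.sha256(img[i:i + chunk_size]).digest())
    return 1 - len(unique) / total if total else 0.0
```

With a toy chunk size of 4 bytes, `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"` share two of four distinct chunks, giving a similarity of 0.5, and storing both images deduplicated saves one third of their six chunks. Pairwise scores like these are one way to observe the high variance and image "clusters" the abstract describes.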