January 26, 2012

Attending FAST'12 - Presenting WiP/Poster about "Trusted Storage"

The technical program of FAST'12 includes very exciting papers. Among others, I'm especially looking forward to the sessions "Implications of New Storage Technology", "Back It Up" and "Flash and SSDs".

Our work on "Trusted Storage" is part of the Work in progress session with a poster.
Abstract:
We study the properties, design, implementation and performance of trusted storage, an architecture that ensures the integrity, confidentiality and accountability of data by enforcing storage policies at the lowest layer of a storage system, within the hardware and firmware of disk enclosures. The guarantees provided by trusted storage depend only on the integrity and correctness of the trusted device/enclosure firmware and hardware, not on the absence of bugs and security vulnerabilities in any higher-level software of the system, nor on the absence of operator error or malice.
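
As a toy illustration of the enforcement idea (the request type, the write-once policy and all names below are invented for illustration; this is not the design from the poster): a write command is checked against the policy inside the enclosure, so no compromised host software can bypass it.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t lba; uint32_t len; } request_t;

/* Hypothetical policy: the first WORM_BLOCKS blocks are write-once;
   the firmware denies any attempt to overwrite them. */
#define WORM_BLOCKS 1024
static bool block_written[WORM_BLOCKS];

static bool policy_allows_write(const request_t *r) {
    for (uint64_t b = r->lba; b < r->lba + r->len; b++)
        if (b < WORM_BLOCKS && block_written[b])
            return false;              /* overwrite of write-once data */
    return true;
}

/* Firmware entry point for a write command: because the check runs
   inside the enclosure, it does not depend on the host software. */
static int enclosure_write(const request_t *r, const void *data) {
    if (!policy_allows_write(r))
        return -1;                     /* policy violation */
    (void)data;                        /* ... commit data to media here ... */
    for (uint64_t b = r->lba; b < r->lba + r->len; b++)
        if (b < WORM_BLOCKS) block_written[b] = true;
    return 0;
}

int main(void) {
    request_t r = { 0, 1 };
    printf("first write:  %d\n", enclosure_write(&r, "x"));  /* 0: allowed */
    printf("second write: %d\n", enclosure_write(&r, "x"));  /* -1: denied */
    return 0;
}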

I look forward to seeing you at FAST!

August 31, 2011

My summer drink/cocktail

Mix together

1/3 Cherry Juice
1/3 Passion Fruit (Maracuja) Juice
1/3 Sparkling Water

It works fine as a nonalcoholic drink as well as with various liquors (e.g., vodka or rum).

March 10, 2011

Recent disk models for DiskSim

Using DIXtrac, a tool that is part of DiskSim, we extracted models for two recent SCSI disks. Please see the following list to choose your favorite model.

Hybrid disk simulation using DiskSim

DiskSim is a well-known storage simulator developed at CMU. It simulates disks or SSD devices (the latter with an extension from Microsoft). Unfortunately, it does not support combining both into a hybrid disk like Seagate's Momentus XT. The following few paragraphs show how to implement a hybrid disk.

For convenience, you can also download the hybrid DiskSim package here, including all changes.

First, download DiskSim and the SSD extension to a 32-bit Linux machine. Extract DiskSim, then extract the ssdmodel folder from the extension into the disksim-4.0 folder. Follow the directions in the README file in the ssdmodel folder. Run make in the disksim-4.0 folder, then run the validation scripts of DiskSim as well as those of the SSD extension.

Download this file (download), which contains the diff between the original DiskSim version and the hybrid DiskSim. Apply the changes and recompile the entire package.

Now you should be able to instantiate the following layout of components:
topology disksim_iodriver driver0 [
   disksim_bus bus_dr2c [
      disksim_ctlr ctlr0 [ 
         disksim_bus bus_c2di [
            disksim_disk disk0 []
         ],
         disksim_bus bus_c2f [
            ssdmodel_ssd ssd0 []
         ]
      ]
   ]
]

Here the controller has two different buses, one for accesses to the disk and one for the SSD. A complete parameter file can be found here.
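
Note that every component instantiated in the topology also needs a definition block in the parameter file. As a sketch, the definition of the new SSD bus could look like the stock bus definitions (the values below are placeholders; copy the ones from your validated parameter file):

disksim_bus bus_c2f {
   type = 2,
   Arbitration type = 1,
   Arbitration time = 0.0,
   Read block transfer time = 0.0,
   Write block transfer time = 0.0,
   Print stats = 1
}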

Now you're ready to use it.

February 28, 2011

MTBF (Mean Time Between Failures) Explained

I just found an interesting description by Kevin Daly (http://www.ece.cmu.edu/~ganger/ece546.spring02/readings/mtbf.description).

"What does MTBF have to do with lifetime? Nothing at all! It is not at all unusual for things to have MTBF's which significantly exceed their lifetime as defined by wearout, ... . A "thirty-something" American has a failure (death) rate of about 1.1 deaths per 1000 person-years and, therefore, has an MTBF of 900 years (of course its really 900 person-years per death). Even the best ones, however, wear out long before that."

February 15, 2011

Fine-grained parallelism in Determinator

At OSDI '10, Bryan Ford and his group presented Determinator, a proof-of-concept operating system.[1] They give evidence that Determinator can enforce determinism at the operating system level. They introduce a race-free model of shared state by copying the state at fork and merging it at an explicit join. When a new thread is forked, it gets its own memory copy, which uses the copy-on-write method for memory accesses. On each explicit join, the child thread's space is merged back into the parent's space. Any conflicts that occur are reported as errors.
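
As a rough analogy in ordinary POSIX (a minimal sketch, not Determinator's actual API): fork() already gives the child a copy-on-write snapshot of the parent's memory, and the pipe below stands in for the explicit merge at join. Determinator performs the merge in the kernel and detects write conflicts; this sketch only mimics the programming model, not the enforcement.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    int value = 1;
    if (pipe(fd) != 0) return 1;

    pid_t pid = fork();
    if (pid == 0) {                /* child: private COW copy of 'value' */
        value *= 10;               /* does not touch the parent's copy   */
        write(fd[1], &value, sizeof value);
        _exit(0);
    }
    waitpid(pid, NULL, 0);         /* explicit join */
    int merged;
    read(fd[0], &merged, sizeof merged);    /* "merge" the child's result */
    printf("parent=%d merged=%d\n", value, merged);  /* parent=1 merged=10 */
    return 0;
}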

Today's operating systems do not provide any guarantees about the determinism of their parallel execution. Debugging, fault tolerance and security, for example, would benefit from a deterministic operating system: program failures due to race conditions would basically vanish. Current approaches achieve determinism only at high cost, and many of them rely on a deterministic scheduler.

In the original evaluation Determinator performs well, except for fine-grained parallelism. In particular, its LU-decomposition performance is low because of the fine-grained parallelism involved. This issue mainly arises from the fact that no shared memory is available by default, and when it is used, the cost of maintaining determinism increases. The authors are looking forward to future hardware enhancements. In the following sections we discuss how a change within the architecture could tackle these issues.

Our key idea is that the system could benefit from additional parallel programming patterns for these workloads. Patterns other than fork/join could potentially communicate and reuse the forked threads in a more efficient way.

The idea is to integrate additional parallel programming patterns directly into the architecture of the system. Instead of forking only a single thread, we suggest forking several threads that are connected according to a programming pattern. Determinism is enforced by the operating system, which supports the programming pattern and has knowledge of its structure. We show how this could be achieved for the pipeline pattern; [2, 4] describe two ways in which LU decomposition can be turned into a pipelined execution.

Integrating these patterns into the operating system offers several advantages, foremost among them usability for the programmer: determinism would be enforced by the underlying system without any additional effort by the programmer. Furthermore, the operating system could collect statistics about programs and turn them into optimizations and a performance gain during execution.

A pipeline consists of many compute units and an ordering among them. The input of each compute unit is linked to its predecessor's output, usually through a shared double-ended queue; recent research has produced an algorithm for such a deque that avoids blocking and mutual exclusion.[3] In the current fork/join programming model, each compute unit would be a separate space with a shared-memory region connecting it to its predecessor and successor. Inputs for the first compute unit are provided by an external service, which splits the input into chunks and hands them to the first compute unit. The overall output is delivered after each compute unit has processed the chunks one after the other and the last compute unit has delivered its chunk. The more compute units are used, the more parallelism can be achieved.
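
The following is a minimal sketch of this pattern in plain C with pthreads (standing in here for Determinator's spaces): a one-slot blocking buffer plays the role of the shared queue between neighboring compute units, and the stage functions are invented for illustration.

#include <pthread.h>
#include <stdio.h>

typedef struct {               /* one-slot blocking buffer between stages */
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int slot, full, closed;
} chan_t;

#define CHAN_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

static void chan_put(chan_t *c, int v) {
    pthread_mutex_lock(&c->m);
    while (c->full) pthread_cond_wait(&c->cv, &c->m);
    c->slot = v; c->full = 1;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->m);
}

static void chan_close(chan_t *c) {        /* signal end of the stream */
    pthread_mutex_lock(&c->m);
    c->closed = 1;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->m);
}

static int chan_get(chan_t *c, int *v) {   /* returns 0 once the stream ends */
    pthread_mutex_lock(&c->m);
    while (!c->full && !c->closed) pthread_cond_wait(&c->cv, &c->m);
    int ok = c->full;
    if (ok) { *v = c->slot; c->full = 0; pthread_cond_broadcast(&c->cv); }
    pthread_mutex_unlock(&c->m);
    return ok;
}

static chan_t in = CHAN_INIT, mid = CHAN_INIT, out = CHAN_INIT;

static void *stage_double(void *arg) {     /* compute unit 1 (placeholder) */
    (void)arg;
    int v;
    while (chan_get(&in, &v)) chan_put(&mid, v * 2);
    chan_close(&mid);
    return NULL;
}

static void *stage_add_one(void *arg) {    /* compute unit 2 (placeholder) */
    (void)arg;
    int v;
    while (chan_get(&mid, &v)) chan_put(&out, v + 1);
    chan_close(&out);
    return NULL;
}

static void *sink(void *arg) {             /* collects the overall output */
    (void)arg;
    int v;
    while (chan_get(&out, &v)) printf("%d\n", v);   /* prints 1 3 5 7 9 */
    return NULL;
}

int main(void) {
    pthread_t t1, t2, t3;
    pthread_create(&t1, NULL, stage_double, NULL);
    pthread_create(&t2, NULL, stage_add_one, NULL);
    pthread_create(&t3, NULL, sink, NULL);
    for (int i = 0; i < 5; i++) chan_put(&in, i);   /* the external splitter */
    chan_close(&in);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return 0;
}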

A pipeline itself is deterministic by definition as long as the input is supplied in the same order, which a deterministic split mechanism (usually the case) guarantees. We introduce non-deterministic behavior by having multiple instances of each compute unit (a compute group). Each input chunk would therefore have to be identified by a total order, and the compute group would compute the chunks in that order. Since the input queue of the compute group does not apply the ordering by itself, the compute group has to keep a shared counter that is updated using the test-and-set instruction of the processor. Hence the members of a compute group have to share a small memory region too.
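
A minimal sketch of the chunk-claiming scheme, with C11 atomics (fetch-and-add here) standing in for the test-and-set instruction mentioned above; the per-chunk computation is a placeholder:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NCHUNKS  8
#define NWORKERS 3

static atomic_int next_chunk;          /* shared counter of the compute group */
static int results[NCHUNKS];           /* output, indexed by the total order  */

static void *group_member(void *arg) {
    (void)arg;
    for (;;) {
        /* atomically claim the next chunk index: this is the total order */
        int i = atomic_fetch_add(&next_chunk, 1);
        if (i >= NCHUNKS) return NULL;
        results[i] = i * i;            /* placeholder computation on chunk i */
    }
}

int main(void) {
    pthread_t t[NWORKERS];
    for (int w = 0; w < NWORKERS; w++)
        pthread_create(&t[w], NULL, group_member, NULL);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(t[w], NULL);
    for (int i = 0; i < NCHUNKS; i++)
        printf("%d ", results[i]);     /* 0 1 4 9 16 25 36 49, in order */
    printf("\n");
    return 0;
}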

One advantage of integrating additional parallel programming patterns is that different patterns have different characteristics. We have shown how this applies to the pipeline pattern; it might also apply to patterns like master/worker or work stealing.

The characteristics of the pipeline pattern would allow finer-grained parallelism than the fork/join pattern. It would also allow reusing the forked pipeline later without forking and synchronizing the threads again. Besides, adding the pipeline pattern also helps to process streams of input data; currently such workloads would require extensive communication between the child and its parent.

[1] A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient system-enforced deterministic parallelism. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI '10. USENIX Association, 2010.
[2] F. Desprez, B. Tourancheau, and J. J. Dongarra. Performance complexity of LU factorization with efficient pipelining and overlap on a multiprocessor. Technical report, University of Tennessee, Oak Ridge National Laboratory, LIP, 1994.
[3] D. Hendler, Y. Lev, M. Moir, and N. Shavit. A dynamic-sized nonblocking work stealing deque. Technical report, Tel-Aviv University, Brown University, Sun Microsystems Laboratories, 2005.
[4] M. Mosleh, S. Setayeshi, and M. Kheyrandish. Presenting a systematic method for LU decomposition of a matrix with linear systolic arrays. In Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering.

February 08, 2011

First!

Welcome to my blog.


I'll write about my current ideas, papers I've read, talks I've attended, and other things that might come along.

Anjo