Published Papers

    Bibtex for ANGEL: A Hierarchical Approach to Multi-Objective Online Auto-Tuning:

    @inproceedings{chen2015angel,
        author = "Ray Chen and Jeffrey K. Hollingsworth",
        title = "ANGEL: A Hierarchical Approach to Multi-Objective Online Auto-Tuning",
        booktitle = "Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers",
        series = "ROSS '15",
        month = "June",
        year = "2015",
        isbn = "978-1-4503-3606-2",
    }

  • Ray Chen and Jeffrey K. Hollingsworth, ANGEL: A Hierarchical Approach to Multi-Objective Online Auto-Tuning, Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, June 2015
  • Barton P. Miller and Xiaozhu Meng, Binary Code Is Not Easy, March 2015
    Bibtex for Detecting Code Reuse Attacks with a Model of Conformant Program Execution:

    @inproceedings{jacobson2014detecting,
        author = "Emily R. Jacobson and Andrew R. Bernat and William R. Williams and Barton P. Miller",
        title = "Detecting Code Reuse Attacks with a Model of Conformant Program Execution",
        booktitle = "European Symposium on Secure Software and Systems (ESSoS)",
        address = "Munich, Germany",
        month = "March",
        year = "2014",
    }

  • Emily R. Jacobson, Andrew R. Bernat, William R. Williams and Barton P. Miller, Detecting Code Reuse Attacks with a Model of Conformant Program Execution, European Symposium on Secure Software and Systems (ESSoS), Munich, Germany, March 2014
  • Jeffrey K. Hollingsworth and Sukhyun Song, Designing and Auto-Tuning Parallel 3-D FFT for Computation-Communication Overlap, PPoPP '14, February 2014
    Bibtex for Poster Abstract: Detecting Code Reuse Attacks with a Model of Conformant Program Execution:

    @inproceedings{jacobson2013poster,
        author = "Emily R. Jacobson and Andrew R. Bernat and William R. Williams and Barton P. Miller",
        title = "Poster Abstract: Detecting Code Reuse Attacks with a Model of Conformant Program Execution",
        booktitle = "Research in Attacks, Intrusions and Defenses",
        address = "St. Lucia",
        month = "October",
        year = "2013",
    }

  • Emily R. Jacobson, Andrew R. Bernat, William R. Williams and Barton P. Miller, Poster Abstract: Detecting Code Reuse Attacks with a Model of Conformant Program Execution, Research in Attacks, Intrusions and Defenses, October 2013
    Bibtex for Mining Software Repositories for Accurate Authorship:

    @inproceedings{meng2013mining,
        author = "Xiaozhu Meng and Barton P. Miller and William R. Williams and Andrew R. Bernat",
        title = "Mining Software Repositories for Accurate Authorship",
        booktitle = "29th IEEE International Conference on Software Maintenance",
        address = "Eindhoven, Netherlands",
        month = "September",
        year = "2013",
    }

  • Xiaozhu Meng, Barton P. Miller, William R. Williams and Andrew R. Bernat, Mining Software Repositories for Accurate Authorship, 29th IEEE International Conference on Software Maintenance, September 2013
  • Jeffrey K. Hollingsworth and Ray Chen, Towards Fully Automatic Auto-tuning: Leveraging language features of Chapel, International Journal of High Performance Computing Applications, 27(4), November 2013
  • Jeffrey K. Hollingsworth and Michael O. Lam, Dynamic Floating-point cancellation detection, Parallel Computing, 39(3), March 2013
  • Jeffrey K. Hollingsworth and Nick Rutar, Software Techniques for Negating Skid and Approximating Cache Miss Measurements, Parallel Computing, 39(3), March 2013
  • Matthew Legendre, Bronis R. de Supinski, Jeffrey K. Hollingsworth and Michael O. Lam, Automatically Adapting Programs for Mixed-Precision Floating-Point Computation, Proceedings of the International Conference on Supercomputing (ICS), June 2013
    Bibtex for Binary-Code Obfuscations in Prevalent Packer Tools:

    @article{roundy2012obfuscations,
        author = "Barton P. Miller and Kevin A. Roundy",
        title = "Binary-Code Obfuscations in Prevalent Packer Tools",
        journal = "ACM Computing Surveys",
        month = "June",
        year = "2012",
    }

  • Barton P. Miller and Kevin A. Roundy, Binary-Code Obfuscations in Prevalent Packer Tools, ACM Computing Surveys, June 2012
    Bibtex for "Structured Binary Editing with a CFG Transformation Algebra:

    @article{bernat2012structured,
        author = "Barton P. Miller and Andrew R. Bernat",
        title = "Structured Binary Editing with a CFG Transformation Algebra",
        journal = "under submission",
        month = "June",
        year = "2012",
    }

  • Barton P. Miller and Andrew R. Bernat, Structured Binary Editing with a CFG Transformation Algebra, under submission, June 2012
    Bibtex for A Scalable Failure Recovery Model for Tree-based Overlay Networks:

    @techreport{arnold2012scalable,
        author = "Dorian C. Arnold and Barton P. Miller",
        title = "A Scalable Failure Recovery Model for Tree-based Overlay Networks",
        institution = "University of Wisconsin-Madison",
        number = "TR1626",
        month = "May",
        year = "2012",
    }

  • Dorian C. Arnold and Barton P. Miller, A Scalable Failure Recovery Model for Tree-based Overlay Networks, University of Wisconsin-Madison, Technical Report TR1626, May 2012
    Bibtex for Data Centric Techniques for Mapping Performance Data to Program Variables:

    @article{rutar2012data,
        author = "Jeffrey K. Hollingsworth and Nick Rutar",
        title = "Data Centric Techniques for Mapping Performance Data to Program Variables",
        journal = "Parallel Computing",
        volume = "38",
        number = "1--2",
        pages = "2--14",
        month = "January",
        year = "2012",
    }

  • Jeffrey K. Hollingsworth and Nick Rutar, Data Centric Techniques for Mapping Performance Data to Program Variables, Parallel Computing, 38(1-2), pp. 2-14, January 2012
  • Jeffrey K. Hollingsworth and Tugrul Ince, Compiler Help for Binary Manipulation Tools, 5th Workshop on Productivity and Performance, Rhodes, Greece, August 2012
    Abstract for ANGEL: A Hierarchical Approach to Multi-Objective Online Auto-Tuning:

    The majority of auto-tuning research in HPC to date has focused on improving the performance of a single objective function. More recently, interest has grown in optimizing multiple objective functions simultaneously, for example balancing the trade-off between execution time and energy consumption. Evolutionary algorithms for multi-objective optimization attempt to quickly and accurately discover the set of Pareto-optimal solutions, or the Pareto frontier. However, a single solution (as opposed to a set) is desirable for online end-to-end auto-tuning. As an alternative, we use a hierarchical method based on prioritizing the objective functions and iteratively optimizing them in isolation to efficiently produce a single Pareto-optimal result. This approach has several advantages over evolutionary or even scalarization techniques for auto-tuning HPC problems. In this paper, we introduce the Automated Navigation Given Enumerated Leeways (ANGEL) auto-tuning method. We demonstrate the quality of ANGEL using a multi-objective benchmark test suite. We also demonstrate its utility by approximating various sections of a Pareto front from a real-world proxy application. ANGEL successfully navigates trade-offs that range from a 15% reduction in kernel run-time to a 3% savings in energy use.
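
    A minimal sketch of the hierarchical (prioritized) idea behind ANGEL, not the paper's implementation: objectives are optimized in priority order, and candidates for each later objective are restricted to those within a leeway of the best value found for the earlier ones. The candidate set, objective functions, and leeway values below are purely illustrative.

        # Sketch of prioritized (hierarchical) multi-objective search.
        # Objectives are optimized one at a time; candidates that exceed a
        # higher-priority objective's best value by more than its "leeway"
        # are discarded before the next objective is considered.

        def hierarchical_search(candidates, objectives, leeways):
            """objectives: callables in priority order; leeways: relative slack per objective."""
            survivors = list(candidates)
            for obj, leeway in zip(objectives, leeways):
                best = min(obj(c) for c in survivors)
                # Keep only candidates within the leeway of the best value.
                survivors = [c for c in survivors if obj(c) <= best * (1.0 + leeway)]
            return survivors[0]  # a single point, not a whole Pareto set

        # Toy example: tile sizes, trading run time against (modeled) energy.
        candidates = [8, 16, 32, 64, 128]
        run_time = lambda t: abs(t - 32) + 10   # fastest near t == 32
        energy = lambda t: t * 0.5              # smaller tiles use less energy
        print(hierarchical_search(candidates, [run_time, energy], [0.10, 0.0]))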

    Abstract for Binary Code Is Not Easy:

    Binary code analysis is an enabling technique for many applications. Binary code analysis tool kits provide several capabilities to automatically analyze binary code, including decoding machine instructions, building control flow graphs of the program, and determining function boundaries. Modern compilers and run-time libraries have introduced significant complexities to binary code. These complexities negatively affect the capabilities of binary analysis tool kits, which may cause tools to report inaccurate information about binary code. This can confuse analysts, and applications built on binary analysis tool kits may suffer degraded quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identify eight challenging code constructs that represent hard-to-analyze aspects of the code from modern compilers, and show code examples that illustrate each code construct. As part of this discussion, we discuss how the new code parsing algorithms in the open source Dyninst tool kit support these constructs. In particular, we present a new model for describing jump tables that improves our ability to precisely determine the control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls, code overlapping between functions, and code overlapping within instructions. We created test binaries patterned after each challenging code construct and evaluated how tool kits, including Dyninst, fare when parsing these binaries.

    Abstract for Detecting Code Reuse Attacks with a Model of Conformant Program Execution:

    Code reuse attacks circumvent traditional program protection mechanisms such as W^X by constructing exploits from code already present within a process. Existing techniques to defend against these attacks provide ad hoc solutions or lack the features necessary to provide comprehensive and adoptable solutions. We present a systematic approach based on first principles for the efficient, robust detection of these attacks; our work enforces expected program behavior instead of defending against anticipated attacks. We define conformant program execution (CPE) as a set of requirements on program states. We demonstrate that code reuse attacks violate these requirements and thus can be detected; further, new exploit variations will not circumvent CPE. To provide an efficient and adoptable solution, we also define observed conformant program execution, which validates program state at system call invocations; we demonstrate that this relaxed model is sufficient to detect code reuse attacks. We implemented our algorithm in a tool, ROPStop, which operates on unmodified binaries, including running programs. In our testing, ROPStop accurately detected real exploits while imposing low overhead on a set of modern applications: 5.4% on SPEC CPU2006 and 6.3% on an Apache HTTP Server.

    Abstract for Designing and Auto-Tuning Parallel 3-D FFT for Computation-Communication Overlap:

    This paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code performs fully asynchronous communication without any support from special hardware. We also improve cache performance through loop tiling. To cope with the complex tradeoff regarding our optimization techniques, we parameterize our code and auto-tune the parameters efficiently in a large parameter space. Experimental results from two systems confirm that our code achieves a speedup of up to 1.76× over the FFTW library.
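
    A rough sketch of the computation-communication overlap pattern the paper describes, not its 3-D FFT code: start a non-blocking all-to-all with mpi4py's Ialltoall, perform independent local work while the exchange is in flight, then wait before using the exchanged data. The buffer sizes and the "local work" are placeholders.

        # Overlap pattern: start a non-blocking all-to-all, compute while it
        # is in flight, then wait before touching the exchanged data.
        # Buffer sizes and the local work are placeholders, not the paper's
        # actual 3-D FFT decomposition.
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        chunk = 1024
        sendbuf = np.random.rand(comm.Get_size() * chunk)
        recvbuf = np.empty_like(sendbuf)

        req = comm.Ialltoall(sendbuf, recvbuf)   # transpose step begins in background

        # Independent local work (e.g. 1-D FFTs on data already in place).
        local = np.fft.fft(np.random.rand(chunk))

        req.Wait()                               # exchanged data is now safe to use
        # ... continue with FFTs along the newly local dimension ...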

    Abstract for Poster Abstract: Detecting Code Reuse Attacks with a Model of Conformant Program Execution:

    Code reuse attacks circumvent traditional program protection mechanisms by constructing exploits from code already present within a process. Security analysts require new techniques to detect these attacks. We define conformant program execution as a set of requirements on program states; we demonstrate that code reuse attacks violate these requirements and thus can be detected. Monitoring each program state during execution would have high overhead, so we define observed conformant program execution, which validates program state at system call invocations, and demonstrate that this relaxed model is still sufficient to detect code reuse attacks. We have implemented our model of observed conformant program execution in a tool, ROPStop. Unlike previous work, ROPStop does not rely on known attack characteristics and runs on unmodified binaries. In our testing, ROPStop accurately detected real exploits while imposing an average 5.42% overhead on a reference set of conventional binaries drawn from SPEC CPU2006.

    Abstract for Mining Software Repositories for Accurate Authorship:

    Code authorship information is important for analyzing software quality, performing software forensics, and improving software maintenance. However, current tools assume that the last developer to change a line of code is its author regardless of all earlier changes. This approximation loses important information. We present two new line-level authorship models to overcome this limitation. We first define the repository graph as a graph abstraction for a code repository, in which nodes are the commits and edges represent the development dependencies. Then for each line of code, structural authorship is defined as a subgraph of the repository graph recording all commits that changed the line and the development dependencies between the commits; weighted authorship is defined as a vector of author contribution weights derived from the structural authorship of the line and based on a code change measure between commits, for example, best edit distance. We have implemented our two authorship models as a new git built-in tool git-author. We evaluated git-author in an empirical study and a comparison study. In the empirical study, we ran git-author on five open source projects and found that git-author can recover more information than a current tool (git-blame) for about 10% of lines. In the comparison study, we used git-author to build a line-level model for bug prediction. We compared our line-level model with an existing file-level model. The results show that our line-level model performs consistently better than the file-level model when evaluated on our data sets produced from the Apache HTTP server project.
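
    A toy illustration of the weighted-authorship idea, not the git-author implementation: every commit that changed a line contributes a weight proportional to how much of the line it changed. The change measure here (difflib similarity) merely stands in for the edit-distance-based measure described in the paper.

        # Toy line-level weighted authorship: each commit's weight is
        # proportional to how much of the line it changed; weights are then
        # normalized into per-author contribution fractions.
        import difflib
        from collections import defaultdict

        def weighted_authorship(history):
            """history: list of (author, line_text) pairs in commit order."""
            weights = defaultdict(float)
            prev_text = ""
            for author, text in history:
                changed = 1.0 - difflib.SequenceMatcher(None, prev_text, text).ratio()
                weights[author] += changed
                prev_text = text
            total = sum(weights.values()) or 1.0
            return {author: w / total for author, w in weights.items()}

        history = [
            ("alice", "int count = 0;"),
            ("bob",   "int count = 1;"),        # small tweak
            ("carol", "size_t counter = 1;"),   # larger rewrite
        ]
        print(weighted_authorship(history))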

    Abstract for Towards Fully Automatic Auto-tuning: Leveraging language features of Chapel:

    Application auto-tuning has produced excellent results in a wide range of computing domains. Yet adapting an application to use this technology remains a predominantly manual and labor-intensive process. This paper explores first steps towards reducing adoption cost by focusing on two tasks: parameter identification and range selection. We show how these traditionally manual tasks can be automated in the context of Chapel, a parallel programming language developed by Cray Inc. Potential auto-tuning parameters may be inferred from existing Chapel applications by leveraging features unique to this language. After verification, these parameters may then be passed to an auto-tuner for an automatic search of the induced parameter space. To further automate adoption, we also present Tuna: an auto-tuning shell designed to tune applications by manipulating their command-line arguments. Finally, we demonstrate the immediate utility of this system by tuning two Chapel applications with little or no internal knowledge of the program source.
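
    The general shape of tuning a program through its command-line arguments, in the spirit of Tuna but not its actual interface: run the target with each candidate value of a flag, time it, and keep the fastest setting. The program name, the flag, and the exhaustive sweep are hypothetical stand-ins for a real auto-tuner's search strategy.

        # Tune a hypothetical "./my_app" by sweeping a hypothetical
        # "--tile-size" flag and keeping the fastest setting.
        import subprocess
        import time

        def time_run(cmd):
            start = time.perf_counter()
            subprocess.run(cmd, check=True)
            return time.perf_counter() - start

        candidates = [8, 16, 32, 64]
        timings = {v: time_run(["./my_app", "--tile-size", str(v)]) for v in candidates}
        best = min(timings, key=timings.get)
        print("best --tile-size:", best, "time:", timings[best])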

    Abstract for Dynamic Floating-point cancellation detection:

    As scientific computation continues to scale, it is crucial to use floating-point arithmetic processors as efficiently as possible. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set will result in inaccurate results. Thus, developers must balance speed and accuracy when choosing the floating-point precision of their subroutines and data structures. We are building tools to help developers learn about the runtime floating-point behavior of their programs, and to help them make implementation decisions regarding this behavior. We propose a tool that performs automatic binary instrumentation of floating-point code to detect mathematical cancellations. In particular, we show how our prototype can detect the variation in cancellation patterns for different pivoting strategies in Gaussian elimination, as well as how our prototype can detect a program’s sensitivity to ill-conditioned input sets.
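
    A toy numerical illustration of what cancellation means, not the paper's binary instrumentation: when nearly equal values of opposite sign are added, the exponent of the result drops far below the exponents of the operands, so leading significant bits have cancelled.

        # Estimate bits of precision lost to cancellation in a single addition
        # by comparing the operands' exponents with the result's exponent.
        import math

        def cancelled_bits(a, b):
            result = a + b
            if result == 0.0:
                return None  # complete cancellation
            exp_in = max(math.frexp(a)[1], math.frexp(b)[1])
            exp_out = math.frexp(result)[1]
            return max(0, exp_in - exp_out)

        print(cancelled_bits(1.0000001, -1.0))   # severe cancellation (~24 bits lost)
        print(cancelled_bits(1.0, 2.0))          # no cancellation (0 bits lost)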

    Abstract for Software Techniques for Negating Skid and Approximating Cache Miss Measurements:

    Data centric analysis using direct measurements has been established as a successful performance analysis technique. Information gathered with this technique can map cache misses to program variables. These mappings can then be used to address data locality problems and other issues. Existing approaches rely on special hardware support which is needed to negate a ‘skid’ factor. Our approach is viable when the special hardware support is not present, but where skid is still an issue. Prior methods also rely on maintaining runtime information about memory allocation addresses for variables, which may lead to program perturbation. Our approach uses software analysis to eliminate the need for maintaining allocation and free records. We show that by using heuristics our technique can attribute cache misses to program variables while maintaining the approximate rank-order found by using traditional techniques. We also show that there exists a high correlation between the misses attributed by our approximation and the misses assigned by examining direct measurements.

    Abstract for Automatically Adapting Programs for Mixed-Precision Floating-Point Computation:

    As scientific computation continues to scale, efficient use of floating-point arithmetic processors is critical. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set leads to inaccurate results. In this paper, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. This framework allows developers to explore mixed-precision configurations without modifying their source code, and it permits autotuning of floating-point precision. We include a simple search algorithm to automate identification of code regions that can use lower precision. Our results for several benchmarks show that our framework is effective and incurs low overhead (less than 10X in most cases). In addition, we demonstrate that our tool can replicate manual conversions and suggest further optimization; in one case, we achieve a speedup of 2X.
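
    A very small sketch of the search idea, not the paper's binary-modification framework: greedily try demoting each "region" of a computation to single precision and keep the demotion only if the final result stays within an error bound. The two regions and the error threshold are illustrative.

        # Greedy mixed-precision search over two toy "regions": a scaling
        # step and a reduction step.  A demotion to float32 is kept only if
        # the result stays within a relative error bound of the
        # double-precision reference.
        import numpy as np

        def run(data, precisions):
            p0, p1 = precisions
            x = data.astype(p0) * p0(1.0000001)   # region 0: element-wise scale
            return x.astype(p1).sum(dtype=p1)     # region 1: reduction

        data = np.random.rand(10000)
        reference = run(data, [np.float64, np.float64])
        precisions = [np.float64, np.float64]

        for i in range(len(precisions)):
            trial = list(precisions)
            trial[i] = np.float32
            if abs(run(data, trial) - reference) / abs(reference) < 1e-6:
                precisions = trial                # demotion kept: accuracy preserved

        print([p.__name__ for p in precisions])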

    Abstract for Binary-Code Obfuscations in Prevalent Packer Tools:

    The first steps in analyzing defensive malware are understanding what obfuscations are present in real-world malware binaries, how these obfuscations hinder analysis, and how they can be overcome. While some obfuscations have been reported independently, this survey consolidates the discussion while adding substantial depth and breadth to it. This survey also quantifies the relative prevalence of these obfuscations by using the Dyninst binary analysis and instrumentation tool that was recently extended for defensive malware analysis. The goal of this survey is to encourage analysts to focus on resolving the obfuscations that are most prevalent in real-world malware.

    Abstract for "Structured Binary Editing with a CFG Transformation Algebra:

    Binary modification allows users to alter existing code or inject new code into programs without requiring source code, symbols, or debugging information. It is critically important that such modification not accidentally create a structurally invalid binary that has illegal control flow or executes invalid instructions. Unfortunately, current modification tools do not make this guarantee, instead relying on the user to manually ensure the modified binary is valid. In addition, they fail to provide high-level abstractions of the binary (e.g., functions), instead requiring the user to have a deep understanding of the idiosyncrasies of the instruction set and the behavior of the program. We present structured binary editing, which allows users to modify a program binary by modifying its CFG. We define an algebra of CFG transformations that is closed under a CFG validity constraint, thus ensuring that users can arbitrarily compose these transformations while preserving structural validity. We have implemented these transformations in the Dyninst binary analysis and instrumentation framework, and demonstrate them by using hot patching to apply a fix for the CVE-2011-3368 vulnerability to an executing Apache HTTPD server without interrupting the server's execution.

    Abstract for A Scalable Failure Recovery Model for Tree-based Overlay Networks:

    Tree-based overlay networks (TBONs) exploit the logarithmic scaling property of trees to provide scalable data multicast, gather, and aggregation services. We leverage the characteristics of many TBON computations to develop a new failure recovery model, state compensation. State compensation uses: (1) inherently redundant information from surviving processes to compensate for information lost due to failures, (2) weak data consistency to simplify recovery mechanisms, and (3) localized protocols for autonomous process failure recovery. State compensation incurs no overhead in the absence of failures, and when failures occur, only a small number of processes participate in recovery. We use a formal specification of our data aggregation model to validate state compensation and identify its requirements and limitations. Generally, state compensation requires that data aggregation operations be commutative and associative. We describe an implementation of one state compensation mechanism and evaluate its recovery performance: for a TBON that can support millions of application processes, state compensation can yield millisecond failure recovery latencies and inconsequential application perturbation.
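
    A toy illustration of the algebraic property state compensation relies on, not the tree-overlay protocol itself: because the aggregation operator is commutative and associative, orphaned subtrees can re-attach anywhere in the tree without changing the aggregate computed over all leaves.

        # With a commutative, associative operator, the aggregate over all
        # leaves is independent of the tree's shape, so surviving processes
        # can re-attach after a failure without changing the result.
        from functools import reduce

        combine = max  # any commutative, associative operator works

        def aggregate(tree):
            """tree: a leaf value or a list of subtrees."""
            if not isinstance(tree, list):
                return tree
            return reduce(combine, (aggregate(child) for child in tree))

        before = [[3, 7], [5, [2, 9]]]   # original tree
        # The internal node [2, 9] "fails"; its children re-attach at the root.
        after = [[3, 7], [5], 2, 9]
        assert aggregate(before) == aggregate(after) == 9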

    Abstract for Data Centric Techniques for Mapping Performance Data to Program Variables:

    Traditional methods of performance analysis offer a code centric view, presenting performance data in terms of blocks of contiguous code (statement, basic block, loop, function). Data centric techniques combined with hardware counter information allow various program properties, including cache misses and cycle count, to be mapped directly to variables. We introduce mechanisms for efficiently collecting data centric performance numbers independent of hardware support. We create extended data centric mappings, which we call variable blame, that relate data centric information to high level data structures. Finally, we show performance data gathered from three parallel programs using our technique.

    Abstract for Compiler Help for Binary Manipulation Tools:

    Parsing machine code is the first step for most analyses performed on binary files. These analyses build control flow graphs (CFGs). In this work we propose a compilation mechanism that augments binary files with information about where each basic block is located and how they are connected to each other. This information makes it unnecessary to analyze most instructions in a binary during the initial CFG build process. As a result, binary analysis tools experience dramatically increased parsing speeds, 3.8x on average.
