TY - GEN
T1 - Supporting software engineering practices in the development of data-intensive HPC applications with the JuML framework
AU - Götz, Markus
AU - Book, Matthias
AU - Bodenstein, Christian
AU - Riedel, Morris
N1 - Publisher Copyright: © 2017 Copyright held by the owner/author(s).
PY - 2017/11/12
Y1 - 2017/11/12
AB - The development of high performance computing applications is considerably different from traditional software development. This distinction is due to the complex hardware systems, inherent parallelism, different software lifecycle and workflow, as well as (especially for scientific computing applications) partially unknown requirements at design time. This makes the use of software engineering practices challenging, so only a small subset of them are actually applied. In this paper, we discuss the potential for applying software engineering techniques to an emerging field in high performance computing, namely large-scale data analysis and machine learning. We argue for the employment of software engineering techniques in the development of such applications from the start, and the design of generic, reusable components. Using the example of the Juelich Machine Learning Library (JuML), we demonstrate how such a framework can not only simplify the design of new parallel algorithms, but also increase the productivity of the actual data analysis workflow. We place particular focus on the abstraction from heterogeneous hardware, the architectural design as well as aspects of parallel and distributed unit testing.
KW - Architecture design
KW - Data analysis
KW - High performance computing
KW - Software engineering
KW - Testing
UR - https://www.scopus.com/pages/publications/85054787116
U2 - 10.1145/3144763.3144765
DO - 10.1145/3144763.3144765
M3 - Conference contribution
SN - 9781450351355
T3 - Proceedings of SE-CoDeSE 2017: 1st International Workshop on Software Engineering for High Performance Computing in Computational and Data-Enabled Science and Engineering - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 1
EP - 8
BT - Proceedings of SE-CoDeSE 2017
PB - Association for Computing Machinery, Inc
T2 - 1st International Workshop on Software Engineering for High Performance Computing in Computational and Data-Enabled Science and Engineering, SE-CoDeSE 2017 - Held in conjunction with the International Conference for High Performance Computing, Networkin...
Y2 - 12 November 2017 through 17 November 2017
ER -