- Testing and Benchmarking: PARAM Series of Supercomputers
- Betatesting Group: List of Project Activities at a Glance
- Free Downloadable P-COMS (HPC) Software
- Development & Testing of HPC Tools & Libraries on PARAM Series of Supercomputers
- Parallel Computing Workshops & Training, and HPC Modules Course Programme
Testing and Benchmarking: PARAM Series of Supercomputers
The project covers designing the testing methodology and execution plan for HPC clusters developed by C-DAC, writing test cases, packaging the test suites, verifying the testability of requirements, proposing automation of test cases, and measuring the performance of application and system benchmarks on HPC systems. (Refer Btest-HPC-Projects.pdf)
The idea is to provide a common ground for running these test suites on the PARAM series; some of them use real application programs, while others employ short kernel codes to evaluate sustained performance. The test suites are organized in five levels. The decision as to which types of codes to include in each level's package must be based on the following characteristics, which serve as goals for the test constructor.
- Most importantly, the benchmarks or test suites must be representative of various types of applications. The types, patterns, and rates of computation, communication, and input/output of the programs in the test package must match those of programs actually in use, to as great a degree as practical. Furthermore, the programs in the test package must be imposed on the system under test in a manner similar to that in practice.
- The test package must be maintainable: its size must be kept to a minimum, and it must be constructed in a modular, easy-to-modify fashion.
- The test package must be designed and implemented according to standard software engineering practices.
- The test package must be scalable: it must be possible to vary the number of processors used and the size of the test problem solved.
The test suites at each level, and the benchmarks used, are described below.
Test Suites and Application and System Benchmarks
Measuring the performance of application and system benchmarks on the PARAM 10000 and the PARAM Padma message-passing clusters is a major activity of the Betatesting Group. Macro and micro benchmarks have been used to test and extract the sustained performance of the PARAM series. Benchmarks are used not only to test but also to measure and predict the sustained performance of a computer system. (Refer Btest-HPC-Projects.pdf)
The benchmarks are organized in five levels focusing on architectural features of parallel computers (clusters, SMP machines), performance issues of sequential and parallel programs on uni/multi-processors of parallel computing systems, optimization of parallel programs, performance issues of MPI-1 & MPI-2, and quantification of communication overheads and system area networks.
Level-1: The Level-1 test suites/benchmarks focus on evaluating the performance of the uni-processor/multi-processors of one node of a cluster.
- First:
  - LAPACK: Linear Algebra PACKage for dense matrix computations
  - NAS: NAS Sequential Benchmarks - Computational Fluid Dynamics
  - ScaLAPACK: Scalable Linear Algebra PACKage for dense matrix computations
  - LINPACK: High Performance linear system of matrix equations
  - CFD: Computational Fluid Dynamics
  - SPEC: Standard Performance Evaluation Corporation benchmark family; measures CPU performance, memory system, client/server computing, commercial applications, and I/O subsystems
  - LFK: Livermore Fortran Kernels test
- Second:
  - STREAM: Simple synthetic benchmark that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate
  - LMBENCH: Measures operating system overheads and the capability of data transfer between processor, cache, memory, network, and disk
  - LLCBench: Low Level Characterization benchmarks (BLAS-Bench, Cache Bench and MP Bench)
  - EuroBen: Benchmark to measure basic characteristics of a machine
  - Dhrystone: Measures CPU performance
  - TPC: Transaction Processing Performance Council benchmarks; transaction processing and database benchmarks
- Third:
  - P-SIMPLE: PARAM Simple MPI programs
  - P-MPISEM: PARAM programs for testing MPI semantics
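As an illustration of what a Level-1 memory benchmark such as STREAM measures, the following is a minimal sketch of a copy-bandwidth measurement. It is not the STREAM code itself (STREAM is distributed in C and Fortran and has four kernels: Copy, Scale, Add and Triad); the function name `copy_bandwidth_mb_s` and the default buffer size are assumptions made for this example. It follows the STREAM conventions of counting bytes read plus bytes written and reporting the best of several trials, with 1 MB taken as 10**6 bytes.

```python
import time


def copy_bandwidth_mb_s(n_bytes=32 * 1024 * 1024, trials=5):
    """Best-of-N bandwidth of a buffer-to-buffer copy, in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper for illustration; it mimics only the Copy kernel.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one read stream, one write stream
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    bytes_moved = 2 * n_bytes  # bytes read from src plus bytes written to dst
    return bytes_moved / best / 1e6


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```

The buffers should be much larger than the last-level cache for the figure to reflect main-memory bandwidth rather than cache bandwidth.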
Level-2: The Level-2 test suites/benchmarks measure the communication-time overhead of the various point-to-point and collective communication calls supported in MPI.
- P-COMS: PARAM Communication Overhead Measurement Suites (freely downloadable from the C-DAC web-site)
- P-GCOMS: PARAM Generalized Communication Overhead Measurement Suites (software developed by the Betatesting Group)
- MPI forum: Freely downloadable MPI performance benchmarks used in the MPICH installation
- PALLAS: Freely downloadable MPI performance benchmarks
- LLC - MPBench: Freely downloadable MPI performance benchmarks in LLCBench (Low Level Characterization benchmarks)
- PARKBENCH: Freely downloadable selective micro benchmarks
- Sphinx: An integrated parallel micro benchmark suite
- SKaMPI: Designed to measure the performance of MPI
Level-3: The Level-3 test suites/benchmarks involve structured communication and computation, using different MPI library calls, that arise in dense matrix computations. These are packaged in the test suite P-MACS (PARAM Matrix Computation Suites).
Level-4: The Level-4 test suites/benchmarks involve unstructured communication and computation, using different MPI library calls, that arise in sparse matrix computations.
- P-PARDES: Programs based on solution of partial differential equations
- P-UCCL: Programs based on parallel unstructured communications and computations
Level-5: The Level-5 test suites/benchmarks involve widely used off-the-shelf codes or package kernels, such as LINPACK (TOP-500), ScaLAPACK, PARKBENCH, EuroBen, and application kernels.
- NAS: NAS Parallel Benchmark - Computational Fluid Dynamics codes
- ScaLAPACK: Scalable Linear Algebra PACKage dense matrix computations
- PARKBENCH: Micro and Macro benchmarks
- LINPACK: High Performance linear system of matrix equations
- TOP-500: List of Top-500 Supercomputers in the World
- P-CFD: PARAM Computational Fluid Dynamics
- PERFECT: A suite of scientific and engineering programs
- SPLASH: Benchmark suites
- EuroBen: Benchmark programs for scientific and technical computing to assess the performance of computers
Development of HPC Benchmarks (P-COMS) (Downloadable Software)
P-COMS: The Betatesting Group developed the P-COMS benchmarks, a set of MPI benchmarks that measure communication overheads on large message-passing clusters (such as the PARAM 10000, PARAM Padma and teraflop clusters). The benchmarks measure the overhead time of MPI point-to-point communication library calls, collective communication library calls, and collective communication and computation library calls for message sizes ranging from 0 bytes to 10 megabytes. P-COMS can be used to compare the performance of various MPI library calls on different message-passing clusters.
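The point-to-point part of such a measurement is commonly done with a ping-pong pattern: one side sends a message of a given size, the peer echoes it back, and half the averaged round-trip time is taken as the one-way cost. The sketch below illustrates that pattern only. It is not P-COMS code and does not use MPI; as an assumption made so the example runs anywhere, a local socket pair and a thread stand in for two MPI processes, and all function names are hypothetical.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from sock, looping over partial reads."""
    chunks = []
    while n > 0:
        chunk = sock.recv(min(n, 1 << 20))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_msg(sock, payload):
    """Send a 4-byte length header followed by the payload (may be empty)."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


def echo_peer(sock, count):
    """Stand-in for the second process: echo `count` messages back."""
    for _ in range(count):
        send_msg(sock, recv_msg(sock))


def ping_pong_overhead(sizes, repeats=20):
    """Return {message size in bytes: estimated one-way time in seconds}."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, len(sizes) * repeats))
    peer.start()
    results = {}
    try:
        for size in sizes:
            payload = bytes(size)
            start = time.perf_counter()
            for _ in range(repeats):
                send_msg(a, payload)
                recv_msg(a)
            elapsed = time.perf_counter() - start
            results[size] = elapsed / (2 * repeats)  # half the mean round trip
    finally:
        a.close()
        peer.join()
        b.close()
    return results


if __name__ == "__main__":
    for size, seconds in ping_pong_overhead([0, 1024, 1024 * 1024]).items():
        print(f"{size:>8} bytes: {seconds * 1e6:.1f} us one-way")
```

The zero-byte case isolates the fixed per-message cost, which is why a sweep like the one described above starts at 0 bytes; on a real cluster the same loop would be written with MPI send and receive calls between two ranks.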
Testing High Performance Computing and Communication (HPCC) Tools
Performance evaluation and visualization is an important technique that helps the user understand and improve complex parallel performance phenomena. C-DAC's HPCC software is the programming environment for the PARAM Padma/PARAM 10000 and supports the development and execution of both sequential and message-passing programs. The HPCC software contains a rich set of high-performance tools for clusters. The Betatesting Group contributes to the testing of HPCC tools, which includes verifying the testability of requirements, defining guidelines for test results, reporting and suggesting test optimization techniques, and developing test scripts. The usage and performance features of the C-DAC HPCC tools are investigated on the PARAM series.
Our proposed set of evaluation criteria consists of robustness, usability, scalability, portability and versatility. Testing tools that are publicly or commercially available, including the HPCC software tools, will be investigated. The focus is on tools that work across different platforms rather than on vendor tools that work on only one platform. The list of HPCC tools is given below.
- Performance Evaluation and Visualization tool
- Parallel debuggers
- Data Representation: Performance Visualization of MPI Programs
- Cluster management and monitoring tools (Portability; Scalability; High Sampling rate; Extensibility; Flexibility; Event Monitoring and Management; History and Reporting; GUI Monitoring and Configuring; Network Tests)
- Job Management Software
Porting MPICH over VIPL on PARAM Padma
The Betatesting Group has worked on porting MPICH over VIPL on the PARAM Padma using the PARAMNet-II interconnect technology, in collaboration with other groups at C-DAC. Experiments on the performance of system and application benchmarks on the PARAM Padma have been carried out with different cluster configurations: Gigabit with MPICH; PARAMNet-II with MPICH over VIPL; and PARAMNet-II with C-DAC MPI over VIPL.
Development of HPC Libraries and Tools on PARAM 10000
P-MACS: The Betatesting Group developed the Parallel Matrix Computation Suites (P-MACS) on the PARAM 10000, in which different matrix partitioning techniques are used for dense matrix computations. The P-MACS suite uses MPICH (Message Passing Interface) library calls and aims to cover all simple matrix computation algorithms. It includes parallel programs for vector-vector multiplication, matrix-vector multiplication, matrix-matrix multiplication with different partitioning techniques, and the solution of matrix systems of linear equations by direct and iterative methods. The aim is to test the correctness of results for different parallel matrix computation algorithms using different types of point-to-point, advanced point-to-point, and collective communication MPI library calls.
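To make the idea of partitioning and correctness testing concrete, here is a minimal sketch of matrix-vector multiplication with row-wise block partitioning, checked against a serial result. It is not taken from P-MACS: the ranks are simulated by a plain loop rather than MPI processes, the vector is assumed to be replicated on every rank, and the function names are hypothetical.

```python
def block_rows(n_rows, n_procs):
    """Split range(n_rows) into n_procs contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_procs)
    bounds, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def matvec_serial(a, x):
    """Reference result: y[i] = sum_j a[i][j] * x[j]."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matvec_row_partitioned(a, x, n_procs):
    """Simulate row-wise block-striped matrix-vector multiplication."""
    y = []
    for start, stop in block_rows(len(a), n_procs):
        local_rows = a[start:stop]              # rows owned by this simulated rank
        local_y = matvec_serial(local_rows, x)  # x is assumed replicated on every rank
        y.extend(local_y)                       # stands in for gathering the partial results
    return y


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    x = [1, 1]
    for p in (1, 2, 3, 4):
        assert matvec_row_partitioned(a, x, p) == matvec_serial(a, x)
    print("row-partitioned results match the serial result")
```

In an MPI version, the slicing would correspond to distributing row blocks among processes and the final concatenation to a gather-style collective call; column-wise or 2-D block partitioning would change which parts of the vector each rank needs.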
Dedicated Slot Booking Software on PARAM 10000
We have developed the Dedicated Slot Booking (DSB) software, which partitions a message-passing cluster such as the PARAM 10000 with respect to time. We explain the design issues and features of this software, which helps users book a set of nodes exclusively, for single or multiple users, in a dedicated time slot on the PARAM 10000. Using this software, it is possible to book nodes at different intervals of time to execute jobs in dedicated mode. This software reduces dependency on system administrators and operators, freeing them to focus on other important activities. Dedicated slot booking software is a part of the load scheduling and job management software available on many parallel computers. Although cluster computing has emerged as a favourable alternative to large-scale SMPs and MPPs, there are several issues regarding usability and manageability. Specifically, when benchmarking a cluster for performance and scalability, the entire cluster or some of its nodes must be used in dedicated mode to reduce several overheads, so the cluster needs to be partitioned logically according to the requirements. Dedicated Slot Booking is a software tool that partitions the cluster according to the requirements of the users.
By logical partitioning we mean that the partitioning is done only by enforcing, in software, certain policies about the usage of the cluster. Another approach is to disallow users from logging on to any node of the cluster except the job submission node. From this node the user has to submit his/her job under the control of job management software such as LSF (Load Sharing Facility), PBS (Portable Batch System), or Condor. This approach answers the needs of job management for a fixed production environment but not those of testing and development, especially in the case of system software developers. For the latter it is essential that users can log on to the nodes to perform various activities. Also, such an approach cannot give users the flexibility of selecting the runtime environment. To address these issues, a software tool has been designed that creates and deletes partitions of the cluster. The software is modular in design and provides a web interface. We assume that the cluster is configured as some servers and compute nodes, and that the server handles the time slot booking information and interfaces with users' requirements. Issues such as accessing the slot booking software through the web interface, the organization of the various modules, and identifying mechanisms to check the node status during booking have been taken into consideration.
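The core check behind time-slot booking, namely that a node is never reserved by two bookings whose time intervals overlap, can be sketched as follows. This is only an illustration of the idea and not the DSB implementation: the class `SlotBook`, its methods, and the use of plain numbers for times are all assumptions made for this example.

```python
class SlotBook:
    """Hypothetical in-memory booking table: which nodes are reserved when."""

    def __init__(self, nodes):
        self.nodes = frozenset(nodes)
        self.bookings = []  # each entry: (start, end, nodes, user)

    def is_free(self, node, start, end):
        """True if `node` has no booking overlapping the half-open slot [start, end)."""
        return not any(
            node in b_nodes and start < b_end and b_start < end
            for b_start, b_end, b_nodes, _user in self.bookings
        )

    def book(self, user, nodes, start, end):
        """Reserve `nodes` for `user`; return False if any node is already taken."""
        nodes = frozenset(nodes)
        if start >= end:
            raise ValueError("slot must end after it starts")
        if not nodes <= self.nodes:
            raise ValueError("unknown node requested")
        if not all(self.is_free(n, start, end) for n in nodes):
            return False
        self.bookings.append((start, end, nodes, user))
        return True

    def release(self, user, start, end):
        """Delete the user's bookings for exactly this slot; return how many were removed."""
        before = len(self.bookings)
        self.bookings = [
            b for b in self.bookings
            if not (b[3] == user and b[0] == start and b[1] == end)
        ]
        return before - len(self.bookings)


if __name__ == "__main__":
    book = SlotBook(["n1", "n2", "n3"])
    print(book.book("alice", ["n1", "n2"], 10, 12))  # first booking of n1 and n2
    print(book.book("bob", ["n2"], 11, 13))          # overlaps alice's slot on n2
```

Treating slots as half-open intervals lets one booking start exactly when another ends, which matches booking nodes at consecutive intervals of time.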
Job Accounting Tool (PJOBA) on PARAM 10000:
The Betatesting Group and NPSF members developed the PARAM Job Accounting Tool (PJOBA), an accounting software package for the PARAM 10000 by which external users can be charged according to their usage. The accounting package is web-enabled and provides information on utilization by individual users or by all users of the PARAM 10000 clusters. This software reduces the dependency on system administrators and operators, freeing them to focus on other activities. The system accounting utilities are a family of mechanisms that collect data on system usage by CPU, by user, and by process.
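As a rough illustration of per-user accounting, the sketch below aggregates CPU time per user from a list of usage records and derives a charge. The record layout, the function name `summarize_usage`, and the flat rate per CPU hour are assumptions made for this example; the source does not describe PJOBA's data format or charging policy.

```python
from collections import defaultdict


def summarize_usage(records, rate_per_cpu_hour):
    """Aggregate (user, cpu_seconds) records into per-user CPU hours and charges.

    Both the record layout and the flat rate are hypothetical.
    """
    totals = defaultdict(float)
    for user, cpu_seconds in records:
        totals[user] += cpu_seconds
    summary = {}
    for user, seconds in totals.items():
        cpu_hours = seconds / 3600.0
        summary[user] = {
            "cpu_hours": cpu_hours,
            "charge": cpu_hours * rate_per_cpu_hour,
        }
    return summary


if __name__ == "__main__":
    records = [("alice", 3600), ("bob", 1800), ("alice", 1800)]
    for user, row in sorted(summarize_usage(records, 10.0).items()):
        print(user, row)
```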
Parallel Computing Workshops & Training, and HPC Modules Course Programme
The Betatesting Group was actively involved in developing a state-of-the-art, web-enabled parallel computing training programme on the PARAM 10000 and PARAM Padma for internal and external users. Members of the Group conducted several parallel computing workshops at C-DAC with the support of members of other C-DAC groups.
The Betatesting Group members conducted state-of-the-art parallel computing workshops covering various theoretical aspects of parallel computing, architectural features of parallel computers (clusters, SMP machines, CC-NUMA machines), system area networks and interconnection networks, performance and scalability issues of applications, and detailed hands-on sessions on message-passing clusters.
The hands-on session on the PARAM 10000 and PARAM Padma focuses on performance issues of sequential and parallel programs on uni/multi-processors of parallel computing systems, different algorithms for dense matrix computations, optimization of sequential and parallel programs, performance issues of MPI-1 on clusters, and quantification of the overheads of MPI communication library calls on large teraflop computing systems.
Listed below are the Parallel Computing and High Performance Computing workshops that have been conducted at C-DAC and at other academic institutes and R&D organizations in India.
- Five day workshop on Parallel Computing: Algorithms and Applications (PCAA-99), June 21-25, 1999 (Refer PCAA-1999.pdf)
- Four day workshop on Parallel Computing - Optimizing Performance of Parallel Programs (PCOPP-2002) on the PARAM 10000 at C-DAC, Pune, June 03-06, 2002 (Refer PCOPP-2002.pdf)
- Three day Parallel Computing workshops on the PARAM 10000 at IIT-Delhi Campus, December 2003
- Four day Parallel Computing workshops on the PARAM 10000 at IUCAA, Pune University Campus, Pune, October 05-09, 2004
- Three day Parallel Computing workshop on the PARAM 10000 Cluster at the National Metallurgical Laboratory, Jamshedpur, Oct 12-14, 2004
- Four day workshop on Practical aspects of Parallel Computing on IBM Clusters at the Centre for Modelling, Simulation and Design, University of Hyderabad, October 21-24, 2004 (Refer UofHYD-ParComp-2004.pdf)
- Parallel Computing/High Performance Computing (HPC) module with hands-on session on the PARAM 10000 & PARAM Padma for the Advanced Course in Bio Informatics (ACB-2002, ACB-2003, ACB-2004, ACB-2005) (Refer ACB2004-HPC.pdf, ACB2005-HPC.pdf)