C-DAC Pune
Testing: High Performance Computing Activities

- Testing and Benchmarking: PARAM Series of Supercomputers
- Betatesting Group: List of Project Activities at a Glance
- Free Downloadable P-COMS (HPC) Software
- Development & Testing of HPC Tools & Libraries on PARAM Series of Supercomputers
- Parallel Computing Workshops & Training, and HPC Modules Course Programme

Testing and Benchmarking: PARAM Series of Supercomputers

The project covers designing the testing methodology and execution plan for C-DAC-developed HPC clusters, writing test cases, packaging the test suites, verifying the testability of requirements, proposing automation of the test cases, and extracting the performance of application and system benchmarks on HPC systems. (Refer Btest-HPC-Projects.pdf)

The idea is to provide common ground for running these suites on the PARAM series: some suites use real application programs, while others use short kernel codes to evaluate sustained performance. The test suites are organized in five levels. Which types of code to include in each level's package is decided on the basis of the following characteristics, which serve as goals for the test constructor:

- Most importantly, the benchmarks or test suites must be representative of various types of applications. The types, patterns, and rates of computation, communication, and input/output of the programs in the test package must match those of programs actually in use as closely as practical. Furthermore, the programs in the test package must load the system under test in a manner similar to that seen in practice.
- The test package must be maintainable: its size must be kept to a minimum, and it must be constructed in a modular, easy-to-modify fashion.
- The test package must be designed and implemented according to standard software engineering practices.
- The test package must be scalable: it must be possible to vary both the number of processors used and the size of the test problem solved.

The test suites and benchmarks used at each level are described below.
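The scalability requirement in the list above implies running each suite over a grid of processor counts and problem sizes, then reducing the timings to speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. The following is a minimal sketch, assuming wall-clock times have already been collected for one fixed problem size (strong scaling) with a single-processor baseline; the function names are hypothetical and not part of any C-DAC suite.

```python
from itertools import product


def build_test_grid(proc_counts, problem_sizes):
    """Every (processors, problem size) combination the suite should run."""
    return list(product(proc_counts, problem_sizes))


def speedup(t_serial, t_parallel):
    """Relative speedup S(p) = T(1) / T(p)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, procs):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / procs


def scaling_table(times):
    """Strong-scaling summary for one fixed problem size.

    `times` maps processor count -> wall-clock seconds and must contain the
    single-processor run, which serves as the baseline T(1).
    Returns (p, S(p), E(p)) rows sorted by p.
    """
    t1 = times[1]
    return [(p, speedup(t1, t), efficiency(t1, t, p))
            for p, t in sorted(times.items())]
```

For example, `build_test_grid([1, 2, 4, 8], [1000, 2000])` yields the eight runs needed to cover four processor counts at two problem sizes.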

Test Suites and Application and System Benchmarks

Measuring the performance of application and system benchmarks on the PARAM 10000 and PARAM Padma message-passing clusters is a major activity of the Betatesting Group. Macro and micro benchmarks have been used to test and extract the sustained performance of the PARAM series. Benchmarks are used not only to test but also to measure and predict the sustained performance of a computer system. (Refer Btest-HPC-Projects.pdf)

The benchmarks are organized in five levels, focusing on the architectural features of parallel computers (clusters, SMP machines), performance issues of sequential and parallel programs on the uni-/multi-processors of parallel computing systems, optimization of parallel programs, performance issues of MPI-1 and MPI-2, and quantification of communication overheads on system area networks.

Level-1: The Level-1 test suites/benchmarks evaluate performance on the single processor or multiple processors of one cluster node.

- First:
  - LAPACK: Linear Algebra PACKage for dense matrix computations
  - NAS: NAS Sequential Benchmarks (computational fluid dynamics)
  - ScaLAPACK: Scalable Linear Algebra PACKage for dense matrix computations
  - LINPACK: high-performance solution of dense systems of linear equations
  - CFD: Computational Fluid Dynamics
  - SPEC: Standard Performance Evaluation Corporation benchmark family, measuring CPU performance, the memory system, client/server computing, commercial applications, and I/O subsystems
  - LFK: Livermore Fortran Kernels test
- Second:
  - STREAM: simple synthetic benchmark that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate
  - LMBENCH: measures operating system overheads and the data transfer capability between processor, cache, memory, network, and disk
  - LLCBench: Low Level Characterization benchmarks (BLAS-Bench, Cache Bench, and MP Bench)
  - EuroBen: benchmark that measures the basic characteristics of a machine
  - Dhrystone: measures CPU performance
  - TPC: Transaction Processing Performance Council benchmarks for transaction processing and databases
- Third:
  - P-SIMPLE: PARAM simple MPI programs
  - P-MPISEM: PARAM programs for testing MPI semantics
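The rate that STREAM reports is a simple ratio of bytes moved to elapsed time. The sketch below shows how a measured kernel time turns into MB/s and Mflop/s, assuming 8-byte double-precision arrays and STREAM's conventions as we understand them: 1 MB = 10^6 bytes, and the best (minimum) time over repeated trials with the first trial discarded as warm-up. It does not time anything itself, and the function names are illustrative.

```python
# Words (8-byte doubles) moved per array element by each STREAM kernel:
#   Copy:  a[i] = b[i]             read b, write a     -> 2 words
#   Scale: a[i] = q * b[i]         read b, write a     -> 2 words
#   Add:   a[i] = b[i] + c[i]      read b, c, write a  -> 3 words
#   Triad: a[i] = b[i] + q * c[i]  read b, c, write a  -> 3 words
WORDS_PER_ELEMENT = {"copy": 2, "scale": 2, "add": 3, "triad": 3}
FLOPS_PER_ELEMENT = {"copy": 0, "scale": 1, "add": 1, "triad": 2}
WORD_BYTES = 8


def best_time(trial_times, skip_first=True):
    """Minimum time over trials, optionally ignoring the warm-up trial."""
    times = trial_times[1:] if skip_first and len(trial_times) > 1 else trial_times
    if not times:
        raise ValueError("need at least one timed trial")
    return min(times)


def stream_rate_mb_s(kernel, n_elements, trial_times):
    """Sustained bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * WORD_BYTES * n_elements
    return 1e-6 * bytes_moved / best_time(trial_times)


def stream_mflops(kernel, n_elements, trial_times):
    """Corresponding computation rate in Mflop/s."""
    return 1e-6 * FLOPS_PER_ELEMENT[kernel] * n_elements / best_time(trial_times)
```

Counting bytes per element, rather than timing a generic copy, is what lets different kernels with different read/write mixes be compared on one MB/s scale.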

 

Level-2: The Level-2 test suites/benchmarks measure the communication-time overhead of the various point-to-point and collective communication calls supported by MPI.

- P-COMS: PARAM Communication Overhead Measurement Suites (freely downloadable from the C-DAC website)
- P-GCOMS: PARAM Generalized Communication Overhead Measurement Suites (software developed by the Betatesting Group)
- MPI Forum: freely downloadable MPI performance benchmarks used in the MPICH installation
- PALLAS: freely downloadable MPI performance benchmarks
- LLC-MPBench: freely downloadable MPI performance benchmarks in LLCBench (Low Level Characterization benchmarks)
- PARKBENCH: freely downloadable selected micro benchmarks
- Sphinx: an integrated parallel micro benchmark suite
- SKaMPI: designed to measure the performance of MPI
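Micro benchmarks of this kind typically time many repetitions of a ping-pong exchange between two processes and derive latency and bandwidth from the total elapsed time. The following is a minimal sketch of that reduction step, assuming the total time for `repetitions` round trips has already been measured (for example with MPI_Wtime around the send/receive loop). The helpers are hypothetical, and the 1 MB = 10^6 bytes convention is an assumption, since suites differ on this point.

```python
def one_way_time_s(total_seconds, repetitions):
    """Half the mean round-trip time of a ping-pong (send + matching reply)."""
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")
    return total_seconds / repetitions / 2


def pingpong_latency_us(total_seconds, repetitions):
    """One-way time in microseconds; for small messages this approximates latency."""
    return one_way_time_s(total_seconds, repetitions) * 1e6


def pingpong_bandwidth_mb_s(message_bytes, total_seconds, repetitions):
    """Achieved bandwidth in MB/s, taking 1 MB = 1e6 bytes."""
    return message_bytes / one_way_time_s(total_seconds, repetitions) / 1e6


def summarize_collective(per_rank_seconds, repetitions):
    """Per-call time of a collective, summarized over ranks.

    Each rank times `repetitions` calls on its own. The slowest rank usually
    bounds what an application observes, so 'max' is often the headline figure.
    """
    per_call = [t / repetitions for t in per_rank_seconds]
    return {"min": min(per_call),
            "avg": sum(per_call) / len(per_call),
            "max": max(per_call)}
```

Timing many repetitions and dividing, rather than timing a single message, keeps the timer resolution from dominating the result for small messages.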

Level-3: The Level-3 test suites/benchmarks involve the structured communication, using different MPI library calls, and the computation that arise in dense matrix computations. These are packaged in the test suite P-MACS (PARAM Matrix Computation Suites).

Level-4: The Level-4 test suites/benchmarks involve the unstructured communication, using different MPI library calls, and the computation that arise in sparse matrix computations.

- P-PARDES: programs based on the solution of partial differential equations
- P-UCCL: programs based on parallel unstructured communication and computation
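What makes the Level-4 communication unstructured is that the sparsity pattern, not a fixed formula, determines which vector entries each process needs from the others. The following is a minimal sketch, assuming a matrix stored in compressed sparse row (CSR) form, partitioned into contiguous row blocks with the vector x partitioned the same way. It computes each rank's receive lists (an "inspector" step) and the local sparse matrix-vector product. It is illustrative only and is not taken from P-PARDES or P-UCCL.

```python
def block_rows(n, nparts):
    """Contiguous row blocks [lo, hi) for each of nparts ranks."""
    base, extra = divmod(n, nparts)
    bounds, start = [], 0
    for r in range(nparts):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def owner_of(index, bounds):
    """Rank whose row block contains `index` (x is partitioned like the rows)."""
    for rank, (lo, hi) in enumerate(bounds):
        if lo <= index < hi:
            return rank
    raise IndexError(index)


def receive_lists(indptr, indices, bounds):
    """For each rank: {other rank: sorted x indices it must receive}."""
    plans = []
    for rank, (lo, hi) in enumerate(bounds):
        remote = {}
        for row in range(lo, hi):
            for k in range(indptr[row], indptr[row + 1]):
                owner = owner_of(indices[k], bounds)
                if owner != rank:
                    remote.setdefault(owner, set()).add(indices[k])
        plans.append({src: sorted(cols) for src, cols in sorted(remote.items())})
    return plans


def csr_matvec_rows(indptr, indices, data, x, lo, hi):
    """y[lo:hi] = A[lo:hi, :] x for a CSR matrix (x holds local + received entries)."""
    y = []
    for row in range(lo, hi):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y
```

In an MPI version, each rank would post receives for these lists and send the mirror-image lists before the multiply. Because the lists depend only on the sparsity pattern, they can be computed once and reused across the iterations of an iterative solver.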

Level-5: The Level-5 test suites/benchmarks involve widely used off-the-shelf codes or package kernels, such as LINPACK (TOP-500), ScaLAPACK, PARKBENCH, EuroBen, and application kernels.

- NAS: NAS Parallel Benchmarks (computational fluid dynamics codes)
- ScaLAPACK: Scalable Linear Algebra PACKage for dense matrix computations
- PARKBENCH: micro and macro benchmarks
- LINPACK: high-performance solution of dense systems of linear equations
- TOP-500: list of the 500 most powerful supercomputers in the world
- P-CFD: PARAM Computational Fluid Dynamics
- PERFECT: a suite of scientific and engineering programs
- SPLASH: Stanford Parallel Applications for Shared Memory benchmark suites
- EuroBen: benchmark programs for scientific and technical computing that assess the performance of computers
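LINPACK results such as those reported to the TOP-500 list are expressed as a sustained rate Rmax and compared against the theoretical peak Rpeak. The following is a minimal sketch, assuming the operation count 2/3 N^3 + 3/2 N^2 used by the HPL implementation of LINPACK and a peak computed as nodes x cores x clock x flops per cycle. The memory-sizing helper encodes a common rule of thumb (fill roughly 80% of memory with the N x N double-precision matrix), which is an assumption rather than a fixed rule. All function names are hypothetical.

```python
import math


def hpl_gflops(n, seconds):
    """Sustained rate for solving an n x n dense system in `seconds`,
    using the operation count 2/3 n^3 + 3/2 n^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def peak_gflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak Rpeak in Gflop/s."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(rmax_gflops, rpeak_gflops):
    """Fraction of peak achieved (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops


def problem_size_for_memory(total_bytes, fraction=0.8):
    """Largest n whose n x n matrix of 8-byte doubles fits in `fraction` of
    memory (a rule of thumb, not a requirement)."""
    return math.isqrt(int(total_bytes * fraction) // 8)
```

Because the flop count grows as N^3 while memory grows as N^2, larger problems generally push Rmax closer to Rpeak, which is why the problem size is usually chosen from available memory.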

Development of HPC Benchmarks: P-COMS (Downloadable Software)

P-COMS: The Betatesting Group developed the P-COMS benchmarks, a set of MPI benchmarks that measure communication overheads on large message-passing clusters (such as the PARAM 10000, PARAM Padma, and teraflop clusters). The benchmarks measure the overhead time of MPI point-to-point communication library calls, collective communication library calls, and collective communication and computation library calls for message sizes ranging from 0 bytes to 10 megabytes. P-COMS can be used to compare the performance of various MPI library calls on different message-passing clusters.
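Timings over a 0-byte to 10-megabyte sweep are often summarized with the latency-bandwidth (Hockney) model t(m) = α + m/β, where α is the startup latency and β the asymptotic bandwidth. The following is a minimal sketch that generates such a sweep and fits α and β by ordinary least squares. It assumes power-of-two message sizes and that 10 MB means 10 x 2^20 bytes; both are assumptions, not documented P-COMS behaviour, and the function names are hypothetical.

```python
def message_sizes(max_bytes=10 * 2**20):
    """0 bytes, then powers of two, ending exactly at max_bytes."""
    sizes, m = [0], 1
    while m <= max_bytes:
        sizes.append(m)
        m *= 2
    if sizes[-1] != max_bytes:
        sizes.append(max_bytes)
    return sizes


def fit_hockney(sizes, times):
    """Least-squares fit of t(m) = alpha + m / beta.

    Returns (alpha in seconds, beta in bytes per second).
    """
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need matching size/time lists with at least two points")
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    slope = sxy / sxx  # seconds per byte, i.e. 1 / beta
    if slope <= 0:
        raise ValueError("time does not grow with message size")
    alpha = mean_t - slope * mean_m
    return alpha, 1.0 / slope
```

A single line fitted over the full range tends to be dominated by the largest messages. Fitting small and large messages separately (for example where an MPI library switches message protocols) often gives a more faithful picture.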

Testing High Performance Computing and Communication (HPCC) Tools

Performance evaluation and visualization are important techniques that help users understand and improve complex parallel performance behaviour. C-DAC's HPCC software is the programming environment for the PARAM Padma and PARAM 10000 and supports the development and execution of both sequential and message-passing programs. The HPCC software contains a rich set of high performance tools for clusters. The Betatesting Group contributes to testing the HPCC tools, which includes verifying the testability of requirements, defining guidelines for test results, reporting and suggesting test optimization techniques, and developing test scripts. The C-DAC HPCC tools and their features are investigated on the PARAM series with respect to usage and performance.

Our proposed set of evaluation criteria consists of robustness, usability, scalability, portability, and versatility. Testing tools that are publicly or commercially available, including the HPCC software tools, will be investigated, with the emphasis on tools that work across different platforms rather than vendor tools that work on only one platform. The HPCC tools are listed below.

- Performance Evaluation and Visualization tool
- Parallel debuggers
- Data Representation: Performance Visualization of MPI Programs
- Cluster management and monitoring tools (portability; scalability; high sampling rate; extensibility; flexibility; event monitoring and management; history and reporting; GUI monitoring and configuring; network tests)
- Job Management Software

Porting MPICH over VIPL on PARAM Padma

The Betatesting Group has worked on porting MPICH over VIPL on the PARAM Padma, which uses PARAMNet-II interconnect technology, in collaboration with other groups of C-DAC. The performance of system and application benchmarks on the PARAM Padma has been measured with different cluster configurations: Gigabit with MPICH; PARAMNet-II with MPICH over VIPL; and PARAMNet-II with C-DAC MPI over VIPL.

Development of HPC Libraries and Tools on PARAM 10000

P-MACS: The Betatesting Group developed the Parallel Matrix Computation Suites (P-MACS) on the PARAM 10000, using different matrix partitioning techniques for dense matrix computations. The P-MACS suite uses MPICH (Message Passing Interface) library calls and aims to execute all the simple matrix computation algorithms. Parallel programs for vector-vector multiplication, matrix-vector multiplication, and matrix-matrix multiplication with different partitioning techniques, and for the solution of systems of linear equations by direct and iterative methods, have been considered. The aim is to test the correctness of results for different parallel matrix computation algorithms using different types of MPI point-to-point, advanced point-to-point, and collective communication library calls.
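The row-wise and column-wise partitionings compared by matrix-vector programs of this kind can be checked for correctness against a serial product. The following is a minimal sketch that simulates p processes serially in Python rather than using MPI. In the row-wise version each process owns a block of rows and the partial results are concatenated (the role an MPI_Allgatherv would play). In the column-wise version each process owns a block of columns and the full-length partial vectors are summed (the role of MPI_Reduce with MPI_SUM). The function names are hypothetical and do not reproduce the P-MACS code.

```python
def split(n, p):
    """Contiguous index blocks for p processes; the first n % p get one extra."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def matvec_serial(A, x):
    """Reference result y = A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matvec_rowwise(A, x, p):
    """Row-block partitioning: each process owns whole rows plus a full copy of x
    and computes its slice of y; concatenating the slices plays the role of
    MPI_Allgatherv."""
    y = []
    for rows in split(len(A), p):
        y.extend(sum(A[i][j] * x[j] for j in range(len(x))) for i in rows)
    return y


def matvec_colwise(A, x, p):
    """Column-block partitioning: each process owns a block of columns and the
    matching slice of x, producing a full-length partial y; summing the partials
    plays the role of MPI_Reduce with MPI_SUM."""
    y = [0] * len(A)
    for cols in split(len(x), p):
        partial = [sum(A[i][j] * x[j] for j in cols) for i in range(len(A))]
        y = [a + b for a, b in zip(y, partial)]
    return y
```

Comparing each partitioned result with the serial reference over several process counts, including counts that do not divide the matrix size evenly, is one way to carry out the correctness check described above.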

Dedicated Slot Booking Software on PARAM 10000

We have developed Dedicated Slot Booking (DSB) software, which partitions a message-passing cluster such as the PARAM 10000 by time. We explain the design issues and features of this software, which lets users book a set of nodes exclusively for one or more users in a given dedicated time slot on the PARAM 10000. Using this software, nodes can be booked for different time intervals to execute jobs in dedicated mode. The software reduces dependency on system administrators and operators, freeing them to focus on other important activities. Dedicated slot booking software is part of the load scheduling and job management software available on many parallel computers. Although cluster computing has emerged as a favourable alternative to large-scale SMPs and MPPs, there are several issues regarding usability and manageability. In particular, when benchmarking a cluster for performance and scalability, the entire cluster or some of its nodes must be used in dedicated mode to reduce various overheads. The cluster therefore needs to be partitioned logically according to requirements, and the Dedicated Slot Booking software is a tool that partitions the cluster according to users' requirements.

By logical partitioning we mean that the partitioning is achieved only by enforcing, in software, certain policies on cluster usage. Another approach is to prevent users from logging on to any node of the cluster except the job submission node, from which they must submit their jobs under the control of job management software such as LSF (Load Sharing Facility), PBS (Portable Batch System), or Condor. This approach meets the job management needs of a fixed production environment, but not those of testing and development, especially for system software developers, who need to log on to the nodes to perform various activities. Such an approach also cannot give users flexibility in selecting the runtime environment. To address these issues, a software tool has been designed that creates and deletes partitions of the cluster. The software is modular and provides a web interface. We assume that the cluster is configured as some servers and compute nodes, and that a server holds the time-slot booking information and handles users' requests. Issues such as accessing the slot booking software through the web interface, organizing the various modules, and identifying mechanisms to check node status during booking have been taken into consideration.
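At its core, slot booking is an interval-overlap check per node. The following is a minimal sketch, assuming bookings are kept in memory on the server, time slots are half-open intervals [start, end) so that back-to-back slots do not clash, and nodes outside any active slot are open to all users. The class and method names are hypothetical and do not describe the actual DSB implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Booking:
    user: str
    nodes: frozenset
    start: datetime
    end: datetime  # exclusive: the slot is [start, end)


class SlotBook:
    """In-memory booking table for one cluster (hypothetical sketch)."""

    def __init__(self, cluster_nodes):
        self.cluster_nodes = frozenset(cluster_nodes)
        self.bookings = []

    def conflicts(self, nodes, start, end):
        """Existing bookings that share a node and overlap [start, end)."""
        return [b for b in self.bookings
                if b.nodes & nodes and b.start < end and start < b.end]

    def book(self, user, nodes, start, end):
        """Reserve `nodes` for `user`; returns the Booking, or None on a clash."""
        nodes = frozenset(nodes)
        if start >= end:
            raise ValueError("slot must end after it starts")
        unknown = nodes - self.cluster_nodes
        if unknown:
            raise ValueError(f"unknown nodes: {sorted(unknown)}")
        if self.conflicts(nodes, start, end):
            return None
        booking = Booking(user, nodes, start, end)
        self.bookings.append(booking)
        return booking

    def cancel(self, booking):
        """Delete a partition, freeing its nodes."""
        self.bookings.remove(booking)

    def may_log_on(self, user, node, when):
        """A node inside an active slot is reserved for that slot's owner;
        nodes not covered by any active slot are open to everyone."""
        for b in self.bookings:
            if node in b.nodes and b.start <= when < b.end:
                return b.user == user
        return True
```

A real deployment would also persist the bookings and enforce `may_log_on` when users log in to each node; that enforcement layer is outside this sketch.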

Job Accounting Tool (PJOBA) on PARAM 10000

The Betatesting Group and NPSF members developed the PARAM Job Accounting Tool (PJOBA), an accounting software package for the PARAM 10000 with which external users can be charged for their usage. The accounting package is web-enabled and provides information on resource utilization by individual users or by all users of the PARAM 10000 clusters. The software reduces dependency on system administrators and operators, freeing them to focus on other activities. The system accounting utilities are a family of mechanisms that collect data on system usage by CPU, by user, and by process.
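Job accounting of this kind reduces per-process CPU usage records to per-user (or per-command) totals and applies a charge to external users. The following is a minimal sketch, assuming the records have already been parsed from the system's process accounting data into a simplified form holding user and system CPU seconds, and assuming a flat, hypothetical rate per CPU-hour. None of this reflects PJOBA's actual record format or charging policy.

```python
from collections import defaultdict
from typing import NamedTuple


class ProcRecord(NamedTuple):
    """Simplified per-process accounting record (assumed layout)."""
    user: str
    command: str
    user_cpu_s: float
    sys_cpu_s: float


def cpu_seconds_by(records, key):
    """Total user + system CPU seconds grouped by key(record)."""
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec.user_cpu_s + rec.sys_cpu_s
    return dict(totals)


def charges(cpu_seconds_per_user, rate_per_cpu_hour, internal_users=frozenset()):
    """Charge external users at a flat rate per CPU-hour; internal users pay 0."""
    return {
        user: 0.0 if user in internal_users
        else round(seconds / 3600 * rate_per_cpu_hour, 2)
        for user, seconds in cpu_seconds_per_user.items()
    }
```

Passing `key=lambda r: r.user` gives per-user totals for billing, while `key=lambda r: r.command` gives per-program totals from the same records.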

 

Parallel Computing Workshops & Training, and HPC Modules Course Programme

The Betatesting Group was actively involved in developing the state-of-the-art, web-enabled parallel computing training program on the PARAM 10000 and PARAM Padma for internal and external users. Members of the group conducted several parallel computing workshops at C-DAC with the support of members of other C-DAC groups.

Betatesting Group members conducted state-of-the-art parallel computing workshops covering theoretical aspects of parallel computing, architectural features of parallel computers (clusters, SMP machines, CC-NUMA machines), system area networks and interconnection networks, performance and scalability issues of applications, and detailed hands-on sessions on message-passing clusters.

The hands-on sessions on the PARAM 10000 and PARAM Padma focus on performance issues of sequential and parallel programs on the uni-/multi-processors of parallel computing systems, different algorithms for dense matrix computations, optimization of sequential and parallel programs, performance issues of MPI-1 on clusters, and quantification of the overheads of MPI communication library calls on large teraflop computing systems.

Listed below are the parallel computing and high performance computing workshops and modules that have been conducted at C-DAC and at other academic institutes and R&D organizations in India.

- Five-day workshop on Parallel Computing: Algorithms and Applications (PCAA-99), June 21-25, 1999 (Refer PCAA-1999.pdf)
- Four-/three-day workshops "Parallel Computing on PARAM 10000 at Premier Institutes" (all IITs, IISc Bangalore, BITS Pilani, BIT Mesra (Ranchi), and REC Surathkal in India; April 2000 - September 2001) (Refer BITS-PILANI-2000.pdf, IISC-2000.pdf, IIT-MADRAS.pdf, & IIT-KANPUR.pdf)
- Four-day workshop on Parallel Computing - Optimizing Performance of Parallel Programs (PCOPP-2002) on the PARAM 10000 at C-DAC, Pune, June 03-06, 2002 (Refer PCOPP-2002.pdf)
- Three-day parallel computing workshop on the PARAM 10000 at the IIT Delhi campus, December 2003
- Four-day parallel computing workshop on the PARAM 10000 at IUCAA, Pune University Campus, Pune, October 05-09, 2004
- Three-day parallel computing workshop on the PARAM 10000 cluster at the National Metallurgical Laboratory, Jamshedpur, October 12-14, 2004
- Four-day workshop on Practical Aspects of Parallel Computing on IBM Clusters at the Centre for Modelling, Simulation and Design, University of Hyderabad, October 21-24, 2004 (Refer UofHYD-ParComp-2004.pdf)
- Parallel Computing/High Performance Computing (HPC) module with hands-on sessions on the PARAM 10000 & PARAM Padma for the Advanced Course in Bio Informatics (ACB-2002, ACB-2003, ACB-2004, ACB-2005) (Refer ACB2004-HPC.pdf, ACB2005-HPC.pdf)