
IEICE Transactions on Communications 2008 E91-B(4):1015-1024; doi:10.1093/ietcom/e91-b.4.1015

Copyright © 2008 The Institute of Electronics, Information and Communication Engineers

Regular Section -- Papers -- Network

Performance Models for MPI Collective Communications with Network Contention

Hyacinthe NZIGOU MAMADOU (1,2), Takeshi NANRI (3) and Kazuaki MURAKAMI (1,2)

(1) The authors are with the Department of Informatics, Kyushu University, Fukuoka-shi, 814-0001 Japan. E-mail: hnzigoum{at}c.csce.kyushu-u.ac.jp
(2) The authors are with the Institute of Systems & Information Technologies/KYUSHU, Fukuoka-shi, 814-0043 Japan.
(3) The author is with the Computing and Communications Center, Kyushu University, Fukuoka-shi, 814-0001 Japan.

This paper presents a novel approach to estimating the performance of MPI collective communications. Our objective is to help researchers make appropriate decisions about their message-passing applications. For each collective communication, we attempt to apply the standard point-to-point models LogGP and P-LogP. The resulting models are compared with empirical data to identify the one most suitable for characterizing the performance of collective operations. For communications on large clusters with large messages, network contention can significantly affect performance. Hence, to reduce the relative gap between the predicted and the measured runtime, contention is also modeled, using a queuing theory analysis, and taken into account in the total performance estimate. Experiments performed on a cluster of 64 processors interconnected by a Gigabit Ethernet network show encouraging results. For any collective operation, given a number of processors and a range of message sizes, at least one model predicts the performance precisely. We achieved a gap of around 15% between the predicted and the measured runtime. By handling the contention problem, we reduced the relative gap by around 80%.
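
To make the idea of building a collective-operation estimate from a point-to-point model concrete, the following is a minimal sketch, not the models derived in the paper. It assumes the standard LogGP cost of one k-byte message, L + 2o + (k - 1)G, and assumes a broadcast implemented as a binomial tree with ceil(log2 P) sequential rounds. The function names and all parameter values are hypothetical, and network contention is deliberately left out.

```python
def loggp_point_to_point(k, L, o, G):
    """LogGP time for one k-byte message: send overhead o, (k - 1) * G for
    the remaining bytes, latency L, then receive overhead o."""
    return L + 2 * o + (k - 1) * G


def binomial_broadcast_estimate(P, k, L, o, G):
    """Contention-free estimate for a broadcast to P processes.

    Assumption: a binomial tree, so the critical path is ceil(log2 P)
    sequential point-to-point transfers of the full k-byte message.
    """
    rounds = (P - 1).bit_length()  # equals ceil(log2 P) for P >= 1
    return rounds * loggp_point_to_point(k, L, o, G)


def relative_gap(predicted, measured):
    """Relative gap between a predicted and a measured runtime."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical parameter values in seconds; real values must be measured
    # on the target cluster.
    params = dict(L=50e-6, o=5e-6, G=8e-9)
    predicted = binomial_broadcast_estimate(P=64, k=1_048_576, **params)
    measured = 0.060  # hypothetical measured runtime in seconds
    print(f"predicted: {predicted:.6f} s")
    print(f"relative gap: {relative_gap(predicted, measured):.1%}")
```

A contention-aware estimate in the spirit of the abstract would add a waiting-time term obtained from a queuing analysis to each round. The form of that term is not given here, so it is omitted rather than guessed.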

Key Words: MPI, collective communications, performance prediction, queuing theory, contention issue


Manuscript received April 27, 2007. Manuscript revised August 8, 2007.


