ICPP2013 - Program

ICPP 2013 Program

Tuesday, October 1
08:00-17:00	ENS Lyon Main Entrance: Registration
	WORKSHOPS
Location	Theater B	Thesis Room	Theater E	Exam Room
08:00-13:00	6th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2)	9th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS) (ends at 12:30)	2nd International Workshop on Power-Aware Algorithms, Systems, and Architectures (PASA)	International Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA) (ends at 13:10)
13:00-14:00	Lunch at the Cafeteria (tickets provided)
14:00-18:00	6th International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2)	International Workshop on Applications of Wireless Ad-hoc and Sensor Networks (AWASN)	2nd International Workshop on Power-Aware Algorithms, Systems, and Architectures (PASA)

Wednesday, October 2

08:00-17:00

Mérieux Theater: Registration

09:15-09:45

Mérieux Theater: ICPP 2013 Opening Session

Laurent Lefèvre, Yves Robert and Gilles Villard

09:45-10:45

Mérieux Theater: Keynote 1: "The Coming Era of Adaptive Control Systems in HPC", Laxmikant (Sanjay) Kale, Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, USA.

Keynote chair: Jack Dongarra

Abstract

Complex systems, both natural and engineered, function well when they employ feedback control strategies. Complexity of upcoming parallel machines, combined with sophistication of next generation HPC applications, is creating a landscape where such feedback control systems are essential to attain the scientific and engineering breakthroughs we expect. The runtime systems of today need to evolve into smart, introspective and adaptive control systems. We will examine this idea in the context of two related systems: a single parallel job, and an entire parallel machine. These can be informally analyzed using vocabulary from control systems and discrete optimization, leading to a conclusion that it is necessary to create a rich set of control variables to facilitate effective control. Programming models will play a large role in enabling and empowering such adaptive control. To this end, this talk will also present an abstract programming paradigm called XMAPP, which captures essential properties a programming language must possess to create a specific, rich set of adaptive control variables.

Professor Laxmikant Kale is the director of the Parallel Programming Laboratory and a Professor of Computer Science at the University of Illinois at Urbana-Champaign. Prof. Kale has been working on various aspects of parallel computing, with a focus on enhancing performance and productivity via adaptive runtime systems, and with the belief that only interdisciplinary research involving multiple CSE and other applications can bring back well-honed abstractions into Computer Science that will have a long-term impact on the state of the art. His collaborations include the widely used Gordon Bell award-winning (SC'2002) biomolecular simulation program NAMD, and other collaborations on computational cosmology, quantum chemistry, rocket simulation, space-time meshes, and agent-based simulations. His group develops and supports software embodying their research ideas, including Charm++, Adaptive MPI and the BigSim framework.
L. V. Kale received the B.Tech degree in Electronics Engineering from Benares Hindu University, Varanasi, India, in 1977, and an M.E. degree in Computer Science from the Indian Institute of Science in Bangalore, India, in 1979. He received a Ph.D. in computer science from the State University of New York, Stony Brook, in 1985.
He worked as a scientist at the Tata Institute of Fundamental Research from 1979 to 1981. He joined the faculty of the University of Illinois at Urbana-Champaign in 1985. Prof. Kale is a fellow of the IEEE, and a winner of the 2012 IEEE Sidney Fernbach award. He and his team won the HPC Challenge award at Supercomputing 2011, and were finalists in 2012, for their entry based on Charm++.

10:45-11:15

Break

	Conference			Workshops
Location	Mérieux Theater	Thesis Room	Mérieux Room 1	Mérieux Room 2
11:15-12:45	Session 1: Algorithms 1 (Session chair: Haiying, Chen)	Session 2: Applications 1 (Session chair: Xiangyu Lin)	Session 3: Architectures 1 (Session chair: Alberto Ros Bardisa)	4th International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI)
12:45-14:00	Lunch at the Mérieux Theater's Atrium
14:00-15:30	Session 4: Algorithms 2 (Session chair: Henning Meyerhenke)	Session 5: Networking 1 (Session chair: Loris Marchal)	Session 6: Performance Models 1 (Session chair: Zhiyi Huang)	4th International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI)
15:30-16:00	Break
16:00-18:00	Session 7: Algorithms 3 (Session chair: Reza Hozeiny)	Session 8: Applications 2 (Session chair: Bora Uçar)	Session 9: Software 1 (Session chair: Pavan Balaji)	4th International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI)

18:00-20:30

Cocktail at Mérieux Theater

Thursday, October 3

08:00-17:00

Mérieux Theater: Registration

09:30-10:30

Mérieux Theater: Keynote 2: "Distributed Routing Algorithms in Networks: A Game Theory Approach", Bruno Gaujal, large-scale computing project (Mescal), INRIA Grenoble Rhône-Alpes, France.

Keynote chair: Jean-Yves L'Excellent

Abstract

In this talk, I will show how game theory can be used to design self-optimizing algorithms for minimizing end-to-end delays for multi-class flows in a general communication network. These algorithms are based on a new class of game theory dynamics made of a pay-off replicator-like term modulated by an entropy barrier. The interplay between these continuous dynamics and their discrete counterparts is the main ingredient to design algorithms that only require partial information to converge to optimal configurations. This convergence is resilient to stochastic perturbations and observation errors, and does not require any synchronization between the flows.
This is joint work with Pierre Coucheney and Panayotis Mertikopoulos.

Bruno Gaujal is a Research Director at INRIA Grenoble Rhône-Alpes, where he is the head of the large-scale computing project, Mescal.
He has held several positions at AT&T Bell Labs, INRIA Sophia-Antipolis, Loria and Ecole Normale Supérieure in Lyon.
His main interests are in performance evaluation, optimization and control of large discrete event dynamic systems with applications to telecommunications networks and large computing infrastructures.

10:30-11:00

Break

	Conference			Workshops
Location	Mérieux Theater	Thesis Room	Mérieux Room 1	Mérieux Room 2
11:00-12:30	Session 10: Algorithms 4 (Session chair: Anne Benoit)	Session 11: Applications 3 (Session chair: Kamer Kaya)	Session 12: Performance Models 2 (Session chair: Suzanne Rivoire)	International Workshop on Embedded Multi-core Systems (EMS)
12:30-14:00	Lunch at the Mérieux Theater's Atrium
14:00-15:30	Session 13: Algorithms 5 (Session chair: Qing Yi)	Session 14: Applications 4 (Session chair: Jean-François Méhaut)	Session 15: Networking 2 (Session chair: Xiaoyi Lu)	International Workshop on Embedded Multi-core Systems (EMS)

15:30-16:00

Break

16:00-18:30

Mérieux Theater

Short Papers Session

Session chair: Jack Dongarra

18:30-23:00

Conference banquet: Abbaye de Collonges (one of the Paul Bocuse restaurants! It will be a great evening! Don't miss it!)
(We will walk 8 minutes from the ICPP conference venue to take the boat to the restaurant, Abbaye de Collonges.
The meeting point is at 18:30 outside the Mérieux Theater.
The banquet starts at 20:00.)

Friday, October 4

08:00-17:00

Mérieux Theater: Registration

09:15-10:15

Mérieux Theater: Keynote 3: "Many-core GPUs: Achievements and future perspectives", Manuel Ujaldon, Computer Architecture Department, University of Malaga, Spain.

Keynote chair: Eddy Caron

Abstract

After a decade of use as hardware accelerators, GPUs now constitute a solid alternative for high performance computing at an affordable cost. Increasing volumes of data managed by large-scale applications make GPUs very attractive for scientific computing, deploying SIMD parallelism in an unprecedented way.
This talk will review current achievements of many-core GPUs, recent and future hardware enhancements, and emerging challenges to leverage GPUs as accelerators within general-purpose exascale computing.

Manuel Ujaldon is a Professor at the Computer Architecture Department, University of Malaga (Spain) and Conjoint Senior Lecturer at the School of Electrical Engineering and Computer Science of the University of Newcastle (Australia).
He started working on parallelizing compilers, finishing his PhD Thesis in 1996 by developing a data-parallel compiler for sparse matrix and irregular applications. During this time, he was part of the HPF and MPI Forums, working as a postdoc in the Computer Science Department of the University of Maryland, College Park. In 2003, he joined the GPGPU movement using Cg while working as an invited researcher in the Biomedical Informatics Department at Ohio State University, where he held an adjunct appointment until 2008.

In 2005, he wrote the first book in Spanish about programming GPUs for general purpose computing. He adopted CUDA when it was first released, focusing on irregular and biomedical applications. Over the past five years, he has published more than 50 papers in journals and international conferences in these two areas. He has also given more than 50 tutorials, seminars and invited talks about GPU computing worldwide, including ACM and IEEE conferences and academic programs in European, North American and Australian Universities.

Prof. Ujaldon has received the NVIDIA Academic Partnership (2008-2011), NVIDIA Teaching Center (2011-2013) and NVIDIA Research Center (2012-2013) awards, and was named a CUDA Fellow in 2012. His research interests include many-core architectures and CUDA programming for accelerating bio-inspired, data mining and image processing applications on GPUs.

10:15-10:45

Break

	Conference			Workshops
Location	Mérieux Theater	Thesis Room	Mérieux Room 1	Mérieux Room 2
10:45-12:45 (not WATCC)	Session 16: Architectures 2 (Session chair: Frédéric Vivien)	Session 17: Networking 3 (Session chair: Nikos Tziritas)	Session 18: Software 2 (Session chair: Satoshi Matsuoka)	10:30-13:00: International Workshop on Advanced Technologies of Cloud Computing (WATCC)

12:45-14:00

Lunch at Mérieux Theater's Atrium

ICPP2013 Conference Program

Session 1: Algorithms 1
Wednesday Oct. 2 - 11:15-12:45
Mérieux Theater
Session chair: Haiying, Chen

A. Kasagi, K. Nakano and Y. Ito. An Optimal Offline Permutation Algorithm on the Hierarchical Memory Machine, with the GPU implementation
M. Maggioni and T. Berger-Wolf. AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs
M. Deveci, K. Kaya, B. Uçar and U. Catalyurek. A Push-Relabel-based Maximum Cardinality Bipartite Matching Algorithm on GPUs

Session 2: Applications 1
Wednesday Oct. 2 - 11:15-12:45
Thesis Room
Session chair: Xiangyu Lin

D. Ozog, J. Hammond, J. Dinan, P. Balaji, S. Shende and A. Malony. Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions
M. Hofmann and G. Rünger. Efficient Data Redistribution Methods for Coupled Parallel Particle Codes
P. Malakar, V. Natarajan, S. Vadhiyar and R. Nanjundia. A Diffusion-Based Processor Reallocation Strategy for Tracking Multiple Dynamically Varying Weather Phenomena

Session 3: Architectures 1
Wednesday Oct. 2 - 11:15-12:45
Mérieux Room 1
Session chair: Alberto Ros Bardisa

A. Holey, V. Mekkat and A. Zhai. HAccRG: Hardware-Accelerated Data Race Detection in GPUs
J. F. Dollinger and V. Loechner. Adaptive Runtime Selection for GPU+CPU
S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy and D. Panda. Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs

Session 4: Algorithms 2
Wednesday Oct. 2 - 14:00-15:30
Mérieux Theater
Session chair: Henning Meyerhenke

X. Lin and C. Wu. On Scientific Workflow Scheduling in Clouds under Budget Constraint
J. Paudel, O. Tardieu and J. Amaral. On the merits of distributed work-stealing on selective locality-aware tasks
S. Asghar, E. Aubanel and D. Bremner. A Dynamic Moldable Job Scheduling Based Parallel SAT Solver

Session 5: Networking 1
Wednesday Oct. 2 - 14:00-15:30
Thesis Room
Session chair: Loris Marchal

K. Wang, X. Mao and Y. Liu. Blind Date: A Neighbor Discovery Protocol
H. Shen, A. Liu and L. Zhao. Freeweb: P2P-Assisted Collaborative Censorship-Resistant Web Browsing
C. Y. Ho, M. C. Chung, L. H. Yen and C. C. Tseng. Churn: a Key Effect on Real-World P2P Software

Session 6: Performance Models 1
Wednesday Oct. 2 - 14:00-15:30
Mérieux Room 1
Session chair: Zhiyi Huang

M. Iqbal, J. Holt, J. H. Ryoo, G. de Veciana and L. John. Flow Migration on Multicore Network Processors: Load Balancing While Minimizing Packet Reordering
C. Truchet, F. Richoux and P. Codognet. Prediction of Parallel Speed-ups for Las Vegas Algorithms
D. DeFord and A. Kalyanaraman. Empirical Analysis of Space-Filling Curves for Scientific Computing Applications

Session 7: Algorithms 3
Wednesday Oct. 2 - 16:00-18:00
Mérieux Theater
Session chair: Reza Hozeiny

C. Staudt and H. Meyerhenke. Engineering High-Performance Community Detection Heuristics for Massive Graphs
K. Chen and H. Shen. Cont2: Social-Aware Content and Contact Based File Search in Delay Tolerant Networks
M. Deveci, K. Kaya and U. Catalyurek. Hypergraph Sparsification and Its Application to Partitioning

Session 8: Applications 2
Wednesday Oct. 2 - 16:00-18:00
Thesis Room
Session chair: Bora Uçar

G. Slota and K. Madduri. Fast Approximate Subgraph Counting and Enumeration
R. Sin'ya, K. Matsuzaki and M. Sassa. Simultaneous Finite Automata: An Efficient Data-Parallel Model for Regular Expression Matching
T. Müller, J. Weidendorfer and A. Blaszczyk. Expression Tree Evaluation by Dynamic Code Generation - Are Accelerators up for the Task?
K. Sarnowska-Upton and A. Grimshaw. Predicting Execution Readiness of MPI Binaries with FEAM, a Framework for Efficient Application Migration

Session 9: Software 1
Wednesday Oct. 2 - 16:00-18:00
Mérieux Room 1
Session chair: Pavan Balaji

E. Francesquini, A. Goldman and J. F. Méhaut. A NUMA-Aware Runtime Environment for the Actor Model
T. Komoda, H. Nakamura, S. Miwa and N. Maruyama. Integrating Multi-GPU Execution in an OpenACC Compiler
B. Medeiros and J. Sobral. AOmpLib: An Aspect Library for Large-Scale Multi-Core Parallel Programming
J. Dokulil, E. Bajrovic, S. Benkner, M. Sandrieser and B. Bachmayer. HyPHI - task based hybrid execution C++ library for the Intel Xeon Phi coprocessor

Session 10: Algorithms 4
Thursday Oct. 3 - 11:00-12:30
Mérieux Theater
Session chair: Anne Benoit

J. Lejeune, L. Arantes, J. Sopena and P. Sens. A prioritized distributed mutual exclusion algorithm balancing priority inversions and response time
A. Luo, W. Wu, J. Cao and M. Raynal. A Generalized Mutual Exclusion Problem and Its Algorithm
R. Hu, J. Sopena, L. Arantes, P. Sens and I. Demeure. Efficient Dissemination Algorithm for Scale-Free Topologies

Session 11: Applications 3
Thursday Oct. 3 - 11:00-12:30
Thesis Room
Session chair: Kamer Kaya

H. Anzt, J. Aliaga, J. Perez and E. Quintana-Ortí. Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs
Z. Ul-Abdin, A. Ahlander and B. Svensson. Energy Efficient Synthetic-Aperture Radar Processing on a Manycore Architecture
M. Delorme, T. Abdelrahman and C. Zhao. Parallel Radix Sort on the AMD Fusion Accelerated Processing Unit

Session 12: Performance Models 2
Thursday Oct. 3 - 11:00-12:30
Mérieux Room 1
Session chair: Suzanne Rivoire

C. H. Chang, P. Liu and J. J. Wu. Sampling-Based Phase Classification and Prediction for Multi-threaded Program Execution on Multi-core Architectures
H. Yang, Qi Zhao, Z. Luan, D. Qian, J. Mars and L. Tang. iMeter: An Integrated VM Power Model Based on Performance Profiling
T. Li, Y. Ren, D. Yu, S. Jin and T. G. Robertazzi. Characterization of Input/Output Bandwidth Performance Models in NUMA Architecture for Data Intensive Applications

Session 13: Algorithms 5
Thursday Oct. 3 - 14:00-15:30
Mérieux Theater
Session chair: Qing Yi

A. Rosenberg. Finite-State Robots in a Warehouse: Achieving Linear Parallel Speedup while Rearranging Objects
B. Zhou and J. Wen. Hysteresis Re-chunking Based Metadata Harnessing Deduplication of Disk Images
M. Kardas, M. Klonowski and D. Pajak. Energy-Efficient Leader Election Protocols for Single-Hop Radio Networks

Session 14: Applications 4
Thursday Oct. 3 - 14:00-15:30
Thesis Room
Session chair: Jean-François Méhaut

Y. Zhu and J. Masui. Backing Up Your Data to the Cloud: Want to Pay Less?
M. Hoseiny Farahabady, H. R. Dehghani Samani, L. Leslie, Y. C. Lee and A. Zomaya. Handling Uncertainty: Pareto-Efficient BoT Scheduling on Hybrid Clouds
C. Avenel, P. Fortin and D. Béréziat. Massively parallel birth and death process for cell nuclei extraction in histopathology images

Session 15: Networking 2
Thursday Oct. 3 - 14:00-15:30
Mérieux Room 1
Session chair: Xiaoyi Lu

X. Ren, W. Liang and W. Xu. Use of a Mobile Sink for Maximizing Data Collection in Energy Harvesting Sensor Networks
N. Tziritas, C. Z. Xu, T. Loukopoulos, S. Khan and Z. Yu. Application-aware Workload Consolidation to Minimize both Energy Consumption and Network Load in Cloud Environments
Si Zheng, Y. Liu, Li Shanshan, T. He and X. Liao. Risk Intelligence: Profiting from Uncertainty in Data Processing System

Session 16: Architectures 2
Friday Oct. 4 - 10:45-12:45
Mérieux Theater
Session chair: Frédéric Vivien

C. Wu and X. He. A Flexible Framework to Enhance RAID-6 Scalability via Exploiting the Similarities among MDS Codes
X. Luo and J. Shun. Load-Balanced Recovery Schemes for Single-disk Failure in Storage Systems with Any Erasure Code
A. Ros, B. Cuesta, M. E. Gómez, A. Robles and J. Duato. Temporal-Aware Mechanism to Detect Private Data in Chip Multiprocessors
V. Nguyen, N. Le, I. Fujiwara and M. Koibuchi. Distributed Shortcut Networks: Layout-aware Low-degree Topologies Exploiting Small-world Effect

Session 17: Networking 3
Friday Oct. 4 - 10:45-12:45
Thesis Room
Session chair: Nikos Tziritas

M. García, E. Vallejo, R. Beivide, M. Odriozola and M. Valero. Efficient Routing Mechanisms for Dragonfly Networks
T. Schneider, T. Hoefler, R. Grant, B. Barrett and R. Brightwell. Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters
Z. Yang, W. Wu, Y. Chen and J. Zhang. Efficient Information Dissemination in Dynamic Networks
K. C. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky and D. Panda. A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-Blocking Alltoallv Collective on Multi-core Systems

Session 18: Software 2
Friday Oct. 4 - 10:45-12:45
Mérieux Room 1
Session chair: Satoshi Matsuoka

F. Xu, Li Shen, Z. Wang, H. Guo, Bo Su and W. Chen. HEUSPEC: A Software Speculation Parallel Model
M. Haque, Q. Yi, J. Dinan and P. Balaji. Enhancing Performance Portability of MPI applications through Annotation-Based Transformations
X. Lu, N. Islam, M. Wasi-ur-Rahman, J. Jose, H. Subramoni, H. Wang and D. Panda. High-Performance Design of Hadoop RPC with RDMA over InfiniBand
Z. Cao and C. Verbrugge. Mixed Model Universal Software Thread-Level Speculation

Short Papers Session
Thursday Oct. 3 - 16:00-18:30
Mérieux Theater
Session chair: Jack Dongarra

S. Di, D. Kondo and F. Cappello. Characterizing and Modeling Cloud Applications/Jobs on a Google Data Center
F. Campeotto, A. Dovier and E. Pontelli. Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework
M. Tsuji, M. Sato, M. Hugues and S. Petiton. Multiple-SPMD Programming Environment based on PGAS and Workflow toward Post-Petascale Computing
K. Arumugam, A. Godunov, D. Ranjan, B. Terzic and M. Zubair. An Efficient Deterministic Parallel Algorithm for Adaptive Multidimensional Numerical Integration on GPUs
A. Tino, G. Khan and F. Yuan. Towards Hardware Realizations of Intelligent Systems - a Cortical Column Approach
X. Lu. WormPlanar: Topological Planarization based Wormhole Detection in Wireless Networks
G. Han, C. Zhang, K. T. Lam and C. L. Wang. Japonica: Java with Auto-Parallelization on Graphics Coprocessing Architecture
S. Diersen, L. Wang and H. Ma. Symbolic Analysis of Concurrency Errors in OpenMP Programs
M. Manivannan, A. Negi and P. Stenstrom. Efficient Forwarding of Producer-Consumer Data in Task-based Programs
M. Korch, T. Ramming and G. Rein. Parallelization of Particle-in-Cell Codes for Nonlinear Kinetic Models from Mathematical Physics
R. Machado, V. Pedro and S. Abreu. On the Scalability of Constraint Programming on Hierarchical Multiprocessor Systems
A. S. M. H. Mahmud and S. Ren. Dynamic Server Provisioning for Carbon-Neutral Data Centers
