Monday, December 8, 2008
Fast Network, Fast Disconnects (Linksys WRT610N)
Looking forward to streaming HD media over my new wireless router (WRT610N), I instead ran into serious trouble getting a stable connection at all.
With my network set up for WPA2 with TKIP for compatibility reasons, I got random disconnects of the whole 5 GHz band, while the 2.4 GHz band performed flawlessly. Searching the internet, I stumbled across serious accusations that the WRT610N is a flawed design and overheats a lot.
Whether that is true I cannot say for sure; however, I expected much more from Linksys and a premium home product.
Searching a little more, I came across another user's report that switching from TKIP to AES encryption solved the disconnect problem.
And voilà, the problem seems to be solved.
So for everyone who can live with AES-only encryption on the 5 GHz 802.11n band and TKIP or AES on the 2.4 GHz 802.11g band, the router is a great catch in both performance and appearance.
Monday, November 24, 2008
Yeehhaa: NVIDIA GT200 rocks
A GTX280 graphics card based on NVIDIA's new GT200 series arrived at our computing center last Friday. The card was installed and set up right away; the first benchmark started on Saturday, November 22nd, and finished today.
Some preliminary figures show the great improvement of this new generation, as expected from the data sheets. Soon I will post some verified results here, along with notes on the changes from the G80 generation to the current GT200 chip.
Friday, November 7, 2008
Running MPI Jobs on Windows CCS
To run only one MPI process per allocated node on the Windows CCS cluster, you have to tweak the environment variable set by the scheduler. The CCP_NODES variable lists every allocated node together with the number of processors reserved on it.
On our quad-core nodes, four MPI processes per node are therefore started by default.
To reduce this to one process per node, call your program as follows from inside the scheduler:
mpiexec.exe -hosts %CCP_NODES: 4= 1%
The expression %CCP_NODES: 4= 1% uses the substring substitution of the Windows command shell to replace every occurrence of " 4" with " 1", which cuts the processor count of each node down from four to one.
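For illustration, here is a minimal sketch of how this could look inside the job's batch script; the hostnames in the comments and the program name my_solver.exe are made-up placeholders, and quad-core nodes are assumed so that every host appears in CCP_NODES with a processor count of 4.
REM Sketch of the mpiexec call inside a CCS job script (assumes quad-core nodes)
REM The scheduler sets CCP_NODES to something like: 2 NODE01 4 NODE02 4
echo Allocation set by the scheduler: %CCP_NODES%
REM cmd.exe substring substitution: every " 4" becomes " 1", i.e. one process per host
echo Reduced allocation: %CCP_NODES: 4= 1%
REM my_solver.exe stands in for your MPI program
mpiexec.exe -hosts %CCP_NODES: 4= 1% my_solver.exe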
Tuesday, October 21, 2008
Distributed Revision System Mercurial
Converting CVS to HG
To get hands-on knowledge of distributed revision control systems like Mercurial, just convert one of your CVS repositories into a test Mercurial repository. As with any repository conversion, it is important that the history stays intact (and hopefully it will)!
A more complete guide can be found here.
Create the repository folder and enter it:
mkdir -p /path/to/hg/repo
cd /path/to/hg/repo
Generate the config file:
tailor -v --source-kind cvs --target-kind hg --repository /path/to/CVS/REP --module YourModuleName -r INITIAL >Config.tailor
For SSH access to the repository, change /path/to/CVS/REP to:
:ext:USERNAME@YOURSERVER:/path/to/cvsrep
Adjust the config file to your needs:
vi Config.tailor
At the very least you will need to change subdir from . to MODULENAME and remove /MODULENAME from root-directory in Config.tailor (if it is actually there).
Add the line:
patch-name-format =
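For orientation, the edited Config.tailor then looks roughly like the sketch below. This is an assumption about the layout rather than a verbatim generated file; the exact section and option names can differ between tailor versions, so keep whatever your generated file actually contains and only adjust the options discussed above (MODULENAME, USERNAME, YOURSERVER and the paths are the placeholders used throughout this post).
[DEFAULT]
verbose = True

[project]
source = cvs:source
target = hg:target
start-revision = INITIAL
root-directory = /path/to/hg/repo
subdir = MODULENAME
patch-name-format =

[cvs:source]
repository = :ext:USERNAME@YOURSERVER:/path/to/cvsrep
module = MODULENAME

[hg:target]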
Generate the Mercurial project:
tailor --configfile Config.tailor
Cloning repositories with SSH
To clone the repository, SSH can easily be used.
Just type: hg clone ssh://yourlogin@yourhost//path/to/hg/repo
or enter ssh://yourlogin@yourhost//path/to/hg/repo as the source path in your client program (the double slash denotes an absolute path on the server).
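As a quick check that the converted history really made it across, a short round trip with the fresh clone might look like this (login, host and repository path are the placeholders from above):
hg clone ssh://yourlogin@yourhost//path/to/hg/repo testclone
cd testclone
hg log -l 5    # show the five most recent changesets converted from CVS
hg pull -u     # fetch and apply new changesets from the server
hg push        # publish local changesets back to the server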
Thursday, October 16, 2008
Co-array Fortran and UPC
CAF and UPC are Fortran and C extensions for the Partitioned Global Address Space (PGAS) model.
Independent of the underlying hardware, each processor can access (read and write) data of other processors without the need for additional communication libraries such as MPI.
HLRS provided an introductory course about this.
At the current development stage I do not clearly see the benefit for production codes. However, some ideas might be implemented more quickly with these paradigms than with ordinary MPI for testing purposes.
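To illustrate the PGAS idea, here is a minimal UPC sketch (my own toy example, not taken from the course material): every thread writes its own element of a shared array, and after a barrier thread 0 reads all elements directly, without a single message-passing call.
#include <upc.h>
#include <stdio.h>

/* one element per thread, physically distributed across the threads */
shared int data[THREADS];

int main(void)
{
    data[MYTHREAD] = MYTHREAD;   /* each thread writes its own element */
    upc_barrier;                 /* make all writes globally visible */
    if (MYTHREAD == 0) {
        /* thread 0 reads the remote elements directly */
        for (int i = 0; i < THREADS; ++i)
            printf("data[%d] = %d\n", i, data[i]);
    }
    return 0;
}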
Monday, October 13, 2008
Theses
- Johannes Habich: Performance Evaluation of Numeric Compute Kernels on NVIDIA GPUs, Master's Thesis, RRZE-Erlangen, LSS-Erlangen, 2008.
- Johannes Habich: Improving computational efficiency of Lattice Boltzmann methods on complex geometries, Bachelor's Thesis, RRZE-Erlangen, LSS-Erlangen, 2006.
Other publications (not fully reviewed)
- G. Hager, J. Treibig, J. Habich, and G. Wellein: Exploring performance and power properties of modern multicore chips via simple machine models. Submitted. Preprint: arXiv:1208.2908
- J. Habich, C. Feichtinger, G. Wellein: GPGPU implementation of the LBM: Architectural Requirements and Performance Result, Parallel CFD Conference 2011, BSC, Barcelona, Spain, May 2011.
- G. Wellein, J. Habich, G. Hager, T. Zeiser: Node-level performance of the lattice Boltzmann method on recent multicore CPUs, Parallel CFD Conference 2011, BSC, Barcelona, Spain, May 2011.
- C. Feichtinger, J. Habich, H. Köstler, U. Rüde, G. Wellein: WaLBerla: Heterogeneous Simulation of Particulate Flows on GPU Clusters, Parallel CFD Conference 2011, BSC, Barcelona, Spain, May 2011.
- J. Habich, C. Feichtinger, G. Hager, G. Wellein: Poster: Parallelizing Lattice Boltzmann Simulations on Heterogeneous GPU&CPU Clusters. 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing '10, New Orleans, 13.11. -- 19.11.2010), 2010.
- J. Habich, T. Zeiser, G. Hager, G. Wellein: Enabling temporal blocking for a lattice Boltzmann flow solver through multicore aware wavefront parallelization. Parallel CFD Conference 2009, NASA AMES, Moffett Field (CA, USA), May 2009.
- S. Donath, T. Zeiser, G. Hager, J. Habich, G. Wellein: Optimizing performance of the lattice Boltzmann method for complex geometries on cache-based architectures. In: F. Hülsemann, M. Kowarschik, U. Rüde (editors), Frontiers in Simulation -- Simulationstechnique, 18th Symposium in Erlangen, September 2005 (ASIM), SCS Publishing, Fortschritte in der Simulationstechnik, ISBN 3-936150-41-9, (2005) 728-735.
Given or co-authored talks and presentations (see also section on lectures below)
- J. Habich, C. Feichtinger, G. Wellein, waLBerla: MPI parallele Implementierung eines LBM Lösers auf dem Tsubame 2.0 GPU Cluster, Seminar Talk, Leibniz Rechenzentrum, München, Germany, Feb. 29th 2012.
- J. Habich, C. Feichtinger, G. Wellein, Hochskalierbarer Lattice Boltzmann Löser für GPGPU Cluster, High Performance Computing Workshop, Leogang, Austria, Feb. 27th 2012.
- G. Wellein, J. Habich, G. Hager, T. Zeiser, Node-level performance of the lattice Boltzmann method on recent multicore CPUs I, Parallel CFD Conference 2011, Barcelona, Spain, May 2011.
- G. Wellein, J. Habich, G. Hager, T. Zeiser, Node-level performance of the lattice Boltzmann method on recent multicore CPUs II, Parallel CFD Conference 2011, Barcelona, Spain, May 2011.
- J. Habich, C. Feichtinger, G. Wellein, GPGPU implementation of the LBM: Architectural Requirements and Performance Result, Parallel CFD Conference 2011, Barcelona, Spain, May 2011.
- C. Feichtinger, J. Habich, H. Köstler, U. Rüde, G. Wellein, WaLBerla: Heterogeneous Simulation of Particulate Flows on GPU Clusters, Parallel CFD Conference 2011, Barcelona, Spain, May 2011.
- J. Habich, Ch. Feichtinger and G. Wellein, GPU optimizations at RRZE, invited Talk, ZISC GPU Workshop, Erlangen, Germany, April, 2011.
- G. Wellein, G. Hager and J. Habich, The Lattice Boltzmann Method: Basic Performance Characteristics and Performance Modeling, invited Minisymposia talk, SIAM CSE 2011, Reno, Nevada, USA, March, 2011.
- J. Habich and Ch. Feichtinger, Performance Optimizations for Heterogeneous and Hybrid 3D Lattice Boltzmann Simulations on Highly Parallel On-Chip Architectures, invited Minisymposia talk, SIAM CSE 2011, Reno, Nevada, USA, March, 2011.
- J. Habich, Ch. Feichtinger, T. Zeiser, G. Wellein, Optimizations on Highly Parallel On-Chip Architectures: GPUs vs. Multi-Core CPUs (for stencil codes), iRMB TU-Braunschweig, invited Seminar talk, Braunschweig, Germany, July 2010.
- J. Habich, Ch. Feichtinger, T. Zeiser, G. Hager, G. Wellein, Performance Modeling and Optimization for 3D Lattice Boltzmann Simulations on Highly Parallel On-Chip Architectures: GPUs Vs. Multi-Core CPUs, ECCOMAS CFD Lisboa, Lisbon, Portugal, June 2010.
- J. Habich, T. Zeiser, G. Hager, G. Wellein, Performance Modeling and Multicore-aware Optimization for 3D Parallel Lattice Boltzmann Simulations, Facing the Multicore-Challenge, Heidelberger Akademie der Wissenschaften, Heidelberg, Germany, March 2010.
- J. Habich, T. Zeiser, G. Hager, G. Wellein: Performance Evaluation of Numerical Compute Kernels on GPUs, First International Workshop on Computational Engineering - Special Topic Fluid-Structure Interaction, Herrsching am Ammersee, Germany, October, 2009.
- J. Habich, T. Zeiser, G. Hager, G. Wellein: Towards multicore-aware wavefront parallelization of a lattice Boltzmann flow solver, 5th Erlangen High-End-Computing Symposium, Erlangen, Germany, June 2009.
- J. Habich, T. Zeiser, G. Hager, G. Wellein: Enabling temporal blocking for a lattice Boltzmann flow solver through multicore-aware wavefront parallelization, submitted to Parallel CFD Conference, Moffett Field, California, USA, May 18-22, 2009.
- J. Habich, T. Zeiser, G. Hager, G. Wellein: Speeding up a Lattice Boltzmann Kernel on nVIDIA GPUs, First International Conference on Parallel, Distributed and Grid Computing for Engineering (PARENG09-S01), Pecs, Hungary, April 2009.
- J. Habich, G. Hager: Erfahrungsbericht Windows HPC in Erlangen, WindowsHPC User Group 2nd Meeting, Dresden, Germany, March 2009.
- J. Habich, G. Hager: Windows CCS im Produktionsbetrieb und erste Erfahrungen mit HPC Server 2008, WindowsHPC User Group 1st Meeting, Aachen, Germany, April 2008.
- T. Zeiser, J. Habich, G. Hager, G. Wellein: Vector computers in a world of commodity clusters, massively parallel systems and many-core many-threaded CPUs: recent experience based on advanced lattice Boltzmann flow solvers, HLRS Results and Review Workshop, Stuttgart, Germany, September 2008.
- S. Donath, T. Zeiser, G. Hager, J. Habich, G. Wellein: On cache-optimized implementations of the lattice Boltzmann method on complex geometries, ASIM, Erlangen, Germany, September 2005.
Conference, workshop and tutorial participation without own presentation
- WindowsHPC User Group 3rd Meeting, St. Augustin, March 2010.
- WindowsHPC User Group 2nd Meeting, Dresden, March 2009.
- Introduction to Unified Parallel C (UPC) and Co-array Fortran (CAF), HLRS, October 2008.
- Course on Microfluidics, University of Erlangen-Nuremberg, Computer Science 10 (System Simulation), October 2008.
- IBM Power6 Programming Workshop at RZG, September 2008.
- PRACE Petascale Summer School (P2S2), Stockholm, Sweden, August, 2008.