Wednesday, September 10, 2008

PCI Express bandwidth measurements

While benchmarking PCI Express transfers with CUDA, I stumbled across some odd behaviour: a 4 MB block seems to achieve the best sustainable bandwidth, at least when writing to the host.
However, transmitting more than 4 MB in 4 MB chunks (let's call it blocked copy) still leaves a gap in performance.
Although the performance recovers at the end, when the transfer almost fills the whole GPU memory, the question is what causes it to drop to 2 GB/s in the first place.
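
The blocked copy can be sketched roughly as follows. This is a host-only Python sketch of the bookkeeping, not the CUDA benchmark itself: `plan_blocks` and `bandwidth_gb_per_s` are hypothetical helper names, the 4 MB block size is taken from the text (assuming 1 MB = 2**20 bytes), and 1 GB is assumed to mean 1e9 bytes. In a real measurement each planned block would be handed to a CUDA copy call and the whole loop would be timed.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB block from the text (assumes 1 MB = 2**20 bytes)


def plan_blocks(total_bytes, block_size=BLOCK_SIZE):
    """Split a transfer into (offset, length) pieces of at most block_size bytes."""
    if total_bytes < 0 or block_size <= 0:
        raise ValueError("total_bytes must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < total_bytes:
        # The last block may be shorter than block_size.
        length = min(block_size, total_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def bandwidth_gb_per_s(total_bytes, seconds):
    """Sustained bandwidth in GB/s (assumes 1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be > 0")
    return total_bytes / seconds / 1e9
```

For a 10 MB transfer, `plan_blocks(10 * 1024 * 1024)` yields two full 4 MB blocks followed by one 2 MB block. Dividing the total size by the time measured around the whole loop gives the sustained figure.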

Another interesting question is the jump in performance at 1e6 bytes, possibly caused by a switch in protocols.
Performance of PCI Express transfers to NVIDIA G80 8800 GTX card

HPC Server 2008 launch

The official launch of HPC Server 2008 takes place on 16 October 2008 at Frankfurt Rhein-Main Airport. More information is available on the official HPC 2008 launch website.

Tuesday, September 2, 2008

Towards Teraflops for Games

With the release of the next generation of GPUs, graphics boards from NVIDIA and AMD (formerly ATI) now deliver performance on the order of one teraflop in single precision. NVIDIA nearly doubled both the number of processors and the memory bus width. The interesting question for research is now how the sustainable performance of programs and algorithms scales on the new platform.
So far I have not been able to test my own algorithms, the stream benchmarks and the lattice Boltzmann method (see my thesis for more details), on the new NVIDIA GPUs.

Double precision has also made its way into GPU hardware, unfortunately with a huge performance loss: it reaches only around a tenth of single precision performance.
In contrast, current CPUs lose only about 50% of their performance, which follows naturally from the doubled computational work.
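
One way to read the "doubled work" argument for CPUs: a double precision value takes twice as many bytes as a single precision one, so a fixed-width SIMD register (or a fixed memory bandwidth) handles half as many values per step. A small Python sketch of that arithmetic follows. The 128-bit register width (as in SSE) and the helper name `values_per_register` are assumptions for illustration, not figures from any measurement here.

```python
import struct

FLOAT_BYTES = struct.calcsize("<f")   # IEEE 754 single precision: 4 bytes
DOUBLE_BYTES = struct.calcsize("<d")  # IEEE 754 double precision: 8 bytes


def values_per_register(register_bits, value_bytes):
    """Number of values of the given size that fit into one SIMD register."""
    return register_bits // (8 * value_bytes)


SIMD_BITS = 128  # assumption: SSE-style 128-bit register

floats_per_step = values_per_register(SIMD_BITS, FLOAT_BYTES)
doubles_per_step = values_per_register(SIMD_BITS, DOUBLE_BYTES)

# Relative double precision throughput under this simple model.
relative_throughput = doubles_per_step / floats_per_step
```

Under these assumptions the register holds four floats or two doubles, so the model predicts half the throughput. That matches the roughly 50% loss seen on CPUs and makes the GPU's drop to about a tenth stand out.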

Here is a little demonstration of the key difference between CPU and GPU: NVISION

Windows HPC Deployment

The Windows High Performance Cluster Competence Center located at the RWTH Aachen is giving tutorials for administrators on Windows HPC 2008 deployment. Please find more detailed information on their webpage.

Monday, September 1, 2008

Windows HPC Event at RWTH Aachen

The Windows High Performance Cluster Competence Center located at the RWTH Aachen is giving tutorials on using Windows HPC 2008, the upcoming version of Windows Compute Cluster Server. Please find more detailed information on their webpage.

PRACE Petascale Summer School

PRACE Summer School website

Taking place this week (25th to 29th of August) in Stockholm, Sweden, the PRACE Summer School tries to evaluate the needs of the current academic HPC user community. The general aim is to obtain benchmarks and metrics for future petascale systems.

Current surveys show that only a small portion of the leading HPC systems is used for large, massively parallel jobs. A great many jobs stay under 10% of a supercomputer's resources and do not utilize the parallel capabilities of such a machine.

On the other hand, profound knowledge is needed to implement common algorithms so that they scale beyond 64 to 128 nodes towards 1K nodes or even more. Therefore a lot of emphasis is put on techniques and hands-on sessions that teach exactly this.

All in all, it was a great event for gaining additional skills and training, as well as for getting to know the different kinds of algorithms and user expectations in HPC.

First Shot

I'm currently with the HPC group @ RRZE, working on my master's thesis about HPC on graphics cards, focusing on benchmark kernels and flow solvers.

So any remarks or hints? Drop them here!


/edit 01.07.08

Thesis finished :-)