Recent developments in so-called disruptive technologies always lead to some kind of never-ending discussion.
Today I want to say something about the debate over whether GPUs are feasible at all for scientific computing, given that their double precision performance is nowadays not too far from that of standard CPUs. Single precision is not worth the discussion, as nobody wants to board a plane or a ship that was simulated only in single precision.
Detour
So for non-simulators, first some explanation: single precision means a floating point representation of a number using 4 bytes (32 bits). Double precision uses 8 bytes (64 bits) and therefore provides much more accuracy (roughly 16 instead of 7 significant decimal digits).
GPUs were originally designed for graphics applications that do not actually need full single precision. There is a bunch of very fast FLOP instructions that work on just 24 bits instead of 32 bits (again, 32 bits = 4 bytes = single precision).
E.g. current NVIDIA cards have just 1 dp FLOP unit per 8 sp FLOP units.
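To make the difference concrete, here is a minimal Python sketch that rounds a value to single precision and compares it with the double precision original. The helper name `to_single` and the test value 0.1 are my own choices for illustration only.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1                 # stored as double precision by Python
single = to_single(value)   # same number after a round trip through 4 bytes

# Storage size in bytes for one single and one double precision value
print(struct.calcsize("<f"), struct.calcsize("<d"))
print(f"double: {value:.17g}")
print(f"single: {single:.17g}")
print(f"rounding error in single precision: {abs(single - value):.3e}")
```

The single precision copy already deviates after about seven significant digits, which is the accuracy gap the rest of this post takes for granted.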
Up to now it's obvious why everyone complains about the poor dp performance compared to sp performance. However, nobody (well, I do) complains about the low dp performance I actually get out of a current x86 processor. There are system configurations where you get only about 10% of the peak performance, or even less.
The reason is that data is delivered to the compute units much more slowly than they can process it.
This is true for most scientific codes, e.g. stencil codes. Therefore you will see the same drop to 50% of performance when switching from sp to dp on GPUs that you see on CPUs, because you simply transfer twice the data over the same system bus.
So the dp units are most often not what limits compute performance.
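The argument above can be sketched as a simple bandwidth bound: the attainable rate is the smaller of the peak compute rate and the rate at which memory can feed the units. All numbers below (peak rates, bandwidth, one FLOP per loaded value) are hypothetical and chosen only to illustrate the effect; they do not describe any particular card or CPU.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_value, bytes_per_value):
    """Roofline-style bound: the slower of compute and memory traffic wins."""
    memory_bound = bandwidth_gb_s * flops_per_value / bytes_per_value
    return min(peak_gflops, memory_bound)


# Hypothetical device (assumed numbers, for illustration only)
PEAK_SP = 1000.0        # GFLOP/s in single precision
PEAK_DP = 125.0         # GFLOP/s in double precision (1 dp unit per 8 sp units)
BANDWIDTH = 100.0       # GB/s between memory and the compute units
FLOPS_PER_VALUE = 1.0   # stencil-like code: about one FLOP per value loaded

sp = attainable_gflops(PEAK_SP, BANDWIDTH, FLOPS_PER_VALUE, 4)  # 4-byte values
dp = attainable_gflops(PEAK_DP, BANDWIDTH, FLOPS_PER_VALUE, 8)  # 8-byte values

print(f"sp: {sp} GFLOP/s, dp: {dp} GFLOP/s, dp/sp = {dp / sp}")
print(f"dp reaches {100 * dp / PEAK_DP:.0f}% of the dp peak")
```

With these assumed numbers both precisions are limited by memory traffic, so dp lands at half the sp rate even though the dp peak is eight times lower, and the dp units themselves sit idle most of the time.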
Friday, May 28, 2010
Thursday, May 20, 2010
LaTeX: floatflt.sty missing on Ubuntu Lucid 10.04
After the recent upgrade to the new Ubuntu stable version, I thought at first that not all TeX Live resources had been installed.
However, the license of floatflt.sty has changed, so it is no longer included in Ubuntu or TeX Live.
Here's a quick guide to re-enable it.
Problem:
LaTeX Error: File `floatflt.sty' not found
Solution (the commands need root privileges, hence sudo):
sudo mkdir -p /usr/share/texmf-texlive/tex/latex/floatflt
cd /usr/share/texmf-texlive/tex/latex/floatflt
sudo rm -f floatflt.* float*.tex
sudo wget http://mirror.ctan.org/macros/latex/contrib/floatflt/floatflt.ins
sudo wget http://mirror.ctan.org/macros/latex/contrib/floatflt/floatflt.dtx
sudo latex floatflt.ins
sudo texhash /usr/share/texmf-texlive
Source of suggestion with discussion whether to use backport ........
I would appreciate any hint toward a solution that does this more automatically; please drop me a comment with your solution.
Monday, May 10, 2010
JUROPA MPI Buffer on demand
To enable huge runs with lots of MPI ranks, you have to disable the all-to-all send buffers that are allocated by default on the NEC-Nehalem cluster Juropa at FZ Jülich.
Juropa Introduction @ FZJ
Here is an excerpt from the official documentation:
Most MPI programs do not need every connection
- Nearest neighbor communication
- Scatter/Gather and Allreduce based on binary trees
- Typically just a few dozen connections when having hundreds of processes
- ParaStation MPI supports this with "on demand connections"
- export PSP_ONDEMAND=1
- was used for the Linpack runs (np > 24000)
- mpiexec --ondemand
- Drawback
- Late all-to-all communication might fail due to short memory
- Default on JuRoPA is not to use "on demand connections"
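To get a feeling for why the default all-to-all setup hurts at large scale, the following Python sketch counts connections per process for two of the patterns named in the excerpt. It is only a counting model under my own assumptions (one connection per communication partner, a heap-style binary tree numbering); it does not reflect how ParaStation MPI actually manages its buffers.

```python
def all_to_all_connections(n_procs: int) -> int:
    """Connections one process holds if every pair of processes is connected."""
    return n_procs - 1


def binary_tree_connections(rank: int, n_procs: int) -> int:
    """Connections of one rank in a binary tree: its parent plus up to two children.

    Assumes a heap-style layout where rank r has children 2r+1 and 2r+2.
    """
    parent = 0 if rank == 0 else 1
    children = sum(1 for child in (2 * rank + 1, 2 * rank + 2) if child < n_procs)
    return parent + children


n_procs = 24000  # same order of magnitude as the Linpack runs in the excerpt
worst_tree = max(binary_tree_connections(r, n_procs) for r in range(n_procs))

print(f"all-to-all:  {all_to_all_connections(n_procs)} connections per process")
print(f"binary tree: at most {worst_tree} connections per process")
```

If every connection comes with its own preallocated send buffer, the all-to-all case grows linearly with the number of ranks, while tree-based and nearest neighbor patterns stay constant per process.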
Links
Juropa Introduction @ FZJ