Recently I discovered that some of my presentations crash on my new laptop running Windows 8.1 and PowerPoint 2010. The error was: "PowerPoint has stopped working" (in German: "PowerPoint funktioniert nicht mehr").
Excel and Word worked like a charm, and some older presentations did as well.
So at first I suspected an error in the presentation file itself, but it opened successfully on other machines running Windows 7 and the same Office version.
I ran some further tests, converting the files back to the *.ppt format and removing animations and slides.
However, nothing worked.
In the end the problem was GPU related. By default, Office is launched on the integrated Intel GPU. However, PowerPoint, or at least my highly sophisticated presentation, requires the "real deal" GPU.
I started PowerPoint using the NVIDIA GPU and the presentation worked on this machine as well.
Friday, January 31, 2014
Tuesday, November 19, 2013
Windows 8.1 First impressions and first problems with freezing touchpad on Dell Inspiron 17 7000
Unpacking my brand-new Dell Inspiron 17 7000, I was keen on test-driving this new touchy-feely stuff for the first time.
Actually I was prepared for the worst and came away really impressed!
I, for my part, am using this machine just as I would with Win7, only nicer.
Hit the Start (keyboard) button like before and the instant search returns your applications, files and more, just like before. With the Metro area you now have even more space to arrange your most-often-used applications and give them a bit of grouping and prioritization through larger or smaller tiles.
Still, I need some time to get a grasp of full-screen apps and how to switch between full-screen and desktop mode.
Altogether, I cannot comprehend why everybody is having problems with this new kind of design and paradigm.
Now comes the bad news. Win 8 and Win 8.1 still have a lot of pitfalls. I, for one, had to disable hibernation and the new Windows stand-by mode because my laptop kept freezing after the second or third resume. Furthermore, occasionally after switching users or returning from stand-by, the touchpad is unresponsive to left-clicks, but everything else works just fine.
On my laptop this can be solved by two-finger scrolling once, which rips the touchpad driver out of its misery, and everything is back to normal again.
Friday, February 17, 2012
CUDA: Accessing pinned/pagelocked memory from different threads
Some time ago I asked this question on the NVIDIA forums:
http://forums.nvidia.com/index.php?showtopic=201193
namely, how to access pinned (page-locked) memory from two threads in the same program.
The key is to allocate the memory with the portable flag.
See the CUDA 4.0 reference documentation for cudaHostAlloc for details.
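A minimal sketch of the portable allocation (error handling shortened; the buffer size is just an example):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    float *h_buf = nullptr;
    size_t bytes = 1 << 20;  // 1 MiB example buffer

    // cudaHostAllocPortable makes the pinned allocation visible to all
    // CUDA contexts, so host threads holding different contexts can all
    // use it for fast DMA transfers.
    cudaError_t err = cudaHostAlloc((void**)&h_buf, bytes, cudaHostAllocPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... any host thread may now use h_buf in cudaMemcpy/cudaMemcpyAsync ...

    cudaFreeHost(h_buf);
    return 0;
}
```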
Even more elegant, however, is to register the memory only when you actually need it pinned.
See the CUDA 4.0 reference documentation for cudaHostRegister and cudaHostUnregister for details.
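A sketch of the on-demand variant: cudaHostRegister pins an already allocated range, and cudaHostRegisterPortable again makes it visible to all contexts (error handling omitted for brevity):

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    size_t bytes = 1 << 20;
    // Ordinary pageable allocation...
    float *h_buf = (float*)malloc(bytes);

    // ...pinned only for the time it is actually needed.
    cudaHostRegister(h_buf, bytes, cudaHostRegisterPortable);

    // ... fast transfers, from any host thread, go here ...

    cudaHostUnregister(h_buf);
    free(h_buf);
    return 0;
}
```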
Thursday, August 11, 2011
OpenCL programming
OpenCL kernels are compiled at execution time (just in time, JIT).
This means that any error inside the kernel is only discovered at that point.
The error messages I have gotten so far are very quiet about what the actual problem in the kernel is.
So the only debugging method left is commenting and uncommenting code until the kernel compiles.
Intel, however, recently released its first beta version of an OpenCL SDK with a lot of tools.
As with the early Parallel Studio, these tools are only available for Windows (specifically Windows Vista and later, and Server 2008), not for Linux.
Note that the runtime for compiling and running OpenCL is available for Linux, Mac and Windows!
Included is the Intel OpenCL Offline Compiler, where you can load your kernel and precompile it.
Here the error messages are much more helpful (of course, helpful in the way ordinary compiler messages are helpful :-)).
Nevertheless, it is a great tool which makes daily programming a lot easier.
LINK: http://www.intel.com/go/opencl/
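Even without the offline compiler, you can get the full compiler output at runtime via clGetProgramBuildInfo. A minimal sketch (the program and device handles are assumed to exist already; error handling shortened):

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Print the kernel compiler output after a failed clBuildProgram() call. */
static void print_build_log(cl_program program, cl_device_id device)
{
    size_t log_size = 0;

    /* First query: how big is the build log? */
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &log_size);

    /* Second query: fetch the log itself. */
    char *log = malloc(log_size);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          log_size, log, NULL);
    fprintf(stderr, "Build log:\n%s\n", log);
    free(log);
}
```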
Labels:
Allgemeines,
AMD/ATI,
CUDA,
FERMI,
General Topics,
GPU,
HPC2008,
Intel,
NVIDIA,
OpenCL,
TESLA,
Windows7
Thursday, August 4, 2011
TinyGPU upgrade to CUDA Toolkit 4.0
All nodes of the TinyGPU cluster are now on the current CUDA Toolkit 4.0 with the appropriate driver.
Tuesday, June 21, 2011
Disable Fermi Cache
To disable the Fermi L1 cache in CUDA, just compile with: -Xptxas -dlcm=cg
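A full invocation might look like this (a sketch; kernel.cu is a placeholder file name):

```shell
# -dlcm=cg tells ptxas to use the cache-global policy, i.e. global
# loads bypass L1 and only go through L2.
nvcc -Xptxas -dlcm=cg -o app kernel.cu
```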
Any idea on how to do this with OpenCL?
Friday, December 10, 2010
Windows and CUDA; enabling TCC with nvidia-smi
As in Linux, you can use nvidia-smi to set different modes on the Tesla GPUs.
nvidia-smi is usually located at: C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe
Open a command prompt with administrative privileges, change to that directory, and type nvidia-smi -s. This shows the current status and the status of the TCC mode after reboot.
Enable exclusive compute mode for the first GPU with: nvidia-smi -g 0 -c 1
Disable exclusive compute mode for the first GPU with: nvidia-smi -g 0 -c 0
For other GPUs, increment the number after -g.
/edit 24.12.2010:
Also look at the first comment on how to change between the WDDM and TCC driver models.
Thanks to Francoise for reporting my mistake; I have corrected it above.
Labels:
Allgemeines,
CUDA,
FERMI,
General Topics,
GPU,
HPC2008,
NVIDIA,
Windows,
Windows HPC
Thursday, December 9, 2010
Win2008 HPC Server and CUDA TCC revisited
The release of the stable NVIDIA driver 260.83 broke my Windows CUDA programming environment.
With the currently newest driver, 263.06, I gave it another shot. Initially the CUDA SDK sample programs did not recognize the GPU as CUDA capable and just complained about a driver and toolkit version mismatch.
However, this time a web search led me to an IBM webpage with a solution for their servers running Windows 2008 R2.
I tried this on Win2008 and it works like a charm:
- Open the registry editor by typing regedit in the Run dialog and navigate to:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}]
- You will find subkeys named 0001, 0002 and so on, depending on the number of GPUs in your system.
- For each card on which you want to enable CUDA, go to that 000X key and add the following registry value (a 32-bit DWORD worked for me):
"AdapterType"=dword:00000002
If you access the system via RDP, read my blog entry on using nvidia-smi for TCC on how to set this up!
The source of this information is IBM; further reference and even more details can be found here: IBM Support Site
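The same change can be captured in a .reg file for repeated use (a sketch; the 0001 subkey is an example and must match the key of your GPU):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0001]
"AdapterType"=dword:00000002
```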
Wednesday, October 13, 2010
NVIDIA CUDA TCC Driver Released 260.83
Just today NVIDIA released the WHQL-certified Tesla Compute Cluster (TCC) driver 260.83 for use in e.g. Windows 2008 Server/HPC.
Until now only a beta version was available.
With this special driver you gain the ability to use GPGPU compute resources via RDP or via the Windows HPC batch processing mode.
Download the driver here.
/edit:
Actually, installing this driver broke my working environment, so be sure to keep a backup of the system. Even reinstalling the beta version did not solve the problem.
Tuesday, October 12, 2010
Win2008 HPC Server and CUDA TCC
NVIDIA now provides a beta driver called Tesla Compute Cluster (TCC) that allows CUDA GPUs to be used within a Windows cluster environment, not only remotely via RDP but also in batch processing. Until now, the HPC Server lacked this ability, as Windows did not fire up the graphics driver inside the limited batch logon mode.
My first steps with TCC took a little longer than estimated.
First of all, it is not possible to have an NVIDIA and an AMD or Intel GPU side by side, as Windows needs one unified display driver model, and that means one vendor or the other. This was completely new to me.
After this first minor setback, and re-equipped with only the Tesla C2050, the BIOS did not finish booting, so be sure to be up to date with your BIOS revision.
Adding another NVIDIA card was the quick fix on my side.
Next comes the setup. Install the 260 (currently beta) drivers and the exclamation mark in the device manager should vanish.
After that, install the toolkit and, if you like, the SDK.
With the nvidia-smi tool, which you will find in one of the countless NVIDIA folders that exist by now, you can check whether the card is initially recognized correctly.
Also set the TCC mode of the Tesla card to enabled if you want remote CUDA capabilities:
nvidia-smi -s --> shows the current status
nvidia-smi -g 0 -c 1 --> enables TCC on GPU 0
Next you will want to run the deviceQuery test that comes with the SDK.
If it runs and everything looks fine, feel gifted!
Nothing ran on my setup, so I tried to build the SDK example myself. To do that, first build the CUDA utilities, found somewhere in the SDK within the folder "common".
Depending on the Nsight or toolkit version you have installed, you may get an error when opening the Visual Studio project files. In that case you need to edit the project file with a text editor of your choice and replace the outdated build rule with the one actually installed:
- From the error message, note the folder where VS does not find the file.
- Copy the path and go there with your file browser.
- Find the file whose name is closest to the one in the VS error message.
- Once found, open the VS project file and replace the wrong filename with the correct one.
- VS should now open the project.
In order to compile, add the correct include and library directories to the VS project.
Finally, you can build deviceQuery or any other program.
Still, this setup gave me the same error as the precompiled deviceQuery:
cudaGetDeviceCount FAILED. CUDA Driver and Runtime version may be mismatched.
With the help of Dependency Walker I found out that a missing DLL was the problem, namely linkinfo.dll.
You can get it by adding the feature named "Desktop Experience" through the Server Manager.
Once it was installed and the machine rebooted, deviceQuery worked.
Friday, September 3, 2010
TinyGPU offers new hardware
TinyGPU has new hardware: tg010. The hardware configuration and the currently deployed software differ from the non-Fermi nodes:
- Ubuntu 10.04 LTS (instead of 8.04 LTS) as OS. Note: to use the Intel Compiler <= 11.1 locally on tg010, you currently have to load the gcc/3.3.6 module; otherwise libstdc++.so.5 is missing, as Ubuntu 10.04 no longer ships this version. This is necessary only for compilation; compiled Intel binaries will run as expected.
- /home/hpc and /home/vault are mounted [only] through NFS (instead of natively via GPFS cross-cluster mount)
- Dual-socket system with Intel Westmere X5650 (2.66 GHz) processors, with 6 native cores per socket (instead of a dual-socket system with Intel Nehalem X5550 (2.66 GHz), with 4 native cores per socket)
- 48 GB DDR3 RAM (instead of 24 GB DDR3 RAM)
- 1x NVIDIA Tesla C2050 ("Fermi" with 3 GB GDDR5 featuring ECC)
- 1x NVIDIA GTX 280 (consumer card with 1 GB RAM, formerly known as F22)
- 2 further PCIe 2.0 16x slots will be equipped with NVIDIA C2070 cards ("Fermi" with 6 GB GDDR5 featuring ECC) in Q4, instead of the 2x NVIDIA Tesla M1060 ("Tesla" with 4 GB RAM) found in the remaining cluster nodes
- SuperServer 7046GT-TRF / X8DTG-QF with dual Intel 5520 (Tylersburg) chipset instead of SuperServer 6016GT-TF-TM2 / X8DTG-DF with Intel 5520 (Tylersburg) chipset
To allocate the Fermi node, specify :ppn=24 with your job (instead of :ppn=16) and explicitly submit to the TinyGPU queue fermi. The wallclock limit is set to the default of 24h. The ECC memory status is shown at job startup. This article tries to be a translation of the original posted here: Zuwachs im TinyGPU-Cluster
Thursday, September 24, 2009
PCI express pinned Host Memory
Retesting my benchmarks with the current release of CUDA 2.3, I finally incorporated new features like pinned host memory allocation. The specs say this improves host-to-device transfers and vice versa.
Due to the special allocation, the arrays stay at the same location in physical memory, are never swapped out, and are directly available for DMA transfers. Otherwise, most data is first copied into a pinned staging buffer and then into the ordinarily allocated memory space; with pinned allocation this detour is omitted.
The performance plot shows that pinned memory now reaches up to 5.9 GB/s on the fastest currently available PCIe x16 Gen 2 interface, which has a peak transfer rate of 8 GB/s. That corresponds to 73% of peak performance with almost no optimization applied. In contrast, optimizations such as a blocked data transfer, which proved to increase performance some time ago [PCIe revisited], no longer have a positive effect.
Using only the blocked optimizations without pinned memory is still better than doing an unblocked transfer from unpinned memory, but it only reaches about 4.5 GB/s to the device, which corresponds to 56% of peak.
Reading from the device is far worse, at only 2.3 GB/s.
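A minimal sketch of such a host-to-device bandwidth test, timed with CUDA events (buffer size is illustrative; assumes a CUDA-capable device and omits error checking):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 64 << 20;  // 64 MiB test buffer
    float *h_pinned, *d_buf;

    // Pinned (page-locked) host allocation and a device buffer.
    cudaHostAlloc((void**)&h_pinned, bytes, cudaHostAllocDefault);
    cudaMalloc((void**)&d_buf, bytes);

    // Time the host-to-device copy with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D bandwidth: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    return 0;
}
```

Swapping cudaHostAlloc for a plain malloc (plus cudaHostRegister, or nothing at all) reproduces the pinned versus pageable comparison discussed above.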
Thursday, July 23, 2009
Cuda 2.3 released
NVIDIA just released CUDA version 2.3 with the corresponding driver.
F22 @RRZE has already been updated to support this Version.