
There is a ‘nvidia-smi.ex_’ - I tried renaming it and starting it, but it complained that it doesn’t match the OS (Windows 7 x64).
#Nvidia cuda driver windows 8.1 driver
With the 369.29 (ODE) driver for the Quadro K6000, there is no ‘nvidia-smi.exe’ installed any more - I searched for it. Can anyone who has a Titan X on Windows verify this?
#Nvidia cuda driver windows 8.1 for windows 10
I know that some Quadro GPUs support the TCC driver mode - have you tried that for the K6000? Putting a Geforce Titan X (either Maxwell or Pascal) in TCC driver mode works wonders on my Windows 7 and Windows 8.1 systems. I assume that on Windows 10 one can put these cards in TCC driver mode as well, and I also assume that WDDM 2.0 will not interfere with the TCC driver mode.
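For reference, you can check from code which driver model each GPU is currently running under: the runtime sets cudaDeviceProp::tccDriver to 1 for devices under TCC. A minimal sketch (my own, not from this thread):

```cpp
// tcc_check.cu - minimal sketch: list each CUDA device and the driver model
// the runtime reports for it. cudaDeviceProp::tccDriver is 1 when the device
// runs under the TCC driver; on Windows, 0 means WDDM.
// Build: nvcc tcc_check.cu -o tcc_check
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s - %s driver model\n",
               i, prop.name, prop.tccDriver ? "TCC" : "WDDM");
    }
    return 0;
}
```

Switching a supported card is then done with nvidia-smi’s -dm option (0 = WDDM, 1 = TCC), which needs administrator rights and a reboot - assuming you still have an nvidia-smi.exe, of course.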
#Nvidia cuda driver windows 8.1 drivers
Unfortunately, this leaves us in a complicated situation, because we can either use an older driver (meant for Windows 7/8 (!)) which does not support Pascal cards, or a newer driver which supports Pascal but where the slower cudaMalloc routine eats up a significant part of the speedup gained from GPU acceleration.

Note that this seems to have been reported in another posting as well; see the runtime numbers in that thread. Additional note: The CUDA context creation overhead seems to have decreased significantly between driver versions 350.12 and 372.70 (2.2 seconds on the older driver vs. 250 milliseconds on the newest driver) - I wonder whether there is some relation between this observation and the slowdown of the allocation routines.

Note: The issue seems to occur also on Windows 7 x64 systems, and I think also on Quadro K6000 cards (but not 100% sure).

Update: A bug report was filed with NVIDIA. I additionally made some tests on other systems regarding the significant performance regression in the cudaMalloc(Pitch) function and the (less significant) regression in the cudaFree function. On a Windows 7 x64 system with a Quadro K6000 and ForceWare 347.62, the cudaMallocPitch runtime is OK (equal to the Geforce cards’ runtime with a driver <= 350.12). After installing ForceWare 369.26 (ODE), I get the same bad runtime results as for the Geforce cards with driver version 372.20 (even slightly worse: cudaMallocPitch takes ~12 ms for a 400 MB image, maybe because the K6000 has more RAM than the 770). On a Windows 8.1 Pro x64 system with a Tesla K40c and the 348.40 driver (TCC mode), the cudaMallocPitch runtime is OK (equal to the Geforce cards with a driver <= 350.12). The ForceWare 354.99 driver for the Tesla K40c also works fine, with no performance regression on the Tesla; I couldn’t get newer drivers for the Tesla K40c from the NVIDIA website. My conclusion is that this seems to be a Geforce & Quadro issue; at least for now, Tesla seems not to be affected.
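Since the context creation overhead comes up in the note above: a simple way to make it visible is to time the first CUDA runtime call, which triggers the lazy context creation; cudaFree(0) is a common no-op way to force that. A minimal sketch (my own, not part of the bug report):

```cpp
// ctx_time.cu - sketch to measure CUDA context creation overhead, which the
// note above observed dropping from ~2.2 s to ~250 ms between drivers.
// Build: nvcc ctx_time.cu -o ctx_time
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main()
{
    using namespace std::chrono;
    steady_clock::time_point t0 = steady_clock::now();
    cudaFree(0);  // first runtime call: triggers lazy context creation
    double ms = duration_cast<duration<double, std::milli> >(
                    steady_clock::now() - t0).count();
    printf("CUDA context creation took %.1f ms\n", ms);
    return 0;
}
```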
#Nvidia cuda driver windows 8.1 install
Profiling revealed that in newer drivers (some driver version > 350.12 and <= 353.49), the ‘cudaMallocPitch’ routine (and I suppose also cudaMalloc) got slower by a significant factor, which grows with the size of the allocation. The ‘cudaFree’ routine also got slower for big buffers, but not as much as cudaMallocPitch. In the following, some measured runtime numbers for my GTX 960 (the GTX 770 shows the same behaviour). CUDA Toolkit 7.0 is used, with Visual Studio 2013 64-bit. The 372.70 Windows 10 x64 driver was taken from the NVIDIA website, whereas for 350.12 I took the Windows 7/8 x64 driver (!) from. It can be installed on my Windows 10 x64 system as well, and seems to work fine (but does not support Pascal generation cards). All times are in milliseconds (ms), for the ‘Release’ configuration of the Visual Studio project.

Driver 350.12:
cudaMallocPitch for an image of size 1 MB / 20 MB / 400 MB: 0.6 ms / 0.3 ms / 0.4 ms
cudaFree for an image of size 1 MB / 20 MB / 400 MB: 0.1 ms / 0.4 ms / 1.2 ms

Driver 372.70:
cudaMallocPitch for an image of size 1 MB / 20 MB / 400 MB: 0.5 ms / 1.5 ms / 9 ms
cudaFree for an image of size 1 MB / 20 MB / 400 MB: 0.2 ms / 0.5 ms / 2 ms

One can see that with the new driver, cudaMallocPitch got slower by a factor of 5 - 20 (!) for images in the range between 20 and 400 MB, whereas with the old driver cudaMallocPitch always takes a roughly constant amount of time, regardless of the size of the allocated buffer. I also made experiments with other drivers (downloaded from guru3d): the slowdown seems to occur at least since driver version 353.49 (I took the Windows 10 x64 driver from ). Actually, the slowdown for this driver version, and also for driver version 355.82, is even much worse than for the 372.70 driver, so it seems the issue has already been partially addressed. I couldn’t install driver version 352.86, so I don’t know whether the slowdown is already present in that version.
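For anyone who wants to reproduce these numbers, here is a minimal sketch of the kind of measurement I mean - my own reconstruction, not the original benchmark; the 1024-row image shape and the host-side steady_clock timing are assumptions:

```cpp
// alloc_bench.cu - sketch: time cudaMallocPitch / cudaFree for images of
// roughly 1 MB / 20 MB / 400 MB. Build: nvcc -O2 alloc_bench.cu -o alloc_bench
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

static double elapsedMs(std::chrono::steady_clock::time_point t0)
{
    using namespace std::chrono;
    return duration_cast<duration<double, std::milli> >(
               steady_clock::now() - t0).count();
}

int main()
{
    cudaFree(0);  // create the context up front so it is not billed to the first alloc

    const unsigned sizesMB[] = { 1, 20, 400 };
    for (int s = 0; s < 3; ++s) {
        const size_t bytes      = (size_t)sizesMB[s] * 1024 * 1024;
        const size_t height     = 1024;            // assumed image height
        const size_t widthBytes = bytes / height;  // row length in bytes

        void*  dptr  = 0;
        size_t pitch = 0;

        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        cudaError_t err = cudaMallocPitch(&dptr, &pitch, widthBytes, height);
        double allocMs = elapsedMs(t0);
        if (err != cudaSuccess) {
            printf("%u MB: cudaMallocPitch failed: %s\n",
                   sizesMB[s], cudaGetErrorString(err));
            continue;
        }

        t0 = std::chrono::steady_clock::now();
        cudaFree(dptr);
        double freeMs = elapsedMs(t0);

        printf("%3u MB image: cudaMallocPitch %6.2f ms, cudaFree %6.2f ms\n",
               sizesMB[s], allocMs, freeMs);
    }
    return 0;
}
```

Running the same binary under 350.12 and 372.70 should show whether the cudaMallocPitch time stays constant or grows with the buffer size.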
#Nvidia cuda driver windows 8.1 update
I have a Windows 10 x64 system with two GPUs (Geforce 960, Geforce 770). I investigated a significant slowdown of our GPU-accelerated software after a driver update from ForceWare 347.XX to the newest driver, ForceWare 372.70.
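When comparing runs across driver versions like this, it helps to log the environment programmatically. Caveat: cudaDriverGetVersion / cudaRuntimeGetVersion report CUDA API versions (e.g. 8000 for CUDA 8.0), not the ForceWare number - the ForceWare string would have to come from NVML’s nvmlSystemGetDriverVersion. A small sketch of my own:

```cpp
// env_info.cu - sketch: log CUDA versions and the installed GPUs for a test run.
// Build: nvcc env_info.cu -o env_info
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int drv = 0, rt = 0;
    cudaDriverGetVersion(&drv);   // CUDA version supported by the installed driver
    cudaRuntimeGetVersion(&rt);   // CUDA runtime this binary was built against
    printf("CUDA driver API: %d, runtime: %d\n", drv, rt);

    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %llu MB global memory\n",
               i, prop.name, (unsigned long long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
```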
