In the last few years, more and more technologies have been developed for mobile devices; consider, for example, augmented reality and mobile health apps for smartphones, or devices for assisted driving.
The mobile application market is expanding continuously and demands ever more computing capacity.
To give a sense of the market's size, a Research2Guidance forecast estimates that 3.4 billion people will own a smartphone by 2017 and that half of them will use mobile health apps.
To meet the demand for ever-greater computing power on devices that must also be highly energy efficient, NVIDIA released the Tegra® K1 mobile processor in the first quarter of 2014.
The Tegra K1 processor is a real breakthrough for mobile graphics and computing; it features a Kepler GPU with 192 cores, an NVIDIA 4-plus-1 quad-core ARM Cortex-A15 CPU, integrated video encoding and decoding support, image/signal processing, and many other system-level features.
It’s the only mobile processor today that supports CUDA 6 for computing and full desktop OpenGL 4.4 and DirectX 11 for graphics.
Tegra K1 is a parallel processor capable of over 300 GFLOP/s of 32-bit floating-point computation with great power efficiency: it consumes less than two watts.
To let developers build solutions on the Tegra K1, NVIDIA released the Jetson TK1 development kit, a full-featured platform for Tegra K1 embedded applications.
It is a 5″ wide by 5″ long PC board with a Tegra K1 processor, 2 GB of RAM, 16 GB of eMMC 4.51 memory, and the following peripherals and ports:
• 1 Half mini-PCIE slot
• 1 Full-size SD/MMC connector
• 1 Full-size HDMI port
• 1 USB 2.0 port, micro AB
• 1 USB 3.0 port, A
• 1 RS232 serial port
• 1 ALC5639 Realtek Audio codec with Mic in and Line out
• 1 RTL8111GS Realtek GigE LAN
• 1 SATA data port
• SPI 4MByte boot flash
It runs Linux for Tegra (L4T), a modified Ubuntu 13.04 Linux distribution that ships with the CUDA Toolkit, OpenGL 4.4 drivers, and the NVIDIA VisionWorks Toolkit.
DeviceQuery and Benchmark
The following is the output of the CUDA deviceQuery sample run on the Jetson platform:
Detected 1 CUDA Capable device(s)

Device 0: "GK20A"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    3.2
  Total amount of global memory:                 1746 MBytes (1831051264 bytes)
  ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
  GPU Clock rate:                                852 MHz (0.85 GHz)
  Memory Clock rate:                             924 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z):        1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers: 1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers: 2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
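As a rough sanity check, the figures reported by deviceQuery can be turned into theoretical peak numbers. The short Python sketch below does that arithmetic. It rests on two assumptions that are not stated in this article: each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock, and the memory transfers data twice per reported clock (double data rate). The function and constant names are purely illustrative.

```python
# Back-of-the-envelope peak figures derived from the deviceQuery output.
# ASSUMPTIONS (not from the article):
#   - one fused multiply-add per CUDA core per clock = 2 floating-point ops
#   - double-data-rate memory = 2 transfers per reported memory clock

CUDA_CORES = 192              # (1) multiprocessor x (192) CUDA cores/MP
GPU_CLOCK_MHZ = 852           # "GPU Clock rate"
MEMORY_CLOCK_MHZ = 924        # "Memory Clock rate"
MEMORY_BUS_WIDTH_BITS = 64    # "Memory Bus Width"

FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: FMA counted as 2 operations
TRANSFERS_PER_CLOCK = 2       # assumption: double data rate


def peak_gflops(cores, clock_mhz, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Theoretical single-precision peak in GFLOP/s."""
    return cores * clock_mhz * flops_per_clock / 1000.0


def peak_bandwidth_gb_s(clock_mhz, bus_width_bits,
                        transfers_per_clock=TRANSFERS_PER_CLOCK):
    """Theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


if __name__ == "__main__":
    gflops = peak_gflops(CUDA_CORES, GPU_CLOCK_MHZ)
    bandwidth = peak_bandwidth_gb_s(MEMORY_CLOCK_MHZ, MEMORY_BUS_WIDTH_BITS)
    print(f"Peak FP32 throughput: {gflops:.1f} GFLOP/s")
    print(f"Peak memory bandwidth: {bandwidth:.1f} GB/s")
```

Under these assumptions the estimate comes to about 327 GFLOP/s, which is consistent with the "over 300 GFLOP/s" figure quoted earlier, and to about 14.8 GB/s of memory bandwidth. These are theoretical upper bounds; real kernels will normally achieve less.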
To give a sense of the Tegra K1's performance, we provide a benchmark comparing the execution time of a CUDA Particle Image Velocimetry simulation on the Jetson development kit and on a GeForce GT 540M notebook GPU.
As you can see, the performance of the mobile processor is comparable to that of the notebook GPU, making the K1 the most powerful mobile processor currently on the market.