NVIDIA Tegra K1 Processor and Jetson TK1 – A short introduction

In the last few years, more and more technologies have been developed for mobile devices: think, for example, of augmented reality and mobile health apps for smartphones, or of devices for assisted driving.

The market for mobile applications is continuously expanding and requires ever higher computing capacity.

To put some numbers on this market: according to a Research2Guidance forecast, 3.4 billion people will own a smartphone by 2017, and half of them will use mobile health apps.
To meet the need for ever-increasing computing power on devices that demand significant energy efficiency, NVIDIA released the Tegra® K1 mobile processor in the first quarter of 2014.


NVIDIA Tegra K1 Mobile Processor (32 bit)


The Tegra K1 processor is a real breakthrough for mobile graphics and computing: it features a Kepler GPU with 192 cores, an NVIDIA 4-plus-1 quad-core ARM Cortex-A15 CPU, integrated video encoding and decoding support, image/signal processing, and many other system-level features.
It’s the only mobile processor today that supports CUDA 6 for computing and full desktop OpenGL 4.4 and DirectX 11 for graphics.

Tegra K1 is a parallel processor capable of over 300 GFLOP/s of 32-bit floating-point computation with high power efficiency: it consumes less than two watts.
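The "over 300 GFLOP/s" figure can be roughly checked from the core count and the GPU clock. The following is a minimal sketch, not an official NVIDIA formula: it assumes that each CUDA core can retire one fused multiply-add (counted as 2 floating-point operations) per clock cycle, and it takes the 852 MHz GPU clock from the deviceQuery output reported later in this post. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(cuda_cores, clock_mhz, flops_per_core_per_cycle=2):
    """Theoretical single-precision peak in GFLOP/s.

    Assumption: every core issues one fused multiply-add per cycle,
    which counts as 2 floating-point operations.
    """
    flops_per_second = cuda_cores * flops_per_core_per_cycle * clock_mhz * 1e6
    return flops_per_second / 1e9


if __name__ == "__main__":
    # 192 cores and 852 MHz are the values reported for the Tegra K1 in this post.
    print(f"{peak_gflops(192, 852):.1f} GFLOP/s")
```

With 192 cores at 852 MHz this works out to about 327 GFLOP/s, which is consistent with the figure quoted above. It is a theoretical upper bound; real kernels will normally achieve less.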


Jetson TK1 Development Kit


To let developers build solutions around the Tegra K1, NVIDIA released the Jetson TK1 development kit, a full-featured platform for Tegra K1 embedded applications.


Jetson TK1 Details


It is a 5″ wide by 5″ long PC board with a Tegra K1 processor, 2 GB of RAM, 16 GB of eMMC 4.51 memory, and the following peripherals and ports:

  • 1 Half mini-PCIE slot
  • 1 Full size SD/MMC connector
  • 1 Full-size HDMI port
  • 1 USB 2.0 port, micro AB
  • 1 USB 3.0 port, A
  • 1 RS232 serial port
  • 1 ALC5639 Realtek Audio codec with Mic in and Line out
  • 1 RTL8111GS Realtek GigE LAN
  • 1 SATA data port
  • SPI 4MByte boot flash

It runs Linux For Tegra (L4T), a modified Ubuntu 13.04 Linux distribution, provided with the CUDA Toolkit, OpenGL 4.4 drivers, and the NVIDIA VisionWorks Toolkit.


Jetson TK1 Block Diagram


DeviceQuery and Benchmark

The following is the output of the CUDA deviceQuery sample run on the Jetson platform:

Detected 1 CUDA Capable device(s)

Device 0: "GK20A"
CUDA Driver Version / Runtime Version            6.0 / 6.0
CUDA Capability Major/Minor version number:      3.2
Total amount of global memory:                   1746 MBytes (1831051264 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP:       192 CUDA Cores
GPU Clock rate:                                  852 MHz (0.85 GHz)
Memory Clock rate:                               924 Mhz
Memory Bus Width:                                64-bit
L2 Cache Size:                                   131072 bytes
Maximum Texture Dimension Size (x,y,z):          1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers:   1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers:   2D=(16384, 16384), 2048 layers
Total amount of constant memory:                 65536 bytes
Total amount of shared memory per block:         49152 bytes
Total number of registers available per block:   32768
Warp size:                                       32
Maximum number of threads per multiprocessor:    2048
Maximum number of threads per block:             1024
Max dimension size of a thread block (x,y,z):    (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z):    (2147483647, 65535, 65535)
Maximum memory pitch:                            2147483647 bytes
Texture alignment:                               512 bytes
Concurrent copy and kernel execution:            Yes with 1 copy engine(s)
Run time limit on kernels:                       No
Integrated GPU sharing Host Memory:              Yes
Support host page-locked memory mapping:         Yes
Alignment requirement for Surfaces:              Yes
Device has ECC support:                          Disabled
Device supports Unified Addressing (UVA):        Yes
Device PCI Bus ID / PCI location ID:             0 / 0
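Two useful numbers can be derived from this output. The sketch below estimates the theoretical peak memory bandwidth from the memory clock and bus width, and the maximum number of resident warps per multiprocessor. It assumes double-data-rate memory (two transfers per clock), which is the usual convention for this kind of estimate; the function names are hypothetical.

```python
def peak_bandwidth_gbs(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: double-data-rate memory, i.e. 2 transfers per clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second / 1e9


def max_resident_warps(max_threads_per_sm, warp_size):
    """Maximum number of warps that can be resident on one multiprocessor."""
    return max_threads_per_sm // warp_size


if __name__ == "__main__":
    # Values taken from the deviceQuery output above.
    print(f"{peak_bandwidth_gbs(924, 64):.2f} GB/s")
    print(max_resident_warps(2048, 32), "warps per multiprocessor")
```

For the values above this gives roughly 14.8 GB/s and 64 resident warps. Because the GPU is integrated and shares host memory, that bandwidth is shared with the CPU, so it should be read as an upper bound.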

To give an idea of the Tegra K1's performance, we provide a benchmark comparing the execution time of a CUDA Particle Image Velocimetry simulation on the Jetson development kit and on a GeForce GT 540M notebook GPU.

Particle Image Velocimetry – Execution Time

As you can see, the performance of the mobile processor is comparable to that of the notebook GPU, which makes the K1 the most powerful mobile processor currently on the market.
