learning-gpgpu

Learning General Purpose GPU

This project contains notes and code related to GPGPU (general-purpose computing on graphics processing units).

Architecture

+---------------------------------------------------------------------------+
|                          GPU                                              |
| +---------------------------------------------------------------+         |
| |                     Core 0                                    |         | 
| | +------------------------------------------------------+      |         |
| | |    Fetch + Decode instruction from GPU device memory |----+ |         |
| | +------------------------------------------------------+    | |         |
| |     ↓              ↓            ↓            ↓              | |         |
| | +---------+   +---------+   +---------+                     | |         |
| | |   ALU   |   |   ALU   |   |   ALU   |     ...             | |         |
| | +---------+   +---------+   +---------+                     | |   ...   |
| | +---------+   +---------+   +---------+                     | |         |
| | |Registers|   |Registers|   |Registers|                     | |         |
| | +---------+   +---------+   +---------+                     | |         |
| |                                                             | |         |
| | +------------------------------------------------------+    | |         |
| | |       GPU Memory                                     |<---+ |         |
| | |                                                      |      |         |
| | |                                                      |      |         |
| | |                                                      |      |         |
| | +------------------------------------------------------+      |         |
| +---------------------------------------------------------------+         |
+---------------------------------------------------------------------------+

So the GPU reads instructions from its memory and decodes them. It then passes each decoded instruction to all of the ALUs, so every ALU receives the same instruction. But each ALU has its own registers, separate from those of the other ALUs, so the values the instruction operates on can differ from one ALU to the next. This is Single Instruction Multiple Data (SIMD).

Work in progress.

Programming models

CPU support

$ lspci | grep Graph
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)

So my machine has an integrated graphics controller, which means that the CPU and GPU are on the same chip.

$ clinfo
  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  20.28.17293
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               24
  Max clock frequency                             1150MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8, 16, 32

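The one I've got has 24 execution units (EUs). On this Intel GPU, OpenCL reports each EU as a compute unit, which is why clinfo shows "Max compute units 24".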