Tutorial: Debugging with Intel® Distribution for GDB*
Consider the array-transform.cpp example again:
54 h.parallel_for(data_range, [=](id<1> index) {
55   size_t id0 = GetDim(index, 0);
56   int element = in[index]; // breakpoint-here
57   int result = element + 50;
58   if (id0 % 2 == 0) {
59     result = result + 50; // then-branch
60   } else {
61     result = -1; // else-branch
62   }
63   out[index] = result;
64 });
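You can model the kernel body on the host to predict the values you will inspect later. The following minimal C++ sketch is not part of the sample. TransformElement and ExpectedOutput are hypothetical names. The sketch assumes a one-dimensional range, so GetDim(index, 0) is simply the linear index. It also assumes the input holds in[i] = 100 + i. That assumption is inferred from the element values printed in this tutorial, not taken from the sample source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the kernel body in array-transform.cpp.
// Assumes a 1-D range, so id0 is simply the linear work-item index.
int TransformElement(std::size_t id0, int element) {
  int result = element + 50;
  if (id0 % 2 == 0) {
    result = result + 50;  // then-branch (line 59)
  } else {
    result = -1;  // else-branch (line 61)
  }
  return result;
}

// Assumption: the input array holds in[i] = 100 + i, inferred from the
// element values printed in this tutorial.
std::vector<int> ExpectedOutput(std::size_t n) {
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const int element = 100 + static_cast<int>(i);
    out[i] = TransformElement(i, element);
  }
  return out;
}
```

Under these assumptions, ExpectedOutput(8) yields 200, -1, 202, -1, 204, -1, 206, -1. Even work items take the then-branch (line 59) and odd ones take the else-branch (line 61). This matches the lane sets the debugger reports.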
Start the debugger and set two breakpoints inside the kernel, one for each conditional branch:
break 59
Expected output:
Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 59.
break 61
Expected output:
Breakpoint 2 at 0x40584a: file /path/to/array-transform.cpp, line 61.
Execute the run gpu command. You should see output similar to the following:
Starting program: /path/to/array-transform gpu
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff37dc700 (LWP 9479)]
intelgt: attached to device with id 0x5927 (Gen9)
intelgt: inferior 2 (gdbserver-gt) created for process 9568.
[New Thread 0x7fffe21e9700 (LWP 9599)]
[SYCL] Using device: [Intel® Graphics Gen9] from [Intel® Level-Zero]
[New Thread 1073741824]
[New Thread 1073741888]
[New Thread 1073742080]
[New Thread 1073742144]
[New Thread 1073742336]
[New Thread 1073745920]
[New Thread 1073746176]
[New Thread 1073746432]
[Switching to Thread 1073741824 lane 1]
Thread 2.2 hit Breakpoint 2, with SIMD lanes [1 3 5 7], main::$_1::operator()<omitted> at array-transform.cpp:61
61 result = -1; // else-branch
As the output above shows, the auto-attach mechanism is triggered and gdbserver-gt starts listening for GPU events.
Check the presence of gdbserver-gt as follows:
info inferiors
Expected output:
  Num  Description    Connection                                  Executable
  1    process 9463   1 (native)                                  <path_to_array-transform>
* 2    Remote target  2 (remote gdbserver-gt --attach - 9463)
The breakpoint event is received from the gdbserver-gt process. The thread ID 2.2:1 refers to thread 2 of inferior 2. The lane suffix :1 indicates that SIMD lane 1, the first active lane, is now in focus.
The breakpoint at line 61 is hit first (the order of branch execution is defined by the Intel® Graphics Compiler).
Check which SIMD lanes are currently active with the following command:
info threads
In the example, thread 2.2 has 4 active SIMD lanes: 1, 3, 5, and 7. The asterisk * marks the current SIMD lane. See the expected output below.
  Id             Target Id                      Frame
  1.1            Thread <id omitted>            <frame omitted>
  1.2            Thread <id omitted>            <frame omitted>
  2.1            Thread 1610612736 (inactive)
* 2.2:1          Thread 1073741824              <frame> at array-transform.cpp:61
  2.2:[3 5 7]    Thread 1073741824              <frame> at array-transform.cpp:61
  2.3:[1 3 5 7]  Thread 1073741888              <frame> at array-transform.cpp:61
  2.4:[1 3 5 7]  Thread 1073742080              <frame> at array-transform.cpp:61
  2.5:[1 3 5 7]  Thread 1073742144              <frame> at array-transform.cpp:61
  2.6:[1 3 5 7]  Thread 1073742336              <frame> at array-transform.cpp:61
  2.7:[1 3 5 7]  Thread 1073745920              <frame> at array-transform.cpp:61
  2.8:[1 3 5 7]  Thread 1073746176              <frame> at array-transform.cpp:61
  2.9:[1 3 5 7]  Thread 1073746432              <frame> at array-transform.cpp:61
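The info threads listing above puts the current lane of thread 2.2 on its own row, marked with *. The remaining active lanes are grouped in brackets. The following C++ sketch reproduces that notation from an execution mask. It only shows how to read the listing; it is not how GDB builds it. ThreadRows is a hypothetical name, and passing -1 as the current lane represents a thread that is not in focus.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of how one SIMD thread appears in "info threads":
// the current lane gets its own row marked with '*', and the remaining
// active lanes are grouped as thread_id:[l1 l2 ...].
std::vector<std::string> ThreadRows(const std::string& thread_id,
                                    std::uint32_t emask, int current_lane) {
  assert(current_lane < 32);
  std::vector<std::string> rows;
  std::string others;
  for (int lane = 0; lane < 32; ++lane) {
    if (!(emask & (std::uint32_t{1} << lane))) continue;  // inactive lane
    if (lane == current_lane) {
      rows.push_back("* " + thread_id + ":" + std::to_string(lane));
    } else {
      others += (others.empty() ? "" : " ") + std::to_string(lane);
    }
  }
  if (!others.empty()) rows.push_back("  " + thread_id + ":[" + others + "]");
  return rows;
}
```

With mask 0xAA (lanes 1, 3, 5, 7) and lane 1 in focus, thread 2.2 yields the rows "* 2.2:1" and "  2.2:[3 5 7]", as in the listing.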
To switch focus to a different SIMD lane, use the thread <thread_ID> command. The thread ID is specified as a triple, inferior.thread:lane. See the following examples of working with particular lanes:
thread 2.2:3
Example output:
[Switching to thread 2.2:3 (Thread 1073741824 lane 3)]
#0 main::$_1::operator()<omitted> at array-transform.cpp:61
61 result = -1; // else-branch
print element
Example output:
$1 = 103
thread 2.2:5
Example output:
[Switching to thread 2.2:5 (Thread 1073741824 lane 5)]
#0 main::$_1::operator()<omitted> at array-transform.cpp:61
61 result = -1; // else-branch
print element
Example output:
$2 = 105
To switch lanes within the current thread, you can omit the inferior and thread numbers:
thread :5
Expected output:
[Switching to thread 2.2:5 (Thread 1073741824 lane 5)]
#0 main::$_1::operator()<omitted> at array-transform.cpp:61
61 result = -1; // else-branch
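The lane-switching commands above use the inferior.thread:lane notation. The thread :5 form omits the inferior and thread numbers, so focus stays in the current thread. The following C++ sketch parses these forms. LaneId, ParseNumber, and ParseLaneId are hypothetical names. Accepting T:L (thread T of the current inferior) is an assumption, by analogy with the thread apply 2-9:* form shown later in this tutorial. Ranges and wildcards are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical representation of a SIMD-lane thread ID: inferior.thread:lane.
struct LaneId {
  int inferior;
  int thread;
  int lane;
};

// Parse a non-empty run of decimal digits; returns -1 on failure.
int ParseNumber(const std::string& s) {
  if (s.empty() || s.size() > 9) return -1;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return -1;
    value = value * 10 + (c - '0');
  }
  return value;
}

// Accepts "I.T:L", "T:L" (current inferior), and ":L" (current inferior and
// thread). Ranges and wildcards are not handled in this sketch.
std::optional<LaneId> ParseLaneId(const std::string& text, const LaneId& current) {
  const std::size_t colon = text.find(':');
  if (colon == std::string::npos) return std::nullopt;
  LaneId id = current;
  id.lane = ParseNumber(text.substr(colon + 1));
  if (id.lane < 0) return std::nullopt;
  if (colon == 0) return id;  // ":L" keeps the current inferior and thread
  const std::string head = text.substr(0, colon);
  const std::size_t dot = head.find('.');
  if (dot == std::string::npos) {
    id.thread = ParseNumber(head);  // "T:L": thread T of the current inferior
  } else {
    id.inferior = ParseNumber(head.substr(0, dot));
    id.thread = ParseNumber(head.substr(dot + 1));
  }
  if (id.inferior < 0 || id.thread < 0) return std::nullopt;
  return id;
}
```

For example, with 2.2:1 in focus, parsing ":5" yields inferior 2, thread 2, lane 5. This matches the [Switching to thread 2.2:5 ...] output.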
Because you are now inside the kernel running on the GPU, you can view the GPU assembly code with the disassemble command. See an example output below:
Dump of assembler code for function _ZTSN2cl4sycl6kernelE(...):
   0x00000000fffad000 <+0>:   mov (1|M0) null<1>:ud 0xC72C169A:ud
   0x00000000fffad010 <+16>:  (W) mov (8|M0) r22.0<1>:ud r0.0<1;1,0>:ud
   0x00000000fffad020 <+32>:  (W) or (1|M0) cr0.0<1>:ud cr0.0<0;1,0>:ud 0x4C0:uw {Switch}
   0x00000000fffad030 <+48>:  (W) mov (8|M0) r9.0<1>:w 0x76543210:v
   0x00000000fffad040 <+64>:  (W) and (1|M0) r8.1<1>:d r22.5<0;1,0>:d 511:w
   0x00000000fffad050 <+80>:  (W) mul (1|M0) r8.1<1>:d r8.1<0;1,0>:d 0xC440:uw
   0x00000000fffad060 <+96>:  (W) add (1|M0) r8.2<1>:d r8.1<0;1,0>:d 0x8440:uw
   0x00000000fffad070 <+112>: mov (8|M0) r9.0<1>:d r9.0<8;8,1>:uw
   0x00000000fffad080 <+128>: mul (8|M0) r10.0<1>:d r9.0<8;8,1>:d 8:w
To display a list of GPU registers, run the following command:
info reg
Additionally, inspect the execution mask (the $emask register), which shows the active lanes. To print it in binary format, use the /t format flag:
print/t $emask
Example output:
$3 = 10101010
Recall that you stopped at line 61, the else-branch of the condition that checks whether the work-item index is even. Hence, every other SIMD lane is inactive, as the $emask bit pattern shows.
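When you read the $emask value printed with /t, keep in mind that GDB prints the most significant bit first and drops leading zeros. The rightmost digit is therefore lane 0. The C++ sketch below decodes such a string into lane numbers. ActiveLanes and InactiveLanes are hypothetical helpers. The complement that InactiveLanes computes equals the lanes taking the other branch only if all lanes of the thread were active when execution reached the if statement; that is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Decode the string printed by "print/t $emask" into active lane numbers.
// The most significant bit comes first and leading zeros are omitted,
// so the rightmost character corresponds to lane 0.
std::vector<int> ActiveLanes(const std::string& bits) {
  std::vector<int> lanes;
  const int n = static_cast<int>(bits.size());
  for (int lane = 0; lane < n; ++lane) {
    if (bits[n - 1 - lane] == '1') lanes.push_back(lane);
  }
  return lanes;
}

// Lanes of a simd_width-wide thread whose bit is not set in the mask.
std::vector<int> InactiveLanes(const std::string& bits, int simd_width) {
  assert(simd_width > 0);
  const std::vector<int> active = ActiveLanes(bits);
  std::vector<int> lanes;
  for (int lane = 0; lane < simd_width; ++lane) {
    if (std::find(active.begin(), active.end(), lane) == active.end()) {
      lanes.push_back(lane);
    }
  }
  return lanes;
}
```

For 10101010, ActiveLanes returns lanes 1, 3, 5, and 7 (the else-branch). InactiveLanes with a SIMD width of 8 returns lanes 0, 2, 4, and 6, the set reported later at the then-branch breakpoint.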
To move forward and stop at the then-branch, set the scheduler-locking mode to step and execute the next command:
set scheduler-locking step
next
You should see the following output:
[Switching to SIMD lane 0]
Thread 2.2 hit Breakpoint 1, with SIMD lanes [0 2 4 6], main::$_1::operator()<omitted> at array-transform.cpp:59
59 result = result + 50; // then-branch
Because of the breakpoint event, the SIMD lane focus switches to the first active lane in the then-branch, SIMD lane 0. The other threads of inferior 2 remain at line 61:
info threads 2.*
Example output:
  Id             Target Id                      Frame
  2.1            Thread 1610612736 (inactive)
* 2.2:0          Thread 1073741824              <frame> at array-transform.cpp:59
  2.2:[2 4 6]    Thread 1073741824              <frame> at array-transform.cpp:59
  2.3:[1 3 5 7]  Thread 1073741888              <frame> at array-transform.cpp:61
  2.4:[1 3 5 7]  Thread 1073742080              <frame> at array-transform.cpp:61
  2.5:[1 3 5 7]  Thread 1073742144              <frame> at array-transform.cpp:61
  2.6:[1 3 5 7]  Thread 1073742336              <frame> at array-transform.cpp:61
  2.7:[1 3 5 7]  Thread 1073745920              <frame> at array-transform.cpp:61
  2.8:[1 3 5 7]  Thread 1073746176              <frame> at array-transform.cpp:61
  2.9:[1 3 5 7]  Thread 1073746432              <frame> at array-transform.cpp:61
Because the thread is vectorized, you can also inspect a local variable as a vector that holds one value per SIMD lane:
x /8dw &result
Example output:
0x7fffe3f972c0: 150 -1 152 -1
0x7fffe3f972d0: 154 -1 156 -1
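The x output above suggests that the vectorized result variable keeps one 4-byte copy per SIMD lane, stored in lane order: lane 0 holds 150, lane 1 holds -1, and so on. The C++ sketch below models that assumed layout and formats the words four per row, similar to x /8dw. The layout is inferred from this output and is not a documented ABI. LaneAddress and FormatWords are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Assumed layout (inferred from the output above, not a documented ABI):
// a vectorized int keeps one 4-byte copy per SIMD lane, lane i at base + 4*i.
std::uintptr_t LaneAddress(std::uintptr_t base, int lane) {
  assert(lane >= 0);
  return base + sizeof(std::int32_t) * static_cast<std::uintptr_t>(lane);
}

// Print the per-lane copies four words per row, similar to "x /8dw".
std::string FormatWords(std::uintptr_t base, const std::vector<std::int32_t>& words) {
  std::string text;
  char buf[64];
  for (std::size_t i = 0; i < words.size(); ++i) {
    if (i % 4 == 0) {  // start a new row with the address of its first word
      if (i != 0) text += '\n';
      std::snprintf(buf, sizeof buf, "0x%llx:",
                    static_cast<unsigned long long>(LaneAddress(base, static_cast<int>(i))));
      text += buf;
    }
    std::snprintf(buf, sizeof buf, " %d", static_cast<int>(words[i]));
    text += buf;
  }
  return text;
}
```

With this model, the copy of result for lane 5 would be at 0x7fffe3f972c0 + 5 * 4 = 0x7fffe3f972d4.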
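The thread apply command lets you investigate SIMD lanes without switching focus. When you specify only the thread, as below, the command runs for a single lane (lane 0 in this example):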
thread apply 2.2 print element
Example output:
Thread 2.2:0 (Thread 1073741824 lane 0): $4 = 100
You can specify a SIMD lane as a number:
thread apply 2.2:2 print element
Example output:
Thread 2.2:2 (Thread 1073741824 lane 2): $5 = 102
You can also specify a range of SIMD lanes. In this case, only the active SIMD lanes in the range are considered:
thread apply 2.2:2-5 print element
Example output:
Thread 2.2:2 (Thread 1073741824 lane 2): $6 = 102
warning: SIMD lane 3 is inactive in thread 2.2
Thread 2.2:4 (Thread 1073741824 lane 4): $7 = 104
warning: SIMD lane 5 is inactive in thread 2.2
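The range example above visits every lane from 2 to 5. It runs the command on active lanes and prints a warning for inactive ones. The following C++ sketch models that behavior with an execution mask. ApplyToLaneRange is a hypothetical name, and the log strings only imitate the debugger's messages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of "thread apply T:FIRST-LAST CMD": every lane in the
// inclusive range is visited; active lanes run the command, inactive lanes
// only produce a warning.
std::vector<std::string> ApplyToLaneRange(const std::string& thread,
                                          std::uint32_t emask, int first, int last) {
  assert(first >= 0 && last < 32);
  std::vector<std::string> log;
  for (int lane = first; lane <= last; ++lane) {
    if (emask & (std::uint32_t{1} << lane)) {
      log.push_back("run on " + thread + ":" + std::to_string(lane));
    } else {
      log.push_back("warning: SIMD lane " + std::to_string(lane) +
                    " is inactive in thread " + thread);
    }
  }
  return log;
}
```

With the then-branch mask 0x55 (lanes 0, 2, 4, 6), the range 2-5 runs on lanes 2 and 4 and warns about lanes 3 and 5, as in the output above.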
To denote all active SIMD lanes, use the wildcard:
thread apply 2.2:* print element
Example output:
Thread 2.2:0 (Thread 1073741824 lane 0): $8 = 100
Thread 2.2:2 (Thread 1073741824 lane 2): $9 = 102
Thread 2.2:4 (Thread 1073741824 lane 4): $10 = 104
Thread 2.2:6 (Thread 1073741824 lane 6): $11 = 106
To apply the command to all active SIMD lanes of all threads, use the all-lanes parameter:
thread apply all-lanes print element
Example output:
Thread 2.9:7 (Thread 1073741888 lane 7): $12 = 155
Thread 2.9:5 (Thread 1073741888 lane 5): $13 = 153
<...>
Thread 2.2:2 (Thread 1073741824 lane 2): $42 = 102
Thread 2.2:0 (Thread 1073741824 lane 0): $43 = 100
Thread 1.2 (Thread 0x7ffff26dc700 (LWP 30173) "array-transform"): No symbol "element" in current context.
You can mix SIMD lane ranges with thread ranges and the thread wildcard. For example, to apply the command to all active lanes of all threads of inferior 2, you can use any of the following commands:
thread apply 2.2-9:*
thread apply 2.*:*
If the current inferior is 2, you can omit the inferior number:
thread apply 2-9:*
thread apply *:*
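The mixed forms above combine a thread range, or the thread wildcard, with the lane wildcard *. The command then runs on every active lane of every selected thread. The C++ sketch below expands such a selection from per-thread execution masks. SelectAllActiveLanes is a hypothetical name, and the map key is the thread number within the current inferior. The sketch visits threads in ascending order, which need not match GDB's own order; the all-lanes example above starts from the highest thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of "thread apply FIRST-LAST:*": for every existing
// thread in the inclusive range, select all lanes set in its execution mask.
// Keys are thread numbers within the current inferior; values are masks.
std::vector<std::pair<int, int>> SelectAllActiveLanes(
    const std::map<int, std::uint32_t>& threads, int first, int last) {
  assert(first <= last);
  std::vector<std::pair<int, int>> selected;  // (thread, lane) pairs
  for (const auto& [thread, mask] : threads) {
    if (thread < first || thread > last) continue;
    for (int lane = 0; lane < 32; ++lane) {
      if (mask & (std::uint32_t{1} << lane)) selected.emplace_back(thread, lane);
    }
  }
  return selected;
}
```

Calling it with the range 2 to 9 models thread apply 2.2-9:*. Threads without active lanes, such as the inactive thread 2.1, contribute nothing.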
You can define a set of actions for a breakpoint to be executed when the breakpoint is hit. By default, the actions are executed in the context of the SIMD lane selected after the hit.
Start a new debugging session and define two temporary breakpoints with actions, one for the else-branch and one for the then-branch:
tbreak 61
Example output:
Temporary breakpoint 1 at 0x40584a: file /path/to/array-transform.cpp, line 61.
commands
When you are asked to type commands, enter the following:
print element
end
tbreak 59
Example output:
Temporary breakpoint 2 at 0x40583c: file /path/to/array-transform.cpp, line 59.
commands /a
When you are asked to type commands, enter the following:
print element
end
Start the program:
run gpu
Example output:
<...omitted...>
Thread 2.2 hit Temporary breakpoint 1, with SIMD lanes [1 3 5 7], main::$_1::operator()<omitted> at array-transform.cpp:61
61 result = -1; // else-branch
$1 = 101
Continue to hit both breakpoints:
continue
Example output:
Continuing.
[Switching to SIMD lane 0]
Thread 2.2 hit Temporary breakpoint 2, with SIMD lanes [0 2 4 6], main::$_1::operator()<omitted> at array-transform.cpp:59
59 result = result + 50; // then-branch
$2 = 100
$3 = 102
$4 = 104
$5 = 106
The action of the breakpoint in the else-branch ran only for SIMD lane 1, the lane selected after the hit. The action of the then-branch breakpoint, defined with commands /a, ran for all active SIMD lanes.
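The two breakpoints differ only in how many lanes run the action list. The following C++ sketch models this. Without /a, the action runs once, in the lane selected after the hit (the first active lane). With commands /a, it runs once per active lane. LanesForActions is a hypothetical name, and the sketch models the observed behavior; it is not the debugger's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of breakpoint actions on a SIMD thread: returns the lanes
// in which the action list runs. Without /a only the lane that receives focus
// after the hit (the first active lane) runs it; with /a every active lane does.
std::vector<int> LanesForActions(std::uint32_t emask, bool all_lanes) {
  std::vector<int> lanes;
  for (int lane = 0; lane < 32; ++lane) {
    if (emask & (std::uint32_t{1} << lane)) {
      lanes.push_back(lane);
      if (!all_lanes) break;  // default: only the focused lane
    }
  }
  assert(lanes.size() <= 32);
  return lanes;
}
```

For the else-branch mask 0xAA, it returns only lane 1. For the then-branch mask 0x55 with all_lanes set, it returns lanes 0, 2, 4, and 6. Assuming element = 100 + lane in thread 2.2, this matches $1 = 101 and $2 through $5 above.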
Quit the debugging session and start the program from the beginning. This time, set a breakpoint at line 59 with the condition element == 106:
break 59 if element == 106
Example output:
Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 59.
Run the program (execute the run gpu command) and check if the output looks as follows:
Starting program: <path_to_array-transform> gpu
<omitted>
[Switching to Thread 1073741824 lane 6]
Thread 2.2 hit Breakpoint 1, with SIMD lane 6, main::$_1::operator()<omitted> at array-transform.cpp:59
59 result = result + 50; // then-branch
The condition is true for SIMD lane 6 in thread 2.2.
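Conceptually, the breakpoint condition is evaluated separately for each active SIMD lane. The hit reports only the lanes for which the condition holds, which is why the output names lane 6 alone. The following C++ sketch models this. LanesMatchingCondition is a hypothetical name. The element values 100 to 107 for thread 2.2 are an assumption inferred from the values printed earlier in this tutorial.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model of "break 59 if element == 106" on a SIMD thread: the
// condition is checked for every active lane, and only lanes where it holds
// are reported as hitting the breakpoint.
std::vector<int> LanesMatchingCondition(std::uint32_t emask,
                                        const std::vector<int>& element,
                                        const std::function<bool(int)>& condition) {
  assert(element.size() <= 32);
  std::vector<int> hits;
  for (int lane = 0; lane < static_cast<int>(element.size()); ++lane) {
    const bool active = (emask & (std::uint32_t{1} << lane)) != 0;
    if (active && condition(element[lane])) hits.push_back(lane);
  }
  return hits;
}
```

At line 59 only the then-branch lanes (mask 0x55) are active, so lane 6 is the single hit. With the else-branch mask 0xAA, the same condition would match no lane.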