DPCT1116
Contents
DPCT1116#
Message#
The <original API> was migrated to <migrated API> for performance, and
<expression> is computed by (UINT_MAX + 1) / ('<argument>' + 1)
. This
migration requires the initial value of <value> to be scaled by multiplying
<expression>, and any usage of value of <value> outside atomic function to
be scaled by dividing <expression>.
Detailed Help#
By default, the migration tool uses the Compare-and-Swap atomic operation to
migrate the atomicInc
and atomicDec
functions. If the atomic API is in a
critical path, performance is not good.
To improve the performance, the migration tool checks the special value of the
second argument of atomicInc
/atomicDec
. If it is 2n – 1 (1 < n <= 32)
,
then the tool uses the atomic add
API to migrate atomicInc
/atomicDec
.
Using atomicInc(addr, val)
as an example, the essence of optimization involves:
Migrate the
atomicInc(addr, val)
function todpct::atomic_fetch_add(addr, step)
, wherestep
is calculated by(UINT_MAX + 1) / (val + 1)
. This strategy exploits the overflow behavior of unsigned integers to mimic the wrapping behavior ofatomicInc/Dec()
.Update the initial value of
*addr
to be scaled by multiplying withstep
, ensuring the wrapping behavior is consistent.Update the references to the value
*addr
outside atomic functions, dividing it bystep
to retrieve the intended value.
Suggestions to Fix#
For example, this original CUDA* code:
1__device__ unsigned int a1 = 5;
2__global__ void kernel(unsigned int *result){
3 ...
4 unsigned int old_val = atomicInc(&a1, 0x7fffffff);
5 ...
6 *result = a1;
7}
results in the following migrated SYCL* code:
1dpct::global_memory<unsigned int, 0> a1(5 * 2);
2void kernel(unsigned int*result, unsigned int &a1){
3 ...
4 /*
5 DPCT1116:0: The atomicInc was migrated to dpct::atomic_fetch_add(&a1, 2) / 2 for performance, and 2 is computed by (UINT_MAX + 1) / ('0x7fffffff' + 1). This migration requires the initial value of 'a1' to be scaled by multiplying 2, and any usage of the value of 'a1' outside atomic function to be scaled by dividing 2.
6 */
7 unsigned int old_val = dpct::atomic_fetch_add<sycl::access::address_space::generic_space>(&a1, 2) / 2;
8 ...
9 *result = a1 / 2; // Dividing a1 by step 2 to get the intended value.
10}
which is rewritten to:
1dpct::global_memory<unsigned int, 0> a1(5 * 2);
2void kernel(unsigned int*result, unsigned int &a1){
3 ...
4 unsigned int old_val = dpct::atomic_fetch_add<sycl::access::address_space::generic_space>(&a1, 2) / 2;
5 ...
6 *result = a1 / 2; // Dividing a1 by step 2 to get the intended value.
7}