Implementation and comparison of the Sobel operator on CPU and GPU using CUDA

Authors: Spiridonov K.A., Stulov I.S., Ferapontov I.A.

Journal: International Journal of Humanities and Natural Sciences @intjournal

Section: Technical sciences

Published in issue: 10-5 (97), 2024.

Free access

This article examines the Sobel operator, which is used to detect edges in images. Two implementations are considered: one on the central processing unit (CPU) and one on the graphics processing unit (GPU). The paper discusses the technical aspects of the GPU implementation, including optimization and the distribution of computations across the graphics architecture. It also compares the performance of the method on the CPU and on the GPU, which makes it possible to evaluate how efficient the GPU is for such tasks. Finally, the article covers key aspects of algorithm development with CUDA, a platform designed for parallel computing on GPUs.


Keywords: CPU, GPU, CUDA, Sobel operator

Short URL: https://sciup.org/170207083

IDR: 170207083   |   DOI: 10.24412/2500-1000-2024-10-5-66-69

Article text

Computing derivatives is one of the most important uses of convolution. Derivatives play a very important role in mathematics and physics, and the same is true of computer vision. The images we work with consist of pixels, each of which, for a grayscale image, stores a brightness value. In other words, an image is simply a two-dimensional matrix of numbers. The derivative of an image is therefore the ratio of the change in pixel brightness to the change in pixel position.
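For a discrete signal, this ratio can be approximated with finite differences. The following is a minimal C++ sketch, assuming a single image row stored as a vector of brightness values; the function name `central_difference` and the border handling are illustrative and do not come from the article.

```cpp
#include <cstddef>
#include <vector>

// Approximates dA/dx along one image row with central differences:
// d[x] = (row[x + 1] - row[x - 1]) / 2, clamping indices at the borders.
std::vector<double> central_difference(const std::vector<double>& row) {
    std::vector<double> d(row.size(), 0.0);
    if (row.empty()) return d;
    const std::size_t last = row.size() - 1;
    for (std::size_t x = 0; x <= last; ++x) {
        const std::size_t left = (x == 0) ? 0 : x - 1;
        const std::size_t right = (x == last) ? last : x + 1;
        d[x] = (row[right] - row[left]) / 2.0;
    }
    return d;
}
```

For the row {10, 10, 10, 200, 200, 200} the result is nonzero only at the two pixels adjacent to the brightness step, which is exactly where an edge detector should respond.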

When working with an image A, we work with a function of two variables A(x, y), that is, with a scalar field. It is therefore more accurate to speak of the gradient of the image rather than its derivative.

The operator calculates the brightness gradient of the image at each point, that is, the direction of the greatest increase in brightness and the magnitude of the change in that direction. The result shows how sharply or smoothly the brightness changes at each point, and therefore how likely the point is to lie on an edge, as well as the orientation of that edge. In practice, the magnitude of the brightness change (the likelihood of belonging to an edge) is more reliable and easier to interpret than the direction.
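In standard notation, given here for reference with \theta as an assumed symbol for the direction, the gradient of A, its magnitude, and its direction can be written as:

```latex
\nabla A = \left( \frac{\partial A}{\partial x},\; \frac{\partial A}{\partial y} \right), \qquad
\lVert \nabla A \rVert = \sqrt{\left( \frac{\partial A}{\partial x} \right)^{2} + \left( \frac{\partial A}{\partial y} \right)^{2}}, \qquad
\theta = \operatorname{atan2}\!\left( \frac{\partial A}{\partial y},\; \frac{\partial A}{\partial x} \right)
```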

One such convolution is the Sobel operator. This operator is used in computer vision to detect edges. To apply the Sobel operator, the image A is convolved with two matrices:

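$$
G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * A,
\qquad
G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A
$$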

where * denotes the convolution operation.
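As a worked example, the standard 3x3 Sobel kernels can be applied to a single grayscale patch. The C++ sketch below is only an illustration: the names `KX`, `KY`, `Gradient`, and `sobel_at_patch` are hypothetical. Like the listings in this article, it weights the patch with the kernel elementwise without flipping the kernel, which changes only the sign of the responses and not the magnitude.

```cpp
#include <algorithm>
#include <cmath>

// Standard 3x3 Sobel kernels, indexed [row][col].
const int KX[3][3] = {{1, 0, -1}, {2, 0, -2}, {1, 0, -1}};
const int KY[3][3] = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}};

struct Gradient {
    double gx, gy, magnitude;
};

// Weights a 3x3 grayscale patch (indexed [row][col]) with both kernels and
// returns the two responses plus the magnitude saturated to 255.
Gradient sobel_at_patch(const double patch[3][3]) {
    Gradient g{0.0, 0.0, 0.0};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            g.gx += KX[r][c] * patch[r][c];
            g.gy += KY[r][c] * patch[r][c];
        }
    }
    g.magnitude = std::min(255.0, std::sqrt(g.gx * g.gx + g.gy * g.gy));
    return g;
}
```

For a patch with a vertical edge, {{0, 0, 255}, {0, 0, 255}, {0, 0, 255}}, the horizontal response is -1020, the vertical response is 0, and the saturated magnitude is 255. A uniform patch gives zero in all three fields.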

CPU Implementation:

#include <algorithm>  // std::min, std::max
#include <cmath>      // std::sqrt
#include <cstdint>    // uint8_t, int8_t

// rgb_to_gray() is defined elsewhere and converts an RGB triple to brightness.
void apply_sobel_operator(uint8_t *img, int width, int height, int channels, uint8_t *res,
                          int8_t Wx[][3], int8_t Wy[][3]) {
    double Gx, Gy, grad;
    for (int i = 0; i < width; ++i) {
        for (int j = 0; j < height; ++j) {
            Gx = 0;
            Gy = 0;
            for (int u = -1; u <= 1; ++u) {
                for (int v = -1; v <= 1; ++v) {
                    // Clamp neighbour coordinates to the image borders.
                    int ip = std::max(std::min(i + u, width - 1), 0),
                        jp = std::max(std::min(j + v, height - 1), 0);
                    int k = (jp * width + ip) * channels;
                    double pix = rgb_to_gray(img[k], img[k + 1], img[k + 2]);
                    Gx += Wx[u + 1][v + 1] * pix;
                    Gy += Wy[u + 1][v + 1] * pix;
                }
            }
            // Gradient magnitude, saturated to the 8-bit range.
            grad = std::min(255., std::sqrt(Gx * Gx + Gy * Gy));
            int o = (j * width + i) * channels;
            res[o] = static_cast<uint8_t>(grad);
            res[o + 1] = static_cast<uint8_t>(grad);
            res[o + 2] = static_cast<uint8_t>(grad);
            // Copy alpha unchanged; only valid when the image has four channels.
            if (channels == 4) res[o + 3] = img[o + 3];
        }
    }
}

The CPU implementation has no special or advanced features. One area for improvement is the convolution loop itself. By changing how the image matrix is stored, we could reduce the number of cache misses and thereby improve performance. However, this would require preprocessing the image, which in turn requires additional memory.
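One possible form of such preprocessing is to convert the interleaved image into a contiguous single-channel buffer once, so that the convolution reads one value per pixel instead of recomputing brightness for every neighbour. The sketch below is an assumption about what the optimization could look like, not the authors' code. The name `to_gray_buffer` is hypothetical, and the weights 0.299, 0.587, and 0.114 are taken from the GPU listing on the assumption that `rgb_to_gray` uses the same ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved row-major image (channels >= 3) into a contiguous
// single-channel brightness buffer of width * height values.
std::vector<double> to_gray_buffer(const std::uint8_t* img, int width, int height, int channels) {
    std::vector<double> gray(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {      // rows in the outer loop ...
        for (int x = 0; x < width; ++x) {   // ... so memory is read sequentially
            const std::size_t p = static_cast<std::size_t>(y) * width + x;
            const std::size_t k = p * channels;
            gray[p] = 0.299 * img[k] + 0.587 * img[k + 1] + 0.114 * img[k + 2];
        }
    }
    return gray;
}
```

The cost is the additional memory mentioned above: one `double` per pixel in this sketch, which could be reduced by storing the brightness as `float` or `uint8_t`.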

GPU Implementation:

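__constant__ char Wx[3][3], Wy[3][3];  // Sobel kernels in constant memory

__global__ void apply_sobel_operator(cudaTextureObject_t img, uchar4 *res, int width, int height) {
    // Global thread indices and grid-stride offsets in both dimensions.
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    int idy = blockDim.y * blockIdx.y + threadIdx.y;
    int off_x = blockDim.x * gridDim.x;
    int off_y = blockDim.y * gridDim.y;
    double Gx, Gy, grad, pix;
    uchar4 p;
    for (int y = idy; y < height; y += off_y)
        for (int x = idx; x < width; x += off_x) {
            Gx = 0;
            Gy = 0;
            for (int u = -1; u <= 1; ++u) {
                for (int v = -1; v <= 1; ++v) {
                    // Reads outside the image are resolved by the texture address mode.
                    p = tex2D<uchar4>(img, x + u, y + v);
                    pix = 0.299 * p.x + 0.587 * p.y + 0.114 * p.z;
                    Gx += Wx[u + 1][v + 1] * pix;
                    Gy += Wy[u + 1][v + 1] * pix;
                }
            }
            grad = min(255., sqrt(Gx * Gx + Gy * Gy));
            // p holds the last neighbour read, so its alpha value is reused here.
            res[y * width + x] = make_uchar4(grad, grad, grad, p.w);
        }
}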
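The two outer loops of the kernel form a grid-stride pattern: each thread starts at its own global index and advances by the total number of launched threads, so any launch configuration covers the whole image. The host-side C++ emulation below illustrates this under the assumption that `idx` and `idy` are the global thread indices and `off_x` and `off_y` are the total thread counts per dimension. The function `count_visits` is hypothetical and only counts how often each pixel would be processed.

```cpp
#include <cstddef>
#include <vector>

// Emulates the kernel's 2D grid-stride loops on the host. Every emulated
// thread (idx, idy) visits x = idx, idx + off_x, ... and y = idy, idy + off_y, ...
// Returns the number of visits per pixel in row-major order.
std::vector<int> count_visits(int width, int height,
                              int grid_x, int grid_y, int block_x, int block_y) {
    const int off_x = grid_x * block_x;  // total threads along x
    const int off_y = grid_y * block_y;  // total threads along y
    std::vector<int> visits(static_cast<std::size_t>(width) * height, 0);
    for (int idy = 0; idy < off_y; ++idy)
        for (int idx = 0; idx < off_x; ++idx)
            for (int y = idy; y < height; y += off_y)
                for (int x = idx; x < width; x += off_x)
                    ++visits[static_cast<std::size_t>(y) * width + x];
    return visits;
}
```

Calling `count_visits(100, 100, 1, 1, 32, 1)` yields a count of one for every pixel: even a single block of 32 threads processes the full image, only with far less parallelism than a larger grid.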

A little bit about constant memory

Constant memory is one of the fastest types of memory available on the GPU. Its distinctive feature is that data can be written to it from the host, while code running on the GPU can only read it, which is where the name comes from. The __constant__ specifier places data in constant memory.

If an array is needed in constant memory, its size must be specified in advance, because constant memory, unlike global memory, does not support dynamic allocation. The cudaMemcpyToSymbol function writes from the host to constant memory, and cudaMemcpyFromSymbol copies from the device back to the host. This approach differs somewhat from the one used with global memory.

To write to constant memory, use the following calls:

cudaMemcpyToSymbol(Wx, host_Wx, 9);

cudaMemcpyToSymbol(Wy, host_Wy, 9);

Benchmarks and results:

Table 1. Benchmark: execution time, ms, by size of test image

Configuration    100x100    500x500    1000x1000    2000x2000    5000x5000
CPU                1.902     48.823      200.103      853.682     5218.232
1x1, 32x1          0.618     11.989       65.734      232.912     1308.420
1x1, 32x32         0.179      3.102       15.083       49.431      299.033
32x32, 32x8        0.111      0.732        2.682       10.001       58.932
32x32, 32x32       0.157      0.973        3.992       12.783       59.562
64x64, 32x8        0.204      1.291        4.712       11.421       60.058
64x64, 32x32       0.361      1.401        4.302       16.103       62.842
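The speedup of the GPU over the CPU follows directly from the table as the ratio of execution times. The short C++ sketch below computes it for the "32x32, 32x8" configuration, which has the lowest time for every image size in Table 1. The array and function names are illustrative.

```cpp
#include <array>
#include <cstddef>

// Timings from Table 1, in ms, for 100x100 ... 5000x5000 test images.
const std::array<double, 5> CPU_MS = {1.902, 48.823, 200.103, 853.682, 5218.232};
const std::array<double, 5> GPU_MS = {0.111, 0.732, 2.682, 10.001, 58.932};  // "32x32, 32x8"

// Speedup of the GPU configuration over the CPU for each image size.
std::array<double, 5> speedups() {
    std::array<double, 5> s{};
    for (std::size_t i = 0; i < s.size(); ++i) s[i] = CPU_MS[i] / GPU_MS[i];
    return s;
}
```

The ratio grows with image size, from roughly 17x for the 100x100 image to roughly 88.5x for the 5000x5000 image.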

Results:

Figure 1. Original picture

Figure 2. The Sobel operator applied to that image

Article type: scientific article