SLAM Engineer

🌈 CUDA Programming: Thrust Hands-On, Part 2 (Feat. godbolt)


Let's practice the easy way on Compiler Explorer (godbolt)

Teaser

  • When a thought like … crosses your mind,
  • you can practice as shown below. (You have to wait a moment while it compiles.)

Overview

  • Recently (Q3 2023), thrust was folded into a parent library called CCCL (CUDA C++ Core Libraries).
  • thrust is a library that lets you write CUDA programs in a more modern-C++ style.
    • For example, looking at the example in the official GitHub repository, it is far more concise and safer than writing a custom kernel.

    • ps. I asked GPT-4 to explain a bit more about the benefits of using a high-level API such as thrust.

      Using a high-level library such as Thrust has several important advantages over writing custom CUDA kernels by hand. Here are several of them, including those two:

      1. Automatic synchronization management: High-level libraries such as Thrust manage the necessary synchronization internally. This frees the programmer from worrying about CUDA streams and synchronization mechanisms. For example, when data is transferred from the GPU to the CPU or vice versa, Thrust automatically inserts the appropriate synchronization points to guarantee data consistency.

      2. Concise, readable code: Thrust provides an interface similar to the C++ Standard Template Library (STL), so programmers can write GPU code in a familiar form. This improves readability and maintainability and reduces the likelihood of errors.

      3. Efficient memory management: Thrust provides automatic memory management. This spares the programmer the complexity of managing memory allocation and deallocation by hand. Thrust also optimizes memory access patterns so that the GPU's memory bandwidth is used efficiently.

      4. Performance optimization: Thrust applies a variety of optimization techniques internally. For example, in the implementations of its parallel algorithms, factors such as block size and shared memory usage are tuned automatically. This lets the programmer write efficient GPU code without worrying about such details.

      5. Broad support for parallel algorithms: Thrust provides a variety of parallel algorithms such as sort, reduction, and scan. This makes it easy to implement complex algorithms and helps lower the barrier to entry for GPU programming.

      6. Portability and compatibility: Thrust code is compatible across a range of CUDA versions and GPU architectures. This helps the program run reliably in diverse hardware and software environments.

      Overall, with Thrust the programmer can develop high-performance GPU applications faster and more easily, without worrying about complex hardware details.

  • The previous blog tutorial covered one example of building and practicing locally.
    • C++ is a lot of fun, but setting up a build environment is always a hassle. Getting past that just to start studying seems to be a real barrier for beginners.
    • On top of that, studying CUDA requires a machine with an NVIDIA GPU, so not having one is yet another barrier.
  • That is why I recommend Compiler Explorer.

What is Compiler Explorer?

  • Compiler Explorer is a tool created by Matt Godbolt; think of it as a web tool that compiles and runs code in all sorts of languages.
    • It currently supports many languages, but it is used mostly by C/C++ users.
    • For the detailed history, I recommend the following video, presented by the author himself.
    • Usage example
      • C++ Weekly - Ep 332 - C++ Lambda vs std::function vs Function Pointer
        • When you want to explain whether a lambda, a std::function, and a function pointer are actually different under the hood,
          • the video compiles the code and, from what appears in the right-hand pane, analyzes why they differ at the assembly level! Below is a capture from that YouTube video:

        • It is a useful tool for giving that kind of explanation.
  • ps. The URL is https://godbolt.org/, which is why people often just call it godbolt instead of Compiler Explorer.

Summary

  • Anyway, the point I wanted to make here is that Compiler Explorer also supports building (NVCC) and running CUDA code.

Hands-on

  • Open this link, where the NVCC compiler is already set up.
  • Then copy the code here, paste it in, and observe the result.
    • This is example code that projects a point cloud onto a spherical image.
      • In general, doing this on the CPU for hundreds of thousands to millions of points (or more) would be very slow.
    • Let's take a closer look at the implementation.

Implementation

  • Out of the whole code, the part that does the actual work is the following.

    • There is nothing difficult about this code at all. Write a functor, pass it as an argument to ::transform, and you are done!
      • thrust::transform then traverses the array and applies the same operation to every element, which is a very typical functional-programming-style implementation.
        • Except that it now takes care of GPU parallelism, block partitioning, synchronization, and so on for you!
        • The only thing to keep in mind is that the container is thrust::device_vector rather than std::vector.
    • ps. GPT-4 wrote the entire implementation above! See the work log.
      • Thanks to that, I could start studying right away without knowing all the cuda and thrust API names.
      • Tip: That said, it sometimes hallucinates or hands you code that does not build (pseudo-like code), which has been a mild concern when getting coding help during the year since ChatGPT launched (though it has kept improving a lot). In those cases, pasting the code straight into godbolt to check whether it builds, and going back and forth that way, made for efficient studying.
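The functor-plus-`thrust::transform` pattern described above can be sketched roughly as follows. This is not the post's original code: the names `Point3D`, `Pixel`, `SphericalProjection`, and `project_on_gpu` are hypothetical, and the equirectangular mapping of azimuth and elevation onto a 640x480 image is an assumption made for illustration. The `HD` macro is only there so that the same functor also compiles as plain host C++ when nvcc is not available.

```cpp
#include <cmath>

#ifdef __CUDACC__
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#define HD __host__ __device__
#else
#define HD  // plain host build: no CUDA qualifiers
#endif

// Hypothetical types for this sketch (not taken from the original post).
struct Point3D { float x, y, z; };
struct Pixel   { int u, v; };

// Functor: maps one 3D point to a pixel of a width x height spherical image.
struct SphericalProjection {
    int width, height;

    HD Pixel operator()(const Point3D& p) const {
        const float kPi = 3.14159265358979f;
        const float range = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
        if (range <= 0.0f) return Pixel{-1, -1};         // the origin has no direction
        const float azimuth   = std::atan2(p.y, p.x);    // in [-pi, pi]
        const float elevation = std::asin(p.z / range);  // in [-pi/2, pi/2]
        // Assumed layout: azimuth selects the column, elevation selects the row.
        int u = static_cast<int>((azimuth + kPi) / (2.0f * kPi) * width);
        int v = static_cast<int>((kPi / 2.0f - elevation) / kPi * height);
        if (u >= width)  u = width - 1;    // clamp the azimuth == +pi edge
        if (v >= height) v = height - 1;   // clamp the straight-down edge
        return Pixel{u, v};
    }
};

#ifdef __CUDACC__
// A single call applies the functor to every element on the GPU.
inline void project_on_gpu(const thrust::device_vector<Point3D>& points,
                           thrust::device_vector<Pixel>& pixels) {
    pixels.resize(points.size());
    thrust::transform(points.begin(), points.end(), pixels.begin(),
                      SphericalProjection{640, 480});
}
#endif
```

Under nvcc, calling `project_on_gpu(points, pixels)` with a filled `thrust::device_vector<Point3D>` writes one `Pixel` per point on the device, with no hand-written kernel or launch configuration.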

Experiment environment

  • As of now (2023.12.10), you can see that it runs your code (for free) on a T4 with a whopping 16 GB… haha
  • To check this (GPT-4 will tell you this sort of thing too if you ask):
      // Get the GPU properties
      int device;
      cudaGetDevice(&device);
      cudaDeviceProp properties;
      cudaGetDeviceProperties(&properties, device);
      std::cout << "Using GPU: " << properties.name << std::endl;
    
      // Get the CUDA versions (encoded as 1000 * major + 10 * minor)
      int runtimeVer;
      cudaRuntimeGetVersion(&runtimeVer);
      std::cout << "CUDA Runtime Version: " << runtimeVer / 1000 << "." << (runtimeVer % 1000) / 10 << std::endl;
    
      int driverVer;
      cudaDriverGetVersion(&driverVer);
      std::cout << "CUDA Driver Version: " << driverVer / 1000 << "." << (driverVer % 1000) / 10 << std::endl;
    
  • Result:

      Using GPU: Tesla T4
      CUDA Runtime Version: 12.2
      CUDA Driver Version: 12.2
    

Experiment setup

  • An example that randomly generates N points and projects them onto an arbitrarily chosen image size (640x480).

Experiment result screen

  • When run on godbolt, the results are also printed, as shown below.

Experiment results (execution time)

  • The result values themselves are just an example and carry little meaning, so let's measure the time instead.
  • Reportedly, cudaEventRecord lets you measure the time of a code block that runs on the device (GPU) as well.
    • Example
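As a rough illustration of timing with CUDA events, here is a minimal sketch. The helper name `elapsed_ms` is hypothetical (it is not from the original post), error checking is omitted for brevity, and the `std::chrono` branch is an added assumption so that the helper can also be compiled on a machine without nvcc.

```cpp
#include <chrono>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: runs `work` once and returns the elapsed milliseconds.
template <typename Work>
float elapsed_ms(Work&& work) {
#ifdef __CUDACC__
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);       // enqueue "start" on the default stream
    work();                       // e.g. a thrust::transform call
    cudaEventRecord(stop);        // enqueue "stop" after the GPU work
    cudaEventSynchronize(stop);   // block the host until "stop" has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // time between the two events
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
#else
    // Host-only fallback (assumption for GPU-less builds): wall-clock time.
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<float, std::milli>(t1 - t0).count();
#endif
}
```

A call such as `float ms = elapsed_ms([&] { /* GPU work here */ });` would then give a per-stage time comparable to the "Spherical transformation time" lines in the logs, assuming the work is issued on the default stream.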

  • Varying N, with 3 runs each:
    • N = 1000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 109.000000, free = 14821.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 0.036864 ms
        * Spherical transformation time: 0.060384 ms
        * Copying to host time: 0.032896 ms
        Total execution time: 0.130144 ms
      
        * Point generation time: 0.016064 ms
        * Spherical transformation time: 0.017056 ms
        * Copying to host time: 0.02736 ms
        Total execution time: 0.06048 ms
      
        * Point generation time: 0.016192 ms
        * Spherical transformation time: 0.016768 ms
        * Copying to host time: 0.024384 ms
        Total execution time: 0.057344 ms
      
    • N = 10000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 109.000000, free = 14821.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 0.038912 ms
        * Spherical transformation time: 0.057376 ms
        * Copying to host time: 0.091008 ms
        Total execution time: 0.187296 ms
      
        * Point generation time: 0.018336 ms
        * Spherical transformation time: 0.017312 ms
        * Copying to host time: 0.086208 ms
        Total execution time: 0.121856 ms
      
        * Point generation time: 0.018464 ms
        * Spherical transformation time: 0.01664 ms
        * Copying to host time: 0.086176 ms
        Total execution time: 0.12128 ms
      
    • N = 100000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 109.000000, free = 14821.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 0.064512 ms
        * Spherical transformation time: 0.09008 ms
        * Copying to host time: 0.683456 ms
        Total execution time: 0.838048 ms
      
        * Point generation time: 0.042784 ms
        * Spherical transformation time: 0.045728 ms
        * Copying to host time: 0.723328 ms
        Total execution time: 0.81184 ms
      
        * Point generation time: 0.043008 ms
        * Spherical transformation time: 0.045056 ms
        * Copying to host time: 0.670624 ms
        Total execution time: 0.758688 ms
      
    • N = 1000000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 127.000000, free = 14803.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 0.354464 ms
        * Spherical transformation time: 0.399936 ms
        * Copying to host time: 4.54739 ms
        Total execution time: 5.30179 ms
      
        * Point generation time: 0.323712 ms
        * Spherical transformation time: 0.342688 ms
        * Copying to host time: 4.5623 ms
        Total execution time: 5.2287 ms
      
        * Point generation time: 0.323584 ms
        * Spherical transformation time: 0.341408 ms
        * Copying to host time: 4.63437 ms
        Total execution time: 5.29936 ms
      
    • N = 5000000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 205.000000, free = 14725.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 1.76397 ms
        * Spherical transformation time: 1.72032 ms
        * Copying to host time: 20.9988 ms
        Total execution time: 24.483 ms
      
        * Point generation time: 1.7425 ms
        * Spherical transformation time: 1.67654 ms
        * Copying to host time: 20.9689 ms
        Total execution time: 24.388 ms
      
        * Point generation time: 1.6937 ms
        * Spherical transformation time: 1.62406 ms
        * Copying to host time: 21.2419 ms
        Total execution time: 24.5596 ms
      
    • N = 7000000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 243.000000, free = 14687.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 2.51101 ms
        * Spherical transformation time: 2.38698 ms
        * Copying to host time: 29.7932 ms
        Total execution time: 34.6912 ms
      
        * Point generation time: 2.48816 ms
        * Spherical transformation time: 2.34291 ms
        * Copying to host time: 29.6979 ms
        Total execution time: 34.5289 ms
      
        * Point generation time: 2.49011 ms
        * Spherical transformation time: 2.34176 ms
        * Copying to host time: 29.7214 ms
        Total execution time: 34.5533 ms
      
    • N = 9000000
        Memory usage at before allocation: used = 105.000000, free = 14825.562500 MB, total = 14930.562500 MB
        Memory usage at after allocation: used = 281.000000, free = 14649.562500 MB, total = 14930.562500 MB
      
        * Point generation time: 3.10112 ms
        * Spherical transformation time: 2.92096 ms
        * Copying to host time: 37.5393 ms
        Total execution time: 43.5613 ms
      
        * Point generation time: 3.07869 ms
        * Spherical transformation time: 2.87117 ms
        * Copying to host time: 37.7315 ms
        Total execution time: 43.6814 ms
      
        * Point generation time: 3.08115 ms
        * Spherical transformation time: 2.86979 ms
        * Copying to host time: 37.6602 ms
        Total execution time: 43.6111 ms
      
    • N = 10000000
        Program terminated with signal: SIGKILL
      
      • godbolt will not run ten million points… heh
  • Plotting the results above (averages over the 3 runs):

    • Through this experiment we could quickly reconfirm (thanks to godbolt) facts we already knew.
      1. GPUs process parallel work quickly.
        • It depends on the task, but projecting 9 million points from 3D to 2D (a spherical image) took about 3 ms on the T4.
      2. With the thrust API, the programmer can reap this benefit with very little effort.
      3. Copies between the CPU and the GPU are very slow: at 9 million points, copying to the host took about 38 ms.
        • ps. Memory bandwidth being such a bottleneck is probably also why the term HBM is so hot these days.
        • In the example above, the random generation of the input point cloud was also done on the device side via thrust::transform, so it too took very little time.
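To put a number on point 3, the three N = 9000000 runs in the log can be averaged. The sketch below does only that arithmetic; the helper names are made up for illustration, and the timing values are copied from the log above.

```cpp
#include <cstdio>

// Hypothetical helper: arithmetic mean of three timing samples.
inline double mean3(double a, double b, double c) { return (a + b + c) / 3.0; }

// Timings in ms for N = 9000000, copied from the three runs in the log.
inline double mean_transform_ms() { return mean3(2.92096, 2.87117, 2.86979); }
inline double mean_copy_ms()      { return mean3(37.5393, 37.7315, 37.6602); }

// How many times longer the copy to host takes than the projection itself.
inline double copy_to_transform_ratio() {
    return mean_copy_ms() / mean_transform_ms();
}

// Prints the two averages and their ratio on one line.
inline void print_summary() {
    std::printf("transform: %.3f ms, copy: %.3f ms, ratio: %.1fx\n",
                mean_transform_ms(), mean_copy_ms(), copy_to_transform_ratio());
}
```

With these values the copy to host takes roughly 13 times as long as the projection, which is why keeping data on the device between steps matters so much.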

Conclusion

  • godbolt is a great site.
    • You can also share your work. It is excellent for teaching.

    • Search YouTube for godbolt cppcon and you will find many more excellent talks by Matt Godbolt.
  • Let's keep going with the thrust practice.