List Question
20 TechQA 2024-03-21T21:01:25.743000
Convert Variable Width Bitstream (2-bit or 4-bit symbols) into Fixed Width
73 views
Asked by SapphireSun
ARM Neon Intrinsics - Lanes in FMA
55 views
Asked by Jacob FW
Preventing Arm Neon d8-d15 spilling in a function
43 views
Asked by Aki Suihkonen
ARM64 ASIMD intrinsic to load uint8_t* into uint16x8(x3)?
114 views
Asked by ImJustACowLol
How to use float16 neon intrinsics on Android?
62 views
Asked by fabian
Is there an ARM Neon Gather Instruction?
113 views
Asked by fabian
What's the difference between vrnd32x and vrndx?
28 views
Asked by xxxLD
How do you compute the bitwise exclusive prefix parity on ARM Neon?
109 views
Asked by Jan Schultke
How to convert 8-bit YUV420 image to RGB with Neon?
220 views
Asked by zuguorui
Do AArch64 SIMD instructions zero/sign extend results?
101 views
Asked by John Källén
How can I efficiently do bitwise majority voting on 3, 5, 7, 9 inputs with SSE/SSE2/AVX/...?
374 views
Asked by Philipp Gühring
Difference between vmovq_n_f32() and vdupq_n_f32()
82 views
Asked by Frank Ngwee
Optimize simd instructions (mov) for arm64 to pack alternating bytes into contiguous bytes (hex to uint64_t)
90 views
Asked by Stephane
How to init neon data type correctly in big endian
52 views
Asked by hstk
How to load global data to NEON registers more efficiently in Go's Assembler?
73 views
Asked by Emman Sun
How to Optimize 1024x1024 Matrix Multiplication in C to Match NumPy's Performance in M1 silicon
153 views
Asked by Steven Daniel Anderson
Does vfmaq_f32 really have higher running accuracy?
95 views
Asked by gaoshuzhendanteb
What is the difference between vfmaq_f32 and vmlaq_f32 in the Neon instruction set, and how do they differ in running speed and accuracy?
230 views
Asked by gaoshuzhendanteb
Cannot compile simple program which uses ARM Neon for Cortex A53
217 views
Asked by Douglas B