Running Yeppp! library with Mono on Raspberry Pi


I have a C# application that uses the Yeppp! SIMD library. It runs perfectly on Windows x86-32 and x86-64. However, when I run it on a Raspberry Pi with Mono, I get the following exception (I'm not sure whether it's an ARM issue, a Mono issue, or something else). I also tried running as root, just to check, and got the same exception. I noticed the "UnixLibraryLoader" part of the stack trace, so I made sure the Yeppp! DLL (Yeppp.CLR.Bundle.dll) is in the same directory as the executable, which it is. Is this a problem with my code, the way I compiled it, or the library?

    Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) Yeppp.UnixLibraryLoader.dlopen (string,int) <0xffffffff>
  at Yeppp.UnixLibraryLoader.Yeppp.INativeLibraryLoader.LoadLibrary (string) <0x0002f>
  at Yeppp.NativeLibrary..ctor (string,Yeppp.INativeLibraryLoader) <0x0006b>
  at Yeppp.Loader.LoadNativeLibrary () <0x000db>
  at Yeppp.Library.Init () <0x00027>
  at <Module>..cctor () <0x0000b>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
  at <unknown> <0xffffffff>
  at SimdSpeedTest.Program.DisplayCpuFeatures () <0x00033>
  at SimdSpeedTest.Program.Main (string[]) <0x000c7>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:


Debug info from gdb:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb5b7b430 (LWP 2272)]
0xb6eabaac in waitpid () from /lib/arm-linux-gnueabihf/libpthread.so.0
  Id   Target Id         Frame
  2    Thread 0xb5b7b430 (LWP 2272) "mono" 0xb6ea9770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
* 1    Thread 0xb6f80000 (LWP 2271) "mono" 0xb6eabaac in waitpid () from /lib/arm-linux-gnueabihf/libpthread.so.0

Thread 2 (Thread 0xb5b7b430 (LWP 2272)):
#0  0xb6ea9770 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0x001fff10 in mono_sem_wait (sem=0x2f523c, alertable=1) at mono-semaphore.c:119
#2  0x0017db28 in finalizer_thread (unused=<optimized out>) at gc.c:1073
#3  0x001625b4 in start_wrapper_internal (data=0xb0d8c8) at threads.c:643
#4  start_wrapper (data=0xb0d8c8) at threads.c:688
#5  0x001f5c30 in thread_start_routine (args=0xac86c0) at wthreads.c:294
#6  0x00204268 in inner_start_thread (arg=0xac86b4) at mono-threads-posix.c:49
#7  0xb6ea2c00 in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8  0xb6e0f728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#9  0xb6e0f728 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb6f80000 (LWP 2271)):
#0  0xb6eabaac in waitpid () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1  0x000b2148 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>) at mini-exceptions.c:2299
#2  0x00027af8 in mono_sigsegv_signal_handler (_dummy=11, info=0xbe9280e0, context=0xbe928160) at mini.c:6777
#3  <signal handler called>
#4  0xb6f6d754 in ?? () from /lib/ld-linux-armhf.so.3
#5  0xbe9284a0 in ?? ()
Cannot access memory at address 0x3000
#6  0xbe9284a0 in ?? ()
Cannot access memory at address 0x3000
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

There are 3 answers

2
Monoman On

My guess is that Mono on the Raspberry Pi's ARMv6 hard-float architecture has trouble handling the intentional SIGILL raised by the feature detection code in Yeppp! (https://bitbucket.org/MDukhan/yeppp/src/40148ba4cdd00b03dfa880f6b7cecce83979c9d3/library/sources/library/Probe.arm.asm?at=default) and crashes as a result.

Detection relies on SIGILL handling that simply skips the unsupported instructions. Another possibility is that the native library is not retrieved correctly from the resource (or the wrong one is retrieved, since Yeppp.Loader.LoadNativeLibrary guesses which native library to use for the architecture it is running on), so things crash when execution is passed to it.

I second the suggestion to contact the developer: I could not find anything on the site, or in the source files I perused, indicating that the Raspberry Pi and its oldish version of ARM are supported.

PS: I assumed you are using Raspbian, which uses hard-float, and a recent version of Mono (which initially used the incompatible soft-float).

0
Z boson On

Now that the bounty is over let me put my comments into an answer here.

You're using a preview release, so it's perhaps not surprising that it's not working as you expect.


The Raspberry Pi 1 does not have NEON but does have VFPv2. The VFP instructions are not SIMD instructions, despite the misleading name Vector Floating Point (see arm-cortex-a8-whats-the-difference-between-vfp-and-neon and VFP SIMD Instructions, howto?).

VFPv2 was introduced with the ARMv5TE, ARMv5TEJ and ARMv6 architectures. So even though the source code for Yeppp! does not reference ARMV6 explicitly that does not necessarily mean it does not support VFPv2 since it does reference ARMV5T.

What would be the advantage of using Yeppp! with the Raspberry Pi 1 then since VFP instructions are not SIMD instructions? My guess is that GCC does not implement these well and so it may be advantageous to do it explicitly with Yeppp!.
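One way to confirm which floating-point units a board exposes is to read the `Features` line of `/proc/cpuinfo` on 32-bit ARM Linux. The following is a minimal sketch; the helper name `cpu_features` and the sample text are illustrative assumptions, and the exact flag list varies by kernel and board.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Return the flags on the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()  # no 'Features' line (e.g. on x86, which uses 'flags')


# Illustrative sample only (assumption): roughly what an ARMv6 board reports.
SAMPLE_ARMV6 = (
    "Processor\t: ARMv6-compatible processor rev 7 (v6l)\n"
    "Features\t: swp half thumb fastmult vfp edsp java tls\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_ARMV6  # fall back to the sample off Linux
    flags = cpu_features(text)
    print("VFP:", "vfp" in flags, "NEON:", "neon" in flags)
```

With the sample text, `vfp` is present and `neon` is absent, which matches the description of the Raspberry Pi 1 above.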


I'm not sure what the peak FLOPS of the Raspberry Pi 1 is. However, benchmarks have measured

  • 0.041 DP GFLOPS
  • 0.192 SP GFLOPS

The Cortex-A7 cores of the Raspberry Pi 2, which have NEON and VFP3, can do:

  • 0.5 DP FLOPs/cycle: scalar VMLA.F64 every four cycles.
  • 1.0 DP FLOPs/cycle: scalar VADD.F64 every cycle.
  • 2.0 SP FLOPs/cycle: scalar VMLA.F32 every cycle.
  • 2.0 SP FLOPs/cycle: 2-wide VMLA.F32 every other cycle.

The Raspberry Pi 2 has four cores, so the peak FLOPS is 4 × (FLOPs/cycle per core) × clock frequency.
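As a worked example of that arithmetic, the sketch below multiplies the per-cycle figures listed above by the core count and clock. The 900 MHz clock is an assumption (the stock Raspberry Pi 2 clock), and `peak_gflops` is a hypothetical helper name; the result is a theoretical upper bound, not a measurement.

```python
def peak_gflops(cores: int, flops_per_cycle: float, clock_ghz: float) -> float:
    """Theoretical peak = cores * FLOPs/cycle per core * clock in GHz."""
    return cores * flops_per_cycle * clock_ghz


PI2_CORES = 4
PI2_CLOCK_GHZ = 0.9  # assumption: stock 900 MHz clock, no overclocking

# Best per-cycle figures from the list above:
# 2.0 SP FLOPs/cycle (VMLA.F32) and 1.0 DP FLOPs/cycle (VADD.F64).
sp_peak = peak_gflops(PI2_CORES, 2.0, PI2_CLOCK_GHZ)
dp_peak = peak_gflops(PI2_CORES, 1.0, PI2_CLOCK_GHZ)

print(f"SP peak: {sp_peak:.1f} GFLOPS, DP peak: {dp_peak:.1f} GFLOPS")
# prints "SP peak: 7.2 GFLOPS, DP peak: 3.6 GFLOPS"
```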

Note that the peak FLOPS of NEON on the Cortex-A7 is the same as the peak FLOPS of VFP. The Cortex-A7 is 100% binary instruction set compatible with the Cortex-A15, which is why it is used in the ARM big.LITTLE design. So NEON is implemented in the Cortex-A7 only for compatibility.

I don't know about integer operations per cycle yet.

However, there is another SIMD option for the Raspberry Pi 1 and 2. You can use integer SIMD instructions on the VideoCore IV (see also NEON instruction set support SIMD). You could implement fixed-point arithmetic with this, which could potentially give you much more performance than NEON anyway.

0
Marat Dukhan On

Yeppp! supports two Linux ARM platforms:

  • ARMv5TE + soft-float ABI (arm-linux-gnueabi)
  • ARMv7-A + hard-float ABI (arm-linux-gnueabihf)

Most Linux distributions for the Raspberry Pi use an unusual combination of ARMv6 + hard-float ABI. Yeppp!'s ARMv7-A + hard-float version uses Thumb-2 instructions, which are not supported by the Raspberry Pi. That is why you get SIGILL when trying to use it.

I can suggest two workarounds:

  • Use the Raspberry Pi with a soft-float Linux distribution
  • Use a Raspberry Pi 2, which supports ARMv7-A (and thus Thumb-2)
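To check up front whether a board can run the ARMv7-A build, one option is to look at the machine string the kernel reports (the same value as `uname -m`, for example `armv6l` or `armv7l`). This is a rough heuristic sketch, not part of Yeppp!; the helper names `arm_version` and `can_run_armv7a_build` are hypothetical, and it does not check the float ABI of the installed userland.

```python
import platform
import re


def arm_version(machine: str):
    """Extract N from strings like 'armv6l' or 'armv7l'; None if not matched."""
    match = re.match(r"armv(\d+)", machine)
    return int(match.group(1)) if match else None


def can_run_armv7a_build(machine: str) -> bool:
    """Heuristic: a 32-bit ARM kernel reporting ARMv7 or later."""
    version = arm_version(machine)
    return version is not None and version >= 7


if __name__ == "__main__":
    machine = platform.machine()
    print(machine, "ARMv7-A capable:", can_run_armv7a_build(machine))
```

On an ARMv6 board the check returns `False`, which corresponds to the situation described above where the ARMv7-A + hard-float binary fails with SIGILL.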