How to benchmark some algorithms for Cortex-M architecture


For my current project I have to investigate the runtime behavior (cycles used) of different algorithms on a Cortex-M4. The algorithms are pure computation in C, with no I/O and no interrupts. Any hints or ideas on how to do this?

My current idea is to create a minimal application and use Renode (https://renode.io/) for cycle counting:

  • Create a file test.c containing one function with a fixed signature that runs my algorithm
  • Compile and link it to form a minimal application
  • Load the application and the required input data into Renode
  • Run the application
  • Extract the output data from Renode
  • Use the profiling data from Renode to rate the algorithms
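
The fixed-signature idea from the list above could look roughly like the sketch below. Everything named here is an assumption for illustration: `run_algorithm`, its parameters, the buffer names and sizes, `bench_entry`, and the placeholder checksum body. A real algorithm would replace the body while the signature and the harness stay the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer sizes; pick whatever the algorithms actually need. */
#define BENCH_INPUT_SIZE  256u
#define BENCH_OUTPUT_SIZE 64u

/* File-scope buffers get fixed addresses in the linked image, so input can
 * be loaded into g_input and results read back from g_output from outside. */
uint8_t g_input[BENCH_INPUT_SIZE];
uint8_t g_output[BENCH_OUTPUT_SIZE];
volatile size_t g_output_len;

/* Hypothetical fixed signature shared by every algorithm under test.
 * Returns the number of bytes written to `out`, or 0 if `out` is too small.
 * Placeholder body: a 32-bit additive checksum stored little-endian. */
size_t run_algorithm(const uint8_t *in, size_t in_len,
                     uint8_t *out, size_t out_cap)
{
    assert(in != NULL || in_len == 0);
    if (out == NULL || out_cap < 4u) {
        return 0;
    }

    uint32_t sum = 0;
    for (size_t i = 0; i < in_len; i++) {
        sum += in[i];
    }
    for (size_t i = 0; i < 4u; i++) {
        out[i] = (uint8_t)(sum >> (8u * i));
    }
    return 4u;
}

/* Entry point that the startup code would call once; no I/O, no interrupts. */
void bench_entry(void)
{
    g_output_len = run_algorithm(g_input, sizeof g_input,
                                 g_output, sizeof g_output);
}
```

Keeping the buffers and the entry point outside the algorithm means each candidate can be swapped in by relinking a different `run_algorithm`, without touching the rest of the application.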

And now the questions:

  • Has anyone used Renode or QEMU for similar purposes?
  • How to create a true minimal application? (crt0,ld flags)
  • Any other ideas for my problem?
  • How to configure a minimal system in Renode? Which components form the minimal subset needed to successfully run a C program?

Regards, Jan


There is 1 answer

Piotr Zierhoffer

FYI: I work at Antmicro and am one of the authors of Renode.

There are many ways to perform such profiling. Note that Renode is not cycle-accurate, but you can track the progression of virtual time.

One possible approach is to use Renode's metrics analyzer. You can read the docs here: https://renode.readthedocs.io/en/latest/basic/metrics.html

It allows you to capture data and analyze it in Python or generate some graphs straight away:

# in Renode
(monitor) machine EnableProfiler "path_to_dump_file"

# in Bash
python3 tools/metrics_analyzer/metrics_visualizer/metrics-visualizer.py path_to_dump_file

You can also measure the virtual time that passes until a specific string appears on the UART. This can be done with a Robot test. An example of timestamp extraction can be found here: https://github.com/renode/renode/blob/master/tests/platforms/QuarkC1000/QuarkC1000.robot#L44

${r}        Wait For Line On Uart     My String
            Do Something With Time    ${r.timestamp}
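
On the firmware side, the string the test waits for has to be printed as a complete line once the algorithm has finished. A rough sketch of that part, under assumptions: `uart_putc_fn`, `uart_put_line`, and `capture_putc` are hypothetical names, and the actual UART register access is platform-specific and not shown. The capture sink only stands in for a real UART so the code can be tried on a host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that sends one byte to the platform's UART. */
typedef void (*uart_putc_fn)(char c);

/* Sends `s` followed by '\n', so a line-based matcher sees a full line.
 * Returns the number of bytes sent, including the newline. */
size_t uart_put_line(uart_putc_fn put, const char *s)
{
    assert(put != NULL && s != NULL);

    size_t sent = 0;
    while (*s != '\0') {
        put(*s++);
        sent++;
    }
    put('\n');
    return sent + 1u;
}

/* Host-side stand-in for a UART: stores bytes instead of transmitting. */
#define CAPTURE_SIZE 64u
char g_capture[CAPTURE_SIZE];
size_t g_capture_len;

void capture_putc(char c)
{
    if (g_capture_len < CAPTURE_SIZE) {
        g_capture[g_capture_len++] = c;
    }
}
```

Calling `uart_put_line(put, "My String")` right after the algorithm returns would produce the line matched in the snippet above; on a target, `put` would be a function that writes to the UART data register.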

Another option would be to instrument your code and dump binary data from memory, if needed.
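
One way to instrument the code for a later memory dump is to have it write checkpoints into a fixed-layout buffer. The sketch below is only an illustration: `bench_log`, `bench_record`, `BENCH_MAGIC`, `bench_mark`, and `bench_elapsed` are hypothetical names, and the source of the timestamp (for example a hardware cycle counter on a real board) is deliberately left to the caller.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define BENCH_MAGIC       0xBE7C4A11u  /* hypothetical marker to find the log in a dump */
#define BENCH_MAX_RECORDS 16u

struct bench_record {
    uint32_t id;         /* checkpoint identifier chosen by the caller */
    uint32_t timestamp;  /* caller-supplied time or cycle value */
};

struct bench_log {
    uint32_t magic;
    uint32_t count;
    struct bench_record records[BENCH_MAX_RECORDS];
};

/* The dump is parsed offline, so the layout must not contain padding. */
static_assert(sizeof(struct bench_record) == 8, "unexpected record layout");
static_assert(sizeof(struct bench_log) == 8 + 8 * BENCH_MAX_RECORDS,
              "unexpected log layout");

/* volatile so the compiler does not optimize away stores nobody reads. */
volatile struct bench_log g_bench_log = { BENCH_MAGIC, 0, { { 0, 0 } } };

/* Appends one checkpoint; silently drops it when the log is full. */
void bench_mark(uint32_t id, uint32_t timestamp)
{
    uint32_t n = g_bench_log.count;
    if (n >= BENCH_MAX_RECORDS) {
        return;
    }
    g_bench_log.records[n].id = id;
    g_bench_log.records[n].timestamp = timestamp;
    g_bench_log.count = n + 1u;
}

/* Difference between two recorded timestamps; unsigned arithmetic keeps the
 * result correct across a single counter wraparound. Returns 0 if either
 * index is out of range. */
uint32_t bench_elapsed(uint32_t start_index, uint32_t end_index)
{
    uint32_t n = g_bench_log.count;
    if (start_index >= n || end_index >= n) {
        return 0;
    }
    return g_bench_log.records[end_index].timestamp
         - g_bench_log.records[start_index].timestamp;
}
```

Calling `bench_mark` before and after the algorithm leaves two records in `g_bench_log`; after the run, the buffer can be located in a memory dump by searching for the magic value.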

You can also add hooks that are called at a specific program counter value; the hook can then write a timestamp to the log.

There are probably many other options, but the right choice depends on your specific use case.

Minimal system in Renode: depending on your software, it would require

  • a core
  • an NVIC, if it's a Cortex-M
  • memory
  • a UART, if you want output.

UPDATE:

We have added some tracing features that allow you to use https://www.speedscope.app/ or https://ui.perfetto.dev/ to display execution traces, which is very useful for profiling.

The quick way to enable it for speedscope is:

cpu EnableProfilerCollapsedStack @path/to/trace true

For more details, please see this chapter in the docs: https://renode.readthedocs.io/en/latest/advanced/execution-tracing.html