I'm developing a general-purpose image processing core for FPGAs and ASICs. The idea is to interface a standard processor with it. One of the problems I have is how to "program" it. Let me explain: the core has an instruction decoder for my "custom" extensions. For instance:
vector_addition $vector[0], $vector[1], $vector[2] // (i.e. v2 = v0+v1)
and many more like that. The processor sends this operation to the core over the bus, while the processor itself handles loops, non-vector operations, etc., like this:
for (i = 0; i < 15; i++) // to be executed in the processor
    vector_add(v0, v1, v2); // to be executed in my custom core
The program is written in C/C++. The core only needs the instruction itself, in machine code:
- opcode = vector_add = 0x12
- register_src_1 = v0 = 0x00
- register_src_2 = v1 = 0x01
- register_dst = v2 = 0x02
machine code = opcode | v0 | v1 | v2 = 0x7606E600
(or whatever, just a concatenation of the different fields to build the instruction in binary)
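To illustrate that concatenation, here is a minimal sketch in C. The 8-bit field widths, the field order, and the names `VEC_ENCODE` and `OP_VECTOR_ADD` are assumptions made up for this example; the real layout depends on the core's decoder, so the resulting word is not the placeholder value given above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical): [31:24] opcode, [23:16] src1,
 * [15:8] src2, [7:0] dst. Adjust to the real decoder. */
#define OP_VECTOR_ADD 0x12u

/* Pack the four fields into one 32-bit word. With constant arguments
 * this is an integer constant expression, so the compiler folds it. */
#define VEC_ENCODE(op, s1, s2, d)            \
    ((((uint32_t)(op) & 0xFFu) << 24) |      \
     (((uint32_t)(s1) & 0xFFu) << 16) |      \
     (((uint32_t)(s2) & 0xFFu) << 8)  |      \
      ((uint32_t)(d)  & 0xFFu))

/* Checked at compile time: v2 = v0 + v1 under the assumed layout. */
static_assert(VEC_ENCODE(OP_VECTOR_ADD, 0, 1, 2) == 0x12000102u,
              "unexpected encoding");
```

With this layout, `VEC_ENCODE(OP_VECTOR_ADD, 0, 1, 2)` evaluates to 0x12000102, and nothing is translated at runtime when the arguments are constants.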
Once it has been sent over the bus, the core can request all the data from memory over dedicated buses and handle everything without using the processor. The big question is: how can I translate the previous instruction into its hexadecimal representation? (Sending it over the bus is not a problem.) Some options that come to mind are:
- Run interpreted code (translate to machine code at runtime in the processor) --> very slow, even using some kind of inline macro
- Compile the custom sections with an external custom compiler, load the binary from the external memory and move it to the core with some unique instruction --> hard to read/understand source code, poor SDK integration, too many sections if code is very segmented
- JIT compilation --> too complex just for this?
- Extending the compiler --> a nightmare!
- A custom processor connected to the custom core to handle everything: loops, pointers, memory allocation, variables... --> too much work
The problem is about software/compilers, but for those who have deep knowledge of this topic: this is a SoC in an FPGA, the main processor is a MicroBlaze, and the IP core uses AXI4 buses.
I hope I explained it correctly... Thanks in advance!
Couldn't you translate all your sections of code to machine code at the start of the program (just once), save them in binary format in blocks of memory, and then use those binaries when needed?
That's basically how OpenGL shaders work, and I find that quite easy to manage.
The main drawback is the memory consumption, as you have both the text and the binary representation of the same scripts in memory. I don't know if this is a problem for you. If it is, there are partial solutions, such as unloading the source texts once they are compiled.
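A rough sketch of that "translate once at startup, reuse the binaries" idea in C could look like the following. Everything here is an assumption for illustration: the text syntax `vector_add v0, v1, v2`, the 8-bit field layout, and the names `assemble`, `assemble_all` and `OPS` are hypothetical; only the opcode 0x12 for vector_add is taken from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mnemonic table; only vector_add = 0x12 comes from the question. */
struct op_entry { const char *name; uint8_t opcode; };
static const struct op_entry OPS[] = { { "vector_add", 0x12 } };

/* Assemble one line such as "vector_add v0, v1, v2" into a 32-bit word.
 * Assumed layout: opcode, src1, src2, dst, 8 bits each, opcode in the top byte.
 * Returns 0 on success, -1 if the line cannot be parsed. */
static int assemble(const char *text, uint32_t *out)
{
    char name[32];
    unsigned s1, s2, d;
    size_t i;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%31s v%u , v%u , v%u", name, &s1, &s2, &d) != 4)
        return -1;
    if (s1 > 0xFF || s2 > 0xFF || d > 0xFF)
        return -1; /* register index does not fit the assumed 8-bit field */
    for (i = 0; i < sizeof OPS / sizeof OPS[0]; i++) {
        if (strcmp(name, OPS[i].name) == 0) {
            *out = ((uint32_t)OPS[i].opcode << 24) | ((uint32_t)s1 << 16) |
                   ((uint32_t)s2 << 8) | (uint32_t)d;
            return 0;
        }
    }
    return -1; /* unknown mnemonic */
}

/* Translate every source line once, at startup, into a binary cache.
 * Returns 0 on success, -1 on the first line that fails to assemble. */
static int assemble_all(const char *const *src, size_t n, uint32_t *bin)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (assemble(src[i], &bin[i]) != 0)
            return -1;
    return 0;
}
```

After `assemble_all` has run, the loop on the processor only has to write the cached word `bin[k]` to the core (the bus write itself is not shown), and the source strings can be discarded if memory is tight.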