SynDEx allows the efficient programming
of parallel, distributed, heterogeneous architectures,
composed of several different types of processors,
and of several different types of communication media.
From a user specification
of an algorithm dataflow graph and of an architecture resources graph,
and from algorithm and architecture characterized libraries,
SynDEx automatically generates
application-specific executive code for each processor,
and provides a makefile
to automate the compilation and linking of each executive,
and its downloading
into the program memory of the corresponding processor.
Since programming each non-volatile program memory separately is impractical,
SynDEx assumes that each processor has,
as its only non-volatile resident program,
a boot-loader
(which may be very small and simple,
or may rely on a big and complex operating system)
expecting an executive to be downloaded
from a neighbour processor
through a communication medium,
except for a single host processor,
designated by the name root
in the specified architecture graph,
whose boot-loader
expects all executives to be stored together
in its local non-volatile memory.
Consequently, SynDEx computes,
over the architecture graph,
an oriented spanning tree
rooted on the root processor,
and generates in each processor executive
the code needed to download the compiled executives through this tree,
in a predetermined order which is also used to generate the makefile.
This process is the same for all processors,
except that the root processor
gets executives from its local non-volatile memory,
whereas all the other processors
get executives from their neighbour processor
which is their ascendant towards the root of the download tree.
The processors which have the same ascendant processor
are called the descendants of that processor.
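The construction above can be sketched as follows. This is only an illustrative sketch, not the SynDEx implementation: the breadth-first choice of tree, the reduction of each medium to processor pairs, and the names `download_tree`, `descendants` and `download_order` are all assumptions. The order produced is a depth-first pre-order of the tree, starting at the root.

```python
from collections import deque

def download_tree(edges, root="root"):
    """Return {processor: ascendant} for a spanning tree rooted on `root`.

    `edges` is a list of (processor, processor) pairs; a multipoint medium
    is assumed to be listed as one pair per couple of connected processors.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    ascendant = {root: None}
    pending = deque([root])
    while pending:
        proc = pending.popleft()
        for other in neighbours.get(proc, []):
            if other not in ascendant:      # first path found wins
                ascendant[other] = proc
                pending.append(other)
    return ascendant

def descendants(ascendant, proc):
    """Processors whose ascendant is `proc`, in discovery order."""
    return [p for p, a in ascendant.items() if a == proc]

def download_order(ascendant, root="root"):
    """Depth-first pre-order of the download tree, root first."""
    order = []
    def visit(proc):
        order.append(proc)
        for desc in descendants(ascendant, proc):
            visit(desc)
    visit(root)
    return order

tree = download_tree([("root", "p1"), ("root", "p2"), ("p1", "p3"), ("p2", "p3")])
print(download_order(tree))   # ['root', 'p1', 'p3', 'p2']
```

In this example `p3` is reachable through both `p1` and `p2`; the tree keeps only one ascendant for it, so each executive travels along a single path.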
When powered on,
each processor boots by executing its resident boot-loader
which gets the processor’s executive,
loads it into the processor’s program memory,
and executes it.
During its initialization phase,
the executive gets and forwards executives to all its descendants,
before proceeding with application data processing.
The root processor,
usually an embedded PC or other kind of workstation,
loads from its disk an operating system
which automatically loads and executes a startup program
allowing the user to choose between different applications.
During early development,
this program may be a simple shell
(but this requires a keyboard to be available),
and the user enters a make command
to compile the executives if needed,
and to execute the root executive,
with the other executive files passed as arguments on the command line.
In applications
where a permanently connected keyboard is impractical,
the startup program may use another input device
(for example a switch or a touch screen)
to let the user choose between different predefined shell commands,
starting different applications
through the corresponding make command,
or simply launching a shell for interaction with a keyboard.
In more deeply embedded applications,
where the root processor
has neither a disk nor an operating system,
all the executives are stored in a FLASH memory,
and the root processor boots by directly executing its own executive,
and finds the other executives sequentially stored in its FLASH.
The first executive forwarded to a descendant
is received, stored, and executed by that descendant’s boot-loader.
Then, while that descendant’s executive asks for executives,
the ascendant executive
gets and forwards the next executives to the same descendant,
until that descendant’s executive signals
that it has no more executives of its own to forward.
Then the ascendant may switch to its next descendant,
until it has no more descendants to service,
and hence no more executives to forward.
This fully sequential download process boots processors
in the order of a depth-first traversal of the download tree.
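A small simulation may help to see why the sequential process yields a depth-first boot order. It is a sketch under simplifying assumptions: executives are represented only by the name of their target processor, the function `boot` and the dictionary layout are hypothetical, and returning from `boot` stands for the "no more executives to forward" signal.

```python
def boot(proc, tree, source, log):
    """Simulate the initialization phase of the executive of `proc`.

    `tree` maps a processor to the list of its descendants; `source` is an
    iterator over the executives still to be downloaded, in image order.
    """
    log.append(proc)                     # the executive of proc is running
    for desc in tree.get(proc, []):
        first = next(source)             # received by desc's boot-loader
        if first != desc:
            raise ValueError("image order does not match the download tree")
        # While desc asks for executives, they come from the same source,
        # so the whole subtree of desc is served before the next descendant.
        boot(desc, tree, source, log)

tree = {"root": ["p1", "p2"], "p1": ["p3"]}
image = iter(["p1", "p3", "p2"])         # executives following the root's own
log = []
boot("root", tree, image, log)
print(log)                               # ['root', 'p1', 'p3', 'p2']
```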
In the case of a point-to-point medium,
the descendant executive may proceed to application data communications
as soon as it has no more executives to forward,
whereas in the case of a multipoint medium,
the descendant executive must wait
until the ascendant executive signals that
it has no more executives to forward
(to avoid communication interference
between descendant application data and ascendant download data).
Each processor type may have a different compiler (linker) output format, and some processor types may have a ROM-ed embedded boot-loader (firmware) with its own requirements on the download format. The SynDEx common download format encapsulates the details and the differences of the compiler output formats and of the boot-loaders' download formats; it is composed as follows:
Because the first executive
forwarded to a descendant is received by that descendant’s boot-loader,
that executive must be sent without its four-byte prefix;
because the following executives
sent to the same descendant are forwarded by that descendant’s executive,
they must be sent with their four-byte prefix.
The sequence of bytes itself
must follow the format expected by the destination boot-loader.
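The prefix rule can be sketched as below. The sketch treats the four-byte prefix as opaque, since its content is not restated here, and the name `frames_for_descendant` is hypothetical.

```python
PREFIX_LEN = 4   # length of the prefix; its content is treated as opaque

def frames_for_descendant(executives):
    """Yield the byte strings an ascendant sends to one descendant.

    `executives` is the ordered list of executives, each in the common
    download format (prefix included), destined to that descendant and
    to the processors below it in the download tree.
    """
    for index, prefixed in enumerate(executives):
        if index == 0:
            # Received by the descendant's boot-loader: prefix stripped.
            yield prefixed[PREFIX_LEN:]
        else:
            # Received and forwarded by the descendant's executive.
            yield prefixed

frames = list(frames_for_descendant([b"AAAA" + b"first", b"BBBB" + b"next"]))
print(frames)   # [b'first', b'BBBBnext']
```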
Therefore a linker post-processor
must be developed for each processor type,
to translate the linker output file
into the SynDEx common download format described above.
All the post-processors’ outputs
will be concatenated by the makefile
into a single contiguous image (file),
that the root executive will use as source.
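The effect of this concatenation step can be sketched in a few lines. In SynDEx the step is performed by the generated makefile; the Python function `build_image`, the file names and the `demo` helper below are only illustrative assumptions.

```python
import tempfile
from pathlib import Path

def build_image(executive_files, image_file):
    """Concatenate the given files, in download order, into one image."""
    with open(image_file, "wb") as image:
        for name in executive_files:
            image.write(Path(name).read_bytes())

def demo():
    """Build an image from two stand-in post-processor outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for name, content in [("p1.dl", b"\x01\x02"), ("p2.dl", b"\x03")]:
            path = Path(tmp) / name
            path.write_bytes(content)
            parts.append(path)
        image = Path(tmp) / "image.bin"
        build_image(parts, image)
        return image.read_bytes()

print(demo())   # b'\x01\x02\x03'
```

The order of `executive_files` matters: it has to be the same predetermined order that the generated downloader code follows.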
The downloader code is generated by two macros:
Processor names
are useful to address processors
connected to a multipoint medium:
a processor name may be suffixed
to give the name of a user-defined macro,
whose substitution gives the processor address.
As executive data
may be forwarded through several communication media
of different bandwidths,
transfers must be synchronized so that data flows
at the speed of the slowest communication medium.
Between processors,
if flow control is not supported by the communication medium hardware,
it must be implemented by "ready to receive" control messages
sent by the loadFrom_ code
for each chunk of data to be sent by the loadDnto_ code.
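The software flow control described above can be illustrated with two threads and two queues standing in for the two directions of a medium. This is a sketch only: the `READY` value, the empty chunk used as end marker, and the Python functions `load_from`, `load_dnto` and `transfer` are assumptions for the illustration, not the code generated by the macros.

```python
import queue
import threading

READY = b"RTR"   # hypothetical "ready to receive" control message

def load_dnto(data, chunk_size, down, up):
    """Sender side: emit one chunk for each READY message received."""
    for start in range(0, len(data), chunk_size):
        if up.get() != READY:
            raise RuntimeError("unexpected control message")
        down.put(data[start:start + chunk_size])
    if up.get() != READY:
        raise RuntimeError("unexpected control message")
    down.put(b"")                        # empty chunk: end of transfer (assumed)

def load_from(down, up):
    """Receiver side: announce readiness before each chunk."""
    received = bytearray()
    while True:
        up.put(READY)
        chunk = down.get()
        if not chunk:
            return bytes(received)
        received += chunk

def transfer(data, chunk_size):
    """Run sender and receiver together and return what was received."""
    down, up = queue.Queue(), queue.Queue()
    sender = threading.Thread(target=load_dnto, args=(data, chunk_size, down, up))
    sender.start()
    result = load_from(down, up)
    sender.join()
    return result

print(transfer(b"0123456789", 4))   # b'0123456789'
```

Because the sender never emits a chunk before the receiver asks for it, a slow receiver (for instance one that is itself forwarding over a slower medium) paces the whole chain.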
Inside a processor,
the loadFrom_ and loadDnto_
macro cooperation
is based on the order
in which the spawn_thread_ macros
(one for each communication sequence,
i.e. one for each communication medium)
are generated in the initialization phase
of the main_ ... endmain_ sequence:
the spawn_thread_ macro
corresponding to the thread_ macro
of the communication sequence
starting with the loadFrom_ macro
(i.e. of the medium connected to the ascendant processor)
is called first,
followed by the other spawn_thread_ macros,
including those, if any,
corresponding to the communication sequences
with a loadDnto_ macro
(i.e. of the media connected to the descendant processors).
If the processor is a leaf node of the download tree,
its loadFrom_ macro
has only one argument;
in this case,
it directly generates the code
that sends the ascendant processor
a "null" message meaning that no more executives are requested,
followed, in the case of a multipoint medium,
by the code that waits for other executives
to be downloaded to the other processors
connected to the communication medium,
until the ascendant processor
sends an "empty" executive
meaning that the download process is complete
on this communication medium.
Otherwise,
before generating the code described in the previous paragraph,
the loadFrom_ macro
generates a RETURN instruction
(which will return control
after the CALL instruction
generated by the spawn_thread_ macro),
followed by a loadFrom_end_: label,
and the loadFrom_ macro
also defines three macros
for use by the loadDnto_ macros:
If the code generated by any of these three macros is limited to a few instructions, it may be generated inline; otherwise the loadFrom_ macro generates this code as a subroutine (between the RETURN instruction and the loadFrom_end_ label), and a call to that subroutine is generated instead of the inline code.