AutoDSE Tutorial
Design Space Definition
The candidate pragmas for AutoDSE can take any of the following forms:
#pragma ACCEL PIPELINE auto{pragma_name}
#pragma ACCEL PARALLEL factor=auto{pragma_name}
#pragma ACCEL TILE factor=auto{pragma_name}
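As a rough illustration of how the placeholder names inside `auto{...}` identify each candidate pragma, the sketch below scans a kernel source string for the three forms listed above. The sample kernel, the regular expression, the helper `find_candidate_pragmas`, and the placeholder name `__PIPE__L0` are assumptions made for this example; they are not part of AutoDSE.

```python
import re

# Matches the three candidate forms listed above:
#   #pragma ACCEL PIPELINE auto{name}
#   #pragma ACCEL PARALLEL factor=auto{name}
#   #pragma ACCEL TILE factor=auto{name}
PRAGMA_RE = re.compile(
    r"#pragma\s+ACCEL\s+(PIPELINE|PARALLEL|TILE)\s+(?:factor\s*=\s*)?auto\{(\w+)\}"
)

# Hypothetical kernel used only for illustration.
SAMPLE_KERNEL = """
void vec_add(int a[64], int b[64], int c[64]) {
#pragma ACCEL PIPELINE auto{__PIPE__L0}
#pragma ACCEL TILE factor=auto{__TILE__L0}
#pragma ACCEL PARALLEL factor=auto{__PARA__L0}
  for (int i = 0; i < 64; i++) {
    c[i] = a[i] + b[i];
  }
}
"""


def find_candidate_pragmas(source):
    """Return (placeholder name, pragma type) pairs in order of appearance."""
    return [(m.group(2), m.group(1)) for m in PRAGMA_RE.finditer(source)]


if __name__ == "__main__":
    for name, kind in find_candidate_pragmas(SAMPLE_KERNEL):
        print(f"{name}: {kind}")
```

A listing like this can help when you add the candidate pragmas by hand and want to check that every placeholder has a matching entry in the design space definition.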
You can either let AutoDSE augment the code with the candidate pragmas or do it yourself. To make AutoDSE do it, run one of the following commands based on your use case:
ds_generator [-I<include dir>] <kernel file> # only adds the candidate pragmas to the kernel code and stores the generated code in `rose_merlinkernel_[KERNEL_NAME].c`
autodse <project dir> <working dir> <kernel file> <fastgen|accurategen> [<database file>] # adds candidate pragmas and runs DSE after it
Each of the above commands produces a file named `ds_info.json`, which describes the design space along with AutoDSE’s settings. Each pragma is defined in the following form:
"__PARA__L0" :
{
"ds_type" : "PARALLEL",
"options" : "[x for x in [1,2,4,7,8,14,16,32] if x*__TILE__L0<=32]",
"default" : 1,
"order" : "0 if x&(x-1)==0 else 1"
}
- `ds_type`: the type of the pragma. Choices are `PIPELINE`, `PARALLEL`, and `TILE`.
- `options`: the available options for the pragma, written as a Python list. As the example shows, it can include a condition that defines a dependency on other pragmas. The definition must not evaluate to an empty list.
- `default`: the default option for the pragma, which should turn the pragma off and must be an element of the `options` attribute.
- `order`: a preference in search order for the pragma. In the above example, values that are powers of 2 get higher priority.
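To make the semantics of `options`, `default`, and `order` concrete, here is a minimal Python sketch that evaluates the example entry for a given value of the pragma it depends on. The entry is copied from the example above; the helper names and the use of plain `eval` are assumptions for illustration and may differ from how AutoDSE evaluates these expressions internally.

```python
# Entry copied from the example above.
ENTRY = {
    "ds_type": "PARALLEL",
    "options": "[x for x in [1,2,4,7,8,14,16,32] if x*__TILE__L0<=32]",
    "default": 1,
    "order": "0 if x&(x-1)==0 else 1",
}


def expand_options(entry, deps):
    """Evaluate 'options' given the values of the pragmas it depends on."""
    # Dependencies are passed as globals so that the condition inside the
    # list comprehension can resolve names such as __TILE__L0.
    scope = {"__builtins__": {}}
    scope.update(deps)
    return eval(entry["options"], scope)


def search_order(entry, options):
    """Sort options by the 'order' expression (assumed: lower rank first)."""
    def rank(x):
        return eval(entry["order"], {"__builtins__": {}, "x": x})
    return sorted(options, key=rank)  # stable sort keeps ties in list order


def check_entry(entry, deps):
    """Apply the rules described above and return the options in search order."""
    options = expand_options(entry, deps)
    if not options:
        raise ValueError("options evaluated to an empty list")
    if entry["default"] not in options:
        raise ValueError("default is not an element of options")
    return search_order(entry, options)


if __name__ == "__main__":
    print(check_entry(ENTRY, {"__TILE__L0": 1}))
    print(check_entry(ENTRY, {"__TILE__L0": 4}))
```

With `__TILE__L0` set to 4, the condition keeps only factors up to 8, and the ordering moves the non-power-of-2 value 7 to the end, giving `[1, 2, 4, 8, 7]`.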
Please refer to this file for a complete example of a design space (DS) definition for the GEMM kernel from the MachSuite benchmark.
AutoDSE Settings
The remaining settings in the config file determine how AutoDSE runs. The following table describes each of them.
Setting | Description |
---|---|
design-space.max-part-num | the maximum number of DS partitions allowed |
evaluate.command.bitgen | Merlin command to generate the bitstream |
evaluate.command.hls | Merlin command to run the HLS synthesis |
evaluate.command.transform | Merlin command for applying its code transformations |
evaluate.max-util.[XX] | the maximum allowed utilization for resource XX |
project.backup | type of project backup, choices: `BACKUP_ERROR`, `NO_BACKUP`, or `BACKUP_ALL` |
project.fast-output-num | number of top designs generated as Merlin projects in fast mode |
search.algorithm.gradient.fine-grained-first | if set to true, the bottleneck optimizer starts optimizing from the innermost loops |
search.algorithm.gradient.latency-threshold | the minimum threshold latency we wish to achieve |
search.algorithm.gradient.quality-type | how to measure the quality of a design, choices: `performance` or `finite-difference` |
search.algorithm.name | exploration strategy; use `bottleneck` to implement the paper’s approach. You can also use `exhaustive` for exhaustive search or `hybrid` for a combination of the two. |
timeout.bitgen | time limit (in minutes) for generating the bitstream |
timeout.exploration | time limit (in minutes) for DSE |
timeout.hls | time limit (in minutes) for the HLS synthesis |
timeout.transform | time limit (in minutes) for Merlin’s code transformation |
Output Structure
The generated file structure is shown below:
.
+-- evaluate # backup directory which includes some of the explored Merlin projects
+-- output # includes the Merlin project for top designs and a summary of them
+-- logs # all the log files
| +-- dse.log # contains all the messages printed in the console
| +-- eval.log # shows how many jobs were run and whether or not they were successful
| +-- partX_log.log # includes a summary of the status of explorer in DS partition X
| +-- partX_expr.log # includes a more detailed summary of the explorer in DS partition X
+-- result.db # includes a database of the explored designs
+-- summary_[XX].rpt # summarizes all the points explored in mode XX (`fast` or `accurate`)
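As a small convenience, the per-partition logs in the layout above can be grouped by partition with a few lines of Python. This is only a sketch: it assumes the `X` in `partX_log.log` and `partX_expr.log` is an integer index, and the helper name `collect_partition_logs` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: X in partX_log.log / partX_expr.log is an integer partition index.
PART_LOG_RE = re.compile(r"part(\d+)_(log|expr)\.log")


def collect_partition_logs(work_dir):
    """Map partition index -> {"log": path, "expr": path} for files under logs/."""
    logs = {}
    for path in sorted(Path(work_dir, "logs").glob("part*_*.log")):
        match = PART_LOG_RE.fullmatch(path.name)
        if match:
            index, kind = int(match.group(1)), match.group(2)
            logs.setdefault(index, {})[kind] = path
    return logs


if __name__ == "__main__":
    # "." stands for the working directory passed to the DSE command.
    for index, files in sorted(collect_partition_logs(".").items()):
        print(index, {kind: str(path) for kind, path in files.items()})
```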
General Tips
- When AutoDSE finishes, it saves all the explored design points in a file (by default: `redis.db`) as a Redis database. If such a file is present, you can resume the DSE by re-running the DSE command:
dse <project dir> <working dir> <config file> <fast|accurate> [<database file>]
- Since AutoDSE runs the HLS tool to assess each design point, the timeout values may need to change from kernel to kernel. If AutoDSE’s final report shows a high number of timed-out designs, increase the respective time limit and rerun the tool.