
Add recursive partitioning ternary tree (RPTT) #1049

Closed · wants to merge 7 commits

Conversation

diriLin (Contributor) commented Aug 14, 2024

No description provided.

eddieh-xlnx (Collaborator) commented Aug 14, 2024

This PR brings in the Recursive Partition Ternary Tree technique as described in:

@inproceedings{zang2024parallel,
  title={An Open-Source Fast Parallel Routing Approach for Commercial FPGAs},
  author={Zang, Xinshi and Lin, Wenhao and Lin, Shiju and Liu, Jinwei and Young, Evangeline FY},
  booktitle={Proceedings of the Great Lakes Symposium on VLSI 2024},
  year={2024}
}

This technique is introduced via two new classes: CUFR (which extends RWRoute) and PartialCUFR (which extends PartialRouter).
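Since the PR description doesn't restate the algorithm itself, here is a rough sketch of the RPTT idea from the cited paper: nets are recursively split by alternating cutlines, with nets falling entirely on one side going to that side's subtree and nets straddling the cut forming a third child, yielding a ternary tree whose left/right subtrees touch disjoint device regions and can therefore be routed in parallel. Everything below (class names, the ForkJoin scheduling, the leaf/depth cutoffs) is a hypothetical illustration under those assumptions, not the PR's actual CUFR code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class RpttSketch {

    /** Hypothetical stand-in for a net: only its routing bounding box matters here. */
    record Net(int xMin, int xMax, int yMin, int yMax) {}

    static class RpttNode extends RecursiveAction {
        static final int LEAF_SIZE = 8;   // stop partitioning below this many nets
        static final int MAX_DEPTH = 8;   // guard against nets that straddle every cut

        final List<Net> nets;             // nets owned by this tree node
        final int x0, x1, y0, y1;         // device region covered by this node
        final boolean cutVertical;        // cut direction alternates per level
        final int depth;

        RpttNode(List<Net> nets, int x0, int x1, int y0, int y1,
                 boolean cutVertical, int depth) {
            this.nets = nets;
            this.x0 = x0; this.x1 = x1; this.y0 = y0; this.y1 = y1;
            this.cutVertical = cutVertical;
            this.depth = depth;
        }

        @Override
        protected void compute() {
            if (nets.size() <= LEAF_SIZE || depth >= MAX_DEPTH) {
                route(nets);
                return;
            }
            int cut = cutVertical ? (x0 + x1) / 2 : (y0 + y1) / 2;
            List<Net> side0 = new ArrayList<>();    // entirely left of / below the cut
            List<Net> side1 = new ArrayList<>();    // entirely right of / above the cut
            List<Net> crossing = new ArrayList<>(); // straddles the cutline
            for (Net n : nets) {
                int min = cutVertical ? n.xMin() : n.yMin();
                int max = cutVertical ? n.xMax() : n.yMax();
                if (max < cut) side0.add(n);
                else if (min >= cut) side1.add(n);
                else crossing.add(n);
            }
            // The two sides occupy disjoint device regions, so their subtrees can be
            // routed in parallel without contending for routing resources.
            RpttNode a = cutVertical
                    ? new RpttNode(side0, x0, cut, y0, y1, false, depth + 1)
                    : new RpttNode(side0, x0, x1, y0, cut, true, depth + 1);
            RpttNode b = cutVertical
                    ? new RpttNode(side1, cut, x1, y0, y1, false, depth + 1)
                    : new RpttNode(side1, x0, x1, cut, y1, true, depth + 1);
            invokeAll(a, b);
            // Crossing nets form the third (ternary) child: same region, perpendicular
            // cut, routed after both sides in this simplified schedule.
            if (!crossing.isEmpty()) {
                new RpttNode(crossing, x0, x1, y0, y1, !cutVertical, depth + 1).compute();
            }
        }

        /** Placeholder for actual connection routing. A fixed in-node order plus the
         *  fixed tree shape is what makes the overall schedule deterministic,
         *  regardless of how many threads execute it. */
        void route(List<Net> batch) { }
    }

    public static void main(String[] args) {
        List<Net> nets = List.of(
                new Net(0, 3, 0, 3), new Net(6, 9, 5, 8), new Net(2, 7, 1, 4));
        ForkJoinPool.commonPool().invoke(
                new RpttNode(new ArrayList<>(nets), 0, 10, 0, 10, true, 0));
    }
}
```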

First up are the end-to-end wall-clock results for PartialCUFR on all 28 benchmarks, using the FPGA24 Routing Contest infrastructure on a 16-core/32-thread machine:
[Figure: end-to-end wall-clock runtime of PartialCUFR, normalized to baseline RWRoute, across all 28 benchmarks]

Runtime is normalized to the baseline RWRoute wall time and sorted in ascending order of that time. Lower normalized numbers represent faster wall-clock time, and values below 1.0 represent a speedup over RWRoute. Most of the benchmarks stay under 1.0, representing a speedup. Two lines are shown: RPTT-only and RPTT-with-HUS.

For RPTT-only, two benchmarks (mlcad_d181*) slightly exceed 1.0 -- further investigation shows that these designs are likely sensitive to net ordering. Forcing single-threaded RWRoute to use the same net ordering gives RPTT-only a normalized value of 0.91 (hence a speedup).

For the largest designs (mlcad_d181*, boom_soc_v2), where HUS noticeably kicks in, significant runtime improvements are seen. HUS also activates for both corundum_* runs, though, where it appears to hurt performance.

Here's another figure showing CPU time, again normalized to the baseline result:
[Figure: end-to-end CPU time of PartialCUFR, normalized to baseline RWRoute]
In general, hovering around the 1.0 value shows that CUFR does not spend more CPU time overall than sequential RWRoute; it just spreads that work across more threads to finish sooner. With HUS, this extends to doing less work, too.

Note that these numbers are for the end-to-end result, which includes reading the FPGA Interchange Format benchmarks and writing them (with routed results) all back out again.

Geomean summary:

  • RPTT-only: 1.6x end-to-end speedup
  • RPTT+HUS: 1.9x end-to-end speedup
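
For clarity, the geomean figures above are the geometric mean of the per-benchmark speedups (baseline wall time divided by CUFR wall time). A minimal sketch with made-up numbers, purely to show the computation:

```java
public class GeomeanSketch {
    public static void main(String[] args) {
        // Illustrative per-benchmark speedups (baseline wall time / CUFR wall time);
        // these are made-up values, not the PR's measured data.
        double[] speedups = {1.4, 2.1, 1.8, 1.2, 1.9};
        double logSum = 0;
        for (double s : speedups) logSum += Math.log(s);
        double geomean = Math.exp(logSum / speedups.length);
        System.out.printf("geomean speedup: %.2fx%n", geomean); // ~1.65x
    }
}
```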

A few other bits of note:

  • A neat effect of this technique is that CUFR is deterministic -- regardless of whether one thread is used or many.
  • The routing result is not expected to be the same as for RWRoute due to nets being routed in a different order. Thus it's possible (as we saw for mlcad_d181*) that the routing problem becomes harder or easier, regardless of whether it is being solved in a parallel fashion.


eddieh-xlnx (Collaborator) commented Aug 14, 2024

> Can we use a smaller design or do anything to reduce the memory footprint?

It turns out ThreadLocal can incur memory "leaks", since the only reliable way for its values to get garbage collected (without killing the thread) is for the owning thread to call ThreadLocal.remove(). Even allowing the ThreadLocal itself to be GC-ed does not guarantee its values will be GC-ed. However, once we know all overlaps have been resolved and routing is done -- the point at which we no longer need to re-use ConnectionStates -- there is no practical way to cycle through all the threads to call remove().

Switched to a ConcurrentHashMap instead -- this only gets called once per routed connection, so the performance impact should not be noticeable.

Here, the memory leak was because we had a non-timing-driven RouteNodeGraph (created by earlier tests) followed by a RouteNodeGraphTimingDriven (from later tests), both of which held onto a great many nodes.
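
As a rough illustration of the switch (the type and field names here are hypothetical, not the PR's actual code): a value cached in a ThreadLocal can realistically only be freed by its owning thread, whereas a ConcurrentHashMap keyed by thread can be cleared by any thread once routing finishes. The one-lookup-per-connection access pattern is why the map's extra synchronization cost is expected to be negligible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionStateCache {

    /** Hypothetical per-thread scratch state; stands in for the PR's ConnectionState. */
    static class State { /* per-thread routing scratch data */ }

    // Leak-prone variant: each value lives until its owning thread calls remove()
    // (or dies), and there is no practical way to make every pool thread do that
    // once routing has finished.
    private static final ThreadLocal<State> PER_THREAD =
            ThreadLocal.withInitial(State::new);

    // Replacement variant: one map lookup per routed connection, and the whole map
    // can be dropped by any thread once all overlaps are resolved.
    private static final Map<Thread, State> STATES = new ConcurrentHashMap<>();

    static State get() {
        return STATES.computeIfAbsent(Thread.currentThread(), t -> new State());
    }

    /** Unlike ThreadLocal.remove(), this can be called from any thread once
     *  routing completes, releasing all per-thread state at once. */
    static void clear() {
        STATES.clear();
    }
}
```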

eddieh-xlnx changed the base branch from master to 2024.1.2 on August 15, 2024 at 20:48
eddieh-xlnx requested review from clavin-xlnx and removed the review request for clavin-xlnx on August 15, 2024 at 21:19
clavin-xlnx deleted the Xilinx:2024.1.2 branch on September 4, 2024 at 17:53
clavin-xlnx closed this on September 4, 2024
eddieh-xlnx (Collaborator) commented

Looks like this got closed because the target branch was merged. @diriLin, is it possible for you to re-open it? (Otherwise, you'll have to start a new PR.)

diriLin (Contributor, Author) commented Sep 4, 2024

> Looks like this got closed because the target branch was merged. @diriLin, is it possible for you to re-open it? (Otherwise, you'll have to start a new PR.)

@eddieh-xlnx It seems that I cannot re-open this PR because the base branch has been deleted. I will create a new PR instead.
