Random distribution of 2-variable constraints not uniform #161
Hi @alwilson, thanks for putting in the work to create this way of looking at 2d distributions. Thus far I've primarily been looking at 1d distributions -- and that's difficult enough in text format given a reasonably-sized domain. Hopefully I'll shortly have some insights on how to improve the results!
The great part about Python is all the built-in graphing and data-crunching tools! I've seen similar imbalances in my results, but they've been good enough for my uses up to this point.
Commit: (#161) Adjust swizzling algorithm to improve multi-variable distribution. Signed-off-by: Matthew Ballance <[email protected]>
After reviewing the 'swizzling' implementation, I have a better understanding of the hot spots at (min,max) and (max,min). While the variables to swizzle in each solve set are randomly selected, the bits to force within each variable were being applied sequentially. This increased the probability that few of the bit-force constraints on the second variable would hold, allowing Boolector to repeatedly select the same solution for the second variable.
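In sketch form, the change amounts to something like this (illustrative only, with made-up helper names -- not the actual pyvsc code): pick the forced bit positions at random rather than walking them from one end.

```python
import random

def pick_bit_forces(var_width, num_bits):
    """Sketch of the adjusted swizzling idea (hypothetical helper, not
    pyvsc's actual code): choose which bits to force at random instead
    of sequentially, so repeated solves don't keep nudging the solver
    toward the same corner of the solution space."""
    positions = random.sample(range(var_width), num_bits)
    # Each (bit position, forced value) pair would become a soft
    # constraint handed to the solver alongside the hard constraints.
    return [(pos, random.getrandbits(1)) for pos in positions]

print(pick_bit_forces(32, 6))
```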
Hello :) Any news on this issue?
I reran some distribution tests on 5f79289. I don't believe much has changed with pyvsc's randomization and swizzler. Here is the triangle distribution from before, and I ran a donut and a square one as well. I was curious whether those hot spots would show up for constraints without addition, and they do. The donut constraints are just some x,y constraints that put a hole in the middle, and the hot spots still show up there. The square constraints don't have the issue, but I think that boils down to rand range calls that don't invoke the swizzler. The square constraints' count/hit histogram is the flat distribution I think we would want the others to have.

Triangle constraints (100k samples):
Donut constraints (100k samples):
Square constraints (100k samples):

This reminds me that I've never really seen a good description of what distribution SystemVerilog should return. The closest I've found is IEEE 1800-2017 Section 18.5.10, "Variable ordering".
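As an aside, a sketch of what I mean by the donut constraints (hypothetical bounds, not the exact test code, and assuming pyvsc's `|` operator composes an OR constraint, since Python's `or` keyword can't be overloaded):

```python
import vsc

@vsc.randobj
class donut_c(object):
    def __init__(self):
        self.x = vsc.rand_bit_t(8)
        self.y = vsc.rand_bit_t(8)

    @vsc.constraint
    def donut_cn(self):
        # Outer square (bounds are made up for illustration).
        self.x < 64
        self.y < 64
        # Exclude the inner square [24,40) x [24,40), leaving a hole.
        (self.x < 24) | (self.x >= 40) | (self.y < 24) | (self.y >= 40)

item = donut_c()
item.randomize()
print(item.x, item.y)
```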
I would think having a uniform distribution across a solution space would require somehow exploring that whole solution space beforehand. That may be possible for simple constraints, but I imagine it's always possible to construct more and more complex constraints until it's impossible to tell what the solution space might be. I wonder if randomizing all variables across their valid ranges once and then checking them against the constraints would be enough to get closer to a uniform distribution. Although I think that would require obeying solve-before constraints somehow to get the ordering and distribution right. And it would only really work for simpler constraints.
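As a toy illustration of that rejection idea in plain Python (not how pyvsc actually solves anything):

```python
import random

def rejection_randomize(ranges, pred, max_tries=10_000):
    """Draw each variable uniformly from its declared range, then keep
    the draw only if the full constraint predicate holds. Uniform over
    the solution space, but may need many retries when constraints are
    tight, and it ignores solve-before ordering."""
    for _ in range(max_tries):
        values = {name: random.randint(lo, hi)
                  for name, (lo, hi) in ranges.items()}
        if pred(values):
            return values
    raise RuntimeError('no solution found within max_tries')

# The triangle constraint from this thread: x + y <= N.
sample = rejection_randomize({'x': (0, 63), 'y': (0, 63)},
                             lambda v: v['x'] + v['y'] <= 63)
print(sample)
```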
Here's an interesting 1d distribution example that explores the swizzler behavior. Pyvsc uses constraints to find the range of each variable when randomize is called, but those ranges can change as variables get solved, and the bounding constraints may be missing entirely or much wider than the true range. Take this constraint (in SV terms):

```systemverilog
rand bit [31:0] a;
rand bit        b;
constraint ab_c {
    (b == 0) -> (a < 5);
}
```

If I constrain `a` at randomize time with ever-wider bounds (`obj.a < new_range` in the source below), the counts for `a`'s five legal values skew increasingly toward the top value:

```
new_range=5          : a = [211, 204, 184, 211, 190]
new_range=1000       : a = [119, 195, 181, 198, 307]
new_range=1000000    : a = [120, 117, 118, 129, 516]
new_range=1000000000 : a = [42, 23, 35, 34, 866]
new_range=4294967295 : a = [35, 30, 26, 29, 880]
```

Source:

```python
import vsc

TRUE_RANGE = 5
NUM_SAMPLES = 1_000

@vsc.randobj
class my_c(object):
    def __init__(self):
        self.a = vsc.rand_bit_t(32)
        self.b = vsc.rand_bit_t(1)

    @vsc.constraint
    def ab_c(self):
        vsc.solve_order(self.b, self.a)
        with vsc.implies(self.b == 0):
            self.a < TRUE_RANGE

obj = my_c()
for new_range in [TRUE_RANGE, 1_000, 1_000_000, 1_000_000_000, 2**32 - 1]:
    bins = [0] * TRUE_RANGE
    for _ in range(NUM_SAMPLES):
        with obj.randomize_with(debug=0):
            obj.b == 0
            obj.a < new_range
        bins[obj.a] += 1
    new_range = str(new_range)
    print(f'{obj.tname} - {new_range=: <12}: {bins}')
```

I think most of this bias is due to the swizzler switching from randomizing a single bit at a time to 6 bits at a time, and when the range it swizzles over is much wider than the true range, few of those bit-force constraints can hold. If you look at the previous 2d examples I've shared with the triangle, there are hot spots, and I think those are due to the swizzler randomizing the first variable and then using the same overly-wide range when swizzling the second. Here are some possible improvements I've thought of and implemented locally, among them caching the min/max range calculations.
I implemented each of these features with env vars to control them for now. Here's that 1d example with the cached min/max calculations. On my (older) laptop this reduces runtime by 15% (13.5s to 11.8s), which I assume is because the swizzler has fewer bits to try (3 bits vs. five 6-bit fields?).

```
$ PYVSC_CACHE_RANGE=1 ./161_line_clean.py
my_c - new_range=5           : [172, 196, 204, 219, 209]
my_c - new_range=1000        : [216, 174, 218, 204, 188]
my_c - new_range=1000000     : [182, 211, 198, 186, 223]
my_c - new_range=1000000000  : [206, 200, 192, 211, 191]
my_c - new_range=4294967295  : [218, 196, 212, 202, 172]
```

Here are my changes. I'll hold off on making a PR until I get some feedback and also figure out how to make the boolector model dumping cleaner.
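To make the caching idea concrete, here's a rough sketch of what I mean (hypothetical names, not the actual diff):

```python
import os

class RangeCache:
    """Sketch of the PYVSC_CACHE_RANGE idea (hypothetical names, not
    the actual diff): remember each variable's computed (min, max)
    bounds instead of re-deriving them before every swizzle pass."""

    def __init__(self):
        self.enabled = os.environ.get('PYVSC_CACHE_RANGE') == '1'
        self._bounds = {}

    def get_bounds(self, var_name, compute_bounds):
        if not self.enabled:
            return compute_bounds(var_name)  # original behavior: recompute
        if var_name not in self._bounds:
            self._bounds[var_name] = compute_bounds(var_name)
        return self._bounds[var_name]

    def invalidate(self):
        # Bounds can tighten as other variables get solved, so the
        # cache has to be cleared at the right points in the solve.
        self._bounds.clear()

# Example: pretend the solver derived a's true range as [0, 4].
cache = RangeCache()
print(cache.get_bounds('a', lambda name: (0, 4)))
```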
Thanks, @alwilson, for all the investigation and analysis here. I'll need some time to think this over. One question I had on the conclusions: what sort of constraint problems did you find were most helped by this approach? I'm guessing it didn't improve the chained-constraint cases.
Hey @mballance, for constraints like

```
(b == 0) -> (a < 5)
(b == 1) -> (a < 30)
```

I've tried chaining constraints together before, and I don't think those get broken up. I guess it would be trivial for pyvsc to break down AND constraints, though. When I first started exploring this, I think it would have helped most when a range isn't specified at all, as with a bare `rand_bit_t(32)` that only picks up bounds through implications. I should go through the unit tests and riscv-dv and see if there are any real-world examples this would help.

I'll admit that I don't think this will help all cases. Most of this is driven by me exploring that initial 2d case I came up with, so maybe this is tailored a little close to what I've explored so far. ;)
[ Using pyvsc-0.7.6 ]
I've been curious about how uniform the random distributions for SV and pyvsc constraints are, and I cooked up some 2D examples to help visualize the distribution with heatmaps. The heatmaps show pyvsc making some interesting patterns rather than being flat, and I was curious whether that was expected. The same constraints in SV appear to be fairly uniform. Each heatmap goes from 0 (black) to its max bucket value (white).
The constraints for a triangle, used in the examples below:
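(The code block from the original post isn't reproduced here; roughly, in pyvsc it would look like the following, where the 8-bit variables and N = 63 are assumptions -- the real source is in the repo linked below.)

```python
import vsc

N = 63  # assumed bound; the real value lives in the linked repo

@vsc.randobj
class triangle_c(object):
    def __init__(self):
        self.x = vsc.rand_bit_t(8)
        self.y = vsc.rand_bit_t(8)

    @vsc.constraint
    def tri_c(self):
        self.x <= N
        self.y <= N
        # Keep (x, y) on or below the diagonal: a right triangle.
        self.x + self.y <= N

item = triangle_c()
item.randomize()
print(item.x, item.y)
```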
Triangle using pyvsc (100k randomizations, ranging from 14 to 831 hits per bucket):
Triangle using SV (100k randomizations, ranging from 49 to 105 hits per bucket):
Source for the above SV and pyvsc examples of a simple triangle constraint (x + y <= N). The dist.py script is pretty simple and has two functions to choose from: one for a live animation of the randomizations and one for a performance measurement. The SV file and plot.py are just the same triangle and circle constraints I was testing out.
https://github.com/alwilson/pyvsc_testing
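(Not the actual dist.py, but the plotting boils down to something like this matplotlib sketch, with uniform rejection sampling standing in for the constraint solver:)

```python
import random
import numpy as np
import matplotlib.pyplot as plt

N = 64
counts = np.zeros((N, N), dtype=int)

# Stand-in sampler: rejection-sample the triangle x + y < N in place
# of the SV/pyvsc solver output.
hits = 0
while hits < 100_000:
    x, y = random.randrange(N), random.randrange(N)
    if x + y < N:
        counts[y, x] += 1
        hits += 1

# Render bucket counts as a heatmap: 0 -> black, max -> white.
plt.imshow(counts, cmap='gray', origin='lower')
plt.colorbar(label='hits per bucket')
plt.savefig('triangle_heatmap.png')
```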
Note: I've also found that SV constraint solving is very fast (100k simple 2-var randomizations in ~2s) and that pyvsc is much slower (~2000s), which, I should say, I'm not surprised by, although I'd love to better understand why and what the differences between the solvers/generators are.
I also tried putting a "hole" in the triangle constraints, which you can see in my source, but chaining constraints with or/and doesn't seem to do the same thing as it does in SV. Not sure if it's a pyvsc bug or if there's a different way to write out those constraints.
Thanks for developing this project! I've been having fun with it. >:)