Commit

Merge pull request #1 from MilesCranmer/master
Getting up to date
DhananjayAshok authored Jan 17, 2021
2 parents 70dedf9 + 1175632 commit 7be8652
Showing 10 changed files with 133 additions and 92 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
.dataset*.jl
.hyperparams*.jl
*.csv
*.bkup
performance*txt
*.out
trials*
34 changes: 23 additions & 11 deletions .travis.yml
@@ -1,20 +1,32 @@
language: julia
os: linux
dist: bionic

julia:
- 1

addons:
apt:
packages:
- python3-pip
- python3-setuptools
jobs:
include:
- name: "Linux"
os: linux
dist: bionic
before_install: sudo apt-get -y install python3-pip python3-setuptools
env: PY=python3 SETUPPREFIX="--user"
- name: "macOS"
os: osx
before_install: python3 --version; pip3 --version; sw_vers
env: PY=python3
- name: "Windows"
os: windows
before_install:
- choco install python --version 3.8.0
- python -m pip install --upgrade pip
env: PATH=/c/Python38:/c/Python38/Scripts:$PATH PY=python

install: pip3 install --upgrade pip

before_script:
- export PATH=$HOME/.local/bin:$PATH
- julia --color=yes -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'

script:
- julia --color=yes -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
- ./test/travis.sh
- pip3 install numpy pandas
- $PY setup.py install $SETUPPREFIX
- PATH=$HOME/.local/bin:$PATH $PY test/test.py

13 changes: 9 additions & 4 deletions README.md
@@ -1,5 +1,7 @@
# [PySR.jl](https://github.com/MilesCranmer/PySR)

(pronounced like *py* as in python, and then *sur* as in surface)

[![Documentation Status](https://readthedocs.org/projects/pysr/badge/?version=latest)](https://pysr.readthedocs.io/en/latest/?badge=latest)
[![PyPI version](https://badge.fury.io/py/pysr.svg)](https://badge.fury.io/py/pysr)
[![Build Status](https://travis-ci.com/MilesCranmer/PySR.svg?branch=master)](https://travis-ci.com/MilesCranmer/PySR)
@@ -47,10 +49,11 @@ then instructions for [mac](https://julialang.org/downloads/platform/#macos)
and [linux](https://julialang.org/downloads/platform/#linux_and_freebsd).
(Don't use the `conda-forge` version; it doesn't seem to work properly.)
Then, at the command line,
install the `Optim` and `SpecialFunctions` packages via:
install and precompile the `Optim` and `SpecialFunctions`
packages via:

```bash
julia -e 'import Pkg; Pkg.add("Optim"); Pkg.add("SpecialFunctions")'
julia -e 'using Pkg; pkg"add Optim; add SpecialFunctions; precompile;"'
```

For python, you need to have Python 3, numpy, sympy, and pandas installed.
@@ -73,8 +76,10 @@ y = 2*np.cos(X[:, 3]) + X[:, 0]**2 - 2

# Learn equations
equations = pysr(X, y, niterations=5,
binary_operators=["plus", "mult"],
unary_operators=["cos", "exp", "sin"])
binary_operators=["plus", "mult"],
unary_operators=[
"cos", "exp", "sin", #Pre-defined library of operators (see https://pysr.readthedocs.io/en/latest/docs/operators/)
"inv(x) = 1/x"]) # Define your own operator! (Julia syntax)

...# (you can use ctl-c to exit early)

9 changes: 7 additions & 2 deletions TODO.md
@@ -58,19 +58,23 @@
- [x] Consider printing output sorted by score, not by complexity.
- [x] Increase max complexity slowly over time up to the actual max.
- [x] Record density over complexity. Favor equations that have a density we have not explored yet. Want the final density to be evenly distributed.
- [x] Do printing from Python side. Then we can do simplification and pretty-printing.
- [x] Sympy printing
- [ ] Sort these todo lists by priority

## Feature ideas

- [ ] Do printing from Python side. Then we can do simplification and pretty-printing.
- [ ] Other default losses (e.g., abs, other likelihoods, or just allow user to pass this as a string).
- [ ] Other dtypes available
- [ ] NDSA-II
- [ ] Cross-validation
- [ ] Sympy printing
- [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
- [ ] Add function to plot equations
- [ ] Refresh screen rather than dumping to stdout?
- [ ] Add ability to save state from python
- [ ] Additional degree operators?
- [ ] Multi targets (vector ops). Idea 1: Node struct contains argument for which registers it is applied to. Then, can work with multiple components simultaneously. Though this may be tricky to get right. Idea 2: each op is defined by input/output space. Some operators are flexible, and the spaces should be adjusted automatically. Otherwise, only consider ops that make a tree possible. But will need additional ops here to get it to work. Idea 3: define each equation in 2 parts: one part that is shared between all outputs, and one that is different between all outputs. Maybe this could be an array of nodes corresponding to each output. And those nodes would define their functions.
- Much easier option: simply flatten the output vector, and set the index as another input feature. The equation learned will be a single equation containing indices as a feature.
- [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
- [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
- [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
@@ -100,6 +104,7 @@

- [ ] How hard is it to turn the recursive array evaluation into a for loop?
- [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
- in array branch
- [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
- Ongoing in cluster branch
- [ ] Performance: try inlining things?
52 changes: 18 additions & 34 deletions hyperparamopt.py → benchmarks/hyperparamopt.py
@@ -34,58 +34,46 @@ def run_trial(args):
"""

print("Running on", args)
for key in 'niterations npop'.split(' '):
args[key] = int(args[key])


total_steps = 10*100*1000
niterations = args['niterations']
npop = args['npop']
if niterations == 0 or npop == 0:
print("Bad parameters")
return {'status': 'ok', 'loss': np.inf}

args['ncyclesperiteration'] = int(total_steps / (niterations * npop))
args['niterations'] = 100
args['npop'] = 100
args['ncyclesperiteration'] = 1000
args['topn'] = 10
args['parsimony'] = 1e-3
args['parsimony'] = 0.0
args['useFrequency'] = True
args['annealing'] = True

if args['npop'] < 20 or args['ncyclesperiteration'] < 3:
print("Bad parameters")
return {'status': 'ok', 'loss': np.inf}


args['weightDoNothing'] = 1.0

maxTime = 30
ntrials = 2
equation_file = f'.hall_of_fame_{np.random.rand():f}.csv'
ntrials = 3

with temp_seed(0):
X = np.random.randn(100, 5)*3
X = np.random.randn(100, 10)*3

eval_str = ["np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
"np.sign(X[:, 2])*np.abs(X[:, 2])**3.5 + 1/(np.abs(X[:, 0])+1)",
eval_str = [
"np.sign(X[:, 2])*np.abs(X[:, 2])**2.5 + 5*np.cos(X[:, 3]) - 5",
"np.exp(X[:, 0]/2) + 12.0 + np.log(np.abs(X[:, 0])*10 + 1)",
"1.0 + 3*X[:, 0]**2 - 0.5*X[:, 0]**3 + 0.1*X[:, 0]**4",
"(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)"]
"(np.exp(X[:, 3]) + 3)/(np.abs(X[:, 1]) + np.cos(X[:, 0]) + 1.1)",
"X[:, 0] * np.sin(2*np.pi * (X[:, 1] * X[:, 2] - X[:, 3] / X[:, 4])) + 3.0"
]

print(f"Starting", str(args))
try:
trials = []
for i in range(3, 6):
for i in range(len(eval_str)):
print(f"Starting test {i}")
for j in range(ntrials):
print(f"Starting trial {j}")
trial = pysr.pysr(
test=f"simple{i}",
y = eval(eval_str[i])
trial = pysr.pysr(X, y,
procs=4,
populations=20,
binary_operators=["plus", "mult", "pow", "div"],
unary_operators=["cos", "exp", "sin", "loga", "abs"],
equation_file=equation_file,
timeout=maxTime,
unary_operators=["cos", "exp", "sin", "logm", "abs"],
maxsize=25,
verbosity=0,
constraints={'pow': (-1, 1)},
**args)
if len(trial) == 0: raise ValueError
trials.append(
@@ -109,8 +97,6 @@ def run_trial(args):


space = {
'niterations': hp.qlognormal('niterations', np.log(10), 1.0, 1),
'npop': hp.qlognormal('npop', np.log(100), 1.0, 1),
'alpha': hp.lognormal('alpha', np.log(10.0), 1.0),
'fractionReplacedHof': hp.lognormal('fractionReplacedHof', np.log(0.1), 1.0),
'fractionReplaced': hp.lognormal('fractionReplaced', np.log(0.1), 1.0),
@@ -126,8 +112,6 @@ def run_trial(args):

################################################################################



def merge_trials(trials1, trials2_slice):
"""Merge two hyperopt trials objects
2 changes: 1 addition & 1 deletion docs/options.md
@@ -22,7 +22,7 @@ These are described below
The program will output a pandas DataFrame containing the equations,
mean square error, and complexity. It will also dump to a csv
at the end of every iteration,
which is `hall_of_fame.csv` by default. It also prints the
which is `hall_of_fame_{date_time}.csv` by default. It also prints the
equations to stdout.
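
Because the default file name now carries a timestamp, scripts that read the results can no longer hard-code `hall_of_fame.csv`. A minimal sketch of one way to pick up the newest file is shown here. The helper name `latest_hall_of_fame` and the glob pattern `hall_of_fame_*.csv` are assumptions for illustration (the exact `{date_time}` format is not given in this diff), and the sketch only locates the file; it does not assume anything about the CSV's separator or columns.

```python
import glob
import os


def latest_hall_of_fame(directory=".", pattern="hall_of_fame_*.csv"):
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: the pattern is an assumption based on the
    `hall_of_fame_{date_time}.csv` default described above.
    """
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        return None
    # Pick by modification time so the result does not depend on
    # how the timestamp in the file name happens to be formatted.
    return max(candidates, key=os.path.getmtime)
```

Passing `equation_file` explicitly to `pysr` remains the more predictable option when the output path needs to be known in advance.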

## Operators
27 changes: 19 additions & 8 deletions julia/sr.jl
@@ -1086,7 +1086,12 @@ function fullRun(niterations::Integer;
end
println("Started!")
cycles_complete = npopulations * niterations
curmaxsize += 1
if warmupMaxsize != 0
curmaxsize += 1
if curmaxsize > maxsize
curmaxsize = maxsize
end
end

last_print_time = time()
num_equations = 0.0
@@ -1212,15 +1217,19 @@ function fullRun(niterations::Integer;
deleteat!(equation_speed, 1)
end
average_speed = sum(equation_speed)/length(equation_speed)
@printf("\n")
@printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
@printf("Hall of Fame:\n")
@printf("-----------------------------------------\n")
@printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
curMSE = baselineMSE
@printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
lastMSE = curMSE
lastComplexity = 0
if verbosity > 0
@printf("\n")
@printf("Cycles per second: %.3e\n", round(average_speed, sigdigits=3))
cycles_elapsed = npopulations * niterations - cycles_complete
@printf("Progress: %d / %d total iterations (%.3f%%)\n", cycles_elapsed, npopulations * niterations, 100.0*cycles_elapsed/(npopulations*niterations))
@printf("Hall of Fame:\n")
@printf("-----------------------------------------\n")
@printf("%-10s %-8s %-8s %-8s\n", "Complexity", "MSE", "Score", "Equation")
@printf("%-10d %-8.3e %-8.3e %-.f\n", 0, curMSE, 0f0, avgy)
end

for size=1:actualMaxsize
if hallOfFame.exists[size]
@@ -1246,7 +1255,9 @@ function fullRun(niterations::Integer;
delta_c = size - lastComplexity
delta_l_mse = log(curMSE/lastMSE)
score = convert(Float32, -delta_l_mse/delta_c)
@printf("%-10d %-8.3e %-8.3e %-s\n" , size, curMSE, score, stringTree(member.tree))
if verbosity > 0
@printf("%-10d %-8.3e %-8.3e %-s\n" , size, curMSE, score, stringTree(member.tree))
end
lastMSE = curMSE
lastComplexity = size
end
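
The "Score" column in the loop above is the drop in log-MSE per unit of added complexity, measured against the previous printed row (starting from the baseline row at complexity 0). The arithmetic can be sketched in Python as follows. This mirrors only the lines visible in this hunk; the function name `hall_of_fame_scores` and the input layout are assumptions for illustration, any filtering done in the elided lines is not reproduced, and all MSE values are assumed to be strictly positive.

```python
import math


def hall_of_fame_scores(entries, baseline_mse):
    """Compute scores for (complexity, mse) pairs sorted by complexity.

    Hypothetical helper mirroring the visible Julia lines:
        delta_c     = size - lastComplexity
        delta_l_mse = log(curMSE / lastMSE)
        score       = -delta_l_mse / delta_c
    """
    last_mse = baseline_mse      # baseline row is printed at complexity 0
    last_complexity = 0
    scores = []
    for complexity, mse in entries:
        delta_c = complexity - last_complexity
        delta_l_mse = math.log(mse / last_mse)
        scores.append(-delta_l_mse / delta_c)
        last_mse, last_complexity = mse, complexity
    return scores


# Example: two equations, each cutting the MSE by 10x.
# The second one needs two extra nodes to do so, so it scores half as much.
print(hall_of_fame_scores([(1, 1.0), (3, 0.1)], baseline_mse=10.0))
```

A higher score therefore marks an equation that buys a large accuracy gain for a small increase in size.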