During my recent studies I found ASSUME to be very slow when simulating a whole year.
One way to improve this is to switch to daily instead of hourly market clearing, but the simulation still takes a while.
When looking into which code takes most of the time, I found that pandas is often the culprit:
Act 1: Profiling Benchmarks
For various reasons, cProfile does not give good timings when running async code.
More accurate timings can be obtained with yappi (`pip install yappi`) - https://github.com/sumerc/yappi
So one can run `yappi -o "out.profile" cli.py` and then use tuna (`pip install tuna`) to visualize the profiling result: `tuna out.profile`.
This gives visual charts of where the run time is spent.
The profiled run is therefore equivalent to running `assume -s example01a -c base`. This run takes 88s on my laptop.
- Probably ~20s are spent organizing asyncio-stuff
- ~60s are spent in pandas
- ~3s on imports
- the rest on other stuff
`calculate_bids` mostly boils down to time spent in pandas
handling `market_feedback` also spends a lot of time in pandas - nearly all of the site-packages time is spent in pandas
writing outputs spends most of its time in pandas too
Admittedly, one cannot see much in the charts because of the long lines - I could not find a way to remove the absolute paths from the pictures.
Act 2: Alternatives
So I thought about how one could replace pandas.
Our requirements include slicing, indexing by datetime and having multiple series.
I experimented with modin and dask: I could not use modin as a drop-in replacement, and dask did not seem like a good solution either, as we spend a lot of time in the initialization of dataframes and not in the heavy lifting.
I ended up with good old numpy, which supports slicing but can only hold elements of a single type in one array.
So a datetime index is not possible.
I thought about having a convenience wrapper around numpy that provides the datetime indexing.
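To illustrate the idea, here is a rough sketch of what such a wrapper could look like. It is not the actual ASSUME implementation: the class name `DatetimeArray` and its methods are hypothetical, and it assumes a regular time grid (fixed step between `start` and `end`) so that a datetime can be converted to an integer position by plain arithmetic. Slices include the end point, similar to label-based slicing in pandas.

```python
from datetime import datetime, timedelta

import numpy as np


class DatetimeArray:
    """Hypothetical wrapper: a numpy array addressed by regularly spaced datetimes."""

    def __init__(self, start: datetime, end: datetime, freq: timedelta, value: float = 0.0):
        self.start = start
        self.freq = freq
        # Number of grid points, with both start and end included.
        n_steps = (end - start) // freq + 1
        self.data = np.full(n_steps, value, dtype=float)

    def _pos(self, when: datetime) -> int:
        # Translate a datetime into an integer position on the grid.
        pos, remainder = divmod(when - self.start, self.freq)
        if remainder or not 0 <= pos < len(self.data):
            raise KeyError(when)
        return pos

    def _key(self, key):
        # Datetime slices become integer slices; the stop is inclusive.
        if isinstance(key, slice):
            lo = 0 if key.start is None else self._pos(key.start)
            hi = len(self.data) if key.stop is None else self._pos(key.stop) + 1
            return slice(lo, hi)
        return self._pos(key)

    def __getitem__(self, key):
        return self.data[self._key(key)]

    def __setitem__(self, key, value):
        self.data[self._key(key)] = value


start = datetime(2019, 1, 1)
series = DatetimeArray(start, datetime(2019, 1, 1, 23), timedelta(hours=1))
series[start + timedelta(hours=2)] = 5.0  # single value
series[datetime(2019, 1, 1, 4):datetime(2019, 1, 1, 6)] = 1.0  # hours 4, 5 and 6
```

Multiple series could share the same `start` and `freq`, so the datetime-to-position conversion only has to be defined once.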
In the end, it turns out that switching to numpy is at least 40x faster than pandas.
I really hope that this also holds when switching the main parts of the simulation to it.
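For anyone who wants to check such a comparison on their own machine, a micro-benchmark could look roughly like the sketch below. This is not the benchmark behind the 40x figure: the function names, the hourly one-year series and the single-value lookup are all assumptions made for illustration, and the measured ratio will depend on the machine and the pandas/numpy versions.

```python
import timeit
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

N_HOURS = 24 * 365
START = datetime(2019, 1, 1)
STEP = timedelta(hours=1)

# The same hourly values, once as a pandas Series and once as a plain array.
index = pd.date_range(START, periods=N_HOURS, freq=pd.Timedelta(hours=1))
pd_series = pd.Series(np.arange(N_HOURS, dtype=float), index=index)
np_values = np.arange(N_HOURS, dtype=float)


def pandas_lookup(when: datetime) -> float:
    # Label-based scalar access through the DatetimeIndex.
    return pd_series.at[when]


def numpy_lookup(when: datetime) -> float:
    # Convert the datetime to an integer position, then index the array.
    return np_values[(when - START) // STEP]


def compare(when: datetime, repeats: int = 2000) -> tuple[float, float]:
    # Returns the total seconds spent in (pandas, numpy) for `repeats` lookups.
    t_pandas = timeit.timeit(lambda: pandas_lookup(when), number=repeats)
    t_numpy = timeit.timeit(lambda: numpy_lookup(when), number=repeats)
    return t_pandas, t_numpy


if __name__ == "__main__":
    t_pandas, t_numpy = compare(START + timedelta(hours=100))
    print(f"pandas: {t_pandas:.4f}s, numpy: {t_numpy:.4f}s")
```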
Act 3: Implementation
TBD