Topic proposals for upcoming lessons #23

Open · chrisconlan opened this issue Feb 8, 2021 · 16 comments

@chrisconlan
Member

  • Alex: FPGAs for quant trading (and how it relates to GameStop)
@emican
Contributor

emican commented Feb 8, 2021

Eric: How you utilize your new high-performance Dell workstation.

@joe-wojniak
Contributor

joe-wojniak commented Feb 9, 2021

Are Markov-chain process models useful for predicting time-series data? -> Wiener process -> Black-Scholes-Merton model: https://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model
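
For reference, the endpoint of that chain, the Black-Scholes-Merton call price, fits in a few lines. This is a minimal sketch (no dividends), and the inputs at the bottom are illustrative only.

import math
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    """European call price under Black-Scholes-Merton.

    S: spot price, K: strike, T: time to expiry in years,
    r: risk-free rate, sigma: annualized volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)

# Illustrative numbers only
print(black_scholes_call(S=100, K=105, T=0.5, r=0.01, sigma=0.2))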

@chrisconlan
Member Author

chrisconlan commented Feb 9, 2021

Are Markov-chain process models useful for predicting time-series data?

@joe-wojniak I encourage you to research Markov chains a little bit on your own and try to answer that question. Are there any scenarios in finance where you want to model discrete transitions between certain states? And the probabilities of those transitions? Can you imagine any scenarios where such a model would provide an improvement over other types of models?
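
As a concrete starting point, here is a minimal sketch of a two-state Markov chain; the regime labels and transition probabilities are made up purely for illustration.

import numpy as np

# Hypothetical market regimes with made-up transition probabilities
states = ["calm", "volatile"]
P = np.array([
    [0.95, 0.05],   # from "calm":     P(stay calm), P(go volatile)
    [0.20, 0.80],   # from "volatile": P(go calm),   P(stay volatile)
])

rng = np.random.default_rng(0)
state = 0                      # start in "calm"
path = []
for _ in range(10):
    state = rng.choice(2, p=P[state])   # draw the next state
    path.append(states[state])
print(path)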

@alexpryszlakh
Contributor

From the book "Common Stocks and Uncommon Profits"

Question 7: Does the company have outstanding labor and personnel relations?
Question 8: Does the company have outstanding executive relations?
Question 9: Does the company have depth to its management?

Can you just web scrape for people's reviews on Glassdoor and discussion forums?
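
A minimal sketch of that idea; the URL and CSS selector below are placeholders, and any real target's markup, terms of service, and robots.txt would need to be checked first.

import requests
from bs4 import BeautifulSoup

# Placeholder URL and selector -- substitute a real page after inspecting its markup
url = "https://example.com/company-reviews"
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

reviews = [node.get_text(strip=True) for node in soup.select(".review-text")]
print(len(reviews), "reviews found")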

@chrisconlan
Member Author

@alexpryszlakh Great book and great question.

@joe-wojniak
Contributor

joe-wojniak commented Feb 11, 2021 via email

@emican
Contributor

emican commented Feb 11, 2021

https://www.crunchbase.com/ can be a source of data.

@joe-wojniak
Contributor

Do we want to investigate whether py-polars is faster than pandas?

Blog on the topic: https://medium.com/analytics-vidhya/is-pypolars-the-new-alternative-to-pandas-916400f03fd7

@chrisconlan
Member Author

@joe-wojniak What do you think about py-polars? We can talk about lazy evaluation and query optimization, but it is very much a computer science and database design topic.

@joe-wojniak
Contributor

joe-wojniak commented Feb 24, 2021 via email

@joe-wojniak
Contributor

joe-wojniak commented Feb 24, 2021 via email

@chrisconlan
Member Author

@joe-wojniak Let's talk about lazy evaluation and query optimization at our next lesson. I want you to understand why there aren't necessarily any intrinsic speed gains buried within it, and why this library might be slower overall.

@joe-wojniak
Contributor

joe-wojniak commented Feb 24, 2021 via email

@chrisconlan
Member Author

chrisconlan commented Feb 25, 2021

Update @joe-wojniak

Pypolars' philosophy relies on this design pattern:
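
(Assumed setup for the sketch below: lazy_df is a pypolars LazyFrame, and col / lit come from pypolars' lazy expression API, e.g. from pypolars.lazy import col, lit.)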

lazy_df = lazy_df.filter(col("Rain") > (lit(120))) # Nothing happens
lazy_df = lazy_df.filter(col("Temp") > (lit(78))) # Nothing happens
lazy_df = lazy_df.filter(col("Earthquakes") > (lit(2))) # Nothing happens

# All of the above filter operations happen at once here
lazy_df.collect() 

Whereas pandas would require something like this:

# Method 1 (slow way)
df = df[df.rain > 120] # Filter shrinks df
df = df[df.temp > 78] # Filter shrinks df
df = df[df.earthquakes > 2] # Filter shrinks df

# Method 2 (fast way)
df = df[(df.rain > 120) & (df.temp > 78) & (df.earthquakes > 2)]

Theoretically, the Pypolars method above would be just as fast as pandas Method 2, because of lazy evaluation. Does this provide a speedup? No, not necessarily, and likely not at all. It just changes the way you write code and the way you optimize code. Is it worth it at this point to explore Pypolars? I don't think so. I would need to see the author of Pypolars show some provable speedups that go above and beyond lazy evaluation to even consider it.
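
One way to sanity-check the Method 1 vs. Method 2 gap within pandas itself; the column names mirror the example above and the data is synthetic, so the timings are illustrative only.

import time
import numpy as np
import pandas as pd

# Synthetic data, illustrative size only
n = 2_000_000
df = pd.DataFrame({
    "rain": np.random.uniform(0, 200, n),
    "temp": np.random.uniform(50, 100, n),
    "earthquakes": np.random.randint(0, 5, n),
})

t0 = time.perf_counter()
a = df[df.rain > 120]                 # Method 1: three passes,
a = a[a.temp > 78]                    # each building an
a = a[a.earthquakes > 2]              # intermediate DataFrame
t1 = time.perf_counter()

b = df[(df.rain > 120) & (df.temp > 78) & (df.earthquakes > 2)]  # Method 2: one mask
t2 = time.perf_counter()

print(f"Method 1 (chained filters): {t1 - t0:.3f}s")
print(f"Method 2 (single mask):     {t2 - t1:.3f}s")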

Further, Pypolars seems to advertise built-in parallelization. I don't like this at all. Serious engineers need explicit control of parallelization. Python in-memory parallelization sucks in general, because it requires pickling and unpickling of code, which isn't fully supported throughout the language. I can guarantee that running any complex parallel .apply function in Pypolars would cause unsolvable errors. I prefer to use Unix forking to parallelize my work, because it is the only thing that is sufficiently stable in Python.
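
For reference, a minimal sketch of the fork-based approach mentioned above, using the standard library's multiprocessing with the "fork" start method (Unix only); the worker function and chunk sizes are placeholders.

import multiprocessing as mp

def work(chunk):
    # Placeholder worker -- substitute the real per-chunk computation
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]

    # The "fork" start method lets workers inherit the parent's memory,
    # sidestepping the re-import/re-pickling issues of "spawn".
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=4) as pool:
        results = pool.map(work, chunks)
    print(sum(results))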

@joe-wojniak
Contributor

I thought this article on adaptive filtering was interesting. It explores an application for predicting stock price:
https://towardsai.net/p/machine-learning/time-series-prediction-using-adaptive-filtering
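
For context, a minimal sketch of a least-mean-squares (LMS) filter, one common adaptive-filtering algorithm; whether the linked article uses this exact variant is an assumption, and the data below is synthetic.

import numpy as np

def lms_predict(series, order=4, mu=0.01):
    """One-step-ahead prediction with an LMS adaptive filter.

    series: 1-D array of observations
    order:  number of past samples used as the filter input
    mu:     learning rate (step size)
    """
    w = np.zeros(order)
    preds = np.zeros(len(series))
    for t in range(order, len(series)):
        x = series[t - order:t][::-1]      # most recent samples first
        preds[t] = w @ x                   # predict the next value
        e = series[t] - preds[t]           # prediction error
        w += mu * e * x                    # LMS weight update
    return preds

# Synthetic example: noisy sine wave, illustrative only
t = np.linspace(0, 20, 500)
y = np.sin(t) + 0.1 * np.random.randn(500)
p = lms_predict(y)
print("mean abs error:", np.mean(np.abs(y[4:] - p[4:])))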

@chrisconlan
Member Author

Interesting post. I don't know anything about this method.

It could be an interesting, albeit complex, technical feature for an ML model. Definitely couldn't work as a standalone, though.
