-
Hi ay,
I really appreciate the time you have taken to look into pinkfish and your kind words regarding the code base. I'm sorry it isn't able at this time to meet the needs that you have identified. Please understand that I'm essentially the lone developer and work on it as a hobby outside of my full-time profession and family. I developed it for my own use and usually only add new features when I need them. That really is the only time I have to give it. For my style of investing (short- to medium-term ETFs), it does everything I need. I understand this means it will have limited appeal and a relatively small user base. That's fine. I wanted to share what I have done in case anyone else had the same requirements.
That said, I will look over the points you have made when I get the chance. From my quick reading of what you have said, I think you have raised some valid issues that I either hadn't considered or wasn't aware of. Thanks for making me aware of them.
Farrell
-
I certainly understand. Some things I have been able to adjust easily with minimal intervention, but I wanted to avoid anything that might make future upgrades difficult. If I were better at writing Python, I’d offer to make some code contributions, since I have a simple data handling design in mind. If I think a bit more about it over the next day or two, maybe I will find some easier adjustments than I saw over the weekend. It is a lot more pleasant to work with pinkfish, especially for research purposes, than with the complex frameworks.
Best regards
arthur
From: Farrell Aultman ***@***.***>
Sent: Monday, July 26, 2021 12:40 AM
To: fja05680/pinkfish ***@***.***>
Cc: EcoFin ***@***.***>; Author ***@***.***>
Subject: Re: [fja05680/pinkfish] Need facility for using data sources other than yfinance/Yahoo (#43)
-
If you can describe what it is that you have in mind regarding data handling, that would be helpful.
…On Mon, Jul 26, 2021, 8:32 AM EcoFin ***@***.***> wrote:
-
Farrell,
Maybe I’ll try to do a little diagram. But I imagine a 2D timeseries dataframe like the current ts, with “symbol identified” columns. I would want to be able to fill that df either from yfinance/Yahoo downloads or from any other data source, either directly (into the df) or indirectly (via read_csv). There is no particular reason not to use the Yahoo column names for price data. If someone (me, say) wants to provision from another data source, it would be my business to provide price data with the correct names.
It might make sense to have two dataframes, one provisioned “directly” and the other from csv, that could just be concatenated.
Whether it was easier to have one price data df and one “other (fundamental/economic/alternative) data” df, or put it all in one doesn’t matter too much. Somewhere in the current code I found that it was already set up to let users add their own indicators. That is how I would expect to use the second df in fact: generate indicators/signals from it to add to ts.
In effect, I’d put all the data handling code that prepares the ts dataframe into one level and then just pass ts to the backtester.
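As a rough illustration of the layout described above (a sketch only, not pinkfish code): one frame is provisioned directly, another via read_csv, and the two are concatenated into a single ts with symbol-prefixed columns. The helper names, the `SYMBOL_column` naming scheme, the `date`/`close` column names, and the sample values are all assumptions made for the example.

```python
import io

import pandas as pd


def symbol_columns(df: pd.DataFrame, symbol: str) -> pd.DataFrame:
    """Prefix every column with the symbol, e.g. 'close' -> 'SPY_close'."""
    return df.add_prefix(f"{symbol}_")


def from_csv(source, symbol: str) -> pd.DataFrame:
    """Read one symbol's prices from a CSV path or buffer, indexed by date."""
    df = pd.read_csv(source, index_col="date", parse_dates=True)
    return symbol_columns(df, symbol)


# "Direct" provisioning: a frame built in memory, e.g. from a database query.
dates = pd.date_range("2021-07-26", periods=3, freq="D", name="date")
direct = pd.DataFrame({"close": [1.0, 2.0, 3.0]}, index=dates)  # made-up values
spy = symbol_columns(direct, "SPY")

# "Indirect" provisioning: the same layout read from CSV text.
csv_text = "date,close\n2021-07-26,10.0\n2021-07-27,11.0\n2021-07-28,12.0\n"
tlt = from_csv(io.StringIO(csv_text), "TLT")

# One wide frame, aligned on the date index, ready to hand to a backtester.
ts = pd.concat([spy, tlt], axis=1)
```

Extra non-price columns (fundamental or economic series) could go through the same two helpers and be concatenated in the same way, or kept in a second frame that is used only to derive indicators.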
I know that is a bit easier to say than to do. I pulled the data provisioning code out of one of my equity/ETF screening gizmos to try to deliver a ts dataframe, thinking I needed to modularize it anyway. Of course, I then was reminded that there were actually 3 parts to that process: prices, metadata and some fundamental data. For backtesting we could skip the metadata for simplicity.
I could deliver the ts easily enough. But then I ran into problems with select_timeperiod and Benchmark that I wasn’t expecting.
Does that help at all?
Regards
ay
From: Farrell Aultman ***@***.***>
Sent: Monday, July 26, 2021 8:50 AM
To: fja05680/pinkfish ***@***.***>
Cc: EcoFin ***@***.***>; Author ***@***.***>
Subject: Re: [fja05680/pinkfish] Need facility for using data sources other than yfinance/Yahoo (#43)
-
Converting to a discussion is a good idea.
I understand that if yahoo data is sufficient then there’s no particular need or interest in altering the existing data provisioning setup.
I have messed around with root directories and non-yahoo csv files. It turned out to be more trouble than I expected.
Resetting base_dir works fine. Resetting dirname, not so much.
“They should use the same data cache”. No kidding … that’s what I would have expected too.
I will give it another try using a slightly different approach to see if I can make Benchmark work properly.
An easy to use, trouble-free interface to yahoo data is of course desirable. That would remain with a more flexible data-provisioning layer.
If I get something working reliably, I’ll pass on the procedure! I expected it to be as straightforward as you suggest … maybe I just made a bad decision somewhere along the way.
Regards
ay
From: Farrell Aultman ***@***.***>
Sent: Tuesday, July 27, 2021 11:00 AM
To: fja05680/pinkfish ***@***.***>
Cc: EcoFin ***@***.***>; Author ***@***.***>
Subject: Re: [fja05680/pinkfish] Need facility for using data sources other than yfinance/Yahoo (#44)
Arthur,
I have converted this issue to a discussion, since it contains a listing of several issues. I will create some issues from it.
In the meantime, here are some quick responses to the issues you raise.
"The data handling in pinkfish is too tightly bound to yfinance/Yahoo. It would be a valuable enhancement to be able to deliver data easily to pinkfish from other data sources. Lots of individual investors/students/researchers will be using non-yahoo data sources, many housed in databases, not csv files...."
I would like to do this. I just don't know how to proceed since I only use Yahoo Finance myself. That's probably what most non-professionals use, so I want pinkfish to preserve the 'out of the box' experience for them. I've considered purchasing data from Norgate to backtest stocks; I've just had other priorities. If I did that, I'd have a better idea of what you are having problems with. My suggestion is to write from your DB to a csv file in the data cache, then set use_cache=True in fetch_timeseries(). It shouldn't fetch from the internet. Your column names will have to match the column names from Yahoo Finance. I see you tried that, though, and had issues.
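A minimal sketch of that export step, using pandas only. The column mapping, the `<symbol>.csv` file name, and the temporary folder standing in for the data cache are assumptions for illustration; the real target column names should be copied from a CSV that pinkfish has already cached. Columns that are not in the mapping are kept unchanged rather than dropped.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from a vendor's column names to Yahoo-style names.
# The right-hand names are placeholders: copy the real header from a CSV
# that pinkfish has already cached for some symbol.
COLUMN_MAP = {
    "Date": "date",
    "Open": "open",
    "High": "high",
    "Low": "low",
    "Close": "close",
    "Volume": "volume",
}


def export_to_cache(df: pd.DataFrame, symbol: str, cache_dir: str) -> Path:
    """Rename known columns, keep any extra ones, and write <symbol>.csv."""
    out = df.rename(columns=COLUMN_MAP)
    path = Path(cache_dir) / f"{symbol}.csv"
    out.to_csv(path, index=False)
    return path


# Stand-in for rows pulled from a database (values are made up).
db_rows = pd.DataFrame({
    "Date": ["2021-07-26", "2021-07-27"],
    "Open": [1.0, 2.0],
    "High": [1.5, 2.5],
    "Low": [0.5, 1.5],
    "Close": [1.2, 2.2],
    "Volume": [1000, 2000],
    "Turnover": [1200.0, 4400.0],  # extra column, passed through untouched
})

cache_dir = tempfile.mkdtemp()  # stand-in for the real data cache folder
csv_path = export_to_cache(db_rows, "SPY", cache_dir)
```

With the file in place, the suggestion above is to call fetch_timeseries() with use_cache=True so that the symbol is read from disk rather than downloaded.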
"a generic interface to the ts data frame would make pinkfish far more usable
exclusive reliance on a pinkfish data cache is not the best approach; users do not want to replicate existing datastores; conversely, data downloaded to serve pinkfish should be generally available; (i.e. ideally, you just provision a pinkfish experiment out of an existing database; the alternative is to point pinkfish at an existing csv repository )"
You can point pinkfish to an existing csv repository. You need to specify your 'root' data cache folder within the pinkfish conf file. Then, specify the subdirectory with the 'data' argument in select_timeseries. Please take a look at the installation instructions again. I haven't tried this on Windows, but it should be handling the path correctly.
"select_timeperiod seems to be more problematic than fetch_timeseries
I have no clue at all why fetch_timeseries in benchmark doesn't seem to respect the use_cache setting
there should also be a single spot where we could specify the default path to the pinkfish data cache/folder"
benchmark always tries to use the data cache. It only downloads from Yahoo Finance if it can't find the timeseries for a symbol in the data cache. This part is correct, as it shouldn't be allowed to fetch a newer timeseries than the one used in the backtest. You have identified a valid issue, though: the benchmark doesn't use the data cache that's specified in fetch_timeseries() for the non-benchmark backtest. They should use the same data cache.
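The cache-first behaviour described here can be sketched generically as follows. This is not pinkfish's implementation: `load_timeseries`, the `download` callback, and the `<symbol>.csv` layout are hypothetical names chosen for the example. The point is that the strategy and the benchmark are handed the same cache folder, so the second lookup never reaches the downloader.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_timeseries(symbol: str, cache_dir: str, download) -> pd.DataFrame:
    """Return the cached timeseries if present; otherwise download and cache it."""
    path = Path(cache_dir) / f"{symbol}.csv"
    if path.exists():
        return pd.read_csv(path, index_col="date", parse_dates=True)
    df = download(symbol)
    df.to_csv(path)
    return df


calls = []  # records every symbol the fake downloader is asked for


def fake_download(symbol: str) -> pd.DataFrame:
    """Stand-in for an internet fetch; returns made-up prices."""
    calls.append(symbol)
    index = pd.DatetimeIndex(["2021-07-26", "2021-07-27"], name="date")
    return pd.DataFrame({"close": [100.0, 101.0]}, index=index)


cache_dir = tempfile.mkdtemp()  # one folder shared by strategy and benchmark

strategy_ts = load_timeseries("SPY", cache_dir, fake_download)   # downloads once
benchmark_ts = load_timeseries("SPY", cache_dir, fake_download)  # read from cache
```

Because the benchmark reads the file the strategy run already wrote, it cannot pick up a newer timeseries than the one used in the backtest.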
"btw: backtrader already discovered this and offers 3 or 4 generic data interfaces. At the other end of the spectrum, zipline's data handling is almost impossible. Easy, flexible pandas-based data handling could really differentiate pinkfish."
Again, I want to do this. Just need to identify the right path forward.
-
The data handling in pinkfish is too tightly bound to yfinance/Yahoo. It would be a valuable enhancement to be able to deliver data easily to pinkfish from other data sources. Lots of individual investors/students/researchers will be using non-yahoo data sources, many housed in databases, not csv files.
It shouldn't be hard, because all we really need (in the first instance) is to deliver the basic ts dataframe. I ran an experiment using my Norgate database since I already know how to make it deliver data to zipline and backtrader. I have now given up.
I thought I would just write the df out of the Norgate db. That turned out to be problematic so I tried copying a Norgate csv into the cache directory. Here are the problems:
pinkfish has yahoo column names hard-coded. But I do all the data configuration and adjustment in the database before getting anywhere near a backtester. Column names are not the same and I have several timeseries columns pinkfish doesn't know about a priori but that I might want to use; no need to throw them away. To try to move forward, I reconfigured my csv.
The showstopper seems to be the fetch_timeseries and select_timeperiod in Benchmark. Benchmark just doesn't want to use the cache at all and seems to insist on trying to download from yahoo. None of the small code patches I tried have solved the problem.
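In short,
a generic interface to the ts data frame would make pinkfish far more usable
exclusive reliance on a pinkfish data cache is not the best approach; users do not want to replicate existing datastores; conversely, data downloaded to serve pinkfish should be generally available (i.e. ideally, you just provision a pinkfish experiment out of an existing database; the alternative is to point pinkfish at an existing csv repository)
select_timeperiod seems to be more problematic than fetch_timeseries
I have no clue at all why fetch_timeseries in benchmark doesn't seem to respect the use_cache setting
there should also be a single spot where we could specify the default path to the pinkfish data cache/folder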
Once again, I really enjoy pinkfish and appreciate the elegance of the codebase! But I cannot use it if I'm locked into yahoo data.
btw: backtrader already discovered this and offers 3 or 4 generic data interfaces. At the other end of the spectrum, zipline's data handling is almost impossible. Easy, flexible pandas-based data handling could really differentiate pinkfish.
best regards
ay