-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add T018–T020 to clean three ADB ATO data sets #80
base: main
Are you sure you want to change the base?
Add T018–T020 to clean three ADB ATO data sets #80
Conversation
converting ipynb to py to push into transport energy
Hi @serahrashmi —thank you very much for your contributions to the iTEM Open Data code! We appreciate that members of the community, such as yourself, make efforts to expand the scope and quality of the data included in this shared resource. In this review, I will do a few things:
To be clear, I understand from you and @soniayeh that the course in which you did this work is coming to an end, so you may not have dedicated time to do (2) or (3). So those TODOs are mentioned without saying who will do them or when—it could be you, me, or someone else; sooner or later. What is added
TODOs to merge the branch
TODOs for follow-up
|
Hi Paul,
Thank you for the very detailed email.
hope you are doing well.
We have completed our course and have submitted our laptops to the school.
In my case I’m yet to make alternative to another laptop once I get that
I’ll be able to work on it as much as possible.
I’ll get back to you as and when I find a solution to my situation.
Thank you for your understanding.
regards
Serah
…On Mon, 11 Dec 2023 at 1:33 AM, Paul Natsuo Kishimoto < ***@***.***> wrote:
Hi @serahrashmi <https://github.com/serahrashmi> —thank you very much for
your contributions to the iTEM Open Data code! We appreciate that members
of the community, such as yourself, make efforts to expand the scope and
quality of the data included in this shared resource.
In this review, I will do a few things:
1. Summarize what I think is being added by the commits on this
branch. (This is information that should go in the description of the pull
request. If I am guessing incorrectly, please correct me by providing the
correct explanation.)
2. Identify TODOs to merge this branch/PR.
3. Identify TODOs for further clean-up (after the branch is merged).
To be clear, I understand from you and @soniayeh
<https://github.com/soniayeh> that the course in which you did this work
is coming to an end, so you may not have dedicated time to do (2) or (3).
So those TODOs are mentioned without saying *who* will do them or *when*—it
could be you, me, or someone else; sooner or later.
What is added
- Three data-cleaning modules, each cleaning one data flow from the
Asian Development Bank (ADB) Asian Transport Outlook (ATO); I guess the
2023 edition:
- item.historical.T018 → TAS-FRA-004(2)
- item.historical.T019 → TAS-FRA-007(3)
- item.historical.T020 → TAS-VEP-018
- Similar (identical) files in item/serah/…
- Files in item/config:
- regions_roadmap.yaml —appears similar to this file
<https://github.com/transportenergy/metadata/blob/main/model/roadmap/regions.yaml>
.
- sources.yaml —appears to contain entries corresponding to the 3
added modules, in similar form to this file
<https://github.com/transportenergy/metadata/blob/main/historical/sources.yaml>
.
TODOs to merge the branch
-
Deduplicate the code in the 3 added modules. For instance, the files
T018.py and T019.py are about around 190 lines, but all except a few of
these lines are identical in both files; this makes it difficult to
identify which steps are common versus specific to each input data flow.
One way to achieve this would be to define the class ItemTransformer
in a single file, then import it in each of the 3 modules, setting some
different parameters or creating a subclass.
Also delete the duplicate files in item/serah/….
-
Provide a single entry-point in each module. Existing data modules
like T000
<https://github.com/transportenergy/database/blob/main/item/historical/T000.py>
do not execute any code when they are imported; they instead provide a
function like process() that triggers processing of the data. In the
current branch, this could call the method item_transformer.execute().
-
Create a pull request in transportenergy/metadata
<https://github.com/transportenergy/metadata> that makes the necessary
additions to sources.yaml, currently contained in a duplicate file on this
branch.
-
Add documentation, for instance in the docstring at the top of each
module. Explain, at least, which specific data are used: at what URL can
the original input data be found?
-
Ensure the tests pass. This can be started by rebasing the branch on
transportenergy:main.
TODOs for follow-up
- Consider overlap with the transport_data.adb code maintained by the
Transport Data Commons, here
<https://github.com/transport-data/tools/blob/main/transport_data/adb/__init__.py>;
make use of the TDC code where possible, and keep the features specific to
iTEM Open Data.
- Consider overlap with e.g. the class AtoWorkbook added in #78
<#78>; merge common
actions to common code.
- Add the new modules to the test suite. item.tests.test_historical
<https://github.com/transportenergy/database/blob/main/item/tests/test_historical.py>
contains tests of the existing (T000–T010, T012) data modules, and these
tests should be expanded to test the modules added in this PR.
—
Reply to this email directly, view it on GitHub
<#80 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/A5J5II7QSFIFWSGE433TGHLYIZIEPAVCNFSM6AAAAABAAEY7F2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBZGE2TAMZWGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Includes changes related to cleaning the ITEM ATO worksheet specifically T018, T019 and T020.
The final cleaned file is merged to the "master dataset.csv"