chore: add progress bar to journal data update script #2182

Merged
merged 26 commits into from
Sep 4, 2023
5148d87
Add db
tobiasdiez Jun 24, 2023
31946b8
add graphql
tobiasdiez Jun 26, 2023
2747811
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
4392e06
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
2a7dfa0
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
6734e1b
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
5d9c67d
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
67046b3
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
3fb1a30
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jun 26, 2023
84d4059
fix tests
tobiasdiez Jun 26, 2023
cf7fed1
use string instead of int for issn
tobiasdiez Jun 27, 2023
bb13e55
fix db issn type
tobiasdiez Jun 27, 2023
d295213
Merge branch 'main' into journalinfo
tobiasdiez Jul 6, 2023
7510dc6
Merge branch 'main' into journalinfo
tobiasdiez Jul 6, 2023
97ac59d
Merge branch 'main' into journalinfo
tobiasdiez Jul 6, 2023
0692abd
fix revision hash
tobiasdiez Jul 10, 2023
eb2ce77
Add script to get journal data and update db
tobiasdiez Jul 28, 2023
655f183
Merge remote-tracking branch 'origin/main' into journalinfo
tobiasdiez Jul 28, 2023
9aa8394
fix linter
tobiasdiez Jul 28, 2023
24a9e14
fix seed
tobiasdiez Jul 28, 2023
1e083fb
fix bigint
tobiasdiez Jul 28, 2023
cb0e913
use json-bigint-patch lib
tobiasdiez Jul 28, 2023
09098ac
chore: add progress bar to journal data update script
tobiasdiez Aug 22, 2023
73971ac
Merge branch 'main' into journal-info-stats
tobiasdiez Sep 4, 2023
e3975e0
Update index.ts
tobiasdiez Sep 4, 2023
5c66f06
Update global.setup.ts
tobiasdiez Sep 4, 2023
6 changes: 3 additions & 3 deletions scripts/journaldata.py
@@ -33,6 +33,7 @@

 from prisma import Prisma
 from prisma.types import JournalCitationInfoYearlyCreateWithoutRelationsInput
+from tqdm import tqdm

 # current_year should be the latest year of data available at https://www.scimagojr.com/journalrank.php
 current_year = 2022
@@ -137,8 +138,7 @@ def download_all_data():
 def combine_data():
     """Iterate over files and return the consolidated dataset"""
     journals: dict[int, JournalInfo] = {}
-    for year in range(start_year, current_year + 1):
-        print(f'Processing {year}')
+    for year in tqdm(range(start_year, current_year + 1), desc='Processing data'):
         filepath = get_data_filepath(year)
         with open(filepath, mode='r', encoding='utf-8') as csv_file:
             csv_reader = csv.DictReader(csv_file, delimiter=';')
@@ -212,7 +212,7 @@ async def dump_into_database(journals: dict[int, JournalInfo]):
     # delete all existing yearly data (because its easier than updating)
     await db.journalcitationinfoyearly.delete_many()

-    for journal in journals.values():
+    for journal in tqdm(journals.values(), desc='Saving to database'):
         citation_info: list[JournalCitationInfoYearlyCreateWithoutRelationsInput] = [
             {
                 'year': year,
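
For readers unfamiliar with the pattern, here is a minimal standalone sketch of what the two changed loops do: wrapping an iterable in `tqdm(...)` yields the same items while drawing a progress bar labelled with `desc`. The function names `process_years` and `save_all` and the sample data are hypothetical stand-ins, not code from `scripts/journaldata.py`, and the `ImportError` fallback is an assumption added only so the sketch runs where `tqdm` is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Assumption for this sketch only: without tqdm, iterate with no progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def process_years(start_year: int, current_year: int) -> list[int]:
    """Hypothetical stand-in for the per-year loop in combine_data."""
    processed = []
    # range() has a known length, so the bar can show a total and percentage.
    for year in tqdm(range(start_year, current_year + 1), desc='Processing data'):
        processed.append(year)
    return processed


def save_all(journals: dict[int, str]) -> int:
    """Hypothetical stand-in for the per-journal loop in dump_into_database."""
    saved = 0
    # dict.values() also supports len(), so the total is inferred here too.
    for _journal in tqdm(journals.values(), desc='Saving to database'):
        saved += 1
    return saved


if __name__ == '__main__':
    print(process_years(1999, 2001))
    print(save_all({1: 'Journal A', 2: 'Journal B'}))
```

Because `tqdm` only wraps the iterable, the loop bodies stay unchanged, which is why the diff touches just the `for` lines and drops the per-year `print`.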