Release #10

Merged (40 commits, merged Oct 28, 2023)

Commits
565f2f6
Create parser for basic information
RikitoNoto Oct 21, 2023
2c1a4a0
Merge pull request #1 from RikitoNoto/feature/create_base_info_parser
RikitoNoto Oct 21, 2023
df5a574
Create parser for faculty names
RikitoNoto Oct 22, 2023
0893bdf
Remove open xlsx lock file
RikitoNoto Oct 22, 2023
209d2b6
Create parser for departments
RikitoNoto Oct 22, 2023
82acec8
Merge pull request #2 from RikitoNoto/feature/create_faculty_parser
RikitoNoto Oct 22, 2023
750e1c4
Remove intermediate Excel file
RikitoNoto Oct 22, 2023
0bcdf08
Create parser for graduate schools
RikitoNoto Oct 22, 2023
8fa5bff
Merge pull request #3 from RikitoNoto/feature/create_graduate_school_…
RikitoNoto Oct 22, 2023
774a48b
Remove unused imports
RikitoNoto Oct 22, 2023
89b59cb
Create School class
RikitoNoto Oct 23, 2023
352adaa
Create parse functions on the models
RikitoNoto Oct 24, 2023
b47c3bc
Create parsing to dict
RikitoNoto Oct 24, 2023
ec70ac1
Create JSON output
RikitoNoto Oct 24, 2023
33f0287
Move the tests directory
RikitoNoto Oct 24, 2023
b92e8e3
Merge pull request #4 from RikitoNoto/feature/create_graduate_school_…
RikitoNoto Oct 24, 2023
1bccf2d
Add school name
RikitoNoto Oct 24, 2023
301d3a2
Merge pull request #5 from RikitoNoto/feature/add_school_name_column
RikitoNoto Oct 24, 2023
e48e90d
Create parser for school classification
RikitoNoto Oct 25, 2023
cce6cd6
Add school classification to JSON output
RikitoNoto Oct 25, 2023
6cad5bc
Merge pull request #6 from RikitoNoto/feature/add_school_type_column
RikitoNoto Oct 25, 2023
2417d96
Create workflow files
RikitoNoto Oct 25, 2023
97340b2
Fix to write output to the current branch
RikitoNoto Oct 25, 2023
1a5fff6
Configure test directory
RikitoNoto Oct 25, 2023
68d8c7f
Fix indentation
RikitoNoto Oct 25, 2023
a2a6ae0
Change test execution directory
RikitoNoto Oct 25, 2023
d61a35e
Fix environment variable
RikitoNoto Oct 25, 2023
1455768
Fix directory in environment variable
RikitoNoto Oct 25, 2023
64d4e85
Merge pull request #7 from RikitoNoto/feature/create_ci
RikitoNoto Oct 25, 2023
a525cdb
Add backslash at the end of the find command
RikitoNoto Oct 25, 2023
c981a3b
Change execution path
RikitoNoto Oct 25, 2023
f0c2859
Test with fewer Excel files
RikitoNoto Oct 25, 2023
60343b0
Fix find command
RikitoNoto Oct 25, 2023
f84fe36
[Auto-generated] Parse output
RikitoNoto Oct 25, 2023
5a1bb89
Add step to clean the outputs folder before output
RikitoNoto Oct 25, 2023
c91c556
Add input files
RikitoNoto Oct 25, 2023
b98261c
Merge pull request #8 from RikitoNoto/feature/fix_output_ci
RikitoNoto Oct 25, 2023
398e5a2
[Auto-generated] Parse output
RikitoNoto Oct 25, 2023
c5edb33
Add README
RikitoNoto Oct 28, 2023
4a86719
Merge pull request #9 from RikitoNoto/feature/write_readme
RikitoNoto Oct 28, 2023
24 changes: 24 additions & 0 deletions .github/workflows/black_lint.yaml
@@ -0,0 +1,24 @@
name: Black Lint Check

on: [pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11

      - name: Install Black
        run: |
          python -m pip install --upgrade pip
          pip install black

      - name: Run Black Lint Check
        run: black --check .
39 changes: 39 additions & 0 deletions .github/workflows/output.yaml
@@ -0,0 +1,39 @@
name: File output

on: workflow_dispatch

jobs:
  output:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Output
        run: |
          mkdir -p assets/outputs
          find assets/outputs -name "*.json" -type f -exec rm {} \;
          cd src
          find ../assets/inputs/ -name "*.xlsx" -type f -exec sh -c 'python japanese_school_parser.py "$0" "../assets/outputs/$(basename "$0" .xlsx).json"' {} \;

      - name: Commit and push
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          git remote set-url origin https://github-actions:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}
          git config --global user.name "${GITHUB_ACTOR}"
          git config --global user.email "${GITHUB_ACTOR}@users.noreply.github.com"
          git add .
          git commit -m "[Auto-generated] Parse output"
          git push origin HEAD:${{ github.ref_name }}
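The Output step above clears earlier results from `assets/outputs`, then runs `japanese_school_parser.py` on every `.xlsx` under `assets/inputs` and names each result after the input's base name. A rough Python equivalent for running the same batch locally, without `find`, might look like the sketch below. `convert_all` is a hypothetical helper, not part of this PR; the converter is passed in so the project's `output_json(source_path, out_file_path)` or a stub can be plugged in, and old results are assumed to be `*.json` files.

```python
from pathlib import Path
from typing import Callable


def convert_all(
    inputs_dir: Path,
    outputs_dir: Path,
    convert: Callable[[str, str], None],
) -> list[Path]:
    """Hypothetical helper: clean old JSON, then convert every .xlsx input."""
    outputs_dir.mkdir(parents=True, exist_ok=True)
    # Remove previous results so deleted inputs do not leave stale JSON behind.
    for old in outputs_dir.glob("*.json"):
        old.unlink()

    written: list[Path] = []
    # rglob, like `find`, also descends into subdirectories.
    for source in sorted(inputs_dir.rglob("*.xlsx")):
        # Same naming rule as `$(basename "$0" .xlsx).json` in the workflow.
        target = outputs_dir / f"{source.stem}.json"
        convert(str(source), str(target))
        written.append(target)
    return written
```

From inside `src`, a call such as `convert_all(Path("../assets/inputs"), Path("../assets/outputs"), output_json)` would mirror the workflow. As with `find`, two inputs with the same file name in different folders would write to the same output file.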
28 changes: 28 additions & 0 deletions .github/workflows/test.yaml
@@ -0,0 +1,28 @@
name: Test

on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run pytest
        env:
          PYTHONPATH: ../
        run: |
          cd src/tests
          pytest
2 changes: 2 additions & 0 deletions .gitignore
@@ -158,3 +158,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

~$*
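The new `~$*` rule keeps the lock files that Excel creates next to an open workbook (names starting with `~$`, such as `~$multi_sheets1.xlsx`) out of the repository. A quick way to check which file names the pattern catches is Python's `fnmatch`, which treats this simple pattern the same way for bare file names; it only approximates gitignore matching in general (it knows nothing about directory rules or negation). `is_excel_lock_file` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Same pattern as the line added to .gitignore.
LOCK_PATTERN = "~$*"


def is_excel_lock_file(name: str) -> bool:
    """Return True if a bare file name matches the `~$*` ignore rule."""
    # In fnmatch, only `*` is special here; `~` and `$` match literally.
    return fnmatch(name, LOCK_PATTERN)
```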
21 changes: 21 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,21 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Debug",
            "type": "python",
            "request": "launch",
            "program": "japanese_school_parser.py",
            "cwd": "${workspaceFolder}/src",
            "args": [
                "tests/files/multi_sheets1.xlsx",
                "output.json",
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
14 changes: 14 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,14 @@
{
    "editor.indentSize": 4,
    "editor.tabSize": 4,
    "python.testing.pytestEnabled": true,
    "python.testing.autoTestDiscoverOnSaveEnabled": true,
    "python.testing.pytestArgs": [
        "tests"
    ],
    "python.testing.cwd": "src",
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter"
    },
    "python.formatting.provider": "none",
}
103 changes: 102 additions & 1 deletion README.md
@@ -1 +1,102 @@
# JapaneseSchoolPaser
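A program that parses university information from the Excel files of the [list of universities](https://www.mext.go.jp/a_menu/koutou/ichiran/mext_01853.html) published by Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT) and outputs it in a form that is easy to use as data.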

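## Usage
A [Python runtime](https://www.python.org/downloads/) is required.

1. Download the Excel files from the [MEXT website](https://www.mext.go.jp/a_menu/koutou/ichiran/mext_00006.html).
1. Install the dependencies with `pip install -r requirements.txt`.
1. For JSON output, run the following Python command from the `src` directory: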
```bash
python japanese_school_parser.py <input Excel file path> <output file path>
```

```bash
# Example
python japanese_school_parser.py 20220415_mxt_daigakuc01_000021808_03-7.xlsx 20220415_mxt_daigakuc01_000021808_03-7.json
# => 20220415_mxt_daigakuc01_000021808_03-7.json is generated in the directory where the command was run.
```
To get the output as a Python dict, call the following function:
```python
from japanese_school_parser import parse_schools_to_dict
schools = parse_schools_to_dict("20220415_mxt_daigakuc01_000021808_03-7.xlsx")
```
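The returned dict has the same shape as the JSON output: a `"schools"` list whose entries include `name`, `classification`, `faculties` (each with `departments`) and `graduate_schools` (each with `majors`). A minimal sketch of walking it, using a small hand-written sample copied from `multi_sheets1.json` instead of a real parse; `iter_departments` is a hypothetical helper:

```python
from typing import Any, Iterator

# Hand-written subset of assets/outputs/multi_sheets1.json (not a real parse).
sample: dict[str, Any] = {
    "schools": [
        {
            "name": "京都精華大学",
            "classification": "私立",
            "faculties": [
                {
                    "name": "マンガ学部",
                    "departments": [{"name": "マンガ学科"}, {"name": "アニメーション学科"}],
                },
            ],
            "graduate_schools": [
                {"name": "マンガ研究科", "majors": [{"name": "マンガ専攻"}]},
            ],
        }
    ]
}


def iter_departments(data: dict[str, Any]) -> Iterator[tuple[str, str, str]]:
    """Yield (school, faculty, department) name triples from the parsed dict."""
    for school in data["schools"]:
        for faculty in school["faculties"]:
            for department in faculty["departments"]:
                yield school["name"], faculty["name"], department["name"]


for row in iter_departments(sample):
    print(row)
```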


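## Implementation status
### Output formats
- [x] JSON
- [x] Python dict
- [ ] YAML

### Output fields
#### Basic school information
- [x] School code
- [x] President
- [ ] President's term
- [ ] Address
- [ ] Phone number

#### Establishing body information
- [ ] Establishing body
- [ ] Position
- [ ] Name of position holder

#### Faculty and graduate school locations (campus names, etc.)
- [ ] Name
- [ ] Address
- [ ] Phone number

#### Research institutes attached to national universities
- [ ] Institute name
- [ ] Address
- [ ] Phone number
- [ ] Date of establishment
- [ ] Remarks

#### Facilities affiliated with national university research institutes
- [ ] Institute name
- [ ] Affiliated facility
- [ ] Address
- [ ] Phone number

#### Faculties
- [x] Faculty
- [x] Department
- [ ] Prefecture
- [ ] Municipality
- [ ] Length of study
- [ ] Admission capacity
- [ ] Transfer admission capacity
- [ ] Transfer admission (evening)

#### Graduate schools
- [x] Graduate school
- [x] Major
- [ ] Prefecture
- [ ] Municipality
- [ ] Evening / day-and-evening
- [ ] Master's / first-stage doctoral program
- [ ] Second-stage doctoral program
- [ ] Integrated doctoral program
- [ ] Professional degree program
- [ ] Transfer admission capacity

#### Faculty history
- [ ] Year and month
- [ ] History

#### Graduate school history
- [ ] Year and month
- [ ] History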


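## Contributing
Currently, only the parts the author personally uses are implemented.

If you open an issue with a request, I will implement the additional features.

Pull requests are also welcome.

### Notes
Source: [MEXT website](https://www.mext.go.jp/)

This program processes the data from the above source and outputs it.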
Binary files not shown (11 files).
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-1.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-2.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-3.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-4.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-5.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-6.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-7.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-8.json
1 change: 1 addition & 0 deletions assets/outputs/20220415_mxt_daigakuc01_000021808_03-9.json
1 change: 1 addition & 0 deletions assets/outputs/20220627_mxt_daigakuc01_000021808_01.json
1 change: 1 addition & 0 deletions assets/outputs/20230126_mxt_daigakuc01_000021808_02.json

1 change: 1 addition & 0 deletions assets/outputs/multi_sheets1.json
@@ -0,0 +1 @@
{"schools": [{"classification": "\u79c1\u7acb", "faculties": [{"departments": [{"name": "\u9020\u5f62\u5b66\u79d1"}], "name": "\u82b8\u8853\u5b66\u90e8"}, {"departments": [{"name": "\u30d3\u30b8\u30e5\u30a2\u30eb\u30c7\u30b6\u30a4\u30f3\u5b66\u79d1"}, {"name": "\u30a4\u30e9\u30b9\u30c8\u5b66\u79d1"}, {"name": "\u30d7\u30ed\u30c0\u30af\u30c8\u30c7\u30b6\u30a4\u30f3\u5b66\u79d1"}, {"name": "\u5efa\u7bc9\u5b66\u79d1"}], "name": "\u30c7\u30b6\u30a4\u30f3\u5b66\u90e8"}, {"departments": [{"name": "\u30de\u30f3\u30ac\u5b66\u79d1"}, {"name": "\u30a2\u30cb\u30e1\u30fc\u30b7\u30e7\u30f3\u5b66\u79d1"}], "name": "\u30de\u30f3\u30ac\u5b66\u90e8"}, {"departments": [{"name": "\u7dcf\u5408\u4eba\u6587\u5b66\u79d1"}], "name": "\u4eba\u6587\u5b66\u90e8"}, {"departments": [{"name": "\u30dd\u30d4\u30e5\u30e9\u30fc\u30ab\u30eb\u30c1\u30e3\u30fc\u5b66\u79d1"}], "name": "\u30dd\u30d4\u30e5\u30e9\u30fc\u30ab\u30eb\u30c1\u30e3\u30fc\u5b66\u90e8"}, {"departments": [{"name": "\u30e1\u30c7\u30a3\u30a2\u8868\u73fe\u5b66\u79d1"}], "name": "\u30e1\u30c7\u30a3\u30a2\u8868\u73fe\u5b66\u90e8"}, {"departments": [{"name": "\u4eba\u6587\u5b66\u79d1"}, {"name": "\u30b0\u30ed\u30fc\u30d0\u30eb\u30b9\u30bf\u30c7\u30a3\u30fc\u30ba\u5b66\u79d1"}], "name": "\u56fd\u969b\u6587\u5316\u5b66\u90e8"}], "graduate_schools": [{"majors": [{"name": "\u82b8\u8853\u5c02\u653b"}], "name": "\u82b8\u8853\u7814\u7a76\u79d1"}, {"majors": [{"name": "\u30c7\u30b6\u30a4\u30f3\u5c02\u653b"}, {"name": "\u5efa\u7bc9\u5c02\u653b"}], "name": "\u30c7\u30b6\u30a4\u30f3\u7814\u7a76\u79d1"}, {"majors": [{"name": "\u30de\u30f3\u30ac\u5c02\u653b"}], "name": "\u30de\u30f3\u30ac\u7814\u7a76\u79d1"}, {"majors": [{"name": "\u4eba\u6587\u5b66\u5c02\u653b"}], "name": "\u4eba\u6587\u5b66\u7814\u7a76\u79d1"}], "name": "\u4eac\u90fd\u7cbe\u83ef\u5927\u5b66", "president": "\u30a6\u30b9\u30d3\u30fb\u30b5\u30b3", "school_code": "F126310107644"}, {"classification": "\u79c1\u7acb", "faculties": [{"departments": [{"name": "\u937c\u7078\u5b66\u79d1"}], 
"name": "\u937c\u7078\u5b66\u90e8"}, {"departments": [{"name": "\u67d4\u9053\u6574\u5fa9\u5b66\u79d1"}, {"name": "\u6551\u6025\u6551\u547d\u5b66\u79d1"}], "name": "\u4fdd\u5065\u533b\u7642\u5b66\u90e8"}, {"departments": [{"name": "\u770b\u8b77\u5b66\u79d1"}], "name": "\u770b\u8b77\u5b66\u90e8"}], "graduate_schools": [{"majors": [{"name": "\u937c\u7078\u5b66\u5c02\u653b"}, {"name": "\u81e8\u5e8a\u937c\u7078\u5b66\u5c02\u653b"}], "name": "\u937c\u7078\u5b66\u7814\u7a76\u79d1"}, {"majors": [{"name": "\u67d4\u9053\u6574\u5fa9\u5b66\u5c02\u653b"}], "name": "\u4fdd\u5065\u533b\u7642\u5b66\u7814\u7a76\u79d1"}], "name": "\u660e\u6cbb\u56fd\u969b\u533b\u7642\u5927\u5b66", "president": "\u77e2\u91ce\u3000\u5fe0", "school_code": "F126310107653"}]}
7 changes: 7 additions & 0 deletions requirements.txt
@@ -0,0 +1,7 @@
et-xmlfile==1.1.0
exceptiongroup==1.1.3
iniconfig==2.0.0
openpyxl==3.1.2
pluggy==1.3.0
pytest==7.4.2
tomli==2.0.1
Empty file added src/__init__.py
42 changes: 42 additions & 0 deletions src/japanese_school_parser.py
@@ -0,0 +1,42 @@
import json
from typing import Any

from openpyxl import load_workbook
from openpyxl.workbook import Workbook
from openpyxl.worksheet.worksheet import Worksheet

from models.school import School
from parser.school_parser import SchoolParser


def parse_schools_to_dict(path: str) -> dict[str, Any]:
    schools = parse_schools_to_model(path)

    return {"schools": [school.to_dict() for school in schools]}


def parse_schools_to_model(path: str) -> list[School]:
    book: Workbook = load_workbook(path)
    schools: list[School] = []

    sheet_names = book.sheetnames
    for sheet_name in sheet_names:
        sheet: Worksheet = book[sheet_name]
        schools.append(SchoolParser(sheet).parse())

    book.close()

    return schools


def output_json(source_path: str, out_file_path: str) -> None:
    schools = parse_schools_to_dict(source_path)
    with open(out_file_path, "w") as file:
        json.dump(schools, file)


if __name__ == "__main__":
    import sys

    args = sys.argv
    output_json(args[1], args[2])
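`output_json` calls `json.dump` with its default `ensure_ascii=True`, which is why the committed outputs contain escapes such as `\u79c1\u7acb` rather than `私立`. Both forms decode to the same data, so JSON consumers are unaffected; if human-readable files are wanted, a variant could write UTF-8 directly. This is a sketch of that option, not what the PR does, and `write_readable_json` is a hypothetical name.

```python
import json
from typing import Any


def write_readable_json(data: dict[str, Any], out_file_path: str) -> None:
    """Write `data` as UTF-8 JSON, keeping Japanese text unescaped."""
    # An explicit encoding avoids relying on the platform default encoding.
    with open(out_file_path, "w", encoding="utf-8") as file:
        # ensure_ascii=False writes non-ASCII characters as-is instead of \uXXXX.
        json.dump(data, file, ensure_ascii=False)
```

It could stand in for the body of `output_json` as `write_readable_json(parse_schools_to_dict(source_path), out_file_path)`; regenerating the outputs this way would change every committed JSON file.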
Empty file added src/models/__init__.py
47 changes: 47 additions & 0 deletions src/models/base_info.py
@@ -0,0 +1,47 @@
from enum import Enum
from typing import Optional
from models.model import Model


class SchoolClassification(Enum):
    NATIONAL = "国立"
    PUBLIC = "公立"
    PRIVATE = "私立"

    @classmethod
    def from_str(cls, class_str: str):
        for value in list(cls):
            if value.value == class_str:
                return value


class BaseInfo(Model):
    def __init__(
        self, name="", school_code="", president="", classification=None
    ) -> None:
        self.__name = name
        self.__school_code: str = school_code
        self.__president: str = president
        self.__classification: Optional[SchoolClassification] = classification

    def _register_base_info(self, base_info):
        self.__name = base_info.name
        self.__school_code: str = base_info.school_code
        self.__president: str = base_info.president
        self.__classification: Optional[SchoolClassification] = base_info.classification

    @property
    def name(self) -> str:
        return self.__name

    @property
    def classification(self) -> SchoolClassification:
        return self.__classification

    @property
    def school_code(self) -> str:
        return self.__school_code

    @property
    def president(self) -> str:
        return self.__president
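`SchoolClassification.from_str` scans the members and returns the one whose value equals the cell text, falling through to an implicit `None` when nothing matches. Python's enum lookup by value, `SchoolClassification("私立")`, performs the same search but raises `ValueError` for unknown text. The sketch below copies the enum so it runs standalone and rewrites `from_str` on top of that lookup while keeping the `None` result; it is an alternative, not the PR's code.

```python
from enum import Enum
from typing import Optional


class SchoolClassification(Enum):
    NATIONAL = "国立"
    PUBLIC = "公立"
    PRIVATE = "私立"

    @classmethod
    def from_str(cls, class_str: str) -> Optional["SchoolClassification"]:
        """Return the member whose value equals `class_str`, or None if none does."""
        try:
            # Calling an Enum class with a value looks the member up by value.
            return cls(class_str)
        except ValueError:
            # Unknown text: keep the original method's None result.
            return None
```

Making the `None` case explicit in the return annotation also tells callers they must handle unrecognized classification text.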
25 changes: 25 additions & 0 deletions src/models/faculty.py
@@ -0,0 +1,25 @@
from models.model import Model


class Department(Model):
    def __init__(self, name: str) -> None:
        self.__name: str = name
        pass

    @property
    def name(self) -> str:
        return self.__name


class Faculty(Model):
    def __init__(self, name: str, departments: list[Department]) -> None:
        self.__name: str = name
        self.__departments: list[Department] = departments

    @property
    def name(self) -> str:
        return self.__name

    @property
    def departments(self) -> list[Department]:
        return self.__departments
25 changes: 25 additions & 0 deletions src/models/graduate_school.py
@@ -0,0 +1,25 @@
from models.model import Model


class Major(Model):
    def __init__(self, name: str) -> None:
        self.__name: str = name
        pass

    @property
    def name(self) -> str:
        return self.__name


class GraduateSchool(Model):
    def __init__(self, name: str, majors: list[Major]) -> None:
        self.__name: str = name
        self.__majors: list[Major] = majors

    @property
    def name(self) -> str:
        return self.__name

    @property
    def majors(self) -> list[Major]:
        return self.__majors
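The code that builds `GraduateSchool` and `Major` objects (`parser/school_parser.py`) is not included in this diff, so the sheet layout it handles is not visible here. Purely to illustrate the nesting seen in the JSON output, the sketch below groups flat `(graduate school, major)` rows into `{"name": ..., "majors": [...]}` entries. It assumes, without confirmation from the source, that an empty graduate-school cell means "same as the row above", a common pattern with merged cells; `group_majors` is a hypothetical helper.

```python
from typing import Any, Optional


def group_majors(rows: list[tuple[Optional[str], str]]) -> list[dict[str, Any]]:
    """Group (graduate_school, major) rows; an empty school continues the previous one."""
    groups: list[dict[str, Any]] = []
    for school, major in rows:
        if school:
            # A non-empty cell starts a new graduate school entry.
            groups.append({"name": school, "majors": []})
        elif not groups:
            raise ValueError(f"major {major!r} appears before any graduate school")
        groups[-1]["majors"].append({"name": major})
    return groups
```

For example, the rows `("デザイン研究科", "デザイン専攻")`, `(None, "建築専攻")`, `("マンガ研究科", "マンガ専攻")` would produce the same grouping as the `graduate_schools` entries in `multi_sheets1.json`.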