Because I'm tired of running into broken READMEs!
GitHub changed the way ATX headers are parsed in Markdown files: a space is now required between the leading # characters and the heading text. This suddenly broke the headings in many repos' READMEs, and although time has passed, many are still broken.
vmarkovtsev created a dataset (CC BY-NC 4.0) containing the repos with more than 50 stars that contain READMEs broken in this way. So I created this script to iterate through the list and create a PR to fix each of them.
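As a rough illustration of the kind of fix involved: GitHub's parser only treats a line as an ATX heading when a space separates the leading `#` characters from the text, so `#Title` needs to become `# Title`. The sketch below is not the code in readmesfix.py. The name `fix_atx_headings` is hypothetical, and a naive pattern like this one would also rewrite lines such as `#1234` that were never meant to be headings, so a real fixer needs extra checks.

```python
import re

# One to six '#' at the start of a line, immediately followed by a
# character that is neither '#' nor whitespace (the broken heading form).
_BROKEN_HEADING = re.compile(r"^(#{1,6})([^#\s])")


def fix_atx_headings(markdown: str) -> str:
    """Insert the missing space after the leading '#' run of each heading.

    Hypothetical helper for illustration only. Lines inside fenced code
    blocks are left untouched so that shell comments are not rewritten.
    """
    fixed_lines = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(("```", "~~~")):
            # Toggle on every fence line; the fence line itself is kept as-is.
            in_fence = not in_fence
        elif not in_fence:
            line = _BROKEN_HEADING.sub(r"\1 \2", line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

Headings that already have the space, and runs of more than six `#` characters, do not match the pattern and are returned unchanged.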
Caution: this script creates Pull Requests automatically. Please use it carefully to avoid spamming other projects.
The script works on Python 3.6+. To install its dependencies:
pip install -r requirements.txt
To run it, you first need to configure a Personal Access Token with repo:public_repo scope to be able to fork projects and to create pull requests. Then:
export GITHUB_ACCESS_TOKEN=<YOUR ACCESS TOKEN>
./readmesfix.py
It will start processing each repo in the file (one per line): it clones the repo, finds its Markdown files, checks whether they need fixing, forks the repo and creates a pull request. Keep GitHub API rate limiting in mind: do not overwhelm the API by making the script run much faster.
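One simple way to stay within the rate limit is to pause between repos. The following is a minimal sketch, not the script's actual loop: `iter_repos`, `process_throttled` and the two-second default delay are hypothetical, and it assumes the repo identifier is the first tab-separated field on each line of the dataset.

```python
import time
from typing import Callable, Iterable, Iterator


def iter_repos(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first tab-separated field of every non-empty line.

    Assumption: the repo identifier is the first column of the dataset.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield line.split("\t")[0]


def process_throttled(
    repos: Iterable[str],
    handle: Callable[[str], object],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Call handle(repo) for each repo, pausing between consecutive repos.

    Returns the number of repos handled. The sleep function is injectable
    so the loop can be exercised without actually waiting.
    """
    count = 0
    for repo in repos:
        if count:
            # Pause before every repo except the first one.
            sleep(delay_seconds)
        handle(repo)
        count += 1
    return count
```

With a real handler, usage would look like `process_throttled(iter_repos(open("top_broken.tsv")), handle_repo)`, where `handle_repo` stands in for the clone, fix, fork and pull request steps.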
To select a different dataset than top_broken.tsv:
./readmesfix.py --dataset dataset_file
To test this script:
python -m unittest discover