Changelog for nlp_profiler.
Based on the issue raised on GitHub #1
b2a002a - 4f117a6 @neomatrix369 Wed Sep 16 17:15:29 2020 +0100
6bdc799 - 4c49ae5 @neomatrix369 Thu Sep 17 17:27:14 2020 +0100
GitHub branch add-progress-bars
add progress bars to the various levels of transformation for better UX/UI experience
Based on the issue raised on GitHub #3 - although only implements progress bars at the first and second levels of iterations, pending level 3 iteration (row/record level)
a83bc23 - 7c72b0e @neomatrix369 Thu Sep 17 19:50:30 2020 +0100
GitHub branch add-progress-bars
add progress bars to the various levels of transformation for better UX/UI experience
Continuing with the above changes, third-level progress-bar is in place (row-level progress)
7c72b0e - c3ada30 @neomatrix369 Fri Sep 18 13:44:48 2020 +0100
GitHub pull request #9 improve performance of the library when used on larger datasets
Branch scale-when-applied-to-larger-datasets
Added parallelisation and some caching to address the slow-down in performance seen on larger datasets.
Verification and tests have been performed, although this is a continuous process.
For performance metrics before and after changes see this comment on GitHub issue #2.
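As a rough illustration of the idea only (not the library's actual code), the sketch below combines a cache with a parallel map using the standard library. The names `count_words` and `profile_rows` are hypothetical, and a thread pool is assumed purely to keep the example self-contained; CPU-bound work would more likely use a process-based pool.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def count_words(text: str) -> int:
    # Hypothetical per-row feature; repeated rows are served from the cache.
    return len(text.split())


def profile_rows(rows, max_workers=4):
    # Apply the feature function to every row in parallel, preserving row order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_words, rows))


rows = ["a quick test", "another row here now", "a quick test"]
word_counts = profile_rows(rows)
print(word_counts)  # [3, 4, 3]
```

The duplicate first and third rows show where caching can help: text columns often contain repeated values, so the feature may only need to be computed once per distinct value.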
00a68e2 - 1ff5082 @neomatrix369 Fri Sep 18 14:09:12 2020 +0100
Release only: published to GitHub under the Releases tab and to PyPI.
d5d0bc1 - 6510131 @neomatrix369 Sun Sep 27 11:56:48 2020 +0100
GitHub branch scale-when-applied-to-larger-datasets
Improving performance of Grammar check on large datasets
Tweaking the Grammar check function to perform better than the previous version
81d055f - 2e311f7 @neomatrix369 Sat Oct 3 07:57:39 2020 +0100
Enable running tests with coverage when a new PR is created or commits are pushed to the repo, across Linux and Windows instances.
A code coverage report is produced with each commit, and the artifacts are uploaded to GitHub.
a806716 - 7e4ca87 @neomatrix369 Thu Oct 15 16:50:59 2020 +0100
GitHub branches add-docs-for-developers and add-github-templates
Update docs for developers and add GitHub templates for issues and pull requests
To improve communication with developers and also to create a streamlined process for the same, docs and templates have been added and updated to the repo. These do not change the functionality of the library in any form or shape.
6d40570 - 6d40570 @neomatrix369 Sat Oct 17 19:24:30 2020 +0100
Count the number of noun phrases in the text data and return it as part of granular features.
Thanks, @ritikjain51, for your contribution, originally via PR #13, which was fixed and refactored via PR #47.
f8a22ba - fcd706b @neomatrix369 Wed Oct 21 13:40:20 2020 +0100
Now the build and test action runs on Windows instances as well. Fixes issue reported via #21.
5e7f999 @neomatrix369 Sat Oct 24 16:43:49 2020 +0100
Conda users could not install the library using pip install; this is now possible by following the docs on the README page.
Fixes issue #57 via PR #58
ae91f5c @neomatrix369 Sun Dec 13 10:17:17 2020 +0000
Like the spelling and grammar checks, this adds a high-level feature that indicates whether a block of text is easy to read, based on flesch_reading_ease() from the textstat library.
It typically returns values between 0 and 100, although I have seen values fall below 0 or exceed 100 depending on how bad or good the text is.
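To show why the score is not strictly bounded, here is a minimal sketch of the standard Flesch Reading Ease formula computed from pre-counted totals. The function name `flesch_score_from_counts` is hypothetical and is not part of textstat or nlp_profiler; textstat derives the word, sentence, and syllable counts from the text itself.

```python
def flesch_score_from_counts(total_words: int, total_sentences: int, total_syllables: int) -> float:
    # Standard Flesch Reading Ease formula, applied to counts supplied by the caller.
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# A single one-syllable word scores above 100.
easy = flesch_score_from_counts(total_words=1, total_sentences=1, total_syllables=1)

# One sentence of ten three-syllable words scores below 0.
hard = flesch_score_from_counts(total_words=10, total_sentences=1, total_syllables=30)

print(easy > 100, hard < 0)  # True True
```

Because the formula is linear in words per sentence and syllables per word, very short simple text pushes the score past 100 and long, polysyllabic sentences push it below 0.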
4919a51 @neomatrix369 Sun Dec 13 18:36:42 2020 +0000
GitHub branch add-granular-features
Granular features: Add granular features: count letters, digits, spaces, whitespaces, and punctuations
Implemented functionality via PR #60 - details described in the body of the PR. In short, counting repeated letters, digits, spaces, whitespaces, and punctuations in the text. Counting English and non-English language characters in the text. Also, amending existing functionality of punctuations count, digits count and fixing a bug in ease of reading scoring. Housekeeping: removing duplicates, removing cached folders before running tests.
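For a sense of what such character-level counts look like, the following is a small standard-library sketch. The helper `count_characters` and its dictionary keys are hypothetical, chosen for illustration, and do not reflect the library's actual function or column names.

```python
import string


def count_characters(text: str) -> dict:
    # Hypothetical helper: tallies a few character classes in a single pass.
    counts = {"letters": 0, "digits": 0, "spaces": 0, "whitespaces": 0, "punctuations": 0}
    for ch in text:
        if ch.isalpha():
            counts["letters"] += 1
        elif ch.isdigit():
            counts["digits"] += 1
        elif ch in string.punctuation:
            counts["punctuations"] += 1
        # A space is also whitespace, so these two checks are independent.
        if ch == " ":
            counts["spaces"] += 1
        if ch.isspace():
            counts["whitespaces"] += 1
    return counts


sample_counts = count_characters("Hi there, 2 tabs\t\there!")
print(sample_counts)
```

In this sketch, spaces are counted separately from whitespaces so that tabs and newlines contribute only to the latter; this is an assumption about how the two features differ.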
68bee76 @neomatrix369 Sun Dec 27 12:45:59 2020 +0000
Implemented functionality via PR #61 - details described in the body of the PR.
Added new feature(s) to the granular features groups: count syllables extracted from the text provided.
Credits: Gunes Evitan (https://www.kaggle.com/gunesevitan) -- inspired by the discussion on https://www.kaggle.com/c/commonlitreadabilityprize/discussion/238375
498338e @neomatrix369 Sat May 15 00:50:01 2021 +0100
Implemented functionality via PR #62 - details described in the body of the PR.
Moving notebooks from the notebooks folder to GitHub releases to prevent GitHub from misclassifying the repo/library.
5ba447e @neomatrix369 Sat Nov 13 21:02:21 2021 +0000
GitHub branch fix-failing-high-level-tests
Make the acceptance tests pass, fixing dependency versions
Implemented functionality via PR #63 - details described in the body of the PR.
Fixing dependency issues that caused tests to fail due to API changes in the respective libraries, i.e. language_tool_python and pandas.
5b87b03 @neomatrix369 Sat Nov 13 23:35:38 2021 +0000
GitHub branch correct-spelling-of-column-to-noun-phrase
Granular features: corrected the name of the new feature column to noun phrase
Implemented functionality via PR #64 - details described in the body of the PR.
Correct the misspelt term "noun phrase" or "noun phrases" across the codebase
a9d1e1a @neomatrix369 Sat Nov 13 21:41:56 2021 +0000
Implemented functionality via PR #65 - details described in the body of the PR.
Enabled a nightly run of build and test via GitHub Actions
dde3172 @neomatrix369 Sun Nov 14 09:12:33 2021 +0000
Implemented functionality via PR #69 - details described in the body of the PR.
Replaced language tool with Gingerit for faster calculations
b5a5dda @bitanb1999 Sun March 13 00:31:31 2023 +0000
GitHub branch revert-76-sourcery/revert-71-spelling_check
Granular features: reverted change made to spell checks
Implemented functionality via PR #75 - details described in the body of the PR.
Reverting the spell check functionality as it is not tested, and existing tests change or break with the new implementation.
2cddf51 @neomatrix369 Mon Mar 13 02:56:40 2023 +0000
GitHub branch reformating-code-and-minor-fixes
Reformatting code, refactoring as per Sourcery, minor fixes and test fixes
Implemented functionality via PR #73 - details described in the body of the PR.
Reformatting code, refactoring as per Sourcery, minor fixes and test fixes. Bringing the build system back in order. Fixes old regressed tests.
7caeb47 @neomatrix369 Mon Mar 13 11:23:49 2023 +0000