Commit

Merge branch 'docs/add-legacy-in-docs'
ZenithClown committed Aug 18, 2024
2 parents 01e31b0 + 741619d commit 1f1db57
Showing 4 changed files with 56 additions and 19 deletions.
1 change: 1 addition & 0 deletions docs/index.md
@@ -15,6 +15,7 @@
```{toctree}
:hidden:
normalize.md
legacy.md
```

<div align = "justify">
15 changes: 15 additions & 0 deletions docs/legacy.md
@@ -0,0 +1,15 @@
# Legacy Functions

<div align = "justify">

```{eval-rst}
.. automodule:: nlpurify.legacy
```

### NLP Utilities

```{eval-rst}
.. automodule:: nlpurify.legacy.nlp_utils
```

</div>
5 changes: 5 additions & 0 deletions nlpurify/legacy/__init__.py
@@ -9,6 +9,11 @@
legacy submodule unless dependent codes are gradually migrated.
More Information: `Issue #5 <https://github.com/sharkutilities/NLPurify/issues/5>`_
on code migrations and submodule details.
.. caution::
The documentation does not follow PEP-8 convention, and is not maintained
properly. This submodule is kept only as a precautionary submodule.
"""

from nlpurify.legacy.nlp_utils import * # noqa: F401, F403 # pyright: ignore[reportMissingImports]
54 changes: 35 additions & 19 deletions nlpurify/legacy/nlp_utils.py
@@ -2,19 +2,34 @@

"""
A set of utility functions related to natural language
processing. In addition to the basic libraries, the module
requires the following corpus from `nltk` library:
* `stopwords` : used to remove stop words from a given
strings. Currently using the function for
pre-processing.
In addition, need some additional libraries like `fuzzywuzzy`
and `python-Levenshtein` using the following:
```python
pip install fuzzywuzzy
pip install python-Levenshtein
```
processing. The code uses the :mod:`nltk` library along with basic
string formatting to clean and process text.
.. warning::
The functions are not optimized and test cases are not checked.
Use the function with caution.
**Getting Started**
To use the function and its capabilities, first install the required
libraries:
.. code-block:: shell
$ pip install fuzzywuzzy
$ pip install python-Levenshtein # improve performance
The legacy code is a standalone submodule, and can be used for
existing dependent modules like:
.. code-block:: python
import nlpurify.legacy as nlpu # nlp-utility functions
print(nlpu.text_process("some random string that needs cleaning"))
To use the function, :mod:`nltk.corpus` must be installed for
``stopwords`` and related corpora. More information is available
`here <https://www.nltk.org/howto/corpus.html>`_.
"""

import re
@@ -80,16 +95,17 @@ def text_processor(string : str, **kwargs) -> str:
More information on in-built string methods is available here:
https://www.programiz.com/python-programming/methods/string.
# ! Function is not yet optimized when used in conjunction.
.. attention::
The function is not yet optimized when used in conjunction.
:type string: str
:param string: Base string which needs formatting. The string
is converted into lower case. If passed from
! `processor`this step is repeated.
TODO fix when passed through parent function.
is converted into lower case. If passed from
:func:`processor()` this step is repeated.
TODO fix when passed through parent function.
**Keyword Arguments**
Keyword Arguments
-----------------
* *isalnum* (bool): Only keep `alpha-numeric` characters in the
string. Defaults to False.
* *isalpha* (bool): Only keep `alphabets` characters in the
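The `isalnum` and `isalpha` keyword arguments documented in this hunk can be illustrated with a minimal sketch built only on Python's built-in string methods. The helper name `keep_characters`, the choice to preserve whitespace, and the rule that `isalpha` takes precedence over `isalnum` are all assumptions made for illustration; the actual `text_processor` body is not shown in this diff and may behave differently.

```python
def keep_characters(string: str, isalnum: bool = False, isalpha: bool = False) -> str:
    """Hypothetical helper, not part of nlpurify.

    Lower-cases the input, then optionally drops every character that
    fails the selected built-in string test.
    """
    string = string.lower()  # the docstring states the base string is lower-cased

    if isalpha:
        keep = str.isalpha  # assumption: the stricter filter wins if both are set
    elif isalnum:
        keep = str.isalnum
    else:
        return string  # both flags default to False, so nothing is filtered

    # Assumption: whitespace is preserved so that word boundaries survive.
    return "".join(ch for ch in string if keep(ch) or ch.isspace())


if __name__ == "__main__":
    print(keep_characters("ABC-123", isalnum=True))  # keeps letters and digits
    print(keep_characters("ABC-123", isalpha=True))  # keeps letters only
```

Relying on `str.isalnum` and `str.isalpha` keeps the sketch dependency-free; note that both methods are Unicode-aware, so accented letters pass the filter as well.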
