Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow multiple ngram complexity in NGramDocument, ngrams and ngrammize #157

Merged
merged 1 commit into from
Jun 9, 2019

Conversation

tanmaykm
Copy link
Contributor

Allow NGramDocument to optionally store ngrams of multiple ngram complexity. Updated ngrams and ngramize methods to generate such ngrams. Updated corresponding docs and added tests.

Also corrected show method for NGramDocument.

Ref: #148 and #148 (comment)

@@ -81,18 +81,17 @@ TokenDocument(txt::AbstractString) = TokenDocument(String(txt), DocumentMetadata

mutable struct NGramDocument{T<:AbstractString} <: AbstractDocument
ngrams::Dict{T,Int}
n::Int
n::Union{Int,Vector{Int}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this defeat type inference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't seen ngram_complexity (or this field) used anywhere in TextAnalysis. It may be used from external code. But then there are no abstracts here, and it should be easy to make them type stable if/where needed.

Allow NGramDocument to optionally store ngrams of multiple ngram complexity. Updated `ngrams` and `ngramize` methods to generate such ngrams. Updated corresponding docs and added tests.

Also corrected `show` method for `NGramDocument`.
@tanmaykm
Copy link
Contributor Author

rebased to fix conflicts

@aviks aviks merged commit b7fc9c4 into JuliaText:master Jun 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants