Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"resulting" bug #205

Closed
VasylHafych opened this issue Apr 5, 2020 · 2 comments
Closed

"resulting" bug #205

VasylHafych opened this issue Apr 5, 2020 · 2 comments

Comments

@VasylHafych
Copy link

Hi! I discovered a bug while playing with SentimentAnalyzer

model = SentimentAnalyzer()
data_tmp = model(StringDocument("resulting"), (x) -> error("OOV word $x encountered"))

Here is my error message:

BoundsError: attempt to access 32×5000 Array{Float32,2} at index [Base.Slice(Base.OneTo(32)), 5001]

Stacktrace:
 [1] throw_boundserror(::Array{Float32,2}, ::Tuple{Base.Slice{Base.OneTo{Int64}},Int64}) at ./abstractarray.jl:537
 [2] checkbounds at ./abstractarray.jl:502 [inlined]
 [3] _getindex at ./multidimensional.jl:726 [inlined]
 [4] getindex at ./abstractarray.jl:980 [inlined]
 [5] embedding(::Array{Float32,2}, ::Array{Float64,1}) at /Users/vhafych/.julia/packages/TextAnalysis/pcFQf/src/sentiment.jl:27
 [6] (::TextAnalysis.var"#24#25"{Dict{Symbol,Any}})(::Array{Float64,1}) at /Users/vhafych/.julia/packages/TextAnalysis/pcFQf/src/sentiment.jl:40
 [7] get_sentiment(::var"#21#22", ::Array{String,1}, ::Dict{Symbol,Any}, ::Dict{String,Any}) at /Users/vhafych/.julia/packages/TextAnalysis/pcFQf/src/sentiment.jl:59
 [8] (::TextAnalysis.SentimentModel)(::Function, ::Array{String,1}) at /Users/vhafych/.julia/packages/TextAnalysis/pcFQf/src/sentiment.jl:87
 [9] (::SentimentAnalyzer)(::StringDocument{String}, ::Function) at /Users/vhafych/.julia/packages/TextAnalysis/pcFQf/src/sentiment.jl:103
 [10] top-level scope at In[49]:2

I used a text with 10^7 symbols and the word "resulting" is the only one that crashed process.

@tejasvaidhyadev
Copy link
Member

Hi @VasylHafych,
Thanks for pointing out.
I think there is some error in implementation of SentimentAnalyzer
In file sentiment_weights we are having having 32×5000 matrices with index-1 is default pad token embedding
But here
in src/sentiment.jl
if ele in keys(rwi) && rwi[ele] <= ( size(weight[:embedding_1]["embedding_1"]["embeddings:0"])[2] ) # there are only 5000 unique embeddings - we are allowing 5000 unique embedding excluding padding which implies 5001 unique embedding but we have only 32×5000 weight.
I am not sure about training process may be we are training on only first 5000 element with padding token then replacing above line with
if ele in keys(rwi) && rwi[ele] <= ( size(weight[:embedding_1]["embedding_1"]["embeddings:0"])[2] - 1 ) # there are only 5000 unique embeddings including padding token will work.

@tejasvaidhyadev
Copy link
Member

tejasvaidhyadev commented Apr 23, 2020

I think we can close the issue now

@aviks aviks closed this as completed Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants