A trained word2vec model stored on HDFS fails to load when called from a distributed Spark job #903

Closed
1 task done
tc620 opened this issue Jul 31, 2018 · 1 comment

tc620 commented Jul 31, 2018

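Notes

Please confirm the following:

  • I have carefully read the following documents and found no answer in any of them:
  • I have searched for my problem with Google and the issue tracker's search function and found no answer.
  • I understand that the open-source community is a free community brought together by shared interest and bears no responsibility or obligation. I will be polite and thank everyone who helps me.
  • By typing x in these brackets to tick the box, I confirm the items above.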

Version

The current latest version is: portable-1.6.6
The version I am using is: portable-1.6.6

My question

I trained a word2vec model and stored it on HDFS. Loading it from distributed Spark code fails. Why?

Trigger code

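import scala.collection.JavaConversions._
import scala.collection.mutable
import com.hankcs.hanlp.HanLP
import com.hankcs.hanlp.mining.word2vec.{DocVectorModel, WordVectorModel}

rdd.mapPartitions(iterator => myFunctions(iterator, words))

def myFunctions(iterator: Iterator[String], word: String): Iterator[mutable.HashMap[Integer, Float]] = {
    // The NullPointerException is thrown on this line when running on the cluster.
    val wordVecModel = new WordVectorModel("data/model.txt")
    val documentsModel = new DocVectorModel(wordVecModel)
    val sets = mutable.Set[mutable.HashMap[Integer, Float]]()
    for (item <- iterator) {
      // Each input line is tab-separated; the first field is the document id.
      val arrays = item.split("\t")
      val id = arrays(0).toInt
      // The whole line (id included) is used as the document text.
      val contents = item
      documentsModel.addDocument(id, HanLP.convertToSimplifiedChinese(contents))
      // Query the documents added so far for those nearest to `word`.
      val list = documentsModel.nearest(word)
      for (entry <- list) {
        val maps = mutable.HashMap[Integer, Float]()
        maps.put(entry.getKey, entry.getValue)
        sets.add(maps)
      }
    }
    sets.iterator
  }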

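Error output

 A NullPointerException, thrown on the line of code that reads the model.

Other information

The model loads on a single machine, but loading fails when run on the cluster.
# The default IO adapter is shown below; it is based on the ordinary file system.
IOAdapter=com.npl.spark.HadoopFileIoAdapter (this has already been overridden as well)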
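
The issue does not show the source of `com.npl.spark.HadoopFileIoAdapter`. For readers unfamiliar with the setup, here is a minimal sketch of what such an adapter might look like. It assumes HanLP's `IIOAdapter` interface (with its `open` and `create` methods) and Hadoop's `FileSystem` API are on the classpath. The class body is an illustration, not the reporter's code, and it is not a fix for the NullPointerException reported above.

```scala
import java.io.{InputStream, OutputStream}
import java.net.URI

import com.hankcs.hanlp.corpus.io.IIOAdapter
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical adapter: routes HanLP file access through Hadoop's FileSystem,
// so a path such as hdfs://... resolves on the cluster instead of the local disk.
class HadoopFileIoAdapter extends IIOAdapter {

  // Resolve the FileSystem from the path's URI scheme (hdfs://, file://, ...).
  private def fileSystemFor(path: String): FileSystem =
    FileSystem.get(URI.create(path), new Configuration())

  // Open an existing file for reading.
  override def open(path: String): InputStream =
    fileSystemFor(path).open(new Path(path))

  // Create (or overwrite) a file for writing.
  override def create(path: String): OutputStream =
    fileSystemFor(path).create(new Path(path))
}
```

An adapter like this only takes effect for code paths that read through the configured `IOAdapter`, and the class has to be present on every executor's classpath, not just the driver's.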

@hankcs hankcs closed this as completed in c3885f3 Jul 31, 2018

hankcs commented Jul 31, 2018

Thanks for the feedback. This has been fixed; please refer to the commit above.
If there are still problems, feel free to reopen the issue.

@hankcs hankcs added the bug label Jul 31, 2018
hankcs added a commit that referenced this issue Jan 10, 2020