Clustering: NullPointerException when the cluster-count parameter is larger than the number of documents #1397

Closed
1 task done
Mryang11 opened this issue Jan 10, 2020 · 1 comment
@Mryang11

Describe the bug
A NullPointerException is thrown when the cluster-count parameter passed in is larger than the number of documents.

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

import com.hankcs.hanlp.mining.cluster.ClusterAnalyzer;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// The class name is arbitrary; the original report only showed main().
public class ClusterNpeRepro {
	public static void main(String[] args) {
		// Eight sample SQL statements; only six of them are added as documents.
		String[] array = {
			"select * from table;",
			"select * from table;",
			"select * from table where id = 100;",
			"select * from table;",
			"delete from table;",
			"update table set age = 1 where id = 9",
			"update table set age = 1 where id = 88",
			"update table set age = 1 where id = 10"
		};
		// Indices of the statements that are added to the analyzer
		Set<Integer> set = new HashSet<>(Arrays.asList(0, 1, 3, 5, 6, 7));

		System.out.println("================Clustering==============");
		ClusterAnalyzer<String> analyzer = new ClusterAnalyzer<>();
		for (Integer s : set) {
			// Document id is the index as a string; the content is the SQL text
			analyzer.addDocument(String.valueOf(s), array[s]);
		}
		// k is larger than the size of the set (10 > 6)
		int k = 10;
		// On 1.7.6 this call throws, so the calls below are only reached
		// if kmeans is commented out (presumably how the second trace was obtained).
		System.out.println(analyzer.kmeans(k));
		System.out.println();

		System.out.println(analyzer.repeatedBisection(k));
		System.out.println(analyzer.repeatedBisection(1.0));
	}
}

Describe the current behavior

kmeans:
================Clustering==============
Exception in thread "main" java.lang.NullPointerException
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.refine_clusters(ClusterAnalyzer.java:263)
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.kmeans(ClusterAnalyzer.java:147)
repeatedBisection:
================Clustering==============

Exception in thread "main" java.lang.NullPointerException
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.repeatedBisection(ClusterAnalyzer.java:222)
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.repeatedBisection(ClusterAnalyzer.java:180)

Expected behavior
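When the value passed for the cluster-count parameter is out of range, the program should handle it gracefully by treating it as the number of documents.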
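
The expected behavior amounts to capping the requested cluster count at the document count. Below is a minimal sketch of that idea in plain Java. The class `ClusterCountGuard` and the method `clampClusterCount` are hypothetical names introduced only for illustration; they are not part of HanLP, and this is not necessarily how the maintainer's fix works.

```java
/**
 * Hypothetical helper (not part of HanLP): caps a requested cluster count
 * at the number of documents available for clustering.
 */
final class ClusterCountGuard {

    private ClusterCountGuard() {
        // Static utility, not meant to be instantiated.
    }

    /**
     * @param requestedK    cluster count asked for by the caller, must be >= 1
     * @param documentCount number of documents added so far, must be >= 0
     * @return requestedK, or documentCount when requestedK is larger
     */
    static int clampClusterCount(int requestedK, int documentCount) {
        if (requestedK < 1) {
            throw new IllegalArgumentException("requestedK must be >= 1, got " + requestedK);
        }
        if (documentCount < 0) {
            throw new IllegalArgumentException("documentCount must be >= 0, got " + documentCount);
        }
        // A clustering cannot have more non-empty clusters than documents.
        return Math.min(requestedK, documentCount);
    }
}
```

With the reproduction case in this report, `clampClusterCount(k, set.size())` returns 6 for `k = 10`. On an affected version, a caller could pass that value to `analyzer.kmeans(...)` or `analyzer.repeatedBisection(...)` as a possible workaround, assuming the exception is triggered only by the oversized `k`.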
System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS
  • Python version:
  • HanLP version: 1.7.6

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  • I've completed this form and searched the web for solutions.
@hankcs
Owner

hankcs commented Jan 10, 2020

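Thanks for the feedback. This has been fixed; please refer to the commit above.
If you still run into problems, feel free to reopen the issue.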
