Clustering: NullPointerException when the cluster-count parameter exceeds the number of documents #1397

Mryang11 · 2020-01-10T08:51:48Z

Describe the bug
A NullPointerException is thrown when the cluster-count parameter passed in is larger than the number of documents.

Code to reproduce the issue

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import com.hankcs.hanlp.mining.cluster.ClusterAnalyzer;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class ClusterNpeRepro {
	public static void main(String[] args) {
		// Eight candidate documents; only the indices collected in `set` are clustered.
		List<String> list = new ArrayList<>();
		list.add("select * from table;");
		list.add("select * from table;");
		list.add("select * from table where id = 100;");
		list.add("select * from table;");
		list.add("delete from table;");
		list.add("update table set age = 1 where id = 9");
		list.add("update table set age = 1 where id = 88");
		list.add("update table set age = 1 where id = 10");
		String[] array = list.toArray(new String[0]);

		// Indices of the six documents that are actually added to the analyzer.
		Set<Integer> set = new HashSet<>();
		set.add(0);
		set.add(1);
		set.add(3);
		set.add(5);
		set.add(6);
		set.add(7);

		// Banner text means "Clustering algorithm".
		System.out.println("================聚类算法==============");
		ClusterAnalyzer<String> analyzer = new ClusterAnalyzer<>();
		for (Integer s : set) {
			analyzer.addDocument(String.valueOf(s), array[s]);
		}

		// k (10) is larger than the number of documents added (6).
		int k = 10;
		// Throws NullPointerException on HanLP 1.7.6.
		System.out.println(analyzer.kmeans(k));
		System.out.println();

		// Also throws NullPointerException when run without the kmeans call above.
		System.out.println(analyzer.repeatedBisection(k));
		System.out.println(analyzer.repeatedBisection(1.0));
	}
}

Describe the current behavior

kmeans:
================聚类算法==============
Exception in thread "main" java.lang.NullPointerException
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.refine_clusters(ClusterAnalyzer.java:263)
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.kmeans(ClusterAnalyzer.java:147)

repeatedBisection:
================聚类算法==============

Exception in thread "main" java.lang.NullPointerException
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.repeatedBisection(ClusterAnalyzer.java:222)
	at com.hankcs.hanlp.mining.cluster.ClusterAnalyzer.repeatedBisection(ClusterAnalyzer.java:180)

Expected behavior
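When the cluster-count parameter is out of range, the program should handle it gracefully by treating it as the number of documents.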
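
Until a fixed version is available, the expected behavior can be approximated on the caller's side. The following is a minimal sketch, not HanLP code and not the actual fix: `ClusterCountGuard` and `clampClusterCount` are hypothetical names introduced here for illustration, and the caller is assumed to track how many documents it has added.

```java
// Hypothetical caller-side guard; not part of HanLP.
final class ClusterCountGuard {
    private ClusterCountGuard() {
    }

    /**
     * Caps the requested cluster count at the number of documents.
     * Rejects values that cannot describe a valid clustering request.
     */
    static int clampClusterCount(int requestedK, int documentCount) {
        if (requestedK < 1) {
            throw new IllegalArgumentException("k must be at least 1, got " + requestedK);
        }
        if (documentCount < 0) {
            throw new IllegalArgumentException("documentCount must not be negative, got " + documentCount);
        }
        return Math.min(requestedK, documentCount);
    }

    public static void main(String[] args) {
        int documentCount = 6; // the reproduction above adds six documents
        int k = clampClusterCount(10, documentCount);
        System.out.println(k); // prints 6
    }
}
```

In the reproduction, the clamped value would then be passed to `analyzer.kmeans(k)` or `analyzer.repeatedBisection(k)` in place of the raw `10`.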

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS
Python version:
HanLP version: 1.7.6

I've completed this form and searched the web for solutions.

hankcs · 2020-01-10T18:25:55Z

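Thanks for the feedback. This has been fixed; please see the commit referenced in this issue.
If you still run into problems, feel free to reopen the issue.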

Mryang11 added the bug label Jan 10, 2020

Mryang11 assigned hankcs Jan 10, 2020

hankcs added a commit that referenced this issue Jan 10, 2020

Fix the exception raised when the number of clusters exceeds the number of documents, fix #1397

19809f3

hankcs closed this as completed Jan 10, 2020
