E-Tiller #365

Open
2 tasks done
doubanchan opened this issue Jul 18, 2024 · 4 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@doubanchan

doubanchan commented Jul 18, 2024

What problem did you encounter? [required]

  • Items are not recognized
  • Detected as multiple

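Links where the problem occurs [required]
含能材料: detected as multiple
气候与环境研究: detected as multiple
生物化学与生物物理进展: not recognized, falls back to Embedded Metadata
微生物学通报: not recognized, falls back to Embedded Metadata
重庆大学学报(社会科学版): falls back to Embedded Metadata
电子测量与仪器学报
水土保持通报: not recognized, falls back to Embedded Metadata
哈尔滨工业大学学报: detected as DOI
分子催化: detected as DOI; no structured metadata, so it could perhaps be dropped
中国工业医学杂志: not recognized, possibly because of the page structure; no structured metadata (the footer contains the text “勤云科技”)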

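Problem description [required]
Most of the pages above do not have the “勤云科技” (E-Tiller) text in their footer, but many of the URLs, or some elements on the pages, contain domains that belong to E-Tiller: alljournal.com.cn, cnjournals.com, ijournals.cn, cnjournals.net, cnjournals.org, and alljournals.cn (the last one in page elements). In addition, most article URLs contain reader/view_abstract or article/abstract (article/abstract is used for abstract pages).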
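The two URL-level signals described above can be sketched as plain substring checks. This is only an illustration: the function name `urlSignals` and the sample URLs are made up, and the domain list is limited to the domains named in the description.

```javascript
// Domains named in the problem description as belonging to E-Tiller.
const etillerDomains = [
	'alljournal.com.cn',
	'cnjournals.com',
	'ijournals.cn',
	'cnjournals.net',
	'cnjournals.org',
	'alljournals.cn'
];

// Path fragments that most article URLs are reported to contain.
const articlePaths = ['reader/view_abstract', 'article/abstract'];

// Hypothetical helper: reports which URL-level signals are present.
function urlSignals(url) {
	return {
		domain: etillerDomains.some(domain => url.includes(domain)),
		path: articlePaths.some(path => url.includes(path))
	};
}

// Made-up URLs, for illustration only.
console.log(urlSignals('https://journal.example.ijournals.cn/article/abstract/20240101'));
console.log(urlSignals('https://www.example.org/reader/view_abstract.aspx?file_no=1'));
console.log(urlSignals('https://www.example.org/about'));
```

A URL can carry only one of the two signals, which is why the footer text alone is not a reliable marker for these sites.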

@doubanchan doubanchan added the bug Something isn't working label Jul 18, 2024
@doubanchan
Author

doubanchan commented Jul 19, 2024

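Here are some DIY changes. They fix pages whose URL contains one of the indomain domains but which were detected as Embedded Metadata, and they fix the multiple detection for 含能材料. 生物化学与生物物理进展 is still not fixed. 分子催化 has no metadata in the page head, so I am giving up on it for now. 哈尔滨工业大学学报 seems to have design problems on the page itself: the Google Chrome console keeps printing errors.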

const paths = [
	'reader/view_abstract.aspx?file_no=',
	'/article/abstract/'
];

// Domains used by E-Tiller journal sites
const indomain = [
	'ijournals.cn',
	'alljournals.cn',
	'alljournals.ac.cn',
	'alljournal.com.cn',
	'alljournals.com.cn',
	'alljournal.net',
	'cnjournals.com',
	'cnjournals.net',
	'cnjournals.org',
	'alljournalsystem.com',
	'journalsystem.net',
	'allmaga.net'
];

// Original detection logic, before the change
Z.debug(`incite: ${insite}`);
if (!insite) return false;
for (let path of paths) {
	if (url.includes(path) && doc.querySelector('meta[name="citation_title"]')) {
		Z.debug(`match path: ${path}`);
		return 'journalArticle';
	}
	else if (doc.querySelector('meta[name]') && getSearchResults(doc, true)) {
		Z.debug(`match path: ${path}`);
		return 'multiple';
	}
}

	// Modified detection logic: also accept URLs on a known E-Tiller domain
	const inurl = indomain.some(element => url.includes(element));
	Z.debug(`insite: ${insite}`);
	if (!insite && !inurl) return false;
	if (paths.some(element => url.includes(element)) && doc.querySelector('meta[name="citation_title"]')) {
		return 'journalArticle';
	}
	else if (doc.querySelector('meta[name]') && getSearchResults(doc, true)) {
		return 'multiple';
	}
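One plausible reason the loop form reports multiple for an article page, offered as an assumption rather than a confirmed diagnosis: the else-if branch runs for every path that does not match, so it can return before a later path is tried. The sketch below reproduces both control flows as pure functions. The `page` object and its fields are hypothetical stand-ins for the `doc.querySelector` and `getSearchResults` checks, and the URL is made up.

```javascript
const paths = ['reader/view_abstract.aspx?file_no=', '/article/abstract/'];

// Loop form: the else-if is evaluated for each non-matching path,
// so 'multiple' can be returned before the remaining paths are checked.
function detectLoop(url, page) {
	for (const path of paths) {
		if (url.includes(path) && page.hasCitationTitle) {
			return 'journalArticle';
		}
		else if (page.hasNamedMeta && page.hasSearchResults) {
			return 'multiple';
		}
	}
	return false;
}

// some() form: every path is checked before falling back to 'multiple'.
function detectSome(url, page) {
	if (paths.some(path => url.includes(path)) && page.hasCitationTitle) {
		return 'journalArticle';
	}
	else if (page.hasNamedMeta && page.hasSearchResults) {
		return 'multiple';
	}
	return false;
}

// Hypothetical article page that also contains a list of article links.
const url = 'https://www.example.org/article/abstract/20240101';
const page = { hasCitationTitle: true, hasNamedMeta: true, hasSearchResults: true };

console.log(detectLoop(url, page)); // 'multiple'
console.log(detectSome(url, page)); // 'journalArticle'
```

The URL matches only the second entry of `paths`, so the loop form exits on the first iteration while the `some()` form reaches the article branch.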

@doubanchan
Author

Adding 'article/html' lets the translator recognize the full-text HTML page as well, not only the abstract page. 中国内镜杂志 is one example.

// Before
const paths = [
	'reader/view_abstract.aspx?file_no=',
	'/article/abstract/'
];

// After
const paths = [
	'reader/view_abstract.aspx?file_no=',
	'/article/abstract/',
	'article/html'
];

@doubanchan
Author

A temporary fix for articles from 生物化学与生物物理进展:
Etiller-author-alljournal

	// Extra signal: a div whose class attribute contains "fudong_ycc"
	const like = doc.querySelector('div[class*="fudong_ycc"]');
	const inurl = indomain.some(element => url.includes(element));
	Z.debug(`insite: ${insite}`);
	if (!insite && !inurl && !like) return false;
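The guard above bails out only when all three signals are missing. A minimal sketch of that logic without a DOM follows. The helper names and the sample class values are hypothetical, and `classAttrContains` only imitates what the `[class*="..."]` attribute selector does for a single element.

```javascript
// Imitates div[class*="..."] for one element's class attribute:
// the [attr*=value] selector matches when the attribute contains the substring.
function classAttrContains(classAttr, fragment) {
	return typeof classAttr === 'string' && fragment !== '' && classAttr.includes(fragment);
}

// Equivalent to `if (!insite && !inurl && !like) return false;`:
// detection continues as soon as any one signal is present.
function passesGuard({ insite, inurl, like }) {
	return Boolean(insite || inurl || like);
}

// Made-up class value, for illustration only.
const like = classAttrContains('fudong_ycc_box', 'fudong_ycc');
console.log(passesGuard({ insite: false, inurl: false, like })); // true
console.log(passesGuard({ insite: false, inurl: false, like: false })); // false
```

Because the selector is a substring match, it would also accept unrelated classes that happen to contain the same fragment, which is one reason to treat it as a temporary fix.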

@jiaojiaodubai
Collaborator

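Sites that can be grouped together can be handled inside the 勤云 (E-Tiller) translator. Special sites that cannot be grouped should get their own dedicated Translator, which takes precedence.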
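How a dedicated translator can take precedence over a platform-wide one can be sketched as follows. This is not Zotero's actual selection code. The registry entries, labels, priority numbers and match rules are all invented, and the sketch assumes that a lower priority number wins, as with the `priority` field in translator metadata.

```javascript
// Hypothetical registry; every entry here is invented for illustration.
const translators = [
	{ label: 'Generic fallback', priority: 300, detect: () => 'webpage' },
	{
		label: 'Platform translator',
		priority: 200,
		detect: url => (url.includes('ijournals.cn') ? 'journalArticle' : false)
	},
	{
		label: 'Dedicated site translator',
		priority: 100,
		detect: url => (url.includes('special.ijournals.cn') ? 'journalArticle' : false)
	}
];

// Keep the translators that detect the page, then prefer the lowest number.
function pickTranslator(url) {
	const candidates = translators
		.filter(translator => translator.detect(url))
		.sort((a, b) => a.priority - b.priority);
	return candidates.length ? candidates[0].label : null;
}

console.log(pickTranslator('https://special.ijournals.cn/article/1')); // 'Dedicated site translator'
console.log(pickTranslator('https://other.ijournals.cn/article/1')); // 'Platform translator'
console.log(pickTranslator('https://www.example.org/')); // 'Generic fallback'
```

In the first call all three entries detect the page, and the dedicated one is chosen only because of its priority value.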

@jiaojiaodubai jiaojiaodubai self-assigned this Aug 15, 2024
@jiaojiaodubai jiaojiaodubai added the enhancement New feature or request label Aug 15, 2024

2 participants