Replace github-linguist gem with Node port Nixinova/Linguist #436

Merged (9 commits) on Aug 1, 2021

Conversation

Nixinova
Contributor

@Nixinova Nixinova commented Aug 1, 2021

Description

Using the github-linguist gem directly is a bit of a hack, and any change to its console.log output breaks Metrics, as happened with the release of v7.16.

Nixinova/Linguist is a Node port of github-linguist which allows for proper JavaScript language data generation.

This change also allows for prose and data languages to be included in any generated metrics as a new option if so desired.

Tests

Refs

Using a Node port of Linguist makes language data gathering less prone to breakage: Metrics previously relied on `console.log` output to fetch statistics, and that output is not covered by SemVer.
Version 1.4.2 fixes a crash
Version 1.4.3 mandates consistent absolute paths in output
Owner

@lowlighter lowlighter left a comment


Thanks for your contribution!
It'll definitely improve the stability of the languages plugin 👍

@lowlighter lowlighter merged commit 9e77a1b into lowlighter:master Aug 1, 2021
@Nixinova
Contributor Author

Nixinova commented Aug 1, 2021

Since the output of Linguist is this ↓, is there anywhere in the languages plugin where language processing can now be simplified?

Linguist output schema
{
	"count": 3,
	"results": {
		"src/index.ts": "TypeScript",
		"src/cli.js": "JavaScript",
		"readme.md": "Markdown"
	},
	"languages": {
		"all": {
			"JavaScript": { "type": "programming", "bytes": 1000 },
			"TypeScript": { "type": "programming", "bytes": 2000 },
			"Markdown": { "type": "prose", "bytes": 3000 }
		},
		"programming": { "JavaScript": 1000, "TypeScript": 2000 },
		"markup": {},
		"data": {},
		"prose": { "Markdown": 3000 },
		"unknown": {},
		"total": { "unique": 3, "bytes": 6000, "unknownBytes": 0 }
	}
}

For example, I see a couple of instances of a looped 'results[lang].bytes+=size':

//Process repository languages
for (const {size, node:{color, name}} of Object.values(repository.languages.edges)) {
  languages.stats[name] = (languages.stats[name] ?? 0) + size
  languages.colors[name] = colors[name.toLocaleLowerCase()] ?? color ?? "#ededed"
  languages.total += size
}

(I could also add a color key to languages.all[lang] if it would be useful.)
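
To illustrate how the schema above could be consumed without any manual byte accumulation, here is a minimal sketch. It assumes only the sample output shown in this comment; `languageShares` is a hypothetical helper name, not part of Linguist or Metrics.

```javascript
// Sample data taken from the schema in this comment (only the keys used below)
const output = {
  count: 3,
  languages: {
    all: {
      JavaScript: {type: "programming", bytes: 1000},
      TypeScript: {type: "programming", bytes: 2000},
      Markdown: {type: "prose", bytes: 3000},
    },
    total: {unique: 3, bytes: 6000, unknownBytes: 0},
  },
}

// Hypothetical helper: percentage share of each language,
// optionally restricted to a subset of language types
function languageShares({languages}, types = ["programming", "markup", "data", "prose"]) {
  const entries = Object.entries(languages.all).filter(([, {type}]) => types.includes(type))
  const total = entries.reduce((sum, [, {bytes}]) => sum + bytes, 0)
  return Object.fromEntries(entries.map(([name, {bytes}]) => [name, total ? (100 * bytes) / total : 0]))
}

console.log(languageShares(output))
console.log(languageShares(output, ["programming"]))
```

Passing a list of types mirrors the idea from the description that prose and data languages can be included or excluded as an option.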

@lowlighter
Owner

This loop handles GitHub API data, not Linguist data. It belongs to the non-indepth mode, so it cannot be removed.

Linguist is only called in async function analyze() from analyzers.mjs, and I don't think we can simplify it further since once linguist data is gathered, that's pretty much it:

//Gather language data
console.debug(`metrics/compute/${login}/plugins > languages > indepth > running linguist`)
const {results:files, languages:languageResults} = await linguist(path)

Adding the color key would help to close #339 though, since it would cover languages that are neither loaded from the GitHub API nor user-defined.
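
As a rough illustration of what the `results` half of that destructuring contains, the sketch below groups file paths by detected language. It uses the sample data from the schema earlier in this thread rather than a real `linguist(path)` call, and `filesByLanguage` is a hypothetical helper, not something defined in analyzers.mjs.

```javascript
// Sample `results` mapping (file path -> language), copied from the schema in this thread
const files = {
  "src/index.ts": "TypeScript",
  "src/cli.js": "JavaScript",
  "readme.md": "Markdown",
}

// Hypothetical helper: invert the mapping into language -> list of files
function filesByLanguage(results) {
  const groups = {}
  for (const [file, language] of Object.entries(results)) {
    groups[language] = groups[language] ?? []
    groups[language].push(file)
  }
  return groups
}

console.log(filesByLanguage(files))
```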

@lowlighter lowlighter mentioned this pull request Aug 2, 2021
@lowlighter
Owner

I'll make a release soon, even if there are still improvements that can be made, since #424 still affects the @latest release.

@Nixinova
Contributor Author

Nixinova commented Aug 2, 2021

I also think the problem in #356 may be covered by my linguist.

@Nixinova
Contributor Author

Nixinova commented Aug 3, 2021

Adding the color key would help to close #339 though, since it would cover languages that are neither loaded from the GitHub API nor user-defined.

Added in 1.4.4: get from .languages.all[lang].color
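
A minimal sketch of how that key might be used as a fallback, mirroring the `colors[...] ?? color ?? "#ededed"` chain from the loop quoted earlier. The color values, the `userColors` map, and the `resolveColor` helper are all illustrative assumptions; only the `.languages.all[lang].color` location comes from this thread.

```javascript
// Illustrative slice of `languages.all`; the color value is an assumption for the example
const all = {
  JavaScript: {type: "programming", bytes: 1000, color: "#f1e05a"},
  Markdown: {type: "prose", bytes: 3000}, // deliberately left without a color
}

// Hypothetical user-defined overrides, keyed by lowercase language name
const userColors = {javascript: "#ff0000"}

// Hypothetical helper: user-defined color first, then Linguist's color, then a neutral default
function resolveColor(name, all, userColors = {}) {
  return userColors[name.toLocaleLowerCase()] ?? all[name]?.color ?? "#ededed"
}

console.log(resolveColor("JavaScript", all, userColors))
console.log(resolveColor("JavaScript", all))
console.log(resolveColor("Markdown", all))
```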

@lowlighter
Owner

I also think the problem in #356 may be covered by my linguist.

That PR was for gathering patches from GitHub events along with the .gitattributes of their associated repositories. Since it's mostly about code collection, I think it's unrelated, though I may be wrong.

Added in 1.4.4: get from .languages.all[lang].color

Nice 👍

@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2021
Development

Successfully merging this pull request may close these issues.

Languages plugin no longer works