Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsx disambiguation #2761

Closed
mking opened this issue Dec 6, 2015 · 8 comments
Closed

tsx disambiguation #2761

mking opened this issue Dec 6, 2015 · 8 comments

Comments

@mking
Copy link

mking commented Dec 6, 2015

Here is an example where a TypeScript .tsx file is incorrectly recognized as tiled map format.

One way to solve this is to add a disambiguation heuristic for tsx (similar to the one used for ts).

    disambiguate ".tsx" do |data|
      if data.include?("<tile")
        Language["XML"]
      else
        Language["TypeScript"]
      end
    end

What do you think?

@pchaigno
Copy link
Contributor

pchaigno commented Dec 6, 2015

The heuristic for .ts files is a particular case because .ts XML files must always contain a TS tag.
That does not seem to always be the case for .tsx files. As such, I'm not sure there are enough mis-classifications of .tsx files to justify a new heuristic...

If you think this is a recurrent classification mistake, we could evaluate the number of files incorrectly classified for a sample?
Otherwise, you can use Linguist overrides to fix this issue on a per repository basis.

@mking
Copy link
Author

mking commented Dec 7, 2015

Ah ok. So, I did some additional research into the .tsx XML format and into the number of misclassified files:

  • Every .tsx file has an outer <tileset> element. So I think if we use <tileset we will catch every valid .tsx XML file and will only mistakenly classify React files if they have a <tileset> element. This should be rare since the React convention for non-builtin elements is to start with an uppercase letter. Also, note that <tileset (with space) should not exclude any valid files because the dtd requires at least one attribute (firstgid).
  • GitHub search brings up 322 misclassified .tsx TypeScript files. (Every tsx file will have an import React at the top.)
  • In total, I see 1893 tsx TypeScript files, so it seems like the number of misclassified files is not insignificant.

@arfon
Copy link
Contributor

arfon commented Mar 7, 2016

Thanks for documenting this @mking. We'd welcome a Pull Request implementing this heuristic. Alternatively the Linguist overrides (https://github.com/github/linguist#overrides) are probably the best solution for now.

@arfon arfon closed this as completed Mar 7, 2016
@dsifford
Copy link

dsifford commented May 30, 2016

@arfon I've tried now using every glob combination I can think of to get .tsx to register as TypeScript in .gitattributes, but it simply will not.

Am I missing something? https://github.com/dsifford/academic-bloggers-toolkit/blob/master/.gitattributes

@arfon
Copy link
Contributor

arfon commented May 31, 2016

@dsifford - can you tell me what you're not seeing? These are the language stats I see for your repository:

screen shot 2016-05-30 at 6 57 15 pm

@dsifford
Copy link

@arfon Sorry, perhaps I had the wrong idea on how deeply linguist goes in github..

My concern was with this:

image

Specifically, the .tsx files all are seemingly registered as XML files.

@arfon
Copy link
Contributor

arfon commented Jun 7, 2016

@arfon I've tried now using every glob combination I can think of to get .tsx to register as TypeScript in .gitattributes, but it simply will not.

I'm afraid .gitattributes data only overrides the repository classification. They're currently not supported by the search infrastructure or the syntax-highlighting of files sorry.

@dsifford
Copy link

dsifford commented Jun 7, 2016

Ah, understood. No problem at all. Thanks for the clarification 👍

alloy added a commit to alloy/linguist that referenced this issue Feb 8, 2017
lildude pushed a commit that referenced this issue Feb 20, 2017
Using the technique as discussed in #2761.
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants