Unicode support #25

Closed
Igorbek opened this issue Jul 9, 2012 · 15 comments

Comments

@Igorbek
Contributor

Igorbek commented Jul 9, 2012

Internally the project uses 'char' instead of 'wchar_t' for manipulating strings.
So that means no Unicode support?

@akhleung

akhleung commented Jul 9, 2012

Not currently; I didn't have time to look into it when we began the project. Is it as simple as using wchar_t? Will I need to rely on a 3rd party library for checking character classes and such?

@Igorbek
Contributor Author

Igorbek commented Jul 9, 2012

It does not require any 3rd-party libraries to manipulate wchar_t. It is as simple as using char. All the functions for char are also available for wchar_t.

@HamptonMakes
Member

I believe we intend it. We are noobs.


@QuLogic
Contributor

QuLogic commented Jul 10, 2012

There's no need to change to wchar_t if you pick a nice encoding like UTF-8 and stick with it (which I recommend). GLib uses char everywhere and only when you specifically need to deal with some Unicode peculiarities do you need to use any other functions.
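To illustrate the point about keeping char and UTF-8: most string handling can stay byte-oriented, and only operations such as counting characters need to know about the encoding. The following is a minimal sketch, not code from libsass; the function name `utf8_length` is hypothetical, and it assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string.
// Assumption: the input is valid UTF-8, so every continuation byte
// has the bit pattern 10xxxxxx.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Every byte that is not a continuation byte starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the string "\xC3\xA9" (the letter é) occupies two bytes in a std::string, while `utf8_length` reports one code point.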

@Igorbek
Contributor Author

Igorbek commented Jul 10, 2012

Thanks, @QuLogic, I would agree with you.
But the library loads and saves files incorrectly with respect to encoding (only ANSI and UTF-8 without a BOM work).
That is still a problem.

@QuLogic
Contributor

QuLogic commented Jul 10, 2012

I just tried a file with actual UTF-8 characters and it works just fine. But it doesn't have a BOM because it's not necessary in UTF-8. I think all that needs to be done is to ignore the BOM (assuming the code's going to work with UTF-8 only, that is).
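Ignoring the BOM could look roughly like the sketch below. It is an illustration only, not the libsass implementation; the name `strip_utf8_bom` is hypothetical, and it assumes the whole file has already been read into a std::string.

```cpp
#include <cassert>
#include <string>

// Returns the text with a leading UTF-8 byte order mark (EF BB BF) removed.
// Text that does not start with the BOM is returned unchanged.
std::string strip_utf8_bom(const std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    // compare() clamps the range, so inputs shorter than the BOM are safe.
    if (text.compare(0, bom.size(), bom) == 0) {
        return text.substr(bom.size());
    }
    return text;
}
```

A caller would apply this once, right after reading a file and before handing the bytes to the parser.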

@Igorbek
Contributor Author

Igorbek commented Jul 10, 2012

I think the problem is not just the encoding. If the library interface takes a UTF-8 encoded char*, I can convert to it from any encoding.
But if my code imports a file (via the @import directive), I can't specify the encoding of that file, and libsass always interprets it as UTF-8 without a BOM.
In some cases I can't control the encoding of the files, but I can know what encoding each file uses (from detection algorithms, user settings, or transport-specific information such as the HTTP Content-Type header).
I think the best solution is to let the application provide an interface that resolves file paths and file contents.
Something like this:

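#include <memory>
#include <string>

class SourceContext
{
public:
    virtual ~SourceContext() = default;
    // Returns the content of this source (the application can convert it to UTF-8 here).
    virtual std::string get_content() = 0;
    // Resolves an imported path relative to this source.
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};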
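As a rough illustration of how an application might implement such an interface, here is a sketch that serves Latin-1 files from an in-memory table and always returns UTF-8. Everything except the `SourceContext` shape is an assumption made for the example: `latin1_to_utf8`, `Latin1MemoryContext`, and the in-memory table are hypothetical, and none of this is part of libsass.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// The interface proposed above, repeated so the sketch compiles on its own.
class SourceContext
{
public:
    virtual ~SourceContext() = default;
    virtual std::string get_content() = 0;
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII maps to itself
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Hypothetical implementation: the application knows its files are Latin-1,
// so it converts them before the library ever sees the bytes.
class Latin1MemoryContext : public SourceContext
{
public:
    using Files = std::map<std::string, std::string>;

    Latin1MemoryContext(std::shared_ptr<const Files> files, std::string path)
        : files_(std::move(files)), path_(std::move(path)) {}

    std::string get_content() override
    {
        return latin1_to_utf8(files_->at(path_));
    }

    std::shared_ptr<SourceContext> resolve_path(std::string path) override
    {
        if (files_->count(path) == 0) {
            return nullptr;  // unknown import
        }
        return std::make_shared<Latin1MemoryContext>(files_, std::move(path));
    }

private:
    std::shared_ptr<const Files> files_;
    std::string path_;
};
```

The design choice here is that the library only ever receives UTF-8, while encoding detection and conversion stay in the application or bindings.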

@QuLogic
Contributor

QuLogic commented Jul 10, 2012

Ah, if only everyone just used UTF-8. But yes, I forgot about the @import issue.

I think if libsass says "I assume UTF-8 everywhere" and we then fix up issue #21 nicely (in some way similar to what you propose), libsass could just have the application/bindings deal with the encoding.

@georgiosd

Guys, have you decided what to do about this? As already mentioned, it breaks with files that include a BOM.

I used to do C++ before I discovered C#, and wchar_t was a big pain to use back then because OS support varied (I think Unicode was supported after Windows 2000/XP) and you had to use compiler flags to produce separate ANSI and Unicode builds.

In this day and age, when all OSes support Unicode, changing to wchar_t should be trivial. The only "problem" is that memory usage will automatically double, though this is not really a big problem given the size of computer memory and SCSS files.

Let me know what you think.

@akhleung

LibSass will currently read a BOM if present and reject any files that aren't UTF-8.
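For readers wondering what rejecting non-UTF-8 input can involve, the sketch below checks whether a byte string is well-formed UTF-8. It is an independent illustration and not the check LibSass performs; the name `is_valid_utf8` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the bytes form well-formed UTF-8. Rejects overlong forms,
// surrogates, truncated sequences, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;     // number of continuation bytes expected
        unsigned long cp;      // code point being decoded
        unsigned long min_cp;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;  // sequence is cut off at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        i += extra + 1;
    }
    return true;
}
```

A file starting with the UTF-8 BOM passes this check, while a Latin-1 or UTF-16 file containing non-ASCII characters will typically fail it.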

Aside from that, since we'd like to avoid external dependencies if possible, I'll look into using wchar_t then.

@georgiosd

Seriously?? Then it must be because this guy hasn't rebased in 3 months.
Anyhow, this was my fix: TBAPI-0KA#1

@georgiosd

If you get stuck anywhere with wchar_t, feel free to comment here and I'll try to help

@craigbarnes
Contributor

... should be trivial ...

Famous last words.

@georgiosd

LOL. Fair comment, but at least I went for "should" vs "will" :)

@akhleung

akhleung commented Jun 3, 2014

LibSass is only going to support UTF-8 for the foreseeable future. This support is mostly implemented, and there are tickets for the little edge cases on which it fails.

@akhleung akhleung closed this as completed Jun 3, 2014
thatguystone pushed a commit to thatguystone/libsass that referenced this issue May 10, 2018