Unicode support #25

Igorbek · 2012-07-09T22:50:27Z

Internally the project uses 'char' instead of 'wchar_t' for manipulating strings.
So that means no Unicode support?

akhleung · 2012-07-09T22:55:38Z

Not currently; I didn't have time to look into it when we began the project. Is it as simple as using wchar_t? Will I need to rely on a 3rd party library for checking character classes and such?

Igorbek · 2012-07-09T23:43:58Z

It does not require any 3rd-party libraries to manipulate wchar_t. It is as simple as using char. All functions for char are also available for wchar_t.

HamptonMakes · 2012-07-09T23:57:50Z

I believe we intend it. We are noobs.


QuLogic · 2012-07-10T00:28:07Z

There's no need to change to wchar_t if you pick a nice encoding like UTF-8 and stick with it (which I recommend). GLib uses char everywhere and only when you specifically need to deal with some Unicode peculiarities do you need to use any other functions.
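As a rough illustration of the point above, UTF-8 text can live in an ordinary `std::string`, and only code that cares about characters rather than bytes needs extra logic. The helper name `utf8_length` is hypothetical and not part of libsass or GLib; the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string assumed to hold valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does not match that pattern starts a new code point.
// Hypothetical helper for illustration only.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Byte-oriented operations such as copying, concatenating, or searching for ASCII delimiters keep working on the same string without any changes; `size()` still reports bytes, while a helper like this reports code points.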

Igorbek · 2012-07-10T03:10:56Z

Thanks, @QuLogic, I would agree with you.
But the library handles encodings incorrectly when loading (only ANSI and UTF-8 without a BOM work) and when saving.
It is still the problem.

QuLogic · 2012-07-10T04:00:10Z

I just tried a file with actual UTF-8 characters and it works just fine. But it doesn't have a BOM because it's not necessary in UTF-8. I think all that needs to be done is to ignore the BOM (assuming the code's going to work with UTF-8 only, that is).
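A minimal sketch of what "ignore the BOM" could look like, assuming the file contents have already been read into a `std::string`. The function name `strip_utf8_bom` is hypothetical and is not taken from the libsass source.

```cpp
#include <string>

// Returns the input without a leading UTF-8 byte order mark (EF BB BF).
// Input that does not start with the BOM is returned unchanged.
// Hypothetical helper for illustration; not part of the libsass API.
std::string strip_utf8_bom(const std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        return text.substr(bom.size());
    }
    return text;
}
```

Only the first three bytes are inspected, so the same byte sequence appearing later in the file is left alone.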

Igorbek · 2012-07-10T04:49:24Z

I think the problem is not just the encoding. If the library interface takes a UTF-8 encoded char*, I can convert to it from any encoding.
But if my code imports a file (via the @import directive), I can't specify the encoding of that file, and libsass always interprets it as UTF-8 without a BOM.
In some cases I can't control the encoding of the files, but I can know what encoding each file uses (from detection algorithms, user settings, or transport-specific information such as the HTTP Content-Type header).
I think the best solution is to introduce a way to provide an interface that can resolve file paths and file contents.
Something like this:

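#include <memory>
#include <string>
// Lets the application decide how source text is located and decoded.
class SourceContext
{
public:
    virtual ~SourceContext() = default;
    // Returns the content of this source.
    virtual std::string get_content() = 0;
    // Resolves a path (e.g. from an @import) to another source.
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};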
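To make the proposal more concrete, here is one way an application might implement such an interface. The interface is restated (with a virtual destructor) so the block compiles on its own. `MemorySourceContext` and `FileTable` are hypothetical names, and the sketch assumes the application has already converted every file to UTF-8 before putting it in the table. Nothing here reflects an actual libsass API.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Interface as proposed in the comment above.
class SourceContext
{
public:
    virtual ~SourceContext() = default;
    virtual std::string get_content() = 0;
    virtual std::shared_ptr<SourceContext> resolve_path(std::string path) = 0;
};

// Hypothetical implementation backed by an in-memory table of files whose
// contents the application has already converted to UTF-8.
class MemorySourceContext : public SourceContext
{
public:
    using FileTable = std::map<std::string, std::string>;

    MemorySourceContext(std::shared_ptr<const FileTable> files, std::string path)
        : files_(std::move(files)), path_(std::move(path)) {}

    // Returns the stored content, or an empty string for an unknown path.
    std::string get_content() override
    {
        auto it = files_->find(path_);
        return it != files_->end() ? it->second : std::string();
    }

    // Returns a context for the imported path, or nullptr if it is unknown.
    std::shared_ptr<SourceContext> resolve_path(std::string path) override
    {
        if (files_->find(path) == files_->end()) {
            return nullptr;
        }
        return std::make_shared<MemorySourceContext>(files_, std::move(path));
    }

private:
    std::shared_ptr<const FileTable> files_;
    std::string path_;
};
```

A file-system or HTTP-backed implementation would follow the same shape, doing its own encoding detection and conversion inside `get_content()`.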

QuLogic · 2012-07-10T05:14:26Z

Ah, if only everyone just used UTF-8. But yes, I forgot about the @import issue.

I think if libsass says "I assume UTF-8 everywhere" and we fix up issue #21 nicely (in some way similar to what you propose), libsass could just have the application/bindings deal with the encoding.

georgiosd · 2013-06-26T17:32:14Z

Guys, have you decided what to do about this? As already mentioned, it breaks with files that include a BOM.

I used to do C++ before I discovered C# and wchar_t was a big pain to use back then because it meant different OS support (I think Unicode was supported after Windows 2000/XP) and you have to use compiler flags to produce ANSI and Unicode versions specifically.

In this day and age when all OSes support Unicode, changing to wchar_t should be trivial. The only "problem" is that the memory usage will double automatically, though this is not really a big problem given the size of computer memory and SCSS files.

Let me know what you think.

akhleung · 2013-06-26T17:42:46Z

LibSass will currently read a BOM if present and reject any files that aren't UTF-8.
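The thread does not show how LibSass performs this check. As a rough sketch of what rejecting non-UTF-8 input can involve, the hypothetical function below (`is_valid_utf8`, not a LibSass function) decodes each sequence and rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8.
// Hypothetical helper for illustration; not taken from the LibSass source.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after lead
        unsigned long cp = 0;      // code point decoded so far
        unsigned long min_cp = 0;  // smallest code point valid for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i - 1 < extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

A file saved in a single-byte encoding such as Latin-1 typically fails this check as soon as it contains a non-ASCII character, because a lone high byte is not followed by the continuation bytes UTF-8 requires.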

Aside from that, since we'd like to avoid external dependencies if possible, I'll look into using wchar_t then.

georgiosd · 2013-06-26T18:08:58Z

Seriously?? Then it must be because this guy hasn't rebased in 3 months.
Anyhow, this was my fix: TBAPI-0KA#1

georgiosd · 2013-06-26T18:16:36Z

If you get stuck anywhere with wchar_t, feel free to comment here and I'll try to help

craigbarnes · 2013-06-26T18:23:31Z

... should be trivial ...

Famous last words.

georgiosd · 2013-06-26T18:24:25Z

LOL. Fair comment, but at least I went for "should" vs "will" :)

akhleung · 2014-06-03T23:14:13Z

LibSass is only going to support UTF-8 for the foreseeable future. This support is mostly implemented, and there are tickets for the few edge cases on which it fails.

darrenkopp mentioned this issue Sep 19, 2013

Unicode/utf-8 issues darrenkopp/SassyStudio#8

Closed

akhleung closed this as completed Jun 3, 2014

thatguystone pushed a commit to thatguystone/libsass that referenced this issue May 10, 2018

Merge pull request sass#25 from wellington/feature/fixwinsegfault

8e4c1d4

pull latest master, fixes one segfault

skyvast404 mentioned this issue Dec 4, 2019

Heap-buffer-overflow in lexer.hpp #3045

Closed

xzyfer mentioned this issue Aug 24, 2020

Performance regression upgrading from v3.5.2 to v3.6.4 #3125

Closed
