Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify string handling #2

Open
meshula opened this issue Feb 18, 2021 · 8 comments · May be fixed by #15
Open

Unify string handling #2

meshula opened this issue Feb 18, 2021 · 8 comments · May be fixed by #15
Labels

Comments

@meshula
Copy link
Member

meshula commented Feb 18, 2021

Unify string handling. cf. AcademySoftwareFoundation/OpenTimelineIO@fdc9570 Convert all string returning functions to return an OpenTimelineIO string pointer, and replace all string destroy functions with a single OTIO string destroy function.

@KarthikRIyer
Copy link
Contributor

What do you mean by an OpenTimelineIO string pointer? A typedef?
If we're going to be using simple char *, why do we need anything else? Is it just to make it explicit that the user has to take care of freeing the strings?

@meshula
Copy link
Member Author

meshula commented May 14, 2021

I mean an actual OTIO C string interning struct, with creation and deletion pair. The reason is that we are allocating strings internally to the library, and currently requiring users to call free. In general, that's not a great pattern, because it assumes that OTIO and a user are using the same memory allocator. At the moment, it's a safe assumption because OTIO is statically linked. If that ever changes it would be a problem, because OTIO might be compiled against the stdc library malloc/free, but a user library might be compiled with a custom allocator such as jemalloc. In that scenario calling free would corrupt memory.

The current state of my investigation on this is that redis' string interning library is a better candidate to fulfill this role, than writing our own. https://github.com/antirez/sds however, we don't actually need the full suite of functionality (strcat, dup, len, etc.) so an alternate might be to adopt their memory management structure but not the library. This is what they do:

[preamble][interned string][\0]
           ^---- returned char* points to interned string for compatibility stdc string libraries

Freeing that pointer means computing the address of preamble, and freeing that resulting address to the heap.

@KarthikRIyer
Copy link
Contributor

Ok. I get the gist. I'll take a look at the link you've sent and ask again if I have doubts.

@KarthikRIyer
Copy link
Contributor

KarthikRIyer commented Jun 15, 2021

@meshula I was looking at sds.

https://github.com/antirez/sds/blob/master/sds.c#L60-L74

static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5)
        return SDS_TYPE_5;
    if (string_size < 1<<8)
        return SDS_TYPE_8;
    if (string_size < 1<<16)
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)
        return SDS_TYPE_32;
    return SDS_TYPE_64;
#else
    return SDS_TYPE_32;
#endif
}

I don't understand the check for (LONG_MAX == LLONG_MAX).

Also do we also need to support all the types? I think maybe yes if the bindings were to be used on an embedded system with limited memory? Anyways we need to store something in the string interning struct before our buffer because just a flexible array member is not allowed in an otherwise empty struct. If we support all types, then there's no problem. I can just copy sds' implementation.

@meshula
Copy link
Member Author

meshula commented Jun 15, 2021

on a compiler targeting 32 bit architectures, LONG_MAX is equal to LLONG_MAX, but they are not equal when targeting 64.

Picking a header with variable size is a micro-optimization that would yield benefits when the number of strings is enormous. i.e. on a 64 bit architecture short strings would win back 7 bytes per string.

@KarthikRIyer
Copy link
Contributor

Since you mentioned that we do not need the full functionality of sds (like len, cat, etc), we do not need the preamble, right? The preamble is for storing length, type and allocated size.

Could we just typedef char *otiostr and add creation & deletion methods, and ask users to call otiostr_create() & otiostr_delete()?

@KarthikRIyer
Copy link
Contributor

KarthikRIyer commented Jun 16, 2021

on a compiler targeting 32 bit architectures, LONG_MAX is equal to LLONG_MAX, but they are not equal when targeting 64

I printed LONG_MAX and LLONG_MAX by compiling for 32 & 64 bit and got this:

karthik@karthik-Inspiron-5567:~/try$ gcc -m32 try.c -o print32
karthik@karthik-Inspiron-5567:~/try$ ./print32 
2147483647
9223372036854775807
karthik@karthik-Inspiron-5567:~/try$ gcc -m64 try.c -o print64
karthik@karthik-Inspiron-5567:~/try$ ./print64 
9223372036854775807
9223372036854775807

But now I understand the condition

@meshula
Copy link
Member Author

meshula commented Jun 16, 2021

Not what I expected, obviously ;) Now I understand the condition as well LOL

@KarthikRIyer KarthikRIyer linked a pull request Jun 17, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants