[lldb] LRUCache for Swift type system mangling/demangling #9191
base: stable/20230725
Conversation
@augusto2112 please take a look. Not sure if I'm targeting the right branch, though.
This looks really nice! I'm impressed by the performance gains you mentioned.

I think a great use case for this would be an LRU cache from name -> node pointer inside the typeref type system, replacing all the calls to `dem.demangleSymbol` with lookups into the cache.
```cpp
  auto it = map_.find(key);
  if (it == map_.end())
    return std::nullopt;
  list_.splice(list_.begin(), list_, it->second);
```
As I understand it, this is moving the requested element to the front of the list, right? I think it'd be worthwhile adding a comment here explaining that.
Moved it into a dedicated function.
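A minimal sketch of what such a dedicated helper might look like, assuming the `list_`/`map_` members quoted above (the function name is hypothetical):

```cpp
// Hypothetical helper; assumes the List/list_ members from the fragment above.
// std::list::splice relinks the node in O(1) and does not invalidate
// iterators, so the iterators stored in map_ stay valid.
void MarkAsMostRecentlyUsed(typename List::iterator node_it) {
  list_.splice(list_.begin(), list_, node_it);
}
```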
```cpp
  using Node = std::pair<Key, Value>;
  using List = std::list<Node>;
  List list_;
  std::unordered_map<Key, typename List::iterator> map_;
```
`llvm::DenseMap` is almost always more efficient than `std::unordered_map`.
Is `StringMap` OK? I now use it to avoid a double allocation for the keys.
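For readers following the thread, here is a self-contained sketch of the list-plus-map scheme the quoted fragments come from; the capacity handling and exact method shapes are my reconstruction, not necessarily the patch's code:

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Sketch of an LRU cache: the list keeps key/value pairs in recency order,
// the map gives O(1) access to list nodes by key.
template <typename Key, typename Value> class LRUCache {
public:
  explicit LRUCache(size_t capacity) : capacity_(capacity) {}

  std::optional<Value> Get(const Key &key) {
    auto it = map_.find(key);
    if (it == map_.end())
      return std::nullopt;
    // Move the accessed node to the front: it is now the most recently used.
    list_.splice(list_.begin(), list_, it->second);
    return it->second->second;
  }

  void Put(Key key, Value value) {
    if (auto it = map_.find(key); it != map_.end()) {
      it->second->second = std::move(value);
      list_.splice(list_.begin(), list_, it->second);
      return;
    }
    if (list_.size() == capacity_) {
      // Evict the least recently used entry from the back of the list.
      map_.erase(list_.back().first);
      list_.pop_back();
    }
    list_.emplace_front(std::move(key), std::move(value));
    // Note: with Key = std::string the key is stored twice (list and map);
    // this is the double allocation the StringMap-based version avoids.
    map_[list_.front().first] = list_.begin();
  }

private:
  using Node = std::pair<Key, Value>;
  using List = std::list<Node>;
  size_t capacity_;
  List list_;
  std::unordered_map<Key, typename List::iterator> map_;
};
```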
```diff
@@ -505,6 +506,8 @@ class TypeSystemSwiftTypeRef : public TypeSystemSwift {

   /// All lldb::Type pointers produced by DWARFASTParser Swift go here.
   ThreadSafeDenseMap<const char *, lldb::TypeSP> m_swift_type_map;
+  swift_demangle::LRUCache<std::string, CompilerType> m_canonical_types_cache;
```
We'll probably need to make sure these data structures are thread safe.
Yes, good point. I thought about this but then forgot. I'll add a mutex.
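A sketch of what that locking could look like, assuming a `GetOrCreate`-style entry point like the one this PR uses later (the factory shape is illustrative, not the patch's exact API):

```cpp
#include <mutex>
#include <optional>

// Illustrative member of the cache (or of a thread-safe wrapper around it):
// a single mutex serializes lookup and insertion. Running the factory under
// the lock is simple, but it also serializes the expensive demangling work.
// Assumes: std::optional<Value> Get(const Key &), void Put(Key, Value),
// and a std::mutex m_mutex member alongside the list/map.
template <typename Factory>
Value GetOrCreate(const Key &key, Factory factory) {
  std::lock_guard<std::mutex> lock(m_mutex);
  if (std::optional<Value> cached = Get(key))
    return *cached;
  Value value = factory();
  Put(key, value);
  return value;
}
```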
I wonder if these specialized per-function caches are scalable, though, as we'd need one cache per function. I suppose we could only create caches for functions we know are very hot.
Yes, I'm going to collect more detailed logs to see how many times a function is called with the same mangled name, so we don't add caches for functions that are called only once or twice.
Force-pushed from e7e18d2 to cb6c653.
I reimplemented the `LRUCache` on top of `llvm::StringMap`, so only the map is responsible for owning the key strings.
I collected some statistics about calls to `demangleSymbol` and `mangleNode`. The object that is printed looks like this:

```swift
// (Assumed definition, implied by the usage below.)
protocol HasDefault {
  static var defaultValue: Self { get }
}

struct SomeRandomValueTagged<Tag>: HasDefault {
  private var value: SomeClass
  private init(value: SomeClass) {
    self.value = value
  }
  static var defaultValue: Self {
    Self(value: SomeClass())
  }
}

private final class SomeClass {
  init() {}
}

enum Tag0 {}
enum Tag1 {}
enum Tag2 {}
enum Tag3 {}
enum Tag4 {}
enum Tag5 {}
enum Tag6 {}
enum Tag7 {}

final class L0<Tag> {
  var v0 = SomeRandomValueTagged<(Tag0, Tag)>.defaultValue
  var v1 = SomeRandomValueTagged<(Tag1, Tag)>.defaultValue
  var v2 = SomeRandomValueTagged<(Tag2, Tag)>.defaultValue
  var v3 = SomeRandomValueTagged<(Tag3, Tag)>.defaultValue
  var v4 = SomeRandomValueTagged<(Tag4, Tag)>.defaultValue
  var v5 = SomeRandomValueTagged<(Tag5, Tag)>.defaultValue
  var v6 = SomeRandomValueTagged<(Tag6, Tag)>.defaultValue
  var v7 = SomeRandomValueTagged<(Tag7, Tag)>.defaultValue
}

// ...
let var0 = L0<Void>()
// (lldb) v var0
```

Calls to `demangleSymbol`:

Symbol | no cache | cache
---|---|---
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag0O_yttGD | 124 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag1O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag2O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag3O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag4O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag5O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag6O_yttGD | 122 | 72 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag7O_yttGD | 122 | 72 |
$s13lldb_perf_dbg2L0Cys4VoidaGD | 126 | 53 |
$s13lldb_perf_dbg2L0CyytGD | 336 | 221 |
$s13lldb_perf_dbg3FooCACycfc | 4 | 4 |
$s13lldb_perf_dbg3FooCACycfc$s13lldb_perf_dbg3FooCACycfc | 0 | 0 |
$s13lldb_perf_dbg9SomeClass33_D6034200B16897271DB50112FD04A664LLC | 18 | 18 |
$s13lldb_perf_dbg9SomeClass33_D6034200B16897271DB50112FD04A664LLCD | 1190 | 585 |
Total | 2652 | 1457 |

Calls to `mangleNode`:
Symbol | no cache | cache |
---|---|---|
$s13lldb_perf_dbg21SomeRandomValueTaggedVD | 72 | 32 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag0O_yttGD | 50 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag1O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag2O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag3O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag4O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag5O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag6O_yttGD | 49 | 9 |
$s13lldb_perf_dbg21SomeRandomValueTaggedVyAA4Tag7O_yttGD | 49 | 9 |
$s13lldb_perf_dbg2L0CD | 35 | 11 |
$s13lldb_perf_dbg2L0Cys4VoidaGD | 39 | 6 |
$s13lldb_perf_dbg2L0CyytGD | 113 | 33 |
$s13lldb_perf_dbg9SomeClass33_D6034200B16897271DB50112FD04A664LLCD | 543 | 84 |
$ss4VoidaD | 39 | 6 |
Total | 1234 | 244 |
First: thanks for looking into this! I believe that this kind of caching would be okay, since no clients ever modify existing nodes; AFAIK, they only ever add new nodes that point to existing ones. However, I wonder if sharing the demangle context between different threads could be an issue? It might need to be guarded by a lock (I haven't checked whether you do this already), which might eat some of the performance benefits.

I was at first unsure about using shared pointers to store the demangle context and wondered if maybe the typesystem should own a global context, but I'm starting to think of this as a compression scheme where we unpack the demangle trees when we need to do work on them, and from that point of view something like what you have makes more sense.

Did you pick this function because it is the most expensive leaf function inside of LLDB? For example, I would imagine that we can get even more out of it by caching the canonical demangle tree, which involves Clang type lookups, but getting the cache key right for that is also going to be more tricky.

Does the cache have to be owned by TypeSystemSwiftTypeRef (of which there are 100s on Darwin), or would there be a benefit to it being a static singleton?
For now, these shared demanglers don't participate in any operations at all; I just need them to stay alive as long as the associated nodes are alive. So far I've implemented only one operation on the node.

The main concern with a global context is uncontrollable memory consumption.

Not really. I mean, these functions were pretty hot, but actually I just noticed heavy mangling and demangling calls in the inverted call tree and then added caches to the calling functions that were easiest to change.

Sorry, but I don't understand what you mean by "the canonical demangle tree". Do you mean the result of `GetCanonicalDemangleTree`?

I think it doesn't really matter whether it's global or type-system-local. For example, the cache for `GetCachedDemangledSymbol` is global.
```diff
@@ -2991,6 +2997,12 @@ TypeSystemSwiftTypeRef::GetCanonicalType(opaque_compiler_type_t type) {
     ConstString mangled(mangling.result());
     return GetTypeFromMangledTypename(mangled);
   };
+  auto impl = [&] {
+    auto mangled_name = StringRef(AsMangledName(type));
+    return m_canonical_types_cache.GetOrCreate(
```
I don't think this is necessarily correct, since constructing the canonical demangle tree involves DWARF lookups, but they are not part of the cache key here.
In what cases might a DWARF lookup return different results? When we add a new module to the module list?
Yes, that would be one such case.
Generally, a good way to be sure that we understand all the implicit dependencies would be to make the function we want to cache a static function first; then all dependencies become explicit, and we know they need to go into the cache key (or invalidate the cache).
You might not see it on the command line but LLDB is also a library.
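One illustrative way to make such a dependency explicit is to fold it into the key. The sketch below assumes (this is not from the patch) that the type system can observe a counter that is bumped whenever the module list changes:

```cpp
#include <cstdint>

// Hypothetical composite cache key: the mangled name plus a module-list
// generation counter. Once the counter is bumped, old keys can never match
// again, so entries that depended on earlier DWARF lookups are effectively
// invalidated.
struct CanonicalTypeKey {
  const char *mangled_name;   // pooled ConstString pointer
  uint64_t module_generation; // incremented whenever modules are added/removed

  bool operator==(const CanonicalTypeKey &other) const {
    return mangled_name == other.mangled_name &&
           module_generation == other.module_generation;
  }
};
```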
One concern: I think Adrian brings up a great point, in that we need to be very careful in understanding what the cache key has to be in each of these cases. To move this patch forward, I propose we focus on caching the results of demangling first.
```cpp
  explicit operator bool() const { return IsValid(); }

  /// Gets a raw pointer to the Node.
  swift::Demangle::NodePointer GetRawNode() const & { return m_node; }
```
I wonder if this function is necessary if we already implement the `->` operator.
Interestingly, …

Hmm, it looks like … Your cached version of …

When a type isn't found, or is found in the negative cache, …

That'd be a way to do it, but if the type is already cached in …
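For reference, a self-contained sketch of the handle type this thread is discussing; everything beyond the two members quoted above is my assumption about its shape:

```cpp
#include <memory>
#include "swift/Demangling/Demangler.h"

// Sketch: a NodePointer paired with a shared_ptr that keeps the owning
// Demangler, and therefore the whole node tree, alive.
class SharedDemangledNode {
public:
  SharedDemangledNode() = default;
  SharedDemangledNode(std::shared_ptr<swift::Demangle::Demangler> dem,
                      swift::Demangle::NodePointer node)
      : m_dem(std::move(dem)), m_node(node) {}

  bool IsValid() const { return m_node != nullptr; }
  explicit operator bool() const { return IsValid(); }

  // With operator-> available, most callers would not need GetRawNode.
  swift::Demangle::NodePointer operator->() const { return m_node; }

  /// Gets a raw pointer to the Node.
  swift::Demangle::NodePointer GetRawNode() const & { return m_node; }

private:
  std::shared_ptr<swift::Demangle::Demangler> m_dem;
  swift::Demangle::NodePointer m_node = nullptr;
};
```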
Force-pushed from cb6c653 to e49a36b.
@augusto2112 I replaced some of the calls to `demangleSymbol` with the cached version.
```cpp
namespace swift_demangle {

template <typename Value>
class LRUCache {
```
Can you add short doxygen comments for the class and the public methods?
```cpp
  }

private:
  using List = std::list<llvm::StringRef>;
```
Would a vector<> and a moving "lru" iterator work and be faster?
Can you elaborate, please? I'm not sure I understand how it could be faster. Both `Get` and `Put` are O(1) now; with a vector, moving an element to the front would be O(N).
I was thinking of storing the elements of the list in a vector and having an `lru` iterator that points to the newest entry. When inserting a new value, we overwrite the value at `std::next(lru)` and increment `lru`, letting the iterator wrap around as in a ring buffer. This would make scanning through the cache faster because we don't need to chase pointers, and it would also use less memory (not that that matters for 10 elements). You are right that inserting an element would be more expensive because we'd need to copy up to N-1 elements. We could instead always move the `lru` iterator to the right and overwrite whatever is there on a cache miss. I think that would still give us a set of the last N used entries; we just wouldn't know their relative order any more. But that might not matter.
Got it now, thanks. Yeah, this can work, but we have to sacrifice reordering in `Get`. I think that's OK: it looks like most accesses to a cache item are sequential, and in that case pretty much any replacement strategy would work (even random replacement). I doubt it should be called "LRU" then, as we'd effectively discard the least recently inserted items. I'll prototype this variant.
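A rough sketch of that ring-buffer variant as I understand Adrian's suggestion (all names hypothetical; assumes a non-zero capacity):

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>
#include "llvm/ADT/StringRef.h"

// Fixed-size buffer scanned linearly on Get; on a miss the caller Puts the
// new entry, overwriting the slot after the most recently inserted one.
// Recency is not tracked on Get, so eviction is "least recently inserted",
// not true LRU.
template <typename Value> class RingCache {
public:
  explicit RingCache(size_t capacity) : entries_(capacity) {}

  std::optional<Value> Get(llvm::StringRef key) const {
    for (const auto &e : entries_)
      if (e && key == e->first)
        return e->second;
    return std::nullopt;
  }

  void Put(llvm::StringRef key, Value value) {
    entries_[next_] = std::make_pair(key.str(), std::move(value));
    next_ = (next_ + 1) % entries_.size();
  }

private:
  std::vector<std::optional<std::pair<std::string, Value>>> entries_;
  size_t next_ = 0;
};
```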
Oh, that's a good point, though. I didn't think of this being a global cache. We might have a lot of long-lived types that we need to demangle over and over again. We might benefit from a much larger cache, but then we'd probably need a DenseMap for fast lookups and maybe track the LRU list separately. I don't want to get hung up on the implementation details here; the nice thing is that we can change and fine-tune it after an initial version has landed. So from that point of view we can also keep your list-based implementation.
Is DenseMap faster for lookups than StringMap? I chose StringMap because it's used in ConstString's pools, so I thought it was the map to go for.
`StringMap` copies the string key into the map; if you want to take advantage of the `ConstString` pool, `DenseMap` is probably better.
> Is DenseMap faster for lookups than StringMap?

No, it's not. I think your current implementation is actually fine.
But what Augusto said is also true: the mangled names in the CompilerTypes are guaranteed to be ConstStrings, so a `DenseMap<const char *, ...>` is more efficient. We don't need to compare the strings at all, just the pointers.
Yeah, you're probably right. The conversion `opaque_compiler_type_t` -> `ConstString` will still do a lookup in a StringMap (one of ConstString's pools), so it costs a `strlen` plus 2-3 hash operations plus the lookup itself. But we would store just pointers and not copy whole strings into the cache, so in the end it will probably be the same performance with a smaller memory footprint.
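A small sketch of the pointer-keyed approach being described, assuming the keys always come from the ConstString pool (the wrapper type is hypothetical):

```cpp
#include "lldb/Symbol/CompilerType.h"
#include "llvm/ADT/DenseMap.h"

// Because ConstString interns its strings, two equal mangled names yield the
// same const char *, so pointer identity stands in for string equality and
// the DenseMap never hashes or compares the characters themselves.
struct PointerKeyedTypeCache {
  llvm::DenseMap<const char *, lldb_private::CompilerType> map;

  // `mangled` must be a pooled ConstString pointer, e.g. AsMangledName(type).
  const lldb_private::CompilerType *Lookup(const char *mangled) const {
    auto it = map.find(mangled);
    return it == map.end() ? nullptr : &it->second;
  }
};
```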
```diff
@@ -0,0 +1,28 @@
+//===-- SwiftDemangle.h ---------------------------------------*- C++ -*-===//
```
nit: .cpp
```diff
@@ -1074,6 +1074,12 @@ TypeSystemSwiftTypeRef::GetCanonicalNode(swift::Demangle::Demangler &dem,

+/// Return the demangle tree representation of this type's canonical
+/// (type aliases resolved) type.
+std::pair<swift::Demangle::NodePointer, SharedDemangledNode>
+TypeSystemSwiftTypeRef::GetCanonicalDemangleTreeWithCache(
```
Would we ever want the non-cached version?
I'd say no, but it's used in `DWARFASTParserSwift`, and it's tricky to refactor it to use the cached version. Before doing so, I wanted us to discuss whether we are happy with the API for `GetCanonicalDemangleTreeWithCache`; see my concerns above in #9191 (comment). That's why I temporarily left the non-cached version.
I can see three ways of making this API cleaner:

1. Make `Transform` clone the node returned by `fn` using the new demangler; this way all transformed nodes are guaranteed to be attached to the new demangler. The obvious downside is the unnecessary cloning of nodes.
2. Change `SharedDemangledNode` to keep a `SmallVector` of shared_ptrs to demanglers (or perhaps only two demanglers, one being optional?) to ensure the entire demangle tree survives, since parts of the demangle tree are owned by different demanglers. The downside is that reasoning about `SharedDemangledNode` becomes more complex, since two `SharedDemangledNode`s can now share a demangler.
3. Create a new class that owns multiple `SharedDemangledNode`s (to keep their demanglers alive) but only exposes one `NodePointer`; see the sketch below. I think this might be the best approach, but now we have two new classes representing demangled nodes instead of one.
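A sketch of option 3, reusing the `SharedDemangledNode` shape sketched earlier in the thread (the class name and member layout are hypothetical):

```cpp
#include <memory>
#include "llvm/ADT/SmallVector.h"

// A handle that exposes a single NodePointer but keeps every contributing
// demangler alive by owning the SharedDemangledNodes whose trees the result
// points into.
class CombinedDemangledNode {
public:
  CombinedDemangledNode(swift::Demangle::NodePointer node,
                        llvm::SmallVector<SharedDemangledNode, 2> parts)
      : m_node(node), m_parts(std::move(parts)) {}

  swift::Demangle::NodePointer GetRawNode() const { return m_node; }

private:
  swift::Demangle::NodePointer m_node = nullptr;
  // Keeps the demanglers that own pieces of m_node's tree alive.
  llvm::SmallVector<SharedDemangledNode, 2> m_parts;
};
```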
This is looking great! A few small changes and we can merge it in!
@swift-ci test
Force-pushed from e49a36b to 553cd0e.
Recent changes: …

Also, I did quick measurements before and after this commit. The results are much less impressive now but still significant. I need to implement a cache for …
@adrian-prantl Can you help me with the logic of this code and stack trace? If I don't patch it here, we are mutating a node in the cache. What should the correct behavior be?

PS: it fails on this test.
```cpp
  {
    std::lock_guard lock{m_mutex};
    if (auto map_it = m_map.find(key); map_it != m_map.end()) {
      const auto &map_value = map_it->second;
```
Just thought about this: it'd be nice if in debug mode we assert that the cached value and the one created from the factory are the same (sorry for giving you more work!)
Sure. The `Node` doesn't seem to provide an equality function, but it has `isSimilarTo`, which in combination with a simple DFS should be enough to implement one in `SharedDemangledNode`.

I have a couple of questions then:

- Do you want this check to be part of the `LRUCache` class or in `GetCachedDemangledSymbol`? I'm asking because in `LRUCache` we generally don't need the `Value` type to be equatable, but with this check we would require that.
- Under what flag should I add this check?
  - `!defined(NDEBUG)` works in `ReleaseAssert` builds.
  - `defined(LLDB_CONFIGURATION_DEBUG)` is defined only in debug builds (AFAICS).
  - Make a new macro like `LLDB_LRUCACHE_VALIDATE` that defaults to `NDEBUG` or `LLDB_CONFIGURATION_DEBUG`:

    ```cpp
    #if !defined(LLDB_LRUCACHE_VALIDATE) && !defined(NDEBUG)
    #define LLDB_LRUCACHE_VALIDATE
    #endif
    ```

    This is more flexible, but I doubt anyone will ever want to specify it separately.
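A sketch of that equality check via `isSimilarTo` plus a DFS (the exact `isSimilarTo` signature is assumed; `getNumChildren`/`getChild` are the standard node accessors):

```cpp
#include <cstddef>
#include "swift/Demangling/Demangle.h"

// isSimilarTo compares the kind/payload of a single node; the recursion
// extends that comparison to whole demangle trees.
static bool NodesEqual(swift::Demangle::NodePointer lhs,
                       swift::Demangle::NodePointer rhs) {
  if (lhs == rhs)
    return true;
  if (!lhs || !rhs || !lhs->isSimilarTo(rhs))
    return false;
  if (lhs->getNumChildren() != rhs->getNumChildren())
    return false;
  for (size_t i = 0, e = lhs->getNumChildren(); i != e; ++i)
    if (!NodesEqual(lhs->getChild(i), rhs->getChild(i)))
      return false;
  return true;
}
```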
`GetCachedDemangledSymbol` would be OK. `#ifndef NDEBUG` would be the one I'd pick; we already have similar checks (such as `Verify`) guarded under it.
Do you have a definition for …?
@augusto2112 I made an equality function and caught an edge case where it fails: two consecutive calls to the demangler produce different results when a node of the kind … is involved.

Demangling trees: …

This is somewhat bad. We could make …
@DmT021 looks like you fixed the …

Wow, great job catching this! I wasn't aware of …

We could probably make our own `Verify` function that is only compiled in assert builds.

Do you mean just …? This is definitely an issue, but this issue already exists with the current code, right? It looks like …

Maybe instead of having … What do you think?
It's a fix, but I still don't know if it's correct. It surely makes the tests pass, but I don't know why the original implementation was written that way and what problems it solves.

There was a copy routine in …

Right.

I think it makes sense to have a function like …
@augusto2112 can you take a look once more, please? I added the asserts as you asked.
This is more of a proof of concept than a final solution, but it shows a relatively good performance improvement for `ValueObjectPrinter::PrintValueObject`.

Tested using `v x`, where `x` is a struct with about 35,000+ nested unique types. Measured the total time of the `lldb_private::ValueObjectPrinter::PrintValueObject()` call using Instruments (Time Profiler, default resolution):

Before: 24.7 s
After: 17.4 s

There's potential for more performance gains using local `LRUCache`s and the global one (`GetCachedDemangledSymbol`). The next obvious candidate for optimization is `TypeSystemSwiftTypeRef::DemangleCanonicalType`, but it's trickier: it takes a `Demangler` as an argument, so I'd have to change the signature and all the references.