Fix detection of duplicate torch tensors #379
Conversation
@@ -71,7 +75,8 @@ def _find_shared_tensors(state_dict: Dict[str, torch.Tensor]) -> List[Set[str]]:
     for k, v in state_dict.items():
         if v.device != torch.device("meta") and storage_ptr(v) != 0 and storage_size(v) != 0:
             # Need to add device as key because of multiple GPU.
-            tensors[(v.device, storage_ptr(v), storage_size(v))].add(k)
+            # Need to add storage_offset as key because views may share the same data_ptr.
+            tensors[(v.device, storage_ptr(v), storage_size(v), storage_offset(v))].add(k)
I think this line is correct. It's trying to see which tensors have shared memory, which is indicated by storage. Adding the offset will make you miss some shared tensors.
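A sketch of that concern (illustrative, reusing the storage_ptr / storage_size helpers that appear in the diff above): two views that share storage but start at different offsets would land in different groups once the offset is part of the key.

import torch
from safetensors.torch import storage_ptr, storage_size

a = torch.zeros(10)
b = a[2:]  # view into the same storage, starting two elements in

# Same underlying storage: identical pointer and storage size...
print(storage_ptr(a) == storage_ptr(b), storage_size(a) == storage_size(b))  # True True
# ...but different offsets, so a key that includes storage_offset() splits them apart.
print(a.storage_offset(), b.storage_offset())  # 0 2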
In my understanding, what this function does is find which tensors in the state_dict are "identical". Relying on data_ptr() is not enough in this case, as data_ptr is the same for views of a tensor, and those views may represent different data. For example, bar = a[2:4] is not the same tensor as foo = a[4:6].
I'll try to come up with a minimal example to showcase what I mean.
@Narsil @LysandreJik Here is a minimal repro of the issue:
import torch
import torch.nn as nn

from safetensors.torch import _find_shared_tensors


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(20, 30)
        # q.weight and v.weight are slices of qkv.weight, so all three weights
        # are views into the same underlying storage.
        self.q = nn.Linear(20, 10)
        self.q.weight = torch.nn.Parameter(self.qkv.weight[:10])
        self.v = nn.Linear(20, 10)
        self.v.weight = torch.nn.Parameter(self.qkv.weight[10:20])

    def forward(self, x):
        return x


model = MyModel()
shared_params = _find_shared_tensors(model.state_dict())
print("shared_params", shared_params)
This prints: shared_params [{'v.weight', 'qkv.weight', 'q.weight'}, {'qkv.bias'}, {'q.bias'}, {'v.bias'}]
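A quick way to see why those three weights end up in one group (a sketch, assuming the storage_ptr / storage_size helpers used in the diff are importable from safetensors.torch):

from safetensors.torch import storage_ptr, storage_size

sd = model.state_dict()
for name in ("qkv.weight", "q.weight", "v.weight"):
    t = sd[name]
    # All three live in the same underlying storage, so they share
    # storage_ptr and storage_size, which is all the current key checks.
    print(name, storage_ptr(t), storage_size(t), t.storage_offset())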
Okay, but I understand you to be arguing that _find_shared_tensors is meant to group tensors that share memory, no matter whether some of them are views and the actual data is different. Is that the case?
Despite the fix huggingface/transformers#27314, Transformers still calls safetensors.torch.save_file and complains that tensors still share memory: https://github.com/fxmarty/safetensors/blob/f04d064884be5ede7b6f7d844ce22b793607d091/bindings/python/py_src/safetensors/torch.py#L474. Couldn't Transformers call save_model as the error suggests? Though I doubt this would help, as save_model calls _remove_duplicate_names, which in turn calls _find_shared_tensors.
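For reference, a sketch of that call chain applied to the repro model above (the file name is illustrative, and whether this path actually helps Transformers is exactly what is being debated here):

from safetensors.torch import save_model, _remove_duplicate_names

# save_model drops duplicate names among tensors sharing storage before writing:
#   save_model -> _remove_duplicate_names -> _find_shared_tensors
print(_remove_duplicate_names(model.state_dict()))
save_model(model, "model.safetensors")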
Fixed directly in transformers.
As per title. Two tensors with different data may share the same data_ptr when they are views of another tensor. For example, see the sketch below.
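A minimal sketch of such a case (illustrative values, not necessarily the original snippet; storage_ptr is the helper from safetensors.torch used in the diff):

import torch
from safetensors.torch import storage_ptr

a = torch.arange(8, dtype=torch.float32)
foo = a[4:6]  # view over elements 4..5
bar = a[2:4]  # view over elements 2..3, holding different data than foo

# Both views point into the same underlying storage...
print(storage_ptr(foo) == storage_ptr(bar))  # True
# ...yet they contain different values, so treating them as duplicates is wrong.
print(foo.tolist(), bar.tolist())  # [4.0, 5.0] [2.0, 3.0]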