Possible bug in clean_obs_names function #1029

cs-tum · 2023-03-28T17:31:28Z

Dear scvelo team, thank you for your great work!

I ran into a problem where the clean_obs_names() function from scvelo/core/_anndata.py did not clean up all obs_names correctly (see below for an example).

I believe the for loop of that function might not actually work on the looped obs_name items but only on the first item adata.obs_names[0], possibly by mistake:

# Original code
for obs_name in adata.obs_names:
  start, end = re.search(alphabet * id_length, adata.obs_names[0]).span()
  new_obs_names.append(obs_name[start:end])
  prefixes.append(obs_name.replace(obs_name[start:end], ""))

I was able to fix this by replacing lines 63-66 with the following code:

# Modified code
for obs_name in adata.obs_names:
  # FIXED BY REPLACING adata.obs_names[0] with obs_name
  start, end = re.search(alphabet * id_length, obs_name).span()
  new_obs_names.append(obs_name[start:end])
  prefixes.append(obs_name.replace(obs_name[start:end], ""))

Can you confirm the issue or am I misinterpreting the usage?

Example: I had a merged anndata object where the obs_names did not have the same length, e.g. 221229_Test_Run_230123_1_Spl:AAAGCAAAGGCATTGGx and 221229_Test_Run_230123_5_Blood:TTTGTCATCTGGCGTGx. After running scv.utils.clean_obs_names(adata, id_length=16), the former barcode was correctly cleaned to AAAGCAAAGGCATTGG whereas the latter one turned into d:TTTGTCATCTGGCG.
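The behaviour in this example can be reproduced without scvelo. The following is a minimal sketch, not the library's code: the helper names `slice_with_first_span` and `slice_per_name` are hypothetical, and the alphabet `[ACGT]` is an assumed stand-in for whatever `alphabet` is inside `clean_obs_names()`. It only illustrates why reusing the span of the first name shifts the slice for names with a longer prefix.

```python
import re

ALPHABET = "[ACGT]"  # assumption: stand-in for scvelo's `alphabet` value


def slice_with_first_span(names, id_length):
    """Mimic the original loop: the span is always taken from names[0]."""
    cleaned = []
    for name in names:
        start, end = re.search(ALPHABET * id_length, names[0]).span()
        cleaned.append(name[start:end])
    return cleaned


def slice_per_name(names, id_length):
    """Mimic the proposed fix: the span is searched in each name itself."""
    cleaned = []
    for name in names:
        start, end = re.search(ALPHABET * id_length, name).span()
        cleaned.append(name[start:end])
    return cleaned


names = [
    "221229_Test_Run_230123_1_Spl:AAAGCAAAGGCATTGGx",
    "221229_Test_Run_230123_5_Blood:TTTGTCATCTGGCGTGx",
]

buggy = slice_with_first_span(names, id_length=16)
fixed = slice_per_name(names, id_length=16)
print(buggy)  # ['AAAGCAAAGGCATTGG', 'd:TTTGTCATCTGGCG']
print(fixed)  # ['AAAGCAAAGGCATTGG', 'TTTGTCATCTGGCGTG']
```

The second prefix is two characters longer than the first, so the span (29, 45) taken from the first name starts two characters too early in the second name.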

tingxie2020 · 2023-04-19T18:10:34Z

Dear scvelo team, thank you for your great work!
I have the same issue here.

The obs_names of my scanpy-merged anndata object don't have the same length, e.g. xx1_AAACCCATCGCCGTGA-1 and xxxxxxxx3_TTTGTTGAGTAGTCTC-1. After running scv.utils.clean_obs_names(adata, id_length=16), the former barcode was correctly cleaned to AAACCCATCGCCGTGA whereas the latter one turned into xxxx3_TTTGTTGAGT.

The obs_names of my concatenated velocyto loom files don't have the same length either. (I read in the loom files with scv.read_loom and then concatenate the anndata objects after calling .var_names_make_unique.) Before concatenation the obs_names looked like sample_alignments_xxxxx:AAGATAGGTCGCTTGGx; after concatenation they turned into AAGATAGGTCGC.

Could you let me know if you have any suggestions for these?
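One possible workaround until the loop is changed in the library is to extract the barcodes outside of scvelo. The sketch below is only an illustration under assumptions: `split_obs_name` is a hypothetical helper, `[ACGT]` is an assumed alphabet, and the remainder is built by cutting out the matched span only (the loop quoted in this issue instead uses `str.replace`, which removes every occurrence of the barcode). It also raises a readable error when a name contains no barcode, rather than failing on `None.span()`.

```python
import re


def split_obs_name(obs_name, id_length=16, alphabet="[ACGT]"):
    """Return (barcode, remainder) for one name, searching that name itself.

    Hypothetical helper; `alphabet` is an assumed character class.
    """
    match = re.search(alphabet * id_length, obs_name)
    if match is None:
        raise ValueError(f"No {id_length}-base barcode found in {obs_name!r}")
    start, end = match.span()
    # Keep everything outside the matched span as the remainder (prefix + suffix).
    return obs_name[start:end], obs_name[:start] + obs_name[end:]


names = ["xx1_AAACCCATCGCCGTGA-1", "xxxxxxxx3_TTTGTTGAGTAGTCTC-1"]
pairs = [split_obs_name(name) for name in names]
barcodes = [barcode for barcode, _ in pairs]
remainders = [remainder for _, remainder in pairs]
print(barcodes)    # ['AAACCCATCGCCGTGA', 'TTTGTTGAGTAGTCTC']
print(remainders)  # ['xx1_-1', 'xxxxxxxx3_-1']
```

If the resulting barcodes are written back to the obs_names, it is worth checking first that they are still unique, since cells from different samples can share a barcode once the sample prefix is removed.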

PhilippBanza · 2024-02-01T10:29:37Z

Thank you for the solution. I had the issue that, after running scv.utils.clean_obs_names(adata, id_length=16), the barcode was split in some of my samples. Your "correction" solved the issue!
