Possible bug in clean_obs_names function #1029

Open
cs-tum opened this issue Mar 28, 2023 · 2 comments
@cs-tum
cs-tum commented Mar 28, 2023

Dear scvelo team, thank you for your great work!

I ran into a problem where the clean_obs_names() function from scvelo/core/_anndata.py did not clean up all obs_names correctly (see below for an example).

I believe the for loop of that function might not actually work on the looped obs_name items but only on the first item adata.obs_names[0], possibly by mistake:

# Original code
for obs_name in adata.obs_names:
  start, end = re.search(alphabet * id_length, adata.obs_names[0]).span()
  new_obs_names.append(obs_name[start:end])
  prefixes.append(obs_name.replace(obs_name[start:end], ""))

I was able to fix this by replacing lines 63-66 with the following code:

# Modified code
for obs_name in adata.obs_names:
  # FIXED BY REPLACING adata.obs_names[0] with obs_name
  start, end = re.search(alphabet * id_length, obs_name).span()
  new_obs_names.append(obs_name[start:end])
  prefixes.append(obs_name.replace(obs_name[start:end], ""))

Can you confirm the issue or am I misinterpreting the usage?

Example: I had a merged anndata object where the obs_names did not have the same length, e.g. 221229_Test_Run_230123_1_Spl:AAAGCAAAGGCATTGGx and 221229_Test_Run_230123_5_Blood:TTTGTCATCTGGCGTGx. After running scv.utils.clean_obs_names(adata, id_length=16), the former barcode was correctly cleaned to AAAGCAAAGGCATTGG whereas the latter one turned into d:TTTGTCATCTGGCG.
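
The behaviour in the example above can be reproduced without scvelo or anndata. The following is a minimal sketch using only `re`: the helper names `extract_buggy` and `extract_fixed` are hypothetical, and the `ALPHABET` value is an assumption about the function's default character class, not something confirmed in this thread.

```python
import re

# Assumption: character class used to match barcode bases (not confirmed here).
ALPHABET = "[AGTCBDHKMNRSVWY]"
ID_LENGTH = 16


def extract_buggy(obs_names, alphabet=ALPHABET, id_length=ID_LENGTH):
    """Mimic the reported loop: the span is always taken from obs_names[0]."""
    cleaned = []
    for obs_name in obs_names:
        start, end = re.search(alphabet * id_length, obs_names[0]).span()
        cleaned.append(obs_name[start:end])
    return cleaned


def extract_fixed(obs_names, alphabet=ALPHABET, id_length=ID_LENGTH):
    """Proposed change: compute the span from each obs_name individually."""
    cleaned = []
    for obs_name in obs_names:
        start, end = re.search(alphabet * id_length, obs_name).span()
        cleaned.append(obs_name[start:end])
    return cleaned


obs_names = [
    "221229_Test_Run_230123_1_Spl:AAAGCAAAGGCATTGGx",
    "221229_Test_Run_230123_5_Blood:TTTGTCATCTGGCGTGx",
]

buggy = extract_buggy(obs_names)
fixed = extract_fixed(obs_names)
print(buggy)  # second entry is shifted because the prefix is two characters longer
print(fixed)  # both entries are the full 16-base barcode
```

Because the second prefix is two characters longer than the first, reusing the first name's span cuts the second barcode two positions too early, which matches the d:TTTGTCATCTGGCG result reported above.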

@tingxie2020

Dear scvelo team, thank you for your great work!
I have the same issue here.

The obs_names of my scanpy-merged anndata object don't have the same length, e.g. xx1_AAACCCATCGCCGTGA-1 and xxxxxxxx3_TTTGTTGAGTAGTCTC-1. After running scv.utils.clean_obs_names(adata, id_length=16), the former barcode was correctly cleaned to AAACCCATCGCCGTGA, whereas the latter turned into xxxx3_TTTGTTGAGT.

The obs_names of my concatenated velocyto loom files don't have the same length either. (I read in the loom files with scv.read loom and then concatenate the anndata objects after .var_names_make_unique.) Before concatenation, the obs_names looked like sample_alignments_xxxxx:AAGATAGGTCGCTTGGx; after concatenation, they turned into AAGATAGGTCGC.

Could you let me know if you have any suggestions for these?
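
One possible interim workaround, sketched here under assumptions rather than taken from scvelo itself, is to split each name into prefix and barcode on its own and to fail loudly when no barcode is found. The helper `split_obs_name` is hypothetical, and the default `alphabet` is assumed rather than confirmed.

```python
import re


def split_obs_name(obs_name, alphabet="[AGTCBDHKMNRSVWY]", id_length=16):
    """Return (prefix, barcode) for one name; hypothetical helper, not scvelo API.

    The alphabet default is an assumption about the bases to match.
    """
    match = re.search(alphabet * id_length, obs_name)
    if match is None:
        # re.search returns None when no run of id_length bases exists
        raise ValueError(f"No barcode of length {id_length} in {obs_name!r}")
    barcode = match.group(0)
    # Everything outside the matched span is kept as the prefix
    prefix = obs_name[: match.start()] + obs_name[match.end():]
    return prefix, barcode


names = ["xx1_AAACCCATCGCCGTGA-1", "xxxxxxxx3_TTTGTTGAGTAGTCTC-1"]
pairs = [split_obs_name(name) for name in names]
for prefix, barcode in pairs:
    print(prefix, barcode)
```

If the barcodes are then assigned back to the obs_names, it is worth checking for duplicates first, since the same barcode can occur in several samples once the prefix is dropped.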

@PhilippBanza

Thank you for the solution. I had the same issue: after running scv.utils.clean_obs_names(adata, id_length=16), the barcode was split in some of my samples. Your "correction" solved the issue!
