simple-event-annotation.R: doesn't add SVLEN header #561

savalipa · 2022-02-08T19:34:22Z

Hello,

I am having trouble with the simple-event-annotation.R script. I have already tried to run the script for two different batches with SV results .vcf file generated with GRIDSS. One batch had one sample only, and the another had 4 samples. Nevertheless, I got this same error message for both batches, where "provided 68 variables to replace 67 variables" error is repeated:

Error in .subassign_columns(x, nsbs, value) : provided 68 variables to replace 67 variables Calls: [<- ... [<- -> mergeROWS -> mergeROWS -> .subassign_columns In addition: Warning message: info fields with no header: SVLEN Execution halted

I do not know how to start solving this. Is it due to the info field without header? Or something more complicated?

Best regards,
Salla

The text was updated successfully, but these errors were encountered:

d-cameron · 2022-02-09T09:48:40Z

I do not know how to start solving this. Is it due to the info field without header?

GRIDSS no longer writes the SVLEN header. The workaround is to add the SVLEN header before running the simple annotation script.

chunlinxiao · 2022-02-18T15:05:50Z

I got same error - after adding SVLEN, the error was gone.

One question related to this script, which generated one annotated.vcf file (turning those BND to respective simple SV types without filtering), and the other file was a simple.bed file, which presumably a good SV calls in bed format with additional filtering - what was really implemented for this filtering? should we just use this simple.bed file as a confident SV call set by gridss?

Also, I noticed that in annotated.vcf, for a given DEL, DUP, or INV, it still contains two rows (one for each BND), but I saw all the INS events, also have two rows - so two BNDs were used for defining an INS event, or something else?

savalipa · 2022-02-22T15:47:12Z

Thank you for your suggestions. Could you help me with providing your exact solutions (tool and/or script to run?)

chunlinxiao · 2022-02-24T14:40:36Z

what I did for the vcf, added a line to the header:

##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV_length">

then the error disappeared.

HuangYihang1222 · 2024-06-15T09:31:45Z

Can someone help me see what's the problem with my R script?

# 定义需要安装的包列表
packages <- c("BiocManager", "VariantAnnotation", "stringr", "StructuralVariantAnnotation")

# 检查包是否已安装，如果没有则安装
installed_packages <- rownames(installed.packages())
to_install <- packages[!packages %in% installed_packages]
if (length(to_install) > 0) {
  BiocManager::install(to_install)  # 使用BiocManager安装bioconductor包
  install.packages("stringr")  # stringr不是bioconductor包，使用install.packages安装
}

# 加载所需的包
library(BiocManager)
library(VariantAnnotation)
library(stringr)
library(StructuralVariantAnnotation)

# 定义简单的SV类型分类器
simpleEventType <- function(gr) {
  pgr <- partner(gr)
  return(ifelse(seqnames(gr) != seqnames(pgr), "CTX", # inter-chromosomal
                ifelse(strand(gr) == strand(pgr), "INV",
                       ifelse(gr$insLen >= abs(gr$svLen) * 0.7, "INS", # TODO: improve classification of complex events
                              ifelse(xor(start(gr) < start(pgr), strand(gr) == "-"), "DEL",
                                     "DUP")))))
}

# 定义处理VCF文件的函数
process_vcf <- function(vcf_path, output_dir, ref_path) {
  # 读取VCF文件
  vcf <- readVcf(vcf_path, ref_path)
  
  # 创建新的INFO字段
  new_info_fields <- data.frame(
    row.names = c("SIMPLE_TYPE", "SVLEN"),
    Number = c("1", "1"),
    Type = c("String", "Integer"),
    Description = c("Simple event type annotation based purely on breakend position and orientation.",
                    "SV length.")
  )
  
  # 将新的INFO字段与现有的INFO字段合并
  existing_info <- as.data.frame(info(header(vcf)))
  info_fields <- unique(as(rbind(existing_info, new_info_fields), "DataFrame"))
  
  # 更新VCF头部的INFO字段
  info(header(vcf)) <- info_fields

  # 计算SV类型并更新VCF对象
  gr <- breakpointRanges(vcf)
  svtype <- simpleEventType(gr)
  
  # 添加SIMPLE_TYPE信息到VCF的variants中
  info(vcf)$SIMPLE_TYPE <- NA_character_
  info(vcf[gr$sourceId])$SIMPLE_TYPE <- svtype
  info(vcf[gr$sourceId])$SVLEN <- gr$svLen
  
  # 写入VCF文件
  output_vcf_filename <- file.path(output_dir, paste0(basename(output_dir), "_annotated.vcf"))
  writeVcf(vcf, output_vcf_filename)
  
  # 过滤和生成BED文件
  gr <- gr[gr$FILTER == "PASS" & partner(gr)$FILTER == "PASS"]
  simplegr <- gr[svtype %in% c("INS", "INV", "DEL", "DUP")]
  simplebed <- data.frame(
    chrom = seqnames(simplegr),
    # call the centre of the homology/inexact interval
    start = as.integer((start(simplegr) + end(simplegr)) / 2),
    end = as.integer((start(partner(simplegr)) + end(partner(simplegr))) / 2),
    name = simpleEventType(simplegr),
    score = simplegr$QUAL,
    strand = "."
  )
  
  simplebed <- simplebed[simplebed$start < simplebed$end,]
  
  output_bed_filename <- file.path(output_dir, paste0(basename(output_dir), "_simple.bed"))
  write.table(simplebed, output_bed_filename, quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
}

# 获取所有ERR*文件夹的路径
base_dir <- "/home/huangyh/EU_ragweed/SV_results/gridss"
err_dirs <- list.dirs(path = base_dir, full.names = FALSE, recursive = FALSE)
err_dirs <- err_dirs[grepl("^ERR", err_dirs)]  # 过滤出所有以"ERR"开头的目录

# 获取参考基因组的路径
ref_path <- "/home/huangyh/raw_data/ref_seq/Ragweed.genome.fa"

# 遍历每个ERR*文件夹中的VCF文件
for (err_dir in err_dirs) {
  vcf_files <- list.files(path = err_dir, pattern = "\\.vcf$", full.names = TRUE)
  
  for (vcf_file in vcf_files) {
    # 获取VCF文件所在的文件夹路径
    output_dir <- dirname(vcf_file)
    
    # 处理VCF文件并生成输出
    process_vcf(vcf_path = vcf_file, output_dir = output_dir, ref_path = ref_path)
  }
}

Then I got this:Error in .subassign_columns(x, nsbs, value) :
provided 68 variables to replace 67 variables
Calls: process_vcf ... [<- -> mergeROWS -> mergeROWS -> .subassign_columns
此外: Warning message:
In .breakpointRanges(x, ...) :
Removing 1 unpaired breakend variants. Use inferMissingBreakends=TRUE to recover with inferred partner breakends. Missing breakends: gridss30bb_2291h
停止执行

d-cameron changed the title ~~simple-event-annotation.R: Error in .subassign_columns(x, nsbs, value)~~ simple-event-annotation.R: doesn't add SVLEN header Feb 9, 2022

HuangYihang1222 mentioned this issue Jun 7, 2024

error with simple-event-annotation.R script #656

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

simple-event-annotation.R: doesn't add SVLEN header #561

simple-event-annotation.R: doesn't add SVLEN header #561

savalipa commented Feb 8, 2022

d-cameron commented Feb 9, 2022

chunlinxiao commented Feb 18, 2022 •

edited

Loading

savalipa commented Feb 22, 2022

chunlinxiao commented Feb 24, 2022

HuangYihang1222 commented Jun 15, 2024

simple-event-annotation.R: doesn't add SVLEN header #561

simple-event-annotation.R: doesn't add SVLEN header #561

Comments

savalipa commented Feb 8, 2022

d-cameron commented Feb 9, 2022

chunlinxiao commented Feb 18, 2022 • edited Loading

savalipa commented Feb 22, 2022

chunlinxiao commented Feb 24, 2022

HuangYihang1222 commented Jun 15, 2024

chunlinxiao commented Feb 18, 2022 •

edited

Loading