
features reference other features that do not exist in the file #4468

Open
jasongallant opened this issue Jun 21, 2024 · 10 comments · Fixed by #4465

@jasongallant

I'm writing in about an odd behavior: I'm using @jbrowse/react-linear-genome-view in a web app I'm working on, loading GFF files hosted on S3.

When I load the feature track, I get the error "features reference other features that do not exist in the file".

Here's the stack trace:

Error: some features reference other features that do not exist in the file (or in the same '###' scope).

    at Parser._emitAllUnderConstructionFeatures (/projectpath/node_modules/@gmod/gff/src/parse.ts:227:1)
    at Parser.finish (/projectpath/node_modules/@gmod/gff/src/parse.ts:165:1)
    at Object.parseStringSync (/projectpath/node_modules/@gmod/gff/src/api.ts:498:1)
    at Gff3TabixAdapter.getFeaturesHelper (/projectpath/node_modules/@jbrowse/plugin-gff3/esm/Gff3TabixAdapter/Gff3TabixAdapter.js:85:1)
    at async (/projectpath/node_modules/@jbrowse/plugin-gff3/esm/Gff3TabixAdapter/Gff3TabixAdapter.js:39:1)

I spent a long time examining the GFF file for parent/child inconsistencies, but could find nothing.

  • Interestingly, when I zoom in to any region of the GFF, the models load with no problem.
  • When I load the exact same file from the same S3 path in JBrowse Desktop, I have no issue.

I have verified this happens using both Chrome and Safari. Not sure what else to try.
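For anyone checking a file for the parent/child problems described above, here is a minimal sketch of a consistency check. It was written for this thread and is not part of JBrowse or @gmod/gff: findDanglingReferences is a hypothetical name, it assumes a plain (already decompressed) GFF3 string, it treats "###" lines as scope boundaries as the error message implies, and it only looks at Parent attributes.

```javascript
// Hypothetical helper (not part of JBrowse or @gmod/gff): report Parent
// references that no ID= attribute in the same "###" scope defines.
function findDanglingReferences(gff3Text) {
  const problems = [];
  let ids = new Set(); // IDs defined in the current scope
  let refs = []; // { line, id } Parent references seen in the current scope

  // Check the current scope's references, then start a fresh scope
  const closeScope = () => {
    for (const { line, id } of refs) {
      if (!ids.has(id)) problems.push({ line, id });
    }
    ids = new Set();
    refs = [];
  };

  gff3Text.split(/\r?\n/).forEach((raw, index) => {
    const lineNumber = index + 1;
    if (raw.trim() === "###") {
      // References are only resolved within a "###" scope
      closeScope();
      return;
    }
    if (raw.trim() === "" || raw.startsWith("#")) return; // blank, comment, directive
    const columns = raw.split("\t");
    if (columns.length < 9) return; // not a feature line (e.g. a FASTA section)
    for (const attribute of columns[8].split(";")) {
      const eq = attribute.indexOf("=");
      if (eq === -1) continue;
      const key = attribute.slice(0, eq).trim();
      const values = attribute.slice(eq + 1).split(",").map((v) => v.trim());
      if (key === "ID") values.forEach((v) => ids.add(v));
      if (key === "Parent") {
        values.forEach((v) => refs.push({ line: lineNumber, id: v }));
      }
    }
  });
  closeScope(); // the end of the file closes the last scope
  return problems;
}
```

An empty array means every Parent in the file is defined within its scope. Since a tabix adapter parses only the lines returned for the requested region, the error also depends on which lines come back for a given query, not only on the file's contents.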

@cmdcolin
Collaborator

If you are able to, can you send the GFF file? Off the top of my head there are a couple of reasons why this could happen, but it would help to see it.

You can send it to [email protected]

@jasongallant
Author

Sent it along just now! Thanks for having a look.

@cmdcolin
Collaborator

cmdcolin commented Jun 21, 2024

Thanks for sending it. I believe that if you update your config to include this specific dontRedispatch line, it should fix the issue you are seeing:

{
  "type": "FeatureTrack",
  "trackId": "genes",
  "name": "genes",
  "adapter": {
    "type": "Gff3TabixAdapter",
    "dontRedispatch": ["contig", "region", "chromosome"],
    "gffGzLocation": {
      "uri": "yourfile.gff.gz",
      "locationType": "UriLocation"
    },
    "index": {
      "location": {
        "uri": "yourfile.gff.gz.tbi",
        "locationType": "UriLocation"
      },
      "indexType": "TBI"
    }
  },
  "assemblyNames": ["yourasm"]
}

The important line is "dontRedispatch", specifically the inclusion of "contig" in the list.

For context on what this line means: with GFF3 tabix, when we request a genomic region of the file, e.g. chr1:1500-1600, the response is (pseudo-GFF):

chr1 gene 1000 2000 ID=mygene
chr1 exon 1550 1580 Parent=mygene
chr1 exon 1590 1650 Parent=mygene

Only the features overlapping that specific coordinate slice (chr1:1500-1600) are returned, but the gene may have other parts (e.g. more exons) outside that range. So we "redispatch", i.e. make another request against the tabix file, widened to the span of the largest feature in the returned results (here chr1:1000-2000), which then returns the full feature. This is a heuristic, though, and the dontRedispatch setting tells the system not to redispatch for feature types that commonly cover the entire chromosome and never have child features, such as contig, chromosome, or region.

This is just a tricky aspect of GFF3 tabix, but I hope this helps! I proposed a PR to add contig to the default dontRedispatch set in #4465, but applying the config above will fix it in your current version :)
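To make the redispatch step more concrete, here is a simplified sketch of the idea. It is not the Gff3TabixAdapter source: redispatchRegion and the feature shape { refName, type, start, end } are hypothetical, and it widens the query to cover every returned feature whose type is not listed in dontRedispatch, which for the example above is the gene's span chr1:1000-2000.

```javascript
// Simplified sketch of the redispatch idea (not the actual Gff3TabixAdapter
// code): given the query region and the features tabix returned for it,
// compute a widened region to re-query, or null if no re-query is needed.
function redispatchRegion(query, features, dontRedispatch) {
  const skipTypes = new Set(dontRedispatch);
  let start = query.start;
  let end = query.end;
  for (const feature of features) {
    // Whole-sequence features such as contigs would stretch the request to the
    // entire sequence, so their types are ignored
    if (feature.refName !== query.refName || skipTypes.has(feature.type)) {
      continue;
    }
    start = Math.min(start, feature.start);
    end = Math.max(end, feature.end);
  }
  if (start === query.start && end === query.end) {
    return null; // every returned feature already fits inside the query
  }
  return { refName: query.refName, start, end };
}
```

In this sketch, a contig feature spanning a whole scaffold only widens the second request to the entire scaffold when "contig" is missing from the dontRedispatch list.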

@jasongallant
Author

Hi Colin,

Thanks for the quick attention to this, and for the detailed explanation. I did wonder if this was what was going on. I tried implementing your suggestion here:

return {
  type: "FeatureTrack",
  trackId: annotation.Description, // Description doubles as the trackId, so it must be unique per track
  name: annotation.Description, // Use the name from the annotation
  assemblyNames: [assembly.ShortName], // Assuming you want to use the assembly's short name
  category: ["Annotation"], // Static category for all
  adapter: {
    type: "Gff3TabixAdapter",
    dontRedispatch: ["contig", "region", "chromosome"], //<-- this is the important line, specifically incorporating "contig" into the list
    gffGzLocation: {
      uri: annotationURL,
      locationType: "UriLocation",
    },
    index: {
      location: {
        uri: indexURL,
        locationType: "UriLocation",
      },
    },
  },
};

But I am still getting the same issue on this and other assemblies. It seems to happen at the zoomed-out levels (many genes in view). For instance, in the files I sent you, fully zoomed out on scaffold_3, the track initially shows "Zoom in to see features or force load", but when I click "force load" I get the same problem as before.

@cmdcolin
Collaborator

cmdcolin commented Jun 21, 2024

Can you confirm that the dontRedispatch setting is active by opening the "About track" dialog and checking that it is listed?

For example, it is listed here: [screenshot of the About track dialog]

If it is not listed, the track may be using the default, which doesn't include the "contig" entry.
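Alongside the About track check, a quick sanity check can be run on the track config objects before they are handed to createViewState. This is a hypothetical helper (tracksMissingDontRedispatch is not a JBrowse API), and it only inspects the plain config you pass in, not the live view state.

```javascript
// Hypothetical helper (not a JBrowse API): return the trackIds of
// Gff3TabixAdapter tracks whose dontRedispatch list lacks any required type.
function tracksMissingDontRedispatch(tracks, requiredTypes = ["contig"]) {
  return tracks
    .filter((track) => track.adapter && track.adapter.type === "Gff3TabixAdapter")
    .filter((track) => {
      const configured = Array.isArray(track.adapter.dontRedispatch)
        ? track.adapter.dontRedispatch
        : []; // not set: the adapter falls back to its default list
      return requiredTypes.some((type) => !configured.includes(type));
    })
    .map((track) => track.trackId);
}
```

For example, logging tracksMissingDontRedispatch(tracks) right before createViewState should print an empty array if every Gff3TabixAdapter track includes "contig".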

I'm not able to reproduce it when zoomed out, but it's not out of the question that you are still seeing it, and I have a possible explanation. It's alluded to in the PR: our GFF parser has a notion of a buffer size that should probably just be removed. If that turns out to be the cause, it will require a new release (which I can keep you posted on :)!


@cmdcolin
Collaborator

Now I am a bit mystified... it would be quite weird if the GFF parser bufferSize were actually the issue in this case, because parseStringSync, which our Gff3TabixAdapter uses, sets bufferSize to Infinity, so there's no way it would be too small (https://github.com/GMOD/gff-js/blob/18002e87a1d10990c463a4ee924901e9fc77e9e1/src/api.ts#L488). It is used by the adapter like this:

const features = gff.parseStringSync(gff3, {
  // ...parser options (elided in this excerpt)
})

Do you know what version of @jbrowse/react-linear-genome-view you are using? You could check by running yarn why @jbrowse/react-linear-genome-view, or by clicking the icon in the top right of the app.

@jasongallant
Author

Hi @cmdcolin, I can confirm that I'm using JBrowse v2.12.2 for @jbrowse/react-linear-genome-view, and that config.adapter.dontRedispatch is indeed set to contig, region and chromosome. [Screenshot 2024-06-21 at 4:00:25 PM]

@cmdcolin
Collaborator

Very interesting... I don't have a clue yet, but I'll keep brainstorming. It's odd that I can't reproduce it (I even tried the URLs you posted directly, in case something was weird with them).

@jasongallant
Author

jasongallant commented Jun 21, 2024

I should add that I'm creating the config in React, and I'm wondering whether there might be an issue with that. Here's the full code:

import React, { useState, useEffect } from "react";
import { Box } from "@mui/material";
import { get } from "aws-amplify/api";
import "@fontsource/roboto";
import { getUrl } from "aws-amplify/storage";
import {
  createViewState,
  JBrowseLinearGenomeView,
} from "@jbrowse/react-linear-genome-view";

// Resolve an S3 key to a presigned URL, or null on failure
async function fetchAssemblyFile(key) {
  try {
    console.log("Fetching URL for:", key);
    const urlResponse = await getUrl({
      path: key,
      options: { validateObjectExistence: true },
    });
    // Validate the response before dereferencing urlResponse.url
    if (!urlResponse || !urlResponse.url) {
      console.error("Failed to get a valid URL for:", key);
      return null;
    }
    console.log("URL Response:", urlResponse.url.href);
    return urlResponse.url;
  } catch (error) {
    console.error("Error fetching URL for:", key, error);
    return null;
  }
}

const GenomeBrowserComponent = ({ assembly }) => {
  const [viewState, setViewState] = useState(null);

  console.log(assembly.CompS3Path);

  useEffect(() => {
    const fetchPresignedUrls = async () => {
      try {
        // Fetching the assembly data
        const assemblyData = await fetchAssemblyFile(assembly.CompS3Path);
        const assemblyIndex = await fetchAssemblyFile(
          assembly.AssembyIndexPath
        );
        const assemblyGZI = await fetchAssemblyFile(assembly.GZIPath);

        if (assemblyData) {
          console.log(assembly);
          // Create tracks dynamically based on the annotations array
          const trackPromises = assembly.annotations.items.map(
            async (annotation) => {
              const annotationURL = await fetchAssemblyFile(
                annotation.CompS3Path
              );
              const indexURL = await fetchAssemblyFile(annotation.IndexPath);

              return {
                type: "FeatureTrack",
                trackId: annotation.Description, // Description doubles as the trackId, so it must be unique per track
                name: annotation.Description, // Use the name from the annotation
                assemblyNames: [assembly.ShortName], // Assuming you want to use the assembly's short name
                category: ["Annotation"], // Static category for all
                adapter: {
                  type: "Gff3TabixAdapter",
                  dontRedispatch: ["contig", "region", "chromosome"], //<-- this is the important line, specifically incorporating "contig" into the list
                  gffGzLocation: {
                    uri: annotationURL,
                    locationType: "UriLocation",
                  },
                  index: {
                    location: {
                      uri: indexURL,
                      locationType: "UriLocation",
                    },
                  },
                },
              };
            }
          );

          const tracks = await Promise.all(trackPromises);

          const state = createViewState({
            assembly: {
              name: assembly.ShortName,
              sequence: {
                type: "ReferenceSequenceTrack",
                trackId: assembly.ShortName + "-ReferenceSequenceTrack",
                adapter: {
                  type: "BgzipFastaAdapter",
                  fastaLocation: { uri: assemblyData },
                  faiLocation: { uri: assemblyIndex },
                  gziLocation: { uri: assemblyGZI },
                },
              },
            },
            tracks, // Adding the dynamically created tracks array
          });
          setViewState(state);
        } else {
          throw new Error("Invalid URL data received");
        }
      } catch (error) {
        console.error("Error in fetchPresignedUrls:", error);
      }
    };

    fetchPresignedUrls(); // Don't forget to call the function
  }, [assembly]); // Re-run whenever the assembly prop changes

  if (!viewState) {
    return <div>Loading genome data...</div>;
  }

  return (
    <Box>
      <Box sx={{ height: "10px" }} />
      <JBrowseLinearGenomeView viewState={viewState} />
    </Box>
  );
};
export default GenomeBrowserComponent;

cmdcolin reopened this Jun 25, 2024
cmdcolin transferred this issue from GMOD/jbrowse Jun 25, 2024
@cmdcolin
Collaborator

cmdcolin commented Jun 25, 2024

I didn't mean to auto-close this issue; I just merged the dontRedispatch change (#4465).

I'm still brainstorming, but I don't have many concrete ideas:

  • Browser cache issue: clear the browser cache to rule this out.
  • Server-side cache issue: this can be especially relevant if you use AWS CloudFront, though I don't think you are in this case. The server side may return stale results, which could, for example, cause a mismatch between the .tbi and the .gff.gz. Renaming the files on the server could rule this out.
  • Some other server-side issue: for example, the previous issue you posted was quite odd to me, and I couldn't fully understand the resulting invalid gzip behavior. There could be additional add-on processing happening: Loading Annotation Track From S3, incorrect gzip header check #4431 (comment)
  • Something related to the S3 signed URLs, though I can't really imagine what that would be. You mentioned it works on Desktop, so it seems unlikely, and I couldn't reproduce it locally either when pointing at the signed URLs directly.
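For the server-side cache point above (a possible mismatch between the .tbi and the .gff.gz), one rough check is to compare the Last-Modified headers of the two files: an index older than its data file suggests it was not regenerated after the data changed. This is a hypothetical diagnostic, not a JBrowse feature. It uses a one-byte Range GET rather than HEAD, on the assumption that S3 presigned URLs are signed for a specific HTTP method (so a URL presigned for GET would be rejected for HEAD), and it assumes the bucket's CORS rules allow Range requests, which JBrowse's own range requests also rely on.

```javascript
// Hypothetical diagnostic (not part of JBrowse): returns true if the index is
// older than the data file, false if not, and null if either header is
// missing or unparseable.
function indexOlderThanData(dataLastModified, indexLastModified) {
  const data = Date.parse(dataLastModified ?? "");
  const index = Date.parse(indexLastModified ?? "");
  if (Number.isNaN(data) || Number.isNaN(index)) return null;
  return index < data;
}

// Fetch only the first byte of a file and read its Last-Modified header
async function lastModified(url) {
  const res = await fetch(url, {
    headers: { Range: "bytes=0-0" },
    cache: "no-store", // bypass the browser cache for this check
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.headers.get("last-modified");
}

// Compare a .gff.gz and its .tbi index, e.g. using the presigned URLs
async function checkIndexFreshness(gffGzUrl, tbiUrl) {
  const [dataLM, indexLM] = await Promise.all([
    lastModified(gffGzUrl),
    lastModified(tbiUrl),
  ]);
  return indexOlderThanData(dataLM, indexLM);
}
```

Last-Modified only reflects upload time, so this cannot catch every mismatch (for example, an index built from a different file but uploaded later).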

As for the code you posted above, it seems fine. I know it's not super productive, but if you want to join office hours, we might be able to live debug :) https://jbrowse.org/jb2/contact/
