Generation of large matrices and data frames takes forever #34

Open
cjfields opened this issue Feb 14, 2022 · 2 comments

@cjfields
Contributor

cjfields commented Feb 14, 2022

Seeing some issues when working with very large data sets (~1000 samples or more, over 1M ASVs): the simple text outputs take a long time to write. This happens primarily when prepping for QIIME2:

# Generate OTU table for QIIME2 import (rows = ASVs, cols = samples)

or generating a new seq table with the modified IDs:

# Generate OTU table output (rows = samples, cols = ASV)

One workaround is to generate only the default outputs (seq tables and tax tables for phyloseq) and let the other outputs time out, but this will require splitting out those steps, which currently live in GenerateSeqTables.R and GenerateTaxTables.R.

@cjfields
Contributor Author

The main culprit is really the seq table and the number of samples. In a current run we have a matrix of 960 sample IDs x 1.3M ASVs (with counts). The tax table with 1.3M ASVs and seven ranks (KPCOFGS) is relatively fast.

@cjfields
Contributor Author

There is a bit of redundancy in GenerateSeqTables.R that should also be addressed, namely that seqtab_final.txt and seqtab_final.simple.txt are the same file; this likely stems from some code rework that we did when renaming ASVs. We can wait to address this until the split_denoise branch lands.
