Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demultiplexed FASTQ support #23

Merged
merged 16 commits into from
Sep 6, 2023
Merged

Demultiplexed FASTQ support #23

merged 16 commits into from
Sep 6, 2023

Conversation

mtomko
Copy link
Collaborator

@mtomko mtomko commented Sep 5, 2023

No description provided.

This is a minor bugfix
For demultiplexed FASTQs, there is no column barcode policy
@mtomko mtomko self-assigned this Sep 5, 2023
@mtomko mtomko force-pushed the demultiplexed-support branch 3 times, most recently from 8738b9f to 46cc0ae Compare September 5, 2023 16:06
@mtomko mtomko force-pushed the demultiplexed-support branch 2 times, most recently from f404cdf to ddc22d9 Compare September 5, 2023 17:17
@mtomko mtomko marked this pull request as ready for review September 5, 2023 17:32
@@ -181,3 +181,19 @@ test-multiple-inputs:
diff lognormalized-counts.txt ../../test-data/lognormalized-counts.txt && \
diff barcode-counts.txt ../../test-data/barcode-counts.txt && \
diff correlation.txt ../../test-data/correlation.txt)

test-demultiplexed: wd = $(test-output-dir)/demultiplexed
test-demultiplexed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length limit for a unix command line is 4096 chars, which is starting to look achievable with this new demuxed input mode. I guess that is all the more reason to switch to a JSON config file--was that just for pq-launcher or for pq as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't planning to implement it in PoolQ in this PR. I think that is how PoolQ4 will work. In our case, the poolq launcher constructs the config object directly from its command line, but I'm planning to change the poolq launcher command line to take only 2 parameters: --db and --job-id, which it will use to read everything it needs from the database, obviating the need for command lines that might exceed the limit.

Copy link
Member

@tmgreen tmgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I have one performance-related comment/question, and then 🧼 🧇 🦕

@tmgreen
Copy link
Member

tmgreen commented Sep 6, 2023

PQ4 might be a good time to make the switch from Array[Char] to Array[Byte] for all sequence stuff... With COMPACT_STRINGS under the hood it might be a lot faster.

@mtomko
Copy link
Collaborator Author

mtomko commented Sep 6, 2023

I don't know about COMPACT_STRINGS. I had just been agonizing over the decision to between String and Array[Char] in PQ4 and had arrived at the conclusion that using Array[Char] was going to be a huge hassle due to how Scala/Java treat arrays (reference equality, no hash codes - to gain those, I'd basically have to re-implement String, I think). I may be able traffick in Array[Byte] and convert to String when it comes time to look up things in the Reference though.

@mtomko
Copy link
Collaborator Author

mtomko commented Sep 6, 2023

Another PQ4 thought, I'm currently getting FASTQ records as Array[String] (4 elements in the array, 1 per line). It would be interesting to read inte Array[Array[Byte]] but I'd have to implement a lot of the low level processing on my own then. It might be a worthwhile exercise but it's definitely dipping into I/O stuff that I don't have a lot of experience with.

@mtomko mtomko changed the title Demultiplexed-support Demultiplexed FASTQ support Sep 6, 2023
@mtomko mtomko merged commit 3492522 into main Sep 6, 2023
1 check passed
@mtomko mtomko deleted the demultiplexed-support branch September 6, 2023 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants