Demultiplexed FASTQ support #23

Merged (16 commits) on Sep 6, 2023
18 changes: 17 additions & 1 deletion test-data/Makefile
@@ -163,7 +163,7 @@ test-paired-end:
--row-barcode-policy PREFIX:CACCG@11 \
--rev-row-barcode-policy PREFIX:CGGTG@11 \
--col-barcode-policy FIXED@0)
(cd $(wd); diff counts.txt ../../test-data/paired-end/expected-counts.txt)
(cd $(wd); diff counts.txt ../../test-data/paired-end/expected-counts.txt)

test-multiple-inputs: wd = $(test-output-dir)/multiple-inputs
test-multiple-inputs:
@@ -181,3 +181,19 @@ test-multiple-inputs:
diff lognormalized-counts.txt ../../test-data/lognormalized-counts.txt && \
diff barcode-counts.txt ../../test-data/barcode-counts.txt && \
diff correlation.txt ../../test-data/correlation.txt)

test-demultiplexed: wd = $(test-output-dir)/demultiplexed
test-demultiplexed:
Member


The length limit for a Unix command line is 4096 chars, which is starting to look reachable with this new demuxed input mode. I guess that is all the more reason to switch to a JSON config file. Was that planned just for pq-launcher, or for pq as well?

Collaborator Author


I wasn't planning to implement it in PoolQ in this PR; I think that is how PoolQ4 will work. In our case, the poolq launcher constructs the config object directly from its command line. I'm planning to change the launcher's command line to take only two parameters, --db and --job-id, which it will use to read everything it needs from the database. That obviates the need for command lines that might exceed the limit.
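
For context on the length concern: the `--reads` value in this test is a comma-separated list of `BARCODE:path` pairs, so it grows linearly with the number of demultiplexed files. Below is a minimal shell sketch that builds such a list so its length can be measured. It assumes files are named `<BARCODE>.construct.fastq`, as in this PR's test data; `build_reads_arg` is a hypothetical helper and is not part of PoolQ or this Makefile.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated BARCODE:PATH list from every
# <BARCODE>.construct.fastq file in the directory given as $1.
build_reads_arg() {
    dir=$1
    out=
    for f in "$dir"/*.construct.fastq; do
        # When nothing matches, the glob stays unexpanded; skip it.
        [ -e "$f" ] || continue
        barcode=$(basename "$f" .construct.fastq)
        # Prepend a comma only when the list already has an entry.
        out="${out:+$out,}$barcode:$f"
    done
    printf '%s\n' "$out"
}
```

For example, `reads=$(build_reads_arg ../../test-data/demultiplexed); echo "${#reads}"` prints the argument length in characters. `getconf ARG_MAX` reports the system's combined limit for arguments and environment, which varies by platform.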

@rm -rf $(wd)
@mkdir -p $(wd)
(cd $(wd); java $(jvm-args) -jar $(poolq-jar) \
--compat \
--col-reference $(conditions) \
--row-reference $(reference) \
--demultiplexed \
--reads TTGAACCG:../../test-data/demultiplexed/TTGAACCG.construct.fastq,CCGAGTTA:../../test-data/demultiplexed/CCGAGTTA.construct.fastq,TTGAGTAT:../../test-data/demultiplexed/TTGAGTAT.construct.fastq,CCTCCAAT:../../test-data/demultiplexed/CCTCCAAT.construct.fastq,GGTCACCG:../../test-data/demultiplexed/GGTCACCG.construct.fastq,TTGACAAT:../../test-data/demultiplexed/TTGACAAT.construct.fastq,AATCCAAT:../../test-data/demultiplexed/AATCCAAT.construct.fastq,TTCTCATA:../../test-data/demultiplexed/TTCTCATA.construct.fastq,AATCCACG:../../test-data/demultiplexed/AATCCACG.construct.fastq,AATCGTGC:../../test-data/demultiplexed/AATCGTGC.construct.fastq,AAGAACTA:../../test-data/demultiplexed/AAGAACTA.construct.fastq,CCAGTGAT:../../test-data/demultiplexed/CCAGTGAT.construct.fastq,GGTCGTGC:../../test-data/demultiplexed/GGTCGTGC.construct.fastq,TTAGACCG:../../test-data/demultiplexed/TTAGACCG.construct.fastq,GGTCCACG:../../test-data/demultiplexed/GGTCCACG.construct.fastq,CCGAACTA:../../test-data/demultiplexed/CCGAACTA.construct.fastq,AACTCACG:../../test-data/demultiplexed/AACTCACG.construct.fastq,AATCACTA:../../test-data/demultiplexed/AATCACTA.construct.fastq,GGTCCATA:../../test-data/demultiplexed/GGTCCATA.construct.fastq,GGTCTGCG:../../test-data/demultiplexed/GGTCTGCG.construct.fastq,CCAGTGGC:../../test-data/demultiplexed/CCAGTGGC.construct.fastq,AACTTGCG:../../test-data/demultiplexed/AACTTGCG.construct.fastq \
--row-barcode-policy PREFIX:CACCG@18)
(cd $(wd); diff counts.txt ../../test-data/expected-counts.txt && \
diff lognormalized-counts.txt ../../test-data/lognormalized-counts.txt && \
diff barcode-counts.txt ../../test-data/barcode-counts.txt && \
diff correlation.txt ../../test-data/correlation.txt)
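
The demultiplexed test reads added in this PR all share a common leading sequence followed by `CACCG`, the prefix named in this target's `--row-barcode-policy`. Below is a quick sketch for checking where that prefix sits in each read. It assumes four-line FASTQ records (no wrapped sequence lines), as in these test files. `prefix_offsets` is a hypothetical helper, and the sketch makes no claim about how PoolQ itself interprets the `@18` in the policy.

```shell
#!/bin/sh
# Hypothetical helper: for each read in the FASTQ file $1, print the 0-based
# offset of the first occurrence of the sequence $2, or -1 if it is absent.
# Assumes each record is exactly four lines, so sequences are on lines 2, 6, 10, ...
prefix_offsets() {
    awk -v p="$2" 'NR % 4 == 2 { print index($0, p) - 1 }' "$1"
}
```

Piping the output through `sort -n | uniq -c` summarizes the offsets across a file. For the first read of `AACTCACG.construct.fastq`, the offset is 24.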
168 changes: 168 additions & 0 deletions test-data/demultiplexed/AACTCACG.construct.fastq
@@ -0,0 +1,168 @@
@HWUSI-EAS100R:6:23:398:3989#1
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCATAATACTAGGTGACAGA
+
=87E45/08.93:497@487-@91/-6;2.:;?475:24887;/7/8;5
@HWUSI-EAS100R:6:23:398:3989#15
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCCAGGTCACTAAGTATATT
+
<=/0=2/9)?3=4?9=4-;99?50;.79979::B209=35:7745428<
@HWUSI-EAS100R:6:23:398:3989#30
TATGTAAAGGGTAAAAGTGCAGTGCACCGCGGAAAGGAATCCACATCAT
+
:54514?7582884?73E;<;:22=;-08627<67:<::292:=656;8
@HWUSI-EAS100R:6:23:398:3989#63
TATGTAAAGGGTAAAAGTGCAGTGCACCGTACAATATGCTACCTCCAAA
+
176682282<7983:62?981+1-02940788/49<767/9756605.:
@HWUSI-EAS100R:6:23:398:3989#84
TATGTAAAGGGTAAAAGTGCAGTGCACCGTACAATATGCTACCTCCAAA
+
:;;=:5519/=;158+728=35:9-93884@56;14:48?:37>6D.8:
@HWUSI-EAS100R:6:23:398:3989#145
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCTTTGTACCACCAGCATGT
+
32;C661;4<3;C527431?7;?4)=;68:7:554:3:E;5;0=529+1
@HWUSI-EAS100R:6:23:398:3989#148
TATGTAAAGGGTAAAAGTGCAGTGCACCGTTGGACTACATAACCTGTAA
+
@:7@;A0913-015378B2;2*;.A1765;D:9=4972:5677=,6;,7
@HWUSI-EAS100R:6:23:398:3989#153
TATGTAAAGGGTAAAAGTGCAGTGCACCGGCAGTCCAGACATCAATTCA
+
:=49/445:2957=;752;;51<;6A9;797/:>32;<:3C7)025220
@HWUSI-EAS100R:6:23:398:3989#159
TATGTAAAGGGTAAAAGTGCAGTGCACCGGAAGATGCTGTCGTTGAAAT
+
/:28744;84879.7:49?75;74:=28<<569:6?;<;559<81;59:
@HWUSI-EAS100R:6:23:398:3989#203
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCCAGGTCACTAAGTATATT
+
+;:270:52=464487;8==;662828854449D4+56<6/87749>>;
@HWUSI-EAS100R:6:23:398:3989#224
TATGTAAAGGGTAAAAGTGCAGTGCACCGTGTCTGTCCCTGTAGTATAT
+
4/98:,29<3188896=7846;565>6>:850775:<=45:237/99:<
@HWUSI-EAS100R:6:23:398:3989#345
TATGTAAAGGGTAAAAGTGCAGTGCACCGGGCCAAATTAATGACATATT
+
45;1.872567A608<40723548=95676>>889>62<-A54396229
@HWUSI-EAS100R:6:23:398:3989#348
TATGTAAAGGGTAAAAGTGCAGTGCACCGGCTGGAGGAGAGAGCACCAA
+
3958;685342;547:<98;44027:3562>058=3795=4/;3:2;84
@HWUSI-EAS100R:6:23:398:3989#387
TATGTAAAGGGTAAAAGTGCAGTGCACCGTACAATATGCTACCTCCAAA
+
?43@71?7=:4<6=31472=971/4174688<<:?:6<4>2896;=4>2
@HWUSI-EAS100R:6:23:398:3989#390
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCTCCGTTCTGATACTCACA
+
;;>>877;8;<6B7=<44;6:==::55;6173?6055838954@?93;8
@HWUSI-EAS100R:6:23:398:3989#392
TATGTAAAGGGTAAAAGTGCAGTGCACCGCTATATCAACAGCACCGTGA
+
;406?>968<:@6553:>;13596=7;:=;0056/0616769@>.567@
@HWUSI-EAS100R:6:23:398:3989#412
TATGTAAAGGGTAAAAGTGCAGTGCACCGGGCCAAATTAATGACATATT
+
88:71+9<9+75959:3@2;/9*38,5A:82489667=675*98884:3
@HWUSI-EAS100R:6:23:398:3989#494
TATGTAAAGGGTAAAAGTGCAGTGCACCGATGTTTAGATGTGGATCTTT
+
4956;1,6;915684>317=4;90=6;?<.0B;4=/7204;8=44,8=@
@HWUSI-EAS100R:6:23:398:3989#548
TATGTAAAGGGTAAAAGTGCAGTGCACCGTGTGGACCGTCGTCATCATT
+
;90399;716@7@363<3887B*9:;36?3=643975;:=97;/93688
@HWUSI-EAS100R:6:23:398:3989#549
TATGTAAAGGGTAAAAGTGCAGTGCACCGCAAGACGCGCATCATTTACT
+
6999::5:;686<0:=:35;12::<@;8;<24<:660579878C87538
@HWUSI-EAS100R:6:23:398:3989#550
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCGGCAGACTTGTGGCTACA
+
96>951>2:0=6>99=392:6962<7>4617A566730/8940690:-C
@HWUSI-EAS100R:6:23:398:3989#580
TATGTAAAGGGTAAAAGTGCAGTGCACCGGTCTGATTGAAGCCTACTAA
+
629@23?933:/=;7.<@795>89:385/3;6781-56:86;7:7:67;
@HWUSI-EAS100R:6:23:398:3989#585
TATGTAAAGGGTAAAAGTGCAGTGCACCGCACTGGCTTCATGTGAACTT
+
5?76.69+193>/14A2;254<08<:7:;74>38@@76958942:40;7
@HWUSI-EAS100R:6:23:398:3989#664
TATGTAAAGGGTAAAAGTGCAGTGCACCGACAACCAGGTCAGACGCAGA
+
1<2<<,362<651457;15?=3:9772:0A88@56046>4A.04/7278
@HWUSI-EAS100R:6:23:398:3989#679
TATGTAAAGGGTAAAAGTGCAGTGCACCGGAAGATGCTGTCGTTGAAAT
+
7:C25848>6=5425=<:/84;/>271>16/91A770855.43328</.
@HWUSI-EAS100R:6:23:398:3989#696
TATGTAAAGGGTAAAAGTGCAGTGCACCGCTATGAAGCTCTCACGGTTA
+
?556331:&36;53>885:@74185C?2>6?3-==:0;6;;2/?39289
@HWUSI-EAS100R:6:23:398:3989#700
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCTGCTATAACCTCACTTTA
+
33>5643797655307>::A482<+-60?9A63;43>;>18;:<8<147
@HWUSI-EAS100R:6:23:398:3989#705
TATGTAAAGGGTAAAAGTGCAGTGCACCGCGGAGATCCATCTCCTTTGT
+
=@957>846:4;;42:?=318==/;779<:7?;/5C444;6;:;606:5
@HWUSI-EAS100R:6:23:398:3989#710
TATGTAAAGGGTAAAAGTGCAGTGCACCGGCTAAATTGAAACCTGGAAT
+
77=54D2967=;=1855;693563/6.D215<886.88367/864:178
@HWUSI-EAS100R:6:23:398:3989#722
TATGTAAAGGGTAAAAGTGCAGTGCACCGCACTACCAAATTGATACAAA
+
605@.;950:<B918.;8810>8)66>4?6;8/026/973:7687478/
@HWUSI-EAS100R:6:23:398:3989#772
TATGTAAAGGGTAAAAGTGCAGTGCACCGCAACCTGGAGACGCAGCACA
+
<0;C64A5=0;976<6/:;:3634:4<:757?84178+:?84;?:<9<9
@HWUSI-EAS100R:6:23:398:3989#785
TATGTAAAGGGTAAAAGTGCAGTGCACCGCCTAATCATCTCGATCTGGT
+
:,<.9490:83<80<5<6<3:;27277<28.41:2:01<6867@,=522
@HWUSI-EAS100R:6:23:398:3989#792
TATGTAAAGGGTAAAAGTGCAGTGCACCGGCGCAACATGATGAGAAGTT
+
<98A98:66<;>4>894;;+<67-/0-65/<43-548720;37889>.:
@HWUSI-EAS100R:6:23:398:3989#812
TATGTAAAGGGTAAAAGTGCAGTGCACCGATGTTTAGATGTGGATCTTT
+
9,;17:4@-=9;/>,709=63=;9.<?G78<+9152897681=;:7637
@HWUSI-EAS100R:6:23:398:3989#816
TATGTAAAGGGTAAAAGTGCAGTGCACCGGGCCAAATTAATGACATATT
+
<369,7<@<984.;;682B9<<7546:;<:4451:/743;62?19653;
@HWUSI-EAS100R:6:23:398:3989#846
TATGTAAAGGGTAAAAGTGCAGTGCACCGCGAGTGAGGAATTTGTTCAA
+
445<7>26=9><7><4,?<973789168<463866;8734089.695/9
@HWUSI-EAS100R:6:23:398:3989#860
TATGTAAAGGGTAAAAGTGCAGTGCACCGCGTGTACTTTGGATCTGGGT
+
7398-;6678;1B69=7I448;335;=A=:2=0?7;687;6103862:.
@HWUSI-EAS100R:6:23:398:3989#905
TATGTAAAGGGTAAAAGTGCAGTGCACCGTCGCTATGCATGAACTGTTA
+
?:58;0B7=62:>=36985795;5?;63?:<4?A398*16.?*6-98,9
@HWUSI-EAS100R:6:23:398:3989#906
TATGTAAAGGGTAAAAGTGCAGTGCACCGCTAAATGTGGACAAAGTAAT
+
B2<74;;6,<728=24:38873367:10<C7;;6:638:B;56:59;7;
@HWUSI-EAS100R:6:23:398:3989#950
TATGTAAAGGGTAAAAGTGCAGTGCACCGCAACCTGGAGACGCAGCACA
+
>7:46=757=560<A63524-6597D9/97C9>'3:=@?4089><9932
@HWUSI-EAS100R:6:23:398:3989#951
TATGTAAAGGGTAAAAGTGCAGTGCACCGGTCTGCACCTTGCAGTCTTA
+
1=?92824-3<5-71-3B9864=;3:53CC>6636>99@B77;7774=;
@HWUSI-EAS100R:6:23:398:3989#963
TATGTAAAGGGTAAAAGTGCAGTGCACCGGTCTGATTGAAGCCTACTAA
+
<50<4<5<49>>:534)88<9::=:6.:,7495403766778.575>;8