Allow non-owned Sequence types #19
I can think of 3 main ways to handle this...
While my default temptation was to follow Rust tradition for Chesterton's fence reasons, the more I think about it, the more appealing a generic wrapper around anything that derefs to a nucleotide slice sounds for composability reasons; I'd love to be able to use other kinds of backing storage. If all

Also, wrapper types and extension traits aren't mutually exclusive. We could have extension traits do the heavy lifting and have methods of the same name on wrapper types that mostly defer to the trait implementation (and possibly rewrap the output, if we care about e.g. DNA sequence windows being DNA sequences).
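A minimal sketch of that combination, for discussion only: an extension trait on `[Nucleotide]` does the work, and a generic wrapper over any storage that derefs to `[Nucleotide]` exposes same-named methods that defer to it and rewrap the result. `Nucleotide`, `NucleotideSliceExt`, `DnaSeq` and `gc_count` are hypothetical stand-ins, not quickdna's actual types.

```rust
use std::ops::Deref;

// Stand-in nucleotide type for this sketch (not quickdna's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nucleotide {
    A,
    C,
    G,
    T,
}

impl Nucleotide {
    pub fn complement(self) -> Nucleotide {
        match self {
            Nucleotide::A => Nucleotide::T,
            Nucleotide::T => Nucleotide::A,
            Nucleotide::C => Nucleotide::G,
            Nucleotide::G => Nucleotide::C,
        }
    }
}

// Hypothetical extension trait: the real logic lives on plain slices.
pub trait NucleotideSliceExt {
    fn gc_count(&self) -> usize;
    fn reverse_complement(&self) -> Vec<Nucleotide>;
}

impl NucleotideSliceExt for [Nucleotide] {
    fn gc_count(&self) -> usize {
        self.iter()
            .filter(|n| matches!(n, Nucleotide::G | Nucleotide::C))
            .count()
    }

    fn reverse_complement(&self) -> Vec<Nucleotide> {
        self.iter().rev().map(|n| n.complement()).collect()
    }
}

// Hypothetical wrapper over any storage that derefs to a nucleotide slice
// (Vec<Nucleotide>, &[Nucleotide], Box<[Nucleotide]>, ...).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DnaSeq<S>(pub S);

impl<S: Deref<Target = [Nucleotide]>> DnaSeq<S> {
    // Same name as the trait method; simply defers to it.
    pub fn gc_count(&self) -> usize {
        self.0.gc_count()
    }

    // Defers to the trait, then rewraps so the result is still a DNA sequence.
    pub fn reverse_complement(&self) -> DnaSeq<Vec<Nucleotide>> {
        DnaSeq(self.0.reverse_complement())
    }

    // Windows are borrowed DnaSeqs, so nothing is copied.
    pub fn windows(&self, k: usize) -> impl Iterator<Item = DnaSeq<&[Nucleotide]>> + '_ {
        self.0.windows(k).map(DnaSeq)
    }
}

fn main() {
    use Nucleotide::*;
    let owned = DnaSeq(vec![A, T, G, C]);
    let borrowed = DnaSeq(&owned.0[1..]);
    println!("{} {}", owned.gc_count(), borrowed.gc_count());
    println!("{:?}", owned.reverse_complement());
    println!("{}", owned.windows(2).count());
}
```

Note that arrays don't implement `Deref`, so `DnaSeq<[Nucleotide; N]>` wouldn't satisfy this bound; an `AsRef<[Nucleotide]>` bound would cover arrays as well.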
I'm not sure what the two-types solution would look like. Are you suggesting something like an unsized

I think extension traits could be nice. It would be cool if people could bring their own storage, like
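For comparison, one reading of the two-types solution is the `str`/`String` (or `Path`/`PathBuf`) pattern: an unsized `#[repr(transparent)]` wrapper around `[Nucleotide]` plus an owned type that derefs to it. This sketch uses hypothetical `DnaSlice`/`DnaBuf` names and a stand-in `Nucleotide` enum; the pointer cast mirrors how the standard library builds a `&Path` from a `&OsStr`.

```rust
use std::ops::Deref;

// Stand-in nucleotide type for this sketch (not quickdna's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nucleotide {
    A,
    C,
    G,
    T,
}

// Hypothetical borrowed, unsized type: plays the role of `str` / `Path`.
#[repr(transparent)]
pub struct DnaSlice([Nucleotide]);

impl DnaSlice {
    pub fn new(nucleotides: &[Nucleotide]) -> &DnaSlice {
        // SAFETY: `DnaSlice` is `#[repr(transparent)]` over `[Nucleotide]`, so both
        // have the same layout and the same pointer metadata (the length).
        unsafe { &*(nucleotides as *const [Nucleotide] as *const DnaSlice) }
    }

    pub fn as_slice(&self) -> &[Nucleotide] {
        &self.0
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn to_dna_buf(&self) -> DnaBuf {
        DnaBuf(self.0.to_vec())
    }
}

// Hypothetical owned type: plays the role of `String` / `PathBuf`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DnaBuf(Vec<Nucleotide>);

impl From<Vec<Nucleotide>> for DnaBuf {
    fn from(v: Vec<Nucleotide>) -> Self {
        DnaBuf(v)
    }
}

// `&DnaBuf` coerces to `&DnaSlice`, just like `&String` to `&str`.
impl Deref for DnaBuf {
    type Target = DnaSlice;
    fn deref(&self) -> &DnaSlice {
        DnaSlice::new(&self.0)
    }
}

fn main() {
    use Nucleotide::*;
    let buf = DnaBuf::from(vec![A, C, G, T]);
    let whole: &DnaSlice = &buf; // deref coercion
    let raw = [G, T];
    let part = DnaSlice::new(&raw); // borrowed storage, no copy
    println!("{} {}", whole.len(), part.len());
}
```

The cost is a small `unsafe` block per borrowed type, and methods are written once against `DnaSlice` rather than being picked up generically by arbitrary storage types.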
I've pushed a prototype branch, and I'd love some feedback on whether this seems like a potentially acceptable direction to head in. First, some quick example code:

```rust
use std::str::FromStr; // needed to call DnaSequence::from_str
use smallvec::SmallVec;
use quickdna::Nucleotides; // extension trait
// Assumed crate-root exports on the prototype branch:
use quickdna::{AminoAcid, DnaSequence, Nucleotide, TranslationTable};

fn main() {
    // This approach should be backwards compatible
    let dna: DnaSequence<Nucleotide> = DnaSequence::from_str("ATCGAATTCCGG").unwrap();

    // IMHO, translation tables really feel like functions mapping codons to amino acids.
    // I'd be sorely tempted to also make these accessible as ordinary functions.
    let ncbi1 = TranslationTable::Ncbi1.to_fn();

    // Everything is lazy, so (almost) no resources have been used yet. Note that this
    // is an iter of Option<AminoAcid>, in case the translation table is missing something.
    let protein = dna.codons().map(ncbi1);

    // Because everything is just an iterator, we can choose whatever storage we like
    // and rely on built-in Rust code for hoisting the Option out during collection.
    let protein: Option<SmallVec<[AminoAcid; 10]>> = protein.collect();
    let protein = protein.expect("Unable to translate to amino acids");

    // Again, this supports Vecs, SmallVecs, arrays or anything else that coerces to a slice.
    use Nucleotide::*;
    let dna = [A, T, C, G];

    // Like before, this is lazy; it has consumed almost no resources.
    let frames = dna.reverse_complement().frames();

    // Unfortunately, because frames is a SmallVec of iterators, we have to clone() to extract from it.
    let v: Vec<_> = frames[1].clone().collect();
    assert_eq!(v, [G, A, T]);
}
```

The approach I took definitely isn't perfect; sadly, I wasn't able to figure out a way to make the extension trait work for all nucleotide iterators while retaining the convenience of only needing to import a single trait. But perhaps something vaguely like this might be something to work towards? What do other people think? (Any ideas for improvements or other approaches?)
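The "hoisting the `Option` out during collection" step relies on the standard library's `FromIterator` impl for `Option`: collecting an iterator of `Option<T>` into `Option<C>` produces `None` as soon as any item is `None`. Below is a self-contained sketch with a deliberately incomplete, hypothetical translation function (`tiny_table`) and stand-in `Nucleotide`/`AminoAcid` enums. It collects into a `Vec` to avoid the `smallvec` dependency, but any `FromIterator` collection, `SmallVec` included, behaves the same way.

```rust
// Stand-in types for this sketch (not quickdna's definitions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nucleotide {
    A,
    C,
    G,
    T,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AminoAcid {
    Met,
    Trp,
    Stop,
}

// Hypothetical, deliberately incomplete table: a plain function from a codon
// to Option<AminoAcid>, returning None for codons it doesn't know.
fn tiny_table(codon: &[Nucleotide]) -> Option<AminoAcid> {
    use Nucleotide::*;
    match codon {
        [A, T, G] => Some(AminoAcid::Met),
        [T, G, G] => Some(AminoAcid::Trp),
        [T, A, A] | [T, A, G] | [T, G, A] => Some(AminoAcid::Stop),
        _ => None,
    }
}

fn translate(dna: &[Nucleotide]) -> Option<Vec<AminoAcid>> {
    // chunks_exact(3) lazily yields whole codons and skips a trailing partial one.
    // Collecting an iterator of Option<T> into Option<Vec<T>> stops at the first None.
    dna.chunks_exact(3).map(tiny_table).collect()
}

fn main() {
    use Nucleotide::*;
    println!("{:?}", translate(&[A, T, G, T, G, G, T, A, A]));
    println!("{:?}", translate(&[A, T, G, C, C, C]));
}
```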
It hit me that, in order to save other people effort, I should probably describe the false start I went on before reaching the above: I wanted to make an extension trait for anything that implements something like
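For the iterator half of the problem, one common pattern (not necessarily what the prototype does) is a second extension trait with a blanket impl for every `Iterator<Item = Nucleotide>`; the cost is that callers import two traits instead of one. A sketch with hypothetical names (`NucleotideIterExt`, `complemented`, `reverse_complemented`) and a stand-in `Nucleotide` enum:

```rust
use std::iter::{Map, Rev};

// Stand-in nucleotide type for this sketch (not quickdna's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nucleotide {
    A,
    C,
    G,
    T,
}

impl Nucleotide {
    pub fn complement(self) -> Nucleotide {
        match self {
            Nucleotide::A => Nucleotide::T,
            Nucleotide::T => Nucleotide::A,
            Nucleotide::C => Nucleotide::G,
            Nucleotide::G => Nucleotide::C,
        }
    }
}

// Named fn-pointer type so the return types below can be spelled out.
pub type ComplementFn = fn(Nucleotide) -> Nucleotide;

// Hypothetical iterator extension trait; the blanket impl below makes it
// available on every iterator of nucleotides, lazily and without allocating.
pub trait NucleotideIterExt: Iterator<Item = Nucleotide> + Sized {
    fn complemented(self) -> Map<Self, ComplementFn> {
        self.map(Nucleotide::complement as ComplementFn)
    }

    fn reverse_complemented(self) -> Map<Rev<Self>, ComplementFn>
    where
        Self: DoubleEndedIterator,
    {
        self.rev().map(Nucleotide::complement as ComplementFn)
    }
}

impl<I: Iterator<Item = Nucleotide>> NucleotideIterExt for I {}

fn main() {
    use Nucleotide::*;
    let rc: Vec<Nucleotide> = vec![A, T, C, G].into_iter().reverse_complemented().collect();
    println!("{:?}", rc);
}
```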
I've also pushed another version of this prototype to `swooster/dna-prototype-general`. In exchange for not supporting custom types that implement
I like this API! In #22, @vgel commented that one of my changes towards Rust-y trait-y

Anyway, that's why I think it's a good idea to benchmark quickdna at this point and get a better idea of the impact of these changes. (Also, I should note that maybe we don't need to worry about the code becoming, e.g., 10% slower if it's still 90x faster than Biopython and maybe not a bottleneck for our purposes.)
@lynn: Completely agreed regarding the tradeoff of slightly suboptimal performance in exchange for making it convenient to avoid allocations/copies. Case in point, I've just submitted the above API as PR #24, which includes a benchmark comparing window generation the way synthclient does it against reducing allocations/copies via iterator/reference-based windowing; at least on my laptop, the latter is about 18x faster for a 100 kbp sequence.
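To make the comparison concrete, here is a sketch of the two windowing styles, assuming the allocating style copies each window into its own `Vec`; the function names are hypothetical and `Nucleotide` is a stand-in. The borrowed version just returns the standard `slice::windows` iterator, so each window is a view into the original sequence.

```rust
// Stand-in nucleotide type for this sketch (not quickdna's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Nucleotide {
    A,
    C,
    G,
    T,
}

// Hypothetical allocating style: every window is copied into its own Vec.
fn windows_owned(seq: &[Nucleotide], k: usize) -> Vec<Vec<Nucleotide>> {
    seq.windows(k).map(|w| w.to_vec()).collect()
}

// Hypothetical borrowing style: every window is a view into `seq`;
// nothing is copied and nothing is computed until the iterator is driven.
fn windows_borrowed(seq: &[Nucleotide], k: usize) -> impl Iterator<Item = &[Nucleotide]> {
    seq.windows(k)
}

fn main() {
    use Nucleotide::*;
    let seq = [A, C, G, T, A];
    let owned = windows_owned(&seq, 3);
    let borrowed: Vec<&[Nucleotide]> = windows_borrowed(&seq, 3).collect();
    println!("{:?}", owned);
    println!("{:?}", borrowed);
}
```

`slice::windows` panics if `k` is 0, so real code would want to validate the window size first.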
Currently `DnaSequence` owns a `Vec<Nucleotide>`. But we'd also like to be able to have a `DnaSequence` backed by a borrowed `&[Nucleotide]`. Same for `ProteinSequence`.

Thought from @vgel: use generics, so `DnaSequence<Vec<Nucleotide>>` vs `DnaSequence<&[Nucleotide]>`, and then maybe nice aliases for these. Maybe even make `DnaSequence<[Nucleotide; N]>` work. This is what `SmallVec` does.

Alternatively, follow the Rust tradition of two typenames for borrowed vs. owned, like `str` vs `String` and `Path` vs `PathBuf`.
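A sketch of the generics option, with hypothetical aliases (`OwnedDna`, `DnaRef`, `DnaArray`) and a stand-in `Nucleotide` enum; it is not quickdna's current definition. Bounding on `AsRef<[Nucleotide]>` rather than `Deref` is what lets the array case work, since `[T; N]` implements `AsRef<[T]>` but not `Deref`.

```rust
// Stand-in nucleotide type for this sketch (not quickdna's definition).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nucleotide {
    A,
    C,
    G,
    T,
}

// One generic type; the storage parameter decides owned vs. borrowed vs. inline.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DnaSequence<S> {
    storage: S,
}

// Hypothetical convenience aliases.
pub type OwnedDna = DnaSequence<Vec<Nucleotide>>;
pub type DnaRef<'a> = DnaSequence<&'a [Nucleotide]>;
pub type DnaArray<const N: usize> = DnaSequence<[Nucleotide; N]>;

impl<S: AsRef<[Nucleotide]>> DnaSequence<S> {
    pub fn new(storage: S) -> Self {
        DnaSequence { storage }
    }

    pub fn as_slice(&self) -> &[Nucleotide] {
        self.storage.as_ref()
    }

    pub fn len(&self) -> usize {
        self.as_slice().len()
    }

    // Cheap borrowed view, whatever the backing storage is.
    pub fn as_dna_ref(&self) -> DnaRef<'_> {
        DnaSequence::new(self.as_slice())
    }

    // Copies into owned storage.
    pub fn to_owned_dna(&self) -> OwnedDna {
        DnaSequence::new(self.as_slice().to_vec())
    }
}

fn main() {
    use Nucleotide::*;
    let owned: OwnedDna = DnaSequence::new(vec![A, C, G, T]);
    let inline: DnaArray<3> = DnaSequence::new([G, G, C]);
    let view: DnaRef<'_> = owned.as_dna_ref();
    println!("{} {} {}", owned.len(), inline.len(), view.len());
}
```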