We all started with our ABCs and somehow we ended up falling in love with AGTCs. But when we step into NGS… OMG. Let’s dive into the genetic dictionary that is next-generation sequencing (NGS) terminology.
Most of you already know the acronyms DNA and RNA. But, in the Next Generation Sequencing world, there is a whole lot more. When you are starting to think about your next sequencing experiment, people may ask if you are doing WES, WGS, or target-seq?
WES is Whole Exome Sequencing, WGS is Whole Genome Sequencing, and target-seq is Targeted Sequencing. There are also a whole bunch of other sequencing terms out there, like RNA-Seq, ChIP-Seq, and ChIA-PET.
Yes… I said it, ChIA-PET. But not like the ones you used to grow on your kitchen counter.
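Other acronyms relate to variant and mutation analysis. For example, your friend may ask, “Are you worried about HPs near your SNPs, SNVs, Indels, or CNVs in your sample?” What???? To translate: HPs are homopolymers (runs of the same base), SNPs are single nucleotide polymorphisms, SNVs are single nucleotide variants, indels are insertions and deletions, and CNVs are copy number variants.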
After you perform your sequencing run, there may be questions about how your sequencing results turned out. People may ask how the AQ was, what the MRL was, or, for Ion Torrent technology, how many ISPs you got.
If you need to know, AQ refers to Alignment Quality: in other words, how well the sequence of your sample aligned with the reference sequence. MRL stands for Mean Read Length. In NGS you generate millions of reads, or sequenced fragments of DNA, and the reads vary in length, so we report their average, or mean, length. And finally, Ion Torrent users are also concerned with ISPs, or Ion Sphere Particles. The more ISPs you have, the more reads can be sequenced.
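If you like to see the arithmetic, here is a minimal Python sketch of Mean Read Length. The three reads are made up for illustration, and `mean_read_length` is a hypothetical helper name, not something a sequencer or its software provides.

```python
# Hypothetical reads, invented for illustration only.
reads = ["ACGTACGT", "ACGT", "ACGTACGTACGT"]


def mean_read_length(reads):
    """Return the average read length in bases."""
    if not reads:
        raise ValueError("need at least one read")
    # Add up the length of every read, then divide by the number of reads.
    return sum(len(read) for read in reads) / len(reads)


print(mean_read_length(reads))
```

The reads here are 8, 4, and 12 bases long, so the mean works out to 8 bases. A real run does the same thing, just over millions of reads.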
If these terms make your head spin, wait until you hear the terms used by your friendly bioinformatics scientist. When you talk to them, they will ask whether you have a BAM or uBAM file, or whether you got a FASTQ file. Then they want to know if you need a FASTA and VCF output file. Whatever happened to a simple spreadsheet of my results?
Allow me to try to explain this language. A BAM file is a Binary Alignment/Map file: a binary file of sequencing reads aligned to a reference. A uBAM is a binary file in the same format with unaligned/unmapped reads. A FASTQ file is a list of reads with a quality score for each base of each read, while a FASTA file is simply a file with sequence information. Finally, a VCF file is a Variant Call Format file that describes each variant of interest and its location. Whew… OK, got it?
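To make the text formats a little more concrete, here is a rough Python sketch that reads two FASTQ records, decodes their quality characters, writes the same reads out FASTA-style, and pulls the location out of one VCF line. Everything in it is an assumption for illustration: the read names, sequences, and variant are invented, the helper names are hypothetical, the FASTQ is assumed to use plain four-line records, and the qualities are assumed to be Phred+33 encoded.

```python
# Made-up example data; read names, sequences and the variant are invented.
FASTQ_TEXT = """@read1
ACGTTGCA
+
IIIIHHHH
@read2
GGCA
+
II5+
"""
VCF_LINE = "chr1\t12345\t.\tA\tG\t50\tPASS\t."


def parse_fastq(text):
    """Yield (name, sequence, quality) from simple four-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        name = lines[i][1:]      # header line, minus the leading '@'
        sequence = lines[i + 1]  # the bases
        quality = lines[i + 3]   # one quality character per base
        yield name, sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into numeric scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def to_fasta(records):
    """Keep names and sequences only; FASTA has no quality scores."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


def parse_vcf_line(line):
    """Pull the first five fixed VCF columns out of one data line."""
    chrom, pos, _var_id, ref, alt = line.split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


records = list(parse_fastq(FASTQ_TEXT))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))
print(to_fasta(records), end="")
print(parse_vcf_line(VCF_LINE))
```

The takeaway is the same as in the glossary: FASTQ is sequence plus per-base quality, FASTA is sequence only, and VCF is one variant per line with its position. BAM and uBAM are binary, so they need a dedicated tool or library to read and are left out of this sketch.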
There you have it: the basic glossary of acronyms and terms we commonly use when we talk NGS. If you have more questions on NGS, submit them at thermofisher.com/ask and subscribe to our channel to see more videos like this.
And remember, when in doubt, just Seq It Out
uBAM = UNALIGNED Binary ALIGNED/Mapped file… where is the logic behind this format?
Why not generate FASTQ files? They hold the same information as a uBAM but are smaller, can be compressed, use a standard format, and can be processed by the vast majority of NGS software…
Dear Thomas,
Our Ion Torrent sequencer generates only BAM files as output. If the run was associated with a reference, the BAM file will be a mapped BAM file, in which you can find the mapping location of each read. If no reference genome was assigned, the BAM will be an unaligned BAM, which has no mapping locations.
We use BAM as our output file because we can store the flow space information in it. The flow space information is critical for Torrent Variant Caller (TVC) to call SNVs accurately. This is why we recommend that our users call SNVs with BAM + TMAP (our aligner) + TVC.
If users want a FASTQ file, they can easily generate one by running the FileExporter plugin.
I hope this helps.