data:ont, 2024/06/10 11:03 (current), Richard Bowers
==== Pod5 Files ====
  
Pod5 is a proprietary format developed by Oxford Nanopore. It can be considered an intermediate format, but it can also be reprocessed in a way that the Illumina intermediate files cannot. We therefore always deliver the Pod5 files as produced by the sequencer alongside the BAM or FASTQ data.
  
===== File Naming =====
  
-The files will be named using the same pattern as files from [[data:illumina|Illumina sequencing]]:+The ONT data files have an additional component to them, here referred to as ''<hash>''. This is an arbitrary hexadecimal number generated by the sequencer and put into the run identifier. For example, in ''20240521_1826_1C_PAW28166_38d23152'' the //hash// is ''38d23152''. This is required because an ONT run can be stopped and restarted with the same pool and flowcell, and in such a case the files produced would overwrite one another on the FTP site and when downloaded with //clarity-tools.jar//
 + 
 +In all other respects, the files will be named using the same pattern as files from [[data:illumina|Illumina sequencing]]:
  
<code>
<SLX>.<barcode>.<flowcell>.<hash>.s_<lane>.r_<read>.bam
<SLX>.<barcode>.<flowcell>.<hash>.s_<lane>.md5sums.txt
<SLX>.<flowcell>.<hash>.s_<lane>.lostreads.bam
<SLX>.<flowcell>.<hash>.s_<lane>.lostreads.md5sums.txt
</code>
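As an illustration, the naming pattern above can be picked apart with a short script. This is a sketch only: the example filename is hypothetical, and it assumes SLX identifiers look like ''SLX-12345''.

```python
import re

# Pattern for demultiplexed BAM files:
#   <SLX>.<barcode>.<flowcell>.<hash>.s_<lane>.r_<read>.bam
# The "SLX-\d+" form of the SLX identifier is an assumption for this example.
PATTERN = re.compile(
    r"(?P<slx>SLX-\d+)\."
    r"(?P<barcode>[^.]+)\."
    r"(?P<flowcell>[^.]+)\."
    r"(?P<hash>[0-9a-f]+)\."
    r"s_(?P<lane>\d+)\."
    r"r_(?P<read>\d+)\.bam"
)

def parse_ont_filename(name):
    """Return the name components as a dict, or None if the name doesn't match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

# Hypothetical example filename using the hash from the run identifier above
parts = parse_ont_filename("SLX-12345.barcode01.PAW28166.38d23152.s_1.r_1.bam")
print(parts)
```

The ''hash'' group is the component that distinguishes restarted runs on the same flowcell.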
  
  
<code>
<SLX>.NoIndex.<flowcell>.<hash>.s_<lane>.r_<read>.bam
<SLX>.NoIndex.<flowcell>.<hash>.md5sums.txt
</code>
The Pod5 files are delivered in a TAR file. The structure inside it mirrors the directory structure of the run's //pod5// directory.

<code>
<SLX>.<flowcell>.<hash>.s_<lane>.pod5.tar
<SLX>.<flowcell>.<hash>.s_<lane>.pod5.tar.md5sums.txt
</code>
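To see what a delivered TAR contains before unpacking it, the standard library's ''tarfile'' module is enough. A minimal sketch, with a hypothetical delivery filename:

```python
import tarfile

def list_pod5_tar(tar_path):
    """Return the file paths stored inside a Pod5 TAR delivery."""
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]

# Hypothetical delivery name following the pattern above:
# for name in list_pod5_tar("SLX-12345.PAW28166.38d23152.s_1.pod5.tar"):
#     print(name)
```

The listed paths reproduce the run's //pod5// directory layout, so you can check the contents before extracting.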
  
  
The library comes with some Python tools around the C++ core that allow you to manipulate the files. There is one shortcoming in the tool set though: the ability to easily split a large Pod5 file into chunks of a fixed size (by number of reads). We have created a tool for this job, which is available at [[https://github.com/crukci-bioinformatics/pod5split]].
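The core of such a split, grouping a stream of reads into fixed-size chunks, can be sketched generically. This illustrates the idea only; it is not the actual pod5split implementation:

```python
from itertools import islice

def chunked(reads, chunk_size):
    """Yield successive lists of at most chunk_size items from an iterable."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# In a real split, each chunk of reads would be written out as its own
# smaller Pod5 file.
# list(chunked(range(10), 4))  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Splitting by read count like this keeps each output file a predictable size, which is what downstream pipelines that parallelise per file need.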