support:demultiplexing [2024/04/11 13:17] – Completed page with links to sections. Richard Bowers
support:demultiplexing [2025/08/06 09:10] (current) Richard Bowers
Line 3: Line 3:
 The data files the CRUK-CI sequencing service provides after sequencing your samples are demultiplexed according to the barcodes provided in the submission spreadsheet. It sometimes happens that users of the service make a mistake when filling in this spreadsheet and put an incorrect index against one or more samples, or samples are omitted from the submission spreadsheet. This manifests as those samples' FASTQ files being much smaller than one might expect, or not present at all.
  
-The good news is that the reads are still present in the data: they've just not been allocated to the sample. There is a file for each lane of sequencing called SLX-????.<flow cell id>.s_?.r_?.lostreads.fq.gz that contains all the reads that could not be allocated to any of the indexes listed in the submission spreadsheet.
+The good news is that the reads are still present in the data: they've just not been allocated to the sample. There is a file for each lane of sequencing called ''SLX-????.<flow cell id>.s_?.r_?.lostreads.fq.gz'' that contains all the reads that could not be allocated to any of the indexes listed in the submission spreadsheet.
  
 The bad news is that correcting the submission information regarding the indexes attached to the samples is troublesome to the point of impossibility in our Clarity LIMS system. We cannot therefore fix your submission to rerun demultiplexing and attach those files so you can fetch them with the download tool or on the FTP site. Rerunning demultiplexing is also not an automatic process: a little time and effort will be needed to reprocess the FASTQ files.
Line 15: Line 15:
 The initial creation and demultiplexing of the FASTQ files is done with [[https://support.illumina.com/sequencing/sequencing_software/bcl-convert.html|Illumina's BCL Convert program]]. This can only work from the intermediate proprietary files produced by the sequencer, so for any demultiplexing from FASTQ to FASTQ we use our own program, //demuxFQ//. You can download the tool using the links below:
  
-  * [[https://genomicshelp.cri.camres.org/tools/demultiplexer.rhel.tar.gz|Redhat / CentOS binary]] (RHEL 7 or newer)
-  * [[https://genomicshelp.cri.camres.org/tools/demultiplexer.macos.tar.gz|MAC OS X binary]]
-  * [[https://genomicshelp.cri.camres.org/tools/demultiplexer.cygwin64.zip|Windows Cygwin binary]] (requires Cygwin 64 bit)
-  * [[https://genomicshelp.cri.camres.org/tools/demultiplexer_src.tar.gz|Source tar ball]]
+  * [[https://genomicshelp.cruk.cam.ac.uk/tools/demultiplexer.rhel.tar.gz|Redhat / CentOS binary]] (RHEL 7 or newer)
+  * [[https://genomicshelp.cruk.cam.ac.uk/tools/demultiplexer.macos.tar.gz|MAC OS X binary]]
+  * [[https://genomicshelp.cruk.cam.ac.uk/tools/demultiplexer_src.tar.gz|Source tar ball]]
  
 If you build from source, the INSTALL file in the tar ball gives instructions on how to build the program.
Line 59: Line 58:
  
 Regardless of the type of barcoding kit, you will need to produce a configuration file per read for your data. If your data is single end, one file is sufficient. If it is paired end, you will need a second configuration file with different file names for read two. In our naming conventions (as above), this is the "//r_?//" part of the name. So we have //r_1// for read one and //r_2// for read two. The easiest way to produce the configuration files is to prepare for one read, check it is correct, make a copy and tweak the file names for the second read, keeping the same index sequences.
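
The "copy and tweak" step for read two can be scripted. The following is a minimal sketch in Python, not part of demuxFQ: the function names ''to_read_two'' and ''read_two_config'' are hypothetical, and it assumes the configuration file is whitespace-separated with the output file name in the second column, as in the index file examples on this page. Only the file name column is changed; index sequences and any further columns are left untouched.

```python
import re

# First column (index sequence) plus its separator, then the second column
# (output file name). Assumes whitespace-separated columns.
_SECOND_COLUMN = re.compile(r"^(\s*\S+\s+)(\S+)")


def to_read_two(line: str) -> str:
    """Switch ".r_1." to ".r_2." in the file name column of one line."""
    def swap(match: re.Match) -> str:
        return match.group(1) + match.group(2).replace(".r_1.", ".r_2.")

    # Lines with fewer than two columns (e.g. blank lines) do not match
    # the pattern and are returned unchanged.
    return _SECOND_COLUMN.sub(swap, line, count=1)


def read_two_config(read_one_text: str) -> str:
    """Return a read-two copy of a read-one configuration file's text."""
    return "".join(
        to_read_two(line) for line in read_one_text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    read_one = (
        "AGTTCC  SLX-9214.Sample01.r_1.fq.gz  Sample_01\n"
        "CAGATC  SLX-9214.Sample03.r_1.fq.gz  Sample_03\n"
    )
    print(read_two_config(read_one), end="")
```

It is still worth checking the generated file by eye before running the demultiplexer, in the same way as the read one file.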
 +
 +==== Two or More Barcodes per Sample ====
 +
 +It is possible to demultiplex in such a way as to put reads from more than one barcode into a single file. One can do this by listing each barcode sequence in the index file with the same file name.
 +
 +<code>
 +AGTTCC  SLX-9214.Sample01.r_1.fq.gz  Sample_01
 +AGTCAA  SLX-9214.Sample01.r_1.fq.gz  Sample_01
 +CAGATC  SLX-9214.Sample03.r_1.fq.gz  Sample_03
 +CTTGTA  SLX-9214.Sample03.r_1.fq.gz  Sample_03
 +</code>
 +
 +The example above will create two FASTQ files, each containing reads tagged with two different barcodes.
 +
 +The most likely real world example of this is the old style single index 10x SIGA/SINA barcoding. These have four barcodes per sample, so do need to be combined into a single file for their relevant sample. For example, to demultiplex and extract the SINAA1 and SINAA2 barcodes from a multiplexed file one would use an index file similar to below.
 +
 +<code>
 +AAACGGCG  SLX-1000.SINAA1.r_1.fq.gz
 +CCTACCAT  SLX-1000.SINAA1.r_1.fq.gz
 +GGCGTTTC  SLX-1000.SINAA1.r_1.fq.gz
 +TTGTAAGA  SLX-1000.SINAA1.r_1.fq.gz
 +AGCCCTTT  SLX-1000.SINAA2.r_1.fq.gz
 +CAAGTCCA  SLX-1000.SINAA2.r_1.fq.gz
 +GTGAGAAG  SLX-1000.SINAA2.r_1.fq.gz
 +TCTTAGGC  SLX-1000.SINAA2.r_1.fq.gz
 +</code>
 +
 +The examples here are for single index, but dual index works in exactly the same way with the additional column for the i5 sequence.
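
For kits with several barcodes per sample, an index file like the ones above can be generated rather than typed by hand. This is a rough sketch in Python under stated assumptions: the function name ''index_lines'' is hypothetical, the two-space column separator simply mirrors the examples on this page, and the duplicate check is an added sanity test based on the reasoning that one barcode cannot sensibly be assigned to two different files. It covers the single index case only.

```python
def index_lines(project: str, barcodes_by_sample: dict, read: int = 1) -> list:
    """Build single-index lines, one per barcode, grouped by sample file."""
    owner = {}  # barcode -> sample it was first assigned to
    lines = []
    for sample, barcodes in barcodes_by_sample.items():
        # File name follows the naming used in the examples on this page.
        file_name = f"{project}.{sample}.r_{read}.fq.gz"
        for barcode in barcodes:
            if barcode in owner:
                raise ValueError(
                    f"{barcode} is listed for both {owner[barcode]} and {sample}"
                )
            owner[barcode] = sample
            lines.append(f"{barcode}  {file_name}")
    return lines


if __name__ == "__main__":
    # Barcode sets copied from the SINAA1 / SINAA2 example above.
    sina = {
        "SINAA1": ["AAACGGCG", "CCTACCAT", "GGCGTTTC", "TTGTAAGA"],
        "SINAA2": ["AGCCCTTT", "CAAGTCCA", "GTGAGAAG", "TCTTAGGC"],
    }
    print("\n".join(index_lines("SLX-1000", sina)))
```

Calling the function again with ''read=2'' gives the matching lines for the read two configuration file.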
  
 ===== Running the Demultiplexer =====