Differences
This shows you the differences between two versions of the page.
| data:retrieval [2024/04/09 10:49] – ↷ Page moved and renamed from retrieving_data to data:retrieval Bioinformatics service admin | data:retrieval [2025/08/20 10:04] (current) – Ania Piskorz | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Retrieving Your Sequencing Data ====== | ====== Retrieving Your Sequencing Data ====== | ||
| - | |||
| - | ===== Data Files ===== | ||
| - | |||
| - | Your sequencing data will be made available in the standard FASTQ file format. We also provide some smaller files giving information about those files, and a report for each lane of sequencing. | ||
| - | |||
| - | Processing the sequenced run folders is done using Illumina' | ||
| - | |||
| - | ==== FASTQ Files ==== | ||
| - | |||
| - | Your data will be demultiplexed according to the information supplied in your submission for sequencing. What you receive will depend on the type of sequencing done. | ||
| - | |||
| - | If you are downloading files from outside CRUK-CI, we strongly recommend checking that the files were not damaged during transfer (the usual problem is truncation rather than corruption). We provide checksums for the FASTQ files, which can be used to confirm that every file has transferred completely. | ||
| - | |||
| - | === Regular Sequencing === | ||
| - | |||
| - | You will normally receive one (single read) or two (paired end) FASTQ files per sample for each lane of sequencing, plus one or two additional FASTQ files that contain the reads that the demultiplexer (Illumina' | ||
| - | |||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | A small subset of kits (library types) also return an index read: a separate FASTQ file or pair of files containing the index reads with quality scores. These files will be named with an " | ||
| - | |||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | The lost reads file will have a different naming pattern: | ||
| - | |||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | There will also be checksum files for each sample and the lost reads. | ||
| - | |||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | External users of our service in particular should use the checksums to make sure their data files have copied to their local systems without corruption or truncation. See the section below for instructions on how to do this. | ||
| - | |||
| - | === Custom Indexing === | ||
| - | |||
| - | If your submission requested custom indexing, you will receive one set of files containing all your reads. The barcode will be " | ||
| - | |||
| - | < | ||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | === No Indexing or Inline Indexing === | ||
| - | |||
| - | Similar to custom indexing, you will receive one set of files containing all your reads. The barcode will be " | ||
| - | |||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | === 10x Sequencing === | ||
| - | |||
| - | As of November 2020 we supply 10x data as a set of FASTQ files in the same manner as any other regular multiplexed library type, for both single index (SI-GA and SI-NA) and dual index (TT/NT/TN) kits. If you have received your single cell data files as compressed FASTQ files, refer to the regular sequencing section. Also refer to the section on file name conversion if these files need to be renamed for 10x downstream pipelines. | ||
| - | |||
| - | The FASTQ files you receive for SI-GA and SI-NA indexing will have all four index sequences for each barcode combined in the set of FASTQ files for the sample. A quick run through a demultiplexer will confirm the presence of the four indexes at approximately 25% of reads for each one. | ||
| - | |||
| - | //The rest of this section refers to single cell sequencing before November 2020, when single cell data was delivered as a TAR file containing twelve or sixteen FASTQ files. If you have received a TAR file per sample, continue reading.// | ||
| - | |||
| - | With the single index SI-GA and SI-NA indexing (and indeed previous, now discontinued, | ||
| - | |||
| - | < | ||
| - | < | ||
| - | < | ||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | Inside each archive there will be twelve or sixteen FASTQ files. These will be named as they were produced by // | ||
| - | |||
| - | For example, for a sample that was labelled with the 10x SIGA H9 barcode, the tar archive will contain: | ||
| - | |||
| - | < | ||
| - | SIGAH9_S5_L001_I1_001.fastq.gz | ||
| - | SIGAH9_S5_L001_R1_001.fastq.gz | ||
| - | SIGAH9_S5_L001_R2_001.fastq.gz | ||
| - | SIGAH9_S6_L001_I1_001.fastq.gz | ||
| - | SIGAH9_S6_L001_R1_001.fastq.gz | ||
| - | SIGAH9_S6_L001_R2_001.fastq.gz | ||
| - | SIGAH9_S7_L001_I1_001.fastq.gz | ||
| - | SIGAH9_S7_L001_R1_001.fastq.gz | ||
| - | SIGAH9_S7_L001_R2_001.fastq.gz | ||
| - | SIGAH9_S8_L001_I1_001.fastq.gz | ||
| - | SIGAH9_S8_L001_R1_001.fastq.gz | ||
| - | SIGAH9_S8_L001_R2_001.fastq.gz | ||
| - | </ | ||
| - | |||
| - | The sample number (the second part of the file name, " | ||
| - | |||
| - | ==== Supporting Files ==== | ||
| - | |||
| - | There will be some additional files delivered with the FASTQ data. These hold some statistics and QC test results that may be useful. | ||
| - | |||
| - | - ''< | ||
| - | - ''< | ||
| - | |||
| - | ==== The QC Report ==== | ||
| - | |||
| - | Alongside the data we deliver a report, which we also use to check that the sequencing has gone as expected. As of summer 2020, this is a MultiQC report containing the individual reports previously delivered separately. | ||
| - | |||
| - | - Our demultiplexing and barcode balance reports. These give charts and numbers for the reads created from the sequencing. If there are indexing problems they' | ||
| - | - Our single cell reports. These are only present for 10x single cell lanes and are produced per sample. | ||
| - | - [[https:// | ||
| - | - [[https:// | ||
| - | |||
| - | This report will be named ''< | ||
| - | |||
| - | ==== File Name Conversion ==== | ||
| - | |||
| - | Some tools require the FASTQ files to be named as they would be when delivered by // | ||
| - | |||
| - | It changes our file names to the pattern: | ||
| - | |||
| - | < | ||
| - | < | ||
| - | </ | ||
| - | |||
| - | **barcode** is the barcode as it appears in our file name; **number** is an arbitrary sample number that // | ||
| - | |||
| - | The tool can be run with the command: | ||
| - | |||
| - | < | ||
| - | python3 crukci_to_illumina.py [<fastq directory> | ||
| - | </ | ||
| - | |||
| - | If //fastq directory// is not given, the script will look at files in the current directory. It does not recurse into subdirectories. | ||
| - | |||
| - | ===== Retrieving Files ===== | ||
| There are two methods we employ to allow users of the sequencing service to retrieve their sequencing data. Persons working inside CRUK-CI should use our data download tool; everyone else must fetch their data from our FTP site. | There are two methods we employ to allow users of the sequencing service to retrieve their sequencing data. Persons working inside CRUK-CI should use our data download tool; everyone else must fetch their data from our FTP site. | ||
| Line 143: | Line 5: | ||
| ==== CRUK-CI Researchers ==== | ==== CRUK-CI Researchers ==== | ||
| - | We provide a tool for downloading files for projects, libraries and runs that you can use from the command line. The full user manual for the download tool can be found [[https:// | + | We provide a tool for downloading files for projects, libraries and runs that you can use from the command line. The full user manual for the download tool can be found [[https:// |
| ==== External Sequencing Service Users ==== | ==== External Sequencing Service Users ==== | ||