There are two methods we employ to allow users of the sequencing service to retrieve their sequencing data. Persons working inside CRUK-CI should use our data download tool; everyone else must fetch their data from our FTP site.
We provide a tool for downloading files for projects, libraries and runs that you can use from the command line. The full user manual for the download tool can be found on this internal web page. Please visit the page to download the tool, find instructions for how to use it and also how to install Java on your personal machine (link requires you to be in the building or running the VPN).
Outside of CRUK-CI, data is delivered through the FTP site: ftp1.cruk.cam.ac.uk.
The site runs the FTP protocol with TLS encryption (sometimes known as FTPS). You should be able to connect to it using any up-to-date FTP client you choose. Your group will have been given a user name and password for accessing the site when it arranged to use the CRUK-CI sequencing service. Your data will be in a private region of the server only accessible with your group's credentials. The site is read only.
Files are available on the FTP site for a guaranteed thirty (30) days after sequencing. You MUST fetch the files in this time period. The files will be removed from the FTP site once this time has elapsed.
It is your responsibility thereafter to look after the files and store them as you see fit. We do not hold external groups' data in the CRUK-CI archives.
It is possible that the files do not transfer correctly from the FTP site on the first attempt. We have never known of corruption during a transfer, but truncated files do happen when the connection is dropped before the transfer is complete. External users of the service must check that the transfer is accurate after taking a copy. We provide checksums for the FASTQ files, and you can make sure that the local copy of each file matches the original by running the md5sum command once the download has finished.
The checksum files have the suffix “.md5sums.txt” and can be used to check the FASTQ files by running the following in the directory that holds the downloaded files:
md5sum -c *.md5sums.txt
Any files that are not reported as “OK” should be downloaded again and rechecked.
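On systems where the md5sum command is not installed (many Windows machines, for example), the same check can be sketched in Python. This is a minimal sketch and not an official CRUK-CI tool: it assumes the “.md5sums.txt” files use the standard md5sum layout of one checksum followed by a file name per line, and the function names md5_of and verify_checksums are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Check every file listed in an md5sum-style file.

    Returns a dict mapping file name to "OK", "FAILED" or "MISSING".
    Assumes each non-blank line is "<md5 hex digest>  <file name>".
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = checksum_file.parent / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5_of(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results


if __name__ == "__main__":
    # Check every checksum file in the current directory.
    for checksum_file in sorted(Path(".").glob("*.md5sums.txt")):
        for name, status in verify_checksums(checksum_file).items():
            print(f"{name}: {status}")
```

Run the script from the directory that holds the downloaded files. Anything reported as FAILED or MISSING should be downloaded again and rechecked, exactly as for the md5sum command.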
There are many FTP clients available on the web one can use to fetch files from our FTP site. We officially support two of them:
There are others you can use, though we have not properly tested them:
On Linux, you might find that these programs are available through the platform's package management system.
We have become aware of some users using the Mac's Finder application to connect to the FTP server and copy the files. While convenient, it appears that Finder can silently truncate files while copying if the connection to the FTP server drops. Thus we do not recommend using Finder or Windows Explorer to copy the files: use a proper FTP program that will report errors. Above all, and regardless of the program used, you must check your files against the checksums after downloading as described above.
Most FTP clients, and certainly lftp and FileZilla, handle the TLS encryption automatically. Other clients may need to be told explicitly to use an encrypted connection.
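For a client that has to be told to use encryption, a scripted download with Python's standard ftplib module shows what that looks like in practice. This is only a sketch under assumptions: it assumes the server accepts explicit FTPS (TLS negotiated on the normal FTP control port), which this page does not state, and fetch_file and its parameters are hypothetical names. Substitute your group's own credentials and file paths.

```python
import ftplib


def fetch_file(host, user, password, remote_path, local_path):
    """Download one file over FTP with explicit TLS (FTPS).

    Assumption: the server supports explicit FTPS on the default port.
    """
    with ftplib.FTP_TLS(host) as ftps:
        # FTP_TLS.login() secures the control connection before
        # sending the user name and password.
        ftps.login(user, password)
        # Encrypt the data connection as well; without this call
        # file contents would travel unencrypted.
        ftps.prot_p()
        with open(local_path, "wb") as out:
            ftps.retrbinary(f"RETR {remote_path}", out.write)
```

A dropped connection normally raises an exception from retrbinary rather than failing silently, but the downloaded file must still be checked against its checksum afterwards.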
Occasionally people tell us they cannot connect, and so far the problem has always been at the client end. Here are some things to check.
FTPS is not the same as SFTP. The former is the FTP protocol with encryption; the latter is file transfer over the secure shell protocol. You cannot use sftp or scp with our FTP server.
Check that FTPS or SSL encryption is turned on in your client if it has explicit options for this.
Try running “ping ftp1.cruk.cam.ac.uk” from the command line. The server will echo back a reply if your pings are getting through. If ping says the packets are not being returned, please ask your IT department to check network connectivity. The problems have never yet been at the CRUK-CI end; if our FTP site does need to be taken out of commission for a while we will let all our collaborators know beforehand.
If you still cannot connect, email ithelpdesk@cruk.cam.ac.uk and cc Genomics (Core-Genomics-Staff@cruk.cam.ac.uk). Note this is a different address to the usual Genomics help desk and should only be used for queries about connectivity problems to our FTP server; all other queries need to go to the Genomics help desk as normal.
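As a complement to the ping test, the short Python sketch below checks whether a TCP connection to the FTP service itself can be opened, which can help when a local firewall passes pings but blocks FTP. It is an illustration only: the port number 21 is the standard FTP control port and is assumed here, not confirmed by this page, and can_reach is a made-up helper name.

```python
import socket


def can_reach(host, port=21, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    host = "ftp1.cruk.cam.ac.uk"
    state = "reachable" if can_reach(host) else "NOT reachable"
    print(f"{host} is {state} on port 21")
```

If the host is not reachable from your machine, that is the information your IT department will need when checking network connectivity.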