Functional Genomics Centre Submissions

Sharing Sequencing Data Across the Collaboration

Some customisations have been made for the Cancer Research UK (CRUK)/Astra Zeneca (AZ) Functional Genomics Centre (FGC) with regard to data sharing. The submission system lets members of the FGC on either side of the collaboration share their sequencing data with colleagues on the other side: AZ members can choose to share their sequencing data with CRUK members, and CRUK members can choose to share with AZ members.

Sharing is set at the Clarity project level. At the start of the submission process there is a link to a page for creating a new project. For FGC members only, this page has a check box labelled “Share with Astra Zeneca” (CRUK users) or “Share with Cancer Research UK” (AZ users). Checking the box causes the sequencing data files to be uploaded to both FGC FTP accounts. Leaving it unchecked means the data files are uploaded only to the FTP account of the submitter's side of the collaboration.

Each submission also gives you a chance to change the project's sharing. Once the submission form has been submitted and found to have no errors, the confirmation page shows a check box very similar to the one on the project creation page, where you may change the sharing status of the project.

You cannot choose whether to share at the submission level. The choice is made at the project level, so when making a submission pay attention to which project you select. It may be worth including an agreed term in the project name to indicate whether a project is shared, but neither the submission system nor Clarity enforces any such convention.

Changing the sharing status of a project only affects sequencing data yet to be produced. It does not share data files already uploaded to the FTP site for past runs, nor does switching sharing off remove files already on the FTP site. The sharing state is read during data production and is not tied to the sharing state at the time of submission. Thus if a submission is made to an unshared project, but sharing is turned on for that project between submission and sequencing, the earlier submission will also be published to both FTP accounts. The reverse also holds: if a submission is made to a shared project, but sharing is turned off between submission and sequencing, that submission will only be uploaded to the FTP account of the submitter's group. It is therefore advisable to decide at the outset which Clarity projects are shared and which are not, and to submit accordingly.
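To make the timing rule concrete, here is a minimal model of it in Python. It is only an illustration under stated assumptions: the classes `Project` and `Submission`, the function `ftp_destinations`, and the account labels are hypothetical and do not correspond to the submission system's or Clarity's actual code. The point it captures is that a submission has no sharing flag of its own, and the project's flag is read when the data is produced rather than copied at submission time.

```python
from dataclasses import dataclass

# Hypothetical labels for the two FGC FTP accounts.
CRUK, AZ = "CRUK", "AZ"


@dataclass
class Project:
    """A Clarity project; `shared` mirrors the "Share with ..." check box."""
    name: str
    owner_side: str  # CRUK or AZ: the side the submitter belongs to
    shared: bool


@dataclass
class Submission:
    """A submission carries no sharing flag of its own; only its project does."""
    submission_id: str
    project: Project


def ftp_destinations(submission: Submission) -> set[str]:
    """FTP accounts that receive a run's data, decided at data-production time.

    The project's flag is read now, so a change made after the submission
    still applies to it.
    """
    if submission.project.shared:
        return {CRUK, AZ}
    return {submission.project.owner_side}


# Usage: submitted while unshared, then sharing switched on before sequencing.
project = Project("hypothetical_project", owner_side=AZ, shared=False)
sub = Submission("SUB-1", project)
project.shared = True
print(sorted(ftp_destinations(sub)))  # ['AZ', 'CRUK']
```

Because `Submission` holds a reference to its `Project` rather than a snapshot of the flag, the model cannot express per-submission sharing, which matches the behaviour described above.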

Data retrieval is via ftp3.cruk.cam.ac.uk using the credentials provided. ftp3 is an encrypted FTP server, so remember to set the options your FTP client needs to transfer using FTPS (FTP over TLS), not SFTP (the SSH File Transfer Protocol), which is a completely different protocol. For lftp, this page gives some information. Other FTP clients are of course available; please refer to their documentation for enabling FTPS.
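For scripted downloads, Python's standard-library `ftplib.FTP_TLS` class speaks FTPS. The sketch below has not been tested against ftp3 and rests on assumptions: that ftp3 uses explicit FTPS (AUTH TLS on the standard FTP port) rather than implicit FTPS, which `ftplib` does not support directly; that its certificate is trusted by your system's default certificate store; and that the remote directory contains only files. The function name `download_run_files` and its parameters are hypothetical.

```python
import os
import ssl
from ftplib import FTP_TLS


def download_run_files(host, user, password, remote_dir, local_dir,
                       ftp_factory=FTP_TLS):
    """Download every file in remote_dir from an explicit-FTPS server.

    ftp_factory defaults to ftplib.FTP_TLS; it is a parameter only so that a
    stub can stand in for the real client when testing without a network.
    Returns the list of local file paths written.
    """
    os.makedirs(local_dir, exist_ok=True)
    written = []
    # FTP_TLS does not verify the server certificate unless given a context
    # such as this one; passing a host makes it connect immediately.
    tls_context = ssl.create_default_context()
    with ftp_factory(host, context=tls_context) as ftps:
        ftps.login(user, password)  # secures the control channel (AUTH TLS) before sending credentials
        ftps.prot_p()               # encrypt the data channel as well
        ftps.cwd(remote_dir)
        for name in ftps.nlst():    # assumes plain file names, no subdirectories
            local_path = os.path.join(local_dir, name)
            with open(local_path, "wb") as fh:
                ftps.retrbinary("RETR " + name, fh.write)
            written.append(local_path)
    # Leaving the with block sends QUIT and closes the connection.
    return written
```

A call would look like `download_run_files("ftp3.cruk.cam.ac.uk", user, password, "run_dir", "downloads")`, with your own credentials and a real directory name in place of the placeholders. Calling `prot_p()` matters: without it only the login is encrypted and the data files travel in the clear.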

For those who are interested, this blog article and this article discuss the differences between the FTP protocols.