Pipeline

Prerequisite software

To run the pipeline successfully one needs to install/download several third-party tools.

Tools

Before going into the installation procedure, we should go over the actual input format. The input folder contains the raw sequencing reads.

The operation starts from BASECALLS_DIR, which contains the actual intensities and the bcl files. For example, the basic structure of such a directory looks like this:

➜slide-seq-pipeline git:(main) ls /home/hsarkar/code/slide-seq-data/P25255/220329_A00689_0513_BHYK23DRXY/
Config            InterOp                 Recipe           RunInfo.xml        SequenceComplete.txt
CopyComplete.txt  Logs                    RTA3.cfg         runParameters.xml  Thumbnail_Images
Data              P25255_SampleSheet.txt  RTAComplete.txt  RunParameters.xml

The bcl files are located in Data/Intensities/BaseCalls/. RunInfo.xml is the file that holds the run information, which has to be parsed before that information is written down.
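As a rough illustration of what parsing RunInfo.xml can look like, here is a minimal sketch using only the Python standard library. The embedded XML is a hypothetical, trimmed-down stand-in for a real RunInfo.xml (the cycle counts and lane count are made up), and `parse_run_info` is a helper name introduced here for illustration, not something taken from slideseq-tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RunInfo.xml content. A real file carries more
# attributes; the NumCycles and LaneCount values below are invented.
SAMPLE_RUNINFO = """<RunInfo>
  <Run Id="220329_A00689_0513_BHYK23DRXY" Number="513">
    <Flowcell>HYK23DRXY</Flowcell>
    <Instrument>A00689</Instrument>
    <Reads>
      <Read Number="1" NumCycles="42" IsIndexedRead="N"/>
      <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
      <Read Number="3" NumCycles="50" IsIndexedRead="N"/>
    </Reads>
    <FlowcellLayout LaneCount="2" SurfaceCount="2" SwathCount="2" TileCount="78"/>
  </Run>
</RunInfo>
"""


def parse_run_info(xml_text):
    """Collect the run id, flowcell, lanes and read layout from RunInfo XML text."""
    root = ET.fromstring(xml_text)
    run = root.find("Run")
    # One dictionary per <Read> element, in document order.
    reads = [
        {
            "number": int(read.get("Number")),
            "cycles": int(read.get("NumCycles")),
            "is_index": read.get("IsIndexedRead") == "Y",
        }
        for read in run.iter("Read")
    ]
    layout = run.find("FlowcellLayout")
    return {
        "run_id": run.get("Id"),
        "flowcell": run.findtext("Flowcell"),
        "instrument": run.findtext("Instrument"),
        # Lanes are numbered from 1 up to LaneCount.
        "lanes": list(range(1, int(layout.get("LaneCount")) + 1)),
        "reads": reads,
    }


info = parse_run_info(SAMPLE_RUNINFO)
print(info["flowcell"], info["lanes"])
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the rest of the helper unchanged.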

We use the existing slideseq-tools package so that we can import its existing structures.

Install the existing slideseq-tools package

>git clone https://github.com/MacoskoLab/slideseq-tools.git

Create a spreadsheet that contains some of the following fields, which we need to fill in. An example spreadsheet is provided in the spreadsheet folder of the repo.

Index(['library', 'date', 'flowcell', 'run_name', 'bclpath', 'lane',
'sample_barcode', 'bead_structure', 'reference', 'run_barcodematching',
'locus_function_list', 'start_sequence', 'base_quality',
'min_transcripts_per_cell', 'email', 'puckcaller_path', 'bead_type',
'gen_read1_plot', 'gen_downsampling'],
dtype='object')
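To make the column layout concrete, the sketch below writes a one-row template with exactly these columns using the standard-library `csv` module. The helper `make_row` and every value passed to it are hypothetical placeholders; the example spreadsheet in the repo remains the reference for real values.

```python
import csv
import io

# Column names copied from the Index listing above.
COLUMNS = [
    "library", "date", "flowcell", "run_name", "bclpath", "lane",
    "sample_barcode", "bead_structure", "reference", "run_barcodematching",
    "locus_function_list", "start_sequence", "base_quality",
    "min_transcripts_per_cell", "email", "puckcaller_path", "bead_type",
    "gen_read1_plot", "gen_downsampling",
]


def make_row(**values):
    """Return a full row dict; columns not supplied are left empty."""
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unexpected columns: {sorted(unknown)}")
    return {name: values.get(name, "") for name in COLUMNS}


# All values here are placeholders for illustration only.
row = make_row(
    library="Puck_example",
    flowcell="HYK23DRXY",
    bclpath="/path/to/220329_A00689_0513_BHYK23DRXY",
    lane="1",
)

# Write to an in-memory buffer; swap in open("metadata.csv", "w", newline="")
# to produce a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Rejecting unknown column names early catches typos such as `sample_barcod` before they end up in the spreadsheet.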

Description

library: Name of the library (each puck will have a specific name)
puckcaller_path: A location containing the files from the image analysis pipeline, BeadBarcodes.txt and BeadLocations.txt
run_name: Name of the run; depends on how many different runs we have
bclpath: Actual location of the BCL files
sample_barcode: An 8 bp barcode that can be obtained from the RunInfo.xml file under the column name index (I am not sure about this yet)
reference: The actual reference file; I used GRCh38.fasta
bead_structure: Determines the actual length of the cell barcode and the corresponding UMI
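To show how a bead_structure value could determine the barcode and UMI lengths, here is a small sketch. It assumes the value is a run of length-plus-letter tokens such as `8C18X6C9M1X`, with `C` marking cell-barcode bases, `M` marking UMI bases, and `X` marking skipped bases. That encoding and the example string are assumptions made for illustration and should be checked against the slideseq-tools documentation; both function names are hypothetical.

```python
import re


def parse_bead_structure(structure):
    """Split a string like '8C18X6C9M1X' into (kind, start, end) segments.

    Assumes each token is a decimal length followed by one uppercase letter.
    """
    tokens = re.findall(r"(\d+)([A-Z])", structure)
    # Re-joining the tokens must reproduce the input, otherwise part of the
    # string did not match the assumed length+letter pattern.
    if "".join(count + kind for count, kind in tokens) != structure:
        raise ValueError(f"cannot parse bead structure: {structure!r}")
    segments = []
    position = 0
    for count, kind in tokens:
        length = int(count)
        segments.append((kind, position, position + length))
        position += length
    return segments


def segment_lengths(structure):
    """Total number of bases assigned to each segment kind."""
    totals = {}
    for kind, start, end in parse_bead_structure(structure):
        totals[kind] = totals.get(kind, 0) + (end - start)
    return totals


lengths = segment_lengths("8C18X6C9M1X")
print(lengths)
```

Under these assumptions the example string yields a 14-base cell barcode (8 + 6), a 9-base UMI, and 19 skipped bases.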

Construction

The construction of the dataframe from RunInfo.xml is shown in the embedded notebook that follows.