Code Structure¶
The following describes part of SPID’s implementation. It is intended as a starting point for anyone who wishes to dive into the code.
FlowcellDemultiplexer¶
The flow cell demultiplexing starting point.
Initiates a LaneDemultiplexerProcess for each lane that requires demultiplexing.
All lane demultiplexing processes run in parallel. The processes are independent, but they write their output to the same directories. Collisions are avoided because the names of the created files contain the lane number.
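As a rough illustration of the per-lane parallelism described above, the following Python sketch starts one process per lane and puts the lane number in every output file name. The function names, the `sampleA` placeholder and the `_L001` naming pattern are assumptions made for this example. They are not SPID's actual API or file layout.

```python
import multiprocessing
import os
import tempfile


def lane_output_path(output_dir, sample, lane):
    # Hypothetical naming scheme: the lane number is part of the file name,
    # so two lanes never write to the same file.
    return os.path.join(output_dir, f"{sample}_L{lane:03d}.fastq")


def demultiplex_lane(lane, output_dir):
    # Stand-in for the real per-lane work: write one placeholder file.
    with open(lane_output_path(output_dir, "sampleA", lane), "w") as handle:
        handle.write(f"reads from lane {lane}\n")


def demultiplex_flowcell(lanes, output_dir):
    # One independent process per lane, all sharing the same output directory.
    processes = [
        multiprocessing.Process(target=demultiplex_lane, args=(lane, output_dir))
        for lane in lanes
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as output_dir:
        demultiplex_flowcell([1, 2, 3], output_dir)
        print(sorted(os.listdir(output_dir)))
```

Because each lane writes only to paths that embed its own number, the processes need no locking between them.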
LaneDemultiplexerProcess¶
Initiates a LaneDemultiplexer
LaneDemultiplexer¶
Demultiplexes a single lane.
Initiates multiple BatchDemultiplexerProcess processes and FlushProcess processes according to given parameters.
Coordinates between the different processes using a manager process which manages several shared objects:
input_files_queue¶
Shared between demultiplexers so that each demultiplexer works on different input.
output_buffers_queue¶
Shared between demultiplexers and flushers: demultiplexers fill the queue, and flushers read from the queue and write to the output files. This is the shared object that consumes the most memory.
output_locks¶
Shared between flushers to avoid simultaneous writing to the same files
num_sequences_per_sample_dict¶
Shared between flushers to make sure the number of reads written to each output file does not exceed the maximum threshold.
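The four shared objects could be created from a Python `multiprocessing.Manager` roughly as sketched below. The helper `create_shared_objects`, its parameters and the `max_pending_buffers` bound are hypothetical and exist only for this example. How SPID actually builds and sizes these objects may differ.

```python
import multiprocessing


def create_shared_objects(manager, input_batches, sample_names, max_pending_buffers=0):
    """Build the four shared objects from a manager (illustrative only)."""
    # Every input batch is queued up front so demultiplexers can pull work.
    input_files_queue = manager.Queue()
    for batch in input_batches:
        input_files_queue.put(batch)
    return {
        "input_files_queue": input_files_queue,
        # Assumption: a bound on this queue limits how many full buffers
        # wait in memory at once (0 means unbounded).
        "output_buffers_queue": manager.Queue(max_pending_buffers),
        # One lock per sample, so only one flusher writes a sample's files.
        "output_locks": {name: manager.Lock() for name in sample_names},
        # Running count of sequences written per sample.
        "num_sequences_per_sample_dict": manager.dict({name: 0 for name in sample_names}),
    }


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared = create_shared_objects(
            manager,
            [["batch1_R1.fastq.gz", "batch1_R2.fastq.gz"]],
            ["sample1", "sample2"],
            max_pending_buffers=16,
        )
        print(shared["input_files_queue"].qsize())
        print(dict(shared["num_sequences_per_sample_dict"]))
```

The manager hands out proxies, so the same objects can be passed as arguments to the demultiplexer and flusher processes.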
BatchDemultiplexerProcess¶
Polls input_files_queue until it is empty.
Creates an InputBatchDemultiplexer for an input files batch and demultiplexes it.
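The polling loop can be pictured with the minimal sketch below. It assumes the queue is fully populated before the workers start, so an empty queue means there is no work left. `run_batch_worker` and `demultiplex_batch` are hypothetical names, and a plain `queue.Queue` stands in for the manager's shared queue.

```python
import queue


def demultiplex_batch(batch):
    # Placeholder for InputBatchDemultiplexer: report what would be processed.
    return f"demultiplexed {len(batch)} files"


def run_batch_worker(input_files_queue):
    """Take batches until the queue is empty, then return the results."""
    results = []
    while True:
        try:
            batch = input_files_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to demultiplex
        results.append(demultiplex_batch(batch))
    return results


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put(["lane2_NoIndex_L002_R1_001.fastq.gz", "lane2_NoIndex_L002_R2_001.fastq.gz"])
    pending.put(["lane2_NoIndex_L002_R1_002.fastq.gz", "lane2_NoIndex_L002_R2_002.fastq.gz"])
    print(run_batch_worker(pending))
```

Since each `get_nowait` call removes the batch from the queue, two workers polling the same queue never process the same input.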
InputBatchDemultiplexer¶
Holds an output buffer for each sample.
Uses SingleFastqFileSetReader to read fragments from input FASTQ files.
Uses an instance of FragmentDemultiplexer to demultiplex each fragment.
Whenever an output buffer is full, it is placed in the shared output_buffers_queue
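A per-sample buffer that hands itself off when full might look like the sketch below. The class `SampleBuffers`, the record-count limit and the `close` method that hands off partially filled buffers are assumptions made for illustration. SPID may measure buffer fullness differently.

```python
import queue
from collections import defaultdict


class SampleBuffers:
    """Keeps one in-memory buffer per sample and queues buffers when full."""

    def __init__(self, output_buffers_queue, max_records=2):
        self.output_buffers_queue = output_buffers_queue
        self.max_records = max_records
        self.buffers = defaultdict(list)

    def add(self, sample, record):
        buffer = self.buffers[sample]
        buffer.append(record)
        if len(buffer) >= self.max_records:
            self._hand_off(sample)

    def _hand_off(self, sample):
        # Remove the buffer locally and place it on the shared queue.
        records = self.buffers.pop(sample, [])
        if records:
            self.output_buffers_queue.put((sample, records))

    def close(self):
        # Assumption: leftover partial buffers are queued at the end of a batch.
        for sample in list(self.buffers):
            self._hand_off(sample)


if __name__ == "__main__":
    shared_queue = queue.Queue()
    buffers = SampleBuffers(shared_queue, max_records=2)
    for sample, record in [("s1", "read1"), ("s2", "read2"), ("s1", "read3")]:
        buffers.add(sample, record)
    buffers.close()
    while not shared_queue.empty():
        print(shared_queue.get())
```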
SingleFastqFileSetReader¶
Reads sequences from a single set of FASTQ files.
For a paired-end run, for example, such a set can be:
lane2_NoIndex_L002_R1_001.fastq.gz
lane2_NoIndex_L002_R2_001.fastq.gz
lane2_NoIndex_L002_R3_001.fastq.gz
In this case, a sequence (or fragment) consists of 3 reads and must therefore be read from all 3 files.
Uses threading to read all FASTQ files in parallel.
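One way to read a file set in parallel is to give each file its own reader thread and combine one record from each file into a fragment, as in the sketch below. The function names, the bounded per-file queues and the end-of-file sentinel are assumptions for this example, not a description of how SingleFastqFileSetReader is written.

```python
import gzip
import os
import queue
import tempfile
import threading

_END = object()  # sentinel marking the end of a file


def _read_records(path, out_queue):
    # A FASTQ record is four lines: header, sequence, separator, qualities.
    try:
        with gzip.open(path, "rt") as handle:
            while True:
                record = [handle.readline().rstrip("\n") for _ in range(4)]
                if not record[0]:
                    break
                out_queue.put(tuple(record))
    finally:
        # Always signal the end, even on error, so the consumer never hangs.
        out_queue.put(_END)


def read_fragments(paths, max_buffered=1000):
    """Yield one tuple per fragment, holding one record from each file."""
    queues = [queue.Queue(maxsize=max_buffered) for _ in paths]
    threads = [
        threading.Thread(target=_read_records, args=(path, q), daemon=True)
        for path, q in zip(paths, queues)
    ]
    for thread in threads:
        thread.start()
    while True:
        records = [q.get() for q in queues]
        if any(record is _END for record in records):
            break  # stop as soon as any file runs out
        yield tuple(records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        paths = []
        for read_number in (1, 2, 3):
            path = os.path.join(directory, f"demo_R{read_number}.fastq.gz")
            with gzip.open(path, "wt") as handle:
                handle.write(f"@fragment1/{read_number}\nACGT\n+\nIIII\n")
            paths.append(path)
        for fragment in read_fragments(paths):
            print([record[0] for record in fragment])
```

The bounded queues keep a fast reader from running far ahead of a slow one, which caps memory use.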
FragmentDemultiplexer¶
This class tries to determine which sample a given fragment belongs to.
It uses pre-built tag trees which will be described in a different document.
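SPID's tag trees are not described here, so the following is only a generic illustration of how a tree-based lookup can map the start of a read to a sample. It uses a simple prefix tree with exact matching. The function names and the nested-dict layout are assumptions, and the real tag trees may be structured quite differently.

```python
def build_tag_tree(sample_tags):
    """Build a nested-dict prefix tree mapping tag sequences to sample names."""
    root = {}
    for sample, tag in sample_tags.items():
        node = root
        for base in tag:
            node = node.setdefault(base, {})
        # The multi-character key "sample" cannot clash with single-base keys.
        node["sample"] = sample
    return root


def find_sample(tag_tree, read_sequence):
    """Walk the tree along the start of the read; return the sample or None."""
    node = tag_tree
    for base in read_sequence:
        if "sample" in node:
            return node["sample"]  # a complete tag has been matched
        if base not in node:
            return None  # no known tag starts this way
        node = node[base]
    return node.get("sample")


if __name__ == "__main__":
    tree = build_tag_tree({"sample1": "ACGT", "sample2": "ACTT"})
    for read in ("ACGTGGGG", "ACTTAAAA", "TTTTTTTT"):
        print(read, find_sample(tree, read))
```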
FlushProcess¶
Polls output_buffers_queue until it is empty.
Retrieves an output buffer from the queue.
Obtains a shared lock for the buffer’s sample to ensure a single flush per sample at a time.
Flushes the output
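The flush loop can be sketched as follows. `run_flush_worker` and the `write_records` callback are hypothetical names. A plain `queue.Queue` and `threading.Lock` objects stand in for the manager's shared queue and locks, and the sketch assumes the buffers are already queued when the worker starts.

```python
import queue
import threading


def run_flush_worker(output_buffers_queue, output_locks, write_records):
    """Flush queued buffers until the queue is empty."""
    while True:
        try:
            sample, records = output_buffers_queue.get_nowait()
        except queue.Empty:
            break  # no buffers left to flush
        # Hold the sample's lock so no other flusher writes its files now.
        with output_locks[sample]:
            write_records(sample, records)


if __name__ == "__main__":
    buffers = queue.Queue()
    buffers.put(("sample1", ["read1", "read2"]))
    buffers.put(("sample2", ["read3"]))
    locks = {"sample1": threading.Lock(), "sample2": threading.Lock()}
    written = {}

    def write_records(sample, records):
        # Stand-in for appending to the sample's output files.
        written.setdefault(sample, []).extend(records)

    run_flush_worker(buffers, locks, write_records)
    print(written)
```

Locking per sample rather than globally lets flushers write different samples at the same time.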