==============
Code Structure
==============

The following are some details about part of SPID's implementation.
Hopefully they give a good starting point to anyone who wishes to dive into the code.

FlowcellDemultiplexer
---------------------
The starting point of flow cell demultiplexing.
Initiates a `LaneDemultiplexerProcess`_ for each lane that requires demultiplexing.

All lane demultiplexing processes run in parallel.
The processes are independent, although their output is written to the same directories.
Collisions are avoided because the names of the created files include the lane number.

LaneDemultiplexerProcess
------------------------
Initiates a `LaneDemultiplexer`_.

LaneDemultiplexer
-----------------
Demultiplexes a single lane.
Initiates multiple `BatchDemultiplexerProcess`_ processes and `FlushProcess`_ processes according to the given parameters.
Coordinates between the different processes using a manager process, which manages several shared objects:

input_files_queue
~~~~~~~~~~~~~~~~~
Shared between demultiplexers so that each demultiplexer works on different input.

output_buffers_queue
~~~~~~~~~~~~~~~~~~~~
Shared between demultiplexers and flushers - demultiplexers fill the queue, flushers read from the queue and write to the output files.
This is the most memory-consuming shared object.

output_locks
~~~~~~~~~~~~
Shared between flushers to avoid simultaneous writing to the same files.

num_sequences_per_sample_dict
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shared between flushers to make sure the number of reads written to each output file does not exceed the maximal threshold.

BatchDemultiplexerProcess
-------------------------
Polls `input_files_queue`_ until it is empty.
Creates an `InputBatchDemultiplexer`_ for each batch of input files and demultiplexes it.

InputBatchDemultiplexer
-----------------------
Holds an output buffer for each sample.
Uses `SingleFastqFileSetReader`_ to read fragments from the input FASTQ files.
Uses an instance of `FragmentDemultiplexer`_ to demultiplex each fragment.
Whenever an output buffer is full, it is placed in the shared `output_buffers_queue`_.

SingleFastqFileSetReader
------------------------
Reads sequences from a single set of FASTQ files.
For a paired-end run, for example, such a set can be::

    lane2_NoIndex_L002_R1_001.fastq.gz
    lane2_NoIndex_L002_R2_001.fastq.gz
    lane2_NoIndex_L002_R3_001.fastq.gz

In this case, a sequence (or fragment) consists of 3 reads and should therefore be read from all 3 files.

Uses threading to read all FASTQ files in parallel.

FragmentDemultiplexer
---------------------
Tries to determine which of the samples a given fragment belongs to.
It uses pre-built tag trees, which will be described in a different document.

FlushProcess
------------
Polls `output_buffers_queue`_ until it is empty.
Retrieves an output buffer from the queue.
Obtains a shared lock for the buffer's sample to ensure a single flush per sample.
Flushes the output.
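
The flush loop described above can be sketched roughly as follows.
This is a minimal illustration and not SPID's actual code: the function name ``flush_buffers``, the ``write`` callback, the ``(sample, reads)`` buffer layout and the policy of splitting a buffer across numbered output files are all assumptions made for the example.
A plain ``queue.Queue`` and ``threading.Lock`` objects stand in for the manager-shared objects::

    import queue
    import threading
    from collections import defaultdict


    def flush_buffers(output_buffers_queue, output_locks,
                      num_sequences_per_sample, max_sequences_per_file, write):
        """Drain the queue, flushing each buffer under its sample's lock."""
        while True:
            try:
                # Hypothetical buffer layout: (sample name, list of reads).
                sample, reads = output_buffers_queue.get_nowait()
            except queue.Empty:
                return  # the queue is empty, nothing left to flush
            with output_locks[sample]:  # a single flush per sample at a time
                written = num_sequences_per_sample.get(sample, 0)
                while reads:
                    # Assumed policy: move on to the next numbered file
                    # once the current one holds the maximal number of reads.
                    file_index, used = divmod(written, max_sequences_per_file)
                    chunk = reads[:max_sequences_per_file - used]
                    write(sample, file_index, chunk)
                    written += len(chunk)
                    reads = reads[len(chunk):]
                num_sequences_per_sample[sample] = written


    if __name__ == "__main__":
        buffers = queue.Queue()
        buffers.put(("sample_a", ["read1", "read2", "read3"]))
        buffers.put(("sample_b", ["read4"]))
        locks = defaultdict(threading.Lock)
        counts = {}
        # The callback only prints; a real flusher would append to a file.
        flush_buffers(buffers, locks, counts, 2,
                      lambda sample, index, chunk: print(sample, index, chunk))

Keeping the per-sample counter update inside the lock means that two flushers handling buffers of the same sample cannot both decide that the same output file still has room.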