Sort different amplicons into a fully stratified samples x amplicons structure based on primer matches.

sortAmplicons(
  MA,
  filedir = "stratified_files",
  n = 1e+06,
  countOnly = FALSE,
  rmPrimer = TRUE,
  ...
)

# S4 method for MultiAmplicon
sortAmplicons(
  MA,
  filedir = "stratified_files",
  n = 1e+06,
  countOnly = FALSE,
  rmPrimer = TRUE,
  ...
)

Arguments

MA

MultiAmplicon-class object containing a set of paired-end files and a primer-pairs set.

filedir

path to a folder on your computer, either existing or to be newly created. If the folder already exists, it has to be empty.

n

parameter passed to the yield functions of package ShortRead. This controls the memory consumption during streaming. Lower values result in lower memory requirements but might result in longer processing time due to more repeated I/O operations when reading the sequence files.

countOnly

logical; if set to TRUE, only a matrix of read counts is returned

rmPrimer

logical, indicating whether primer sequences should be removed during sorting

...

additional parameters to be passed to Biostrings::isMatchingStartingAt. Be careful when using multiple starting positions or allowing errors (mismatches). This could lead to read pairs being assigned to multiple amplicons.

Value

MultiAmplicon: By default (countOnly=FALSE) a MultiAmplicon-class object is returned with the stratifiedFiles slot populated. Stratified file names are constructed using a unique string created by tempfile and stored in the given filedir (by default "stratified_files"). If countOnly is TRUE, only a numeric matrix of read counts is returned.

Details

This function uses isMatchingStartingAt to match primer sequences at the first position of forward and reverse sequences. These primer sequences can be removed (see rmPrimer). The remaining sequences of interest are written to files to allow processing via standard metabarcoding pipelines.
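
The anchored matching described above can be illustrated with a minimal sketch using Biostrings directly. This is not the internal code of sortAmplicons: the toy reads, the object names and the use of fixed = FALSE (so that IUPAC ambiguity codes in the primer act as wildcards) are assumptions made for illustration only.

```r
library(Biostrings)

# Hypothetical toy data: one primer containing IUPAC ambiguity codes (Y, R)
primer <- DNAString("YGGTGRTGCATGGCCGYT")
reads <- DNAStringSet(c(
  hit  = "CGGTGATGCATGGCCGTTACGTACGT",  # primer sits at position 1
  miss = "ACGTACGTCGGTGATGCATGGCCGTT"   # primer present, but not at position 1
))

# Test whether the primer matches each read starting exactly at position 1.
# fixed = FALSE lets ambiguity codes match the bases they stand for.
matches <- as.vector(
  isMatchingStartingAt(primer, reads, starting.at = 1, fixed = FALSE)
)

# Keep only matching reads and cut off the primer, analogous in spirit
# to what rmPrimer = TRUE is documented to do.
trimmed <- subseq(reads[matches], start = length(primer) + 1)
```

Only the read that carries the primer at its first position is kept. Passing a larger max.mismatch or several starting.at values relaxes this test, which is why such settings can assign one read pair to more than one amplicon.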

Author

Emanuel Heitlinger

Examples

primerF <- c("AGAGTTTGATCCTGGCTCAG", "ACTCCTACGGGAGGCAGC",
             "GAATTGACGGAAGGGCACC", "YGGTGRTGCATGGCCGYT")
primerR <- c("CTGCWGCCNCCCGTAGG", "GACTACHVGGGTATCTAATCC",
             "AAGGGCATCACAGACCTGTTAT", "TCCTTCTGCAGGTTCACCTAC")
PPS <- PrimerPairsSet(primerF, primerR)

fastq.dir <- system.file("extdata", "fastq", package = "MultiAmplicon")
fastq.files <- list.files(fastq.dir, full.names=TRUE)
Ffastq.file <- fastq.files[grepl("F_filt", fastq.files)]
Rfastq.file <- fastq.files[grepl("R_filt", fastq.files)]
PRF <- PairedReadFileSet(Ffastq.file, Rfastq.file)

MA <- MultiAmplicon(PPS, PRF)
#> Error in sample_names(object@sampleData): could not find function "sample_names"
## sort into amplicons
MA1 <- sortAmplicons(MA)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'MA' in selecting a method for function 'sortAmplicons': object 'MA' not found