The central data structure of the MultiAmplicon package

The MultiAmplicon class is a container that stores at least primer pairs, read files and progressively processed data in an 'amplicon x samples' format. The slots in this object are incrementally filled with by running wrappers functions (mostly around functions from the dada2 package). The object is treated (subsetted etc.) like a (pseudo) matrix, colums are samples, rows are different amplicons.

Subsetting for MultiAmplicon objects should conveniently subset all (potentially) filled slots

MultiAmplicon(
  PrimerPairsSet = PrimerPairsSet(),
  PairedReadFileSet = PairedReadFileSet(),
  .Data = matrix(ncol = 0, nrow = 0),
  stratifiedFiles = list(),
  sampleData = new("sample_data", data.frame(row.names = names(PairedReadFileSet),
    readsF = PairedReadFileSet@readsF, readsR = PairedReadFileSet@readsF)),
  derep = list(),
  dada = list(),
  mergers = list(),
  sequenceTable = list(),
  sequenceTableNoChime = list(),
  taxonTable = list()
)

getPrimerPairsSet(MA)

getPairedReadFileSet(MA)

getRawCounts(MA)

getStratifiedFilesF(MA, simplify = TRUE)

getStratifiedFilesR(MA, simplify = TRUE)

getDerepF(MA, simplify = TRUE)

getDerepR(MA, simplify = TRUE)

getDadaF(MA, simplify = TRUE)

getDadaR(MA, simplify = TRUE)

getMergers(MA, simplify = TRUE)

getSequenceTable(MA, simplify = TRUE)

getSequenceTableNoChime(MA, simplify = TRUE)

getTaxonTable(MA, simplify = TRUE)

getSequencesFromTable(MA)

# S4 method for MultiAmplicon
getSequencesFromTable(MA)

# S4 method for MultiAmplicon
show(object)

# S4 method for MultiAmplicon,index,index,ANY
[(x, i, j, ..., drop = FALSE)

# S4 method for MultiAmplicon,index,missing,ANY
[(x, i, j, ..., drop = FALSE)

# S4 method for MultiAmplicon,missing,index,ANY
[(x, i, j, ..., drop = FALSE)

Arguments

PrimerPairsSet	a set of primer pairs specifiying your amplicons see `PrimerPairsSet-class`
PairedReadFileSet	a set of paired end sequencing data files `PairedReadFileSet-class`
.Data	Users should not supply this parameter, the slot is created by `sortAmplicons`.
stratifiedFiles	Users should not supply this parameter, the slot is created by `sortAmplicons`.
sampleData	Users should not supply this parameter. It's filled with a sample_data object from `phyloseq`. The slot is created from sample names (same as `colnames(MA)`) and more data can be added by `addSampleData`.
derep	Users should not supply this parameter, the slot is created by `derepMulti`
dada	Users should not supply this parameter, the slot is created by `dadaMulti`
mergers	Users should not supply this parameter, the slot is created by `mergeMulti`
sequenceTable	Users should not supply this parameter, the slot is created by `makeSequenceTableMulti`
sequenceTableNoChime	Users should not supply this parameter, the slot is created by `removeChimeraMulti`
taxonTable	Users should not supply this parameter, the slot is created by `blastTaxAnnot`. It's filled with a list of taxonomyTable objects from `phyloseq`.
MA	MultiAmplicon-class object
simplify	Should a list of objects be simplified to only one object if it has length one?
object	A `MultiAmplicon-class` object.
x	MultiAmplicon-class object
i	numeric, logical or names vector for subsetting rows (== amplicons)
j	numeric, logical or names vector for subsetting columns (== read files, corresponding usually to samples)
...	not used
drop	should not be used

Functions

MultiAmplicon: Constructor for MultiAmplicon-class

Slots

PrimerPairsSet: The primer pairs used in your experiment to specify amplicons stored in a PrimerPairsSet-class object.
PairedReadFileSet: The (quality filtered) fastq files (one file pair for each sample) that store your sequencing data.
.Data: A numeric matrix of sequencing read counts per amplicon and sample. Created by the function sortAmplicons in the MultiAmplicon pipeline.
sampleData: A sample_data object from phyloseq. The slot is created from sample names (names of the PrimerPairsSet, which have tto be the same as colnames(MA)). More data can be added by addSampleData.
stratifiedFiles: temporary files as a result of stratifying into amplicons and samples using the MultiAmplicon pipeline function sortAmplicons. Forward (sometimes called R1) and reverse (sometimes called R2) files are stored as a (amplicons x samples) matrix of PairedReadFileSet-class objects.
derep: A list of PairedDerep-class objects containing pairs of derep-class objects created by dada2’s derepFastq function or withing the MultiAmplicon pipeline by derepMulti.
dada: A list of PairedDada-class object containing pairs of dada-class objects created by dada2’s dada function. Within the MultiAmplicon pipeline this slot is filled by dadaMulti.
mergers: A list of objects containing merged pairs of forward and reverse reads as created by by dada2’s mergePairs function. Within the MultiAmplicon pipeline this slot is filled by mergeMulti.
sequenceTable: A list of matrix objects created by dada2’s makeSequenceTable. Samples (in rows) and amplified sequence variants (ASVs) in columns. Within the MultiAmplicon pipeline this slot is filled by makeSequenceTableMulti.
sequenceTableNoChime: A list of matrix objects created by dada2’s removeBimeraDenovo. Samples (in rows) and ASVs screened for PCR chimeras in columns. Within the MultiAmplicon pipeline this slot is filled by removeChimeraMulti.
taxonTable: A list of matrix objects created by a function for taxonomical annotation (for example blastTaxAnnot. ASVs are in rows and taxnomical ranks are in columns. MultiAmplicon(PrimerPairsSet, PairedReadFileSet)

Author

Emanuel Heitlinger

Examples


primerF <- c("AGAGTTTGATCCTGGCTCAG", "ACTCCTACGGGAGGCAGC",
            "GAATTGACGGAAGGGCACC", "YGGTGRTGCATGGCCGYT")
primerR <- c("CTGCWGCCNCCCGTAGG", "GACTACHVGGGTATCTAATCC",
             "AAGGGCATCACAGACCTGTTAT", "TCCTTCTGCAGGTTCACCTAC")

PPS <- PrimerPairsSet(primerF, primerR)

fastq.dir <- system.file("extdata", "fastq", package = "MultiAmplicon")
fastq.files <- list.files(fastq.dir, full.names=TRUE)
Ffastq.file <- fastq.files[grepl("F_filt", fastq.files)]
Rfastq.file <- fastq.files[grepl("R_filt", fastq.files)]

PRF <- PairedReadFileSet(Ffastq.file, Rfastq.file)

MA <- MultiAmplicon(PPS, PRF)

## sort into amplicons
MA1 <- sortAmplicons(MA, filedir=tempfile(pattern = "dir"))
#> creating directory /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 0 sequencing reads for S00_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S00_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S00_R_filt.fastq.gz 
#>  into 0  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 0 sequencing reads for S01_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S01_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S01_R_filt.fastq.gz 
#>  into 0  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 9348 sequencing reads for S02_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S02_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S02_R_filt.fastq.gz 
#>  into 4  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 5 sequencing reads for S03_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S03_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S03_R_filt.fastq.gz 
#>  into 1  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 14 sequencing reads for S04_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S04_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S04_R_filt.fastq.gz 
#>  into 3  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 51777 sequencing reads for S05_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S05_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S05_R_filt.fastq.gz 
#>  into 3  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 51777 sequencing reads for S05D_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S05D_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S05D_R_filt.fastq.gz 
#>  into 3  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac
#> finished sorting 17358 sequencing reads for S06_F_filt.fastq.gz in 
#>  /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S06_F_filt.fastq.gz and 
#>   /localstorage/ele/git_projects/MultiAmplicon/inst/extdata/fastq/S06_R_filt.fastq.gz 
#>  into 4  amplicons and written into /tmp/RtmpTPTbQy/dirfd1a1657d6ac

## Only after sorting the MultiAmplicon object is really poplated
## with sensible data, now matrix-like access to different 
## amplicons (primer pairs) and different sequencing read files
## (usually samples) is implemented

## the number of amplicons (primer pairs)
nrow(MA)
#> [1] 0

## the number of samples (sequencing read file pairs)
ncol(MA)
#> [1] 0

## dereplication is currently not supported
## MA2 <- derepMulti(MA1)

### use dada directly after sorting
MA3 <- dadaMulti(MA1, selfConsist = TRUE)
#> 
#> 
#> amplicon AGAGTTTGATCCTGGCTCAG.CTGCWGCCNCCCGTAGG: dada estimation of sequence variants from 6 of 8 possible sample files
#> calling dada with selfConsist=TRUE parameters
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ......
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#>    selfConsist step 6
#> Convergence after  6  rounds.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ......
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#> Convergence after  5  rounds.
#> 
#> 
#> amplicon ACTCCTACGGGAGGCAGC.GACTACHVGGGTATCTAATCC: dada estimation of sequence variants from 5 of 8 possible sample files
#> calling dada with selfConsist=TRUE parameters
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 .....
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#>    selfConsist step 6
#> Convergence after  6  rounds.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 .....
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#> Convergence after  5  rounds.
#> 
#> 
#> amplicon GAATTGACGGAAGGGCACC.AAGGGCATCACAGACCTGTTAT: dada estimation of sequence variants from 2 of 8 possible sample files
#> calling dada with selfConsist=TRUE parameters
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#> Convergence after  5  rounds.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 ..
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#> Convergence after  5  rounds.
#> 
#> 
#> amplicon YGGTGRTGCATGGCCGYT.TCCTTCTGCAGGTTCACCTAC: dada estimation of sequence variants from 5 of 8 possible sample files
#> calling dada with selfConsist=TRUE parameters
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 .....
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#>    selfConsist step 5
#> Convergence after  5  rounds.
#> Initializing error rates to maximum possible estimate.
#> selfConsist step 1 .....
#>    selfConsist step 2
#>    selfConsist step 3
#>    selfConsist step 4
#> Convergence after  4  rounds.

MA4 <- mergeMulti(MA3, justConcatenate=TRUE)
#> 
#> merging sequences from 6 samples for amplicon AGAGTTTGATCCTGGCTCAG.CTGCWGCCNCCCGTAGG
#> calling mergePairs with justConcatenate=TRUE parameters
#> 
#> merging sequences from 5 samples for amplicon ACTCCTACGGGAGGCAGC.GACTACHVGGGTATCTAATCC
#> calling mergePairs with justConcatenate=TRUE parameters
#> 
#> merging sequences from 2 samples for amplicon GAATTGACGGAAGGGCACC.AAGGGCATCACAGACCTGTTAT
#> calling mergePairs with justConcatenate=TRUE parameters
#> 
#> merging sequences from 5 samples for amplicon YGGTGRTGCATGGCCGYT.TCCTTCTGCAGGTTCACCTAC
#> calling mergePairs with justConcatenate=TRUE parameters

MA5 <- makeSequenceTableMulti(MA4)
#> calling makeSequenceTable with default parameters
#> calling makeSequenceTable with default parameters
#> calling makeSequenceTable with default parameters
#> calling makeSequenceTable with default parameters

MA6 <- removeChimeraMulti(MA5, mc.cores=1)
#> calling removeBimeraDenovo with default parameters
#> calling removeBimeraDenovo with default parameters
#> calling removeBimeraDenovo with default parameters
#> calling removeBimeraDenovo with default parameters

Arguments

Functions

Slots

See also

Author

Examples