$$ \newcommand{\problemdivider}{\begin{center}\large \bf\ldots\ldots\ldots\ldots\ldots\ldots\end{center}} \newcommand{\subproblemdivider}{\begin{center}\large \bf\ldots\ldots\end{center}} \newcommand{\pdiv}{\problemdivider} \newcommand{\spdiv}{\subproblemdivider} \newcommand{\ba}{\begin{align*}} \newcommand{\ea}{\end{align*}} \newcommand{\rt}{\right} \newcommand{\lt}{\left} \newcommand{\bp}{\begin{problem}} \newcommand{\ep}{\end{problem}} \newcommand{\bsp}{\begin{subproblem}} \newcommand{\esp}{\end{subproblem}} \newcommand{\bssp}{\begin{subsubproblem}} \newcommand{\essp}{\end{subsubproblem}} \newcommand{\atag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{section}.\alph{subsection}.\alph{equation}}} \newcommand{\btag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{section}.\alph{equation}}} \newcommand{\ctag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{equation}}} \newcommand{\dtag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\Alph{chapter}.\arabic{section}.\arabic{equation}}} \newcommand{\unts}[1]{\ \text{#1}} \newcommand{\textop}[1]{\operatorname{#1}} \newcommand{\textopl}[1]{\operatornamewithlimits{#1}} \newcommand{\prt}{\partial} \newcommand{\pderi}[3]{\frac{\prt^{#3}#1}{\prt #2^{#3}}} \newcommand{\deri}[3]{\frac{d^{#3}#1}{d #2^{#3}}} \newcommand{\del}{\vec\nabla} \newcommand{\exval}[1]{\langle #1\rangle} \newcommand{\bra}[1]{\langle #1|} \newcommand{\ket}[1]{|#1\rangle} \newcommand{\ham}{\mathcal{H}} \newcommand{\arr}{\mathfrak{r}} \newcommand{\conv}{\mathop{\scalebox{2}{\raisebox{-0.2ex}{$\ast$}}}} \newcommand{\bsm}{\lt(\begin{smallmatrix}} \newcommand{\esm}{\end{smallmatrix}\rt)} \newcommand{\bpm}{\begin{pmatrix}} \newcommand{\epm}{\end{pmatrix}} \newcommand{\bdet}{\lt|\begin{smallmatrix}} \newcommand{\edet}{\end{smallmatrix}\rt|} \newcommand{\bs}[1]{\boldsymbol{#1}} \newcommand{\uvec}[1]{\bs{\hat{#1}}} \newcommand{\qed}{\hfill$\Box$} $$
Tags:
  • genomics
  • Zero initialize bam1_t memory!

    htslib expects bam1_t to be initially zeroed, e.g.

    // on the stack
    bam1_t badbamrecord; // any attempts to use this will likely crash!
    bam1_t goodbamrecord = {0}; // this works.
    
    // on the heap
    bam1_t* badbamrecord_ptr = malloc(sizeof(bam1_t)); // will likely fail
    bam1_t* goodbamrecord_ptr = calloc(1, sizeof(bam1_t)); // good.
    

    Iterator weirdness

    When iterating over a file sans index, i.e. by invoking sam_itr_queryi with idx == NULL and tid == HTS_IDX_REST, you need to access the header via sam_hdr_read (even if nothing is being done with it) in order to advance the file pointer up to the first read:

    htsFile* bam = hts_open(<path/to/bam>, "r");
    hts_itr_t* bamiter = sam_itr_queryi(NULL, HTS_IDX_REST, 0, 0);
    
    bam1_t bamrecord = {0};
    
    //sam_hdr_read(bam); // uncomment for iterator to work
    
    while(sam_itr_next(bam, bamiter, &bamrecord) >= 0) {
       // process read
    }
    

    If the header is not read, then the first call to sam_itr_next will try unpacking a read from the start of the file, which will fail.

    Output weirdness

    To output an uncompressed BAM, you cannot use

    htsFile* bam_out = hts_open("-", "wbu");
    

    as you might expect; this will segfault. You must instead use

    htsFile* bam_out = hts_open("-", "wb0");
    

    This is because a BAM must still be in block GZIP format, even if no compression is applied.

    Note that specifying output "wu" will output a plaintext SAM file.