Skip to main content


Accelerating Bioinformatics with NVBIO

NVBIO is an open-source C++ template library of high performance parallel algorithms and containers designed by NVIDIA to accelerate sequence analysis and bioinformatics applications. NVBIO has a threefold focus:
  1. Performance, providing a suite of state-of-the-art parallel algorithms that offer a significant leap in performance;
  2. Reusability, providing a suite of highly expressive and flexible template algorithms that can be easily configured and adjusted to the many different usage scenarios typical in bioinformatics;
  3. Portability, providing a completely cross-platform suite of tools, that can be easily switched from running on NVIDIA GPUs to multi-core CPUs by changing a single template parameter.

Exponential Parallelism

We built NVBIO because we believe only the exponentially increasing parallelism of many-core GPU architectures can provide the immense computational capability required by the exponentially increasing sequencing throughput.
There is a common misconception that GPUs only excel at highly regular, floating point intensive applications, but today’s GPUs are fully programmable parallel processors, offering superior memory bandwidth and latency hiding characteristics, and R&D efforts at NVIDIA and elsewhere have proved that they can be a perfect match even for branchy, integer-heavy bioinformatics applications. The caveat is that legacy applications need to be rethought for fine-grained parallelism.
Many CPU algorithms are designed to run on few cores and scale to a tiny number of threads. When the number of threads is measured in the thousands, rather than dozens—a fact that all applications inevitably must consider—applications must tackle fundamental problems related to load balancing, synchronization, and execution and memory divergence.
NVBIO does just that, providing both low-level primitives that can be used from either CPU/host or GPU/device threads, as well as novel, highly parallel high-level primitives designed to scale from the ground up.

Broad Feature Spectrum

NVBIO offers a fairly broad suite of features; here is a short summary of some of the most relevant ones.
  • Wide support for many string and string-set representations, including packed encodings for arbitrary alphabets (e.g. DNA, DNA+N, IUPAC, Protein, ASCII, etc).
  • Parallel FM-index and Bidirectional FM-index construction and lookup for arbitrary string and string-sets.
  • Novel, state-of-the-art parallel suffix sorting and BWT construction algorithms for very large strings and string sets, containing hundreds of billions of symbols. (Also see a paper on our parallel BWT algorithm.)
  • Wavelet Trees and other succint data structures of large alphabet strings.
  • Efficient parallel k-Maximal Extension Match filters
  • Parallel Bloom Filter construction and lookup.
  • State-of-the-art Q-Gram Index (also known as kmer index) construction and lookup.
  • A state-of-the-art Dynamic Programming Alignment module, containing Edit-Distance, Smith-Waterman and Gotoh aligners with support for global, semi-global and local alignment in both banded and full matrices, on arbitrary string encodings.
  • A high performance I/O module supporting common sequencing formats (e.g. FASTA, FASTQ, BAM) with transparent, parallel compression and decompression.

Expressive Performance Portability

Overall, the three guiding principles mentioned in the introduction could be probably summarized in a single phrase: expressive performance portability. Unlike other libraries which focus on solving a single specific instance of a problem on predetermined data types, NVBIO embraces the STL paradigm [as does Thrust] and offers fully configurable algorithms and containers, which are adapted and optimized at compile time for the required user-defined types and the backend architecture.
For example, consider the problem of finding the optimal alignment of two strings. Many existing libraries target different kinds of dynamic programming algorithms; for example, SSW for Smith-Waterman alignment of ASCII strings on the CPU, or CUDASW++ for the GPU. But most of these libraries tackle the solution of a single, specific problem, on a single platform, and are not very flexible.
However, in bioinformatics more than in other fields, developers use many variants of similar algorithms in specific situations. For example, one application might require use of the Gotoh Smith-Waterman scoring with affine gap penalties to compute the global aligment score of two ASCII encoded protein strings, while a short read mapper might want to use the fast Myers bit-vector algorithm for computing the edit-distance between two DNA strings packed with two bits per characters, while yet another application might need to perform banded, local alignment of four-bit packed DNA+N strings with the plain Smith-Waterman scoring.
Predetermining and hardwiring all possibilities would be essentially impossible: as the following table shows it would require 3 \times 3 \times 2 \times 3 \times 3 \times 2 = 108 almost equivalent code paths!
AlignerAlignment TypeMatrix TypePattern TypeText TypeBackend
NVBIO takes a different path, enabling the user to quickly select the most appropriate algorithms and containers through simple template specialization, as the following example code shows.

using namespace nvbio;

void alignment_test()
    const char* text = "AAAAGGGTGCTCAA";
    const char* pattern = "GGGTGCTCAA";

    // our DNA alphabet size, in bits
    const uint32 ALPHABET_SIZE = AlphabetTraits<DNA>::SYMBOL_SIZE;

    // instantiate two DNA packed host vectors
    PackedVector<host_tag, ALPHABET_SIZE> dna_text( length(text) );
    PackedVector<host_tag, ALPHABET_SIZE> dna_pattern( length(pattern) );

    // and fill it in with the contents of our ASCII strings, converted
    // to a 2-bit DNA alphabet (i.e. A = 0, C = 1, G = 2, T = 3)
    assign(length(text), from_string<DNA>(text), dna_text.begin());
    assign(length(pattern), from_string<DNA>(pattern), 

    // perform a banded edit-distance, semi-global alignment
    const uint32 BAND_WIDTH = 5;
    const int32 ed = aln::banded_alignment_score<BAND_WIDTH>(
                                 // build an edit-distance aligner
        dna_pattern,             // pattern string
        dna_text,                // text string
        -255 );                  // minimum accepted score

   // build a Gotoh scoring scheme
   const aln::SimpleGotohScheme scoring(
      2,   // match bonus
     -1,   // mismatch penalty
     -1,   // gap open penalty
     -1 )  // gap extension penalty

   // perform a full-matrix, local alignment
   aln::Best2Sink best2;   // keep the best 2 scores
                                        // build a local Gotoh aligner
      dna_pattern,                      // pattern string
      dna_text,                         // text string
      -255,                             // minimum accepted score
      best2 );                          // alignment sink
Notice that while the alignment functions in this case have been invoked by a host thread on iterators over host containers, they can just as easily be invoked by a GPU thread inside a CUDA kernel!
NVBIO also offers parallel batch alignment functions, that can be used to easily align pairs of corresponding strings in two string sets, as the following example shows.
void batch_alignment_test()
    // build two concatenated string sets, one for the patterns, 
    // containing a concatenated sequence of strings of 100 
    // characters each, and one for the texts, 
    // containing 200 characters each
    const uint32 n_strings   = 10000;
    const uint32 pattern_len = 100;
    const uint32 text_len    = 200;

    // setup the strings on the host
    nvbio::vector<host_tag, uint8> h_pattern(n_strings * pattern_len);
    nvbio::vector<host_tag, uint8> h_text(n_strings * text_len);

    // fill their content with random characters
    for (uint32 i = 0; i < h_pattern.size(); i++) 
      h_pattern[i] = uint8( drand48() * 255.0f );
    for (uint32 i = 0; i < text_storage.size(); i++)    
      h_text[i] = uint8( drand48() * 255.0f );

    // copy the strings storage to the device
    nvbio::vector<device_tag, uint8> d_pattern( h_pattern );
    nvbio::vector<device_tag, uint8> d_text( h_text );

    // allocate two vectors representing the string offsets
    nvbio::vector<device_tag, uint32> d_pattern_offsets( n_strings+1 );
    nvbio::vector<device_tag, uint32> d_text_offsets( n_strings+1 );

    // prepare the string offsets using Thrust's sequence() 
    // function, setting up the offset of pattern i as i * pattern_len, 
    // and the offset of text i as i * text_len
    thrust::sequence( d_pattern_offsets.begin(), 
                      d_pattern_offsets.end(), 0u, pattern_len );
    thrust::sequence( d_text_offsets.begin(), 
                      d_text_offsets.end(), 0u, text_len );

    // prepare a vector of alignment sinks
    nvbio::vector<device_tag, aln::BestSink > 
        sinks( n_strings );

    // and execute the batch alignment, on a GPU device
            <aln::SEMI_GLOBAL, MyersTag<ALPHABET_SIZE> >(),
        make_concatenated_string_set( n_strings, 
                                      d_pattern_offsets.begin() ),
        make_concatenated_string_set( n_strings, 
                                      d_text_offsets.begin() ),
        aln::DeviceThreadScheduler() );

High Composability

NVBIO’s containers and algorithms are designed to be highly composable. For example, consider how many aligners use ad-hoc FM-index implementations to perform seed mapping on DNA strands. Now, what if after having spent several months working on your custom FM-index you’d want to write a protein search tool?
Proteins require a much larger alphabet, and the simple rank dictionaries based on occurrence count tables require exponential storage in the number of bits per character. NVBIO’s Wavelet Trees can be used as an efficient rank dictionary for large alphabet strings, and can be easily used together with NVBIO’s FM-index class, as shown in the following example (See the full example on Github).
void protein_search()
    // this string will be our protein database we want to search
    // just a short string with all protein characters in sequence
    const char* proteins = "ACDEFGHIKLMNOPQRSTVWYBZX";
    const uint32 text_len = 24;
    const uint32 alphabet_bits = 
        AlphabetTraits< PROTEIN >::SYMBOL_SIZE;
    const uint32 alphabet_size = 1u << alphabet_bits;

    // allocate a host packed vector
    PackedVector<host_tag, alphabet_bits, true> h_text( text_len );

    // pack the string
    from_string<PROTEIN>( proteins, 
                                proteins + text_len, 
                                h_text.begin() );

    // copy it to the device
    PackedVector<device_tag, alphabet_bits, true> d_text( h_text );

    // allocate a vector for the BWT
    PackedVector<device_tag, alphabet_bits, true> d_bwt( text_len + 1 );

    // build the BWT
    const uint32 primary = cuda::bwt( text_len, 
                                      d_bwt.begin(), NULL );

    // define our wavelet tree storage type and its plain view
    typedef WaveletTreeStorage< device_tag > wavelet_tree_type;
    typedef WaveletTreeStorage< device_tag >::const_plain_view_type

    // build a wavelet tree
    wavelet_tree_type wavelet_bwt;

    // setup the wavelet tree
    setup( text_len, d_bwt.begin(), wavelet_bwt );

    typedef nvbio::vector<device_tag, uint32>::const_iterator 
    typedef fm_index<wavelet_tree_view_type, null_type, l2_iterator>

    // take the const view of the wavelet tree
    const wavelet_tree_view_type wavelet_bwt_view = 
        plain_view( (const wavelet_tree_type&)wavelet_bwt );

    // build the L2 table
    nvbio::vector<device_tag,uint32> L2(alphabet_size+1);

    // note we perform this on the host, even though the wavelet 
    // tree is on the device - gonna be slow, but it's not much 
    // stuff, and this is just an example anyway... and we just 
    // want to show that NVBIO is designed to make everything work!
    L2[0] = 0;
    for (uint32 i = 1; i <= alphabet_size; ++i)
        L2[i] = L2[i-1] + rank( wavelet_bwt_view, text_len, i-1u );

    // build the FM-index
    const fm_index_type fmi(
        null_type() );

    // do some string matching using our newly built FM-index - once 
    // again we are doing it on the host, though all data is on the 
    // device: the entire loop would be better moved to the device 
    // in a real app.
    fprintf(stderr, " string matching:\n");
    for (uint32 i = 0; i < text_len; ++i)
        // match the i-th suffix of the text
        const uint32 pattern_len = text_len - i;

        // compute the SA range containing the occurrences 
        // of the pattern we are after
        const uint2 range = match( fmi, 
                                   d_text.begin() + i, 
                                   pattern_len );

        // print the number of occurrences of our pattern, 
        // equal to the SA range size
        fprintf(stderr, "  rank(%s): %u\n", 
                proteins + i, 1u + range.y - range.x);

High Performance

The following graphs show the staggering performance of some of NVBIO’s parallel algorithms running on NVIDIA Tesla K40 Accelerators, for tasks such as FM-index lookup, DP alignment and BWT construction for very large read collections.
NVBIO DP alignment throughput, comparing to the SSW library running on 20 CPU threads.
NVBIO DP alignment throughput, comparing to the SSW library running on 20 CPU threads.
NVBIO FM-Index Search Speedup on an NVIDIA K40 compared to 20 CPU threads.
NVBIO FM-Index Search Speedup on an NVIDIA K40 compared to 20 CPU threads.
NVBIO BWT speedup, comparing an NVIDIA K40 GPU to dual-socket Xeon E5-2690 v2.
NVBIO BWT speedup, comparing an NVIDIA K40 GPU to dual-socket Xeon E5-2690 v2.


NVBIO is interoperable both with the high level Thrust C++ parallel algorithms library and lower level OpenMP and CUDA environments. Like Thrust, NVBIO offers full performance portability, providing both CUDA and multi-core CPU implementations for most of its algorithms.


NVBIO contains a suite of prebuilt high-performance applications.
  • nvBowtie: an industrial-strength replacement for the popular bowtie2 aligner, providing almost the same exact specificity and sensitivity characteristics, at a great speed advantage.
  • nvBWT: a single-string BWT construction application, which can construct the BWT of a human genome in just under 2 minutes on a single Tesla K40—compared to the 15+ minutes required by state-of-the-art CPU software.
  • nvSetBWT: a state-of-the-art read indexing application, which can build the BWT of 1.2B x 100bp reads in under 1.7 hours.

Open Source

NVBIO is open-source software, distributed under the BSD Licence. The primary developers are Jacopo Pantaleoni and Nuno Subtil.

Try NVBIO Today

You can get NVBIO on Github at NVBIO has been designed for GPUs supporting at least CUDA Compute Capability 3.5. Due to the high memory requirements typical of bionformatics algorithms, NVIDIA Tesla K20, K20x or K40 and above GPUs are recommended.
NVBIO is under active development, and as we are writing this article, we are targeting even more applications and bioinformatics domains. If you have specific comments or requests, please comment below or check out the NVBIO Users Forum.

About Jacopo Pantaleoni

Jacopo Pantaleoni
Jacopo Pantaleoni is a Senior Architect at NVIDIA, where he currently leads the development of NVBIO, a high performance library for sequence analysis. As part of this effort he has developed the first and fastest massively parallel algorithm for incremental BWT and FM-index construction of large-scale genomic databases (, as well as nvBowtie, a full-featured re-engineering of the popular Bowtie2 aligner targeting multi-GPU systems. He has also been driving collaborations with the Broad, Life Technologies and other research centers and institutes in order to spread the adoption of GPU acceleration in bioinformatics. Prior to that, he has worked as a Senior Research Scientist at NVIDIA Research, where he has been studying new stochastic methods for physically based light transport simulations, developing novel rendering techniques for the film visual effects industry (contributing key technology - the PantaRay engine - for the Oscar winning special effects in Avatar), providing new algorithms for real time ray tracing and rasterization, and studying the feasibility of complex, programmable software pipelines on massively parallel processors. Prior to joining NVIDIA in 2008, he has founded Lightflow Technologies in 1999, developing one of the first publically available global illumination engines. He later worked as a rendering expert in the film industry, has co-authored a book on production rendering and joined mental images in 2005 where he worked as Lead Rendering Research Scientist to develop interactive photorealistic rendering solutions. Jacopo studied Mathematics at the University of Padova, Italy.

Popular posts from this blog

Hidden Wiki

Welcome to The Hidden WikiNew hidden wiki url 2015 http://zqktlwi4fecvo6ri.onion Add it to bookmarks and spread it!!!
Editor's picks Bored? Pick a random page from the article index and replace one of these slots with it.
The Matrix - Very nice to read. How to Exit the Matrix - Learn how to Protect yourself and your rights, online and off. Verifying PGP signatures - A short and simple how-to guide. In Praise Of Hawala - Anonymous informal value transfer system. Volunteer Here are five different things that you can help us out with.
Plunder other hidden service lists for links and place them here! File the SnapBBSIndex links wherever they go. Set external links to HTTPS where available, good certificate, and same content. Care to start recording onionland's history? Check out Onionland's Museum Perform Dead Services Duties. Introduction - Clearnet search engine for Tor Hidden Services (allows you to add new sites to its database). DuckDuckGo - A Hidden S…


Good News [May 08, 2015]: IDM developers got smarter, but the crackers are always a step ahead. Follow this article and send an email to if you are desperate. I can NOT post any crack here for legal reasons. Happy Downloading with IDM. ;) *********** first tip is to use latest crack for idm from idm universal web crack and make sure u are using all latest vers I am sure many of us are too much dependent on Internet Download Manager a.k.a. IDM. The main reason didn’t permanently switch to linux was IDM. I mainly use it for batch downloading and download streaming videos. Till yesterday, IDM was working fine with me (of course with fake serial numbers, keygen, crack, patch etc. which could be found with little effort). But few days ago, with the latest update version 6.18 build 7 (released on Nov 09, 2013) Internet Download Manager was literally had a breakthrough and crushed all the serial numbers, …

DoubleAgent Attack Turns Your Antivirus Into Malware And Hijacks Your PC

Short Bytes: Cybellum security researchers have uncovered a new attack mechanism that can be used to take control of your antivirus and turn it into a malware. Called DoubleAgent, this attack exploits an old and undocumented vulnerability in Windows operating system. This Zero Day code injection technique affects all major antivirus vendors and has the power to hijack permissions. The security researchers from Cybellum have found a new technique that can be used by the cybercriminals to hijack your computer by injecting malicious code. This new Zero-Day attack can be used to take full control over all the major antivirus software. Instead of hiding from the antivirus, this attack takes control of the antivirus itself. Called DoubleAgent, this attack makes use of a 15-year-old legitimate feature of Windows (read vulnerability)–that’s why it can’t be patched. It affects all versions of Microsoft Windows. Cybellum blog mentions that this flaw is still unpatched by most antivirus v…