NAME
    Bio::WebService::LANL::SequenceLocator - Locate sequences within HIV
    using LANL's web tool

SYNOPSIS
        use Bio::WebService::LANL::SequenceLocator;
    
        my $locator = Bio::WebService::LANL::SequenceLocator->new(
            agent_string => 'Your Organization - you@example.com',
        );
        my @sequences = $locator->find([
            "agcaatcagatggtcagccaaaattgccctatagtgcagaacatccaggggcaagtggtacatcaggccatatcacctagaactttaaatgca",
        ]);

    See "EXAMPLE RESULTS" below.

DESCRIPTION
    This library provides simple programmatic access to LANL's HIV sequence
    locator <http://www.hiv.lanl.gov/content/sequence/LOCATE/locate.html>
    web tool and is also used to power a simple, JSON-based web API
    <http://indra.mullins.microbiol.washington.edu/locate-sequence/> for the
    same tool (via Bio::WebService::LANL::SequenceLocator::Server).

    Nearly all of the information output by LANL's sequence locator is
    parsed and provided by this library, though the results do vary slightly
    depending on the base type of the query sequence. Multiple query
    sequences can be located at the same time and results will be returned
    for all.

    Results are extracted from both tab-delimited files provided by LANL as
    well as the HTML itself.

EXAMPLE RESULTS
        # Using @sequences from the SYNOPSIS above
        use JSON;
        print encode_json(\@sequences);
    
        __END__
        [
           {
              "query" : "sequence_1",
              "query_sequence" : "AGCAATCAGATGGTCAGCCAAAATTGCCCTATAGTGCAGAACATCCAGGGGCAAGTGGTACATCAGGCCATATCACCTAGAACTTTAAATGCA",
              "base_type" : "nucleotide",
              "reverse_complement" : "0",
              "alignment" : "\n Query AGCAATCAGA TGGTCAGCCA AAATTGCCCT ATAGTGCAGA ACATCCAGGG  50\n       ::::::::    ::::::::: ::::: :::: :::::::::: :::::::::: \n  HXB2 AGCAATCA-- -GGTCAGCCA AAATTACCCT ATAGTGCAGA ACATCCAGGG  1208\n\n Query GCAAGTGGTA CATCAGGCCA TATCACCTAG AACTTTAAAT GCA  93\n       :::: ::::: :::::::::: :::::::::: :::::::::: ::: \n  HXB2 GCAAATGGTA CATCAGGCCA TATCACCTAG AACTTTAAAT GCA  1251\n\n  ",
              "hxb2_sequence" : "AGCAATCA---GGTCAGCCAAAATTACCCTATAGTGCAGAACATCCAGGGGCAAATGGTACATCAGGCCATATCACCTAGAACTTTAAATGCA",
              "similarity_to_hxb2" : "94.6",
              "start" : "373",
              "end" : "462",
              "genome_start" : "1162",
              "genome_end" : "1251",
              "polyprotein" : "Gag",
              "region_names" : [
                 "Gag",
                 "p17",
                 "p24"
              ],
              "regions" : [
                 {
                    "cds" : "Gag",
                    "aa_from_protein_start" : [ "125", "154" ],
                    "na_from_cds_start" : [ "373", "462" ],
                    "na_from_hxb2_start" : [ "1162", "1251" ],
                    "na_from_query_start" : [ "1", "93" ],
                    "protein_translation" : "SNQMVSQNCPIVQNIQGQVVHQAISPRTLNA"
                 },
                 {
                    "cds" : "p17",
                    "aa_from_protein_start" : [ "125", "132" ],
                    "na_from_cds_start" : [ "373", "396" ],
                    "na_from_hxb2_start" : [ "1162", "1185" ],
                    "na_from_query_start" : [ "1", "27" ],
                    "protein_translation" : "SNQMVSQNC"
                 },
                 {
                    "cds" : "p24",
                    "aa_from_protein_start" : [ "1", "22" ],
                    "na_from_cds_start" : [ "1", "66" ],
                    "na_from_hxb2_start" : [ "1186", "1251" ],
                    "na_from_query_start" : [ "28", "93" ],
                    "protein_translation" : "PIVQNIQGQVVHQAISPRTLNA"
                 }
              ]
           }
        ]

METHODS
  new
    Returns a new instance of this class. An optional parameter
    "agent_string" should be provided to identify yourself to LANL out of
    politeness. See the "SYNOPSIS" for an example.

  find
    Takes an array ref of sequence strings. Sequences may be in amino acids
    or nucleotides and mixed freely. Sequences should not be in FASTA
    format.

    If sequence bases are not clearly nucleotides or clearly amino acids,
    LANL seems to default to nucleotides. This can be an issue for some
    sequences since the full alphabet for nucleotides overlaps with the
    alphabet for amino acids. To overcome this problem, you may specify
    "base => 'nucleotide'" or "base => 'amino acid'" after the array ref of
    sequences. This forces every sequence to be interpreted as nucleotides
    or amino acids, so you cannot mix base types in your sequences if you
    use this option. "n", "nuc", and "nucleotides" are accepted aliases for
    "nucleotide". "a", "aa", "amino", and "amino acids" are accepted aliases
    for "amino acid".

    Returns a list of hashrefs when called in list context, otherwise
    returns an arrayref of hashrefs.

    See "EXAMPLE RESULTS" for the structure of the data returned.

AUTHOR
    Thomas Sibley <trsibley@uw.edu>

COPYRIGHT
    Copyright 2014 by the Mullins Lab, Department of Microbiology,
    University of Washington.

LICENSE
    Licensed under the same terms as Perl 5 itself.