Interpol: An R package for preprocessing of protein sequences

Heider, Dominik; Hoffmann, Daniel

doi:10.1186/1756-0381-4-16

Heider, Dominik; Hoffmann, Daniel: Interpol: An R package for preprocessing of protein sequences. In: BioData Mining, Vol. 4 (2011), p. 16

2011, Journal article, OA Gold

Medicine; Biology; Research Centres » Centre for Medical Biotechnology (ZMB); Faculty of Biology » Bioinformatics and Computational Biophysics


Title in English:

Interpol : An R package for preprocessing of protein sequences

Authors:

Heider, Dominik (UDE); Hoffmann, Daniel (UDE)

Year of publication:

2011

Open access:

OA Gold

DOI

10.1186/1756-0381-4-16

DuEPublico 1 ID

47013

URN

urn:nbn:de:hbz:464-20180912-155910-3

Note:

OA funding 2011

Language of text:

English

Published in:

BioData Mining

Publisher:

BioMed Central

in:

Vol. 4 (2011), p. 16

ISSN

1756-0381

ZDB ID

2438773-3

Abstract in English:

Background: Most machine learning techniques currently applied in the literature require input data of fixed dimensionality. Real input data such as DNA and protein sequences frequently violate this requirement, because insertions and deletions make sequences differ in length. In addition, numerical encoding of amino acids often improves classification and regression performance compared with the commonly used sparse (one-hot) encoding.

Results: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used to preprocess amino acid sequences for classification or regression.

Conclusions: Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and in many cases it will improve their performance in classification and regression.
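The two preprocessing steps described in the abstract, numerical encoding with a descriptor scale followed by interpolation to a uniform length, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Interpol's R interface: the helper names `encode` and `linear_resample` are hypothetical, the Kyte-Doolittle hydropathy scale stands in for one of the AAindex descriptors, and only plain linear interpolation is shown rather than the package's full set of five algorithms.

```python
from typing import Dict, List, Sequence

# Example descriptor (assumption for illustration): Kyte-Doolittle hydropathy
# scale, one of the scales collected in AAindex. Interpol ships many more.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str, scale: Dict[str, float]) -> List[float]:
    """Hypothetical helper: replace each residue by its descriptor value."""
    return [scale[aa] for aa in seq.upper()]


def linear_resample(values: Sequence[float], target_len: int) -> List[float]:
    """Hypothetical helper: linearly interpolate a series to target_len points.

    Output point i sits at position i * (n - 1) / (target_len - 1) on the
    input axis, so the first and last values are preserved and the points
    in between are read off the piecewise-linear curve through the input.
    """
    n = len(values)
    if n < 2 or target_len < 2:
        raise ValueError("need at least 2 input values and target_len >= 2")
    out: List[float] = []
    for i in range(target_len):
        x = i * (n - 1) / (target_len - 1)  # position on the input axis
        j = min(int(x), n - 2)              # index of the left neighbour
        frac = x - j                        # fractional distance to the right neighbour
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


if __name__ == "__main__":
    # Two sequences of different length map to vectors of the same length.
    for s in ("ACDEFGHIK", "MKV"):
        vec = linear_resample(encode(s, KYTE_DOOLITTLE), 10)
        print(s, len(vec), [round(v, 2) for v in vec])
```

Because every output vector has the same length, sequences of different length can be stacked into a fixed-width feature matrix for a standard classifier or regressor. The same function also shortens long sequences (down-sampling). Non-linear variants, such as spline interpolation, would replace the piecewise-linear step and are not shown here.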

