AI is starting to operate on one of the most complex information systems we know: the genome.
A recent article published in the journal Nature presents Evo 2, an AI foundation model trained directly on DNA sequences.
Language models learn patterns from large text corpora; Evo 2 learns the "language of DNA" from enormous genomic databases.
It was trained on 9 trillion base pairs drawn from more than 100,000 genomes of bacteria, archaea and eukaryotes.
This training makes it possible to:
• Predict the functional effect of mutations
• Analyze human clinical variants
• Detect functional structures (binding sites, introns, regulatory elements)
• Generate new genomic sequences that resemble real genomes
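To make the first item on that list more concrete: one common way to score a mutation with a sequence model is to compare how likely the model finds the mutated sequence versus the original one. The sketch below illustrates that idea only. It assumes a tiny Markov model as a stand-in, and `ToyMarkovModel` and `variant_score` are hypothetical names invented for this example. It is not Evo 2 or its API, and the post does not say exactly how Evo 2 computes its predictions.

```python
import math
from collections import defaultdict


class ToyMarkovModel:
    """Order-k Markov model over A/C/G/T with add-one smoothing.

    Hypothetical stand-in for a genomic language model; NOT Evo 2.
    """

    def __init__(self, order=2):
        self.order = order
        # counts[context][next_base] = times next_base followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = seq[i - self.order:i]
                self.counts[context][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(base | previous k bases) over the sequence."""
        total = 0.0
        for i in range(self.order, len(seq)):
            context = seq[i - self.order:i]
            ctx_counts = self.counts.get(context, {})
            numerator = ctx_counts.get(seq[i], 0) + 1      # add-one smoothing
            denominator = sum(ctx_counts.values()) + 4     # 4 possible bases
            total += math.log(numerator / denominator)
        return total


def variant_score(model, reference, position, alt_base):
    """Log-likelihood of the mutated sequence minus that of the reference.

    Negative values mean the model finds the mutation less plausible.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return model.log_likelihood(variant) - model.log_likelihood(reference)


# Toy training data: a perfectly periodic "genome".
model = ToyMarkovModel(order=2).fit(["ATGATGATGATGATG"])
reference = "ATGATGATG"
score = variant_score(model, reference, 4, "C")  # breaks the ATG repeat
print(score)
```

On this toy data the substitution breaks the repeating pattern, so the score comes out negative. A real genomic model plays the same role as `log_likelihood` here, with far richer context than two bases.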
This work marks a turning point: a phase of generative biology, in which computational understanding of the genome is becoming rich enough to propose new biological architectures.
The potential implications range from interpreting human genetic variants to designing biological systems in biotechnology or medicine.
Reference: Nature (2026), "Genome modelling and design across all domains of life with Evo 2"
https://www.nature.com/articles/s41586-026-10176-5
Genome modelling and design across all domains of life with Evo 2
All of life encodes information with DNA. Although tools for genome sequencing, synthesis and editing have transformed biological research, we still lack sufficient understanding of the immense complexity encoded by genomes to predict the effects of many classes of genomic changes or to intelligently compose new biological systems. Artificial intelligence models that learn information from genomic sequences across diverse organisms have increasingly advanced prediction and design capabilities [1,2]. Here we introduce Evo 2, a biological foundation model trained on 9 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life to have a 1 million token context window with single-nucleotide resolution. Evo 2 learns to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific fine-tuning. Mechanistic interpretability analyses reveal that Evo 2 learns representations associated with biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements and prophage genomic regions. The generative abilities of Evo 2 produce mitochondrial, prokaryotic and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Evo 2 also generates experimentally validated chromatin accessibility patterns when guided by predictive models [3,4] and inference-time search. We have made Evo 2 fully open, including model parameters, training code [5], inference code and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.
