
Wednesday, December 6, 2023

Introducing Gemini: our largest and most capable AI model

 


The race for artificial intelligence (AI) has become a test of speed. To the advances in ChatGPT, now in its fourth version, and the successive announcements by large multinationals of their own systems, Google responded this Wednesday with the launch of Gemini, a multimodal artificial intelligence platform that can process and generate text, code, images, audio and video from different data sources. The Ultra version, “available early next year,” according to Eli Collins, vice president of product at Google DeepMind, outperforms human experts on massive multitask language understanding (MMLU), an evaluation benchmark built from 57 subjects across science, technology, engineering and mathematics (STEM), the humanities and the social sciences.


Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities — from the everyday to the extraordinary — for people everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity and productivity on a scale we haven’t seen before.

That’s what excites me: the chance to make AI helpful for everyone, everywhere in the world.

Nearly eight years into our journey as an AI-first company, the pace of progress is only accelerating: Millions of people are now using generative AI across our products to do things they couldn’t even a year ago, from finding answers to more complex questions to using new tools to collaborate and create. At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with our AI tools.

This is incredible momentum, and yet, we’re only beginning to scratch the surface of what’s possible.

We’re approaching this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable. And we continue to invest in the very best tools, foundation models and infrastructure and bring them to our products and to others, guided by our AI Principles.

Now, we’re taking the next step on our journey with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks. Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.


Introducing Gemini

By Demis Hassabis, CEO and Co-Founder of Google DeepMind, on behalf of the Gemini team

AI has been the focus of my life's work, as for many of my research colleagues. Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant.

Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built.

Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.

Gemini is also our most flexible model yet — able to efficiently run on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.

We’ve optimized Gemini 1.0, our first version, for three different sizes:

  • Gemini Ultra — our largest and most capable model for highly complex tasks.
  • Gemini Pro — our best model for scaling across a wide range of tasks.
  • Gemini Nano — our most efficient model for on-device tasks.

State-of-the-art performance

We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.
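As a rough illustration of what a score like this measures, the sketch below computes accuracy on a multiple-choice benchmark grouped by subject. It is only a minimal example under stated assumptions: the function name `benchmark_accuracy`, the record layout and the toy data are hypothetical, and it does not reproduce the evaluation setup Google used for Gemini.

```python
from collections import defaultdict


def benchmark_accuracy(records):
    """Return (overall accuracy, per-subject accuracy).

    `records` is an iterable of (subject, predicted_choice, correct_choice)
    tuples. This layout is an assumption made for the example.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    # Accuracy within each subject.
    subject_acc = {s: c / n for s, (c, n) in per_subject.items()}

    # Overall accuracy: every question counts equally (micro-average).
    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return total_correct / total, subject_acc


# Toy data, invented purely for illustration.
records = [
    ("physics", "A", "A"),
    ("physics", "C", "B"),
    ("law", "D", "D"),
    ("law", "B", "B"),
    ("ethics", "A", "A"),
]

overall, by_subject = benchmark_accuracy(records)
print(f"overall accuracy: {overall:.1%}")
for subject, acc in sorted(by_subject.items()):
    print(f"{subject}: {acc:.1%}")
```

Here every question counts equally toward the overall figure. Averaging the per-subject accuracies instead would give a different number whenever subjects have different question counts, so the averaging choice matters when comparing published scores.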

Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.


Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning.

With the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models, without assistance from optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini's more complex reasoning abilities.

See more details in our Gemini technical report.

Next-generation capabilities

Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning.

We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain.

Learn more about Gemini’s capabilities and see how it works.

Sophisticated reasoning

Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.

Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.


Project Gemini, Google's new AI

Gemini is multimodal. It can recognize images and speak in real time. Gemini Ultra is the FIRST AI model to outperform human experts on the MMLU benchmark, with a score above 90%.

Gemini offers sophisticated reasoning, multimodality and advanced coding. The model is also advanced in mathematics, something GPT cannot do. In this demonstration,

Gemini understands science very well. It can search and extract information from thousands of research papers. It can understand not only text but also charts and other visual elements. Ultra powerful.

Gemini is available in three sizes:

  • Ultra for complex tasks (available early next year).
  • Pro for scaling across a wide range of tasks (available from today).
  • Nano for tasks on mobile devices (especially on Pixel).

Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 benchmarks used in LLM research and development. It is better than GPT-4.

Gemini Pro will roll out for free in Bard and in Google's apps. In six of eight tests, Gemini Pro outperformed GPT-3.5. That makes it the most powerful free chatbot on the market today.

Oriol Vinyals on X: "Exciting times, welcome Gemini (and MMLU>90)! State-of-the-art on 30 out of 32 benchmarks across text, coding, audio, images, and video, with a single model 🤯 Co-leading Gemini has been my most exciting endeavor, fueled by a very ambitious goal. And that is just the beginning!… https://t.co/AQ5MvJA4up" / X (twitter.com)

Introducing Gemini: Google’s most capable AI model yet (blog.google)

https://www.youtube.com/watch?time_continue=1&v=sPiOP_CB54A&embeds_referring_euri=https%3A%2F%2Ftwitter.com%2F&source_ve_path=Mjg2NjY&feature=emb_logo

Early reports on Gemini promised that it would surpass GPT on several levels.

Now the first public analyses confirm it. It is also a natively multimodal model, not a set of connected models like GPT-4, and that is a big step forward.



- The CEO of Google #DeepMind says its next algorithm will eclipse #ChatGPT.

With its blend of predictive and generative techniques, #Gemini should be a real accelerator for the AI industry.

