Posts with «information technology» label

Google I/O 2024: Everything revealed including Gemini AI, Android 15 and more

At the end of I/O, Google’s annual developer conference at the Shoreline Amphitheater in Mountain View, Google CEO Sundar Pichai revealed that the company had said “AI” 121 times. That, essentially, was the crux of Google’s two-hour keynote — stuffing AI into every Google app and service used by more than two billion people around the world. Here are all the major updates from Google's big event, along with some additional announcements that came after the keynote.

Gemini 1.5 Flash and updates to Gemini 1.5 Pro

Google announced a brand new AI model called Gemini 1.5 Flash, which it says is optimised for speed and efficiency. Flash sits between Gemini 1.5 Pro and Gemini 1.5 Nano, which is the company’s smallest model that runs locally on device. Google said that it created Flash because developers wanted a lighter and less expensive model than Gemini Pro to build AI-powered apps and services, while keeping some of the things that differentiate Gemini Pro from competing models, like a long context window of one million tokens. Later this year, Google will double Gemini’s context window to two million tokens, which means that it will be able to process two hours of video, 22 hours of audio, more than 60,000 lines of code or more than 1.4 million words at the same time.
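For a rough sense of scale, the quoted figures can be turned into an implied token budget per unit of content. The sketch below is back-of-the-envelope arithmetic on the numbers Google gave, not a description of how Gemini actually tokenizes anything; the function and constant names are made up for illustration, and because the code and word figures are stated as "more than," the results for those two are upper bounds.

```python
# Figures quoted for the planned two-million-token context window.
CONTEXT_TOKENS = 2_000_000
QUOTED_CAPACITY = {
    "hour of video": 2,
    "hour of audio": 22,
    "line of code": 60_000,   # quoted as "more than", so a lower bound
    "word": 1_400_000,        # quoted as "more than", so a lower bound
}


def implied_tokens_per_unit(context_tokens, capacity):
    """Divide the window size by each quoted capacity.

    The result is the implied number of tokens spent on one unit of
    each content type (hypothetical helper, for illustration only).
    """
    return {unit: context_tokens / amount for unit, amount in capacity.items()}


if __name__ == "__main__":
    budgets = implied_tokens_per_unit(CONTEXT_TOKENS, QUOTED_CAPACITY)
    for unit, tokens in budgets.items():
        print(f"about {tokens:,.2f} tokens per {unit}")
```

By this arithmetic, an hour of video accounts for about a million tokens, while a single word costs a little under one and a half.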

Project Astra

Google showed off Project Astra, an early version of a universal assistant powered by AI that Google’s DeepMind CEO Demis Hassabis said was Google’s version of an AI agent “that can be helpful in everyday life.”

In a video that Google says was shot in a single take, an Astra user moves around Google’s London office holding up their phone and pointing the camera at various things (a speaker, some code on a whiteboard, and the view out a window) and has a natural conversation with the app about what it sees. In one of the video’s most impressive moments, the app correctly tells the user where she left her glasses, without the user ever having brought them up.

The video ends with a twist: when the user finds and wears the missing glasses, we learn that they have an onboard camera system and are capable of using Project Astra to seamlessly carry on a conversation with the user, perhaps indicating that Google might be working on a competitor to Meta’s Ray-Ban smart glasses.

Ask Google Photos

Google Photos was already intelligent when it came to searching for specific images or videos, but with AI, Google is taking things to the next level. If you’re a Google One subscriber in the US, you will be able to ask Google Photos a complex question like “show me the best photo from each national park I’ve visited” when the feature rolls out over the next few months. Google Photos will use GPS information as well as its own judgement of what is “best” to present you with options. You can also ask Google Photos to generate captions to post the photos to social media.

Veo and Imagen 3

Google’s new AI-powered media creation engines are called Veo and Imagen 3. Veo is Google’s answer to OpenAI’s Sora. It can produce “high-quality” 1080p videos that can last “beyond a minute”, Google said, and can understand cinematic concepts like a timelapse.

Imagen 3, meanwhile, is a text-to-image generator that Google claims handles text better than its previous version, Imagen 2. The result is the company’s “highest quality” text-to-image model with “incredible level of detail” for “photorealistic, lifelike images” and fewer artifacts, essentially pitting it against OpenAI’s DALL-E 3.

Big updates to Google Search

Google is making big changes to how Search fundamentally works. Most of the updates announced today like the ability to ask really complex questions (“Find the best yoga or pilates studios in Boston and show details on their intro offers and walking time from Beacon Hill.”) and using Search to plan meals and vacations won’t be available unless you opt in to Search Labs, the company’s platform that lets people try out experimental features.

But a big new feature that Google is calling AI Overviews, which the company has been testing for a year now, is finally rolling out to millions of people in the US. Google Search will now present AI-generated answers on top of the results by default, and the company says that it will bring the feature to more than a billion users around the world by the end of the year.

Gemini on Android

Google is integrating Gemini directly into Android. When Android 15 releases later this year, Gemini will be aware of the app, image or video that you’re running, and you’ll be able to pull it up as an overlay and ask it context-specific questions. Where does that leave Google Assistant that already does this? Who knows! Google didn’t bring it up at all during today’s keynote.

Wear OS 5 battery life improvements

Google isn't quite ready to roll out the latest version of its smartwatch OS, but it is promising some major battery life improvements when it comes. The company said that Wear OS 5 will consume 20 percent less power than Wear OS 4 if a user runs a marathon. Wear OS 4 already brought battery life improvements to smartwatches that support it, but it could still be a lot better at managing a device's power. Google also provided developers with a new guide on how to conserve power and battery, so that they can create more efficient apps.

Android 15 anti-theft features

Android 15's developer preview may have been rolling out for months, but there are still features to come. Theft Detection Lock is a new Android 15 feature that will use AI (there it is again) to predict phone thefts and lock things up accordingly. Google says its algorithms can detect motions associated with theft, such as someone grabbing the phone and bolting, biking or driving away. If an Android 15 handset detects one of these situations, the phone’s screen will quickly lock, making it much harder for the phone snatcher to access your data.

There were a bunch of other updates too. Google said it would add digital watermarks to AI-generated video and text, make Gemini accessible in the side panel in Gmail and Docs, power a virtual AI teammate in Workspace, listen in on phone calls and detect if you’re being scammed in real time, and a lot more.


Catch up on all the news from Google I/O 2024 right here!

Update May 15, 2:45PM ET: This story was updated after being published to include details on new Android 15 and Wear OS 5 announcements made following the I/O 2024 keynote.

This article originally appeared on Engadget at https://www.engadget.com/google-io-2024-everything-revealed-including-gemini-ai-android-15-and-more-210414423.html?src=rss

Google's Wear OS 5 promises better battery life

Google has unveiled Wear OS 5 at its I/O developer conference today, giving us a glimpse of new features and other improvements coming with the platform. The company isn't quite ready to roll out the final version of the wearable OS, but its developer preview already features enhanced battery life. As an example, Google said Wear OS 5 will consume 20 percent less power than Wear OS 4 if the user runs a marathon. Wear OS 4 already brought battery life improvements to smartwatches that support it, but it could still be a lot better at managing a device's power. Google also provided developers with a new guide on how to conserve power and battery, so that they can create more efficient apps.

In addition, Google has launched new features in Watch Face Format, allowing developers to make more types of watch faces that show different kinds of information. It has enabled the creation of apps that can show current weather information at a glance with this update, including the temperature and chances of rain. The company is also adding support for new complication types. They include "goal progress," which suits data wherein the user has a target but can exceed it, and "weighted elements," which can be used to represent discrete subsets of data.

Wear OS 5 could give rise to new apps and new functionalities in old apps, as well. Google's Health Connect API for the platform will allow apps to access user data even while they're only running in the background. It will also enable them to access health information over the past 30 days, though users will have to give their explicit permission before apps can take advantage of both features. Finally, Wear OS 5's Health Services API supports new data types for running, such as ground contact time and stride length.

Google didn't announce when Wear OS 5 will be available, but its predecessor, Wear OS 4, launched with the Samsung Galaxy Watch 6 in August 2023. Based on the timeline and the devices that support the current platform, Wear OS 5 could launch with the Samsung Galaxy Watch 7 or the Pixel Watch 3 later this year.

This article originally appeared on Engadget at https://www.engadget.com/googles-wear-os-5-promises-better-battery-life-182834300.html?src=rss

Xbox Cloud Gaming finally supports keyboard and mouse inputs on web browsers

Microsoft just released a new update for Xbox Cloud Gaming that finally brings mouse and keyboard support, after teasing the feature for years. The tool is currently in beta release and works with both the Edge and Chrome web browsers. It looks pretty simple to use. Just select a game that supports a mouse and keyboard and have at it.

You can also instantly switch between a mouse/keyboard combination and a standard controller by pressing the Xbox button on the controller or pressing a key on the keyboard. The company says it’ll be rolling out badges later in the month to alert users which games support mouse and keyboard inputs.

For now, there’s support for 26 games. These include blockbusters like ARK: Survival Evolved, Halo Infinite and, of course, Fortnite. Smaller games like High on Life and Pentiment can also be controlled via mouse and keyboard. Check the above link for the full list.

Microsoft hasn’t said what took it so long to get this going. The feature was originally presumed to launch back in June of 2022, but we didn’t get a progress update until two months ago. No matter the reason, KBM setups are practically a requirement for first-person shooters and, well, better late than never.

This article originally appeared on Engadget at https://www.engadget.com/xbox-cloud-gaming-finally-supports-keyboard-and-mouse-inputs-on-web-browsers-165215925.html?src=rss

Apple brings eye-tracking to recent iPhones and iPads

Ahead of Global Accessibility Awareness Day this week, Apple is issuing its typical annual set of announcements around its assistive features. Many of these are useful for people with disabilities, but have broader applications as well. For instance, Personal Voice, which was released last year, helps preserve someone's speaking voice. It can be helpful to those who are at risk of losing their voice or have other reasons for wanting to retain their own vocal signature for loved ones in their absence. Today, Apple is bringing eye-tracking support to recent models of iPhones and iPads, as well as customizable vocal shortcuts, music haptics, vehicle motion cues and more.

Built-in eye-tracking for iPhones and iPads

The most intriguing feature of the set is the ability to use the front-facing camera on iPhones or iPads (at least those with the A12 chip or later) to navigate the software without additional hardware or accessories. With this enabled, people can look at their screen to move through elements like apps and menus, then linger on an item to select it. 

That pause to select is something Apple calls Dwell Control, which has already been available elsewhere in the company's ecosystem, such as in the Mac's accessibility settings. The setup and calibration process should only take a few seconds, and on-device AI is at work to understand your gaze. It'll also work with third-party apps from launch, since it's a layer in the OS like AssistiveTouch. Since Apple already supported eye-tracking in iOS and iPadOS with eye-detection devices connected, the news today is the ability to do so without extra hardware.

Vocal shortcuts for easier hands-free control

Apple is also working on improving the accessibility of its voice-based controls on iPhones and iPads. It again uses on-device AI to create personalized models for each person setting up a new vocal shortcut. You can set up a command for a single word or phrase, or even an utterance (like "Oy!" perhaps). Siri will understand these and perform your designated shortcut or task. You can have these launch apps or run a series of actions that you define in the Shortcuts app, and once set up, you won't have to first ask Siri to be ready. 

Another improvement coming to vocal interactions is "Listen for Atypical Speech," which has iPhones and iPads use on-device machine learning to recognize speech patterns and customize their voice recognition around your unique way of vocalizing. This sounds similar to Google's Project Relate, which is also designed to help technology better understand those with speech impairments or atypical speech.

To build these tools, Apple worked with the Speech Accessibility Project at the Beckman Institute for Advanced Science and Technology at the University of Illinois Urbana-Champaign. The institute is also collaborating with other tech giants like Google and Amazon to further development in this space across their products.

Music haptics in Apple Music and other apps

For those who are deaf or hard of hearing, Apple is bringing haptics to music players on iPhone, starting with millions of songs on its own Music app. When enabled, music haptics will play taps, textures and specialized vibrations in tandem with the audio to bring a new layer of sensation. It'll be available as an API so developers can bring greater accessibility to their apps, too. 

Help in cars — motion sickness and CarPlay

Drivers with disabilities need better systems in their cars, and Apple is addressing some of the issues with its updates to CarPlay. Voice control and color filters are coming to the interface for vehicles, making it easier to control apps by talking and for those with visual impairments to see menus or alerts. To that end, CarPlay is also getting bold and large text support, as well as sound recognition for noises like sirens or honks. When the system identifies such a sound, it will display an alert at the bottom of the screen to let you know what it heard. This works similarly to Apple's existing sound recognition feature in other devices like the iPhone.

For those who get motion sickness while using their iPhones or iPads in moving vehicles, a new feature called Vehicle Motion Cues might alleviate some of that discomfort. Since motion sickness is based on a sensory conflict from looking at stationary content while being in a moving vehicle, the new feature is meant to better align the conflicting senses through onscreen dots. When enabled, these dots will line the four edges of your screen and sway in response to the motion the device detects. If the car moves forward or accelerates, the dots will sway backwards as if in reaction to the increase in speed in that direction.

Other Apple Accessibility updates

There are plenty more features coming to the company's suite of products, including Live Captions in visionOS, a new Reader mode in Magnifier, support for multi-line braille and a virtual trackpad for those who use AssistiveTouch. It's not yet clear when all of these announced updates will roll out, though Apple has historically made these features available in upcoming versions of iOS. With its developer conference WWDC just a few weeks away, it's likely many of today's tools will be officially released with the next iOS.

This article originally appeared on Engadget at https://www.engadget.com/apple-brings-eye-tracking-to-recent-iphones-and-ipads-140012990.html?src=rss

The Morning After: The biggest news from Google's I/O keynote

Google boss, Sundar Pichai, wrapped up the company’s I/O developer conference by noting its almost-two-hour presentation had mentioned AI 121 times. It was everywhere.

Google’s newest AI model, Gemini 1.5 Flash, is built for speed and efficiency. The company said it created Flash because developers wanted a lighter, less expensive model than Gemini Pro to build AI-powered apps and services.

Google says it’ll double Gemini’s context window to two million tokens, enough to process two hours of video, 22 hours of audio, more than 60,000 lines of code or 1.4 million-plus words at the same time.

But the bigger news is how the company is sewing AI into all the things you’re already using. With search, it’ll be able to answer your complex questions (a la Copilot in Bing), but for now, you’ll have to sign up to the company’s Search Labs to try that out. AI-generated answers will also appear alongside typical search results, just in case the AI knows better.

Google Photos was already pretty smart at searching for specific images or videos, but with AI, Google is taking things to the next level. If you’re a Google One subscriber in the US, you will be able to ask Google Photos a complex question, like “show me the best photo from each national park I’ve visited.” You can also ask Google Photos to generate captions for you.

And, if you have an Android, Gemini is integrating directly into the device. Gemini will know the app, image or video you’re running, and you’ll be able to pull it up as an overlay and ask it context-specific questions, like how to change settings or maybe even who’s displayed on screen. 

While these were the bigger beats, there was an awful lot to chew over. Check out all the headlines right here.

— Mat Smith

The biggest stories you might have missed

Google wants you to relax and have a natural chat with Gemini Live

Google Pixel 8a review

Google unveils Veo and Imagen 3, its latest AI media creation models

Google reveals its visual AI assistant, Project Astra

Full of potential.

One of Google’s bigger projects is its visual multimodal AI assistant, currently called Project Astra. It taps into your smartphone (or smart glasses) camera and can contextually analyze and answer questions on the things it sees. Project Astra can offer silly wordplay suggestions, as well as identify and define the things it sees. A video demo shows Project Astra identifying the tweeter part of a speaker. It’s equal parts impressive and, well, familiar. We tested it out, right here.

X now treats the term cisgender as a slur

Elon Musk continues to add policy after baffling policy.

The increasingly unhinged world of X (Twitter) now considers the term ‘cisgender’ a slur. Owner Elon Musk posted last June, to the delight of his unhingiest users, that “‘cis’ or ‘cisgender’ are considered slurs on this platform.” On Tuesday, X reportedly began posting an official warning. A quick reminder: It’s not a slur.

OpenAI co-founder Ilya Sutskever is leaving the company

He’s moving to a new project.

Ilya Sutskever announced on X, formerly Twitter, that he’s leaving OpenAI almost a decade after he co-founded the company. He’s confident OpenAI “will build [artificial general intelligence] that is both safe and beneficial” under the leadership of CEO Sam Altman, President Greg Brockman and CTO Mira Murati. While Sutskever and Altman praised each other in their farewell messages, the two were embroiled in the company’s biggest scandal last year. Sutskever, who was a board member then, was involved in both of their dismissals.

This article originally appeared on Engadget at https://www.engadget.com/the-morning-after-the-biggest-news-from-googles-io-keynote-111531702.html?src=rss

Everything announced at Google I/O 2024 including Gemini AI, Project Astra, Android 15 and more

At the end of I/O, Google’s annual developer conference at the Shoreline Amphitheater in Mountain View, Google CEO Sundar Pichai revealed that the company had said “AI” 121 times. That, essentially, was the crux of Google’s two-hour keynote — stuffing AI into every Google app and service used by more than two billion people around the world. Here are all the major updates that Google announced at the event.

Gemini 1.5 Flash and updates to Gemini 1.5 Pro

Google announced a brand new AI model called Gemini 1.5 Flash, which it says is optimised for speed and efficiency. Flash sits between Gemini 1.5 Pro and Gemini 1.5 Nano, which is the company’s smallest model that runs locally on device. Google said that it created Flash because developers wanted a lighter and less expensive model than Gemini Pro to build AI-powered apps and services, while keeping some of the things that differentiate Gemini Pro from competing models, like a long context window of one million tokens. Later this year, Google will double Gemini’s context window to two million tokens, which means that it will be able to process two hours of video, 22 hours of audio, more than 60,000 lines of code or more than 1.4 million words at the same time.

Project Astra

Google showed off Project Astra, an early version of a universal assistant powered by AI that Google’s DeepMind CEO Demis Hassabis said was Google’s version of an AI agent “that can be helpful in everyday life.”

In a video that Google says was shot in a single take, an Astra user moves around Google’s London office holding up their phone and pointing the camera at various things (a speaker, some code on a whiteboard, and the view out a window) and has a natural conversation with the app about what it sees. In one of the video’s most impressive moments, the app correctly tells the user where she left her glasses, without the user ever having brought them up.

The video ends with a twist: when the user finds and wears the missing glasses, we learn that they have an onboard camera system and are capable of using Project Astra to seamlessly carry on a conversation with the user, perhaps indicating that Google might be working on a competitor to Meta’s Ray-Ban smart glasses.

Ask Google Photos

Google Photos was already intelligent when it came to searching for specific images or videos, but with AI, Google is taking things to the next level. If you’re a Google One subscriber in the US, you will be able to ask Google Photos a complex question like “show me the best photo from each national park I’ve visited” when the feature rolls out over the next few months. Google Photos will use GPS information as well as its own judgement of what is “best” to present you with options. You can also ask Google Photos to generate captions to post the photos to social media.

Veo and Imagen 3

Google’s new AI-powered media creation engines are called Veo and Imagen 3. Veo is Google’s answer to OpenAI’s Sora. It can produce “high-quality” 1080p videos that can last “beyond a minute”, Google said, and can understand cinematic concepts like a timelapse.

Imagen 3, meanwhile, is a text-to-image generator that Google claims handles text better than its previous version, Imagen 2. The result is the company’s “highest quality” text-to-image model with “incredible level of detail” for “photorealistic, lifelike images” and fewer artifacts, essentially pitting it against OpenAI’s DALL-E 3.

Big updates to Google Search

Google is making big changes to how Search fundamentally works. Most of the updates announced today like the ability to ask really complex questions (“Find the best yoga or pilates studios in Boston and show details on their intro offers and walking time from Beacon Hill.”) and using Search to plan meals and vacations won’t be available unless you opt in to Search Labs, the company’s platform that lets people try out experimental features.

But a big new feature that Google is calling AI Overviews, which the company has been testing for a year now, is finally rolling out to millions of people in the US. Google Search will now present AI-generated answers on top of the results by default, and the company says that it will bring the feature to more than a billion users around the world by the end of the year.

Gemini on Android

Google is integrating Gemini directly into Android. When Android 15 releases later this year, Gemini will be aware of the app, image or video that you’re running, and you’ll be able to pull it up as an overlay and ask it context-specific questions. Where does that leave Google Assistant that already does this? Who knows! Google didn’t bring it up at all during today’s keynote.

There were a bunch of other updates too. Google said it would add digital watermarks to AI-generated video and text, make Gemini accessible in the side panel in Gmail and Docs, power a virtual AI teammate in Workspace, listen in on phone calls and detect if you’re being scammed in real time, and a lot more.


This article originally appeared on Engadget at https://www.engadget.com/everything-announced-at-google-io-2024-including-gemini-ai-project-astra-android-15-and-more-210414580.html?src=rss

Google builds Gemini right into Android, adding contextual awareness within apps

Google just announced some nifty improvements to its Gemini AI chatbot for Android devices as part of the company’s I/O 2024 event. The AI is now part of the Android operating system, allowing it to integrate in a more comprehensive way.

The coolest new feature wouldn’t be possible without that integration with the underlying OS. Gemini is now much better at understanding context as you control apps on the smartphone. What does this mean exactly? Once the tool officially launches as part of Android 15, you’ll be able to bring up a Gemini overlay that rests on top of the app you’re using. This will allow for context-specific actions and queries.

Google gives the example of quickly dropping generated images into Gmail and Google Messages, though you may want to steer clear of historical images for now. The company also teased a feature called “Ask This Video” that lets users pose questions about a particular YouTube video, which the chatbot should be able to answer.

It’s easy to see where this tech is going. Once Gemini has access to the lion’s share of your app library, it should be able to actually deliver on some of those lofty promises made by rival AI companies like Humane and Rabbit. Google says it's “just getting started with how on-device AI can change what your phone can do,” so we imagine future integration with apps like Uber and DoorDash, at the very least.

Circle to Search is also getting a boost thanks to on-board AI. Users will be able to circle just about anything on their phone and receive relevant information. Google says people will be able to do this without having to switch apps. This even extends to math and physics problems: just circle for the answer, which is likely to please students and frustrate teachers.

This article originally appeared on Engadget at https://www.engadget.com/google-builds-gemini-right-into-android-adding-contextual-awareness-within-apps-180413356.html?src=rss

Google expands digital watermarks to AI-made video

As Google starts to make its latest video-generation tools available, the company says it has a plan to ensure transparency around the origins of its increasingly realistic AI-generated clips. All video made by the company’s new Veo model in the VideoFX app will have digital watermarks thanks to Google’s SynthID system.

SynthID is Google’s digital watermarking system that started rolling out to AI-generated images last year. The tech embeds imperceptible watermarks into AI-made content so that AI detection tools can recognize that the content was generated by AI. Considering that Veo, the company’s latest video generation model previewed onstage at I/O, can create longer and higher-res clips than what was previously possible, tracking the source of such content will be increasingly important.

During a briefing with reporters, DeepMind CEO Demis Hassabis said that SynthID watermarks would also expand to AI-generated text. As generative AI models advance, more companies have turned to watermarking amid fears that AI could fuel a new wave of misinformation. Watermarking systems would give platforms like Google a framework for detecting AI-generated content that may otherwise be impossible to distinguish. TikTok and Meta have also recently announced plans to support similar detection tools on their platforms and label more AI content in their apps.

Of course, there are still significant questions about whether digital watermarks on their own offer sufficient protection against deceptive AI content. Researchers have shown that watermarks can be easy to evade. But making AI-made content detectable in some way is an important first step toward transparency.

This article originally appeared on Engadget at https://www.engadget.com/google-expands-digital-watermarks-to-ai-made-video-175232320.html?src=rss

Google just snuck a pair of AR glasses into a Project Astra demo at I/O

In a video demonstrating the prowess of its new Project Astra app, the person demonstrating asked Gemini "do you remember where you saw my glasses?" The AI impressively responded "Yes, I do. Your glasses were on a desk near a red apple," despite said object not actually being in view when the question was asked. But these weren't your bog-standard glasses. They had a camera onboard and some sort of visual interface!

The tester picked up their glasses and put them on, and proceeded to ask the AI more questions about things they were looking at. Clearly, there is a camera on the device that's helping it take in the surroundings, and we were shown some sort of interface where a waveform moved to indicate it was listening. Onscreen captions appeared to reflect the answer that was being read aloud to the wearer, as well. So if we're keeping track, that's at least a microphone and speaker onboard too, along with some kind of processor and battery to power the whole thing. 

We only caught a brief glimpse of the wearable, but from the sneaky seconds it was in view, a few things were evident. The glasses had a simple black frame and didn't look at all like Google Glass. They didn't appear very bulky, either. 

In all likelihood, Google is not ready to actually launch a pair of glasses at I/O. It breezed right past the wearable's appearance and barely mentioned them, only to say that Project Astra and the company's vision of "universal agents" could come to devices like our phones or glasses. We don't know much else at the moment, but if you've been mourning Google Glass or the company's other failed wearable products, this might instill some hope yet.

This article originally appeared on Engadget at https://www.engadget.com/google-just-snuck-a-pair-of-ar-glasses-into-a-project-astra-demo-at-io-172824539.html?src=rss

Google's Project Astra uses your phone's camera and AI to find noise makers, misplaced items and more

When Google first showcased its Duplex voice assistant technology at its developer conference in 2018, it was both impressive and concerning. Today, at I/O 2024, the company may be bringing up those same reactions again, this time by showing off another application of its AI smarts with something called Project Astra. 

The company couldn't even wait till its keynote today to tease Project Astra, posting a video to its social media of a camera-based AI app yesterday. At its keynote today, though, Google's DeepMind CEO Demis Hassabis shared that his team has "always wanted to develop universal AI agents that can be helpful in everyday life." Project Astra is the result of progress on that front. 

What is Project Astra?

According to a video that Google showed during a media briefing yesterday, Project Astra appears to be an app with a viewfinder as its main interface. A person holding up a phone pointed its camera at various parts of an office and said, "Tell me when you see something that makes sound." When a speaker next to a monitor came into view, Gemini responded, "I see a speaker, which makes sound."

The person behind the phone stopped and drew an onscreen arrow to the top circle on the speaker and said, "What is that part of the speaker called?" Gemini promptly responded "That is the tweeter. It produces high-frequency sounds."

Then, in the video that Google said was recorded in a single take, the tester moved over to a cup of crayons further down the table and asked "Give me a creative alliteration about these," to which Gemini said "Creative crayons color cheerfully. They certainly craft colorful creations."

Wait, were those Project Astra glasses? Is Google Glass back?

The rest of the video goes on to show Gemini in Project Astra identifying and explaining parts of code on a monitor and telling the user what neighborhood they were in based on the view out the window. Most impressively, Astra was able to answer "Do you remember where you saw my glasses?" even though said glasses were completely out of frame and had not previously been pointed out. "Yes, I do," Gemini said, adding "Your glasses were on a desk near a red apple."

After Astra located those glasses, the tester put them on and the video shifted to the perspective of what you'd see on the wearable. Using a camera onboard, the glasses scanned the wearer's surroundings to see things like a diagram on a whiteboard. The person in the video then asked "What can I add here to make this system faster?" As they spoke, an onscreen waveform moved to indicate it was listening, and as it responded, text captions appeared in tandem. Astra said "Adding a cache between the server and database could improve speed."

The tester then looked over to a pair of cats doodled on the board and asked "What does this remind you of?" Astra said "Schrodinger's cat." Finally, they picked up a plush tiger toy, put it next to a cute golden retriever and asked for "a band name for this duo." Astra dutifully replied "Golden stripes."

How does Project Astra work?

This means that not only was Astra processing visual data in real time, it was also remembering what it saw and working with an impressive backlog of stored information. This was achieved, according to Hassabis, because these "agents" were "designed to process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall."
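To make the "timeline of events" idea concrete, here is a minimal Python sketch of a bounded, time-ordered cache that can be searched for the most recent mention of something. It is only an illustration of the general pattern Hassabis describes, not Google's implementation: the `Event` and `Timeline` names, the text descriptions standing in for encoded video frames, and the keyword lookup are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One entry in the timeline (hypothetical structure for illustration)."""
    timestamp: float   # seconds since the session started
    source: str        # "video" or "speech"
    description: str   # stand-in for an encoded frame or a transcribed utterance


class Timeline:
    """Bounded cache of recent events, kept in arrival order."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen drops the oldest event once the cache is full.
        self.events = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def recall(self, keyword: str) -> Optional[Event]:
        """Return the most recent event whose description mentions keyword."""
        needle = keyword.lower()
        for event in reversed(self.events):
            if needle in event.description.lower():
                return event
        return None


# Video and speech observations are merged into one timeline as they arrive.
timeline = Timeline()
timeline.add(Event(3.0, "video", "speaker next to a monitor"))
timeline.add(Event(9.5, "video", "glasses on a desk near a red apple"))
timeline.add(Event(14.0, "speech", "user asks for an alliteration about crayons"))

hit = timeline.recall("glasses")
print(hit.description if hit else "not seen")
```

A real system would presumably store learned embeddings rather than text and search them by similarity, but the shape is the same: recent observations are kept cheaply on hand so a later question can be answered without the object being in frame.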

It's also worth noting that, at least in the video, Astra was responding quickly. Hassabis noted in a blog post that "While we’ve made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge."

Google has also been working on giving its AI a wider range of vocal expression: using its speech models, it "enhanced how they sound, giving the agents a wider range of intonations." This sort of mimicry of human expressiveness in responses is reminiscent of Duplex's pauses and utterances that led people to think Google's AI might be a candidate for the Turing test.

When will Project Astra be available?

While Astra remains an early feature with no discernible plans for launch, Hassabis wrote that in the future, these assistants could be available "through your phone or glasses." There's no word yet on whether those glasses are an actual product or the successor to Google Glass, but Hassabis did write that "some of these capabilities are coming to Google products, like the Gemini app, later this year."


This article originally appeared on Engadget at https://www.engadget.com/googles-project-astra-uses-your-phones-camera-and-ai-to-find-noise-makers-misplaced-items-and-more-172642329.html?src=rss