Apple releases an AI model that can edit images based on text-based commands

Key Takeaways:

– Apple has developed an open-source AI model for image editing called MLLM-Guided Image Editing (MGIE).
– MGIE uses multimodal large language models (MLLMs) to interpret text-based commands for manipulating images.
– The tool can edit photos based on the text input from the user, transforming simple or ambiguous prompts into clear instructions.
– In addition to making major changes to images, MGIE can crop, resize, and rotate them, and improve brightness, contrast, and color balance, all through text prompts.
– It can also edit specific areas of a photo, such as modifying hair, eyes, and clothes or removing elements in the background.
– Apple collaborated with researchers from the University of California, Santa Barbara to develop MGIE.
– The model has been released through GitHub, and a demo is available on Hugging Face Spaces.
– Apple’s plans for incorporating MGIE into its products or tools have not been announced.

engadget:

Apple isn’t one of the top players in the AI game today, but the company’s new open source AI model for image editing shows what it’s capable of contributing to the space. The model, called MLLM-Guided Image Editing (MGIE), uses multimodal large language models (MLLMs) to interpret text-based commands when manipulating images. In other words, the tool can edit photos based on the text the user types in. While it’s not the first tool that can do so, “human instructions are sometimes too brief for current methods to capture and follow,” the project’s paper (PDF) reads.

The company developed MGIE with researchers from the University of California, Santa Barbara. MLLMs have the power to transform simple or ambiguous text prompts into more detailed and clear instructions the photo editor itself can follow. For instance, if a user wants to edit a photo of a pepperoni pizza to “make it more healthy,” MLLMs can interpret it as “add vegetable toppings” and edit the photo as such.
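Read at a high level, this describes a two-stage flow: the MLLM first rewrites the terse user prompt into an explicit editing instruction, and that expanded instruction then drives the actual edit. The sketch below is only an illustration of that idea; the helper names (expand_instruction, apply_edit) are placeholders, not Apple’s actual API.

```python
# Conceptual sketch of the two-stage flow described above.
# The helper names are hypothetical placeholders, not Apple's API.

def expand_instruction(image_path: str, terse_prompt: str) -> str:
    """Stage 1: a multimodal LLM looks at the image and rewrites a brief,
    ambiguous prompt into an explicit editing instruction."""
    # e.g. ("pizza.jpg", "make it more healthy") -> "add vegetable toppings"
    raise NotImplementedError("placeholder for the MLLM call")

def apply_edit(image_path: str, detailed_instruction: str) -> str:
    """Stage 2: the expanded instruction conditions the image editor,
    which produces the edited picture."""
    raise NotImplementedError("placeholder for the editing model")

def mgie_edit(image_path: str, terse_prompt: str) -> str:
    detailed = expand_instruction(image_path, terse_prompt)
    return apply_edit(image_path, detailed)
```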


In addition to making major changes to images, MGIE can also crop, resize and rotate photos, as well as improve their brightness, contrast and color balance, all through text prompts. It can also edit specific areas of a photo and can, for instance, modify the hair, eyes and clothes of a person in it, or remove elements in the background.

As VentureBeat notes, Apple released the model through GitHub, but those interested can also try out a demo that’s currently hosted on Hugging Face Spaces. Apple has yet to say whether it plans to turn what it learns from this project into a tool or a feature that it can incorporate into any of its products.
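If the Spaces demo exposes the usual Gradio API, it can in principle also be queried programmatically. The snippet below is a minimal sketch under that assumption: the Space identifier and the predict() arguments are placeholders, not the real demo’s name or signature, so check the Space’s own API documentation before use.

```python
# Hypothetical sketch of calling a Hugging Face Spaces demo with gradio_client.
# "org/mgie-demo" and the predict() arguments are placeholders, not the real
# Space name or signature -- consult the Space's "Use via API" panel.
from gradio_client import Client

client = Client("org/mgie-demo")   # placeholder Space identifier
result = client.predict(
    "pizza.jpg",                   # input image (placeholder argument order)
    "make it more healthy",        # text instruction
)
print(result)                      # path to, or data for, the edited image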


AI Eclipse TLDR:

Apple has developed a new open-source AI model called MLLM-Guided Image Editing (MGIE) for image editing. The model uses multimodal large language models (MLLMs) to interpret text-based commands when manipulating images. This means that users can edit photos by simply typing in text instructions. The company collaborated with researchers from the University of California, Santa Barbara to develop MGIE. MLLMs have the ability to transform simple or ambiguous text prompts into detailed and clear instructions that the photo editor can follow. For example, a user can request to make a photo of a pepperoni pizza “more healthy,” and the MLLMs can interpret it as “add vegetable toppings” and edit the photo accordingly. MGIE can also perform other editing tasks such as cropping, resizing, rotating, and adjusting brightness, contrast, and color balance. It can even modify specific areas of a photo, like hair, eyes, and clothes, or remove elements from the background. Apple has released the MGIE model through GitHub and a demo is available on Hugging Face Spaces. However, it is unclear whether Apple plans to incorporate this technology into its products.