Artificial intelligence that can make movies: Who is Sora?

Artificial intelligence, which has entered every aspect of our lives with the innovations it has made and will continue to make, has begun to challenge our perception of reality. "Sora", the latest program introduced by OpenAI, looks set to upend many established routines, especially in the film industry.

Sora, which is still in the testing phase, can interpret text prompts and, for now, produce highly realistic and creative videos of up to 60 seconds.

Before explaining how Sora does this, let's briefly touch on how video works. A video is actually a visual whole built from consecutive single photographic frames. The point to focus on is that video delivers these successive images to the viewer in a way that is continuous in both space and time. Clearly, the key element in this process is the individual frame, and there are already artificial intelligence applications that can generate still images (Starry AI, Deep AI). So is it not possible for a trained artificial intelligence that can produce photographs to produce a continuously flowing sequence of images, just like a video? Of course it is conceivable, but the really important question is "How?" The answer is simple: "With training." The basis of all of these artificial intelligence applications is a principle called deep learning. The people who create these programs train the models they build. As far as we understand, Sora is a model trained to connect individual frames into a logically coherent moving image, just as video does.
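To make the idea concrete, here is a minimal sketch in Python of the point above: a video is nothing more than a stack of still frames played back in order at a fixed frame rate. The generate_frame function is a hypothetical stand-in for an image-generating model; it is not Sora's method, only an illustration of frames arranged along a time axis.

```python
import numpy as np

# A video is just consecutive still frames shown at a fixed rate (FPS).
HEIGHT, WIDTH, FPS, SECONDS = 64, 64, 24, 2

def generate_frame(t: float) -> np.ndarray:
    """Hypothetical frame generator: returns one RGB image for time t (seconds).
    A real text-to-image model would take its place."""
    frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
    # Draw a small square that drifts to the right over time, so that
    # consecutive frames stay spatially and temporally consistent.
    x = int(t * 20) % (WIDTH - 8)
    frame[28:36, x:x + 8] = [255, 255, 255]
    return frame

# One frame every 1/FPS seconds, stacked along a time axis.
frames = [generate_frame(i / FPS) for i in range(FPS * SECONDS)]
video = np.stack(frames)   # shape: (48, 64, 64, 3) -> time, height, width, RGB
print(video.shape)
```

The hard part, and the part Sora appears to solve, is making each generated frame agree with the ones before and after it rather than producing unrelated pictures.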

THIS TIME IT'S NOT A HUMAN BEING SITTING IN THE DIRECTOR'S SEAT

Sora has a database containing millions of images.

On February 15, 2024, OpenAI announced a text-to-video model named Sora, which it plans to release to the public at an unspecified date. It is currently available only to red teamers, who are assessing critical harms and risks.

If we look at the technical side a little more closely, we need to talk about how an artificial intelligence actually obtains a photograph. This comes back to the training we mentioned above. We said that the system has a very large database; this database is what the artificial intelligence is fed with. During training, the system gradually adds noise to these photographs and, in doing so, learns how to turn a photograph into pure noise and back again. The noise the system adds, which distorts the image, can be compared to the static "snow" on an old TV screen; it is a form of disorder (entropy). When a photo of a "black-furred bulldog" is requested, the system starts from an image of almost pure noise and, using what it has learned from its database, removes that noise step by step until the pixels of the requested photo emerge; in other words, it composes the photo out of noise. As can be seen, after a very technical process, the artificial intelligence obtains a single-frame photo. What Sora has achieved is to turn such frames into a fluent video.
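Below is a minimal sketch of the noise-based (diffusion) idea described above, showing only the forward direction: a clean image is gradually mixed with Gaussian noise until it looks like TV static. The numbers (1,000 steps, the beta schedule) are illustrative assumptions, and the trained denoising network that runs the process in reverse during generation is only mentioned in a comment, not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                                   # number of noising steps (assumed)
betas = np.linspace(1e-4, 0.02, T)         # noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)       # cumulative signal-retention factor

def add_noise(x0: np.ndarray, t: int) -> np.ndarray:
    """Forward diffusion: mix the clean image x0 with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

# A tiny stand-in "image"; a real model works on full pixel grids or latents.
clean = np.ones(16)
print(add_noise(clean, 10)[:4])    # early step: still mostly signal
print(add_noise(clean, T - 1)[:4]) # final step: essentially pure noise

# Generation runs the other way: start from pure noise and repeatedly apply a
# trained denoising network, conditioned on the text prompt, until an image
# emerges. That network is what the large training database is used to learn.
```

Sora's contribution, as far as OpenAI has described it, is extending this frame-by-frame idea so that the denoised output stays consistent across an entire video rather than a single picture.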

Sora, whose name means "sky" in Japanese, promises us a brand-new world: a copy of reality, indistinguishable from reality, is entering our lives. This brings major advantages as well as serious drawbacks, and OpenAI most likely kept the program out of general circulation precisely to minimize those risks.

The program, which can currently only produce short videos, will apparently be able to produce longer ones in time. Many sectors may be affected by this, positively or negatively, but if we remember the set workers' strike in Hollywood, there is little doubt that the medium this invention will affect most deeply is cinema.

As humanity, we are in the midst of a great leap forward. While hundreds and thousands of years of accumulated experience make our lives easier, that same history also conditions us to approach new developments with fear. We must not forget that humans are creatures that cannot survive without progress, and how these developments affect us will depend on how we handle them.