Ever since Sam Altman shared Sora with the world on X [formerly Twitter], the internet community has gone wild exploring its text-to-video capabilities.
As if OpenAI’s ChatGPT wasn’t enough to revolutionize how the world operates, Sora now appears to be a step ahead of its Generative AI sibling.
Living up to buzzwords such as groundbreaking tech and immersive experience, Sora does look like an impressive piece of technology so far.
Let’s understand OpenAI’s Sora and also examine the swirling speculation about whether it will disrupt the video production industry.
What is OpenAI’s Sora?
OpenAI’s Sora is a text-to-video Generative AI model.
Upon entering a text prompt, it generates a video [up to a minute long] that closely matches your description.
Here’s a closer look at what it’s exactly about in their product launch video:
So basically, you can think of Sora as a diffusion model built on a transformer architecture, which in layman’s terms means it shares the same transformer backbone that powers its colleague, ChatGPT.
The company says that upon rollout, users will be able to generate life-like videos. As far as the current output shared by OpenAI with the world is concerned, it’s mind-boggling.
Wondering how AI is playing a key role in Animation?
Check out: AI in Animation – Understanding its Role, Benefits & Future
Is Sora AI open to the public?
Well, we’re still waiting! As per its official website, Sora’s creator OpenAI has not revealed a release date yet.
What the website does say is that the company has made Sora available to red teamers and security researchers to carry out rigorous assessments of potential harms and ways to mitigate them.
Since it’s a transformational tool for creative professionals, OpenAI has also granted access to a few people from the filmmaking and design communities to gather comprehensive and diverse feedback.
How does Sora work?
The backend functionality of Sora is broadly similar to that of existing AI image generator tools.
Based on the diffusion approach, Sora starts from a clip of pure visual noise [static] and, guided by the text prompt, gradually removes that noise over many steps until an ultra-realistic video emerges.
It has been trained on millions of existing videos and images paired with detailed descriptions, alt text, and other metadata.
This allows it to correlate its training data with the prompts users provide.
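To make the diffusion idea concrete, here is a deliberately simplified Python sketch. This is not Sora’s actual code [OpenAI has not released any]; a real diffusion model uses a trained neural network to predict which noise to remove at each step, whereas this toy version cheats by blending directly toward a known target signal. Everything in it, including the `toy_denoise` function, is a hypothetical illustration.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of diffusion-style generation: start from pure
    noise and iteratively remove it until a clean signal emerges.
    A real model would use a trained network to predict the noise;
    here we simply blend toward a known target to show the loop shape."""
    rng = random.Random(seed)
    sample = [rng.gauss(0, 1) for _ in target]  # step 0: pure noise
    for step in range(steps):
        # remove a growing fraction of the remaining noise each step
        alpha = 1.0 / (steps - step)
        sample = [s + alpha * (t - s) for s, t in zip(sample, target)]
    return sample

# "target" stands in for the clean pixel values the model has learned
frames = toy_denoise([0.2, 0.8, 0.5])
```

The loop structure [noise in, progressively cleaner sample out] is the part that carries over to real diffusion models; the rest is a placeholder.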
However, experts believe Sora still struggles to reproduce complex body movements, structures, visual elements, smoothness, and so on, and will eventually require extensive further machine learning training.
But the most worrying aspect is the source of the videos on which Sora is being trained. Unfortunately, no information is available on that as of now.
A few days ago, Nvidia’s Research Manager Dr. Jim Fan, who also, interestingly, happens to be OpenAI’s first intern [as per his X [formerly Twitter] profile], tweeted something interesting.
Here’s a snapshot of its introductory part [read the complete tweet here].
He broadly believes that Sora’s core training data can possibly be traced to synthetic data, 3D gaming environments, game engine footage, and the like.
So the underlying point here is that Sora is being given far more data and compute to help it build a thorough understanding of the visual world.
Ultimately, this should produce better results and avoid glaring glitches in video outputs [more on that later].
If you wish to study how Sora works in detail, OpenAI has provided a complete breakdown on its website.
Big Question: Will OpenAI Sora disrupt the Video Production industry?
Further below, I’ve attached OpenAI’s comment, where they say they’re in the process of training Sora on real-life movements, physical interactions, visual consistency, thermodynamics, etc.
And looking at the bigger picture, that sounds fair enough.
However, I strongly believe that despite Sora’s eye-opening ability to generate stunning videos, it’s still a long way from taking over the industry at large. Here are two broad reasons why:
1) Restricted Scope: OpenAI has currently limited Sora to producing short videos [up to 60 seconds]. So upon public launch, if you wish to generate longer videos, you won’t be able to.
Also, generating longer videos such as product explainers, documentaries, and case studies requires systematic, comprehensive prompt input.
And Sora currently lacks the flexibility to handle such complex requests, given that its training is still very much in progress.
2) Absence of Human Touch: This is crucial not only for Sora but for all the Generative AI tools on the market.
Human intervention is needed more than ever for fact-checking and creative input.
What humans have devised and developed over decades cannot suddenly be sidelined by AI’s emergence.
But there’s more to it from the internet community at large. And why not? Every AI tool launched on the market carries potential harms.
Check them out below.
What are Sora’s drawbacks?
For now, OpenAI’s official documentation itself states that Sora lags on various output-quality parameters.
They’ve also emphasized further deployment and training of the model so it can more accurately simulate realistic scenes.
Talking about a few weaknesses in detail, here are two examples of glaring flaws spotted by internet users:
Check out an excerpt of a result generated by Sora in this analysis video by Varun Mayya. Begin watching at 3:40.
Now zoom in and you’ll spot an unusual movement of the lady’s hand in the background.
Here’s another example in a video shared on the Wall Street Journal’s YouTube channel. Watch from 0:12 to 0:15.
While whisking the batter, the grandmother’s hand suddenly disappears, reappears, and then disappears again, all within a span of two to three seconds.
For now, this undercuts the claim that Sora will eventually destroy the video production and “Hollywood” industry, because it doesn’t look that way.
But yes, this is an initial release, and output quality will improve drastically as the model is fed more complex training data.
You probably saw this coming: a lot of users voiced concerns at Sora’s launch, describing it as an ultimate destroyer of jobs and a snatcher of livelihoods.
Below, a young girl, probably in her early teens and planning to study animation, shares her opinion:
She clearly seems distressed about Sora’s capabilities, and there’s more to come. How much more disruptive, only OpenAI knows!
And these concerns seem fair on human grounds. Here are a few potential consequences of Sora:
- Misinformation: The tool could be used to generate doctored videos of real people to push false narratives, especially around hearings, political elections, and public gatherings, in order to spread fake news.
- Biased Reporting: These AI models are trained on gigantic, diverse datasets, which increases the chances of generating biased output in the form of videos.
- Hateful Content: Again, Sora’s results could end up skewed against a particular community, geography, etc. One could argue that the output ultimately depends on the prompts provided, but given the breadth of data the tool is trained on, it has plenty to draw from when generating results.
- Job Displacements: An initial look at the teaser shared by OpenAI hints at how Sora could shut down video production studios and eat into the bread and butter of motion graphic artists, creative professionals, and small creators.
Well, it doesn’t seem that these predicted outcomes will come true anytime soon. The reason: OpenAI itself has commented specifically on the “Weaknesses” of the video results generated from given prompts.
Check out the highlighted part in the image below:
They’ve clearly stated that Sora currently struggles to simulate accurate interactions with the physical world.
This gives us an idea of how actively they’re training their text-to-video model, expecting it to produce near-realistic results in the days to come.
Conclusion
Suffice it to say, it would be too early to thoroughly analyze and comment on Sora’s capabilities, as it’s still in training.
However, whatever OpenAI has shared publicly seems promising and revolutionary.
Only time will tell us how far AI will expand its wings, intervene with our daily operations, and occupy an important slot in our lives.