Dionysios Basdanis

Software Engineer

Beyond the Red Carpet: Predicting Movie Success with Machine Learning

Thesis Project | Grade: 10/10


Data has a story to tell, and in the world of cinema, that story is often written in ratings, awards, and plot tropes. For my Integrated Master of Science thesis at the University of Thessaly, I developed the PMMPS (Predictive Model of Motion Picture Success) to see if an algorithm could determine a film's score before the curtains even rose.


The Dataset: Merging IMDb and the Academy

To build a robust model, I combined two distinct databases to capture both popular opinion and critical prestige:

  • IMDb: A massive collection of rated films and their metadata.
  • The Oscars: A specialized database of Academy Award winners for both actors and directors.

By merging these, I could weigh a film's potential not just on its genre, but on the historical success and reputation of its cast and crew.
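
As a rough illustration of the merge described above, here is a minimal Python sketch. It is not the thesis code: the field names (title, director, cast, category) and the sample rows are hypothetical, since the actual IMDb and Oscar schemas are not given here. It simply counts Oscar wins per person and attaches those counts to each film.

```python
from collections import Counter

# Hypothetical sample rows; the real IMDb and Oscar schemas are assumptions.
films = [
    {"title": "Film A", "genre": "Drama", "director": "Dir One",
     "cast": ["Actor X", "Actor Y"]},
    {"title": "Film B", "genre": "Comedy", "director": "Dir Two",
     "cast": ["Actor Z"]},
]
oscar_winners = [
    {"name": "Dir One", "category": "director"},
    {"name": "Actor X", "category": "actor"},
    {"name": "Actor X", "category": "actor"},
]


def add_oscar_features(films, oscar_winners):
    """Return copies of the film rows with Oscar win counts attached."""
    # Count wins per (category, person) so a director win is not
    # confused with an acting win by someone of the same name.
    wins = Counter((w["category"], w["name"]) for w in oscar_winners)
    enriched = []
    for film in films:
        row = dict(film)  # copy, so the input rows stay untouched
        row["director_oscar_wins"] = wins[("director", film["director"])]
        row["cast_oscar_wins"] = sum(wins[("actor", a)] for a in film["cast"])
        enriched.append(row)
    return enriched


merged = add_oscar_features(films, oscar_winners)
for row in merged:
    print(row["title"], row["director_oscar_wins"], row["cast_oscar_wins"])
```

Matching on names alone is a simplification; a real pipeline would probably need a stable person identifier to avoid collisions between people who share a name.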


Approach 1: Visual Analysis & Core Influencers

I began by exploring the raw data. Graphing it made the primary influencers clear: the actors, the directors, the genre, and even features obtained through image analysis were the strongest indicators of whether a film would be successful.


Approach 2: Enhancing Accuracy with PCA

I applied Principal Component Analysis (PCA) to create new features, each a linear combination of the initial ones. This dimensionality reduction helped filter out noise and notably improved prediction accuracy.
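
To make the idea of "new features as combinations of the old ones" concrete, here is a minimal PCA sketch in Python using NumPy's SVD. It is an illustration under stated assumptions, not the thesis implementation: the function name pca_fit_transform and the synthetic four-column dataset are hypothetical, and the features are only centered, not standardized.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X onto its top principal components using SVD."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    # Each new feature is a linear combination of the original columns.
    return X_centered @ components.T, components, explained_ratio[:n_components]


# Synthetic stand-in data: four columns driven by two underlying signals.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([
    base[:, 0],
    2 * base[:, 0] + rng.normal(scale=0.01, size=50),  # near-duplicate column
    base[:, 1],
    base[:, 1] - base[:, 0],
])

Z, components, ratio = pca_fit_transform(X, n_components=2)
print("reduced shape:", Z.shape)
print("variance explained by 2 components:", ratio.sum())
```

Because the four columns here carry only two independent signals, two components retain almost all of the variance. On real film data the right number of components would have to be chosen by inspecting the explained-variance ratios.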


Approach 3: Feature Weighting & Plot Keywords

In the final approach, I introduced plot keywords and studied the coefficients of various factors. By combining keyword data with the Oscar database, I could see exactly how much weight a "Star Director" or a specific "Theme" carried in the final classification.
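
One common way to read off how much weight a factor carries is to fit a linear classifier and inspect its coefficients. The sketch below uses a small logistic regression trained by gradient descent as a stand-in; the classifier actually used in the thesis is not stated here, so this choice is an assumption. The feature names (star_director, keyword_revenge, keyword_road_trip) and the synthetic data are hypothetical as well.

```python
import numpy as np


def fit_logistic_regression(X, y, lr=0.5, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        error = p - y
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


# Hypothetical binary features; not taken from the thesis data.
feature_names = ["star_director", "keyword_revenge", "keyword_road_trip"]

# Synthetic labels in which the first feature matters most by construction.
rng = np.random.default_rng(0)
n = 1000
X = rng.integers(0, 2, size=(n, 3)).astype(float)
true_w = np.array([2.0, 0.5, 0.0])
logits = X @ true_w - 1.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

w, b = fit_logistic_regression(X, y)

# Rank features by the magnitude of their learned coefficient.
ranked = sorted(zip(feature_names, w), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranked:
    print(f"{name}: {coef:+.2f}")
```

Coefficients are only comparable across features that share a scale, which holds for the 0/1 indicators used here. Mixed-scale features would need standardizing before their weights are compared.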


Conclusion

The results of the weighted approach closely matched those of the PCA method. Both suggest that while film success feels like "magic," there are deeply embedded patterns in the data that can predict a movie's reception with surprising accuracy. This project served as my foundation for data-driven decision-making, a principle I carry into my work with cloud-native systems today.