In this study, we examine gender representation in cinema by analyzing comprehensive datasets on movies, their tropes, and their characters. Our analysis begins by categorizing movie tropes through a gendered lens. We identify the most prominent tropes and trace how they evolve over time. Complementing the trope analysis, we apply survival analysis to examine the career longevity of individuals in the film industry and to illustrate its gender-based dynamics. Finally, we examine whether the gender composition of a movie's cast affects how viewers rate the movie.

🎬 Welcome to 'Beyond the Screen' 🎬

Gender Disparity in Cinema's Narrative

Cinema has long been a reflection of our society's norms and values. When we think about movies, it's not just the blockbusters or avant-garde masterpieces that come to mind, but also the unique characters and performances that stick with us.
Gender disparity extends beyond our screens, the characters we see, and the stories we hear: it is rooted in the very core of the industry. The roles offered, the narratives showcased, and the longevity of careers are often influenced by gender. The impact of these inequalities is not just a matter of representation; it shapes careers, defines opportunities, and influences the stories that reach our screens. Our journey through this analysis will unravel these layers. We will explore how gendered tropes, despite being fictional and created for the screen, have real-life implications. These tropes are not just artistic choices; they often reflect and reinforce societal biases. From the archetypal damsel in distress to the fearless action hero, these characters do not merely entertain; they shape perceptions and expectations.


🎥 The Scale of Imbalance 🎥

The existing gender disparity can be observed in the actor population itself. Looking at the actors of movies from different genres, release dates, and nationalities, we observed an uneven proportion of actresses in absolute numbers.

What's behind this imbalance? To find answers, we're exploring the world of movie tropes. Could it be that the types of characters actors are cast as (the heroes, the villains, the sidekicks) play a part in this disparity?

Our next scene delves deeper into the script of gender disparity. We review the world of cinema to understand how these tropes might tip the scales of gender equality. Are the stories told on screen a reflection of the scripts written in the industry or are they re-writing the narrative themselves?


🎭 Genderedness of Tropes 🎭

Like any other popular medium, films reflect societal biases and perspectives through the use of tropes. These are narrative elements describing stereotypical characters and plot arcs that are common across movies. Tropes range from classics like the 'dumb blonde' and 'the Casanova' to the 'absent-minded professor'.

In our study, drawing inspiration from Ghala et al.'s methodology, we assessed the genderedness of 27,000 narrative tropes in movies by giving them a score to classify them as male, female, or unisex.

Delving into the themes

Having assigned each trope to a class (male, female or unisex), we asked ourselves: what stories are these gendered tropes telling? To answer this, we conducted a Latent Dirichlet Allocation (LDA) analysis on all our tropes to identify the most prevalent themes among them.

[Figures: Female trope analysis; Male trope analysis]

Our results revealed an interesting set of patterns:

Female Themes: Our analysis showed that tropes classified as female fall under one of two categories, "Appearance and Styling" or "Family", with significant emphasis on themes of relationships, motherhood, and physical attributes. In fact, "Family" is the most probable category for female tropes.

This analysis gave us an indication of bias, as it suggests that female characters may be primarily represented through their relationships with others, and in particular through family bonds. The prevalence of the "Appearance and Styling" theme also made us wonder whether such tropes could limit women's representation in film.

Male Themes: The topics identified to describe our male tropes were made up of a "General/Miscellaneous" category including non-specific, diverse terms and an "Action and Adventure" category containing words related to superhero and fantasy films as well as war and action. Although the general/miscellaneous theme may not seem highly coherent and interpretable, the number of topics was not changed as it resulted in stable and reproducible LDA results.

The more general nature of the dominant topic in male tropes may suggest a wider diversity of roles attributed to male characters. At the same time, the action and adventure theme may indicate an association of male characters with action-packed, adventurous, and powerful roles. Although we identified trends that may reveal an aspect of gender bias in the film industry, we had to ask ourselves: are these results significant?

In reality, the vast majority of the tropes in our corpus were classified as unisex by our model.

In fact, looking at the proportions of tropes in the 10 most frequent genres across our movies, we notice that unisex tropes make up more than 85% of tropes in every genre.

Indeed, our model was limited by its simplicity and by the inherent uncertainty of discretising our scores into gender labels. As a result, certain tropes may fall into a class that does not accurately represent them. More specifically, although we chose the discretisation of our genderedness scores to maximize the overall F1-score of the classification, the resulting precision is relatively low for unisex (0.41) and male (0.63) tropes. In addition, because we did not distinguish between tropes that refer to entire movie plots and those that concern specific characters, drawing conclusions about gender becomes even more challenging.
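The discretisation step can be illustrated with a minimal sketch that grid-searches a cut-off on the genderedness score and keeps the one with the highest F1. The symmetric cut-off, the sign convention (negative for male, positive for female), the macro-averaged F1, and the toy scores and labels are assumptions made for illustration, not the settings used in the study.

```python
LABELS = ("male", "unisex", "female")


def discretise(score, cutoff):
    """Map a genderedness score to a label using a symmetric cut-off."""
    if score <= -cutoff:
        return "male"
    if score >= cutoff:
        return "female"
    return "unisex"


def precision_recall_f1(y_true, y_pred, label):
    """One-vs-rest precision, recall and F1 for a single label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    return sum(precision_recall_f1(y_true, y_pred, lab)[2] for lab in LABELS) / len(LABELS)


def best_cutoff(scores, y_true, candidates):
    """Return the candidate cut-off with the highest macro F1."""
    return max(
        candidates,
        key=lambda c: macro_f1(y_true, [discretise(s, c) for s in scores]),
    )


# Hypothetical scores and hand-assigned reference labels.
scores = [-2.1, -1.4, -0.3, 0.1, 0.4, 1.2, 2.5, -0.9]
y_true = ["male", "male", "unisex", "unisex", "unisex", "female", "female", "male"]

cutoff = best_cutoff(scores, y_true, candidates=[0.25, 0.5, 1.0, 1.5])
y_pred = [discretise(s, cutoff) for s in scores]
for label in LABELS:
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Reporting precision per label, as in the loop at the end, is what exposes a weak class even when the overall F1 looks acceptable.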

Hence, should this project be revisited, we suggest a more sophisticated approach to trope classification. This could involve a more reliable machine learning classification algorithm. Future work should focus on properly classifying whether tropes are character-specific or movie-specific, as well as identifying the characters they refer to and their sex. This would allow an analysis of a smaller and potentially more informative subset of tropes, from which direct and more robust conclusions about gender bias in film could be drawn. A clustering algorithm could then be trained to more effectively associate character tropes with their 'gender'.

So are movie tropes free of gender bias?

Are we witnessing the end of gender bias in cinema? Not quite. Despite the prevalence of unisex tropes, the nuances of gender representation are still skewed. Our subsequent survival analysis of actors' careers further underscores this, revealing statistically significant disparities in career longevity between male and female actors.


🎞️ Career Longevity in Cinema: 🎞️

A Survival Analysis Perspective

Let's take a step back and look at the big picture — how long do actors stay in the spotlight, and does it differ for men and women? Our investigation peeks behind the curtain to reveal the ebb and flow of acting careers through the lens of gender. Imagine a snapshot showing the length of actors' careers with two colors: one for men and one for women. What we see is that men generally have a slightly longer stay in the world of cinema. But it's not just about who stays longer; it's about the range of career lengths. Men's careers seem to have more variation — some have brief cameos while others enjoy enduring lead roles.

The Kaplan-Meier curve presented below estimates the survival probability over time. Each step down represents one or more events that lower the survival probability. In our case, the event is the end of a career, so the curves describe the career length of male and female actors. Initially, the curves for both genders are nearly identical, suggesting a similar probability of career survival in the early career stages. As time passes, the curves diverge, showing lower career longevity for female actors than for male actors. While the differences may seem subtle, they are highly significant, with a p-value of 2.4e-21. Such a small p-value means that a difference this large would be very unlikely if male and female careers followed the same survival curve.
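For readers unfamiliar with the estimator, the following is a minimal sketch of how a Kaplan-Meier curve is computed. The function name `kaplan_meier`, the `observed` flag (True when the end of a career was observed, False when the career is censored) and the toy career lengths are assumptions for illustration; in practice a survival analysis library would be used.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an event.

    durations: career lengths (for example in years).
    observed: True if the career end was observed, False if censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk careers that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical career lengths in years; False marks a censored career.
durations = [1, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored careers never cause a step down, but they still count in the risk set until their last recorded year, which is why the steps grow larger as fewer actors remain at risk.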

Does Gender Steer the Story?

While numbers don't lie, they don't tell the whole story either. They don't say who will have the longest career or why. But they do suggest a pattern — one that we see when we look at who's still active in films and who's not. In our story, 'active' means having been in a movie in the last ten years. Those who have, take the stage as our 'non-censored' actors. They're the majority, showing that many actors' stories are still being written.

What if we could play detective and guess an actor's gender just by looking at their career details? Given the patterns we've seen, it seems possible. Could the length of an actor's career, the roles they play, and their activity give us clues?

We took on the challenge of guessing actors' genders from features such as the number of movies they appeared in, their height, or their most frequent movie genre. Our quest yielded average results: 66% accuracy across four different models. This is a starting point, but it leaves room for improvement. Perhaps there are confounders we haven't considered or aspects of the data that need refining. One technical obstacle was the large amount of categorical data, which had to be one-hot encoded before it could be fed to our models. One-hot encoding can greatly increase the number of features and change how models treat these variables, so it may have limited our models' ability to predict accurately.
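As a small illustration of the encoding step, the sketch below one-hot encodes a single categorical column. The column names (`n_movies`, `height_m`, `top_genre`), the helper `one_hot_encode` and the three example actors are hypothetical and only mirror the kinds of features mentioned above.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column by one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical actor records.
actors = [
    {"n_movies": 12, "height_m": 1.80, "top_genre": "Action"},
    {"n_movies": 5, "height_m": 1.65, "top_genre": "Drama"},
    {"n_movies": 30, "height_m": 1.75, "top_genre": "Comedy"},
]
encoded = one_hot_encode(actors, "top_genre")
for row in encoded:
    print(row)
```

Each distinct category becomes its own column, so a feature with many categories (such as genre) quickly widens the input and spreads its signal across many sparse indicators.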


✨ The Influence of Gender ✨

Above, we found a possible pattern: gender may influence an actor's career. If male actors tend to have longer careers, is it because they appear in movies that viewers like better? It is finally time to address the elephant in the room: does gender play a role in the ratings that people give to movies? To find out, we again use the fraction of female actors in a movie and look at how it relates to the movie's average rating. A naive analysis shows a significant negative correlation between the female fraction of the cast and the average rating. We can also see again that the female fraction is generally below 0.35. There could be many reasons for this correlation; for example, people might prefer genres that are seen as better suited to male actors, or these genres might be more carefully made.

Since different aspects of a movie can influence its rating, we also ran a naive correlation analysis between the rating and several other movie characteristics to find the most likely influencers. It showed that the runtime, the release year, the genre, the language, and even the number of votes are possible confounders. On the other hand, characteristics such as the average age of the cast did not show a significant correlation with the rating.

Characteristic     Spearman correlation   p-value
Female fraction    -0.40                  0.00
Average age         0.04                  0.72
Runtime             0.47                  0.00
Release year       -0.78                  0.00
Number of votes     0.22                  0.00
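
The Spearman correlation reported in the table is the Pearson correlation computed on ranks. The sketch below shows that computation on made-up numbers; the helper names and the toy `female_fraction` and `rating` values are assumptions, and the p-value is not computed here.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical movies: female fraction of the cast and average rating.
female_fraction = [0.1, 0.2, 0.3, 0.4, 0.5]
rating = [7.9, 7.1, 7.4, 6.0, 5.5]
rho = spearman(female_fraction, rating)
print(f"Spearman rho = {rho:.2f}")
```

Because it works on ranks, the coefficient only captures whether one quantity tends to rise or fall with the other, not by how much, and it says nothing about confounding.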

Is this really the case?

Until now, we looked at these relationships naively and saw that many confounders can affect the average rating of a movie. To check whether the fraction of female actors itself affects the average rating, we need to isolate its effect; that is, we need a causal analysis. To do that, we define a treatment group and a control group. The treatment group contains the movies with more female than male actors in their cast, and the control group contains the movies where the opposite holds. We then used propensity score matching on the confounders above that showed some correlation with the average rating, so that treated and control movies are comparable on those characteristics.

After matching, the distribution of the average rating is very similar in the two groups: the treatment group (more female than male actors) is slightly more likely to have ratings of around 4.5 to 7, and the control group is marginally more likely to be rated above 7. In the end, the correlation statistics show that the fraction of female actors in a movie has little to no correlation with its average rating. So people's opinion of a movie does not seem to change based on the gender of its actors. What a relief!
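
The matching step can be sketched as follows, assuming each movie's propensity score (its estimated probability of being in the treatment group given the confounders) has already been computed, for instance with a logistic regression. The greedy one-to-one matching, the `caliper` value, the function name and all numbers below are illustrative assumptions; the text does not specify which matching algorithm was used.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: dicts mapping movie id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores
    differ by at most `caliper`.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control movie in propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores and average ratings.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
control = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
ratings = {"t1": 6.4, "c1": 6.6, "t2": 7.0, "c3": 6.9}

pairs = match_on_propensity(treated, control)
diffs = [ratings[t] - ratings[c] for t, c in pairs]
mean_diff = sum(diffs) / len(diffs)
print(pairs)
print(f"mean rating difference (treated - control): {mean_diff:.2f}")
```

Treated movies with no sufficiently close control (here `t3`) are dropped, so the comparison is only made where the two groups overlap on the confounders.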

🔍 What are the takeaway messages? 🔍

Our exploration into cinema reveals a complex interplay between gendered tropes and their real-world impacts. We found disparities in how genders are represented and portrayed: female tropes lean towards family and appearance, and male tropes towards action and adventure, although most tropes were classified as unisex. Our survival analysis shows a statistically significant career longevity gap between male and female actors, highlighting the industry's challenges and the need for inclusive opportunities. Finally, once confounders are accounted for, our causal analysis suggests that a higher share of female actors does not lower movie ratings, which challenges long-standing biases and supports balanced casting.