Exploring the Controversy of Mulan (2020) with the NYT Movie Review APIs

Published in Web Mining [IS688, Spring 2021], Jan 31, 2021

Note: A copy of this story was featured on the Medium publication Nerd for Tech! I removed it due to the duplicate article rule, but it's still here for your reading pleasure!

If you’re like me, when you heard about Disney’s remake of Mulan (2020), you may have been excited to hear that it featured an all-Asian cast and that the film was meant to be a more accurate depiction of Hua Mulan. While I adore Disney’s 1998 release of Mulan, I was open to the idea of seeing a film that might better portray Asian, and specifically Chinese, culture. Accurate representations of groups other than your everyday white, straight, cisgender people are long overdue. Initially I was excited and looked forward to seeing this film.

Of course, when the film was released on Disney+, controversy sparked. An article in The Guardian gives an overview of the controversy, including “Crystal” Liu declaring her support for the Hong Kong police after their acts of brutality against Hong Kong protestors back in 2019, as well as part of the movie being filmed in Xinjiang (an area where it’s believed that many Muslims are being held in internment camps). This started the hashtag #BoycottMulan, and upon hearing about it I thought about not watching the film after all, but I still considered it because many other Asian talents worked on it.

With the New York Times Movie Reviews API, I decided to look into the actual reviews of this movie. The NYT has APIs available for its movie reviews, book reviews, best seller lists, article archive, and so on. The NYT is considered a reliable source, so I believe its critics’ reviews of Mulan will help provide further insight into any other controversies within the film, as well as into whether I should just stick to watching the 1998 version from time to time. The IMDb rating for the movie (5.6/10 stars) certainly isn’t helping its case either. Representation matters, but representation should be done right. For anyone who identifies as Chinese, Asian, or a member of any other minority group, we have to point out when representation is not done correctly. This likely isn’t the picture we want painted of us in society, and we have every right to criticize it as such. But past the drama outside of the film, can we consider the 2020 release of Mulan to be something worth watching? Using the NYT API, I do some exploratory content mining, comparing a review of the 2020 release of Mulan with a review of the 1998 release.

For this exploratory analysis, I used Python in a Jupyter Notebook. I also used the following libraries: pynytimes [imported NYTAPI], requests, json, re, pandas, matplotlib, nltk.tokenize [imported RegexpTokenizer], nltk.corpus [imported stopwords], nltk.probability [imported FreqDist], bs4 [imported BeautifulSoup], and vaderSentiment [imported SentimentIntensityAnalyzer].

The Process

The following is an explanation of how I eventually get to the end results (discussed below).

1. Get API Access: The New York Times provides several APIs for much of the information it publishes online, including news articles, best seller lists, book reviews, and movie reviews. An account with the NYT is required for API access, and a key is given once you activate the APIs you’d like to use for a project. In this case, I chose to work with the Movie Reviews API. Exploring this API took some time, and there are lots of great things to explore here.

2. Use Requests Library to Obtain Data On Movie Reviews: Since I’m looking at Mulan, I needed to retrieve the information necessary for content mining those reviews. In Python, I first imported all the libraries above, including requests. Below I show what the code for this step looks like. The first line of code shows the basic format for searching movie reviews by title.

import requests
movie_review_search = "https://api.nytimes.com/svc/movies/v2/reviews/search.json?query=mulan&api-key=insert_api_key_here"
request = requests.get(movie_review_search)
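A minimal sketch of an alternative way to assemble the same search URL, assuming you would rather keep the key out of the URL string and let the title be URL-encoded for you. It uses only the standard library; `build_search_url` and the `NYT_API_KEY` environment variable are hypothetical names introduced for illustration.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.nytimes.com/svc/movies/v2/reviews/search.json"


def build_search_url(title, api_key):
    """Return the review search URL for a movie title (hypothetical helper)."""
    params = {"query": title, "api-key": api_key}
    # urlencode escapes spaces and special characters in the title
    return f"{BASE_URL}?{urlencode(params)}"


# NYT_API_KEY is an assumed environment variable name, not an NYT convention
api_key = os.environ.get("NYT_API_KEY", "insert_api_key_here")
movie_review_search = build_search_url("mulan", api_key)
print(movie_review_search)
```

The resulting string can be passed to requests.get() in the same way as the hard-coded URL.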

3. Dump Content Into JSON And Then a Pandas Data Frame: After getting the API request for the movie reviews, I take that data and put it first into a JSON and then into a Pandas data frame for easier viewing. The json and pandas libraries make this possible. What that data frame looks like is also shown.

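import json
import pandas as pd

movie_review = request.json()  # parse the response body into a dict
# The dumps/loads round trip is redundant (request.json() already returns a dict) but harmless
movie_review = json.dumps(movie_review)
review_data = json.loads(movie_review)
review_df = pd.json_normalize(review_data['results'])  # one row per review
review_df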

Note: not all columns are shown in this screenshot.

As you can see, we find movie reviews for the two different Disney Mulan films and even a third film that was not made by Disney but is about the tale of Hua Mulan. Naturally, I only focus on the first two rows of the data frame. At the very right of the screenshot, we can see the URLs, which are what we need to extract.

4. Pull the URLs of The Articles From the Table: We have data on the movie reviews, but not the actual movie reviews yet. Hence, I decided to extract the two URLs necessary to access those reviews. This is simply done by selecting the URL column of the data frame and then choosing the rows I want to view. Something I would have wanted out of this data frame was a brief summary of the critic’s review or, even better, the review itself. Still, I make do with what’s available.
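A minimal sketch of this step, assuming each result carries its URL in a nested "link" object so that json_normalize produces a "link.url" column. The payload below is a made-up, trimmed-down stand-in for the real response, and the example.com URLs are placeholders.

```python
import pandas as pd

# Hypothetical payload; the field names are assumptions about the response shape
review_data = {
    "results": [
        {"display_title": "Mulan", "link": {"url": "https://example.com/review-a"}},
        {"display_title": "Mulan", "link": {"url": "https://example.com/review-b"}},
        {"display_title": "Another Mulan film", "link": {"url": "https://example.com/review-c"}},
    ]
}

review_df = pd.json_normalize(review_data["results"])

# Nested keys are flattened with a dot, so the URL lives in the "link.url" column
urls = review_df.loc[[0, 1], "link.url"].tolist()
first_url, second_url = urls
print(urls)
```

Selecting rows by label with .loc keeps the choice of reviews explicit, which helps if the API ever returns the results in a different order.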

5. Use Beautiful Soup to Extract the Movie Review Content: This is where some web scraping comes in. While the NYT API provides plenty of information on movie reviews, such as the critic, date, and headline, I still needed the actual content of the review. Hence, using the bs4 library, I scraped the content at that movie review’s URL. Below is the code used for scraping the review of the 2020 release of Mulan. If I left the process at “results”, I would have a lot of unwanted content, since scraping returns all of the raw HTML. This is very messy and, in my opinion, not the easiest to parse. So to clean this content, I created a blacklist of HTML tags (including those that hold CSS and scripts) to filter out of my output. This was one of my main obstacles in parsing the content, but some online investigation helped. What I’m left with is the actual text of the article. Of course, some other text that isn’t part of the review remains in the output. Normally, I would clean the output further to exclude any extra information, but for simplicity, I find the actual review by skimming through the content. This is only appropriate because I’m working with just two scraped pages.

from bs4 import BeautifulSoup

url1 = 'https://www.nytimes.com/2020/09/03/movies/mulan-disney-review.html'
page = requests.get(url1)
soup = BeautifulSoup(page.content, 'html.parser')
text = soup.find_all(text=True)  # every text node on the page
results = soup.find(id='story')  # the article container

output = ''
# Parent tags whose text should not end up in the output
blacklist = [
    '[document]',
    'noscript',
    'header',
    'html',
    'meta',
    'head',
    'input',
    'script',
    'style',
]
for t in text:
    if t.parent.name not in blacklist:
        output += '{} '.format(t)
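One possible way to cut down on the leftover non-review text is to collect text nodes only from inside the article container rather than from the whole page. The sketch below is a variation on the code above, not the approach used for the results in this article. It runs on a small inline HTML string instead of the live page, and `extract_story_text` and `SKIP_TAGS` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Tags whose text is never part of the readable article
SKIP_TAGS = {"script", "style", "noscript"}


def extract_story_text(html, container_id="story"):
    """Return the visible text inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return ""
    pieces = [
        node
        for node in container.find_all(string=True)
        if node.parent.name not in SKIP_TAGS
    ]
    # Join the pieces, then collapse runs of whitespace into single spaces
    return " ".join("".join(pieces).split())


sample_html = """
<html><head><title>Review</title><style>p {color: red;}</style></head>
<body><nav>Sections</nav>
<section id="story"><p>The movie takes on <b>gender</b> boldly.</p>
<script>var tracker = 1;</script></section>
<footer>Site footer</footer></body></html>
"""
print(extract_story_text(sample_html))
```

Whether the real page keeps the whole review under a single id="story" element is an assumption, so it is worth checking the page source before relying on this.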

6. Tokenize the Text of the Movie Review: Now that I’ve got the content of the article, it’s time to tokenize the material. First I use the lower() method on the string to make everything lowercase. Then the nltk.tokenize library makes tokenizing easy to do, as seen below. The resulting list of tokens is then appended to an empty list.

from nltk.tokenize import RegexpTokenizer

review_2020 = desired_string_content_goes_here  # placeholder for the review text
space_tokenizer = RegexpTokenizer(r'\w+', gaps=False)  # raw string keeps \w intact
space_tokens = []
review_2020 = review_2020.lower()
space_tokens.append(space_tokenizer.tokenize(review_2020))  # a list holding one list of tokens

7. Clean the Tokens By Removing Stop Words: Using nltk.corpus, I imported English stop words to filter them out of the token list. This removes words that aren’t very meaningful, like “the”, “a”, “an”, “he”, “they”, etc. Example code for that is below.

from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))  
cleaned_up_tokens = []
for w in space_tokens:
    for word in w:
        if word not in stop_words:  
            cleaned_up_tokens.append(word)
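To see what steps 6 and 7 do to a single sentence, here is a small self-contained sketch. It swaps in re.findall with the same \w+ pattern in place of RegexpTokenizer, and a tiny hand-written stop word set in place of NLTK's English list, so that it runs without downloading any corpus. Both substitutions, and the names `tokenize`, `remove_stop_words`, and `STOP_WORDS`, are assumptions made for illustration.

```python
import re

# Tiny illustrative stop word set; NLTK's English list is much longer
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "on"}


def tokenize(text):
    """Lowercase the text and return every run of word characters."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [tok for tok in tokens if tok not in stop_words]


sample = "The movie takes on gender more boldly than it handles warfare."
tokens = tokenize(sample)
cleaned = remove_stop_words(tokens)
print(cleaned)
```

One side effect of the \w+ pattern is that contractions are split at the apostrophe, so "doesn't" becomes the two tokens "doesn" and "t".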

8. Finding The Most Common Tokens and Plotting Them: With a complete filtered list of tokens, I can now look at what the most common tokens were in this article and whether any of them were significant. The nltk.probability library has a FreqDist class whose most_common() method returns the most frequent tokens. In this case, I decided to look at the twenty most common tokens and plot them (using matplotlib). The lists and plots for the most common tokens are in the Results section of this article.

import matplotlib.pyplot as plt
from nltk.probability import FreqDist

fdist = FreqDist(cleaned_up_tokens)  # the filtered token list from step 7
print(fdist.most_common(20))
fdist.plot(20, cumulative=False, title="Mulan 2020 Tokens")
plt.show()
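FreqDist is built on Python's collections.Counter, so the counting itself can be sketched with the standard library alone. The token list below is made up, and the relative-frequency step is an addition that is not part of the analysis above; it may help when comparing reviews of different lengths.

```python
from collections import Counter

# Made-up token list standing in for the cleaned tokens
tokens = ["mulan", "movie", "mulan", "disney", "movie", "mulan", "war"]

counts = Counter(tokens)
top_two = counts.most_common(2)
print(top_two)  # [('mulan', 3), ('movie', 2)]

# Share of all tokens taken up by each of the top tokens
relative = {tok: n / len(tokens) for tok, n in top_two}
print(relative)
```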

9. Use vaderSentiment to Find Polarity Sentiment Scores of the Review: Finally, I used the vaderSentiment library to get some extra context on the sentiment of each review overall, as well as the sentiment of sentences within those reviews. An issue I came across was deciding how to split the content of the review into sentences. I struggled with this and eventually ended up with the re.split() call below, which splits the string at whitespace that follows a period or question mark. The results are given as polarity scores, which are explained further below.

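import re
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
sid.polarity_scores(rev_2020)  # overall scores; rev_2020 holds the review text

# Split on whitespace that follows "." or "?", skipping abbreviations such as "e.g." or "Dr."
split_string_2020 = re.split(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s", rev_2020)
dict_list_2020 = []
for sentence in split_string_2020:
    result = sid.polarity_scores(sentence)
    dict_list_2020.append(result)
    print(sentence)
    print(result)
print(dict_list_2020)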
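To make the behaviour of that regular expression easier to see, here is the same pattern applied to a short made-up passage. `SENTENCE_BOUNDARY` and `split_sentences` are names introduced for this sketch only.

```python
import re

# Same pattern as above: whitespace preceded by "." or "?", unless the
# period belongs to something like "e.g." or a title such as "Dr."
SENTENCE_BOUNDARY = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s")


def split_sentences(text):
    """Split text into sentences at the boundaries matched by the pattern."""
    return SENTENCE_BOUNDARY.split(text)


sample = "Mulan is brave. Is it good? Dr. Smith liked it, e.g. the action. Wow! It ends."
sentences = split_sentences(sample)
for sentence in sentences:
    print(sentence)
```

The passage comes back as four pieces: the exclamation mark does not end a sentence for this pattern, so "Wow! It ends." stays together. A period followed by a closing quotation mark is not treated as a boundary either, which matters for reviews that quote dialogue.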

Results

Below are two plots showcasing the frequency of tokens used in the movie reviews for the 1998 release of Mulan and the 2020 release of Mulan, respectively. Unsurprisingly, “mulan” is the most common token, but past that, there are differences in the tokens used. Some of the other tokens clearly aren’t particularly useful, such as “disney”, “one”, “show”, or “film”/“movie”. Of course, we’re aware that the 1998 version of Mulan is animated and a musical (as indicated by two tokens below) and that the 2020 version is not. Even if I had not seen the films before, I likely could have inferred that from the tokens below. While this is interesting, it doesn’t necessarily answer the question at hand. However, one interesting thing about the 2020 tokens is that “xianniang” occurs eight times. “xianniang” is the name of a character in the film and appears at first glance to be an important aspect of the movie. My own understanding is that Xianniang was a new addition to the film, and because she wasn’t in the animated film, this warranted some discussion in the review.

Overall, these tokens don’t add much to what is already known about these two films or provide any further insight into how much the 2020 version is worth seeing.

1998 Release–Most Common Tokens
[('mulan', 12), ('disney', 7), ('film', 5), ('see', 4), ('voice', 4), ('takes', 3), ('animation', 3), ('films', 3), ('time', 3), ('musical', 3), ('character', 3), ('little', 3), ('gender', 2), ('war', 2), ('exclaims', 2), ('show', 2), ('female', 2), ('warrior', 2), ('comes', 2), ('right', 2)]

2020 Release–Most Common Tokens
[('mulan', 23), ('movie', 11), ('one', 9), ('xianniang', 8), ('disney', 7), ('china', 6), ('like', 6), ('ballad', 5), ('men', 5), ('family', 4), ('action', 4), ('war', 4), ('character', 4), ('original', 4), ('women', 4), ('khan', 4), ('life', 4), ('new', 3), ('appears', 3), ('takes', 3)]

Thus, I turn to the other side of this analysis: the sentiment and polarity scores. The following code is an example of how the sentiment results work. The last line displays the polarity scores for the sentence “The movie was awesome.”, which is evidently positive. In the output we have ‘neg’ (negative sentiment), ‘neu’ (neutral sentiment), ‘pos’ (positive sentiment), and ‘compound’ (a single normalized score computed from the sentiment ratings of the words in the text). There are many details to how this was developed, which can be explored in the vaderSentiment documentation, but simply put, this sentence has a 0% negative score, a 42.3% neutral score, and a 57.7% positive score. [Side note: The negative, neutral, and positive scores should all add up to 100% (or relatively close to it).] This means the positive share is the largest of the three. Furthermore, the compound score is 0.6249. The compound score is normalized on a scale between -1 (most extreme negative) and +1 (most extreme positive). In this case, a score of 0.6249 is closer to +1, so the sentence is considered positive as a whole.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

example = 'The movie was awesome.'
sid = SentimentIntensityAnalyzer()
sid.polarity_scores(example)

##Output##
{'neg': 0.0, 'neu': 0.423, 'pos': 0.577, 'compound': 0.6249}
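The numbers in that output can be reproduced by hand, which helps show what each score means. The sketch below is based on my reading of how VADER works rather than on its actual code. It assumes that "awesome" has a lexicon rating of 3.1, that the other three words are neutral, that the compound score is the summed rating x normalized as x / sqrt(x² + 15), and that each sentiment-bearing word counts as |rating| + 1 when the proportions are computed. The names `normalize`, `ALPHA`, and `AWESOME_VALENCE` are illustrative.

```python
import math

ALPHA = 15             # assumed normalization constant
AWESOME_VALENCE = 3.1  # assumed lexicon rating for "awesome"


def normalize(valence_sum, alpha=ALPHA):
    """Squash a summed rating into the range between -1 and +1."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


compound = round(normalize(AWESOME_VALENCE), 4)

# Proportions: the one positive word weighs rating + 1, each neutral word weighs 1
pos_weight = AWESOME_VALENCE + 1
neutral_words = 3  # "the", "movie", "was"
total = pos_weight + neutral_words
pos = round(pos_weight / total, 3)
neu = round(neutral_words / total, 3)

print(compound, pos, neu)  # 0.6249 0.577 0.423
```

Because the compound score is based on a sum over all words, it moves toward -1 or +1 as a text gets longer, so the score for a whole review tends to be more extreme than the score for any single sentence in it.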

Here are some of the most notable polarity scores I found in both reviews that help provide insight into the question.

1998 Release–Notable Sentences and Their Sentiment Scores
Overall: {'neg': 0.09, 'neu': 0.807, 'pos': 0.103, 'compound': 0.9303}
1) “sign me up for the next war!” exclaims the heroine’s grandmother, in a show of what does not precisely qualify as progress for women.
{'neg': 0.145, 'neu': 0.759, 'pos': 0.097, 'compound': -0.3382}
2) “who spit in her bean curd?” wisecracks one character in this supposedly eastern fable, which often seems about as chinese as chop suey.
{'neg': 0.056, 'neu': 0.944, 'pos': 0.0, 'compound': -0.0772}
3) though the plot and setting present exotic new opportunities for the filmmakers, the china of ‘mulan’ has surprisingly little depth of field or background detail.
{'neg': 0.0, 'neu': 0.827, 'pos': 0.173, 'compound': 0.5859}
4) as directed by two first-time feature filmmakers, barry cook and tony bancroft, the film works harder at staging computer-enhanced battle scenes with swooping camera movement and loads of extras, more little figures than its technical prowess can accommodate.
{'neg': 0.066, 'neu': 0.934, 'pos': 0.0, 'compound': -0.3818}
5) and they are not especially musical; the fluid choreography of vastly better-made films like last year’s “hercules” is strikingly absent this time, so images just seem busily piled on while music blares.
{'neg': 0.0, 'neu': 0.925, 'pos': 0.075, 'compound': 0.3612}

2020 Release–Notable Sentences and Their Sentiment Scores
Overall: {'neg': 0.077, 'neu': 0.778, 'pos': 0.146, 'compound': 0.9988}
1) it has antiseptic violence, emotional uplift and the kind of protagonist that movie people like to call relatable: a brave, pretty young woman (the suitably appealing yifei liu), who loves her family, but doesn’t quite fit in (yet).
{'neg': 0.054, 'neu': 0.66, 'pos': 0.285, 'compound': 0.8334}
2) mulan is an insistently attractive character, no matter how indifferently conceptualized or bluntly politicized.
{'neg': 0.067, 'neu': 0.751, 'pos': 0.182, 'compound': 0.4265}
3) as a director, she tends to overshoot and overcut, sometimes to distraction; she fusses up one conversation with swooping shots from different angles.
{'neg': 0.106, 'neu': 0.894, 'pos': 0.0, 'compound': -0.3818}
4) the movie takes on gender more boldly than it handles warfare.
{'neg': 0.157, 'neu': 0.643, 'pos': 0.2, 'compound': 0.1513}
5) walt disney studios xianniang is the movie’s most vibrant creation and an original addition to the mulan chronicles.
{'neg': 0.0, 'neu': 0.642, 'pos': 0.358, 'compound': 0.8111}

Overall, both movie reviews had positive sentiment as a whole, with the 2020 review even having the higher compound score. Both films have their pros and cons, and glancing through the reviews, there were more neutral sentences than positive or negative ones. Hence, I decided to include just five sentences from each review.
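As a rough way to summarize the sentences listed above, the sketch below labels each compound score using the cutoffs commonly suggested for VADER (0.05 and above is positive, -0.05 and below is negative, anything in between is neutral) and averages the five scores per review. The scores are copied from the lists above; `label` is a hypothetical helper, and since these sentences were hand-picked, the averages are only illustrative.

```python
# Compound scores of the five notable sentences from each review
compound_1998 = [-0.3382, -0.0772, 0.5859, -0.3818, 0.3612]
compound_2020 = [0.8334, 0.4265, -0.3818, 0.1513, 0.8111]


def label(compound, threshold=0.05):
    """Map a compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


labels_1998 = [label(score) for score in compound_1998]
labels_2020 = [label(score) for score in compound_2020]
mean_1998 = sum(compound_1998) / len(compound_1998)
mean_2020 = sum(compound_2020) / len(compound_2020)

print("1998:", labels_1998, round(mean_1998, 4))
print("2020:", labels_2020, round(mean_2020, 4))
```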

As “xianniang” was one of the most mentioned tokens, I noted its appearance and the sentiment score for one of the sentences it was in. Apparently, the addition of Xianniang was beneficial to the story after all! Mushu (the dragon from the 1998 film) might not be as missed as I thought he would be.

Something that stuck out as odd to me was the third sentence and polarity scores I include for the 1998 release. It has a 0.5859 compound score, meaning it should be considered more positive, but reading the sentence itself leads me to think otherwise: “though the plot and setting present exotic new opportunities for the filmmakers, the china of ‘mulan’ has surprisingly little depth of field or background detail.” I think the issue lies with the fact that the second half of the sentence is an independent clause. The first half of the sentence (a dependent clause) seems to offset the score from being accurate.

Limitations and Challenges

A few of the challenges I came across in this work were mentioned earlier in my process, such as movie review content not being readily available and the need to clean up scraped content. After reviewing the results, I discuss more here.

More of my time was spent on content mining than on sentiment analysis, even though the latter was more useful in the end. It is still interesting to see what gets tokenized in a given piece of content, but it didn’t give as much insight as the sentiment analysis did.
Web scraping isn’t something I would prefer to do most of the time, as it does not generally seem time efficient, but it works in this case. And it only works because I parsed just two URLs. Scraping a larger number of URLs would be more time-consuming and difficult. This was also my first time using BeautifulSoup, and while it seemed simple, learning new functions can be a challenge at times.
Something I wonder about the polarity scores is this: should I have separated the text into independent clauses rather than sentences? Independent clauses are often sentences on their own, but the inclusion of a dependent clause seemed to change the sentiment. However, even if I did separate independent and dependent clauses, how accurately would dependent clauses be classified? Where do they fit in and, most importantly, how do I ensure that we are getting accurate compound scores each time? Going through sentences individually would also be rather time-consuming at a larger scale.
If I did this again, I think more critic reviews, and maybe even reviews from average viewers, would prove beneficial in determining whether a movie is “good.” I would have to scrape these reviews from other sites like Rotten Tomatoes or IMDb, assuming that APIs are not available there.

Conclusions

In conclusion, at least according to the New York Times, Mulan (2020) is worth giving a watch. Despite the controversy surrounding it, at its core it appears to be an improvement upon the 1998 version of Mulan. Of course, this is only one review out of many, so I would still take this with a grain of salt. Still, it’s surprising that, looking at Mulan as just the film itself, its review received a more positive compound score, especially given the previously mentioned IMDb score of 5.6/10 stars.

Does one movie review truly speak to how worthy a movie is of viewing? Part of this depends on the person and their views on various sources. A critic, while I assume they try to remain unbiased, still has subjective opinions, and we viewers can interpret them however we like. On one hand, we have what seems to be a worthwhile film according to one critic, but on the other hand, we have something with controversy surrounding its cast and production. If I did end up watching the movie, and for anyone who is still wondering whether they want to see it (if they haven’t already), it’s important to think about why. Even if it’s considered to be good, or much better than the 1998 version, I don’t see myself viewing it any time soon, despite what I’ve found. But of course, my opinion is subjective, so I digress.

Written by Megan Resurreccion