Tuesday, December 21, 2010

Google Books Ngram Viewer on Men vs. Women, Media, and STEM

Google Labs recently released a Books Ngram Viewer. It is available here. The viewer lets you search for words and phrases of up to five words in a database of books dating back to 1800. The database contains only about 4 percent of all books ever published (the figure given in the paper's abstract below), but it is substantial enough to investigate cultural trends.
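To make the idea of an n-gram concrete, here is a minimal Python sketch. It is not Google's implementation: the function names, the toy corpus, and the choice to normalize by the number of same-length n-grams in each year are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(text, n):
    """Return all n-grams (space-joined strings) from whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency(phrase, texts_by_year):
    """Map each year to the share of same-length n-grams that equal phrase.

    Assumption: counts are normalized by the total number of n-grams of the
    same length found in that year's texts.
    """
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result

# Hypothetical two-sentence "corpus", for illustration only.
toy_corpus = {
    1900: ["the men and the women worked"],
    2000: ["the women and the women voted"],
}
freq = relative_frequency("women", toy_corpus)
```

Plotting `freq` against the year gives the kind of curve the viewer draws, only here for a toy corpus instead of millions of books.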

More details about the methodology are published in the research paper titled "Quantitative Analysis of Culture Using Millions of Digitized Books".

The Abstract

We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of “culturomics”, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. “Culturomics” extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
The paper can be accessed here.

Men vs Women

The first trend I looked at was how the words "men" and "women" fared over the years. The plot is shown below.




The plot starts at 1800, and the curve for "men" is fairly constant until 1880. After that it starts to decline while the curve for "women" starts to gain. The rise of "women" picks up steam around 1970, and the two curves cross around 1990, after which "women" stays ahead.

The plot shows the steady ascendancy of "women" over the last century, ending with it overtaking "men". This period also roughly coincides with the start of the feminist movement and a decline in the patriarchal value system.

It suggests that the grassroots efforts of the feminist movement, with a lot of organizing and field work, paid off in the end.
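Reading a crossover year off a chart by eye is approximate. If the yearly frequencies were available as two lists, the crossover could be found with a few lines of Python. This is only a sketch: the function name is hypothetical and the numbers are made-up placeholders, not values taken from the Ngram Viewer.

```python
def first_crossover(years, a, b):
    """Return the first year in which b rises above a, or None if it never does.

    A crossover at index i means b was at or below a in the previous year
    and is strictly above a in this one.
    """
    for i in range(1, len(years)):
        if b[i - 1] <= a[i - 1] and b[i] > a[i]:
            return years[i]
    return None

# Made-up placeholder values for illustration, NOT real Ngram Viewer data.
years = [1950, 1960, 1970, 1980, 1990, 2000]
men = [0.060, 0.058, 0.055, 0.050, 0.044, 0.040]
women = [0.020, 0.022, 0.030, 0.040, 0.046, 0.050]

crossover_year = first_crossover(years, men, women)
```

With real yearly data in place of the placeholders, the same function would give the crossover year for any pair of words.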

The next plot was created to compare the trends for different media: print, radio, movie, TV, and Internet. It is shown below.




Mostly no surprises here. Print was the dominant media type and stays pretty much the same from 1800 to 2000. Radio had a phenomenal rise and still dominates, which is surprising. One possibility is that radio is identified with wireless communication: even though radio is not as big as TV, wireless communication is still a big part of the communication landscape. Movies enjoyed a steady rise but were overshadowed by TV around 1950. TV overtook print around 1980 and has been on an upward trajectory since then. What is surprising is that the Internet as a medium is just a short blip at the bottom right of the plot. By this measure it is not really part of mainstream media and still has a long way to go. Another possibility is that the Internet is not a single media type, and one has to add the contributions of terms like "New Media", "Interactive Media", and "Social Media". Either way, the chart makes clear that the Internet has yet to acquire an identity that is competitive with the other four media types.

The last chart was created to see the trends for Science, Mathematics, Engineering, and Technology. There is a big push in academic circles these days, with support from the National Science Foundation, to integrate Science, Mathematics, Engineering, and Technology (STEM) into a single unit to achieve a synergistic effect among these four disciplines. On the surface they all appear to be the same and seem to work on the same types of problems. However, these four disciplines have different historical origins and different professional practices that emphasize different aspects of the same problems occurring in nature.

The chart is shown below.





Looking at the plot, it is clear that Science has been on a steady rise since 1800. Mathematics has been around throughout and shows a steady trend. Engineering took off around 1890 and has gained steadily since then. Technology, however, which is historically older than Science, started gaining only around 1960 and overtook Engineering around 1970.

These trends are approximate, but they agree well with existing knowledge about these topics.

Additional resources

Culturomics http://www.culturomics.org/Resources/A-users-guide-to-culturomics

In 500 Billion Words, New Window on Culture http://www.nytimes.com/2010/12/17/books/17words.html



Quantitative Analysis of Culture Using Millions of Digitized Books http://www.sciencemag.org/content/early/2010/12/15/science.1199644.full.pdf

