  1. Frequency of “literacy” and its pairings
  2. AL Direct’s Content Cutdown Strategy

The story of "literacy" in American Libraries Direct

Lou, D. (2019) The story of “literacy” in American Libraries Direct, American Libraries Magazine. Available at: (Accessed: March 15, 2019).

The article Examining the word “literacy” on AL Direct, December 18, 2018, made me want to look at the word “literacy” in a different way. Like the article’s author, Georgina Trebbe, I am amazed by the popularity of the word “literacy”, which is deeply rooted in libraries. I also want to understand better how the usage of “literacy” has changed over the years. Then an interesting idea occurred to me: why not examine the word “literacy” in all issues of American Libraries Direct by applying the skills of “data literacy”, “digital literacy” and “information literacy”?

My little pet project started with a Python script that scraped the American Libraries Direct Archive and saved the data in CSV files. The most important fields are the issue date, article title, and article summary. For the purposes of the project, I excluded miscellaneous items in these issues that are not articles.
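As a rough illustration of the storage step only, the sketch below writes a few article records to CSV with Python's standard csv module. The field names (issue_date, title, summary) and the sample rows are placeholders chosen for this example; the scraper's actual column names and the archive's page structure are not shown in this article.

```python
import csv
import io

# Hypothetical records; the real scraper's field names are not shown in the article.
records = [
    {"issue_date": "2014-10-07", "title": "Sample article A", "summary": "A short summary."},
    {"issue_date": "2014-10-07", "title": "Sample article B", "summary": "Another summary, with a comma."},
]

def write_records(rows, handle):
    """Write article records to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=["issue_date", "title", "summary"])
    writer.writeheader()
    writer.writerows(rows)

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_records(records, buffer)
csv_text = buffer.getvalue()
```

Using csv.DictWriter rather than joining strings by hand means commas and quotes inside a summary are escaped correctly.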

Then I used the Python library pandas to import the whole data set.

import pandas as pd
df = pd.read_csv('Data/american_libraries_summaries.csv', index_col=0)

Fig. 0: Screenshot of scraped data from AL Direct Archive

Frequency of “literacy” and its pairings

The next step was to use NLTK, Python’s natural language processing package, to trace the word “literacy” through all archived issues. I wanted the frequency distribution of the word in the AL Direct Archive. (A frequency distribution gives the raw count of a given word in a text.) I used the words “information”, “program” and “digital” as “controlled variables” in my experiment. I will explain later why these specific words were selected.

Before calculating the frequency distribution, I wrote code to normalize all the words in the content. This is a critical step for getting accurate results. The script should not be case sensitive when counting the occurrences of a given word. It should also count plural forms such as “programs” together with the singular. Then I ran the code to get the frequency distribution.

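import nltk

# singular(), annualSummaries() and tabulate() are not defined in this excerpt
# (presumably helpers from the project script).
cfd = nltk.ConditionalFreqDist(
    (year, word)
    for year in range(2006, 2019)
    for word in [singular(w.lower()) for w in annualSummaries(year, df)])
modals = ['literacy', 'information', 'program', 'digital']  # words to compare
years = range(2006, 2019)  # 2006 through 2018
years_data = tabulate(cfd, modals, years)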
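The helper functions used in the snippet above are not shown, so the following is only a minimal stand-in built on the standard library. It assumes a deliberately naive plural rule (naive_singular) and whitespace tokenization, neither of which is claimed to match the project's actual normalization; all function names here are hypothetical.

```python
from collections import Counter, defaultdict

def naive_singular(word):
    """Very rough plural stripping; an assumption, not the article's singular()."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def normalize(text):
    """Lowercase, strip surrounding punctuation, and singularize each token."""
    tokens = (t.strip(".,;:!?\"'()“”‘’") for t in text.lower().split())
    return [naive_singular(t) for t in tokens if t]

def counts_by_year(summaries_by_year):
    """Map each year to a Counter of normalized words."""
    counts = defaultdict(Counter)
    for year, summaries in summaries_by_year.items():
        for summary in summaries:
            counts[year].update(normalize(summary))
    return counts

# Made-up summaries, only to exercise the functions.
sample = {
    2006: ["Literacy programs expand.", "New PROGRAM for digital literacy"],
    2007: ["Libraries share information"],
}
counts = counts_by_year(sample)
```

With this sample, “programs” and “PROGRAM” are both counted under “program”. A rule this simple will mangle some words (for example, it turns “this” into “thi”), so a real project would likely rely on a proper lemmatizer or inflection library.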

Fig. 1 shows that the frequency of occurrence of the word “literacy” is relatively stable. (The frequency of occurrence is computed by dividing the frequency distribution by the total count of words in a given text.) The “literacy” line shows a slight uptick, in contrast to the declining lines of the other three words. However, on average, “literacy” is still the least used of the four words. It accounts for only about 0.047% of the words in the AL Direct Archive over the years. (See Table 1)


Fig. 1: Frequency of Occurrence for words “literacy”, “information”, “program” and “digital”

(Count = frequency distribution; Occurrence = frequency of occurrence)

Year   literacy            information         program             digital
       Count   Occurrence  Count   Occurrence  Count   Occurrence  Count   Occurrence
2006   75      0.06%       302     0.22%       297     0.22%       109     0.08%
2007   86      0.04%       452     0.19%       477     0.20%       193     0.08%
2008   116     0.04%       574     0.19%       560     0.18%       194     0.06%
2009   161     0.04%       602     0.17%       658     0.18%       261     0.07%
2010   173     0.05%       514     0.14%       667     0.18%       279     0.08%
2011   171     0.05%       457     0.14%       659     0.20%       317     0.09%
2012   238     0.06%       541     0.13%       736     0.18%       632     0.15%
2013   233     0.05%       519     0.12%       734     0.17%       526     0.12%
2014   174     0.05%       456     0.13%       636     0.18%       456     0.13%
2015   45      0.04%       145     0.12%       200     0.17%       187     0.15%
2016   53      0.04%       185     0.16%       157     0.13%       119     0.10%
2017   52      0.04%       172     0.14%       183     0.15%       108     0.09%
2018   64      0.05%       140     0.11%       184     0.15%       97      0.08%

Table 1: Frequency of words “literacy”, “information”, “program” and “digital”
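The arithmetic behind the frequency of occurrence and the roughly 0.047% average can be sketched as follows. The percentages in the list are the rounded “literacy” values copied from Table 1; the total of 125,000 words in the usage line is a made-up figure, since the actual yearly word totals are not given here. Averaging the rounded yearly percentages is one plausible way to arrive at the quoted figure, not necessarily the method actually used.

```python
def frequency_of_occurrence(count, total_words):
    """Return a word's share of all words in a text, as a percentage."""
    return 100.0 * count / total_words

# Rounded "literacy" percentages copied from Table 1 (2006 through 2018).
literacy_pct = [0.06, 0.04, 0.04, 0.04, 0.05, 0.05, 0.06,
                0.05, 0.05, 0.04, 0.04, 0.04, 0.05]

# Simple mean of the yearly percentages.
average_pct = sum(literacy_pct) / len(literacy_pct)
print(f"{average_pct:.3f}%")  # prints 0.047%

# Hypothetical total word count, for illustration only.
example = frequency_of_occurrence(75, 125000)
```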

You will notice that the frequency distribution numbers are strikingly higher across the board from 2006 to 2014 (see Fig. 2 and Table 1). This is because the content of each issue has been cut down since Oct 7, 2014, which was an unexpected discovery in my pet project. I will explain this in detail later on.

Fig. 2: Frequency distribution for words “literacy”, “information”, “program” and “digital”

The frequency distribution and the frequency of occurrence data give us a nice overview of how low-profile the word “literacy” is. But we still do not know in what contexts the word is used.

To shed some light on this question, I chose to use and modify NLTK’s concordance function. By default, the concordance function shows every occurrence of a given word, together with some context. I modified the function to return the words used immediately before and after every occurrence of a given word. This helped me find all the literacy pairings. Then I drew word clouds to display the data visually. I also computed the frequency of the pairing words to get accurate results. (See Fig. 3)

Fig. 3: Word cloud for pairing words of “literacy” in AL Direct from 2006 to 2018
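The modified concordance function itself is not shown, so here is a minimal sketch of the same idea using only the standard library: for each occurrence of a target word, count the word immediately before it and the word immediately after it. The function name neighbor_counts and the sample sentence are made up for this illustration, and the sketch does not filter out stop words.

```python
from collections import Counter

def neighbor_counts(tokens, target):
    """Count the words immediately before and after each occurrence of target."""
    pairs = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        if i > 0:
            pairs[tokens[i - 1]] += 1   # word before the target
        if i + 1 < len(tokens):
            pairs[tokens[i + 1]] += 1   # word after the target
    return pairs

# Made-up, already normalized tokens.
tokens = ("information literacy program and digital literacy "
          "skill and information literacy").split()
pairs = neighbor_counts(tokens, "literacy")
```

Here pairs.most_common(1) returns “information” with a count of 2. The resulting Counter can also serve as the word-to-frequency mapping that word cloud tools typically expect.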

As you can see from Fig. 3 and Table 2, the top three pairing words are “information”, “digital” and “program”. That is why I picked these as the “controlled variables” in the frequency of occurrence calculation.

Word Count
information 203
digital 91
program 77
early 72
skill 54
financial 46
medium 26
initiative 26

Table 2: Frequency distribution of pairing words of “literacy” in AL Direct from 2006 to 2018

Below are word cloud graphs for annual data.

Year Word Cloud

Table 3: Word cloud for pairing words of “literacy” in AL Direct by year

From Table 3 and Table 4, we can see that, except in 2007, 2012 and 2018, “information literacy” is always the most popular term among all literacy topics. Apparently, we librarians hold “information literacy” dear.

  • Information Literacy: the ability to use information and communications technologies to effectively locate, evaluate, determine relevancy, synthesize, and use information ethically.

I think “literacy skills”, “early literacy” and “family literacy” all belong to the same category. Despite their slight differences, they all emphasize building the skills required for reading and writing:

  • Literacy skills: all the skills needed for reading and writing.
  • Early Literacy: what children know about reading and writing before they actually read or write.
  • Family literacy: home literacy activities that provide literacy skill-building opportunities for young children while enhancing literacy skill development in all members of the family.

This category is an important and constant theme, as the data show. From 2006 to 2015, it almost always had a spot on the top three literacy pairings list. Notably, “early literacy” appeared on the list in 5 out of 13 years.

“Literacy program” was popular from 2007 to 2009. After 2011, however, it never made it back onto the top three list. It is possible that, instead of discussing general literacy programs, librarians switched their focus to specific ones.

  • Literacy program: library programs to promote literacy.

“Digital literacy” remained popular from 2011 through 2013 and tapered off afterwards.

  • Digital literacy: the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.

“Financial literacy” shows the libraries’ rapid response to the economic crisis around that time. It was among the top three literacy topics in both 2014 and 2018 (last year).

  • Financial literacy: the possession of the set of skills and knowledge that allows an individual to make informed and effective decisions with all of their financial resources.

Besides financial literacy, “health literacy” has also started to catch libraries’ attention in recent years.

  • Health literacy: the degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions.

Year   Word(s)                                         Count
2006   information                                     9
       skill                                           5
       family, campaign, adult                         3
2007   program                                         6
       skill, information                              5
2008   information                                     15
       program                                         8
       skill                                           4
2009   information                                     19
       program                                         10
       early                                           8
2010   information                                     23
       family                                          7
       initiative                                      6
2011   information                                     16
       digital                                         10
       program                                         8
2012   digital                                         36
       information                                     28
       early                                           11
2013   information                                     29
       digital                                         24
       early                                           20
2014   information                                     28
       financial                                       13
       early                                           10
2015   information                                     4
       health                                          3
       childhood                                       3
2016   information                                     13
       design                                          4
       skill                                           3
2017   information                                     9
       medium                                          5
       news, instruction, improves, digital, attitude  2
2018   early                                           7
       information                                     5
       health, financial                               4

Table 4: Top pairing words of “literacy” in AL Direct by year
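Table 4 lists roughly the top three pairing words per year, with tied words sharing a row. One way such a grouping could be produced is sketched below: take words in descending order of count and keep adding groups of tied words until at least three words are listed. This rule is an assumption inferred from the table's layout, and top_with_ties is a hypothetical name; the counts for 2006 are copied from Table 4.

```python
from collections import Counter
from itertools import groupby

def top_with_ties(pair_counts, n=3):
    """Return (words, count) groups covering at least the top n words, ties included."""
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(pair_counts.items(), key=lambda item: (-item[1], item[0]))
    grouped, seen = [], 0
    for count, items in groupby(ordered, key=lambda item: item[1]):
        if seen >= n:
            break
        words = [word for word, _ in items]
        grouped.append((words, count))
        seen += len(words)
    return grouped

# Counts for 2006 copied from Table 4.
pairs_2006 = Counter({"information": 9, "skill": 5,
                      "family": 3, "campaign": 3, "adult": 3})
top_2006 = top_with_ties(pairs_2006)
```

Tied words come out in alphabetical order here, which differs from the order shown in the table.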

AL Direct’s Content Cutdown Strategy

I made an unexpected discovery when scraping the AL Direct Archive. I noticed that the archived pages come in roughly two different styles. This implies that at some point AL Direct updated its online appearance. The style change also points to an underlying change in content strategy. From a glance at the size of the data in different years, I concluded that the update probably took place toward the end of 2014 or the beginning of 2015. The annual data size dropped abruptly from 2.9M in 2014 to less than 1M in 2015 (see Fig. 1).


Fig. 1: Data size of AL Direct Archive by year

A closer look at the number of articles per month supports this conclusion. In October 2014, this number shrank to almost a third of the previous month’s. (See Fig. 2)

Fig. 2: Number of articles per month from 2006 to 2018

If we zoom in on the data for October 2014, we notice that the number of articles per issue plunged on Oct 7, 2014, and has stayed at that low level ever since. (See Fig. 3)

Fig. 3: Number of articles per issue in October 2014
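Counts like those behind Fig. 2 and Fig. 3 can be derived from the issue date attached to each article. The sketch below assumes one date per scraped article and uses only the standard library; the dates and article counts are invented to mimic a drop, not taken from the archive, and both function names are hypothetical.

```python
from collections import Counter
from datetime import date

def articles_per_issue(issue_dates):
    """Count articles for each issue date (one date per article)."""
    return Counter(issue_dates)

def articles_per_month(issue_dates):
    """Count articles for each (year, month) pair."""
    return Counter((d.year, d.month) for d in issue_dates)

# Made-up data: one entry per article, keyed by its issue date.
issue_dates = (
    [date(2014, 9, 30)] * 5
    + [date(2014, 10, 7)] * 2
    + [date(2014, 10, 14)] * 2
)
per_issue = articles_per_issue(issue_dates)
per_month = articles_per_month(issue_dates)
```

With the data already in a pandas DataFrame, a groupby on the issue date column would give the same counts.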

This content cutdown strategy looks like a smart move if we consider the declining popularity of “mailing list” as a search term since 2014, as shown by Google Trends. (See Fig. 4)


Fig. 4: Trend of “mailing list” from Google Trends

I am sure the downsizing of AL Direct has helped many librarians better manage their weekly professional reading. Before, a subscriber might have felt overwhelmed every week on opening a long mailing containing 75 articles on average (at the peak, there were 182 articles to read in the weeks of July 2, 2010 and July 3, 2013). Now, he or she has only 12 articles on average to go through every seven days.