Using data to unlock—and understand—my reading history
Back when I was a kid, I used to read A LOT of books. Then, over the last couple of years, movies and TV series somehow stole the thunder, and with it, my attention. I did read a few odd books here and there, but not with the same ferocity as before. I could also feel my attention span dwindling so I had trouble reading longer, slower-paced books. It was always easier to consume media to satiate my curiosity and experience something new, especially with the rise of beautiful video essays.
Everything changed this year. I learned to let go of the expectation of finishing a book and go back to why I enjoyed reading them in the first place. As a consequence, I’ve read more books in the first six months of this year than I have in the last four years combined. This led me to questions about my reading patterns. How have they evolved over time? What are my favourite genres?
I already knew these answers for movies. I’ve been on Letterboxd for a long time. For the uninitiated, Letterboxd is the better version of IMDb. It’s a site with great design, an awesome rating system, and a great community of people who love movies and make brilliant lists. Above all of these, though, the thing I love most is their stats page. A quick glance shows me that I’ve watched movies from 47 countries and that my most-watched director, to my absolute dismay, is David Dhawan.
Welcome to “Nightingale and Chill” Week
Every article from Nightingale’s week-long celebration of the intersection between entertainment and data visualization
I wanted something similar for books. My first stop was Goodreads, but their stats page was nowhere near as good as Letterboxd’s. I did some more research and came across The StoryGraph. It’s a new site that’s still in beta but shows a lot of promise. Each book is tagged with parameters like “mood” and “pace”, which the site considers when giving personalised recommendations. Think of Spotify’s Echo Nest algorithm, but for books. The nerd in me was delighted.
Logging all the books I’ve read was an arduous task. I had logged some books historically on Goodreads but not the entirety of my reading history. The remaining books I logged from memory, with the help of social media posts, notes, letters, old photos, birthday gifts, etc. By the end of it, I had logged a paltry 168 books and imported them into The StoryGraph. Its stats page in turn gave me ideas for new visualisations. I reached out to Nadia from The StoryGraph and she sent me a dump of all the data I had logged, even though the feature is not open to the public yet (thanks Nadia!).
A quick word about the data. I have not logged any comics I’ve read yet. Two reasons why:
- I’ve read too many (at least 800 issues of Batman, 300 issues of The Flash, 100 issues of Tinkle, etc.) and logging them is going to take some time.
- Adding them would massively skew my data, so I’ve left them out for now. Comics aside, I logged almost every book I could remember reading.
Also, this is not the entirety of what I’ve read. A lot of old books are not present in the Goodreads/The StoryGraph database so those I haven’t been able to log. Eventually I managed to tag all the ones I imported and came up with some rudimentary visualisations. Let’s dive in!
The Surface Dive
For the first one, I bucketed books into three groups (under 300 pages, 300–500 pages, and 500-plus pages) to see their progression over time. I started out with a lot of small books, and then, when I finished my board exams, I read a LOT of large books for the next couple of years. Then I started working and still read a lot of medium-sized books. Over the last couple of years, I’ve gone back to reading a lot of small and medium-sized books, and I hope to read a large book soon.
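For anyone who wants to try the same bucketing on their own reading log, here is a minimal Python sketch. It is not the code behind the charts in this article: the sample records are made up, the names `page_bucket` and `buckets_by_year` are hypothetical, and putting exactly 300 and exactly 500 pages in the middle bucket is an assumption, since the bucket edges are not spelled out above.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (title, year_read, pages).
# These are placeholders for illustration, not real reading data.
BOOKS = [
    ("Book A", 2003, 180),
    ("Book B", 2003, 320),
    ("Book C", 2008, 300),
    ("Book D", 2008, 650),
    ("Book E", 2020, 500),
]


def page_bucket(pages: int) -> str:
    """Assign a page count to one of three size buckets.

    Assumption: 300 and 500 both land in the middle bucket.
    """
    if pages < 300:
        return "< 300"
    if pages <= 500:
        return "300-500"
    return "500-plus"


def buckets_by_year(books):
    """Count how many books fall into each bucket for every year."""
    counts = defaultdict(Counter)
    for _title, year, pages in books:
        counts[year][page_bucket(pages)] += 1
    return counts


if __name__ == "__main__":
    for year, counter in sorted(buckets_by_year(BOOKS).items()):
        print(year, dict(counter))
```

The per-year counts can then be fed to any charting tool as a stacked bar or area chart.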
I then charted the pace of the books I’ve read, as characterised by The StoryGraph. I generally prefer fast-paced books because I need something that holds my attention. The data backs that up, showing that fast-paced books do make up the majority of what I’ve read. I think (I could be wrong) an author has to be particularly skilled to hold someone’s attention when it’s a slower-paced story.
This made me wonder: Do we have more fast-paced books now to go along with people’s dwindling attention spans? Mapping the publishing landscape would be a worthy subject for a future post, but in the meantime it’s made me consider how the evolution of publishing might impact my own reading patterns.
Anyway, despite my preference for faster pace, I do enjoy reading slow-paced books as well, and I was really surprised to find that almost all the books I’ve read this year are slow-paced. That wasn’t a conscious decision on my part, but could be due to another shift I uncovered in my reading habits:
Going in, I already knew the fiction/non-fiction divide. I’ve always preferred fiction and even scoffed most times at non-fiction. When I was young, non-fiction books were basically reference or self-help books. Now that we have very accessible online tutorials for anything you may want to learn, I didn’t really have the need for them. That was until I read Meggs’ History of Graphic Design in 2016. That book was a catalyst for me to see what other golden knowledge lay hidden in these books. I knew I’d read a lot of non-fiction recently but was surprised to find that all the books I’ve read this year are non-fiction, including the book I’m currently reading. I think I got tired of watching too many video essays, so I went on a knowledge rampage when the lockdown began, hence the spike.
The Entire History
The breakouts were cool, but I wanted to see my entire history in a single visualisation. Since a novel can belong to multiple genres, I’ve tagged each book with a “main” genre for this visualisation. This took some time, since some books gave me sleepless nights. Is Audrey Niffenegger’s The Time Traveller’s Wife romance or sci-fi? How about Vikas Swarup’s Q&A? What even is The Old Man and the Sea?
Enter the visualisation below. It looks like a Sankey diagram but is, in spirit, a parallel sets plot (which deals more with categorical data and how it is classified). The genres on the right are ordered by when I first read them.
I’ve always been a huge fan of fantasy, so I was very surprised that I hadn’t picked up a single fantasy book until very late into my reading years.
Once I did, I was unstoppable. 2004 was entirely thriller and fantasy!
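The two ingredients of a plot like this, the genre ordering and the width of each ribbon, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, `genre_order` and `flows` are hypothetical names, and breaking ties between genres first read in the same year by their order in the list is my own choice, not something the chart above specifies.

```python
from collections import Counter

# Hypothetical sample records: (title, year_read, main_genre).
# Placeholders for illustration, not real reading data.
BOOKS = [
    ("Book A", 2001, "adventure"),
    ("Book B", 2002, "mystery"),
    ("Book C", 2002, "adventure"),
    ("Book D", 2004, "fantasy"),
    ("Book E", 2004, "thriller"),
    ("Book F", 2004, "fantasy"),
]


def genre_order(books):
    """Return genres sorted by the year each one was first read."""
    first_seen = {}
    for _title, year, genre in books:
        if genre not in first_seen or year < first_seen[genre]:
            first_seen[genre] = year
    # sorted() is stable, so genres first read in the same year
    # keep the order in which they were first encountered.
    return sorted(first_seen, key=lambda g: first_seen[g])


def flows(books):
    """Count books per (year, genre) pair: one ribbon per pair."""
    return Counter((year, genre) for _title, year, genre in books)


if __name__ == "__main__":
    print(genre_order(BOOKS))
    for (year, genre), n in sorted(flows(BOOKS).items()):
        print(year, genre, n)
```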
I still wanted to see all the genres I’ve read across the years so I made the titular “Landscape of Literature” for genres. The graph at the beginning of the article is exactly that, but while it looks pretty, it’s not the best for analysis, which is why the Ridgeline Plot is a better option. I’ve accounted for multiple genres of the same book in this visualisation.
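Counting a book under every one of its genres, rather than only its “main” one, is what separates the data for this chart from the previous one. A minimal Python sketch of that counting step follows. The records are made up and `genre_series` is a hypothetical name; the output is one zero-filled row of yearly counts per genre, which is the shape a ridgeline plot needs (one ridge per row).

```python
# Hypothetical sample records: (title, year_read, [genres]).
# Placeholders for illustration, not real reading data.
BOOKS = [
    ("Book A", 2003, ["mystery", "thriller"]),
    ("Book B", 2003, ["thriller", "history"]),
    ("Book C", 2005, ["fantasy"]),
]


def genre_series(books):
    """Build one row of yearly counts per genre.

    A book with several genres is counted once under each of them.
    Years with no books for a genre are filled with zeros so every
    row has the same length.
    """
    first = min(book[1] for book in books)
    last = max(book[1] for book in books)
    years = list(range(first, last + 1))
    index = {year: i for i, year in enumerate(years)}

    series = {}
    for _title, year, genres in books:
        for genre in genres:
            row = series.setdefault(genre, [0] * len(years))
            row[index[year]] += 1
    return years, series


if __name__ == "__main__":
    years, series = genre_series(BOOKS)
    print(years)
    for genre, row in series.items():
        print(genre, row)
```

Because multi-genre books are counted more than once, the rows add up to more than the number of books, so they show how often a genre appears rather than how the reading list splits.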
Adventure was a huge theme when I began (Journey to the Centre of the Earth, Treasure Island), before I moved on to a lot of mystery books (Secret Seven, Famous Five, Hardy Boys, Nancy Drew). I then graduated to fantasy (LOTR, Harry Potter). The spike in mystery, history, and thrillers starting from 2003 can be attributed to The Da Vinci Code and all the similar books I read after it. Fantasy then makes a comeback because of A Song of Ice and Fire and continues through almost the whole decade. A big twist was sci-fi: for a genre that I love so much, I’ve read surprisingly little of it. In the last couple of years, non-fiction genres like design and business have been surging, owing to a lot of reference books I’ve been reading and catching up on.
My last visualisation deals with the moods associated with the books I’ve read. Although adventure as a genre tapered out in my childhood, almost all the books I’ve read are adventurous in spirit. The second-most prominent set is dark, mysterious, and tense books, which increase in frequency almost on cue as lighthearted books taper off.
That’s a walk through the history of my reading so far. Feel free to linger and dive into the data. If you have any suggestions, or spot insights slightly different from mine, please drop them in!
This was a great eye-opener for me. Some of my preconceived notions were broken, and I picked up on some very interesting patterns I would never have seen if I hadn’t visualised the data. Why haven’t I read adventure books since I was a kid? Is it because there are very few true adventure books out there now, or have I not put enough effort into finding them? This is what is great about data and why I love it. Data is a mirror you can use to reflect upon yourself and confront your biases. What you do with it is then up to you.
I still have multiple improvements planned for this and more ways to dig into the data: author diversity (how Eurocentric is my author list?), book formats (when did I start embracing eBooks?), the difference between my “Read” and “Want to Read” lists, etc. I also want to make this a real-time dynamic page à la Letterboxd. Let’s see how I get along with that a month or two from now.