Lists, Damned Lists, and Statistics

In 1986, I came to New York for the first time (since visiting as a five-year-old), where I interned at Rolling Stone. I sublet a room on the Lower East Side, lived on lentil/rice concoctions, and learned about coffee carts, subway routes, homelessness, and the last shreds of the punk scene (my apartment was right over the Pyramid Club, and the East Village was full of mohawks, piercings, and tattoos at that time).

One of my tasks at Rolling Stone was to create the charts. This was done by calling up about 20 record stores all over the country, which someone had designated as key indicators, and checking on their sales rankings. I collated the data and submitted it to the managing editor, who might make some tweaks to it before running it in the next issue. And I learned some things about bestseller charts. Mostly that this type of data-gathering was less than scientific.

This method was replaced in 1991 by Nielsen's SoundScan service, which tallied sales from cash registers in thousands of stores. It was marginally more scientific - results were based on raw numbers rather than phoning around randomly and having the results edited to suit someone's tastes. Ten years later, Nielsen expanded its service to bookstores - BookScan was born.

Again, BookScan isn't perfect. It can only track print book sales, because it relies on bar-code scanning technology. And it doesn't capture non-traditional sales, such as sales to libraries, direct sales, or sales by online retailers.

And then...there's the New York Times Bestseller List. Today they announced that they are consolidating the lists, merging some print and digital charts, and dropping a few lists. The compilation of the NYT lists is kept secret even from the NYT Book Review staff - it's done by the news staff. But they have mentioned that it's done in similar fashion to what I used to do at Rolling Stone - communicating with bookstores around the country and tabulating the sales they report. My understanding is that this process now spans thousands of stores, as well as wholesalers who distribute to non-traditional book outlets. It scales more broadly than my efforts did in 1986, but the principle is still the same - self-reporting by stores, plus some kind of editorial "secret sauce".

Of course, the best source of sales (post-returns) is the publishers, who don't share this knowledge with anyone. So, just as we don't have a fully complete and authoritative repository of all publishing metadata, we don't have such a repository of all book sales data.

Basically, these lists come down to what you count, what you DON'T count, and what you CAN'T count. They are signposts, some more artfully created than others.

8 thoughts on “Lists, Damned Lists, and Statistics”

    1. Hi, Nate – certainly Amazon is not all-inclusive, and those numbers wouldn’t count B2B sales (large corporate orders placed directly with the publisher, etc.) or sales through non-Amazon outlets. If you’re talking about self-published books, again, the best source for sales data would be the authors.

    2. Amazon only counts some sales, and distorts the results in favor of those books that sell only online. The books that are sold more widely get radically underestimated by using Amazon.

      Consider: If half of my book’s sales come from online and I sell 10,000 copies there, I’m selling twice as many copies overall as an online-only book that also sells 10,000 copies there. So my “bestseller status” should really be equivalent to that of a book selling 20,000 copies online-only.

      Publishers are the best source of data on the books that were published by a traditional house, or by a Pay to Publish outfit. (I can’t call them self-publishing, I’m sorry.)
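The proportional arithmetic in the comment above can be sketched in a few lines of Python. The 10,000-copy figure and the 50% online share are the commenter's hypotheticals, not real sales data, and `implied_total_sales` is an illustrative name, not anyone's actual methodology:

```python
# Hypothetical adjustment: infer total sales from an online-only figure,
# given what fraction of a book's sales happen online.
# All numbers here are illustrative, not real sales data.

def implied_total_sales(online_sales: int, online_share: float) -> float:
    """Total sales implied by an online sales count and the book's
    online share of sales (0 < online_share <= 1)."""
    return online_sales / online_share

# A book selling 10,000 copies online, with online as half its sales,
# moves as many total copies as an online-only book selling 20,000.
print(implied_total_sales(10_000, 0.5))  # 20000.0
print(implied_total_sales(10_000, 1.0))  # 10000.0
```

The point of the sketch is simply that ranking every book by its raw online count, without weighting by online share, systematically undercounts books with strong brick-and-mortar sales.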

        1. Well, publishers are never going to get just-in-time sales data on competitors’ books, or for that matter, ANY sales data apart from rankings, unless the competitors actually release it (which they never do – only “over 100,000 copies sold” or similarly vague info).

          1. You’re both right — which is why BISG and other such groups’ reports were the best possible information. They looked at the AAP-reporting publishers and the small and micro-publishers (which is what a self-publishing author really is, as long as they’re not using a Pay to Publish house) and even some Pay to Publishers, all in one. Of course, that no longer happens.

            BookScan came along and was much better than anything else (during the few years between its launch and the dawn of ebooks as a numerically important sales channel, some 5 or 6 years after the launch of the Kindle). That killed off the combined reporting surveys. They were expensive and not as accurate as BookScan (for trade books).

            Or you could look at the Department of Commerce census data, if they’re still collecting it. Given the recent cuts in government spending, that may no longer be happening.

            Or you can just make do with a couple of incomplete estimations of partial information (like AAP and the AuthorEarnings assessments of Amazon sales) and try to combine them with some spreadsheet legerdemain, and hope you end up with something useful.

            But to say that either one or the other is complete and to rely upon them? Not good practice, IMNHO.

          2. Massively agreed. We’ll never have a complete picture. But somehow this industry has functioned since the 1400s without one….

          3. True. But that’s why numerical estimation techniques are so critical. Because good data is too expensive to get, and therefore not cost-effective, even if we’re all buying the reports.

            Thank goodness that there are lots of book loving spreadsheet jockeys around.
