UPC

Universal Product Code

The UPC was invented by George Laurer, a prolific inventor with a long career at IBM. The first product ever to successfully pass through a UPC scanner was a pack of Wrigley's gum, in 1974. Like virtually every bar code type, the UPC is governed by GS1, an international standards body that maintains many identification schemas for use in trade.

UPCs began as a requirement of the grocery industry. Most supermarket UPCs consist of 12 digits, the last of which is a check digit. The code grew in popularity, though, and has since been adopted by all sorts of retailers.
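For the numerically curious, that check digit is just a weighted mod-10 sum. Here is a minimal Python sketch of the arithmetic (the sample digits are illustrative, not any particular product's code):

```python
def upc_check_digit(first_11_digits: str) -> int:
    """Compute the 12th (check) digit of a UPC-A code.

    Digits in odd positions (1st, 3rd, ...) are weighted 3, digits in
    even positions are weighted 1; the check digit tops the sum up to
    the next multiple of 10.
    """
    digits = [int(d) for d in first_11_digits]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10

print(upc_check_digit("03600029145"))  # -> 2, so the full 12-digit code would end in 2
```

A scanner (or a receiving system) runs the same arithmetic over the first 11 digits and flags the scan if the result doesn't match the printed 12th digit.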

As bar codes became more prevalent, mythologies sprang up around them. The popular Bar Code Tattoo series is based on the dystopian premise of oppressive identification. But the most intriguing UPC conspiracy is known as the "666 controversy" - which associates UPCs with the Book of Revelation's "mark of the Beast".

A Word On Bar Codes

The demise of Shelfie, a tool that allowed customers to link their print books to ebooks, got me thinking about bundling. Obviously, one method of linking print to its digital metadata is through its identifier. For print books, the identifier is the ISBN, which (since the early 2000s) is 13 digits long and can be expressed as a bar code - the EAN. The bar codes allow users to scan a book, check some of its metadata (such as price) online, and make purchasing decisions - and are frequently used by resellers prowling thrift shops and library sales, looking to pick up a good deal (an application that is looked down upon in the industry).
The book business has sneered at bar codes for a long time. Cover designers didn't like the way they marred the cover art. The very idea of printing pricing information on a book cover was anathema to publishing's vision of itself as somehow above commerce (though that did become a consumer requirement, and is why hardcover book jackets print the price on the inside front flap - it's as if publishers were holding their noses while giving in to consumer demands).
But as bookstores such as Barnes & Noble, Borders, and Books-A-Million began cropping up, and books began to be sold in other venues such as supermarkets, drugstores, and airports, the industry could no longer afford to be snobbish. Horror of horrors, these vendors required bar codes to sell their products. And in 1985, BISG published guidelines for bar codes on book jackets.
At that time, the Bookland EAN had not yet been invented. Bar codes as we know them were invented first (and patented in 1952) for the grocery business. They began to be implemented in the 1970s, when the UPC was developed. The effectiveness of the bar code gave rise to efficiencies that other industries had been looking for, so the code soon gained traction in other types of stores as well.
UPC bar codes first appeared on books in sticker form - and applying them was the job of the receiving clerks at large bookstores. A shipment of books would come in, and clerks would print the stickers and apply them to the covers - that way, publishers didn't have to get involved. But as the large book chains gained more power, they began to campaign for publishers to take on this responsibility themselves. So publishers gave in and began printing covers with the bar codes already on them. (We hadn't seen the last of stickering, however.)
And now we come to Bookland.
Bookland originated in the 1980s, but was re-invigorated in the US in the 1990s by George Wright III, founder of Product Identification & Processing Systems and an active member of BISG. The EAN bar code, which was gaining popularity, begins with a prefix designating the country of origin. Wright convinced EAN to implement "Bookland" as a prefix indicating book products, regardless of which country they actually come from. The prefixes 978 and 979 are therefore reserved for book products.
EAN began as the European Article Number - a European version of the UPC that came in 8- or 13-digit schemas. UPC codes for books had 12 digits. EAN realized that they could incorporate the ISBN of a book into the EAN by using the Bookland prefix - and books could be easily scanned worldwide. But this would require publishers to change their ISBNs from the historical 10 digits to 13-digit identifiers. In the late 1990s/early 2000s, the ISBN standard was modified to reflect its new compatibility with the EAN. BISG began a large educational effort to persuade publishers to convert their ISBNs to 13 digits (my first consulting engagement) - a non-trivial task, as so many publishing systems had fixed-width ISBN fields.
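To make the mechanics of that conversion concrete, here is a minimal Python sketch: drop the old ISBN-10 check character, prepend the 978 Bookland prefix, and recompute the check digit using the EAN-13 weighting. The sample ISBN is purely illustrative; nothing here is specific to any publisher's systems.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit (Bookland EAN) form."""
    core = isbn10.replace("-", "")[:9]   # keep the 9 data digits, drop the old check character
    body = "978" + core                  # prepend the Bookland prefix
    # EAN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)

print(isbn10_to_isbn13("0-306-40615-2"))  # -> 9780306406157
```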
A small aside: never design your systems so that the fields containing identifiers are fixed-width. They will break every time.
During this transitional period, there were two types of bar codes in use simultaneously. Grocers and other retailers had not made the change to EAN - they were still using UPC. Bookstores had adopted the change to EAN. So publishers and booksellers were forced to grapple with this. BISG established rules for the transition - and while publishers were in the process of converting from UPCs, bookstores once again found that they had to sticker their products. It was a painful period, but the efficiencies in global selling more than made up for it.
There's a common misconception that the reason the industry converted from 10- to 13-digit ISBNs is that we were "running out of numbers." I don't know how that rumor got started, but even the BISG website touted it during the transitional period. We converted to conform with a standard that was being adopted globally - ISBN had agreed to make the change, and American publishers had to fall in line if they wanted to keep using the standard. Given that the book industry depended on ISBN more than any other identifier - it was baked into the supply chain at that point - the business had to conform or reinvent the wheel, and reinventing was the more painful prospect.
That's how we got to where we are. Scanning technology has advanced to be able to handle both UPCs and EANs, so we don't see as much stickering. But if you compare bar codes on books from the 1990s to those on books today, you're actually looking at two different technologies.
So any business that has predicated its model on scanning the bar codes on books new and old needs to be aware of this. Scanning a book from the 1990s isn't going to tell you much about the book. And, in fact, publishers (and other manufacturers) re-used UPCs all the time. UPCs can in no way be seen as a reliable identifier the way EANs can.
Jeff Bezos understood this as he began planning to build Amazon. According to "The Late Age Of Print," a marvelous book that I highly recommend:
"Bezos’s decision to start an online bookstore was largely driven by a pragmatic appraisal of the book industry’s level of standardization. Books, he reasoned, were more “meticulously organized” than almost any other type of consumer good owing to the book industry’s decision to adopt the ISBN twenty-five years earlier. That the book industry already had taken the unusual step of assiduously inventorying, coding, and maintaining a detailed database of its wares convinced Bezos that books would be relatively easy to integrate with his company’s burgeoning distribution and inventory-control systems. Standardized product coding also meant that Amazon.com could more readily establish dependable communications with book publishers and wholesalers, which would be critical to meeting the company’s promises of speedy delivery, not to mention its ability to compete with local bookstores."
The ISBN has been a groundbreaking identifier in many ways, and encoding it into a bar code is one of the smartest things this industry has ever done. Other industries have looked to the ISBN as an example - which is how we've gotten the ISSN, the DOI, the ISRC, and other identifiers in the ISO TC 46/SC 9 family.
NB: I have further notes from George Wright III, which I am compiling and formatting into further posts/newsletters/podcasts. There is definitely more to come.

Alphabet Soup: Thema

Thema is an internationally developed set of subject codes meant to serve publishing's global trade needs. North America uses BISAC, the UK uses BIC, other English-speaking countries use a mix of both, France uses CLIL, and Germany uses WGS. Thema is designed to be a sort of lingua franca to which all of these schemas can be mapped. This way, a publisher using BISAC codes, having mapped them to Thema, can send out information in a way that a recipient using CLIL can make use of.
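As a rough illustration of what that mapping step looks like in practice, here is a hedged Python sketch of routing a subject code from one local scheme into Thema and out to another. The code values shown are illustrative only, the CLIL value is a placeholder, and a real implementation would use the mapping files EDItEUR publishes (many mappings are one-to-many rather than the one-to-one lookups shown here):

```python
# Illustrative fragments only - not the official EDItEUR mapping tables.
BISAC_TO_THEMA = {
    "FIC022000": "FF",  # FICTION / Mystery & Detective / General -> Crime and mystery fiction
}
THEMA_TO_CLIL = {
    "FF": "0000",       # placeholder; the real CLIL code comes from the published mapping
}

def bisac_to_clil(bisac_code):
    """Route a BISAC subject through Thema to a recipient's scheme (CLIL here)."""
    thema = BISAC_TO_THEMA.get(bisac_code)
    return THEMA_TO_CLIL.get(thema) if thema else None

print(bisac_to_clil("FIC022000"))
```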

Thema is multilingual, and its implementation reduces duplicated work across similar workflows. It is heavily used in the EU, of course, because inter-lingual communication is a daily need there. It is gaining traction in English-language countries that have to work with both BIC and BISAC. In the US, implementation has not gone as far - that will happen when US publishers are more affected by global market communication requirements than they are now.

Thema is maintained by EDItEUR.

What Happens When My Metadata Leaves The House, Part II

In a previous post, I talked about the issues surrounding getting your data ingested by third parties. This week, let’s think: What are some of the issues around sending OUT metadata?

Relationships

We have systems, which can be outdated, chaotic, and incompatible. We have a functional process – who does what when – which is riddled with exceptions. And we have a single source of truth that holds all the most updated information, but has to export that information in a lot of different ways.

Relationships are what get us through the gaps in tech, staffing, and standards.

This industry was built on relationships: Agents knowing the tastes of editors. Marketers knowing the tastes of bookstore buyers. Editors knowing what works for library acquisitions. Human relationships – our work friends – keep this industry in business in the face of imperfect systems. And that’s not appealing to some folks – some see these relationships as forming a gate-keeping system, a kind of exclusivity – which is also why we have a thriving indie scene with relationships of its own.

But for the most part, the relationships are there to smooth the bumps in the workflow. Sudden price change? Call up your vendor partner at B&N. Want to test out an idea for a project? Reach out to the collections development librarian to see if she thinks it’s viable. Need to recall a book or cancel a publication, and the ONIX feed may or may not get ingested? Call up your trading partners and they’ll do a manual update to their records.

Our relationships are the hack that makes it all work. The relationship between publisher and bookseller existed before there were systems to facilitate information exchange.

So, as with any relationship, communication is the highest priority. Much of that communication is automated with metadata transmissions – but as with anything automated, there are going to be exceptions that have to be dealt with manually. Which means an email or a phone call, or an in-person chat at a conference to gain or give clarification about what’s expected. Maybe the answers aren’t what we want to hear, but at least we know how to solve the problem.

I’d like to say that the one thing we can all count on is that everybody in the business wants it to work. And in most cases that’s true.

You do have the “digital disruptors” – Apple, selling devices; Google, selling ads; Amazon, selling everything – who don’t necessarily have the book industry’s best interests at heart. And that’s where we’re experiencing the greatest friction. It’s not necessarily with computer systems. It’s with market systems – when our world is disrupted not by ebooks, but by (for example) a vendor using our product as a loss leader; another vendor who is uncommunicative and not at any conferences/standards meetings; another vendor whose founder didn’t even want to sell books in the first place because “nobody reads anymore”.

We can’t NOT do business with these folks. But we are the little guys here, and that’s a reality. This is WHY we have to deal with competing vendor requirements – books might be a small fraction of Amazon’s business, but Amazon is a LARGE fraction of OUR business.

So how can we improve our relationships with these partners?

Well, on some level they do want things to work as well, or they wouldn’t be selling our books. And maybe, in addition to trying to school Google in the wacky ways of book publishing, we need to learn from Google about the equally wacky ways of tech.

The other thing to note is that these companies DO hire book people. The folks at Google Books are longtime publishing operatives – from HarperCollins, B&N, Scholastic and Harvard University Press. Amazon regularly hires out of the NYC book publishing pool. And the iBookstore is staffed by people from Simon & Schuster, B&N, Oxford University Press, and other traditional publishing companies. So the encouraging thing is…we are infiltrating. There really ARE like-minded people at these companies. They seem (and feel) like faceless organizations, but they are not. That’s a myth that we’ve been telling ourselves for over 10 years.

Best Practices

What we’d consider best practices can vary depending on who’s consuming the metadata.

BISG has published a document on best practices for product metadata in the book supply chain. That’s a good foundation for bookselling – obviously the vendor requirements will differ as they compete with one another. Of course, Amazon doesn’t participate in hammering out these best practices, so they’re not perfect. But it’s a good guideline to begin with.

For non-bookselling purposes – libraries and other institutions – the idea of “best practice” gets a little murky. Resource Description and Access (RDA) was created by an international association of library organizations as a set of guidelines for cataloguing resources (books and other things). RDA can serve as a good foundation upon which to build library metadata.

Some systems vendors also do educational sessions where best practices are reviewed: Firebrand, Klopotek, Iptor and Ingenta all have user conferences or webinars for their clients that go over market needs for metadata and standards. And of course there are trade shows like ALA and BEA which also have conference tracks.

Basically, if you get out there and start talking – and you might not have to go places physically, if budget is an issue – you can get a conversation going. Maybe others are having problems similar to yours. Maybe there’s a solution someone knows about. To some degree we compete, but as an industry we’re also really good at helping each other.

Alphabet Soup: GLN

Global Location Number

GLNs are administered by GS1, the same organization that handles the GTIN. A GLN is assigned to a specific location, meaning that computers don't have to process the text string of a place name. It provides precision - a GLN can identify a warehouse, an office, or a shelf in a store. It can be encoded into a bar code or RFID tag, so that it can be scanned.

Extension components of the GLN allow users to identify sub-locations - storage bins, scan/read points, warehouse docks. GLNs begin with a company prefix, and end with a check digit, similar to ISBNs.

What Happens To My Metadata When It Leaves the House?

We've all seen it. We spend time perfecting the metadata in our feeds, send it out to our trading partners, and then field complaints from agents, authors, and editors: "Why is it like that on Amazon?"

The truth is, data ingestion happens on whatever schedule a given organization has decided to adhere to. Proprietary data gets added. Not all the data you send gets used. Data points get mapped. So what appears on any trading partner's system may well differ somewhat from what you've sent out. There are so many different players in the metadata arena that can affect what a book record looks like. When you send your information to Bowker, they add proprietary categories, massage author and series names, add their own descriptions, append reviews from sources they license – and send out THAT information to retailers and libraries. The same thing happens at Ingram and at Baker & Taylor – so what appears on a book product page is a mishmash of data from a wide variety of sources, not just you.

At an online retailer, different data sources get ranked differently. This happens over time, as a result of relationships and familiarity with data quality, and these rankings can change. The data can also get ranked on a field-by-field basis. So a publisher might be the best source of data for title, author, categories, and cover image. But the distributor might be ranked higher for price and availability. And an aggregator might be ranked higher for things like series name – especially if they specify to the retailer that it’s something they’re focusing on standardizing and cleaning up. It’s important to remember that in the eyes of the retailer, not all data feeds are equal. You’d think the publisher would be the best source of data about its own books, but I can assure you, having worked with publisher data my entire 30-year career, that isn’t always the case.
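Here is a hedged sketch of what that field-by-field ranking might look like inside a retailer's ingestion system. The source names and precedence order are assumptions for the sake of illustration, not any retailer's actual configuration:

```python
# Per-field source precedence: the first source in each list that supplies a value wins.
FIELD_PRECEDENCE = {
    "title":        ["publisher", "aggregator", "distributor"],
    "author":       ["publisher", "aggregator", "distributor"],
    "price":        ["distributor", "publisher", "aggregator"],
    "availability": ["distributor", "publisher", "aggregator"],
    "series_name":  ["aggregator", "publisher", "distributor"],
}

def build_record(feeds):
    """Merge one title's data from several feeds, field by field."""
    record = {}
    for field, sources in FIELD_PRECEDENCE.items():
        for source in sources:
            value = feeds.get(source, {}).get(field)
            if value:
                record[field] = value
                break
    return record

feeds = {
    "publisher":   {"title": "Example Title", "author": "A. Author", "price": "24.99"},
    "distributor": {"price": "22.99", "availability": "In stock"},
    "aggregator":  {"series_name": "Example Series"},
}
# Title and author come from the publisher, price and availability from the
# distributor, series name from the aggregator.
print(build_record(feeds))
```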

For a publishing house, updating old metadata records is a break from normal workflow, so it doesn’t happen as often as it should for optimal marketing purposes. It’s important to remember, though, that the job doesn’t stop once the book leaves the house – there are reviews, awards, and other events that are worth making stores and readers aware of through your metadata feed.

Just another quick word on terminology when it comes to updates – a “delta file” is what we call these updates – additions, changes, and deletes only, rather than a full file. Most publishers will send an initial full file, and then supplement with delta files for a time, and begin the cycle again just to make sure that their trading partners are in sync.
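A minimal sketch of how a delta might be derived from two full files keyed on ISBN follows. The flat-dictionary structure is an assumption for illustration; in practice the records would typically be ONIX product records, and deletions would be communicated as explicit notifications rather than by simple omission:

```python
def compute_delta(previous, current):
    """Compare two full files keyed by ISBN and return adds, changes, and deletes."""
    adds    = {isbn: rec for isbn, rec in current.items() if isbn not in previous}
    changes = {isbn: rec for isbn, rec in current.items()
               if isbn in previous and previous[isbn] != rec}
    deletes = [isbn for isbn in previous if isbn not in current]
    return adds, changes, deletes

previous = {"9780306406157": {"title": "Example Title", "price": "24.99"}}
current  = {"9780306406157": {"title": "Example Title", "price": "19.99"},   # price change
            "9791234567896": {"title": "New Title",     "price": "15.00"}}   # new record
print(compute_delta(previous, current))
```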

But on the retailer/aggregator end, there’s no guarantee that your updates will get processed in a timely way (without a phone call). Companies ingest on their own schedule, and if they have a very heavy processing week, they might skip your delta file and wait for the next one, which means there might be gaps in data updates. This is why publishers find themselves occasionally sending a full file – just to be sure all their records are brought up to date.

IDPF into W3C: What We Learned At DBW

On January 18, 2017, the IDPF (International Digital Publishing Forum) held an open meeting at the Digital Book World conference to discuss the impending merger of IDPF into W3C.
It was, to say the least, an engaging session. If by "engaging" you mean "confrontational."
W3C has been circling the waters of digital publishing for nearly 5 years. In 2012, they formed the Digital Publication Community Group, looking at issues such as accessibility, markup, metadata and more. That group closed in 2013, and the Digital Publishing Interest Group formed in its stead. This group examined issues relating to the W3C Open Web Platform, layout and pagination, annotations, metadata, accessibility and CSS.
It became apparent that the IDPF's EPUB standard stood at risk of "forking" as the W3C got more involved. And as W3C already managed the HTML and CSS standards, it seemed logical that they should house the EPUB standard as well.
Or so 88% of the IDPF voters thought.
Not so Steve Potash, who has been CEO of OverDrive for 23 years, and who founded the Open eBook Forum in the late 1990s. The Open eBook Forum would go on to become the IDPF, where Potash served as president for many years.
Potash forcefully accused the current Executive Director of personal profiteering, and accused the W3C of commandeering the standard only to ignore it in favor of other standards. He also accused both parties of steering the EPUB standard out of the book industry.
This was all refuted handily by both the Executive Director and the representatives of W3C, as well as by the IDPF board itself. The Web standards group cares deeply about EPUB and digital publishing - within the book industry and beyond.
It was a difficult moment for IDPF and W3C, and it was handled gracefully. Suffice it to say that this is a good move for the EPUB standard: it can now take advantage of proximity to other standards, cross-pollinate committee meetings, and evolve to be flexible and accommodating to the many different constituencies that use it.
It is also a sign that indeed, the Web has come for books. That books are important to Web developers - as rich mines of content that can be presented in a variety of ways. Surely, as we know, the print book will continue to offer the same reliable experience it always has - but with digital publishing being embraced by the W3C, it will be exciting to see what other applications besides digital facsimiles of print await.

Beyond Simple Math & 5th Grade Semantics: The Not-So-Mindless Musings of Two 21st Century Metadata Wonks

By Nannette Naught and Laura Dawson

For the sake of clarity, as wonks, we started at the beginning with simple math and basic sentences. But let’s be honest,

  • The business of knowledge is far from simple. Knowledge acquisition, creation, and communication are complex tasks requiring advanced semantics.
  • The mechanics of the knowledge economy are a far cry from straightforward supply and demand. Knowledge creation, acquisition, and curation are more about performance than parts.
  • The obligations of knowledge stewardship are far and away larger than the web. Knowledge rights, ownership, and access are international in scope and fraught with location- and community-specific considerations.

And, for that matter, let’s get real: as modern-day metadata mechanics, most of us are:

  • More concerned about music, streaming digital media, and demand driven acquisition than articles, offsite storage, and comprehensive collections.
  • More affected by the need to demonstrate impact, the pressures of digitization, and concerns related to institutional and service alignment than usage counts, known item fulfillment, and search versus discovery sidebars with our vendors.

And while we’re at it, let’s put a fine point on the problem and explicitly state our operational struggles — the “nickel and diming us to death, but can’t afford to stop long enough to replace” work-arounds our behemoth, scarce-resource-guzzling 20th Century information systems require of us in the 21st Century knowledge economy. As trained GenX and Millennial information, data, and content scientists, we are forced to spend way too many hours in “dare we say it” repetitive, menial data massage tasks such as:

  • Recataloging content objects and enhancing their string-based, record-trapped metadata.
  • Finding, investigating, and hand correcting the associated master and local holdings record collisions which make our collections invisible and unavailable to our patrons. Over and over again, with each update, renewal, replacement, and reconciliation.
  • Rescuing valuable, expensive researcher and student access from our disconnected ERM systems and KB-focused acquisition and delivery processes.

From this vantage point then, let’s clearly outline our shared interests. As 21st century, knowledge industry leaders, we are united with our management colleagues across the lifecycle by our:

  • Deep interest in the economics of metadata modernization and the ongoing costs (time, money, and opportunity) of our community’s extended re-engineering efforts.
  • Strong commitment to quickly and effectively bridging the gaps in communication, understanding, metrics, and use cases that plague both initiatives. Be those gaps within our niche or with our software development friends who create the applications that deploy our data.

And with all this, finally, off our chests, let’s throw some light on this 21st Century knowledge economy we keep talking about, using the recent ALA MidWinter Conference as our crystal ball.

  • What does it look like? And how is it different than the production and inventory infrastructures of yesterday? In one word: Energy. With over 15 years of attendance under my belt (yes, I, GenX, Nannette, have been attending since 2001 in San Francisco), I can honestly say that, perhaps more than at any conference I have attended, the knowledge economy seen through the lens of this one looks like Diversity, Responsibility, and Transformation! From the active, visible, and varied president-elect candidates to the council and committee meetings where strategic plans, realistic achievable budgets, and forward-reaching conference updates are being not just discussed but enacted, this is a vital industry, on the move.

For me, its spirit was best summed up in a single quote:
“If you can make the 14th Librarian of Congress, a Librarian, you can do anything … Librarians are having a moment, remember your power.” – Carla Hayden, in her address to Council. (And yes, earlier this year as the second link shows, she also visited those PCC meetings, gathering place of many a library metadata wonk.)

  • Who are the players? And how are they different than yesterday’s? Quite frankly, some of the players you already know – LC, Ingram, and TLC – or, as we noted last week, libraries, library technology companies, publishers, distributors, aggregators, and the like. Others, however, like BiblioBoard, DLSG, and BluuBeam, are more recent entrants with new, landscape-shaping technical offerings. As to difference, I go back to the words Energy and Spirit. Old or new, these players are different for their willingness to actively execute real, working software on the cusp of change. They are not simply promising a future feature set; they are actively demonstrating an understanding of 21st Century use cases and a commitment to economical service modernization with tangible results.

For example,

  • Library of Congress’ BIBFRAME Initiative, drawing on real project planning that began in early-to-mid 2016 (they’ve been reporting on it, in some detail, incrementally since late spring or early summer of that year) and in collaboration with PCC, IndexData, and others, is actively completing the MARC to BIBFRAME converter, processing 19 million MARC records, and generating billions of RDF triples over the next three months to feed its active production pilot and the coming objectification of library metadata. (A toy sketch of what such triples can look like follows this list.) Talk about realism and controlling the cost of re-engineering efforts - this is a whole new level of performance. This is from 1967 to big data, from “dirty by design” to “specifically semantic” in months, not years. To my way of thinking at least, these are the foundations that will free re-invention efforts from the bounds of MARC-driven ILSes:
  • Enabling metadata mechanics to connect sales metadata to knowledge across the lifecycle. Need a reference point? Think back to Jean Godby’s ONIX mappings and OCLC Research’s Schema.org work of a few years back. Recast it against Selection and Acquisition in a beyond BIBFRAME 2.0 world, and a whole new era of metadata automation opens up.
  • Allowing metadata curators to establish relationships between people, pieces, collections, and disciplines. Need a guidepost? Think back to 2010 or so and Tom Delsey’s, Barbara Tillett’s, and the RDA JSC’s work on relationships and relationship designators. Recast it against metadata curation post-BIBFRAME 2.0 production pilot, and a whole new world of metadata-driven inquiry opens up — inside, and outside, the library system.
  • Empowering lifecycle leaders to drive ROI and assess impact. Need some lane markers? Follow Wayne Schneider’s (of IndexData, speaking at the BIBFRAME Forum) train of thought: a post-conversion, big-data world (on and off the web), working with curated metadata resulting from steps 1 and 2 herein. Library metadata which is no longer surrogate and outside content, but expressed in and aggregated against our resources’ – that is, digital content’s – native languages (e.g., objectified XML). And a whole new world of learning, research, and inquiry facilitation (not to mention library operations cost optimization) opens up!
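To ground the “objectification” these bullets keep pointing at, here is the toy Python sketch referenced in the first bullet above: what a converted record might look like as RDF triples, using rdflib and the BIBFRAME 2.0 namespace published by the Library of Congress. The identifiers are hypothetical and the property choices are an illustrative simplification, not output of LC’s actual converter:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# BIBFRAME 2.0 vocabulary namespace (Library of Congress).
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

g = Graph()
g.bind("bf", BF)

# Hypothetical local identifiers - not real id.loc.gov resources.
work = URIRef("http://example.org/works/example-work")
instance = URIRef("http://example.org/instances/example-instance")
title = URIRef("http://example.org/instances/example-instance#title")

g.add((work, RDF.type, BF.Work))           # the abstract work
g.add((instance, RDF.type, BF.Instance))   # a published embodiment of it
g.add((instance, BF.instanceOf, work))
g.add((instance, BF.title, title))         # titles are resources in BIBFRAME, not bare strings
g.add((title, RDF.type, BF.Title))
g.add((title, BF.mainTitle, Literal("An Example Title")))

print(g.serialize(format="turtle"))
```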

And on that note, let’s look at a few of those cusp-riding, working applications which, cast against this upgraded, connected metadata, uplift wonk spirits and re-energize us.

  • BiblioBoard, with its Amazon Kindle store-style interface, is helping libraries affordably and intuitively provide access to digital materials outside cumbersome discovery interfaces, giving librarians and their patrons the ability to create custom curations and a personalized user experience akin to their relationship with their tablet. Talk about metrics - this is a whole new level of embedding. This is ILS information (they have integrated with SirsiDynix so far) powering the digital life of the user.
  • DLSG, with its BSCAN Interlibrary Loan & Digital Document Delivery, is easily and affordably (for an initial purchase of less than the cost of one staff member for a month, plus a small, optional yearly support fee in the hundreds of dollars) placing the power of immediate PDF-based interlibrary loan into the hands of any library. No muss, no fuss, no system understanding or technical knowledge required. Talk about ease of use - this is as simple as taking a book off the shelf and making a photocopy. This is a whole new level of service. This is near-immediate, near idiot-proof managed access to physical library collections anywhere, at any time, at minimal cost.
  • BluuBeam, with its small plastic beacons and location-triggered alert capabilities, is affordably and noninvasively allowing libraries to integrate their physical and digital presence through their patrons’ handheld devices. Talk about patron engagement and personalization. Products like this, with low points of entry and creative deployment, open the door to a whole new level of interactivity and performance. This could be the beginning of demand-driven, patron-controlled, library-managed, location-specific, device-delivered service.

Whew! That’s a lot to say in one blog post! It’s too much to chew in a single three-article series! It is, quite frankly, insufficient to the tasks at hand. Don’t worry - we metadata wonks agree with you. Moreover, we believe it is just the start of a conversation that needs to involve many more players and address many more issues, viewpoints, and perspectives. It is a conversation at the crossroads of centuries, services, and business cases. From the feedback we’ve received so far, it is an exchange of ideas that needs a home. Thus, as we close this series on Life, Liberty, and the Pursuit of Knowledge, we open the door to a “coming soon” Crossroad Conversations space at the former www.imteaminc.com address and add the voices of both Kathryn Harnish and a rotating series of virtual coffee participants.
Until then, two metadata wonks, signing off!

Mastering the Math: The Not-So-Mindless Musings of Two 21st Century Metadata Wonks

By Nannette Naught and Laura Dawson

"Slow down, start at the beginning.”

“Decrease the drama, keep it simple, silly.”

“The facts, Ma’am. Just the facts.”

Long-lost episode of Dragnet or guiding remarks made by system designers in a focus group at ALA? Honestly, it can be hard to tell the difference at times. And yes, we metadata wonks understand the hard, uncomfortable truth of this statement. Heck, we’ve lived it!

So now what? How do we as a knowledge industry move beyond the battle lines of our individual business cases? How do we as professionals productively collaborate across the divides of discipline and raison d’etre? WHERE DO WE BEGIN?

As we said last week, for our parts, we metadata wonks begin with trusted, experienced human knowledge acquisition and deployment engines with access to vetted, curated collections of knowledge: trained information and data service professionals backed by institution-branded, collaboratively curated, community-aware resource collections, which we access in person, virtually, or via the ubiquitous, affordable software programs deployed on our phones, tablets, watches, and laptops — aka Libraries and Librarians.

So starting here, bringing Joe Friday with us into the 21st Century...

Fact: Learning is contemplative. Humans do not simply point and click their way to knowledge. Unlike machines, people do not simply additively take in information and automatically get smarter. Learning is an experience that requires active thought.

Volume (even organized volume) ≠ Wisdom.

Fact: Research is contextual and collaborative. Unlike simple mathematics, human learning is not necessarily incremental, but rather, oft nonsequential and unpredictable. Knowledge creation is a shared experience that requires active thought, appropriate access, and imagination.

1s and 0s (even organized, interconnected 1s and 0s) ≠ Thought.

Fact: Inquiry is a dependent and serendipitous process. Unlike nascent, database-driven sort and retrieval algorithms, the human brain natively understands real-world objects and complex relationships. The investigative act is a personal, emotion-inclusive, uniquely human experience that requires MORE. MORE than just machines with their 1s and 0s, fledgling algorithms, and sales-driven, mob-ruled, short-term, oft insecure, point-and-click document collections.

THE Web (even the linked, semantic web) ≠ Library.

Which brings us back to last week’s conclusion: to move beyond current stagnation, the knowledge economy requires a library-led programming toolset and the resulting librarian-curated, web-formatted, web-accessible metadata that augments current sales-driven applications with knowledge- and language-driven contextual metadata to power its services and applications.

Which, if we are honest, requires us all (aka not just us metadata wonks) to face yet another hard, uncomfortable truth — said efforts within the Library community seem stalled at best, in crisis at worst. Heck, all of us, leadership and feet on the street alike, are living this truth daily! So why, oh why, can’t we take a clue from Nike and Shia LaBeouf and “JUST DO IT,” already?

Heck, as many correctly point out, library metadata was authoritative, global, and knowledge-based before anyone even conceived of the web, let alone email. Librarians, archivists, and their cadre of associated technical professionals were serving communities long before anyone received a degree in computer science, let alone web design or search engine optimization. And therein lies the rub. The rub or controversy that is holding us back: The epic battle between commercialized computer science and community-based library and information science.

Thankfully, here the wisdom of Dragnet holds: strip away the drama (aka commercial versus community) and get down to the facts (aka the science). And voila - a simple fact, not to mention a very human truth, emerges: computers are simply the vehicles through which the library delivers knowledge and information (and, for that matter, through which commercial interests deliver their products) to communities (geographical, practice, institutional, academic, public, K-12, and specialized). Communities of people who are:

  • Driven by their individual wants, needs, and relationships;
  • Learning, researching, and inquiring in very human ways.

Which brings us metadata wonks to our main point for the week: We believe subtraction, not addition, is required, if we are to successfully escape the current trough of disillusionment.

We believe it’s time to STOP ceding authority, ownership, and, frankly, definition of what is possible/needed to the programmers.

We believe it’s time to QUIT relying on simple arithmetic, directory-like relational databases, and 5th grade level semantics to discover, deliver, and manage our resources, our patrons, and our businesses.

We believe it’s time to STAND UP for ourselves and the communities we serve by insisting that data, library, and information science, aided by computer science, move forward together as equals — no ugly stepchildren, no domineering bullies allowed.

Then and only then, can we collaborate successfully across the divides of discipline and motivation to additively:

  • Move innovation forward, grounded in Tradition and Diversity.
  • Reliably extend service levels with Economics and Ethics.
  • Help ensure Peace by relying on shared Governance structures and Negotiation methodologies.
  • Achieve collective and individual Success through Innovation, Service, and Peace.

Join us next week for ideas from Digital Book World about how our lifecycle partners – the publishers, distributors, and aggregators – see this playing out in 2017 product and service lines. And the following week for ideas from ALA MidWinter about how their lifecycle partners – the librarians, researchers, and library technologists – see this playing out in their institutions and within their communities of practice.

Lists, Damned Lists, and Statistics

In 1986, I came to New York for the first time (since visiting as a five-year-old), where I interned at Rolling Stone. I sublet a room on the Lower East Side, lived on lentil/rice concoctions, and learned about coffee carts, subway routes, homelessness, and the last shreds of the punk scene (my apartment was right over the Pyramid Club, and the East Village was full of mohawks, piercings, and tattoos at that time).

One of my tasks at Rolling Stone was to create the charts. This was done by calling up about 20 record stores all over the country, which someone had designated as key indicators, and checking on their sales rankings. I collated the data and submitted it to the managing editor, who might make some tweaks to it before running it in the next issue. And I learned some things about bestseller charts. Mostly that this type of data-gathering was less than scientific.

This method was replaced in 1991 by Nielsen's SoundScan service, which tallied sales from cash registers in thousands of stores. It was marginally more scientific - results were based on raw numbers rather than phoning around randomly and having the results edited to suit someone's tastes. Ten years later, Nielsen expanded its service to bookstores - BookScan was born.

Again, BookScan isn't perfect. It can only track print book sales, because it relies on bar-code scanning technology. And it doesn't track non-traditional sales, such as sales to libraries, direct sales, or sales by online retailers.

And then...there's the New York Times Bestseller List. Today they announced that they are consolidating the lists, merging some print and digital charts, and dropping a few lists. The compilation of the NYT lists is secret even from the NYT Book Review staff - it's done by the news staff. But they have mentioned that it's done in similar fashion to what I used to do at Rolling Stone - communicating with bookstores around the country and tabulating the sales they report. My understanding is that this process now spans thousands of stores, as well as wholesalers who distribute to non-traditional book outlets. It scales more broadly than my efforts did in 1986, but the principle is still the same - self-reporting by stores, plus some kind of editorial "secret sauce".

Of course, the best source of sales data (net of returns) is the publishers themselves, who don't share this knowledge with anyone. So, just as we don't have a fully complete and authoritative repository of all publishing metadata, we don't have such a repository of all book sales data.

Basically, these lists come down to what you count, what you DON'T count, and what you CAN'T count. They are signposts, some more artfully created than others.