Mastering the Math: The Not-So-Mindless Musings of Two 21st Century Metadata Wonks

By Nannette Naught and Laura Dawson

“Slow down, start at the beginning.”

“Decrease the drama, keep it simple, silly.”

“The facts, Ma’am. Just the facts.”

Long lost episode of Dragnet or guiding remarks made by system designers in a focus group at ALA? Honestly, it can be hard to tell the difference at times. And yes, we metadata wonks understand the hard, uncomfortable truth of this statement. Heck, we’ve lived it!

So now what? How do we as a knowledge industry move beyond the battle lines of our individual business cases? How do we as professionals productively collaborate across the divides of discipline and raison d’etre? WHERE DO WE BEGIN?

As we said last week, for our parts, we metadata wonks begin with trusted, experienced human knowledge acquisition and deployment engines with access to vetted, curated collections of knowledge: trained information and data service professionals backed by institution-branded, collaboratively curated, community-aware resource collections, whom we access in person, virtually, or via the ubiquitous, affordable software programs deployed on our phones, tablets, watches, and laptops — aka libraries and librarians.

So starting here, bringing Joe Friday with us into the 21st Century...

Fact: Learning is contemplative. Humans do not simply point and click their way to knowledge. Unlike machines, people do not simply additively take in information and automatically get smarter. Learning is an experience that requires active thought.

Volume (even organized volume) ≠ Wisdom.

Fact: Research is contextual and collaborative. Unlike simple mathematics, human learning is not necessarily incremental, but rather, oft nonsequential and unpredictable. Knowledge creation is a shared experience that requires active thought, appropriate access, and imagination.

1s and 0s (even organized, interconnected 1s and 0s) ≠ Thought.

Fact: Inquiry is a dependent and serendipitous process. Unlike nascent, database-driven sort-and-retrieval algorithms, the human brain natively understands real-world objects and complex relationships. The investigative act is a personal, emotion-inclusive, uniquely human experience that requires MORE. MORE than just machines with their 1s and 0s, fledgling algorithms, and sales-driven, mob-ruled, short-term, oft-insecure, point-and-click document collections.

THE Web (even the linked, semantic web) ≠ Library.

Which brings us back to last week’s conclusion: to move beyond the current stagnation, the knowledge economy requires a library-led programming toolset and the resulting librarian-curated, web-formatted, web-accessible metadata that augments current sales-driven applications with knowledge- and language-driven contextual metadata to power its services and applications.

Which, if we are honest, requires us all (aka not just us metadata wonks) to face yet another hard, uncomfortable truth — said efforts within the Library community seem stalled at best, in crisis at worst. Heck, all of us, leadership and feet on the street alike, are living this truth daily! So why, oh why, can’t we take a clue from Nike and Shia LaBeouf and “JUST DO IT,” already?

Heck, as many correctly point out, library metadata was authoritative, global, and knowledge-based before anyone even conceived of the web, let alone email. Librarians, archivists, and their cadre of associated technical professionals were serving communities long before anyone received a degree in computer science, let alone web design or search engine optimization. And therein lies the rub. The rub or controversy that is holding us back: The epic battle between commercialized computer science and community-based library and information science.

Thankfully, here, the wisdom of Dragnet holds: Strip away the drama (aka commercial versus community) and get down to the facts (aka the science). And voilà, a simple fact, not to mention a very human truth, emerges: Computers are simply the vehicles through which libraries deliver knowledge and information (and, for that matter, commercial interests deliver their products) to communities (geographical, practice, institutional, academic, public, K-12, and specialized communities). Communities of people who are:

  • Driven by their individual wants, needs, and relationships;
  • Learning, researching, and inquiring in very human ways.

Which brings us metadata wonks to our main point for the week: We believe subtraction, not addition, is required, if we are to successfully escape the current trough of disillusionment.

We believe it’s time to STOP ceding authority, ownership, and, frankly, definition of what is possible/needed to the programmers.

We believe it’s time to QUIT relying on simple arithmetic, directory-like relational databases, and 5th-grade-level semantics to discover, deliver, and manage our resources, our patrons, and our businesses.

We believe it’s time to STAND UP for ourselves and the communities we serve by insisting that data, library, and information science, aided by computer science, move forward together as equals — no ugly stepchildren, no domineering bullies allowed.

Then and only then, can we collaborate successfully across the divides of discipline and motivation to additively:

  • Move innovation forward, grounded in Tradition and Diversity.
  • Reliably extend service levels with Economics and Ethics.
  • Help ensure Peace by relying on shared Governance structures and Negotiation methodologies.
  • Achieve collective and individual Success through Innovation, Service, and Peace.

Join us next week for ideas from Digital Book World about how our lifecycle partners, the publishers, distributors, and aggregators, see this playing out in 2017 product and service lines. And the following week for ideas from ALA Midwinter about how their lifecycle partners, the librarians, researchers, and library technologists, see this playing out in their institutions and within their communities of practice.

Lists, Damned Lists, and Statistics

In 1986, I came to New York for the first time (since visiting as a five-year-old), where I interned at Rolling Stone. I sublet a room on the Lower East Side, lived on lentil/rice concoctions, and learned about coffee carts, subway routes, homelessness, and the last shreds of the punk scene (my apartment was right over the Pyramid Club, and the East Village was full of mohawks, piercings, and tattoos at that time).

One of my tasks at Rolling Stone was to create the charts. This was done by calling up about 20 record stores all over the country, which someone had designated as key indicators, and checking on their sales rankings. I collated the data and submitted it to the managing editor, who might make some tweaks to it before running it in the next issue. And I learned some things about bestseller charts. Mostly that this type of data-gathering was less than scientific.

This method was replaced in 1991 by Nielsen's SoundScan service, which tallied sales from cash registers in thousands of stores. It was marginally more scientific - results were based on raw numbers rather than phoning around randomly and having the results edited to suit someone's tastes. Ten years later, Nielsen expanded its service to bookstores - BookScan was born.

Again, BookScan isn't perfect. It can only track print book sales, because it relies on bar-code scanning technology. And it doesn't track non-traditional sales, such as sales to libraries, direct sales, or sales by online retailers.

And then...there's the New York Times Bestseller List. Today they announced that they are consolidating the lists, merging some print and digital charts, and dropping a few lists. The compilation of the NYT lists is secret even from the NYT Book Review staff - it's done by the news staff. But they have mentioned that it's done in similar fashion to what I used to do at Rolling Stone - communicating with bookstores around the country and tabulating sales by what they report in. My understanding is that this process now spans thousands of stores, as well as wholesalers who distribute to non-traditional book outlets. It scales more broadly than my efforts did in 1986, but the principle is still the same - self-reporting by stores, plus some kind of editorial "secret sauce".

Of course, the best source of sales data (post-returns) is the publishers themselves, who don't share this knowledge with anyone. So, just as we don't have a fully complete and authoritative repository of all publishing metadata, we don't have such a repository of all book sales data.

Basically, these lists come down to what you count, what you DON'T count, and what you CAN'T count. They are signposts, some more artfully created than others.

Alphabet Soup: EAN

International Article Number

EAN is one of those acronyms that has outlived its original meaning. It used to stand for European Article Number, and was a bar-coding system mostly used in...Europe.

However, as global trade increased, and big-box stores began selling an increasingly large variety of products, it became clear that books were going to be among those products. Rather than re-stickering or having dual bar codes, it made sense for the ISBN bar code to become part of the EAN system. Thus the invention of Bookland. The EAN's first 3 digits are a specified country code. Books were given their own country - Bookland - with the digits 978 or 979. The rest of the Bookland EAN follows the pattern of the formerly 10-digit ISBN (language code, publisher prefix, item number), but the check digit is recalculated to accommodate the additional 3 digits.
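The recalculation described above is mechanical, and the same rule covers every EAN-13. Here is a minimal Python sketch (the function names are our own) of converting a 10-digit ISBN into its Bookland EAN form, using the well-known example ISBN 0-306-40615-2:

```python
def ean13_check_digit(first12: str) -> str:
    """EAN-13 check digit: weight the 12 digits 1,3,1,3,... from the
    left, sum them, then take (10 - sum mod 10) mod 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def isbn10_to_bookland_ean(isbn10: str) -> str:
    """Prefix 978, keep the first 9 ISBN digits (language code,
    publisher prefix, item number), and recompute the check digit."""
    body = "978" + isbn10.replace("-", "")[:9]
    return body + ean13_check_digit(body)

print(isbn10_to_bookland_ean("0-306-40615-2"))  # 9780306406157
```

Note that the old ISBN-10 check digit (the trailing 2, computed mod 11) is simply dropped; the new trailing 7 is computed mod 10 over the 12 digits that follow the 978 prefix.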

More information about EANs (and Bookland) can be found here.

Of Life, Liberty, and the Pursuit of Knowledge: The Not-So-Mindless Musings of Two 21st Century Metadata Wonks

By Nannette Naught and Laura Dawson

When starting something new, it seems best to start at the beginning, without preconceptions: Library metadata is problematic from a wide variety of perspectives. Just about every interaction with it is an exercise in frustration. Between legacy issues, retro-conversion trip-ups, the introduction of new concepts like Linked Data, and the sheer volume of STUFF that needs to be effectively described to a multitude of audiences, it’s safe to say that increasingly efficient technologies (like the Web, like faster computers) mean we’re looking at a burgeoning crisis.

So. Basic concepts:

  • What is metadata?
  • Why do we need metadata?
  • And while we’re at it, why, oh why, hasn’t someone – anyone – fixed it by now?!

Not exactly the scintillating outline for a “politically-charged, pulse-pounding” theatrical thriller. Not exactly the reader-grabbing, panic-inducing tagline of a breaking news alert. And yet, a cursory search on Google brings up results about “Snowden: The Movie” and the NSA. It even prompts related “People also ask” questions such as “What is metadata collection?” which give us results like:

  • What is the NSA and what do they do?
  • Is the NSA still spying on us?
  • What is the NSA Surveillance program?

For our parts, we metadata wonks trust knowledgeable, experienced, human search engines with access to vetted, curated collections of knowledge (aka libraries and librarians) more than we trust fickle, opaque, largely sales-motivated and easily manipulated, computer algorithms with access only to properly formatted, mostly recent documents on the Web. Documents like news feeds and advertising or pseudo-advertising in the guise of a PSA that are seldom sourced, let alone vetted and peer-reviewed. Especially when it comes to important decisions such as those related to our jobs, safety, security, privacy, healthcare, and the like! (To say nothing of the problems of “fake news”.)

Why? Because research, inquiry, and learning (or, rather, knowledge acquisition and deployment) are NOT purchasing decisions. Nor are they simple, fact-based, database-driven information sorting-and-retrieval questions. Yes, we wonks believe the wisdom and hearsay of an anonymous (for those who can afford it) web-based crowd – be it the commercial crowd; the free and open, no-copyright or peer-review crowd; or one of the many NIMBY mobs – is an inadequate, and frankly immature resource for the tasks of life, liberty, and the pursuit of knowledge.

So now what? Where do we go from here, if we can’t rely on Google, Amazon, and/or Facebook alone to answer important questions? If these lauded, affordable platforms are really just sort-and-retrieval systems for the easily accessible sales and social media databases of the internet, how do we get access to vetted, curated collections of knowledge and trained, experienced information professionals (as opposed to scripted, offshore customer support staff) we need to learn and grow?

For us, it’s simple – use these software programs to access your library and librarians! Let’s look at our metadata search again. Let’s add the word “library” to it and see what happens. This simple addition totally changes the results, yielding a Journal of Academic Librarianship article (V. 21, N. 2, pp. 160-163) by Karen Coyle in the top 5 hits and no NSA or Snowden references.

Having now used the tool, a sales-driven algorithmic search engine, to access the library realm and the writing of a leading librarian (vetted and trusted by her peers over the course of a long and productive career), in less than 4 of her paragraphs we find concrete, intelligible answers to our initial questions:

  • What is metadata? “Metadata is cataloging done by men.”

This is a quip attributed to two notable librarians and library metadata luminaries, Tom Delsey and Michael Gorman. And though we do not know Michael personally, Nannette knows and has worked with Tom closely. We’re fairly confident both use the term in its classical, inclusive sense of “mankind,” referring to all humans, regardless of gender and/or biological equipment. But then, this is why wonks trust librarians and librarian-guided, vetted programming over anonymous algorithms. Librarians are in the business of knowledge and value diversity. Anonymous, sales-driven algorithms are in the business of advertising and generally value the biases of BOTH the commercial interests who have purchased or negotiated their way to preferential sort-and-retrieval placement (think of AdWords as shelf positioning in a grocery store: the name brand is always at eye level, the generic products are at the bottom, and the nutritious stuff is up high) AND their equally anonymous programmers.

  • How does metadata work? “…metadata is constructed information, which means that it is of human invention and not found in nature.”

This is an important point that Google and other algorithmic, natural language processing platforms overlook. Metadata is human; it takes a human act to create it. Humans and their actions are by definition contextual. And unfortunately for most algorithms, most humans do not explicitly state their present context, or the context surrounding their desired outcome, out loud. Let alone take the time to type it into an interface. Again, ask a reference librarian; that’s why they do introductory interviews, in the same fashion that a doctor interviews you about your symptoms and concerns before discussing treatment with you. So too, your librarian attends to your very human, very individualized knowledge diagnosis and care needs.

  • Why do we need metadata? “… necessary characteristic of metadata: metadata is developed by people for a purpose or a function.”

This is a critical pivot point for all knowledge acquisition and deployment activities, be they research, inquiry, or learning related. Success or failure, and its accompanying metric (aka relevance), hinge on appropriateness or fitness for the human purpose against which the metadata is being deployed, as measured against the use case (need, model, and definition) under which the metadata was created, collected, and/or enhanced. And therein lies the rub: too much, way too much, of our metadata, both in the world at large and in libraries specifically, is created, collected, aggregated, and deployed without adequate, explicitly documented, appropriately serialized, adequately tested needs, models, and definitions.

  • Why, oh why, hasn’t someone, anyone fixed it by now?!

The answer seems to be the lack of library-led, governed, and administrated, explicitly documented, appropriately serialized, adequately tested knowledge acquisition and deployment use cases, metadata models, and element/term definitions – for our vendors, the web, publishers, and others in the knowledge economy to deploy.

As for our earlier trick of prefacing a search with “library” to access vetted, curated collections of knowledge and trained, experienced information professionals – why can’t we just do that going forward?

Simple. Without the above-mentioned library-led programming toolset and the resulting librarian-curated, web-formatted, web-accessible metadata that augments current sales-driven applications with knowledge- and language-driven contextual metadata, there is nothing (or at least very little) to power the applications.

Which of course leads to the question: “Ok, metadata wonks, you said it. WHERE DO WE BEGIN?”

To which we have to say, “honestly, we don’t pretend to know!” However, over the course of our 20+ year careers, we wonks – acting as consultants, product managers, knowledge product developers, and strategists – have learned a thing or two about metadata and knowledge economics, not to mention listening. Listening to the users and creators of knowledge, as well as their lifecycle partners: the publishers, distributors, librarians, administrators, researchers, aggregators, lawyers, and business people who make the knowledge economy go round. And we have become quite the conversationalists and discussion moderators. Thus, this series of posts – a record, if you will, of not just our musings, but our ongoing conversations with each other and with industry nerds like ourselves. Conversations we hope will help us all pave the way toward answering the public’s need for knowledge-driven, library-led, web-based inquiry.

Next week we'll dive into Mastering the Math:

Tradition + Diversity = Innovation
Economics + Ethics = Service
Negotiation + Governance = Peace
Innovation + Service + Peace = Success

Alphabet Soup: GTIN

Global Trade Item Number

GTIN is a number assigned to a trade-able object. Developed by GS1, it resolves numbering schemes such as ISBN, ISMN, UPC, EAN, ISSN into a single "numbering space" so that scanners and databases that use the GTIN system can make use of these other numbering systems as well. This allows, for example, supermarket scanners to scan bar codes on paperback books that they might be selling. It also allows warehouses to scan cartons from publishers as well as chewing gum manufacturers. It's a way to allow for the efficient processing of products regardless of what vertical they happen to be from.
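One way to picture that single "numbering space": any GTIN, whatever scheme it came from, can be left-padded with zeros to 14 digits, and the shared check-digit rule still holds, because leading zeros contribute nothing to the weighted sum. A rough Python sketch (function names are our own invention), using a Bookland EAN as the input:

```python
def normalize_gtin(code: str) -> str:
    """Left-pad a GTIN-8/12/13 into the common 14-digit space.
    The existing check digit stays valid, since leading zeros
    add nothing to the checksum."""
    if not code.isdigit() or len(code) not in (8, 12, 13, 14):
        raise ValueError("not a GTIN: " + code)
    return code.zfill(14)

def is_valid_gtin(code: str) -> bool:
    """Shared GTIN check-digit rule: weight digits 3,1,3,1,...
    starting from the digit just left of the check digit."""
    digits = [int(d) for d in code]
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits[:-1])))
    return (10 - total % 10) % 10 == digits[-1]

print(normalize_gtin("9780306406157"))  # 09780306406157
```

This is why a supermarket scanner doesn't need to know whether it just read a paperback or a pack of gum: after normalization, every product code is validated and looked up the same way.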

More about GTIN can be found here.

On Taxonomies, Mapping, and Loss

The world of books represents the world of human thought. Concepts articulated, written down, codified, published. But of course, our understanding of these concepts can vary – by nationality, cultural background, experience, philosophy of life. The word “alienation,” for example, can mean different things to different people. It can be expressed differently in different languages – by a single word, or by a phrase rather than a word. And, in fact, in cultures all over the world, many words can be used to describe phenomena like “snow”, “walking” – think of how we describe colors in the Crayola box, for example, or the Pantone chart.

Words carry nuance that’s not always immediately apparent, which is why non-native speakers of languages tend to struggle, and why translations nearly always lose meaning. And it’s why our systems of categorization are the most subjective and argued-about forms of metadata.

Taxonomies, in particular, are inherently political and authoritarian. They are hierarchical. Taxonomies are, essentially, what we call “controlled vocabularies”. Which begs the question: Who controls them? Do we trust those people to express what we mean? What if we disagree?

As in politics, taxonomies evolve as society evolves. What used to be “Negro history” became “Afro-American history”, which became “African-American history”. What used to be “Occult” became “New Age”, which became “Body/Mind/Spirit”.

Taxonomies reflect our understanding of phenomena. And that understanding is deeply colored by our culture, our experience, our politics, and our vision of the world. It varies from person to person. Taxonomies are a compromise, a consensus.

They’re the result of committee work. Taxonomies are rarely finalized. They shift and change depending on cultural mood, society’s evolution, and market trends. They are living things.

I just want to go over some of the issues that we see in book commerce – where Amazon, B&N, and other booksellers have their own proprietary codes. How are those created? And how do BISACs influence them?

When a publisher communicates information about a book to a distributor or retailer, that publisher will assign a series of BISAC codes. Online retailers, as we know, have their own proprietary codes, based on how their users search and browse for books. Retailers such as Amazon and Barnes & Noble.com tend to look at BISACs as useful suggestions. They map BISAC codes to their own codes, and they each make separate decisions about which BISACs map to which proprietary code. A code like SOCIAL SCIENCE - Media Studies might map to a scholarly sociology code at B&N, and to a commercial code (such as "media training") on Amazon.

Mapping data points always results in the loss of some meaning or context. Some categories don’t cleanly line up to others. So mapping one taxonomy to another is yet another compromise – one we have to live with in a taxonomic, hierarchical world.
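To make that loss concrete, here is a deliberately toy mapping table in Python. The store-side categories and the pairings are invented for illustration, not taken from any actual retailer's taxonomy. Two distinct BISAC headings collapse into one store category, and anything unmapped lands in a catch-all, so the original distinction can't be recovered from the store side:

```python
# Invented, illustrative mapping - not any retailer's actual taxonomy.
bisac_to_store = {
    "SOCIAL SCIENCE / Media Studies": "Sociology > Media",
    "SOCIAL SCIENCE / Popular Culture": "Sociology > Media",  # collapsed together
    "BUSINESS & ECONOMICS / Public Relations": "Business > Media Training",
}

def map_category(bisac_heading: str) -> str:
    """Map a BISAC heading to a (hypothetical) store category,
    falling back to a catch-all when no mapping exists."""
    return bisac_to_store.get(bisac_heading, "General")
```

Once Media Studies and Popular Culture both map to "Sociology > Media", no inverse function can tell you which BISAC a store category came from - the mapping is many-to-one, and the catch-all bucket is lossier still.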

In Which We Are 1

Numerical Gurus, LLC, has just turned a year old! We're celebrating by...buckling down and working harder than ever, basically. We've got loads in store for 2017 - new webinar programs, new writing, and lots more educational sessions at conferences throughout the year.

In the meantime, we are still here for your metadata optimization needs. From help with keywords, to standardization and normalization, to troubleshooting your EPUB issues, we provide back-end support for publishers of all types - scholarly, academic, Christian, association, trade, independent, small, self-, and specialty. As you can see from our portfolio, we cover the gamut of publishing needs.

Things to watch for:

1/17/17 - Laura is giving a "master class" in identifiers at Digital Book World. Come for the ISBN, stay for the ORCID! Learn things you never knew you wanted to know.

1/23/17 - A new column in Publishers Weekly!

2/2/17 - A new round of Metadata Boot Camp! New content, new concepts! So many exclamation marks!!!