Alphabet Soup: GTIN

Global Trade Item Number

GTIN is a number assigned to a tradeable object. Developed by GS1, it resolves numbering schemes such as ISBN, ISMN, UPC, EAN, and ISSN into a single "numbering space" so that scanners and databases that use the GTIN system can make use of those other numbering systems as well. This allows, for example, supermarket scanners to scan the bar codes on paperback books they might be selling. It also allows warehouses to scan cartons from publishers as well as from chewing gum manufacturers. It's a way to allow for the efficient processing of products regardless of what vertical they happen to come from.
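
To make the "single numbering space" idea concrete, here's a minimal sketch in Python of the shared GS1 check-digit calculation. It's my own illustration rather than anything from the GS1 spec text, and the sample numbers are well-known illustrative values; the point is that an ISBN-13 already is a GTIN-13, and a UPC slots into the same space by zero-padding.

```python
def gtin_check_digit(body: str) -> int:
    """Return the GS1 check digit for the body of a GTIN.

    Working from the right-most digit of the body, digits are weighted
    3, 1, 3, 1, ... and the check digit brings the weighted sum up to a
    multiple of 10. The same rule covers GTIN-8, -12, -13, and -14.
    """
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


# An ISBN-13 is already a valid GTIN-13, so the same calculation checks it.
isbn13 = "9780306406157"   # a commonly used sample ISBN
assert gtin_check_digit(isbn13[:-1]) == int(isbn13[-1])

# A UPC (GTIN-12) can be left-padded with zeros to fill a 14-digit database
# field without changing its check digit.
upc = "036000291452"       # a commonly cited sample UPC
gtin14 = upc.zfill(14)
assert gtin_check_digit(gtin14[:-1]) == int(gtin14[-1])
```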

More about GTIN can be found here.

On Taxonomies, Mapping, and Loss

The world of books represents the world of human thought. Concepts articulated, written down, codified, published. But of course, our understanding of these concepts can vary – by nationality, cultural background, experience, philosophy of life. The word “alienation,” for example, can mean different things to different people. It can be expressed differently in different languages – by a single word, or by a phrase rather than a word. And, in fact, in cultures all over the world, many words can be used to describe phenomena like “snow”, “walking” – think of how we describe colors in the Crayola box, for example, or the Pantone chart.

Words carry nuance that’s not always immediately apparent, which is why non-native speakers of a language tend to struggle, and why translations nearly always lose meaning. And it’s why our systems of categorization are the most subjective and argued-about forms of metadata.

Taxonomies, in particular, are inherently political and authoritarian. They are hierarchical. Taxonomies are, essentially, what we call “controlled vocabularies”. Which begs the question: Who controls them? Do we trust those people to express what we mean? What if we disagree?

As in politics, taxonomies evolve as society evolves. What used to be “Negro history” became “Afro-American history”, which became “African-American history”. What used to be “Occult” became “New Age”, which became “Body/Mind/Spirit”.

Taxonomies reflect our understanding of phenomena. And that understanding is deeply colored by our culture, our experience, our politics, and our vision of the world. It varies from person to person. Taxonomies are a compromise, a consensus.

They’re the result of committee work. Taxonomies are rarely finalized. They shift and change depending on cultural mood, society’s evolution, and market trends. They are living things.

I just want to go over some of the issues that we see in book commerce – where Amazon, B&N, and other booksellers have their own proprietary codes. How are those created? And how do BISACs influence them?

When a publisher communicates information about a book to a distributor or retailer, that publisher will assign a series of BISAC codes. Online retailers, as we know, have their own proprietary codes, based on how their users search and browse for books. Retailers such as Amazon and Barnes & Noble.com tend to look at BISACs as useful suggestions. They map BISAC codes to their own codes, and they each make separate decisions about which BISACs map to which proprietary code. A code like SOCIAL SCIENCE / Media Studies might map to a scholarly sociology code at B&N and to a commercial code (such as "media training") at Amazon.

Mapping data points always results in the loss of some meaning or context. Some categories don’t line up cleanly with others. So mapping one taxonomy to another is yet another compromise – one we have to live with in a taxonomic, hierarchical world.
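
To see how lossy that compromise can be, here's a small hypothetical sketch in Python. The BISAC heading is real, but the retailer names and category strings are invented placeholders, since each retailer's taxonomy is proprietary.

```python
# Hypothetical illustration: the BISAC heading is real, but the retailers and
# their category strings below are invented placeholders.
bisac_heading = "SOCIAL SCIENCE / Media Studies"

retailer_mappings = {
    "Retailer A": "Sociology > Media & Communication Studies",     # scholarly slant
    "Retailer B": "Business > Public Relations > Media Training",  # commercial slant
}

# The same source category lands in very different neighborhoods; the nuance
# of who the book is actually for gets lost in each direction.
for retailer, category in retailer_mappings.items():
    print(f"{bisac_heading} -> {retailer}: {category}")
```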

In Which We Are 1

Numerical Gurus, LLC, has just turned a year old! We're celebrating by...buckling down and working harder than ever, basically. We've got loads in store for 2017 - new webinar programs, new writing, and lots more educational sessions at conferences throughout the year.

In the meantime, we are still here for your metadata optimization needs. From help with keywords, to standardization and normalization, to troubleshooting your EPUB issues, we provide back-end support for publishers of all types - scholarly, academic, Christian, association, trade, independent, small, self-, and specialty. As you can see from our portfolio, we cover the gamut of publishing needs.

Things to watch for:

1/17/17 - Laura is giving a "master class" in identifiers at Digital Book World. Come for the ISBN, stay for the ORCID! Learn things you never knew you wanted to know.

1/23/17 - A new column in Publishers Weekly!

2/2/17 - A new round of Metadata Boot Camp! New content, new concepts! So many exclamation marks!!!

Alphabet Soup: ASIN

ASIN stands for Amazon Standard Identification Number. Amazon assigns one to every product sold on its sites. For books, the ASIN is the ISBN. For non-book products (T-shirts, lawn furniture) or extra-book products (chapters, short stories, etc.), the ASIN is an internal identifier that assists in transactions. The ASIN is a proprietary identifier – in other words, no other merchant besides Amazon will ever require it – which means that smaller manufacturers (or publishers) are locked into sales with Amazon, because for online sales Amazon essentially supplies the bar code in the form of the ASIN. More about the ASIN can be found here. (The interaction between ASIN and global sales is interesting!)

From Systems To Metadata

There are many services out there that handle workflows. Some are comprehensive, like IngentaConnect and Klopotek. These cover every aspect of the publishing process, from title management to warehousing to metadata distribution. Some focus on specific parts of the publishing process – Firebrand focuses on title management and metadata extracts; Iptor includes modules for paper/print/binding and warehouse functionality; MetaComet focuses exclusively on royalty tracking. You may find yourself having to use portions of several put together – for example, Firebrand and MetaComet are able to integrate. Or you may find yourself using non-publishing-specific tools like SAP, which feed into systems like Firebrand or Klopotek.

Or, you may have your own in-house tools to manage workflows. Smaller publishers have made their businesses work on a series of spreadsheets stored in a central location, to which only a few people have access, or on a SQL database that handles title management, with third-party tools for other functions.
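
As a deliberately tiny illustration of that kind of home-grown setup, here's a hypothetical sketch using Python's built-in SQLite support. The table and field names are invented for illustration; no real publisher's title-management schema is this simple.

```python
import sqlite3

# Hypothetical minimal title-management table; field names and values are
# invented placeholders, not any particular publisher's or vendor's schema.
conn = sqlite3.connect("titles.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS titles (
        isbn13        TEXT PRIMARY KEY,
        title         TEXT NOT NULL,
        contributor   TEXT,
        pub_date      TEXT,
        list_price    REAL,
        bisac_primary TEXT
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO titles VALUES (?, ?, ?, ?, ?, ?)",
    ("9780000000000", "Example Title", "Jane Author",
     "2017-03-01", 24.95, "SOC000000"),  # placeholder ISBN and BISAC code
)
conn.commit()
conn.close()
```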

All of which is to say that, in my experience, workflow management systems are somewhat Rube Goldberg-like in nature, and there are usually systems talking to other systems. There’s double-keying – entering the same information in multiple systems. And there’s also a lot of sneakernet.

These systems’ relationships to one another are complex. I’ve worked at McGraw-Hill, Bowker, Barnes & Noble, and consulted to many, many publishers and aggregators – and I have NEVER seen a smoothly running set of interoperating systems where everything worked elegantly and produced perfect and timely metadata with a minimum of effort. One or two components may be problem-free, but as a whole, there are ghosts in our machines.

And that affects workflow, of course. Certain jobs can only run at night, which means real-time data isn’t available. Your warehouse data runs on an open source platform which isn’t sufficiently supported. Your sales staff keeps entering endorsements in the reviews field because their system doesn’t have an endorsements field. Your company has acquired another company and the systems need to merge. Your digital asset management system is literally a box of CDs.

So there are lots of points where these systems don’t align perfectly, and that is going to affect the quality of output.

One way to begin to tackle this is to structure your system so that there’s a single repository everyone can tap into. Fran Toolan at Firebrand calls it the “Single Source of Truth” – having a central repository means you don’t have competing spreadsheets on people’s hard drives, or questions about whether you’ve got the latest version of information about a book.

Read more about these types of problems in the publishing supply chain in The Book On Metadata.

ORCID

ORCID stands for Open Researcher and Contributor ID. It is used primarily in academic, scholarly, and STEM research.

ORCID is a 16-digit identifier, just as ISNI is. In fact, ISNI "carves out" numbers from its own database for ORCID's use so there are no data integrity issues between the two standards.
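
As a small illustration of that shared structure (strictly, 15 digits plus a final check character that can be an X), here's a Python sketch of the ISO 7064 MOD 11-2 check that ORCID uses for that last character; ISNI's checksum comes from the same ISO 7064 family. The sample iD is the commonly cited one from ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Return the final check character of an ORCID iD (ISO 7064 MOD 11-2).

    `base_digits` is the first 15 digits of the identifier, hyphens removed.
    The check character is 0-9, or 'X' standing for the value 10.
    """
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length and check character of a hyphenated ORCID iD."""
    chars = orcid.replace("-", "")
    return len(chars) == 16 and orcid_check_character(chars[:15]) == chars[15]


# A commonly cited sample iD from ORCID's documentation.
assert is_valid_orcid("0000-0002-1825-0097")
```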

It's commonly thought that ORCID and ISNI are competing identifiers. Those who have worked on both standards would tell you that isn't true. ORCID began as a self-claiming system for individual researchers. The researcher controls the profile and what goes in it. Because ORCIDs are intended to follow a researcher's career, often the only credential a beginning researcher has is an email address, and that is the only qualification for getting an ORCID assigned to you. ISNIs have much more stringent requirements.

ORCIDs are not assigned to deceased people (so Isaac Newton, for example, doesn't have one - but he does have an ISNI!). They're primarily used in grant and funding applications. The ORCID website does allow you to link your ORCID profile to your ISNI profile.

To learn more about ORCID, click here!

The Web Came For Books

In 1998, when I began working at BN.com, the site had 900,000 titles in its database for sale - representing the entire availability of books at that time.

Bowker reports there are over 38 million ISBNs in its database now. There are some caveats to this: some of these ISBNs don’t represent viable products, some are assigned to chapters rather than whole books…however, there are also a sizeable number of books (via Smashwords, Kindle, and other platforms) that never make it into Books in Print. We don’t know if this evens things out, or if 38 million is the minimum number of books available in the US market today.

That’s an increase of more than 4,000% (38 million is roughly 42 times 900,000).

Further complicating this scenario, we are living in a world where content is born digital. It can be produced and consumed rapidly, which is why there is so much of it, and why there is only going to be more of it. Lots and lots of information and entertainment. Lots and lots of, essentially, data.

Nothing ever goes away anymore.

Another factor is that the internet provides a persistence even to physical objects. On the web, physical objects don’t go away either – they only accumulate (on eBay, in vintage shops, and in libraries). They accumulate and accumulate. And books are very much a part of this accumulation. We don’t order books out of paper catalogs anymore. We order books off the web.

And, in many cases, we order books that ARE websites – packaged into…EPUB files.

We now have 38 million books to choose from. We also order music and movies over the web – and we frequently don’t see a physical medium for most of these things. Physical media get scratched, damaged, lost, borrowed and never returned. But digital is forever, and there’s a freaking lot of it.

At some point (and remember, we’re in a world of rapid development, explosion of content, and ever-more-sophisticated ways of consuming it – so “at some point” could actually be sooner than we think it ought to be), search engines and online catalogs will go one step further than asking publishers (and other manufacturers) for product metadata in a separate (e.g. ONIX) feed. They are increasingly going to want to derive that metadata (and more detailed metadata) directly from the file representing the product itself. In our case, that’s the EPUB file - a “website in a box”.

This means that publishers are not only going to have to get good at creating and maintaining metadata at a pace that can sustain a 4,000% increase over 18 years (so vastly more products to keep track of), they are also going to have to get good at doing this inside the book file itself, which means not only grappling with markup languages, but treating the EPUB file as a (really long) web page.

And at volumes that are unprecedented – because (a) publishing is easier than it has ever been before and (b) no book published now ever goes away.
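
If you're wondering what "deriving metadata from the file itself" might look like in practice, here's a rough Python sketch that pulls the basic Dublin Core fields out of an EPUB's package document. The file name at the bottom is a placeholder, and real-world EPUBs vary enough that this should be read as an illustration of the idea, not production code.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c":   "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc":  "http://purl.org/dc/elements/1.1/",
}

def epub_core_metadata(path: str) -> dict:
    """Pull title, creator, and identifier values out of an EPUB's package file.

    An EPUB is a ZIP archive; META-INF/container.xml points at the package
    (.opf) document, whose <metadata> block carries Dublin Core elements.
    """
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))

    meta = package.find("opf:metadata", NS)
    return {
        "titles":      [e.text for e in meta.findall("dc:title", NS)],
        "creators":    [e.text for e in meta.findall("dc:creator", NS)],
        "identifiers": [e.text for e in meta.findall("dc:identifier", NS)],
    }

# "my-book.epub" is a placeholder file name.
# print(epub_core_metadata("my-book.epub"))
```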

This kind of rapid development doesn’t just change your workflow – it changes what and how you publish. And the more publishers understand about the web, the more likely they are to survive.

This is a different kind of survival than just holding on through a bad time, waiting out an economic downturn. This is survival that depends on evolution. On change. On new skills and abilities and ways of looking at things – while keeping in mind where we have come from and how we got here.

Let’s go back to the problem of content proliferation. How are we going to manage it, organize it, feed the search engines in ways that they understand so that normal people who think Google is magic can actually find it, discern it, and read it?

We impose a structure on it. We take that mess and organize the hell out of it. And yes, it has to be us - the book industry.

The search engine industry doesn’t really care what results it displays. Books are no more important to a search engine than anything else – it’s all data. If we want to make the search engine work for us, we have to engage it. We have to understand how it searches, and what’s most effective on it. Just as the industry worked very hard in the 1990s to understand superstores and how they displayed books and what co-op could get us, so must we understand the storefront of search.

In an age of this much abundance, it’s not enough to simply create a thing and then offer it for sale on the web. We have to understand how the market works. And the market revolves around search.

BNC To Retire ONIX Converters

BookNet Canada is announcing the retirement of two of its ONIX converters - the Bronze Template Excel-to-ONIX converter, and the ONIX 2.1-to-3.0 converter.

Their reasoning is below:

Creating ONIX files from the Bronze Template is no longer a sufficient solution for today's metadata needs. It doesn't provide support for full and complete book data (for example, spreadsheets can't be used to manage images), and the industry as a whole should be entirely reliant on ONIX. The spreadsheet template and converter just aren't cutting it anymore. It's time to invest in a database – either in-house or through a third party.

And while ONIX 2.1 continues to be widely used in North America, support for it has ceased and use of the ONIX 2.1 to 3.0 converter has been declining steadily. It is also no longer required for our ONIX education program. ONIX 3.0 is the way to go!

See their announcement here. The takeaway is that we're moving to an ONIX 3.0 world, and reliance on 2.1 (or 2.0!) is probably not wise.
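
For anyone who has only ever seen the spreadsheet side of this, here's a rough Python sketch of the skeleton of a single ONIX 3.0 product record, built with the standard library. It carries only a record reference, an ISBN-13, and a title; the ISBN and record reference are placeholders, and a real message needs a header and many more composites (contributors, subjects, prices, and so on).

```python
import xml.etree.ElementTree as ET

# A bare-bones ONIX 3.0 <Product> skeleton: record reference, ISBN-13, title.
# The record reference and ISBN are placeholders; a real ONIX 3.0 message also
# needs a <Header> and many more composites.
message = ET.Element("ONIXMessage", release="3.0")
product = ET.SubElement(message, "Product")

ET.SubElement(product, "RecordReference").text = "example.com.0001"
ET.SubElement(product, "NotificationType").text = "03"          # confirmed record

identifier = ET.SubElement(product, "ProductIdentifier")
ET.SubElement(identifier, "ProductIDType").text = "15"          # ISBN-13
ET.SubElement(identifier, "IDValue").text = "9780000000000"     # placeholder

descriptive = ET.SubElement(product, "DescriptiveDetail")
title_detail = ET.SubElement(descriptive, "TitleDetail")
ET.SubElement(title_detail, "TitleType").text = "01"            # distinctive title
title_element = ET.SubElement(title_detail, "TitleElement")
ET.SubElement(title_element, "TitleElementLevel").text = "01"   # product level
ET.SubElement(title_element, "TitleText").text = "Example Title"

print(ET.tostring(message, encoding="unicode"))
```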