032: Digital Publishing's Past

Like Ebenezer Scrooge in Charles Dickens’ A Christmas Carol, whose spirits of past, present and future arrive on Christmas Eve to show him a side of himself he can’t see from within, I too will be taking you on a journey, although it will likely prove less dramatic and transformative than Scrooge’s experience. Nevertheless, over the next three episodes of Talk Paper Scissors, I will be your guide through the past, present and future of digital publishing.

There’s lots to learn about digital publishing of yesteryear, including the technologies that made it mainstream. That’s what this episode is all about: I’ll cover the first 40 years or so of the history of digital publishing, from 1971 to about 2010.

The second episode will capture a snapshot of the last 10 years, from 2011 to 2021. Digital publishing today spans a variety of formats on platforms both new and established.

And while we can never know about digital publishing of tomorrow until it becomes the present, there are exciting new opportunities for connection that we will explore, including the ways in which this hyper-connectedness may directly and indirectly impact the world of digital publishing. The final episode will cover trends and what might happen in 2022 and beyond.

A logical starting point before we look behind us, observe where we’re at right now and squint to see the future, is to simply define the term “digital publishing”. Let’s start there.

“Digital” refers to the digitization, or computerization, of content. Whether viewing content on a screen or listening to digitally stored music or a podcast in our ears, we live in a very digital world. Digital mediums and formats are useful for many reasons, such as their ability to be replicated an infinite number of times, shared instantly and globally, and remixed into something new. With the advent and commercialization of the Internet came new avenues to distribute content to newfound audiences.

This segues nicely into “publishing”, for which the dictionary definition states: “the occupation, business, or activity of preparing and issuing books, journals, and other material for sale”. However, today’s digital landscape has broadened the scope of this definition in a way that the publishing industry even 20 years ago is unlikely to have predicted. We are all now publishers. When we create content and put it out for the world to see, whether in the form of photographs or video or writing or music or any combination thereof, we are publishers. The Internet has democratized the publishing process: if you want to get something out into the world, you no longer have to receive permission and approval or cut through red tape to do so; you can just publish it for the world to see.

And if the Internet were a flame that ignited the self-publishing revolution, then social media is the gasoline, the accelerant, poured onto the flames that caused them to shoot up into the air for all to see. Social media added fuel to the fire and facilitated new ways to share content with friends, family and like-minded individuals across a variety of platforms, with precise algorithms getting content to the people most interested in seeing it. An audience can be found for nearly every imaginable subject; every unique niche topic and area of interest has a home on the Internet, unbound by any physical size constraints. This has created a new phenomenon whereby appealing to the masses has been eclipsed by appealing to individuals, which one of my favourite creativity thought leaders, Seth Godin, explains in his wonderfully quirky little book “We Are All Weird”. But I digress. I’m getting ahead of myself.

When we put “digital” and “publishing” together, we arrive at a definition that could encompass everything from PDFs and eBooks to apps, games and social media. For the purposes of this 3-part podcast mini-series, we’re sticking to digital publishing in the more traditional publishing sense: documents and books that live in digital format, whether locally on individual computers and/or shared on interconnected platforms. That said, the potential role that social media could play in the future of the book can’t be denied. I’ll be veering into the social media lane when we take the exit ramp into the future of digital publishing.

Let’s do this. Put on your slippers and overcoat, and let’s head out into the cold night air as I introduce you to digital publishing’s past.

We’ve arrived in the 1970s.

1971 to be exact.

In his wildest dreams, Gutenberg couldn’t have imagined the long-lasting impact of his efforts or the trajectory of his converted wine press. What started as a way to print Bibles efficiently and profitably has transformed our entire world, enabling the communications technologies of today along with countless other wonders of modern life.

It’s fitting, then, that the widely recognized birth of digital publishing came in 1971 (predating the modern Internet), with the launch of Project Gutenberg by Michael Hart. Project Gutenberg is an online library of free digital books; its mission: “to encourage the creation and distribution of eBooks”. It started at the University of Illinois with Hart hand-typing ‘The Declaration of Independence’, as he tried to figure out how he could add value to the world and repay the immense amount of computer time he’d been gifted by the university (time he valued at $100,000,000).

In a 1992 article by Hart, entitled The History and Philosophy of Project Gutenberg, he states: “The Project Gutenberg Philosophy is to make information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search.”

Hart believed from the beginning of his Project Gutenberg journey that the “...greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries”. The entire premise upon which Project Gutenberg was founded was that anything that could be entered into a computer could be replicated an indefinite number of times. He used the term “Replicator Technology”: conceivably, anyone and everyone in the world (even outside of this world!) could have a copy of a document that lives inside a computer. Project Gutenberg makes works available as “Electronic Texts” (Etexts) so that they are in the simplest format and accessible to the greatest number of people. Project Gutenberg is all about free access to valuable works for the greatest number of people.

I recently searched for this first entry on the Project Gutenberg site and, sure enough, there it was: the very first entry, with a release date of December 1, 1971, filed away on the Internet at an appropriately modest and historical URL that sums it up wonderfully: http://www.gutenberg.org/ebooks/1. That was 50 years ago. Project Gutenberg now boasts over 60,000 free eBooks in both ePub and Kindle formats.

1980s, here we come.

The year is 1982 and the compact disc has just become commercially available; within a few years, the CD-ROM (Compact Disc Read-Only Memory) brought the same technology to computer data. This changed the way people were able to share digital information. CDs obviously had important uses in the music and video industries (a vintage Aqua CD still lives happily in my car), but publishers also found a way to make use of the technology. Publishers like National Geographic magazine sold and distributed CD-ROMs of their issues. This allowed digital copies to be made available and viewed on screen at a time when the Internet wasn’t yet widely available for home use. Furthermore, digital rights management (or DRM, something we’ll be exploring in the next podcast episode) could be controlled by the publisher when content was distributed in CD-ROM format.

Additionally, CD-ROMs could hold so much text that an entire encyclopedia’s worth could be contained on a single disc. The entire 21-volume, 65 lb., 9,000,000-word Grolier encyclopedia could fit onto a single compact disc. This was mind-blowing at the time. To give you a little perspective, a CD-ROM in the 1980s could hold about 700 MB of data; a fairly standard and cheaply available 16 GB USB stick holds roughly 23 times as much. After the CD-ROM came some exciting and groundbreaking technological advances in the world of digital publishing...
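If you’d like to check the back-of-the-envelope math behind that 23x figure yourself, here’s a quick sketch (assuming decimal units, i.e. 1 GB = 1,000 MB, and the 700 MB disc capacity mentioned above):

```python
# Rough storage comparison: a 16 GB USB stick vs. a 700 MB CD-ROM.
# Assumes decimal units (1 GB = 1000 MB).
CD_ROM_MB = 700
USB_STICK_GB = 16

ratio = (USB_STICK_GB * 1000) / CD_ROM_MB  # 16000 / 700 ≈ 22.9
print(f"A {USB_STICK_GB} GB USB stick holds about {ratio:.0f}x a CD-ROM")
# → about 23x
```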

Here’s looking at you, 1990’s.

Welcome to 1992! The Toronto Blue Jays baseball team won the World Series championship, Home Alone 2: Lost In New York premiered in theatres and the PDF was born. It was a BIG year.

A PDF (or Portable Document Format) file is cross-platform compatible and now a staple of the modern digital publishing world. Appearing the same on any device or operating system makes information exchange seamless and elevates the PDF to the status of modern marvel. PDFs are easy to make, send and receive.

The year prior, in 1991, Adobe co-founder Dr. John Warnock began ‘The Camelot Project’ with the aim of letting anyone capture documents from any application, send them electronically anywhere and print them on any machine. By 1992, that aim became a reality when the PDF was created.

PDF is now an open standard, overseen by the International Organization for Standardization (ISO). PDFs can contain all sorts of information: text, images, links, buttons, form fields, audio, video and even business logic.

PDF files are also the de facto standard for the printing industry, which means that whether a publication is intended for digital publication, traditional print publication or both, PDF files will play an integral role in the publishing process.

There are many different PDF standards, all with different purposes. Some of the most common in the publishing world include: PDF/X (for print and creative professionals, because high-resolution images, fonts and colour profiles are embedded within), PDF/UA (designed to aid accessibility and readability for those who use screen readers - the UA stands for ‘Universal Access’) and PDF/VT (also for print professionals, but specifically for those who use PDFs to customize information, such as the contents of bank statements or marketing material - the VT stands for ‘Variable and Transactional’), among others.

We’re going to hear more about PDFs and what the future holds for this universal standard in digital publishing in a later episode.

Let’s jump ahead two years to 1994, when the first online diary (a form later dubbed the ‘blog’) was written.

Then-student Justin Hall is credited with creating the first blog, Links.net (a URL still in existence and added to regularly!). He used this platform to publish his writing and provide links to content around the web. His first entry, in January 1994, read:

Welcome to my first attempt at Hypertext

Howdy, this is twenty-first century computing... (Is it worth our patience?) I'm publishing this, and I guess you're readin' this, in part to figure that out, huh?

High Stylin' on the Wurld Wyde Webb

This is a Hypertext server using MacHTTP v1.2.3 running on a Powerbook 180 w/ 8 RAM and a 120 HD. It is currently being broadcast from the depths of Willets, a dorm nestled in the shrubbery here at Swarthmore College in Swarthmore, Pennisylvania.

I’m getting a little ahead of myself timeline-wise, but I want to share with you a particularly interesting example of success in the blogosphere. Julie Powell was a modern New York woman working in a job she hated, trying to navigate adulthood and reignite her passion for life. To figure this out, Julie decided to embark on a journey of cooking every one of the 524 recipes in Julia Child’s cookbook, Mastering the Art of French Cooking; a lofty goal for someone who “had never eaten an egg before she tackled Oeufs a la Fondue de Fromage”. She decided to call her experiment The Julie/Julia Project and chronicle her adventure on a blog, beginning in 2002.

Julie’s blog grew a large following, and a feature article in The New York Times really put it on the map. Publishing company Little, Brown and Company offered Julie an opportunity to develop a book about her experience. The resulting book, published in 2005, was called Julie and Julia: 365 Days, 524 Recipes, 1 Tiny Apartment Kitchen. That’s a pretty big deal for a little blog. An even bigger deal came when the film rights were purchased and the story was adapted into a feature film starring Amy Adams as Julie and Meryl (Freaking!) Streep as Julia Child. The film was released in 2009. What a ride! (As an aside, who would play you in a film about your life? Me? Tina Fey, please.) While the real Julia Child wasn’t a fan of Julie’s blog, saying “I don’t think she’s a serious cook”, Julie was awarded an honorary diploma from Le Cordon Bleu, the same cooking school Julia Child graduated from in 1951. What a blog-tastic story!

In the next episode of this podcast, looking at digital publishing’s present, we’ll hear about another blog turned movie blockbuster success.

With that little skip ahead behind us, let’s now officially head into the new millennium!

The year is 2000.

The world’s banking systems didn’t set themselves back 100 years, and Will Smith may have released the greatest song of our time (in my humble opinion). Meanwhile, Google, founded only two years earlier, had a very ambitious goal: to digitize the world’s books.

In an absolutely incredible 2017 article in The Atlantic, entitled Torching the Modern-Day Library of Alexandria: Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them, author James Somers details the long-held dream of a “universal library”: one-click access to nearly every book ever published, with out-of-print works available for free at terminals in every local library the world over. That was the dream. It wasn’t the capabilities of the technology - hardware or software - that stopped this dream from becoming a reality; instead it was the judicial system (and the scholars, archivists and librarians standing opposite Google in the courtroom) that saw the modern-day library of Alexandria go up in virtual flames.

“Project Ocean” (Google’s effort to scan every book in the world) began in 2002, when Google co-founder Larry Page sat down with a 300-page book and a metronome to figure out how long it would take to scan a book. It took 40 minutes. At that rate, scanning the 7 million volumes in the University of Michigan’s collection alone would take… about a thousand years. Page told the university that Google could do it in six.

With this lofty challenge ahead of them, semi trucks filled with books showed up at Google scanning centres every day of the week. It took just over a decade, but Google outpaced even Page’s ambitious goal, scanning almost 25 million books in just over 10 years at a cost of approximately $400 million.

How did they do it?

The article describes one Google scanning site, a converted office building on Google’s Mountain View campus: custom-made scanning/photographing stations arranged in rows, each with a human operator, and each able to digitize 1,000 pages per hour. Each station contained four cameras, two pointed at each half of the book, plus technology to compensate for the curvature of the book. Pages were turned by hand, and the cameras fired when the operator pressed a foot pedal. Rather than requiring each page to lie perfectly flat, software applied de-warping algorithms to adjust pages after they were scanned, speeding up the scanning process. Fifty full-time software engineers were assigned to the task, creating OCR (optical character recognition) software that turned raw images into text, building the de-warping, colour-correction and contrast-adjustment routines, and writing software to detect illustrations, diagrams and page numbers and to turn footnotes into real citations.

In a perfect marriage of established and new-fangled digital publishing technology, in August 2010, Google released a blog post that announced that there were 129,864,880 books in the world… and the company was going to scan them all.

Holy smokes.