The New Yorker has an interesting, albeit very slow-going, article on efforts by Google and Microsoft to digitize a good portion of all the books ever printed. You’ve probably heard about the creation of this digital library and the controversy surrounding it (hello, Mr. Copyright Infringement). The article touches on that a bit, but it’s far more concerned with cataloging past efforts to catalog books. The reading can get tedious, and admittedly I skipped around, but it’s a neat read. One stat about the number of books ever published was particularly eye-opening:

A conservative reckoning of the number of books ever published is thirty-two million; Google believes that there could be as many as a hundred million. It is estimated that between five and ten per cent of known books are currently in print, and twenty per cent—those produced between the beginning of print, in the fifteenth century, and 1923—are out of copyright. The rest, perhaps seventy-five per cent of all books ever printed, are “orphans,” possibly still covered by copyright protections but out of print and pretty much out of mind.