The Chronicle of Higher Education reports on a database’s sort-of wonderful problems:
Then there are the classification errors, which taken together can make for a kind of absurdist poetry. H.L. Mencken’s The American Language is classified as Family & Relationships. A French edition of Hamlet and a Japanese edition of Madame Bovary are both classified as Antiques and Collectibles (a 1930 English edition of Flaubert’s novel is classified under Physicians, which I suppose makes a bit more sense). An edition of Moby Dick is labeled Computers; The Cat Lover’s Book of Fascinating Facts falls under Technology & Engineering. And a catalog of copyright entries from the Library of Congress is listed under Drama (for a moment I wondered if maybe that one was just Google’s little joke).
You can see how pervasive those misclassifications are when you look at all the labels assigned to a single famous work. Of the first 10 results for Tristram Shandy, four are classified as Fiction, four as Family & Relationships, one as Biography & Autobiography, and one is not classified. Other editions of the novel are classified as Literary Collections, History, and Music. The first 10 hits for Leaves of Grass are variously classified as Poetry, Juvenile Nonfiction, Fiction, Literary Criticism, Biography & Autobiography, and, mystifyingly, Counterfeits and Counterfeiting.