Library Catalog Records

I’d always assumed that catalog records were based on MARC, and that MARC was a guideline or standard like METS, MODS, or TEI, or even HTML or XML. After all, SGML is one heck of a powerful grandparent for modern record formats, right? And for printing, TeX, LaTeX, and BibTeX have been around for ages, so there’s no way that an archaic punch-card style technology could be in use at almost every library in the US, right? Sadly, no, I was wrong.
My assumptions about what MARC must be have kept me from helping to fix the problems that stem from what it actually is. I’m also now worried about what other dead technologies might be in widespread use in library operations. Please note that I’m not in any way attacking the ideas that underlie MARC records. We need bibliographic records, and metadata and organizational systems are essential. MARC is just a mix of transfer protocol, data definition, data structure, data display, and actual data content. It’s a thing optimized to print catalog cards in a card catalog world.
Cards in card catalogs have defined data elements (author, title, subject, call number, etc.) and an organizational method, so extrapolating that to defined fields should be easy. Except that in MARC, the layout of the fields is spelled out inside each record. The smallest unit you can work with is a single, complete MARC record, read from its start: you can’t jump straight to a field, because the leader and directory at the head of the record are what note where each field begins and how long it is. I saw the weird number sequences and leaders for elements, and I assumed that they were either shorthand or habit-based preferences that people chose to use. After all, the catalog has defined data components for bibliographic and authority records (named people, corporations, other entities), so it had to be a matter of preference to display an author like this:

ME:Pers Name     100 1# $a Brenner, Richard J.,
                        $d 1941-

The $a for author had to be shorthand, and so did the 100 1#, because they just had to be. It couldn’t be the case that this shorthand was actually needed, and that almost every library with an electronic catalog was still wedded to a technology made to optimize the printing of catalog cards in the 1960s and 1970s (it has been updated to handle Unicode, at least MARC 21 has, but this is still punch-card or telegraph-style technology).
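One way to see that the $a and the 100 1# are structure rather than shorthand is to parse the display form shown above. This is a minimal Python sketch, assuming the documentation conventions visible there: a three-digit tag, two one-character indicators with # standing for a blank, and subfields introduced by $ plus a one-character code. Field and parse_display_field are hypothetical names for illustration, not part of any catalog system.

```python
import re
from dataclasses import dataclass


@dataclass
class Field:
    """One MARC data field in display form: tag, two indicators, subfields."""
    tag: str
    ind1: str
    ind2: str
    subfields: list[tuple[str, str]]


def parse_display_field(line: str) -> Field:
    """Parse a display-form field such as '100 1# $a Brenner, Richard J., $d 1941-'.

    Assumes '#' means a blank indicator and '$' introduces a one-character
    subfield code. A literal '$' in the data (e.g. a price) would be misread,
    which is why real records use a non-printing delimiter instead.
    """
    match = re.match(r"(\d{3})\s+(\S)(\S)\s+(.*)", line.strip())
    if match is None:
        raise ValueError(f"not a display-form field: {line!r}")
    tag, ind1, ind2, rest = match.groups()
    subfields = []
    # Split on '$'; each non-empty chunk starts with its subfield code.
    for chunk in rest.split("$"):
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    # Map the documentation's '#' back to the blank it stands for.
    return Field(tag, ind1.replace("#", " "), ind2.replace("#", " "), subfields)


field = parse_display_field("100 1# $a Brenner, Richard J., $d 1941-")
print(field.tag, repr(field.ind1), repr(field.ind2))
for code, value in field.subfields:
    print(code, value)
```

In MARC 21, tag 100 is the main entry for a personal name (hence “ME:Pers Name”), first indicator 1 marks a surname-first form, and subfields a and d hold the name and its associated dates.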
Take a look at this MARC record:

01041cam  2200265 a 450000100200000000300040002000
a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a
0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD
LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^
10$aBrenner, Richard J.,$d1941-^10$aMake the team.
$pSoccer :$ba heads up guide to super soccer! /$cR
ichard J. Brenner.^30$aHeads up guide to super soc
cer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19
90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill
ustrated for kids book."^##$aInstructions for impr
oving soccer skills. Discusses dribbling, heading,
 playmaking, defense, conditioning, mental attitud
e, how to handle problems with coaches, parents, a
nd other players, and the history of soccer.^#0$aS
occer$vJuvenile literature.^#1$aSoccer.^

Sure, that can be formatted nicely, but imagine a modern system having to read all of this just to let users search by author, title, and keyword, with facets for year, material type, and so on. Presumably a program reads every record in, indexes them all, and then runs purely off the index, except when it is forced back to the MARC records because people are still doing something to or with them, or it somehow queries the records as blobs. I’m not even sure how older catalogs actually worked, because this format doesn’t fit my concept of computerized search at all.
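To make “reading all of this” concrete, here is a minimal Python sketch assuming the ISO 2709 layout that MARC 21 records use: a 24-character leader whose positions 12-16 give the base address of the data, a directory of 12-character entries (tag, field length, starting position), fields ending in a field terminator (hex 1E), subfields introduced by a delimiter (hex 1F), and a record terminator (hex 1D). The helper names, the record id brenner-1990, and the two-field record (author and title copied from the Brenner record above) are hypothetical; the builder handles ASCII only, and this is not how any particular catalog system is implemented.

```python
import re
from collections import defaultdict

FT = b"\x1e"  # field terminator (apparently shown as ^ in the record above)
SD = b"\x1f"  # subfield delimiter (shown as $)
RT = b"\x1d"  # record terminator


def build_record(fields: list[tuple[str, bytes]]) -> bytes:
    """Hypothetical helper: assemble leader + directory + data (ASCII only)."""
    directory = b""
    data = b""
    for tag, body in fields:
        chunk = body + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(data):05d}".encode("ascii")
        data += chunk
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(data) + 1    # ... + data + record terminator
    # Leader pattern copied from the record above, length and base filled in.
    leader = f"{total:05d}cam  22{base:05d} a 4500".encode("ascii")
    return leader + directory + FT + data + RT


def parse_record(raw: bytes) -> list[tuple[str, bytes]]:
    """Walk the leader and directory to slice out each field's body."""
    base = int(raw[12:17])  # leader positions 12-16: base address of data
    directory = raw[24:base - 1]  # skip the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        # The length includes the field terminator, so drop the last byte.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


def subfields(body: bytes) -> list[tuple[str, str]]:
    """Split a data field (two indicators, then delimited subfields)."""
    return [(chr(chunk[0]), chunk[1:].decode("utf-8"))
            for chunk in body[2:].split(SD) if chunk]


def build_index(records: dict[str, bytes]) -> dict[str, set[str]]:
    """Inverted index of words in author (100) and title (245) fields."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec_id, raw in records.items():
        for tag, body in parse_record(raw):
            if tag in ("100", "245"):
                for _code, value in subfields(body):
                    for word in re.findall(r"\w+", value.lower()):
                        index[word].add(rec_id)
    return index


raw = build_record([
    ("100", b"10" + SD + b"aBrenner, Richard J.," + SD + b"d1941-"),
    ("245", b"10" + SD + b"aMake the team." + SD + b"pSoccer :"
            + SD + b"ba heads up guide to super soccer! /"
            + SD + b"cRichard J. Brenner."),
])
index = build_index({"brenner-1990": raw})  # hypothetical record id
print(sorted(index["soccer"]))  # → ['brenner-1990']
```

Every field lookup has to go through the leader and directory first, which is why a system would want to do this walk once, at load time, and answer searches from the index afterwards.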
I don’t know how common it is for people familiar with normal standards to assume unquestioningly that MARC must be a normal standard, but I had trouble even understanding that something as broken as the MARC record could still exist. Now I understand why people would tell me “that’s not possible” or “that’s not the way the system works” when I’d ask questions about what should be simple tasks. I’d often reply “but it has to be, because that’s the way computers work” and keep asking, thinking MARC must be an elaborate way to define data, with ties to legacy systems that made it confusing. That’s true-ish, but the real problem is that MARC is an archaic legacy format, so much so that I couldn’t comprehend it when people tried to explain it to me.
When explaining MARC records to those familiar with normal technology standards, Karen Coyle notes hearing “virtual sighs” from programmers who “were not familiar with the standard library metadata record, and the standards were not compatible with the general suite of tools that the programmers commonly work with, such as HTML, CSS, and a host of XML-based tools” (source). In my mind, a metadata standard, especially one for library materials, whether books or audio or maps or whatnot, cannot be incompatible with XML.
It looks like not knowing how to define MARC is fairly common among folks who work regularly with current computing. Hopefully we’ll all learn just enough about MARC to replace it quickly, whether with a structure built for RDA descriptions or with something that seems like MARC to those who like it but functions as a real data model. Once MARC’s archaic technical underpinnings are replaced (whether or not other aspects of it remain), library data will be so much easier to access, use, and connect for everyone from catalogers to patrons. I feel awful that I didn’t understand how broken MARC was as it tried to act as protocol, structure, display, format, and record all at once, and I’m only now learning what MARC is, so I don’t yet know how many problems it’s created or how many innovations or aids it’s prevented.