
Just DOI it!

Martin Fenner
December 22, 2008

With the December 18 issue (https://web.archive.org/web/20151003053153/http://www.nature.com/nature/journal/v456/n7224/?), Nature started to support XMP markup in article PDFs, as Tony Hammond reported last week on the Nascent blog [fn1]. XMP stands for Extensible Metadata Platform and is a technology for embedding metadata in files, including PDFs [fn2]. XMP was created by Adobe, and PDF files have supported it since 2001, but it is an open standard backed by others, including Creative Commons [fn3]. The Digital Object Identifier (DOI, https://web.archive.org/web/20151003053153/http://www.doi.org/?) is the most important piece of information in the metadata, because the DOI links to the journal publisher's website, where more metadata can be retrieved. Unfortunately, XMP support in scientific PDFs is still very uncommon and probably hasn't changed much since Pierre Lindenbaum (https://web.archive.org/web/20151003053153/http://network.nature.com/people/lindenb/profile?) checked last year [fn4].

Adding metadata to PDFs seems to be a no-brainer. We have done the same with music (ID3 tags in MP3 files) and photos (IPTC and EXIF) for years, and it has been a tremendous help in organizing the files stored on our computers. Unfortunately, there aren't many tools that can extract the DOI or other metadata from the XMP in article PDFs. But I expect more desktop software to support XMP once XMP support in scientific articles is more widespread. We will then be able to add a journal PDF to our reference manager of choice and have the relevant metadata (including authors, title, journal and issue) filled in automatically, and many other creative uses will become possible. Until then we need tools like Papers (https://web.archive.org/web/20151003053153/http://mekentosj.com/papers/?) or Mendeley (https://web.archive.org/web/20151003053153/http://www.mendeley.com/?) that can extract metadata from PDF files without this XMP information.
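
To give an idea of how little it can take to read a DOI from an XMP-enabled PDF, here is a minimal Python sketch. It rests on several assumptions that are mine and not part of any publisher's documentation: that the XMP packet is stored uncompressed in the file (the usual recommendation, so that tools that don't understand PDF can still find it), that it is UTF-8 encoded, and that the DOI appears somewhere in the packet as plain text. The function names and the DOI pattern are illustrative only, and the pattern is a rough heuristic rather than a complete DOI grammar.

```python
import re
from typing import List, Optional

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
# Assumption: the packet sits uncompressed in the PDF, so a byte scan finds it.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Rough heuristic for a DOI: "10.", a registrant code, a slash, then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")


def extract_xmp_packets(data: bytes) -> List[str]:
    """Return the text of every XMP packet found in the raw file bytes."""
    return [
        match.group(1).decode("utf-8", errors="replace")
        for match in XPACKET_RE.finditer(data)
    ]


def find_doi(data: bytes) -> Optional[str]:
    """Return the first DOI-like string inside any XMP packet, or None."""
    for packet in extract_xmp_packets(data):
        match = DOI_RE.search(packet)
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python find_doi.py article.pdf
    with open(sys.argv[1], "rb") as handle:
        print(find_doi(handle.read()))
```

A real reference manager would use a proper PDF and RDF parser and look for a specific metadata property, but a plain byte scan is enough to show why embedded XMP makes the job so much easier than guessing the DOI from the article text.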

For a more technical discussion of XMP in scientific articles, please read the set of blog posts by Tony Hammond [fn5], [fn6], [fn7].

fn1. XMP Labelling for Nature (https://web.archive.org/web/20151003053153/http://blogs.nature.com/wp/nascent/2008/12/xmp_labelling_for_nature.html?)

fn2. Adding intelligence to media (https://web.archive.org/web/20151003053153/http://www.adobe.com/products/xmp/?)

fn3. XMP (https://web.archive.org/web/20151003053153/http://wiki.creativecommons.org/XMP?)

fn4. Is there any XMP in scientific pdf? No (https://web.archive.org/web/20151003053153/http://plindenbaum.blogspot.com/2007/05/is-there-any-xmp-in-scientific-pdf-no.html?)

fn5. Metadata in PDF: 1. Strategies (https://web.archive.org/web/20151003053153/http://www.crossref.org/CrossTech/2007/08/metadata_in_pdf_1_strategies.html?)

fn6. Metadata in PDF: 2. Use Cases (https://web.archive.org/web/20151003053153/http://www.crossref.org/CrossTech/2007/08/metadata_in_pdf_2_use_cases.html?)

fn7. Metadata in PDF: 3. Deployment (https://web.archive.org/web/20151003053153/http://www.crossref.org/CrossTech/2007/08/metadata_in_pdf_3_deployment_1.html?)

