When you are just too lazy...
This is a shell script I created for downloading, bibtexing, and organizing scientific papers which I named getpaper. This way you can use a single command to get a paper rather than a lot of tedious mouse-clicking.
It now has GUI support through zenity. It has been tested on Linux and Mac OS, so it should work on basically all UNIX-like systems. I recently tested the basic method on Android, and it works! So be ready for a future release supporting your mobile phone or tablet.
What it does:
- Download papers (if you have subscription access)
- Remotely download the papers through a server that has journal access by ssh
- Add a full BibTeX entry (Abstract, DOI, local PDF link, …)
- Open the paper
- Print the paper
It also includes all the basic sanity checks you would expect: it avoids downloading the same paper twice, tells you if a download fails, checks that the downloaded item appears to be a PDF file, and so on.
Input can be sent as:
- Direct command line for a single reference
- Zenity GUI input for a single reference
- Text file with one or more references
As this script will automatically add entries to a specified .bib file for BibTeX use, I suggest combining this script with a BibTeX editor like JabRef. This is a fabulously simple GUI written in Java for managing BibTeX entries, which can also open the relevant PDF files when they are linked (as getpaper does automatically).
Download the latest release (v.0.979)
As a shell script, you probably need to Save As… since it will just open in the browser. See the section below for how to run shell scripts if you need help.
How to install
getpaper is just a shell script with a few dependencies. If you are comfortable using shell scripts and installing a few simple programs, you don't need to read this. When you try to run it, it will tell you what (if any) dependencies it is missing.
(Note: Also be sure to edit some of the user defined variables at the top, like PDFVIEWER, PRINTCOMMAND, etc.)
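The dependency check mentioned above could look roughly like the following. This is a minimal sketch, not getpaper's actual code: the function name `missing_tools` is hypothetical, and only lynx and wget (two dependencies named on this page) are checked here.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given tool that is not in PATH.
missing_tools () {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# lynx and wget are dependencies named on this page.
missing=$(missing_tools lynx wget)
if [ -n "$missing" ]; then
    echo "Missing dependencies:" $missing
else
    echo "All listed dependencies found."
fi
```

Using `command -v` rather than `which` keeps the check inside the shell itself, so it needs no extra program to find the others.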
How to use shell scripts
To use a shell script, you need to have the shell installed (in this case bash). The script also needs to be executable, and either called explicitly or be in your PATH.
Once you download the script, you can easily make it executable:
daid@flux ~/downloads % chmod +x getpaper
If you don't want to modify your PATH, then you need to call it explicitly:
daid@flux ~ % ~/downloads/getpaper
Usually I just stick things like this somewhere like ~/scripts which I've put in my PATH. You can modify your PATH in your shell's run configuration or profile (something like ~/.bashrc), for example by adding ~/scripts to it there.
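A minimal sketch of such a line for ~/.bashrc, assuming the scripts live in ~/scripts as described above (the guard against duplicate entries is an extra, not something the text requires):

```shell
# In ~/.bashrc: put ~/scripts at the front of PATH, unless it is already listed.
case ":$PATH:" in
    *":$HOME/scripts:"*) ;;
    *) export PATH="$HOME/scripts:$PATH" ;;
esac
```

After saving, open a new terminal or run `. ~/.bashrc` so the change takes effect.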
All the dependencies required are very common tools found in most package managers. Thus, you probably do not need to follow the links given here to each program you are missing, but just use a package manager to install them! Many or all of them may already be installed on your system, so just run getpaper to see which ones you are missing. These programs are all fairly light-weight (although summed with their own dependencies may not be). In any case, you will need:
- lynx: a command line web browser. Preferred to links or elinks due to the -base option
- wget: a non-interactive network downloader.
- poppler: a free software library for PDF rendering (used to check the validity of the downloaded papers)
  - Depending on your distribution, the relevant dependencies may be included under xpdf.
  - We need this because wget will take whatever it finds, like a webpage saying you don't have access to the pdf. This is a simple check for that.
For Mac OS, all these dependencies are available through both MacPorts and Fink.
If you want GUI support, then there is the optional dependency:
- zenity: a tool to display GTK dialog boxes in shell scripts
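To illustrate why a PDF check is needed after wget, here is a lightweight stand-in. It is not the poppler-based check getpaper uses; it only tests whether the file starts with the `%PDF-` signature. The function name `looks_like_pdf` and the demo file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the poppler-based check: a real PDF starts with "%PDF-".
looks_like_pdf () {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Demo with two throwaway files: one PDF-like, one "no access" web page.
tmpdir=$(mktemp -d)
printf '%%PDF-1.4\n' > "$tmpdir/good.pdf"
printf '<html>No access</html>\n' > "$tmpdir/bad.pdf"

for f in "$tmpdir/good.pdf" "$tmpdir/bad.pdf"; do
    if looks_like_pdf "$f"; then
        echo "$f: looks like a PDF"
    else
        echo "$f: not a PDF, discarding"
    fi
done
rm -r "$tmpdir"
```

A signature test like this catches the common case of an HTML error page saved under a .pdf name, but unlike a real PDF parser it cannot tell whether the rest of the file is intact.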
If you have zenity installed, just run getpaper to start. Otherwise, running getpaper without any arguments will tell you how it works, which I reproduce here:
daid@flux ~ % getpaper
getpaper version 0.90
Download, bibtex, print, and/or open papers based on reference!
Copyright 2010-2011 daid - www.goatface.org
Usage: /home/daid/scripts/getpaper: [-f file] [-j journal] [-v volume] [-p page] [-P] [-O] [-R user@host]
Description of options:
-f <file> : getpaper reads data from <file> where each line corresponds to an article as:
JOURNAL VOLUME PAGE COMMENTS
prl 99 052502 12C+alpha 16N RIB
(Comments are used in the bibtex for the user's need.)
-j <string> : <string> is the journal title abbreviation
-j help : Output a list of available journals and abbreviations.
-v <int> : <int> is the journal volume number
-p <int> : <int> is the article first page
-P : Printing is turned on
-O : Open the paper(s) for digital viewing
-R user@host : Remote download through ssh to user@host
(Note: -f option supersedes the -j -v -p options.)
If zenity is installed, getpaper will enter GUI mode if no options are passed
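For readers curious how options like these are typically handled in a shell script, here is a minimal sketch using the standard `getopts` builtin. It mirrors the usage text above but is not getpaper's own code; the function name `parse_args` and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical option parser mirroring the usage text; not getpaper's own code.
parse_args () {
    FILE=""; JOURNAL=""; VOLUME=""; PAGE=""; PRINT=0; OPEN=0; REMOTE=""
    OPTIND=1
    while getopts "f:j:v:p:POR:" opt; do
        case "$opt" in
            f) FILE="$OPTARG" ;;
            j) JOURNAL="$OPTARG" ;;
            v) VOLUME="$OPTARG" ;;
            p) PAGE="$OPTARG" ;;
            P) PRINT=1 ;;
            O) OPEN=1 ;;
            R) REMOTE="$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    # -f supersedes -j, -v and -p, as the usage text notes.
    if [ -n "$FILE" ]; then
        JOURNAL=""; VOLUME=""; PAGE=""
    fi
}

parse_args -j prl -v 99 -p 052502 -O
echo "journal=$JOURNAL volume=$VOLUME page=$PAGE open=$OPEN print=$PRINT"
```

The page number is kept as a string on purpose, so a leading zero such as in 052502 survives.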
Present Journal List
The shell script can easily tell you the list of journals presently in the database. You can contact me with requests for additional journal support—it's very easy! Also then I will know anyone actually appreciates what I do with my life. There is also an example below as to how to add a new entry yourself.
daid@titan ~/html/goatface/src % getpaper -j list
Journals in database:
aa Astronomy & Astrophysics
aipc American Institute of Physics (Conference Proceedings)
aj The Astronomical Journal
astl Astronomy Letters
anap Annales d'Astrophysique
apj The Astrophysical Journal
apjl The Astrophysical Journal (Letters)
apjs The Astrophysical Journal (Supplement Series)
aujph Australian Journal of Physics
baas Bulletin of the American Astronomical Society
bsrsl Bulletin de la Societe Royale des Sciences de Liege
epja European Physical Journal A
epjh European Physical Journal H
gecoa Geochimica et Cosmochimica Acta
mnras Monthly Notices of the Royal Astronomical Society
msrsl Memoires of the Societe Royale des Sciences de Liege
natph Nature Physics
nucim Nuclear Instruments and Methods (1983 and earlier)
nimpa Nuclear Instruments and Methods in Physics Research A
nimpb Nuclear Instruments and Methods in Physics Research B
nupha Nuclear Physics A
nuphb Nuclear Physics B
obs The Observatory
paphs Proceedings of the American Philosophical Society
pce Physics and Chemistry of the Earth
phrv Physical Review
pmag Philosophical Magazine
ppsa Proceedings of the Physical Society A
ppsb Proceedings of the Physical Society B
pra Physical Review A
prb Physical Review B
prc Physical Review C
prd Physical Review D
pre Physical Review E
phlb Physics Letters B
pasp Publications of the Astronomical Society of the Pacific
prl Physical Review Letters
pthph Progress of Theoretical Physics
pthps Progress of Theoretical Physics Supplement
rvmp Reviews of Modern Physics
scoa Smithsonian Contributions to Astrophysics
zphy Zeitschrift fur Physik
Adding a new journal
If you like my script but don't study nuclear astrophysics, you may find the present journal list rather lacking. While it's easy for me as the script author to add entries, a new user may be a little confused about the approach used.
At present, the journal list is hardcoded in the script, and the script must be modified in three separate places for each new entry. If you think this sucks, don't worry because I agree. (I am thinking about ways to avoid this for a future release, but I haven't thought of the best approach I want to take yet.)
In the example, I will show how to add the journal Geochimica et Cosmochimica Acta.
The script needs to be modified under three functions: JournalList, SetJournal, and GUI. The most important and only tricky one of these is the SetJournal function, so I will explain that first.
The most important question is whether ADS has a suitable database for the journal we want to add. So go to the Journal/Volume/Page Query Form and click on the Journal Name / Code link near the top left. Within the popup window, just click All Journals. Now just do a page Find to locate your journal of interest. For the example here, we can find the entry for Geochimica et Cosmochimica Acta and quickly note its code is GeCoA. If you click the code, it will put it automatically into the Journal Name / Code entry on the J/V/P page you had open. Now just send that as a query.
Right now I can see there are 15262 entries in ADS for this journal, so it looks like a large database, and thus worth using in the script. Just click on some entry. Make sure it has a link to an online version, like Electronic Refereed Journal Article (HTML) or Full Refereed Journal Article (PDF/Postscript). The former is a link to the official website, and the latter means it is hosted directly at ADS. For the case of GeCoA it just has the HTML version.
Hover your mouse over this link, and see the full URL path that clicking the link will follow. One part will be 'link_type', which in this case is EJOURNAL. Within getpaper this is what to use for the LTYPE variable.
Now click the link to the HTML journal article. In this case, it takes us to ScienceDirect. At present, ScienceDirect is the only online journal using the HREFTYPE=0 in getpaper. Now we know all the information we need to add GeCoA to the script.
The laziest way is to find a journal under SetJournal with identical or similar values for LTYPE and HREFTYPE. For values of LTYPE=EJOURNAL and HREFTYPE=0, phlb looks the same, so I copy that, and paste it in alphabetical order for GeCoA, and modify the line to read:
- gecoa | GeCoA | GECOA ) HREFTYPE=0;JCODE="gecoa";LTYPE="EJOURNAL" ;;
The three variants gecoa, GeCoA, and GECOA are just what getpaper will accept in terms of case-sensitivity from the command-line input. Use your own discretion to decide which variants of case are likely to be entered the most.
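To show how such a line sits inside a `case` statement, here is a trimmed-down, hypothetical SetJournal. Only the gecoa line is taken from the text; the phlb line reuses the LTYPE and HREFTYPE values stated above, while its JCODE and case variants are assumptions, and the real function has many more entries.

```shell
#!/bin/sh
# Hypothetical, trimmed-down SetJournal; the real function has many more entries.
SetJournal () {
    case "$1" in
        gecoa | GeCoA | GECOA ) HREFTYPE=0;JCODE="gecoa";LTYPE="EJOURNAL" ;;
        phlb | PhLB | PHLB ) HREFTYPE=0;JCODE="phlb";LTYPE="EJOURNAL" ;;
        * ) echo "Unknown journal: $1" >&2; return 1 ;;
    esac
}

SetJournal GeCoA && echo "JCODE=$JCODE LTYPE=$LTYPE HREFTYPE=$HREFTYPE"
```

Each `|`-separated pattern is matched literally, which is why every accepted spelling has to be listed by hand.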
Since we know phlb is similar to gecoa, now just search for phlb in the script. We can find an entry of phlb under GUI so copy that line, paste it alphabetically and modify it to read:
- FALSE gecoa "Geochimica et Cosmochimica Acta" \
Here I use the standard of lowercase for all journal codes, and we just need to match one of the options listed in JournalList.
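For context, a row like the one above is the kind of argument a zenity radio list takes: a TRUE/FALSE selection flag followed by one value per column. The following is a minimal sketch, assuming a three-column layout; the function name `PickJournal`, the title, and the column headings are hypothetical, and the real GUI function may be organized differently.

```shell
#!/bin/sh
# Hypothetical journal picker; only the gecoa row is taken from the text above.
PickJournal () {
    zenity --list --radiolist \
        --title "getpaper" \
        --text "Select a journal" \
        --column "Pick" --column "Code" --column "Journal" \
        FALSE gecoa "Geochimica et Cosmochimica Acta" \
        FALSE phlb "Physics Letters B"
}

# Usage (needs zenity and a graphical session), so it is left commented out:
# JOURNAL=$(PickJournal)
```

zenity writes the selection to standard output, so the caller can capture it with command substitution as in the commented usage line.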
Searching for phlb again takes us back to the top of the script, and we can find an entry under JournalList. Again, copy the phlb entry, paste it alphabetically, and modify it to read:
- printf "gecoa\tGeochimica et Cosmochimica Acta\n"
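Put together, a cut-down JournalList might look like the sketch below. Only the gecoa line comes from the text; the neighbouring entries are assumed to follow the same printf pattern, with names taken from the journal list shown earlier.

```shell
#!/bin/sh
# Hypothetical excerpt of JournalList with the new entry in alphabetical order.
JournalList () {
    printf "Journals in database:\n"
    printf "epjh\tEuropean Physical Journal H\n"
    printf "gecoa\tGeochimica et Cosmochimica Acta\n"
    printf "mnras\tMonthly Notices of the Royal Astronomical Society\n"
}

# Confirm the new code shows up in the listing.
JournalList | grep "^gecoa"
```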
At this point we can test the script to see if the new journal is successfully added! For this purpose, I usually change the LIBPATH under InitVariables to be something like librarytest instead of library. Of course, if you are adding the new journal because you want a paper (the usual case) then you can just use your regular .bib file. Save the script, get a valid reference to test, and try it out!
Particularly if you are using a server which hosts papers but is not somehow already in use by getpaper, it might not work and may require some HREFTYPE hacking. I guess that's when you email me...
It supports the zenity graphical dialogs. If you have zenity installed and call getpaper without any options, then you can just start answering the pop-up questions it asks you. This means you could add it as a custom item to your application panel and use getpaper without ever touching the command line. However, GUI operation is not yet fully supported (look for a future release), so while you can run getpaper outside a terminal, it is not recommended. This is because error messages are still all sent to the terminal, so if a paper fails to download and you are running through zenity outside the command line, there is no feedback.
Since this is just a fancy feature, it's not a dependency, and if you don't have zenity installed, then you're just stuck at the command line.
If you are on a network that does not have journal subscription access, but you can ssh to a machine on such a network, getpaper can download through that server and automatically transfer the pdf back to your local computer. For example, if you are at home or travelling, you can probably connect to your institution's mail server to download papers.
Just feed getpaper the -R option followed by user@host, and then you will be prompted for your server password at the appropriate time (unless you have passwordless ssh enabled by keys).
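The general idea of a remote download can be sketched as two steps: run wget on the server over ssh, then copy the result back with scp. This is an illustration under assumptions, not getpaper's implementation; the function name `remote_fetch`, the /tmp staging path, and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical remote download: run wget on the server, then copy the file home.
# SSH and SCP can be overridden (e.g. with echo) for a dry run.
remote_fetch () {
    remote="$1"; url="$2"; out="$3"
    tmp="/tmp/$(basename "$out")"
    "${SSH:-ssh}" "$remote" "wget -q -O '$tmp' '$url'" &&
        "${SCP:-scp}" "$remote:$tmp" "$out"
}

# Dry run: print the two commands instead of contacting a server.
SSH=echo SCP=echo remote_fetch user@host "http://example.org/paper.pdf" ./paper.pdf
```

With the overrides removed, each step would prompt for the server password unless key-based ssh is set up.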
GUI support through zenity is presently untested with remote downloading.
Remote downloading seems to work well for me now. There is a minor known bug for false entry adding on failure which I also need to fix (but it fails a lot less now, so I hope this is less of an issue).
Installing dependencies on the remote server
Even without superuser access, we can locally install software on the server. Since things based on RedHat like CentOS are in fact a plurality of Linux servers (and also my case), I will explain how I did this using RPM. Firstly, you can check the basic information of the operating system simply by:
-bash-3.2$ uname -a
Linux mailserver 2.6.18-274.3.1.el5xen #1 SMP Tue Sep 6 21:35:19 EDT 2011 i686 i686 i386 GNU/Linux
where 'mailserver' is the hostname which I changed slightly for security reasons.
Also, often under /etc we can find some kind of file with some kind of name like 'release':
-bash-3.2$ ls /etc/*release*
And finally, we can see what is inside that file:
-bash-3.2$ more /etc/redhat-release
CentOS release 5.7 (Final)
So, now I know my mailserver runs CentOS 5.7, and I also know its architecture is i386 compatible, I can search on an RPM index for lynx, download the correct RPM, and copy it over to my mailserver.
Let's create the directory $HOME/local (after ssh'ing):
$ cd && mkdir local
...and move the lynx rpm into that directory!
But how about installing lynx locally without superuser permission? You may follow along with ajaya's guide, but it misses the point about relocating /etc as well, and it insists on giving commands to copy-paste with a literal "username" rather than something like "$HOME", which the shell fills in for you automatically when you paste the command and press enter. Of course, you still need to delete my 'lynx.rpm' and press the tab key to get your proper version of lynx.rpm:
$ rpm --root $HOME/local --dbpath lib/rpm --relocate /usr=/home/`whoami`/local --relocate /etc=/home/`whoami`/local/etc --nodeps -ivh lynx.rpm
ajaya's guide mentioned how to check if the dependencies are fulfilled, so I assume you can do that first.
Now if you run lynx, it will complain that there is no configuration file in /etc, so we need to pass the -cfg flag to lynx. However, we have not modified our PATH yet anyway, so let's do an easier trick in one step! Editing whichever shell run configuration as appropriate (in my case bashrc, so giving bash syntax here):
alias lynx="$HOME/local/bin/lynx -cfg=~/local/etc/lynx.cfg"
And if you logout and login (or otherwise source the new ~/.bashrc), you can now use lynx!
You need to make sure your .bashrc will alias even for non-interactive login. So for instance, if you have something like this at the top of your run configuration:
# Test for an interactive shell. There is no need to set anything
# past this point for scp and rcp, and it's important to refrain from
# outputting anything in those cases.
if [[ $- != *i* ]] ; then
	# Shell is non-interactive. Be done now!
	return
fi
comment that stuff all out! Below that, put some new lines:
# need this for aliases in non-interactive mode...
shopt -s expand_aliases
If you want to know how I know this, go read this post on stackoverflow.
I think now it basically works!
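The effect of expand_aliases can be seen in a small stand-alone bash script. This is only a demonstration with a harmless stand-in alias (`greet` is hypothetical); the real case is the lynx alias defined in ~/.bashrc on the server.

```shell
#!/usr/bin/env bash
# Non-interactive bash ignores aliases unless expand_aliases is set.
shopt -s expand_aliases

# Stand-in alias for the demo; the real one would be the lynx alias above.
alias greet='echo alias expanded'

# The alias must be defined on an earlier line than the one that uses it.
greet
```

If the `shopt` line is removed, the same script fails with a "command not found" error for `greet`, which is the situation a non-interactive ssh command runs into.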
Prior to this work, to download specific references, I was using the ADS Journal/Volume/Page Query Form. This tool is quite handy I find, but with all the mouse-clicking, it's taking me a minute or two to download, print, and BibTeX a given paper. This script simplifies that process to one command. For example, if I have the reference MNRAS 287 (1997) 495:
daid@flux ~ % getpaper -j mnras -v 287 -p 495
Processing: JOURNAL mnras VOLUME 287 PAGE 495
/home/daid/librarytest/library.bib does not exist!
Creating blank .bib file.
BIBCODE is 1997MNRAS.287..495K
Fetching bibtex file from ADS (http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1997MNRAS.287..495K&data_type=BIBTEXPLUS&db_key=ALL&nocookieset=1)
Determining URL path for PDF...
Downloading PDF from http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=1997MNRAS.287..495K&link_type=ARTICLE&db_key=ALL...
--2010-11-14 04:43:08-- http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=1997MNRAS.287..495K&link_type=ARTICLE&db_key=ALL
Resolving adsabs.harvard.edu... 22.214.171.124
Connecting to adsabs.harvard.edu|126.96.36.199|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?1997MNRAS.287..495K&data_type=PDF_HIGH&whole_paper=YES&type=PRINTER&filetype=.pdf [following]
--2010-11-14 04:43:09-- http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_query?1997MNRAS.287..495K&data_type=PDF_HIGH&whole_paper=YES&type=PRINTER&filetype=.pdf
Resolving articles.adsabs.harvard.edu... 188.8.131.52
Connecting to articles.adsabs.harvard.edu|184.108.40.206|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/pdf]
Saving to: “/tmp/mnras.287.495.pdf”
[ <=> ] 622,637 286K/s in 2.1s
2010-11-14 04:43:11 (286 KB/s) - “/tmp/mnras.287.495.pdf” saved 
Moving downloaded PDF from temporary location: `/tmp/mnras.287.495.pdf' -> `/home/daid/librarytest/articles/1997/mnras.287.495.pdf'
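The two ADS URLs in the log above follow a simple pattern built from the bibcode and the link type. The sketch below rebuilds them; the function names are hypothetical, and the URL patterns are copied from this 2010 log, so they may have changed since.

```shell
#!/bin/sh
# Hypothetical helpers that rebuild the two ADS URLs seen in the log above.
ADS="http://adsabs.harvard.edu/cgi-bin"

ads_bibtex_url () {
    printf '%s/nph-bib_query?bibcode=%s&data_type=BIBTEXPLUS&db_key=ALL&nocookieset=1\n' "$ADS" "$1"
}

ads_pdf_url () {
    # $2 is the link type, e.g. ARTICLE or EJOURNAL (the LTYPE variable).
    printf '%s/nph-data_query?bibcode=%s&link_type=%s&db_key=ALL\n' "$ADS" "$1" "$2"
}

ads_bibtex_url 1997MNRAS.287..495K
ads_pdf_url 1997MNRAS.287..495K ARTICLE
```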
It can also run off of a text file (not showing the output or method here), such as:
daid@flux /tmp % cat papers.txt
nupha 506 1
nupha 460 1
nupha 564 1
nupha 475 1
nupha 490 1
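A file in this format can be read line by line with `read -r`, which splits each line into journal, volume, page, and trailing comments. The sketch below only prints what it would process; the function name `process_list` is hypothetical and the real script does the actual download at that point.

```shell
#!/bin/sh
# Hypothetical reader for the "JOURNAL VOLUME PAGE COMMENTS" line format.
process_list () {
    # read -r keeps backslashes literal; extra words end up in $comments.
    while read -r journal volume page comments; do
        [ -n "$journal" ] || continue   # skip blank lines
        echo "Processing: JOURNAL $journal VOLUME $volume PAGE $page"
    done < "$1"
}

list=$(mktemp)
printf 'nupha 506 1\nprl 99 052502 12C+alpha 16N RIB\n' > "$list"
process_list "$list"
rm -f "$list"
```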
Please note journal years are not included in the query, since volume number is sufficient — the script determines the year itself automatically for later use.
It will automatically obtain and save the BibTeX data for you — usually I just put it all into my library.bib file, but you can change this in the script. It can optionally open papers with your favorite PDF viewer, as well as print them.
It will automatically store papers under subdirectories within your specified library location as type/year/ where type is like article, proceedings (others not added yet). If any of these directories don't exist, it will create them. The PDF file location is also output into the BibTeX file.
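The filing step described above can be sketched as follows, using the journal.volume.page.pdf naming seen in the example output. The function name `file_paper` and its argument order are hypothetical, and a temporary directory stands in for the real library location.

```shell
#!/bin/sh
# Hypothetical filing step: move a PDF to <library>/<kind>/<year>/<name>.
file_paper () {
    src="$1"; lib="$2"; kind="$3"; year="$4"; name="$5"
    dest="$lib/$kind/$year"
    # mkdir -p creates any missing directories and is harmless if they exist.
    mkdir -p "$dest" &&
        mv "$src" "$dest/$name" &&
        echo "$dest/$name"
}

lib=$(mktemp -d)
tmp=$(mktemp)
file_paper "$tmp" "$lib" articles 1997 mnras.287.495.pdf
rm -r "$lib"
```

The printed path is the one that would be written into the BibTeX entry as the local PDF link.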
Intended Application and Use
You are intended to use this software responsibly and in compliance with all applicable terms of service for each online journal archive. There are various programs for downloading and viewing digital papers, and as far as I am aware, getpaper, when used on a case-by-case basis, should generally be fair use.
However, getpaper can in principle download any number of papers the user requests. getpaper has no internal mechanism for systematically downloading papers, because all the specific paper data must be provided from the user on an individual basis.
Using this script for systematic mass downloading is a likely violation of the terms of service of most peer-reviewed publication online archives. This is contrary to the intended application of getpaper. getpaper is intended to streamline the organization of an individual's digital library of papers, by coupling the download request with BibTeX, Abstract, and other relevant information, so that these tasks do not need to be performed independently. I am not responsible for any abuse resulting from any individual's improper use of getpaper to systematically perform mass downloads or in any other way violate the terms and conditions of any archive getpaper accesses.
Use this tool only if you are going to do it responsibly!
Links to Terms of Service pages
Here are some useful links so I've explicitly pointed you to the rules you should follow for various publishers:
After seeing my annual report harvesting script, a colleague (Jun Chen) suggested to me I write one for 'real' papers, so I did (except this one does not harvest, it just takes user input). Jun also wrote his own, using NSR lookup and building on my initial work on the script. We traded a number of ideas concerning this topic, but ultimately all the actual code itself is my own.
Papers is a GUI for Mac OS X's Aqua that provides functionality similar to getpaper when combined with JabRef. That software is in no sense free or portable. As far as I know it lacks the ability to do multiple automated queries, download through an ssh tunnel, and other things my script can do. Considering the bulk of my script is a few hundred lines of bash shell script and gets similar or superior results to expensive proprietary software, one begins to wonder.
I retain the getpaper copyright under the GPL v.3.
Future Improvement Ideas
Below I list my ideal hopes. Keep in mind, every single item listed here is my idea and no users complained enough to me. If you have any problem or idea to improve my work, you really need to contact me, please!
Your complaint listed here! (Sorry, basically no one cares what I do, but if you care or like what I do, you need to complain to me and ask me to fix the program or add a new feature! If you don't ask or complain, it might never be fixed!)
Re-implement the way journal entries are handled so it's not in three different places!
Further fixing PROLA query for GUI mode (Remote mode works now).
Port to Ruby.
Implement Author/Year support through multiple-return-query.
Allow the check flag (to ask if the paper exists already on my machine). Let check flag accept options like Open.
Cygwin compat. (Personally I assume no one in science ever used a method so awful like Microsoft, but I must be mistaken.)
Ctrl+C SIGINT trap.
Properly catch all errors so a bibentry is only made if we were 100% successful (sometimes that happens...ugh).
12 Jun 2013 21:13:21 — version 0.979 New Astronomy Review, probably new fixes, more journals, missed some updates!
02 Nov 2012 11:53:07 — version 0.976 Vistas in Astronomy added for my wife. Can't anyone do this?
13 Jul 2012 17:58:49 — version 0.975 Physical Review had the wrong flag settings. Added the -b flag (for bibtex only, no paper download).
07 May 2012 18:28:44 — version 0.97 Remote download should hopefully work for all cases (see above to setup lynx). Also thanks to Mosè Giordano for a tip on using $HOME instead of whoami
12 Mar 2012 15:06:42 — version 0.967 More AIP support added (Review of Scientific Instruments)
02 Mar 2012 04:20:16 — version 0.966 Science journal should be LTYPE ARTICLE not EJOURNAL
18 Feb 2012 16:21:05 — version 0.965 Issue with ApJ and others (something I broke trying to fix something else?). A couple more journals.
16 Jan 2012 21:22:07 — version 0.96 ScienceDirect tries new tricks to give us 404 error, shift from wget to lynx identical to PROLA
09 Jan 2012 14:03:03 — version 0.95 Put a useful error message to check getpaper version on failed download
04 Jan 2012 13:54:29 — version 0.94 ScienceDirect changes link structure again...
21 Nov 2011 20:45:42 — version 0.93 PROLA was giving wget a 401 error, so now this is corrected for the local command-line version
22 Sep 2011 21:11:31 — version 0.92 Science journal was incorrectly listed with LTYPE ARTICLE instead of EJOURNAL
24 Jul 2011 20:51:57 — version 0.91 ScienceDirect hack for remote download, GeCoA added to list, beginnings of check flag, depcheck for wget on server added
22 Mar 2011 02:28:15 — version 0.90 Remote server download via ssh enabled.
05 Mar 2011 03:50:46 — version 0.87 ScienceDirect changed the link structure. Also added some more journals...
23 Jan 2011 07:35:53 — version 0.86 version 0.8 (and later) did not properly implement multiple return query choice
20 Jan 2011 00:42:56 — version 0.85 bugfix for bibtex parsing to not interpret backslash, etc (use read -r)
19 Jan 2011 23:25:18 — version 0.84 more papers, bug fix introduced in v.0.82 for ScienceDirect
15 Jan 2011 08:35:37 — version 0.83 zenity window size fix
15 Jan 2011 03:23:37 — version 0.82 Mac OS / BSD compat., older lynx (below v. 2.8.7) compat., bugfixes
14 Jan 2011 06:26:53 — version 0.81 Bugfix and more journals added
13 Jan 2011 05:03:59 — version 0.8 Multiple return query support
16 Dec 2010 19:21:05 — version 0.7 GUI support with zenity
23 Nov 2010 00:32:22 — version 0.61 Put some comments for PDFVIEWER and PRINTCOMMAND
18 Nov 2010 13:35:09 — version 0.6 Bug in file path sent to BibTeX fixed. Added Science journal.
16 Nov 2010 03:48:31 — version 0.5 Added NIM journals, added help display for available journals, a few more sanity checks
14 Nov 2010 03:50:06 — version 0.4 First public release
/ / (__)
/ PhD \ (oO)
/ | /| daid ||
* || ||------||