klab
"What I cannot create, I do not understand." – R. Feynman
Jonas Kubilius

Doing science in the open, 2013 edition

(For the 2011 edition, see Doing science in the open.)

Understanding open science

Open science is about sharing — papers, data, software, ideas… Sharing promotes replicability, transparency, knowledge accumulation, love… But it is only meaningful when open standards are implemented. Examples:

  • If you share your code in E-Prime, I can only use it if I own a copy of E-Prime ($795). That’s not very useful.
  • If you publish a paper under a traditional license, I can read it but cannot (easily) reuse your data or figures.

General idea — Use tools that are as independent as possible:

  • Mature tools are preferred to beta ones, since the developer(s) of a beta project might abandon it.
  • Free and open source (F/OSS) projects are preferred to proprietary ones because you can inspect the code and adapt it to your needs, and the source stays available even if the original developers move on.
  • Text files are more accessible than binary files since the latter may require proprietary software to use them.
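
To make the point about text files concrete, here is a minimal sketch, assuming some trial data kept as CSV text. The column names and values are invented for illustration, and the helper names `to_csv_text` and `from_csv_text` are hypothetical. It uses only Python's standard library, so anyone can read the result back, or simply open it in a text editor.

```python
import csv
import io

# Hypothetical trial data; the column names are made up for illustration.
trials = [
    {"subject": "s01", "condition": "upright", "rt_ms": 512},
    {"subject": "s01", "condition": "inverted", "rt_ms": 640},
]

def to_csv_text(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv_text(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))

csv_text = to_csv_text(trials)
print(csv_text)  # human-readable, no special software needed
restored = from_csv_text(csv_text)
```

The trade-off is that CSV carries no type information, so numbers come back as strings and need converting on load.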

Note: This is a general preference, not absolute. Your particular problem might have different requirements.

Tools

(dev) indicates that I do not consider the tool to be mature yet. It may change (e.g., become paid or change its API), it may disappear, or better (more widely accepted) tools of the same sort may emerge soon.

Open Science Framework

A project (dev) closely related to the Reproducibility Project (Brian Nosek and friends):

  • Organize your research (esp. important when collaborating)
  • Keep it in the cloud
  • Share it (if you want)
  • Be honest: everything gets recorded (version control), e.g., your a priori hypotheses
  • Your contributions are tracked and citable

Licenses (my recommendations)

  • Copyright (c): all rights reserved
  • Copyleft: all rights reversed
    • Permissive (anybody can use your stuff for anything they like – even sell it):
    • Viral (reciprocal – anything based on your work will be of the same license type, ensuring perpetual access to it):
  • Public domain: no rights reserved

Google Scholar

Alternatives: ResearchGate (here’s what my profile looks like), Academia.edu, Facebook Profile / Facebook Page, Mendeley (my profile)

Reading publications

Limitations of PDFs:

  • not pleasant to read on tablets since they don’t fit on screen
  • figures are all over the place rather than next to their first mention

(You may want to try text reflow, which converts a PDF into text on the fly, as a possible solution, but I’m not aware of an app that does the job well.) But even worse is reading papers online in the HTML format:

  • not available offline
  • does not have the same “stable” feel as a PDF
  • difficult to make notes

(For a magazine-like HTML experience, try PubMed’s PubReader in your browser.) The best solution so far is epub (an experimental paper format from PubMed):

  • reformats itself to fit the screen and still looks great
  • looks and feels like a book
  • note taking works just like on a PDF

Since 2008, NIH-funded scientists have been required to submit their manuscripts to PubMed Central, where they are published within 12 months of the initial journal publication. Thus, epubs are already available for recent publications and appear fairly quickly, although, unfortunately, not immediately for the newest ones.

Experiments on tablets

Alternative: online experiments (like L-POST), which you can learn to create at Udacity

Papers

For collaboration: Google Drive

  • Real-time updates for all users
  • Google Scholar integrated

Writing, formatting, commenting: LyX

  • It’s LaTeX, so you get more robust formatting, exporting to many formats (useful for self-archiving, incl. Lirias), and F/OSS
  • Track Changes just like in MS Word
  • This blog entry was first written in LyX for a lab presentation and then exported to a text format

IPython for papers

  • Easy format for running analyses and sharing full papers
  • LaTeX is integrated
  • Run R using Rmagic (via rpy2 and with pandas; see more on the Gestalt Revision wiki)

    import pandas
    %load_ext rmagic
    data = pandas.read_csv("data.csv")

    # -i sends the variable to R; the rest of the line is R code
    %R -i data print(summary(data))

  • Share notebooks on IPython Notebook Viewer

  • Soon: work in the cloud with wakari.io (dev)

Open Science Paper (dev):

  • Easily generate a beautiful paper with the power of LaTeX and a great template
  • Can include R code for an automatic figure generation (via Knitr)

PythonTeX (dev)

  • LaTeX-based
  • Can include Python code for an automatic figure generation
  • demo

If you still prefer Word documents, at least stop sending multiple versions around. Dropbox is your true friend:

  • Collaborate: Right click > Dropbox > Share Folder > Browser window opens, choose your victim
  • Share (give access): Right click > Dropbox > Share Link… > Browser window opens, copy the link

Presentations

Traditional: Scribus

  • Produces vector graphics, meaning a text-based image (svg), good for easy manipulation, perfect rescaling, and googling
  • Perfect for posters and presentations
  • Images are easy to manipulate unlike in the next two options
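
To illustrate what "text-based image" means, here is a minimal sketch that builds a tiny SVG bar chart with Python's standard library. The function name `make_bar_chart_svg`, the data, and the layout numbers are all hypothetical; this is not how Scribus or Inkscape work internally. The point is only that the resulting file is plain text, so labels such as "inverted" stay searchable and the shapes rescale without loss.

```python
import xml.etree.ElementTree as ET

def make_bar_chart_svg(values, labels, bar_width=40, height=120):
    """Return a minimal SVG bar chart as a string."""
    width = bar_width * len(values)
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    tallest = max(values)
    for i, (value, label) in enumerate(zip(values, labels)):
        # Scale each bar relative to the tallest one, leaving 20 px for labels.
        bar_height = (height - 20) * value / tallest
        ET.SubElement(svg, "rect", {
            "x": str(i * bar_width + 5),
            "y": str(height - 20 - bar_height),
            "width": str(bar_width - 10),
            "height": str(bar_height),
            "fill": "steelblue",
        })
        text = ET.SubElement(svg, "text", {
            "x": str(i * bar_width + 5),
            "y": str(height - 5),
            "font-size": "10",
        })
        text.text = label  # stored as ordinary text inside the file
    return ET.tostring(svg, encoding="unicode")

svg_text = make_bar_chart_svg([512, 640], ["upright", "inverted"])
print(svg_text)
```

Saving `svg_text` to a file with an `.svg` extension should give something that a browser or a vector editor can open and edit further.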

Alternative: Inkscape

LaTeX-based: Beamer

  • All advantages of LaTeX
  • Easy user interface in LyX
  • Easy export to other formats

Browser-based (usually HTML+JS)

  • Immediately publish online!
  • Can embed videos (youtube)
  • Prezi-like experience: impress.js

(But you can still share PDFs of your presentations on figshare or SlideShare; see my own example.)

figshare (dev)

  • Citable: each figure gets a DOI (see my example)
  • Resources available under the Creative Commons Attribution 3.0 license, so you always retain the copyright (or at least you can always use it in your other papers)
  • Also accepts posters, media, papers, and datasets (the latter under CC0)

Disclaimer: I am a figshare advisor

Alternative: Your own website with copyright information (use CC BY)

Keeping file history

  • For text-based files: use version control systems (git, hg) via online repositories (GitHub, Bitbucket). Notice that this is where text files win over binary ones.
  • Dropbox keeps file history (for 30 days, unlimited for Pro): Go to Dropbox online, click on the trash can to see deleted files or on the file to see its change history
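
Why text files win here can be sketched with a line-by-line diff, which is roughly the kind of comparison version control systems show for text. This is a minimal illustration using Python's standard `difflib` module, not what git or hg run internally. The two paragraph versions, the file names, and the helper `line_diff` are hypothetical.

```python
import difflib

# Two hypothetical versions of a short methods paragraph.
old_version = """Participants viewed 40 images.
Each image was shown for 100 ms.
Responses were recorded by key press.
"""
new_version = """Participants viewed 40 images.
Each image was shown for 200 ms.
Responses were recorded by key press.
"""

def line_diff(old, new):
    """Return a unified diff between two text versions as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="methods_v1.txt", tofile="methods_v2.txt", lineterm="",
    ))

diff = line_diff(old_version, new_version)
print("\n".join(diff))
```

Only the changed line shows up with a `-` and a `+` prefix. A binary file such as a Word document gives no such readable history, since every save looks like a wholesale change.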