Thursday 21 May 2009

The Amazing Ruby: Rake + EtherPad + LaTeX


Update (20120208): I recently launched writeLaTeX, a service that combines EtherPad with LaTeX preview. Try it out — it's free!


Update (20100518): EtherPad has finally shut down its servers after being acquired by Google last year. The scripts below may still work (with some changes) with PiratePad, or with a pad that you host yourself, but I haven't tried this myself.


Update (20090713): The epview AppJet application that I originally used seems to have disappeared. The code has been updated to use EtherPad's new export feature, so it should now be working again.

When I was in primary school, I spent most of my time on the 3 R's: reading, writing and arithmetic. Now I'm a PhD student studying maths; guess how I spend most of my time! The more things change, the more they stay the same, as they say.


One thing that has changed is that I rarely write things on my own, now. Most of my written work is collaborative, and almost all of it involves math. A lot of scientific writing fits this pattern, but I've found that tool support remains limited.


The `old-fashioned' way is to pass LaTeX files around by e-mail, but this requires either one-at-a-time editing or manual merging. A better way is to put the document into a shared version control system, but this has a high administration overhead, and it still requires lots of external coordination. (It's also almost totally unknown outside of the software engineering community, perhaps because it's not very user-friendly: did you forget to update before you committed?)


Google Docs is pretty good for general collaborative writing, because the document is stored on a central server. The authors can edit simultaneously via their browsers, from wherever in the world they happen to be. The key advantage is that there's one current version of your document; it's always up to date (plus or minus about 10s) and there is no manual merging. As an added bonus, you can easily view the state of your document at any time in the past, so you don't have to worry about losing text if you delete it. The main problem for me is that it has no support for typesetting math, so I just end up writing LaTeX markup in Google Docs. The editor is also a bit clumsy; for example (at least when I last used it), my undo stack reset itself every time someone else made a change.


Enter EtherPad. Again, the document is stored on a central server and edited via a browser, so there is one current version. Unlike Google Docs, it doesn't support images or formatting; it's just a plain text document, but that's fine when writing LaTeX markup. This simplicity allows the collaborative editor to be very responsive; updates are distributed so quickly that two authors can work on the same sentence without stepping on each other's toes. So, it's quite possible to write LaTeX collaboratively in EtherPad. But there's a problem: you have to get your LaTeX source out of EtherPad so you can compile it, and copying and pasting quickly gets tedious.


Enter Ruby. The following Ruby code grabs the current content of your pad. (Note that this code is linked below, so you don't have to copy and paste it.)

require 'net/http'
require 'uri'

# Get the plain text content of an etherpad.
def get_etherpad(pad)
  # Based on http://forums.etherpad.com/viewtopic.php?id=168
  url = URI.parse("http://etherpad.com/ep/pad/export/#{pad}/latest?format=txt")
  $stderr.print "Getting #{url}... "
  s = Net::HTTP.get(url)
  $stderr.puts "done."
  s.strip
end
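One caveat: Net::HTTP.get returns the response body whatever the status code, so an error page from the server could be mistaken for the pad text. A slightly more defensive variant is sketched below; treat it as a sketch under stated assumptions rather than a definitive implementation. The names pad_export_url and fetch_pad, and the host parameter, are hypothetical additions; the host parameter assumes that a self-hosted pad serves the same export path.

```ruby
require 'net/http'
require 'uri'

# Build the plain-text export URL for a pad.
# The host parameter is a hypothetical addition; it assumes that another
# pad server exposes the same /ep/pad/export/... path.
def pad_export_url(pad, host = 'etherpad.com')
  URI.parse("http://#{host}/ep/pad/export/#{pad}/latest?format=txt")
end

# Fetch the pad text, raising if the server does not answer with a
# 2xx status (so an error page is never returned as pad content).
def fetch_pad(pad, host = 'etherpad.com')
  url = pad_export_url(pad, host)
  response = Net::HTTP.get_response(url)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Could not fetch #{url}: #{response.code} #{response.message}"
  end
  response.body.strip
end
```

With this in place, fetch_pad('your_pad_id') could stand in for get_etherpad('your_pad_id').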

For extra streamlining, combine this with a custom Rake (Ruby make) task. (If you're not familiar with the rake or make tools, you might be interested in this article on dependency-based programming, by Martin Fowler.)


# See etherpad_file.
class EtherpadFileTask < Rake::FileTask
  attr_accessor :pad

  # Fetch the pad at most once per task and remember the result.
  def remote_pad
    @remote_pad ||= get_etherpad(pad)
  end

  # The task needs to run if there is no local file yet, or if the
  # local file differs from the pad.
  def needed?
    return true unless File.exist?(name)
    local_pad = File.read(name)
    remote_pad != local_pad
  end
end

#
# Task to copy an etherpad to a local file.
# Each time it is invoked, it checks whether the etherpad has changed;
# if it has, the local file is updated; if it hasn't, the local file is
# left alone.
#
def etherpad_file(file_name, pad, &block)
  eft = EtherpadFileTask.define_task({file_name => []}) do |t|
    raise "Could not get pad #{t.pad}" unless t.remote_pad
    File.open(t.name, 'w') {|f| f.write(t.remote_pad)}
    $stderr.puts "Wrote pad #{t.pad} to #{t.name}."
  end
  eft.pad = pad
  eft
end


Then add the following call to your Rakefile, where the EtherPad you want to use has the URL http://etherpad.com/your_pad_id.


etherpad_file 'your_local_file.tex', 'your_pad_id'


Now, when you run the command


rake your_local_file.tex


the script will grab the text from the pad and, if it differs from the local version (if any), it will overwrite the local version with the one from the pad. For even more convenience, add a LaTeX rule to your Rakefile, like:


rule '.pdf' => %w(.tex) do |t|
  tex = t.prerequisites.first
  dvi = tex.sub(/\.tex$/,'.dvi')
  ps = tex.sub(/\.tex$/,'.ps')
  sh <<SH
latex #{tex}
latex #{tex}
latex #{tex}
dvips -o #{ps} #{dvi}
ps2pdf #{ps} #{t}
SH
end
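One caveat with the rule above: because the heredoc is handed to the shell as a single script, a failure in an early command does not necessarily stop the later ones. A possible alternative, sketched here under that assumption, is to build the command list in Ruby and run the commands one at a time. The helper name latex_commands is hypothetical.

```ruby
# Return the shell commands that turn a .tex file into a PDF via DVI
# and PostScript, in the order they should run. latex appears three
# times, as in the rule above (presumably so cross-references settle).
def latex_commands(tex, pdf)
  base = tex.sub(/\.tex\z/, '')
  dvi = "#{base}.dvi"
  ps = "#{base}.ps"
  Array.new(3) { "latex #{tex}" } +
    ["dvips -o #{ps} #{dvi}", "ps2pdf #{ps} #{pdf}"]
end
```

Inside the rule body, latex_commands(tex, t.name).each { |cmd| sh cmd } would then run them in turn; sh raises when a command exits with a non-zero status, so the build stops at the first failure.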


Now, you can run the command


rake your_local_file.pdf


and all the right stuff will happen. When the pad changes, and you run rake, the PDF will be rebuilt. Magic!
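The PDF is only rebuilt when the pad changes because the local .tex file is left untouched, and so keeps its modification time, whenever the downloaded text matches it. Outside Rake, that compare-then-write step can be sketched roughly as follows. The name write_if_changed is hypothetical, and comparing raw bytes is a choice made here so that string encoding labels don't cause spurious differences.

```ruby
# Write text to path only if the file is missing or its content differs.
# Returns true if the file was written, false if it was left alone.
# Both sides are compared as raw bytes so encoding labels don't matter.
def write_if_changed(path, text)
  return false if File.exist?(path) && File.binread(path) == text.b
  File.binwrite(path, text)
  true
end
```

A file that is not rewritten keeps its old timestamp, so anything that depends on it (such as the PDF) is not considered out of date.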


Of course, this is still far from the ideal collaborative LaTeX editing solution.

  • There is no syntax highlighting or reference auto-completion in the EtherPad editor.
  • There is no way to attach figures to the document; you have to exchange them some other way.
  • There's no safe way to import the pad into your favorite editor and then export it back again.
  • This method relies on EtherPad's export feature (originally, on the `epview' AppJet application), which may change or cease to exist at any time.
  • This method isn't very efficient; it goes out and gets the whole content of the document to compare with the local copy. All that's needed is the latest modification time, or maybe a hash of the document content.
  • EtherPad currently does not support private pads; anyone who knows the URL can read and edit your pad.
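The efficiency point in the list above could be addressed with a content hash. Here is a minimal sketch, assuming the pad server could report a SHA-256 digest of the current text. Nothing in the export feature used here is known to offer that, so remote_digest is a hypothetical input, and the names text_digest and stale? are made up for illustration.

```ruby
require 'digest'

# Hex SHA-256 digest of a string.
def text_digest(text)
  Digest::SHA256.hexdigest(text)
end

# True when the local file is missing or its digest differs from
# remote_digest. remote_digest is hypothetical: it stands for a value
# that the pad server would have to provide.
def stale?(path, remote_digest)
  return true unless File.exist?(path)
  Digest::SHA256.file(path).hexdigest != remote_digest
end
```

Only when stale? returns true would the full text need to be downloaded.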

On the plus side, the method works for other text-based data you might stick on a pad. You could conceivably put any kind of code on several different pads and then use rake to compile them together. It would be rather slow, though.


For longer or more serious stuff (like the thesis I should be writing instead of this blog post), my preferred solution is the LyX `what you see is what you mean' (WYSIWYM) editor with version control (e.g. Subversion). It may be that LyX will one day support collaborative editing, but so far there doesn't seem to be much progress on this.


For convenience, I've posted a demo Rakefile.rb file on my website, but all of the code you need is on this page.

5 comments:

mercurialmadnessman said...

I've tried writeLaTeX for collaboratively writing a LaTeX paper in realtime... the problem is that if the other person is typing, the page jumps around. So, effectively, it can't be used at the same time as someone else. Would you be able to fix this please?

casey.s.watts said...

This is exactly what I've been looking for! With some tweaking, this'll be exactly how I write things >:D

koppor said...

Could you please post some words on the software behind writelatex.com? It doesn't seem to be etherpad anymore. The CodeMirror plugin for the code, but that doesn't support collaborative editing out of the box, does it?

Unknown said...

Thanks for the comments, and sorry for the very slow response --- I had not set up e-mail notifications on this blog, and I just noticed these now.

@mercurial: the jumping around should now be fixed on writeLaTeX

@Oliver: writeLaTeX uses an Operational Transformation protocol that's very similar to EtherPad's, but I needed one that would work with our current stack, which is Rails. We haven't written much about the underlying software, in part because it's always changing! But I will try to write something on this topic soon.