Matthew DiLoreto

A place to keep track of some of my things.

Literate Import

The following is a literate program written in emacs-lisp.

Rationale

I like to write literate programs using org-babel-tangle. It’s an excellent workflow for literate programming which allows me to write an entire project in a single file, then tangle all the source blocks to their own files in the project.

Sometimes though I want to incorporate some external files into the project, and this leads to a problem. The literate programming flow with org-babel is always:

  1. write source code in org file src blocks
  2. tangle source code out to individual project files

Once I add a file to the project directly, the org file describing my literate program is incomplete: part of the program now lives only outside of it.

I could copy the source code into the org file, and tangle it out, just like all the internal project files, but sometimes I want to add a ton of files to my project, e.g. initializing a project with a starter template like create-react-app for React projects, lein new for Clojure projects, mix new for Elixir projects, etc.

Almost all professional programming languages have some sort of code generation mechanism that becomes useful at some point. Reincorporating those generated files into the literate document is a necessary feature for me because I want to be able to use these tools while also maintaining my project as a literate program in a single org file.

My solution is to create a command to reincorporate a project’s file structure back into the root org file. From there I can use all the org-mode features I’m used to (folding, navigation, editing, moving, etc) to reorganize the code into a cohesive narrative, which is the entire point of writing literate programs in the first place.

Implementation

First, we create a group for the new functionality. I added this because I will need a customizable variable, which we will see next. One interesting thing to note is that I can associate this new group with existing groups, here org-babel and org.

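(require 'cl-lib)     ; cl-loop
(require 'subr-x)     ; if-let
(require 'org)        ; org-babel and headline commands
(require 'projectile) ; project root and file listing

(defgroup literate-import nil
  "Import missing project files into a top-level literate org-file"
  :tag "literate-import"
  :group 'org-babel
  :group 'org)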

Next is the custom variable. I realized that, when importing source blocks from files in a project, it is not immediately clear which language is being used. I found this gist which gives a pretty decent mapping of file extension->source language, and I supplemented it a bit with other sources. Of course, when using org-babel, the source language can be anything you want, and it will still get tangled properly, so even:

#+begin_src nonexistent-language :tangle ./something.txt

will still tangle the source block contents to the file something.txt, so this feature isn’t strictly necessary, but it will be convenient to automatically fill in the source language in the majority of cases.

(defcustom literate-import-language-extensions
  '(
    ;; ("Awk" . "awk")
    (".awk" . "awk")
    (".auk" . "awk")
    (".gawk" . "awk")
    (".mawk" . "awk")
    (".nawk" . "awk")

    ;; ("C" . C)
    (".c" . "C")
    (".cats" . "C")
    (".h" . "C")
    (".idc" . "C")
    (".w" . "C")

    ;; ("R" . R)
    (".r" . "R")
    (".rd" . "R")
    (".rsx" . "R")

    ;; IDK what this language is, but it isn't in the list in the gist
    ;; ("Calc" . calc)

    ;; ("Clojure" . clojure)
    (".clj" . "clojure")
    (".boot" . "clojure")
    (".cl2" . "clojure")
    (".cljc" . "clojure")
    (".cljs" . "clojure")
    (".cljs.hl" . "clojure")
    (".cljscm" . "clojure")
    (".cljx" . "clojure")
    (".hic" . "clojure")

    ;; ("Css" . css)
    (".css" . "css")

    ;; SCSS isn't in the org-babel list, but I will include it here anyway
    ;; ("Scss" . scss)
    (".scss" . "scss")

    ;; I actually am familiar with ditaa and it probably wouldn't be used in isolation, so won't have a file extension
    ;; ("Ditaa" . ditaa)

    ;; Don't know dot
    ;; ("Dot" . dot)

    ;; ("Emacs lisp" . emacs-lisp)
    (".el" . "emacs-lisp")
    (".emacs" . "emacs-lisp")
    (".emacs.desktop" . "emacs-lisp")

    ;; ("Forth" . forth)
    (".fth" . "forth")
    (".4th" . "forth")
    (".f" . "forth")
    (".for" . "forth")
    (".forth" . "forth")
    (".fr" . "forth")
    (".frt" . "forth")
    (".fs" . "forth")

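    ;; ("Fortran" . fortran)
    ;; Note: ".f" and ".for" are also listed under Forth above; `assoc' returns
    ;; the first match, so those two extensions resolve to "forth".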
    (".f90" . "fortran")
    (".f" . "fortran")
    (".f03" . "fortran")
    (".f08" . "fortran")
    (".f77" . "fortran")
    (".f95" . "fortran")
    (".for" . "fortran")
    (".fpp" . "fortran")

    ;; ("Gnuplot" . gnuplot)
    (".gp" . "gnuplot")
    (".gnu" . "gnuplot")
    (".gnuplot" . "gnuplot")
    (".plot" . "gnuplot")
    (".plt" . "gnuplot")

    ;; ("Haskell" . haskell)
    (".hs" . "haskell")
    (".hsc" . "haskell")

    ;; ("Java" . java)
    (".java" . "java")
    (".jsp" . "java")

    ;; ("Javascript" . js)
    (".js" . "js")
    ("._js" . "js")
    (".bones" . "js")
    (".es" . "js")
    (".es6" . "js")
    (".frag" . "js")
    (".gs" . "js")
    (".jake" . "js")
    (".jsb" . "js")
    (".jscad" . "js")
    (".jsfl" . "js")
    (".jsm" . "js")
    (".jss" . "js")
    (".njs" . "js")
    (".pac" . "js")
    (".sjs" . "js")
    (".ssjs" . "js")
    (".sublime-build" . "js")
    (".sublime-commands" . "js")
    (".sublime-completions" . "js")
    (".sublime-keymap" . "js")
    (".sublime-macro" . "js")
    (".sublime-menu" . "js")
    (".sublime-mousemap" . "js")
    (".sublime-project" . "js")
    (".sublime-settings" . "js")
    (".sublime-theme" . "js")
    (".sublime-workspace" . "js")
    (".sublime_metrics" . "js")
    (".sublime_session" . "js")
    (".xsjs" . "js")
    (".xsjslib" . "js")

    ;; This one isn't present in the list, but I know it
    ;; ("LaTeX" . latex)
    (".tex" . "latex")


    ;; ("Lilypond" . lilypond)
    (".ly" . "lilypond")
    (".ily" . "lilypond")

    ;; Assuming common lisp
    ;; ("Lisp" . lisp)
    (".lisp" . "lisp")
    (".asd" . "lisp")
    (".cl" . "lisp")
    ;; (".l" . lisp)                       ; collision with Pico Lisp
    (".lsp" . "lisp")
    (".ny" . "lisp")
    (".podsl" . "lisp")
    (".sexp" . "lisp")

    ;; ("Makefile" . makefile)
    (".mak" . "makefile")
    (".d" . "makefile")
    (".mk" . "makefile")
    (".mkfile" . "makefile")

    ;; Can't find anything online about this other than wxMaxima
    ;; ("Maxima" . maxima)
    (".wxmx" . "maxima")

    ;; ("Matlab" . matlab)
    (".matlab" . "matlab")
    (".m" . "matlab")

    ;; ("Ocaml" . ocaml)
    (".ml" . "ocaml")
    (".eliom" . "ocaml")
    (".eliomi" . "ocaml")
    (".ml4" . "ocaml")
    (".mli" . "ocaml")
    (".mll" . "ocaml")
    (".mly" . "ocaml")

    ;; IDK what this is
    ;; ("Octave" . octave)

    ;; ("Org" . org)
    (".org" . "org")

    ;; ("Perl" . perl)
    (".pl" . "perl")
    (".al" . "perl")
    (".cgi" . "perl")
    (".fcgi" . "perl")
    (".perl" . "perl")
    (".ph" . "perl")
    (".plx" . "perl")
    (".pm" . "perl")
    (".pod" . "perl")
    (".psgi" . "perl")
    (".t" . "perl")

    ;; ("Pico Lisp" . picolisp)
    (".l" . "picolisp")

    ;; Not in the list
    ;; ("PlantUML" . plantuml)

    ;; ("Python" . python)
    (".py" . "python")
    (".bzl" . "python")
    (".cgi" . "python")
    (".fcgi" . "python")
    (".gyp" . "python")
    (".lmi" . "python")
    (".pyde" . "python")
    (".pyp" . "python")
    (".pyt" . "python")
    (".pyw" . "python")
    (".rpy" . "python")
    (".tac" . "python")
    (".wsgi" . "python")
    (".xpy" . "python")

    ;; ("Ruby" . ruby)
    (".rb" . "ruby")
    (".builder" . "ruby")
    (".fcgi" . "ruby")
    (".gemspec" . "ruby")
    (".god" . "ruby")
    (".irbrc" . "ruby")
    (".jbuilder" . "ruby")
    (".mspec" . "ruby")
    (".pluginspec" . "ruby")
    (".podspec" . "ruby")
    (".rabl" . "ruby")
    (".rake" . "ruby")
    (".rbuild" . "ruby")
    (".rbw" . "ruby")
    (".rbx" . "ruby")
    (".ru" . "ruby")
    (".ruby" . "ruby")
    (".thor" . "ruby")
    (".watchr" . "ruby")

    ;; ("Sass" . sass)
    (".sass" . "sass")

    ;; ("Scala" . scala)
    (".scala" . "scala")
    (".sbt" . "scala")
    (".sc" . "scala")

    ;; ("Scheme" . scheme)
    (".scm" . "scheme")
    (".sld" . "scheme")
    (".sls" . "scheme")
    (".sps" . "scheme")
    (".ss" . "scheme")

    ;; Not in the list
    ;; ("Screen" . screen)

    ;; ("Shell Script" . shell)
    (".sh" . "shell")
    (".bash" . "shell")
    (".bats" . "shell")
    (".cgi" . "shell")
    (".command" . "shell")
    (".fcgi" . "shell")
    (".ksh" . "shell")
    (".sh.in" . "shell")
    (".tmux" . "shell")
    (".tool" . "shell")
    (".zsh" . "shell")

    ;; ("Sql" . sql)
    (".sql" . "sql")
    (".cql" . "sql")
    (".ddl" . "sql")
    (".inc" . "sql")
    (".prc" . "sql")
    (".tab" . "sql")
    (".udf" . "sql")
    (".viw" . "sql")

    ;; IDK why they make the distinction
    ;; ("Sqlite" . sqlite)


    ;; ("Stan" . stan)
    (".stan" . "stan")
    )
  "Alist mapping file extensions to org-babel source block languages.
Should be kept up to date with `org-babel-load-languages', and the version
should also match.

This list detects the correct language to use in a generated src block
based on the extension of the original file.

Transliterated from https://gist.github.com/ppisarczyk/43962d06686722d26d176fad46879d41"
  :group 'literate-import
  :version "24.1"
  ;; Entries are dotted pairs of two strings, so the value type is a plain string.
  :type '(alist :key-type string :value-type string))
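To make the lookup concrete, here is a minimal sketch of how an extension table like this can be queried. The names my/sample-language-extensions and my/language-for-file are hypothetical and not part of the program; the four-entry table is only an excerpt for illustration.

```emacs-lisp
(defvar my/sample-language-extensions
  '((".el" . "emacs-lisp")
    (".clj" . "clojure")
    (".f" . "forth")
    (".f" . "fortran"))
  "Hypothetical four-entry excerpt of the extension table, for illustration only.")

(defun my/language-for-file (file)
  "Return the src block language for FILE, or a placeholder if unknown."
  ;; With a non-nil second argument, `file-name-extension' keeps the leading
  ;; period and returns "" when FILE has no extension.
  (let ((ext (file-name-extension file t)))
    (or (cdr (assoc ext my/sample-language-extensions))
        "TODO_REPLACE_ME")))
```

With this table, (my/language-for-file "src/core.clj") gives "clojure" and a file without an extension gives the placeholder. Two details are worth knowing: assoc returns the first matching entry, so "solver.f" resolves to "forth" rather than "fortran", and file-name-extension only looks at the final extension, so a compound key such as ".cljs.hl" can never be the result of the lookup.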

Utilities

This next function reads the :tangle header argument of a src block, which is normally a file name. It is looked up in the header-argument alist, which is the third element of the list returned by org-babel-get-src-block-info. Conveniently, this also works when :tangle is defined somewhere other than on the line of the src block itself, such as in the properties of a parent heading or in a file-level property. E.g.

At the heading level:

* Some heading
:PROPERTIES:
:header-args:emacs-lisp: :tangle ./config.el
:END:

Below are some code blocks that will both be tangled to ./config.el, even though they do not specify it directly on the begin_src line.

#+begin_src emacs-lisp
...
#+end_src


#+begin_src emacs-lisp
...
#+end_src

At the file level:

#+title: Doom Emacs Config
#+PROPERTY: header-args:emacs-lisp :tangle ./config.el

Every emacs-lisp source block in this whole file will be tangled to ./config.el unless otherwise specified.

#+begin_src emacs-lisp
...
#+end_src

#+begin_src emacs-lisp
...
#+end_src

#+begin_src emacs-lisp
...
#+end_src

#+begin_src emacs-lisp
...
#+end_src

(defun literate-import/src-block-tangle-target ()
  (expand-file-name (alist-get :tangle (nth 2 (org-babel-get-src-block-info)))))

And this (clumsy) function simply collects those tangle targets for every source block in the entire org file. I say clumsy because I’m ignoring errors from org-babel-next-src-block and tracking point location to determine if there are any more src blocks, and recursively collecting the results. This could all probably be nicely expressed in a loop, but I didn’t want to figure that out.

(defun literate-import/list-tangle-targets-in-current-buffer (&optional p acc)
  (let ((loc (or p (point-min)))
        (res (or acc '())))
    (goto-char loc)
    (ignore-errors (org-babel-next-src-block))
    (if (eq (point) loc)
        (seq-filter 'file-exists-p res)
      (literate-import/list-tangle-targets-in-current-buffer (point) (cons (literate-import/src-block-tangle-target) res)))))
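
For comparison, here is a sketch of how the same collection might be expressed as a loop, using the org-babel-map-src-blocks macro from ob-core, which visits every src block in a buffer and restores point afterwards. The function name my/list-tangle-targets is hypothetical. Unlike the function above, this variant skips blocks whose :tangle value is "no" or "yes" and does not filter by file-exists-p; both are assumptions made to keep the sketch small.

```emacs-lisp
(require 'org)
(require 'ob)

(defun my/list-tangle-targets ()
  "Return the absolute :tangle targets of all src blocks in the current buffer.
Hypothetical loop-based variant; blocks tangled to \"no\" or \"yes\" are skipped."
  (let (targets)
    ;; A nil FILE argument makes the macro operate on the current buffer.
    (org-babel-map-src-blocks nil
      (let ((tangle (alist-get :tangle (nth 2 (org-babel-get-src-block-info)))))
        (when (and (stringp tangle)
                   (not (member tangle '("no" "yes"))))
          (push (expand-file-name tangle) targets))))
    ;; Restore document order and drop repeated targets.
    (delete-dups (nreverse targets))))
```

Because the macro drives the iteration, there is no need to ignore errors or compare point positions to detect the last block, and the call depth stays constant no matter how many blocks the file contains.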

Core

Now the real meat of the program. The outline I initially wrote explains my thought process pretty well:

;; OUTLINE:
;; ----------
;; let tree = empty tree
;; Get all project files
;; Get all :tangle targets
;; for each project file
;;      if file not in targets
;;              merge file hierarchy to tree
;; convert tree to org mode tree
;; print tree at end of current buffer

The only deviation I ended up making in the actual implementation concerns the tree. Instead of collecting the file hierarchy in a tree and then converting that to an org-mode tree, I simply collect the files in a list and traverse that list, building up the org tree using the standard insertion commands. For example:

Original idea:

'(root (a (file1.txt) b (file2.txt) c (file3.txt)))

somehow converts to:

* root
** a
*** file1.txt
** b
*** file2.txt
** c
*** file3.txt

Instead just start with:

'(root/a/file1.txt root/b/file2.txt root/c/file3.txt)

and iterate through that list and use org-mode’s navigation/insertion commands to produce the result directly in a temporary buffer.
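
The mapping from a flat list of paths to heading levels can be sketched without any org-mode commands at all. The function below is a hypothetical illustration, not the implementation used in this program: it emits one headline per path component, with as many stars as the component is deep, and emits each shared directory only once. It assumes the paths are ordered so that files sharing a directory are adjacent.

```emacs-lisp
(defun my/paths-to-org-outline (paths)
  "Return org headline text mirroring the directory hierarchy of PATHS.
Hypothetical helper; assumes PATHS are ordered so that entries sharing a
leading directory are adjacent."
  (let ((seen '())    ; path prefixes that already have a headline
        (lines '()))  ; headlines, collected in reverse order
    (dolist (path paths)
      (let ((prefix nil)
            (depth 0))
        (dolist (part (split-string path "/" t))
          (setq depth (1+ depth)
                prefix (if prefix (concat prefix "/" part) part))
          (unless (member prefix seen)
            (push prefix seen)
            ;; One star per level of nesting.
            (push (concat (make-string depth ?*) " " part) lines)))))
    (mapconcat #'identity (nreverse lines) "\n")))
```

Called with '("root/a/file1.txt" "root/b/file2.txt" "root/c/file3.txt"), it returns the seven headlines of the outline shown above, from "* root" down to "*** file3.txt".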

(defun literate-import/import-project ()
  "Import selected project files that no src block tangles to yet."
  (interactive)
  (save-excursion
    (let* ((root (projectile-acquire-root))
           (tree nil)                    ; let tree = empty tree
           (proj-files (completing-read-multiple "Select file(s) to import: "
                                                 (projectile-project-files root))) ; Get all project files
           (tangle-targets (delete-dups (literate-import/list-tangle-targets-in-current-buffer)))) ; Get all :tangle targets
      (cl-loop for file in proj-files do ; for each project file
               ;; Tangle targets are absolute paths and strings are never `eq',
               ;; so expand FILE against the project root and compare with `member'.
               (unless (member (expand-file-name file root) tangle-targets) ; if file not in targets
                 (setq tree (cons file tree)))) ; merge file hierarchy to tree
      ;; convert tree to org mode tree
      ;; print tree at end of current buffer as org mode tree
      (goto-char (point-max)) ; collecting the targets above moved point
      (insert (literate-import/convert-to-org-tree tree)))))
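
One detail of the filtering step is worth spelling out. The tangle targets are absolute paths, because they pass through expand-file-name, whereas projectile lists files relative to the project root, and two strings with the same contents are generally not eq. A minimal sketch of a comparison that accounts for both, written as a hypothetical helper named my/files-missing-from-targets:

```emacs-lisp
(require 'seq)

(defun my/files-missing-from-targets (files root targets)
  "Return those FILES, relative to ROOT, that are absent from TARGETS.
TARGETS is a list of absolute file names.  Hypothetical helper."
  (seq-remove
   (lambda (file)
     ;; `member' compares with `equal', which matches strings by content.
     (member (expand-file-name file root) targets))
   files))
```

For example, with the root "/proj/" and the single target "/proj/src/a.clj", the files "src/a.clj" and "src/b.clj" reduce to just "src/b.clj".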

One awesome feature here comes from the use of completing-read-multiple in combination with projectile’s file-collecting function projectile-project-files. Together they let me easily mark the exact set of files I want to include in the import, and generate src blocks only for those files. Of course this also means that this project depends on projectile (completing-read-multiple, on the other hand, is built into Emacs), so it won’t be as reusable for other folks, but this literate workflow is so niche anyway that I doubt it will be very useful to anyone other than myself.

The above function uses this next one to turn the list of files into an org-mode heading hierarchy, inserting a src block with the file’s contents under the heading for each file.

(defun literate-import/convert-to-org-tree (file-list)
  (with-temp-buffer
    (org-mode)
    (cl-loop for file in file-list do
             (let* ((subdirs (split-string file "/"))
                    (file-name (car (last subdirs)))) ; `first' needs the obsolete cl library
               (cl-loop for dir in subdirs do
                        (when (and (not (string-equal dir ".")) (> (length dir) 0))
                          (if-let ((starting-point (ignore-errors (org-find-exact-headline-in-buffer dir))))
                              (goto-char (marker-position starting-point))
                            (progn
                              (org-insert-subheading nil)
                              (insert dir)
                              (when (string-equal dir file-name)
                                (newline)
                                (insert "#+begin_src ")
                                (insert (if-let ((src-lang (assoc (concat "." (file-name-extension file-name)) literate-import-language-extensions)))
                                            (cdr src-lang)
                                          "TODO_REPLACE_ME"))
                                (insert " :tangle ")
                                (insert file)
                                (newline)
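                                (insert (with-temp-buffer
                                          (insert-file-contents file)
                                          (buffer-string)))
                                ;; Keep #+end_src on its own line even when
                                ;; the file lacks a trailing newline.
                                (unless (bolp) (newline))
                                (insert "#+end_src")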
                                (newline))))))))
    (buffer-string)))
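
The text generated for a single file can also be viewed in isolation. The following is a small sketch, with the hypothetical name my/src-block-string, of a pure function that assembles the same src block as a string. It appends a newline when the content lacks one, so that #+end_src always starts on its own line.

```emacs-lisp
(defun my/src-block-string (lang target content)
  "Return an org src block that tangles CONTENT to TARGET, labelled LANG.
Hypothetical helper for illustration."
  (concat "#+begin_src " lang " :tangle " target "\n"
          content
          ;; Add a newline only if CONTENT is non-empty and does not end in one.
          (if (or (equal content "") (string-suffix-p "\n" content)) "" "\n")
          "#+end_src\n"))
```

Building the string separately from the buffer manipulation makes this part easy to check by hand: (my/src-block-string "js" "./a.js" "let x = 1;") yields the begin line, the one line of code, and the end line, each terminated by a newline.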

Lessons learned

Overall this was a fun project to work on during the weekend, and I needed it for a personal project I’m writing (in a literate style of course) which includes code generated by both blitz.js and leiningen, so I’m glad I was able to finish it and actually make use of it to return the beautiful literate style to my project!

Future work

My main idea here was to generate my project structure once, import it into my literate document, then rearrange the generated org mode structure and mark it up with prose. With blitz.js though, the code generation tool is integral to the day-to-day workflow, and new code is added all the time with it. My next step will be to set up a listener in the project to automatically import missing files into the literate document once the files are generated on the command line. The import could potentially use a different heading structure, such that the generated queries/mutations/components/schemas are co-located under the same org heading, instead of having the headings simply mirror the file system hierarchy.