parsepy 0.0.1-SNAPSHOT

Parse Python-style configuration files, in Clojure.
We want to parse Python-style config files: bracketed section headers such as [a section], each followed by # comments and assignments such as y = 999.

To accomplish this, we'll use the Instaparse library, which we import under the alias insta. We'll also set ourselves up to do some guard-rail programming, putting our tests inline with our code to help describe how this module works.

    (ns parsepy.core
      (:require [clojure.test :refer [is]]
                [instaparse.core :as insta]))

Parsing

Let's start with recognizing section tags such as [My Section 1].

    (def sections
      (insta/parser
        "body = newline* section-tag+
         section-tag = lbrace terms rbrace newline+
         <lbrace> = <'['>
         <rbrace> = <']'>
         <newline> = <'\n'>
         <terms> = #'[a-zA-Z0-9\\s]+'"))

This produces output that's easily manipulated with Clojure functions; namely, nested vectors.

    (is (= (sections "[My Section 1]\n")
           [:body [:section-tag "My Section 1"]]))

We also make sure multiple section tags, separated by blank lines, work.

    (is (= (sections "[a]\n\n\n[b]\n")
           [:body [:section-tag "a"] [:section-tag "b"]]))

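Instaparse returns a failure object rather than throwing when the input does not match the grammar, so a guard-rail check for malformed input might look like the sketch below. The grammar is repeated so the snippet runs on its own; the helper name parse-tags is hypothetical and not part of this module.

```clojure
(require '[instaparse.core :as insta])

;; Same grammar as the sections parser above, repeated for self-containment.
(def sections
  (insta/parser
    "body = newline* section-tag+
     section-tag = lbrace terms rbrace newline+
     <lbrace> = <'['>
     <rbrace> = <']'>
     <newline> = <'\n'>
     <terms> = #'[a-zA-Z0-9\\s]+'"))

;; Hypothetical helper: return the parse tree, or nil when parsing fails.
(defn parse-tags [input]
  (let [result (sections input)]
    (when-not (insta/failure? result)
      result)))

(parse-tags "[a]\n") ; a well-formed tag parses
(parse-tags "[a]")   ; no trailing newline, so the grammar rejects it
```

Note that the grammar requires at least one newline after every section tag, so a file whose last line lacks a trailing newline is rejected.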
Let's add assignment statements in the form lvalue = const, along with # comments. Our grammar becomes more complicated, to handle multiple sections, each with (potentially) multiple assignments and comments.

    (def parser
      (insta/parser
        "<body> = <newline*> section+
         section = <comment*> section-tag (<comment> | assignment)*
         <section-tag> = <space>* <lbrace> section-terms <rbrace> newline+
         <comment> = <hash> #'.+?\\n+'
         <hash> = '#'
         section-terms = s0 st*
         <s0> = #'[a-zA-Z_]'
         <st> = (space | #'[a-zA-Z0-9_]+')
         <assignment> = lvalue <space*> <equal> <space>* const newline+
         <space> = ' '
         equal = <'='>
         lbrace = <'['>
         rbrace = <']'>
         <newline> = <'\n'>
         <const> = #'\\S+'
         lvalue = #'[a-zA-Z][a-zA-Z0-9_]*'"))

We also transform our parsed data into something more natural for consumption by other Clojure functions, by turning any left-hand-side values into keywords, and concatenating the strings that make up any section titles.

    (def transform-options
      {:lvalue keyword
       :section-terms (partial apply str)})

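To make the transformation concrete, here is a small sketch that applies the same options to a hand-written tree. The raw-section value is an assumption: it is the shape the grammar above should yield for one section before transformation, written out by hand rather than captured from a real run.

```clojure
(require '[instaparse.core :as insta])

(def transform-options
  {:lvalue keyword
   :section-terms (partial apply str)})

;; Assumed untransformed tree for the input "[a section]\ny = 999\n".
(def raw-section
  [:section
   [:section-terms "a" " " "section"]
   [:lvalue "y"]
   "999"])

;; Each [:lvalue ...] node becomes a keyword, and each [:section-terms ...]
;; node collapses into a single title string.
(insta/transform transform-options raw-section)
```

Under that assumption the result is [:section "a section" :y "999"], matching the flat shape used in the tests for parse.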
Our parsing function is then just

    (defn parse [input]
      (->> (parser input)
           (insta/transform transform-options)))

and it works on both single sections

    (is (= (parse "[a]\nb=1\n")
           [[:section "a" :b "1"]]))

and multiple sections.

    (is (= (parse "[a section]\n# A comment\ny = 999\nz = torNado_3\n[b]\nq=10\n")
           [[:section "a section" :y "999" :z "torNado_3"]
            [:section "b" :q "10"]]))

Parsing config files is now very straightforward.

    (comment
      (clojure.pprint/pprint
        (parse (slurp "/Users/jacobsen/Dropbox/icecube/live/config/conf/defaults.conf")))
      ==>
      ([:section "log" :log_location "$HOME/.i3live.log" :max_log_mb "100" :max_log_files "7"]
       [:section "dbserver" :loglevel "info" :port "7000" :cachedir "/mnt/data/i3live" :cachesize "1000000000" :jsonport "7002" :jsonfile "$HOME/catchall.json"]
       [:section "filewatcher" :loglevel "info"]
       ;; ...
       )
      )

We could do more with this in terms of getting nicer data structures, or handling multi-line assignments, but this module already does most of what Python's ConfigParser does, with relatively little code.
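As one possible direction for nicer data structures, the flat section vectors could be folded into a nested map keyed by section title. This is only a sketch; sections->map is a hypothetical helper, not part of the module, and it uses nothing beyond clojure.core.

```clojure
;; Hypothetical helper: turn parsed sections into a map of
;; section title -> {keyword -> string value}.
(defn sections->map
  [sections]
  (into {}
        (for [[_tag title & kvs] sections]
          [title (apply hash-map kvs)])))

;; Sample data copied from the multi-section test above.
(def parsed
  [[:section "a section" :y "999" :z "torNado_3"]
   [:section "b" :q "10"]])

(def config (sections->map parsed))

;; Look up one setting by section title and key.
(get-in config ["a section" :z])
```

With this shape, a section that appears twice in the file would be overwritten by its last occurrence; merging duplicates instead would need merge-with or similar.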