parsepy 0.0.1-SNAPSHOT
Parse Python-style configuration files, in Clojure
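We want to parse Python-style config files: bracketed section headers such as [a section], each followed by assignments of the form name = value and optional # comments.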
| |||||||
To accomplish this, we'll use the Instaparse library, which we import as insta. We'll also set ourselves up to do some guard-rail programming, putting our tests inline with our code to help describe how this module works. | (ns parsepy.core
(:require [clojure.test :refer [is]]
[instaparse.core :as insta])) | ||||||
Parsing
Let's start with recognizing section tags: a name in square brackets on its own line. | (def sections
(insta/parser
"body = newline* section-tag+
section-tag = lbrace terms rbrace newline+
<lbrace> = <'['>
<rbrace> = <']'>
<newline> = <'\n'>
<terms> = #'[a-zA-Z0-9\\s]+'")) | ||||||
This produces nested vectors, which are easily manipulated with Clojure functions. | (is (= (sections "[My Section 1]\n")
[:body [:section-tag "My Section 1"]])) | ||||||
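Because the result is made of ordinary nested vectors, the standard sequence functions apply to it directly. The following is a minimal sketch rather than part of the module: `section-names` is a hypothetical helper, and `tree` is a hand-written literal in the same shape as the parser output, so the snippet runs without Instaparse.

```clojure
;; Hand-written tree in the same shape the `sections` parser returns.
(def tree [:body
           [:section-tag "My Section 1"]
           [:section-tag "My Section 2"]])

;; Hypothetical helper: skip the leading :body tag, then take the
;; string held in each [:section-tag "..."] child.
(defn section-names [tree]
  (map second (rest tree)))

(section-names tree)
;; => ("My Section 1" "My Section 2")
```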
We also make sure multiple section tags work, even with blank lines between them. | (is (= (sections "[a]\n\n\n[b]\n")
[:body [:section-tag "a"]
[:section-tag "b"]])) | ||||||
Let's add assignment statements in the form name = value, along with # comments.
Our grammar becomes more complicated in order to handle multiple sections, each with (potentially) multiple assignments and comments. | (def parser
(insta/parser
"<body> = <newline*> section+
section = <comment*> section-tag (<comment> | assignment)*
<section-tag> = <space>* <lbrace> section-terms <rbrace> newline+
<comment> = <hash> #'.+?\\n+'
<hash> = '#'
section-terms = s0 st*
<s0> = #'[a-zA-Z_]'
<st> = (space | #'[a-zA-Z0-9_]+')
<assignment> = lvalue <space*> <equal> <space>* const newline+
<space> = ' '
equal = <'='>
lbrace = <'['>
rbrace = <']'>
<newline> = <'\n'>
<const> = #'\\S+'
lvalue = #'[a-zA-Z][a-zA-Z0-9_]*'")) | ||||||
We also transform our parsed data into something more natural for consumption by other Clojure functions, by turning any left-hand-side values into keywords and concatenating the strings that make up any section title. | (def transform-options {:lvalue keyword
:section-terms (partial apply str)}) | ||||||
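To see what these two handlers do in isolation, recall that insta/transform calls the function registered for a tag with that node's children as its arguments. The sketch below mimics this by hand, without invoking Instaparse. The child strings are assumptions about what the grammar yields for a section title of `a section` and an lvalue of `y`, written out as literals.

```clojure
;; Same map as in the module, repeated so the snippet stands alone.
(def transform-options {:lvalue keyword
                        :section-terms (partial apply str)})

;; An :lvalue node has a single string child, so the handler
;; receives one argument and turns it into a keyword.
((:lvalue transform-options) "y")
;; => :y

;; A :section-terms node has several string children (assumed here
;; to be "a", " " and "section"); the handler concatenates them.
((:section-terms transform-options) "a" " " "section")
;; => "a section"
```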
Our parsing function then is just | (defn parse [input]
(->> (parser input)
(insta/transform transform-options))) | ||||||
and it works on both single sections | (is (= (parse "[a]\nb=1\n")
[[:section "a" :b "1"]])) | ||||||
and multiple sections. | (is (= (parse "[a section]
# A comment
y = 999
z = torNado_3
[b]
q=10
")
[[:section "a section" :y "999" :z "torNado_3"]
[:section "b" :q "10"]])) | ||||||
Parsing config files is now very straightforward. | (comment
  (clojure.pprint/pprint
    (parse (slurp "/Users/jacobsen/Dropbox/icecube/live/config/conf/defaults.conf")))
  ==>
  ([:section "log"
    :log_location "$HOME/.i3live.log"
    :max_log_mb "100"
    :max_log_files "7"]
   [:section "dbserver"
    :loglevel "info"
    :port "7000"
    :cachedir "/mnt/data/i3live"
    :cachesize "1000000000"
    :jsonport "7002"
    :jsonfile "$HOME/catchall.json"]
   [:section "filewatcher"
    :loglevel "info"]
   ;; ...
   )
  ) | ||||||
We could do more with this in terms of getting nicer data structures, or handling multi-line assignments, but this module already does most of what Python's ConfigParser does, with relatively little code. | |||||||
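As one illustration of the "nicer data structures" idea, the flat section vectors could be folded into a nested map keyed by section title. This is only a sketch: `sections->map` is a hypothetical name that is not part of the module, and `parsed` is a literal copy of the output shown in the two-section test, so the snippet runs on its own.

```clojure
;; Literal copy of what `parse` returns for the two-section example.
(def parsed [[:section "a section" :y "999" :z "torNado_3"]
             [:section "b" :q "10"]])

;; Hypothetical helper: each entry is [:section title k1 v1 k2 v2 ...],
;; so skip the tag, take the title, and pair up the remaining items.
(defn sections->map [parsed]
  (into {}
        (for [[_tag title & kvs] parsed]
          [title (apply hash-map kvs)])))

(sections->map parsed)
;; => {"a section" {:y "999", :z "torNado_3"}, "b" {:q "10"}}

;; Individual settings can then be looked up by section and key.
(get-in (sections->map parsed) ["b" :q])
;; => "10"
```

Values stay as strings here, matching what the parser produces; converting them to numbers would be a separate step.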