X-Git-Url: http://git.droids-corp.org/?p=protos%2Flibecoli.git;a=blobdiff_plain;f=doc%2Farchitecture.rst;fp=doc%2Farchitecture.rst;h=64ecaa8124c6e61814cbb2f582b164c528fb9bd3;hp=0000000000000000000000000000000000000000;hb=e18710da81b4c53b357dde2ca55005344edc314f;hpb=31ab7b9ac5570db9da50014abb31adb618218324 diff --git a/doc/architecture.rst b/doc/architecture.rst new file mode 100644 index 0000000..64ecaa8 --- /dev/null +++ b/doc/architecture.rst @@ -0,0 +1,153 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright 2019 Olivier Matz + +Architecture +============ + +Most of *libecoli* is written in C. It provides a C API to ease the +creation of interactive command line interfaces. It is split in several +parts: + +- the core: it provides the main API to parse and complete strings. +- ecoli nodes: this is the modular part of *libecoli*. Each node + implements a specific parsing (integers, strings, regular expressions, + sequence, ...) +- yaml parser: load an ecoli tree described in yaml. +- editline: helpers to instantiate an *editline* and connect it to + *libecoli*. +- utilities: logging, string, strings vector, hash table, ... + +The grammar tree +---------------- + +The *ecoli nodes* are organized in a tree. An *ecoli grammar tree* +describes how the input is parsed and completed. Let's take a simple +example: + +.. figure:: simple-tree.svg + + A simple *ecoli grammar tree*. + +We can also represent it as text like this:: + + sh_lex( + seq( + str(foo), + option( + str(bar) + ) + ) + ) + +This tree matches the following strings: + +- ``foo`` +- ``foo bar`` + +But, for instance, it does **not** match the following strings: + +- ``bar`` +- ``foobar`` +- (empty string) + +Parsing an input +---------------- + +When the *libecoli* parses an input, it browses the tree (depth first +search) and returns an *ecoli parse tree*. Let's decompose what is done +when ``ec_node_parse_strvec(root_node, input)`` is called, step by step: + +1. The initial input is a string vector ``["foo bar"]``. +2. The *sh_lex* node splits the input as a shell would have done it, in + ``["foo", "bar"]``, and passes it to its child. +3. The *seq* node (sequence) passes the input to its first child. +4. The *str* node matches if the first element of the string vector is + ``foo``. This is the case, so it returns the number of eaten + strings (1) to its parent. +5. The *seq* node passes the remaining part ``["bar"]`` to its next + child. +6. The option node passes the string vector to its child, which + matches. It returns 1 to its parent. +7. The *seq* node has no more child, it returns the number of eaten + strings in the vector (2). +8. Finally, *sh_lex* compares the returned value to the number of + strings in its initial vector, and return 1 (match) to its caller. + +If a node does not match, it returns ``EC_PARSE_NOMATCH`` to its parent, +and it is propagated until it reaches a node that handle the case. For +instance, the *or* node is able to handle this error. If a child does +not match, the next one is tried until it finds one that matches. + +The parse tree +-------------- + +Let's take another simple example. + +.. figure:: simple-tree2.svg + + Another simple *ecoli grammar tree*. + +In that case, there is no lexer (the *sh_lex* node), so the input must +be a string vector that is already split. For instance, it matches: + +- ``["foo"]`` +- ``["bar", "1"]`` +- ``["bar", "100"]`` + +But it does **not** match: + +- ``["bar 1"]``, because there is no *sh_lex* node on the top of the tree +- ``["bar"]`` +- ``[]`` (empty vector) + +At the time the input is parsed, a *parse tree* is built. When it +matches, it describes which part of the *ecoli grammar tree* that +actually matched, and what input matched for each node. + +Passing ``["bar", "1"]`` to the previous tree would result in the +following *parse tree*: + +.. figure:: parse-tree.svg + + The *ecoli parse tree*, result of parsing. + +Each node of the *parse tree* references the node of the *grammar tree* +that matched. It also stores the string vector that matched. + +.. figure:: parse-tree2.svg + + The same *ecoli parse tree*, showing the references to the grammar + tree. + +We can see that not all of the grammar nodes are referenced, since the +str("foo") node was not parsed. Similarly, one grammar node can be +referenced several times in the parse tree, as we can see it in the +following example: + +.. figure:: simple-tree3.svg + + A simple *ecoli grammar tree*, that matches ``[]``, ``[foo]``, + ``[foo, foo]``, ... + +Here is the resulting *parse tree* when the input vector is ``[foo, foo, +foo]``: + +.. figure:: parse-tree3.svg + + The resulting *parse tree*. + +Node identifier +--------------- + +XXX + +Todo +---- + +- completions +- C example +- ec_config +- parse yaml +- params are consumed +- nodes +- attributes