Add parser classes, xml, sexp, ini, and toml #519

Open
masatake opened this issue Aug 14, 2015 · 14 comments

@masatake
Member

I will leave Tokyo tomorrow. So I will give you some attractions:)

Like regex and xcmd, having an xml parser class will be useful.
We can cover svg, html, xhtml, ant, docbook, ... XPath can be used to specify the interesting elements.

I found the following code in a public header file of libxml2.

/**
 * XML_GET_LINE:
 *
 * Macro to extract the line number of an element node.
 */
#define XML_GET_LINE(n)                     \
    (xmlGetLineNo(n))

For the lisp family, an S expression parser class will be useful.
I think the current lisp related parsers are not so useful. Generally, lisp programmers introduce an application's own define-something using define-macro/defmacro. Definitions made with such a define-something should be captured as tags.

The following are the def* symbols in the emacs I'm using.

def-edebug-spec     defadvice
defalias    default-boundp
default-file-modes  default-font-height
default-indent-new-line     default-line-height
default-toplevel-value  default-value
defconst    defcustom
defcustom-c-stylevar    defface
defgroup    defimage
define-abbrev   define-abbrev-table
define-abbrevs  define-alternatives
define-auto-insert  define-button-type
define-category     define-ccl-program
define-char-code-property   define-charset
define-charset-alias    define-charset-internal
define-coding-system    define-coding-system-alias
define-coding-system-internal   define-compilation-mode
define-derived-mode     define-error
define-fringe-bitmap    define-generic-mode
define-global-abbrev    define-global-minor-mode
define-globalized-minor-mode    define-hash-table-test
define-ibuffer-column   define-ibuffer-filter
define-ibuffer-op   define-ibuffer-sorter
define-key  define-key-after
define-mail-abbrev  define-mail-alias
define-mail-user-agent  define-minor-mode
define-mode-abbrev  define-obsolete-face-alias
define-obsolete-function-alias  define-obsolete-variable-alias
define-prefix-command   define-skeleton
define-translation-hash-table   define-translation-table
define-widget   define-widget-keywords
defined-colors  defining-kbd-macro
defmacro    defmath
defsubst    deftheme
defun   defvar
defvar-local    defvaralias
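
A minimal sketch of the idea above, assuming the set of definer names is supplied by the user rather than hard-coded. It is not taken from ctags; `collect_definitions`, `TAG_NAME_LEN`, and the helper functions are hypothetical names, and strings, comments, and nested quoting are deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TAG_NAME_LEN 64  /* hypothetical limit on a captured name */

/* A symbol ends at whitespace, a parenthesis, or the end of the input. */
static int is_symbol_char(int c)
{
    return c != '\0' && !isspace(c) && c != '(' && c != ')';
}

static size_t symbol_length(const char *s)
{
    size_t n = 0;
    while (is_symbol_char((unsigned char)s[n]))
        n++;
    return n;
}

/* True when the head symbol exactly matches one of the registered definers. */
static int is_definer(const char *head, size_t len,
                      const char *const *definers, size_t n_definers)
{
    for (size_t i = 0; i < n_definers; i++)
        if (strlen(definers[i]) == len && strncmp(head, definers[i], len) == 0)
            return 1;
    return 0;
}

/* Scan `src` for forms shaped like (DEFINER NAME ...) and copy up to `max`
 * names into `names`. Returns the number of names captured. */
size_t collect_definitions(const char *src,
                           const char *const *definers, size_t n_definers,
                           char names[][TAG_NAME_LEN], size_t max)
{
    size_t count = 0;
    assert(src != NULL && definers != NULL);

    for (const char *p = src; *p != '\0' && count < max; p++) {
        if (*p != '(')
            continue;
        const char *head = p + 1;
        size_t head_len = symbol_length(head);
        if (!is_definer(head, head_len, definers, n_definers))
            continue;
        const char *name = head + head_len;
        while (isspace((unsigned char)*name))
            name++;
        if (*name == '\'')          /* e.g. (defalias 'foo ...) */
            name++;
        size_t name_len = symbol_length(name);
        if (name_len == 0 || name_len >= TAG_NAME_LEN)
            continue;
        memcpy(names[count], name, name_len);
        names[count][name_len] = '\0';
        count++;
    }
    return count;
}
```

Matching the head against an explicit list, instead of any symbol starting with `def`, keeps ordinary functions such as `default-value` in the list above from being mistaken for definers.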

Realizing the optlib concept is one of my primary motivations for working on ctags.
However, I now recognize that the regex syntax I know is not so portable. It is just a "syntax error" on Mac OS X.
Regex on Mac OS X is very limited.
If we introduce a new parser class, pcre, users can write a parser with a more powerful syntax in a portable way. I never intend to make the current regex parser obsolete, just to introduce a newer one.
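
One way to see the portability problem is to ask the local POSIX regex library whether it accepts a given pattern. The following is only a sketch using the standard `regcomp`/`regerror` calls; `regex_is_accepted` is a hypothetical helper name, not something in ctags.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the local POSIX regex library accepts `pattern` as an
 * extended regular expression, 0 otherwise. When `errbuf` is non-NULL,
 * the library's own error message is copied into it on failure. */
int regex_is_accepted(const char *pattern, char *errbuf, size_t errbuf_size)
{
    regex_t re;
    assert(pattern != NULL);

    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        if (errbuf != NULL && errbuf_size > 0)
            regerror(rc, &re, errbuf, errbuf_size);
        return 0;
    }
    regfree(&re);
    return 1;
}
```

A pattern that relies on an extension of one platform's regex library can be checked this way on each target platform before it is put into an optlib file.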

Do you have more ideas about parser classes?
The following code in parse.h is the starting point.

typedef enum  {
  METHOD_NOT_CRAFTED    = 1 << 0,
  METHOD_REGEX          = 1 << 1,
  METHOD_XCMD           = 1 << 2,
  METHOD_XCMD_AVAILABLE = 1 << 3,
} parsingMethod;
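
As a sketch of how new classes could be added to this bit-flag enum: the first four values are copied from the code above, while `METHOD_XPATH`, `METHOD_SEXP`, `METHOD_PCRE`, and the `parser_uses` helper are hypothetical names chosen only for illustration.

```c
#include <assert.h>

typedef enum {
    METHOD_NOT_CRAFTED    = 1 << 0,
    METHOD_REGEX          = 1 << 1,
    METHOD_XCMD           = 1 << 2,
    METHOD_XCMD_AVAILABLE = 1 << 3,
    METHOD_XPATH          = 1 << 4, /* hypothetical: xml parser class */
    METHOD_SEXP           = 1 << 5, /* hypothetical: S expression parser class */
    METHOD_PCRE           = 1 << 6  /* hypothetical: pcre parser class */
} parsingMethod;

/* Returns nonzero when every bit in `wanted` is set in `methods`. */
int parser_uses(unsigned int methods, unsigned int wanted)
{
    assert(wanted != 0);
    return (methods & wanted) == wanted;
}
```

Because each class is a separate bit, one parser can combine several classes, for example `METHOD_REGEX | METHOD_XPATH`.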

Happy hacking.

@masatake masatake modified the milestone: Feature plan Sep 30, 2015
@masatake
Member Author

masatake commented Oct 5, 2015

https://github.com/arduino/ctags/blob/master/gir.c

This is a very impressive parser. We should import it and then generalize it.

@masatake
Member Author

masatake commented Oct 5, 2015

    $ ./ctags -o - --langdef=maven \
    --xpath-maven="a,artifactId{}///*[local-name()='project' and namespace-uri()='http://maven.apache.org/POM/4.0.0']/*[local-name()='artifactId' and namespace-uri()='http://maven.apache.org/POM/4.0.0']/text()" pom.xml

    build-tools-root    pom.xml /^  <artifactId>build-tools-root</artifactId>$/;"   a

@masatake
Member Author

masatake commented Oct 5, 2015

Hard-coded version now works!!!

% ./ctags -x  pom.xml 
build-tools-root artifactId    9 pom.xml <artifactId>build-tools-root</artifactId>

@masatake
Member Author

masatake commented Oct 5, 2015

Hey, @p-montanus, I need libxml2. What should we do, in a gentle way?
What I did is:

--- a/Makefile.in
+++ b/Makefile.in
@@ -68,14 +68,15 @@ COVERAGE_CFLAGS=--coverage
 COVERAGE_LDFLAGS=--coverage
 endif

-ALL_CFLAGS = $(CFLAGS) --std=gnu99 -Wall $(COVERAGE_CFLAGS)
+ALL_CFLAGS = $(CFLAGS) --std=gnu99 -Wall $(COVERAGE_CFLAGS) `pkg-config --cflags libxml-2.0`
+

 DEBUG_CPPFLAGS ?= -DDEBUG
 ALL_CPPFLAGS = $(CPPFLAGS)         \
    $(DEBUG_CPPFLAGS)           \
    -DDATADIR=\"$(pkgdatadir)\"     \
    -DPKGCONFDIR=\"$(pkgsysconfdir)\"   \
-   -DPKGLIBEXECDIR=\"$(pkglibexecdir)\"
+   -DPKGLIBEXECDIR=\"$(pkglibexecdir)\" 

 include $(srcdir)/source.mak

@@ -173,7 +174,7 @@ V_CC_1   =
 all: $(CTAGS_EXEC) $(READ_LIB) $(READ_CMD)

 $(CTAGS_EXEC): $(OBJECTS)
-   $(V_CC) $(CC) $(LDFLAGS) -o $@ $(OBJECTS) $(LIBS)
+   $(V_CC) $(CC) $(LDFLAGS) -o $@ $(OBJECTS) $(LIBS) `pkg-config --libs libxml-2.0`

 $(READ_CMD): readtags.c readtags.h
    $(V_CC) $(CC) -DREADTAGS_MAIN -I. -I$(srcdir) -I$(srcdir)/main $(DEFS) $(ALL_CPPFLAGS)  $(ALL_CFLAGS) $(LDFLAGS) -o $@ $(srcdir)/readtags.c

@p-montanus
Contributor

> Hey, @p-montanus, I need libxml2. What should we do, in a gentle way?

Luke, use ~~PKG_CONFIG_MODULES~~ PKG_CHECK_MODULES in configure.ac, use @*_CFLAGS@ and @*_LIBS@ in Makefile.in.

# configure.ac
PKG_CHECK_MODULES([LIBXML2], [libxml-2.0], [: if-found], [: if-not-found])
# Makefile.in
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIBS = @LIBXML2_LIBS@
ALL_CFLAGS += $(LIBXML2_CFLAGS)
LIBS += $(LIBXML2_LIBS)

@masatake
Member Author

masatake commented Oct 5, 2015

Great. After merging your #592 and #601, I will make a PR. Instead of targeting maven, I will rewrite the ant parser with this new technology.

@ffes, @k-takata, and @cweagans, is libxml2 available on the platforms you maintain?
I found that I can implement an XML-based parser easily with libxml2. I would like to use it in ctags.
I would like to hear your comments about using libxml2.

@k-takata
Member

k-takata commented Oct 5, 2015

(@masatake You misspelled my name. I have fixed it.)

I confirmed that MSYS2 has libxml2 packages (mingw-w64-i686-libxml2 and mingw-w64-x86_64-libxml2), so it would be easy to use libxml2 on MSYS2.
But I'm not sure we can use it on MSVC. (Maybe we can, but not so easy I think.)

@masatake
Member Author

masatake commented Oct 5, 2015

> (@masatake You misspelled my name. I have fixed it.)

I'm very sorry.

> I confirmed that MSYS2 has libxml2 packages (mingw-w64-i686-libxml2 and mingw-w64-x86_64-libxml2), so it would be easy to use libxml2 on MSYS2.
> But I'm not sure we can use it on MSVC. (Maybe we can, but not so easy I think.)

Thank you for the comment.

Instead of reworking ant.c, it will be better to create main/lxpath.c, so I can put all the libxml2-related ifdef/endif into that one file.
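
A rough sketch of what such a single file could look like, assuming a `HAVE_LIBXML` macro is defined by configure when libxml2 is found. The macro name, `xpath_find_lines`, and `xpathLineCallback` are assumptions made for illustration and not the actual contents of main/lxpath.c. The libxml2 calls themselves (`xmlReadMemory`, `xmlXPathNewContext`, `xmlXPathEvalExpression`, `xmlGetLineNo`) are the library's public API.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_LIBXML   /* assumed to be set by configure when libxml2 is found */
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>
#endif

/* Hypothetical callback: receives the line number of each matched node. */
typedef void (*xpathLineCallback)(long line, void *user_data);

/* Evaluate `expr` against the XML text in `buf` and report the line of every
 * matching node. Returns the number of matches, or -1 when libxml2 support is
 * not compiled in or the input cannot be parsed or evaluated. */
int xpath_find_lines(const char *buf, size_t len, const char *expr,
                     xpathLineCallback cb, void *user_data)
{
    assert(buf != NULL && expr != NULL);
#ifdef HAVE_LIBXML
    int matches = -1;
    xmlDocPtr doc = xmlReadMemory(buf, (int)len, "input.xml", NULL, 0);
    if (doc == NULL)
        return -1;

    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    xmlXPathObjectPtr obj =
        ctx ? xmlXPathEvalExpression((const xmlChar *)expr, ctx) : NULL;

    if (obj != NULL && obj->type == XPATH_NODESET) {
        matches = 0;
        if (obj->nodesetval != NULL) {
            for (int i = 0; i < obj->nodesetval->nodeNr; i++) {
                if (cb != NULL)
                    cb(xmlGetLineNo(obj->nodesetval->nodeTab[i]), user_data);
                matches++;
            }
        }
    }

    if (obj != NULL)
        xmlXPathFreeObject(obj);
    if (ctx != NULL)
        xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    return matches;
#else
    /* Fallback when libxml2 is unavailable: callers only see "no support". */
    (void)len; (void)cb; (void)user_data;
    return -1;
#endif
}
```

With this shape, the XML-based parsers call one function and never include a libxml2 header themselves, so a build without libxml2 (MSVC, for instance) still compiles.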

@b4n
Member

b4n commented Oct 5, 2015

> Luke, use PKG_CONFIG_MODULES in configure.ac, use @*_CFLAGS@ and @*_LIBS@ in Makefile.in.

Spelled PKG_CHECK_MODULES it is Obi-Wan ;)

@p-montanus
Contributor

> Spelled PKG_CHECK_MODULES it is Obi-Wan ;)

Spelling is fixed, peacefully.
May the Force be with you.

@masatake masatake changed the title from "Add parser classes, xml, sexp, and pcre" to "Add parser classes, xml, sexp, ini, and pcre" Oct 14, 2015
@masatake masatake mentioned this issue Nov 11, 2015
@arichiardi

Hi folks, what is the status of this one? Is some help needed? I came here while investigating how to generate good tags for Clojure.

@masatake
Member Author

The sexp meta parser has two aspects.

  1. It can be used as a kind of template for parsers like elisp, cl, scheme, and Clojure.
  2. It helps to capture user-defined defX in those parsers. In other words, the sexp meta parser helps a ctags user write a subparser for parsers like elisp, cl, scheme, and Clojure. About the subparser concept, see http://docs.ctags.io/en/latest/running-multi-parsers.html?highlight=subparser .

I think what I want is understandable to lisp hackers.
The idea is very attractive to me. However, I don't have time to work on it.
If you are interested in the lisp family, you can try to implement it.

If you are just interested in Clojure, you can implement it with a crazy mtable meta parser.
See http://docs.ctags.io/en/latest/optlib.html?highlight=mtable#byte-oriented-pattern-matching-with-multiple-regex-tables . It is not documented well. See also #1620 .

@arichiardi

Ok thanks! This information is very valuable, I will see what I can do!

@masatake
Member Author

masatake commented May 27, 2021

| class (meta parser) | C level | Optlib level | note |
|---|---|---|---|
| regex | yes | YES | |
| libxml(xpath) | yes | no | See #3897 |
| libyaml | yes | no | Not so useful. libypath is needed. |
| S expression | no | no | This should cover clojure, elisp, lisp, scheme. |
| json | no | no | We have json parser. |
| iniconf | no | no | We have iniconf parser. |
| toml | no | no | |
| packci | no | no | interpreter version of packcc |

@masatake masatake changed the title from "Add parser classes, xml, sexp, ini, and pcre" to "Add parser classes, xml, sexp, ini, and toml" Oct 27, 2021