http://www.jpicedt.org

Package jpicedt.format.input.util

This package contains helper classes for building a parser based on the well-known RegExp scheme, yet with a strong object-oriented approach in mind.


Class Summary
AbstractRegularExpression This is the abstract superclass for all regular expressions that may help building a RegExp-based parser.
AlternateExpression A regular expression that mimics the "x|y" RegExp syntax.
CommentExpression Parse comment strings.
Context A class that stores context information about the parsing process, such as the current line number, the current parsed substring, block markers and a stack for markers. By convention, end markers (EOF, EndOfBlocks, etc.) always refer to the position one character past the last character (e.g. of the block), so that String.substring() works properly without adding 1 to the end index.
EnclosingExpression An expression that encloses a sub-expression between markers, e.g. "{" + sub-expression + "}".
The interpret() methods work as follows:
1. look up an endMarker matching beginMarker in Context.getRemainingSubstring (that is, skip enclosed blocks with the same marker type);
2. set this endMarker as the new Context's endMarker;
3. save the enclosed expression as "value", and interpret it;
4. restore the old Context's endMarker.
ExpressionConstants Constants used by subclasses of AbstractRegularExpression.
InstanciationExpression An expression that instantiates a new Element by cloning the given graphic element when it finds a given literal tag, then adds it to the current PicGroup in the pool.
LiteralExpression An expression specified by a String to be exactly matched at the current cursor position.
NotParsableExpression Any string (but without line feeds).
NumericalExpression An expression containing only digits, possibly preceded by whitespace; a post-delimiter can be specified, as well as the number's type (int or double) and its sign.
OpenLaTeXJPICXmlExtractor This class extracts the JPIC-XML code embedded in a file in the "open LaTeX JPIC-XML" format, that is, JPIC-XML in which the XML code proper is commented out LaTeX-style, except for the LaTeX code of the <text> elements, which appears uncommented.
OptionalExpression An expression that represents a pattern repeating at most once.
ParserEvent An event that gets sent as an argument of the "action" method during an interpret operation
PicPointExpression An expression for 2D-point parsing, e.g. "(12.3, 34.5)" or "[12.1;-16]". If a coordinate conversion is necessary, it must be computed in the body of the action() method.
Pool Offers a means for expressions belonging to the parser-tree to share variables across the tree.
Pool.Key<T> Enforces strong typing for keys pushed into the map.
RegExExpression An expression specified by a java.util.regex.Pattern regular expression.
RepeatExpression An expression that represents a pattern repeating a given number of times
RootExpression This is the superclass for head expressions that contain the grammar rules for a given format.
SequenceExpression An expression that represents a sequence of expressions to be matched exactly in the same order they're being added to the sequence.
StatementExpression An expression for "statement"-parsing, i.e. a name followed by an assignment sign followed by a numerical value e.g.
TeXExtractor Detects whether the drawing is encoded in one of the TeX variants (that is, the basic LaTeX picture environment, Epic/Eepic, or PSTricks).
TeXJPICXmlExtractor This class extracts the JPIC-XML code embedded in a file in the TeX save format.
WhiteSpaces Multiple white spaces (without line feeds)
WhiteSpacesOrEOL Multiple white spaces and/or '\n'
WildCharExpression A RegExp that represents a single occurrence of a wild-char, i.e.
WordExpression A RegExp that parses a word, that is, a string: either composed of letters only, or letters and digits only (see java.lang.Character.isLetter() for details), or terminated by the specified end-delimiter (in which case it may contain chars not restricted to letters)
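
The EnclosingExpression and Context entries above describe two related ideas: finding the end marker that matches a begin marker while skipping nested blocks, and letting an end marker point one character past the enclosed content so that String.substring() needs no index adjustment. The sketch below illustrates both under stated assumptions. BlockMatcher and findMatchingEnd are hypothetical names invented for this example; they are not part of the jpicedt API, and the real implementation may differ.

```java
// Hypothetical illustration, not the jpicedt implementation.
final class BlockMatcher {

    /**
     * Scans text from index 'from' (the position just after an opening marker)
     * and returns the index of the end marker that closes that block, skipping
     * nested blocks that use the same marker pair. Returns -1 if none is found.
     */
    static int findMatchingEnd(String text, int from, String begin, String end) {
        int depth = 1; // one block is already open when the scan starts
        int i = from;
        while (i < text.length()) {
            if (text.startsWith(end, i)) {
                depth--;
                if (depth == 0) {
                    return i; // one past the last enclosed character
                }
                i += end.length();
            } else if (text.startsWith(begin, i)) {
                depth++;
                i += begin.length();
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "{a{b}c}d";
        int end = findMatchingEnd(text, 1, "{", "}");
        // The end index is usable directly as the substring end bound.
        System.out.println(text.substring(1, end)); // prints a{b}c
    }
}
```

Because the returned index is the position of the closing marker itself, it is also the exclusive end bound of the enclosed content, which is the convention the Context entry describes.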
 

Exception Summary
REParserException An Exception manager to be used by RE-parsers (i.e. those built on top of AbstractRegularExpression subclasses).
REParserException.BeginGroupMismatch a "begin group" has no matching "end group"
REParserException.BlockMismatch a closing delimiter has no matching opening delimiter (see EnclosingExpression)
REParserException.EndGroupMismatch an "end group" has no matching "begin group"
REParserException.EndOfPicture the end of the picture environment was encountered.
REParserException.EndOfPictureNotFound the end of the picture environment wasn't found in the current Reader.
REParserException.EOF the end of the file (or the underlying Reader) was reached abnormally, e.g. in the course of an AbstractRegularExpression.interpret() operation.
REParserException.IncompleteSequence signals an incomplete SequenceExpression
REParserException.NotFoundInFile a mandatory expression wasn't found
REParserException.NumberFormat aka NumberFormatException
REParserException.NumberSign signals an error concerning the sign of a number (see NumericalExpression)
REParserException.SyntaxError a syntax error has occurred; should be used as a last resort, when no specific exception message applies.
 

Package jpicedt.format.input.util Description

This package contains helper classes for building a parser based on the well-known RegExp scheme, yet with a strong object-oriented approach in mind. The key word is human readability!

The base superclass of this package is the AbstractRegularExpression class. Two subclasses, AlternateExpression and SequenceExpression, then help build a grammar tree by performing RegExp-like OR and AND operations respectively.
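
To make the OR/AND idea concrete, here is a minimal, self-contained sketch of how a sequence and an alternation can be composed into a small grammar tree. The names Expr, GrammarSketch, literal, sequence and alternate are assumptions made up for this illustration; they only mimic the roles of LiteralExpression, SequenceExpression and AlternateExpression and do not reproduce their actual constructors or methods.

```java
// Hypothetical illustration, not the jpicedt API.
interface Expr {
    /** Returns the cursor position after a successful match, or -1 on failure. */
    int match(String text, int pos);
}

final class GrammarSketch {

    /** Matches the given string exactly at the cursor position. */
    static Expr literal(String s) {
        return (text, pos) -> text.startsWith(s, pos) ? pos + s.length() : -1;
    }

    /** AND: every part must match, in the order given. */
    static Expr sequence(Expr... parts) {
        return (text, pos) -> {
            int cur = pos;
            for (Expr part : parts) {
                cur = part.match(text, cur);
                if (cur < 0) {
                    return -1;
                }
            }
            return cur;
        };
    }

    /** OR: the first choice that matches wins. */
    static Expr alternate(Expr... choices) {
        return (text, pos) -> {
            for (Expr choice : choices) {
                int next = choice.match(text, pos);
                if (next >= 0) {
                    return next;
                }
            }
            return -1;
        };
    }

    public static void main(String[] args) {
        // "\put" followed by either "(a)" or "[b]"
        Expr rule = sequence(literal("\\put"), alternate(literal("(a)"), literal("[b]")));
        System.out.println(rule.match("\\put[b]", 0)); // prints 7
        System.out.println(rule.match("\\put{c}", 0)); // prints -1
    }
}
```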

Regular expressions work hand in hand with two important classes: an instance of the Context class, which feeds successive pieces of text to the set of regular expressions that make up the grammar tree, and an instance of the Pool class, which allows regular expressions to share data across the whole grammar tree. A RootExpression can then help build a stand-alone parser by serving as a communication hub between the context, the pool and the various regular expressions that make up the tree.
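
The idea of a shared pool with strongly typed keys (the role the class summary gives to Pool and Pool.Key<T>) can be sketched as a small type-safe map. SharedPool and its methods are hypothetical names chosen for this example; the real Pool class may expose a different interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the jpicedt Pool class.
final class SharedPool {

    /** A key that carries the type of the value stored under it. */
    static final class Key<T> {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Keys compare by identity, so two keys with the same name stay distinct.
    private final Map<Key<?>, Object> values = new HashMap<>();

    <T> void put(Key<T> key, T value) {
        values.put(key, value);
    }

    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key) {
        // Safe because put() only accepts a value of the key's own type.
        return (T) values.get(key);
    }

    public static void main(String[] args) {
        Key<Double> lineWidth = new Key<>("line-width");
        SharedPool pool = new SharedPool();
        pool.put(lineWidth, 0.4);
        double w = pool.get(lineWidth); // no cast needed at the call site
        System.out.println(w); // prints 0.4
    }
}
```

With this design, one expression in the tree can store a value and another can read it back without casting, as long as both hold the same key object.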

We use the classical callback mechanism to process parsing events. However, instead of relying on a single, separate content handler (as most XML/HTML parsers do), each AbstractRegularExpression has its own content handler, which is part of the same class and is implemented in the action(ParserEvent e) method. This allows subclasses to specialize content-handling behaviour alone, simply by overriding that method (see InstanciationExpression for an example).
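
The per-expression callback can be pictured with the following sketch, in which the matching logic lives in interpret() and a subclass customizes only the content handler. LiteralSketch, CallbackDemo and the simplified action(String, int) signature are assumptions for illustration; the real method takes a ParserEvent, whose contents are not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the jpicedt classes.
class LiteralSketch {
    private final String literal;

    LiteralSketch(String literal) {
        this.literal = literal;
    }

    /** Matches the literal at pos; on success calls action() and returns the new position, else -1. */
    final int interpret(String text, int pos) {
        if (!text.startsWith(literal, pos)) {
            return -1;
        }
        action(literal, pos);
        return pos + literal.length();
    }

    /** Content handler; does nothing by default and is meant to be overridden. */
    protected void action(String matched, int pos) {
    }
}

final class CallbackDemo {

    /** Parses text with a literal whose handler records each match. */
    static List<String> run(String text) {
        List<String> log = new ArrayList<>();
        LiteralSketch line = new LiteralSketch("\\line") {
            @Override
            protected void action(String matched, int pos) {
                log.add(matched + "@" + pos);
            }
        };
        line.interpret(text, 0);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("\\line(1,0)")); // prints [\line@0]
    }
}
```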

As a rule of thumb, developers should avoid dealing directly with the Context instance in their own implementations as much as possible, and rely instead on the existing helper classes, since the latter already encapsulate much of the tricky communication scheme. To set the stage, the classes LiteralExpression, NumericalExpression, StatementExpression and WordExpression form a minimal set of helpers from which a rather complicated grammar rule can be built easily. In addition, the classes OptionalExpression and RepeatExpression make it easy to handle the RegExp-like *, + and ? operations.
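
As a rough illustration of how the *, + and ? operations reduce to a single bounded-repetition helper, consider the sketch below. Rule, RepeatSketch and the repeat(inner, min, max) signature are hypothetical; they are not the constructors of OptionalExpression or RepeatExpression, which may be parameterized differently.

```java
// Hypothetical illustration, not the jpicedt API.
interface Rule {
    /** Returns the cursor position after a successful match, or -1 on failure. */
    int match(String text, int pos);
}

final class RepeatSketch {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    static Rule literal(String s) {
        return (text, pos) -> text.startsWith(s, pos) ? pos + s.length() : -1;
    }

    /** Matches inner between min and max times, consuming as many repetitions as possible. */
    static Rule repeat(Rule inner, int min, int max) {
        return (text, pos) -> {
            int cur = pos;
            int count = 0;
            while (count < max) {
                int next = inner.match(text, cur);
                // Stop on failure; also stop on an empty match so the loop cannot spin forever.
                if (next < 0 || next == cur) {
                    break;
                }
                cur = next;
                count++;
            }
            return count >= min ? cur : -1;
        };
    }

    static Rule optional(Rule inner) {   // RegExp "?"
        return repeat(inner, 0, 1);
    }

    static Rule zeroOrMore(Rule inner) { // RegExp "*"
        return repeat(inner, 0, UNBOUNDED);
    }

    static Rule oneOrMore(Rule inner) {  // RegExp "+"
        return repeat(inner, 1, UNBOUNDED);
    }

    public static void main(String[] args) {
        System.out.println(zeroOrMore(literal("ab")).match("ababx", 0)); // prints 4
        System.out.println(oneOrMore(literal("ab")).match("x", 0));      // prints -1
        System.out.println(optional(literal("ab")).match("abab", 0));    // prints 2
    }
}
```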



Submit a bug: syd@jpicedt.org