In this article, we’ll go over the structure and core concepts of the Recognizers-Text library so that you can build your own recognizers!
NOTE: We’ll be using the English Number Recognizer to show you the internal workflow.
First, we’ll examine the IModel interface, which defines the contract for a model.
The contract requires only one method, Parse. To understand this method, we can look at the AbstractNumberModel class, where the contract is implemented.
This class is responsible for parsing a user query from LUIS and determining which type of number (number, ordinal, percentage) to map. Once the type of number entity is determined, we can create the number model in two steps:
- Extract the entity or entities from the input text.
- Parse each entity to some value that we can handle.
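The two steps above can be sketched in a few lines. This is a minimal Python illustration rather than the library's own code: the names SketchNumberModel, ExtractResult, ModelResult, extract_digits and parse_digits are all hypothetical, and the digit-only regex is just a stand-in for a real extractor.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExtractResult:
    start: int  # offset of the match in the query
    text: str   # matched substring, still unparsed


@dataclass
class ModelResult:
    text: str
    type_name: str
    value: float


class SketchNumberModel:
    """Hypothetical model that chains an extractor and a parser."""

    def __init__(
        self,
        type_name: str,
        extract: Callable[[str], List[ExtractResult]],
        parse: Callable[[ExtractResult], float],
    ) -> None:
        self.type_name = type_name
        self._extract = extract
        self._parse = parse

    def parse(self, query: str) -> List[ModelResult]:
        # Step 1: extract the entity or entities from the input text.
        extracted = self._extract(query)
        # Step 2: parse each entity to a value that we can handle.
        return [
            ModelResult(e.text, self.type_name, self._parse(e)) for e in extracted
        ]


def extract_digits(query: str) -> List[ExtractResult]:
    # Stand-in extractor: only recognizes numbers written with digits.
    return [
        ExtractResult(m.start(), m.group())
        for m in re.finditer(r"\d+(?:\.\d+)?", query)
    ]


def parse_digits(entity: ExtractResult) -> float:
    return float(entity.text)


model = SketchNumberModel("number", extract_digits, parse_digits)
results = model.parse("I need 2 tickets and 3.5 hours")
for r in results:
    print(r.text, r.type_name, r.value)
```

Keeping the extractor and parser as separate, injectable pieces is what lets the same model shape serve numbers, ordinals and percentages.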
So, the key to understanding how the recognizers work is the implementation of the Extractors and Parsers. The Extractor and Parser components depend on what kind of number entities we want to recognize from a query.
All extractors inherit from BaseNumberExtractor, which implements the IExtractor interface.
The abstract class BaseNumberExtractor already includes a common implementation for number extraction in its Extract method, which you can use as a base.
The extractor classes only need to implement a dictionary with the regexes that will match the desired entity structure. The sample code from the repo provides extractors for cardinals, doubles, fractions, integers, numbers, ordinals and percentages by default, which are all types of numbers we may want to extract from the user query. Each one of these classes defines its own dictionary of regexes and populates the dictionary with new regular expressions in the constructor.
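To make that division of labor concrete, here is a small Python sketch of the pattern: a base class owns the shared extract logic, and a subclass only fills a regex dictionary in its constructor. The class names, the labels ("digits", "words") and the overlap policy are assumptions made for illustration; the library's actual Extract method may merge matches differently.

```python
import re
from typing import Dict, List, Tuple


class SketchBaseExtractor:
    """Hypothetical base class: shared extract logic over a regex dictionary."""

    def __init__(self) -> None:
        # Subclasses populate this: compiled regex -> label for the match kind.
        self.regexes: Dict[re.Pattern, str] = {}

    def extract(self, text: str) -> List[Tuple[int, str, str]]:
        matches = []
        for pattern, label in self.regexes.items():
            for m in pattern.finditer(text):
                matches.append((m.start(), m.group(), label))
        # Illustrative policy: prefer the earliest, longest match and
        # drop any later match that overlaps it.
        matches.sort(key=lambda t: (t[0], -len(t[1])))
        results: List[Tuple[int, str, str]] = []
        covered_until = 0
        for start, value, label in matches:
            if start >= covered_until:
                results.append((start, value, label))
                covered_until = start + len(value)
        return results


class SketchIntegerExtractor(SketchBaseExtractor):
    """Only contributes regexes; extraction itself is inherited."""

    def __init__(self) -> None:
        super().__init__()
        self.regexes[re.compile(r"\b\d+\b")] = "digits"
        self.regexes[
            re.compile(
                r"\b(?:one|two|three|four|five|six|seven|eight|nine|ten)\b",
                re.IGNORECASE,
            )
        ] = "words"


extractor = SketchIntegerExtractor()
print(extractor.extract("I have two cats and 15 fish"))
# prints [(7, 'two', 'words'), (20, '15', 'digits')]
```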
Take the English DoubleExtractor class as an example. In its implementation, we build the dictionary with the regexes that match the English double number format.
Here we can also note that the extractor makes use of the regexes of the IntegerExtractor (IntegerExtractor.AllIntRegex). Extractors can therefore be combined with one another to evaluate more complex entities. For example, the NumberExtractor uses the regexes of the cardinal and fractional extractors, and the CardinalExtractor uses the regexes of the integer and double extractors.
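As a rough Python illustration of this kind of reuse, the sketch below embeds a shared integer pattern inside a larger "double" pattern, then builds a cardinal dictionary by merging the integer and double dictionaries. ALL_INT_REGEX here is a tiny hypothetical stand-in, not the contents of IntegerExtractor.AllIntRegex, and find_all is a helper invented for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-in for a shared integer pattern.
ALL_INT_REGEX = r"(?:one|two|three|four|five|six|seven|eight|nine|ten)"

INTEGER_REGEXES: Dict[re.Pattern, str] = {
    re.compile(r"\b\d+\b"): "digits",
    re.compile(rf"\b{ALL_INT_REGEX}\b", re.IGNORECASE): "words",
}

# The double patterns reuse the integer pattern, e.g. "three point five".
DOUBLE_REGEXES: Dict[re.Pattern, str] = {
    re.compile(r"\b\d+\.\d+\b"): "digits",
    re.compile(
        rf"\b{ALL_INT_REGEX}\s+point(?:\s+{ALL_INT_REGEX})+\b", re.IGNORECASE
    ): "words",
}

# A cardinal extractor can simply combine the two dictionaries.
CARDINAL_REGEXES: Dict[re.Pattern, str] = {**INTEGER_REGEXES, **DOUBLE_REGEXES}


def find_all(regexes: Dict[re.Pattern, str], text: str) -> List[Tuple[int, str]]:
    """Return every (offset, match) pair produced by any regex in the set."""
    return sorted((m.start(), m.group()) for p in regexes for m in p.finditer(text))


print(find_all(DOUBLE_REGEXES, "three point five or 2.5"))
# prints [(0, 'three point five'), (20, '2.5')]
```

Because the integer pattern lives in one place, improving it automatically improves every extractor that embeds it.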
Additionally, the NumberExtractor lets us select the kind of number we expect to extract. Currently, it distinguishes between pure numbers, currency-related numbers, and anything else.
After we use the extractor we are left with string values which we still need to parse.
Like extractors, parsers also implement an interface, IParser, which we can see in the BaseNumberParser class. This parser is meant to be language-agnostic: it can be used for several different languages by simply providing the proper language parser configuration, which supplies the logic specific to the language you’re writing recognizers for. In the following example, we define an
EnglishNumberParserConfiguration which implements the
In this class, we configure the CultureInfo and various tokens (decimal separator, fraction separator, half-dozen identifier) that are unique to the language’s logic.
In the English parser configuration, we provide three static dictionaries that map entities to ordinals, cardinals, and round numbers. If you are creating number recognizers in another language, this class is where the bulk of your custom logic for handling the specifics of that language will live.
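The split between a language-agnostic parser and a per-language configuration can be sketched as follows. This is an assumption-laden Python illustration: SketchParserConfiguration and SketchAgnosticParser are hypothetical names, the dictionaries hold only a handful of entries, and the real configuration carries many more tokens and rules.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SketchParserConfiguration:
    """Hypothetical language configuration: tokens plus lookup dictionaries."""

    culture: str
    decimal_separator: str
    cardinal_map: Dict[str, int] = field(default_factory=dict)
    ordinal_map: Dict[str, int] = field(default_factory=dict)
    round_number_map: Dict[str, int] = field(default_factory=dict)


class SketchAgnosticParser:
    """Knows nothing about any language; everything comes from the config."""

    def __init__(self, config: SketchParserConfiguration) -> None:
        self.config = config

    def parse(self, text: str) -> float:
        token = text.strip().lower()
        for table in (
            self.config.cardinal_map,
            self.config.ordinal_map,
            self.config.round_number_map,
        ):
            if token in table:
                return table[token]
        # Fall back to digits written with the language's decimal separator.
        # Raises ValueError if the token is not a number at all.
        return float(token.replace(self.config.decimal_separator, "."))


english = SketchParserConfiguration(
    culture="en-us",
    decimal_separator=".",
    cardinal_map={"one": 1, "two": 2, "three": 3},
    ordinal_map={"first": 1, "second": 2, "third": 3},
    round_number_map={"hundred": 100, "thousand": 1000},
)
spanish = SketchParserConfiguration(
    culture="es-es",
    decimal_separator=",",
    cardinal_map={"uno": 1, "dos": 2, "tres": 3},
    ordinal_map={"primero": 1},
    round_number_map={"cien": 100, "mil": 1000},
)

print(SketchAgnosticParser(english).parse("Third"))
print(SketchAgnosticParser(spanish).parse("2,5"))
```

The same parser class handles both languages; only the configuration object changes.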
What happens when a particular value doesn’t directly map to our static dictionaries? In case the dictionaries are not enough to map all extracted entities, the INumberParserConfiguration interface also provides two functions: one to normalize the input obtained from the extractor, and another, ResolveCompositeNumber, to resolve complex numbers that cannot be directly mapped.
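To give a feel for what resolving a composite number involves, here is a deliberately small Python sketch. The function name mirrors ResolveCompositeNumber, but the body is an assumption made for illustration: it only handles hyphenated English forms such as "twenty-one", and returning None for unresolved input is a choice made for this example, not the library's behavior.

```python
from typing import Dict, Optional

UNITS: Dict[str, int] = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS: Dict[str, int] = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def resolve_composite_number(number_str: str) -> Optional[int]:
    """Hypothetical resolver: direct lookup first, then tens-plus-units."""
    key = number_str.strip().lower()

    # Case 1: the value maps directly to one of the static dictionaries.
    direct = {**UNITS, **TENS}
    if key in direct:
        return direct[key]

    # Case 2: a composite such as "twenty-one" is split and summed.
    tens, separator, unit = key.partition("-")
    if separator and tens in TENS and unit in UNITS:
        return TENS[tens] + UNITS[unit]

    # Anything else is left unresolved in this sketch.
    return None


print(resolve_composite_number("twenty-one"))  # prints 21
print(resolve_composite_number("Ninety-nine"))  # prints 99
```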
Once the language-specific parser configuration is defined (example: SpanishNumberConfiguration.cs), we are ready to use the
To make it easier to get the Parser, the library has an
AgnosticNumberParserFactory which only needs the Type of the Parser and the proper instance of
Now that we have defined extractors and parsers, we can test to verify that our recognizers work. In this example, we show the initialization of the English Number Recognizer for ordinal numbers:
In a nutshell, to write a recognizer for another language, you only need to define your own extractors, which inherit from the BaseNumberExtractor class and supply regular expressions unique to the language. Then, define a LanguageNumberParserConfiguration class that implements the INumberParserConfiguration interface along with custom logic and properties specific to the language.
Hopefully we’ve provided you with enough of a roadmap to help you get started creating your own recognizers.
The Bot Framework Team