The LUIS team has developed a new recognizer library that identifies numbers more accurately and lets developers supply the context those numbers refer to. LUIS now incorporates this library for number recognition, Microsoft.Recognizers.Text.Numbers, which implements a solution using regular expressions. Regular expressions (regex) are a well-established, proven method for identifying specific patterns and are used in all sorts of applications. This allows the machine-learning back-end service in LUIS to concentrate on interpreting natural language while the number recognizers do the heavy lifting for numerics.
New Recognizers vs. The Old
Entity recognizers in LUIS have two main parts:
1) Recognition of the entity
2) Resolution of the entity into a value for an application to use
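The two parts above can be sketched in a few lines of Python. This is only an illustration of the recognize-then-resolve split for digit-based input, not the code used in Microsoft.Recognizers.Text.Numbers; the pattern and the function names `recognize` and `resolve` are assumptions made for this example.

```python
import re
from fractions import Fraction

# Hypothetical pattern: a fraction, digits with thousands separators, or plain digits.
NUMBER_PATTERN = re.compile(r"\d+/\d+|\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def recognize(text):
    """Part 1: find the spans of text that look like numbers."""
    return [(m.start(), m.end(), m.group()) for m in NUMBER_PATTERN.finditer(text)]


def resolve(token):
    """Part 2: turn a recognized span into a canonical value string."""
    if "/" in token:
        numerator, denominator = token.split("/")
        value = float(Fraction(int(numerator), int(denominator)))
    else:
        value = float(token.replace(",", ""))
    # Whole numbers are written without a trailing ".0".
    return str(int(value)) if value.is_integer() else str(value)


results = [
    (span, resolve(span))
    for _, _, span in recognize("order 1,000 units at 1/2 price")
]
print(results)
```

Keeping the two steps separate means an application can still see which text was matched even when it only needs the resolved value.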
Comparing the new LUIS recognizer with the old:
Query | Old Recognizer | New Recognizer |
---|---|---|
one thousand | “one thousand” | “1000” |
1,000 | “1,000” | “1000” |
1/2 | “1/2” | “0.5” |
one half | “one half” | “0.5” |
one hundred fifty | “one hundred fifty” | “150” |
one hundred and fifty | “one hundred and fifty” | “150” |
one point five | “one point five” | “1.5” |
two dozen | “two dozen” | “24” |
NOTE: The LUIS JSON response returns the resolved value as a string.
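To make the spelled-out rows of the table concrete, here is a minimal Python sketch that resolves a few English number phrases to value strings. It is an assumption-laden toy, not the library's algorithm: the word lists, the `resolve_words` name, and the decision to skip fractions such as "one half" are all choices made for this example.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def _whole(words):
    """Sum a list of number words into an integer."""
    total, current = 0, 0
    for word in words:
        if word == "and":
            continue  # "one hundred and fifty" equals "one hundred fifty"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word == "dozen":
            current *= 12
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"unsupported word: {word}")
    return total + current


def resolve_words(phrase):
    """Resolve a spelled-out number to a string, mirroring the string-valued response."""
    words = phrase.lower().replace("-", " ").split()
    if "point" in words:
        split_at = words.index("point")
        # Words after "point" are read digit by digit; only zero to nine are accepted.
        digits = []
        for word in words[split_at + 1:]:
            if UNITS.get(word, 10) > 9:
                raise ValueError(f"unsupported decimal digit: {word}")
            digits.append(str(UNITS[word]))
        return f"{_whole(words[:split_at])}.{''.join(digits)}"
    return str(_whole(words))


for phrase in ["one thousand", "one hundred and fifty", "one point five", "two dozen"]:
    print(phrase, "->", resolve_words(phrase))
```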
As these examples show, numeric values are used to quantify, express, and describe information in many ways, and there are more possibilities than the ones listed here.
The old LUIS number recognizers were machine-learned. They worked well but did not include resolution, and they would sometimes miss some forms of numbers. With the new number recognizers, which provide resolution, LUIS is able to interpret more of the variations a user could provide in a query and return consistent numeric values.
Using a direct example from our table:
We passed in a query of “two dozen”, and in the LUIS response we have “24”!
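An application would typically read that value out of the JSON response. The sketch below assumes a response with an `entities` list whose items carry a `type` of `builtin.number` and a `resolution.value` string; the sample payload, its field names, and the `number_values` helper are illustrative assumptions and are not taken from this post, so check them against the response your own endpoint returns.

```python
import json

# Illustrative payload only; the field names are assumed, not quoted from a real response.
sample_response = """
{
  "query": "two dozen",
  "entities": [
    {
      "entity": "two dozen",
      "type": "builtin.number",
      "resolution": {"value": "24"}
    }
  ]
}
"""


def number_values(response_text):
    """Collect resolved numbers from a response, converting each string to a float."""
    payload = json.loads(response_text)
    values = []
    for entity in payload.get("entities", []):
        if entity.get("type") != "builtin.number":
            continue
        raw = entity.get("resolution", {}).get("value")
        if raw is not None:
            values.append(float(raw))  # the value arrives as a string
    return values


print(number_values(sample_response))
```

Because the resolved value is a string, the conversion to a numeric type is left to the application.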
Adding Composite Entities for Context
What about including units? Training artificial intelligence services such as LUIS to recognize not only numerics but also the context the user is referring to is one of the big challenges in developing natural language understanding. Thankfully, LUIS already offers many common pre-built entities that an application can use.
For our example, we simply add the ordinal and percentage pre-built entities to our model to provide some context for our number recognition, and receive the following LUIS response:
Open Source
Lastly, we are both excited and proud to announce that we’re releasing this new library as open source. We look forward to working with the community to develop something truly special. The new recognizers currently support English, Spanish, and Chinese. In the next article, we’ll discuss the mechanics of how the number recognizers work so that developers can produce their own recognizers.
Happy Making!
Ezequiel Jadib and Matthew Shim from the Bot Framework Team