Identify single/multiple food elements inside a string (user input)

This is my first post, after trying without luck to find a solution to my question.
I'd appreciate it if you could help me :)
I'm trying to develop a solution where the user types what they have eaten for breakfast into a textbox, say "an orange with toast bread and milk", and my app recognizes the foods and looks up how many calories each one has in the following table:
Food - cooked - Calories
Orange cake - oven - 200
Cow Milk - raw - 50
Sheep Milk - raw - 40
Orange - juice - 15
cereal bread - toast - 10
bread - toast - 5
bacon - toast - 10
The solution I've made is a fulltext search on the whole string, without any explode/implode functions. The results I get are (from memory, so the numbers are not accurate):
Fulltext rank - Food - cooked - Cal
10,523634 - bacon - toast - 10
5,2342342 - sheep milk - raw - 40
5,2342342 - cow milk - raw - 50
4,2342345 - cereal bread - toast - 10
3,2342344 - orange cake - oven - 200
2,2342342 - orange - juice - 15
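// $search is interpolated directly here; escape it or bind it as a parameter before running the query.
$query = "
SELECT Food, cooked,
       MATCH (Food, cooked) AGAINST ('$search') AS score
FROM food_table
WHERE MATCH (Food, cooked) AGAINST ('$search')
ORDER BY score DESC
LIMIT 50";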
I discovered that some scores were the same (sheep milk and cow milk), so I added a new row in MySQL called "milk - average" to be the first match in the fulltext search, and then I delete the rest of the "same rank" results. (I don't have more info from the user, so I just average the calories of the different kinds of milk.)
But this is still not very accurate. For example, with orange and others, fulltext gives me a wrong first option, "orange cake - oven", when I wanted just "orange - juice", which matches better (at least it matches one column perfectly). The results also give me multiple options for the same input, and discriminating by score is not enough to let the app "understand" that if a food is entered once, it shouldn't produce two results.
In case I explained myself badly, the final results I want are:
input:
an orange with toast bread and milk
Solution:
orange - juice - 15
bread - toast - 5
milk - average - 45 (this one, as said, is adding a new mysql row with the data)
Total: 65 calories
I don't want the code (though if you have time, it is more than welcome), just the functions I need to use for this purpose, or any better way to do all of this, and I'll google it to understand.
The second part is to identify foods even if they contain a typo, for example "oarnge". I think this is done with the Levenshtein distance, but I'm not sure if I can apply the same solution to the whole problem.
Thanks in advance!!

I think you have some options to solve your problem:
Writing a natural language parser
(NLP on Wikipedia)
You can use some parsing tools (just google nlp php) to map a phrase into a tree, do some part-of-speech tagging, and then extract the words you need (maybe with their adjectives, so you can find out if and how the food is cooked).
This way can be quite complex.
Limit user input
Only you know how your app is designed, but consider the possibility of changing the way the user interacts with it. You can force the user to click on an "Add" button and select from a list of foods.
Somewhere in the middle
If you think that typing is more natural and faster, maybe you can find a compromise between the two above, like asking the user to put commas between the foods and/or implementing some sort of autocompletion.
In this case just some regular expressions can do the job.
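As a rough sketch of the comma-separated compromise: the helper below is hypothetical (the name splitFoods and the choice to also treat the word "and" as a separator are assumptions, not part of the answer) and only shows how a single regular expression could split the input into candidate foods.

```php
<?php
/**
 * Split user input into candidate food phrases.
 * Separators: a comma, or the standalone word "and" (an assumption for illustration).
 */
function splitFoods(string $input): array
{
    // \b keeps "and" from matching inside words such as "sandwich".
    $parts = preg_split('/\s*(?:,|\band\b)\s*/i', $input, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}
```

Each returned phrase could then be looked up in the food table on its own, instead of running one search over the whole sentence.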
For sure there are other paths to follow, like doing statistical nlp or using a dictionary to keep only useful words...
As for typing errors: yes, Levenshtein distance is a widely used technique and you can use it (provided you split the phrase in some manner, so that you have a string comparable to the Food column of your database).
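A minimal sketch of that idea, using PHP's built-in levenshtein(). Everything else here is an assumption made for illustration: the in-memory $foods array stands in for the database table, the function names are hypothetical, and the thresholds (ignore words shorter than 4 letters, accept at most 2 edits) are arbitrary and would need tuning.

```php
<?php
// Hypothetical in-memory copy of the food table: name => calories.
$foods = [
    'orange' => 15,
    'bread'  => 5,
    'milk'   => 45,
    'bacon'  => 10,
];

/**
 * Return the food name closest to $word, or null if none is
 * within $maxDistance edits.
 */
function closestFood(string $word, array $foods, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach (array_keys($foods) as $name) {
        $distance = levenshtein($word, $name);
        if ($distance < $bestDistance) {
            $best = $name;
            $bestDistance = $distance;
        }
    }
    return $best;
}

/**
 * Split the input into words and map each one to its closest food.
 * Results are keyed by food name, so a food entered once is counted once.
 */
function findFoods(string $input, array $foods): array
{
    $words = preg_split('/[^a-z]+/', strtolower($input), -1, PREG_SPLIT_NO_EMPTY);
    $found = [];
    foreach ($words as $word) {
        if (strlen($word) < 4) {
            continue; // skip short filler words such as "an" or "and"
        }
        $match = closestFood($word, $foods);
        if ($match !== null) {
            $found[$match] = $foods[$match];
        }
    }
    return $found;
}
```

Calling findFoods('an oarnge with toast bread and milk', $foods) and summing the result with array_sum() gives the total for the matched foods. A fixed edit threshold will also accept unrelated words that happen to be close to a food name, so it is only a starting point.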

Related

Parsing, formatting and generating data based on input

For some known inputs I have some known outputs/results. Based on these pre-filled input-result pairs, I want the program to generate a result for a new input.
Example input:
Enjoy your tea in the morning then have some bread in the lunch. Enjoy the taste of a garlic chicken in the dinner.
Your day starts with cold coffee. In the noon have some rice and fish curry.
Example output:
Have tea in the morning. Have some bread in the lunch. Have garlic chicken in the dinner.
Have cold coffee. Have some rice and fish curry.
I don't want to use string replace or regexp, as it will break often. How or where do I start?

If you have a large number of input and output pairs, you can treat this as a sequence to sequence task. The input can be considered your source and output can be considered as a target. You can easily develop a baseline model using OpenNMT.

I'm not really clear on how to approach your specific problem, but let me go through a few ways to solve text-related issues, since that seems to be what you are interested in.
Level 0 Static text hashing
IF, and that's a big if, your input is static, you could have digests mapping inputs to outputs. But, as you mentioned, this is easily breakable. Even one extra space would result in a mismatch, and that's why it's level 0.
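A small sketch of such a digest table, assuming SHA-256 as the digest (the answer does not name one); the $known pair and the lookup function are made up for illustration.

```php
<?php
// Known input => output pairs (hypothetical sample data).
$known = [
    'Your day starts with cold coffee.' => 'Have cold coffee.',
];

// Build the table keyed by a digest of the exact input text.
$table = [];
foreach ($known as $in => $out) {
    $table[hash('sha256', $in)] = $out;
}

/** Return the stored output for $input, or null if the digest is unknown. */
function lookup(string $input, array $table): ?string
{
    return $table[hash('sha256', $input)] ?? null;
}
```

lookup() only succeeds on a byte-for-byte identical input; the same sentence with one extra space hashes to a different key and returns null.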
Level 1 Pre-process your input:
Remove all extra spaces before, after and in-between words.
Remove stopwords from your input:
List of common stop-words https://www.textfixer.com/tutorials/common-english-words.txt
This step would transform your input to:
Enjoy tea morning bread lunch. Enjoy taste garlic chicken dinner.
day starts cold coffee. noon rice fish curry.
Next you could remove verbal conjugation, which doesn't apply to your example, but let's assume you had a sentence like:
drink tea, drank juice and drinks soda.
This sentence would become:
drink tea, drink juice drink soda
You could go even deeper and have synonyms normalization, example:
drink tea, sip water, slurped a juice, swallow beer
Then, all of them would become:
drink tea, drink water, drink juice, drink beer
After these steps are done, you have kind of a non statistical way of processing text. It all comes down to removing any redundancy and language flourish and getting down to the literal stuff.
And, of course, this approach loses a ton of the value contained in the English language. You can't detect sarcasm, and you can't handle analogies. So this works for some domains, but it's not that advanced.
This approach is more about text processing than language processing. See the difference?
If you need a smarter way to go about this, you should look into full-text search algorithms.
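The whitespace and stop-word steps of level 1 could be sketched as below. The STOP_WORDS list is a tiny hand-picked subset chosen for the example sentences, not the linked list, and the function name preprocess is hypothetical. Stemming and synonym normalization are left out.

```php
<?php
// A tiny, hypothetical stop-word list; a real one would be much longer.
const STOP_WORDS = ['a', 'of', 'the', 'in', 'your', 'then', 'have', 'some', 'with', 'and'];

function preprocess(string $text): string
{
    // Collapse runs of whitespace and trim both ends.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    $kept = [];
    foreach (explode(' ', $text) as $token) {
        // Compare without punctuation and case, but keep the original token.
        $bare = strtolower(trim($token, '.,;:!?'));
        if ($bare !== '' && !in_array($bare, STOP_WORDS, true)) {
            $kept[] = $token;
        }
    }
    return implode(' ', $kept);
}
```

Two inputs that differ only in spacing or filler words then reduce to the same string, which makes the later matching steps less fragile.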
Level 2 Full text search algorithms
There are several ways to do this, here is one.
You've got a sentence like:
I want pizza
This search term would become
want piz za
And would search for
want piz
piz za
want za
This is super basic stuff, and it's just to show you how raw text processing works and ways you could go about this. Maybe you could have your inputs processed by level 1 to make them simpler and less variable, then have them processed by level 2 to be indexed in a database, and then you have a nice way to query them.
Level 3 NLP - Natural Language Processing
This is still not machine learning, but it is smarter and builds on all the other steps. Basically, you would clean your inputs of noise and try to apply English grammatical structure to them.
To know more: https://dev.to/nicfoxds/getting-started-in-nlp-b0e
Level 4 Deep learning stuff
Basically, Google.
You get a bunch of text, a bunch of search queries, a bunch of user tracking data mapping queries to text. You feed all of that into a neural network and statistical models will detect patterns for you and make your search better as it goes.
Summary
If this is a project you are serious about, look into NLU. It will give you a decent outcome as you track usage. Then, when you have enough user data, go for the deep learning stuff.
There's no easy way around this: you either do it by hand or use a database that has some of these features, like Elasticsearch. But, as one of the comments mentioned, PHP is not a language for this.

If your input is truly known, then you can use str_replace(), e.g.:
$input = 'Enjoy your tea in the morning then have some bread in the lunch. Enjoy the taste of a garlic chicken in the dinner.
Your day starts with cold coffee. In the noon have some rice and fish curry.';
$old = array('Enjoy your ', ' then have ', '. Enjoy the taste of a ', 'Your day starts with ', '. In the noon have ');
$new = array('Have ' , '. Have ' , '. Have ' , 'Have ' , '. Have ' );
$output = str_replace($old, $new, $input);
Beware of case sensitivity and things like spaces, periods and other punctuation.
If your input is less known, then you could use regex as you surmised.

Translating strings with multiple sections needing pluralization

We're using the Symfony Translation component in our PHP application. It is capable of handling pluralisation in a very clever way, but as far as I can tell it can only handle a single "quantity" per string.
For example, it can translate
I have 3 apples.
Or
I have 1 orange.
But I can't work out a way to handle more complex strings like:
I have 3 apples and 1 orange.
Now, the obvious solution is to translate them separately and then join them together, but in my real life situation the strings are more complicated than this and according to our German team the order of the components cannot always be guaranteed to be the same. Sticking with my fake apples and oranges example, we could have the English string:
I'll have 3 apples each morning and 1 orange each weekday afternoon for the next 2 weeks.
I'd like to have a translation string like:
I'll have {{1 apple|%count_apples% apples}} each morning and {{1 orange|%count_oranges% oranges}} each weekday afternoon for the next {{1 week|%count_weeks% weeks}}.
And we need to consider that in another language, the structure of the sentence might only work if we use:
For the next 2 weeks, I'll have 3 apples each morning and 1 orange each weekday afternoon.
For the next {{1 week|%count_weeks% weeks}}, I'll have {{1 apple|%count_apples% apples}} each morning and {{1 orange|%count_oranges% oranges}} each weekday afternoon.
To complicate things further, the word for "and" might change depending on whether one of the quantities is plural. Right now, we're only bothered about English and German, but we will need other languages in the mid-term future, and then there isn't even just a singular and a plural.
We're open to using something other than the Symfony Translation component for this section if required as it is quite self-contained.
Does anybody have any past experience in this, or ideas as to how to go about implementing this?

Implementing Bayes classifier (in PHP)

I have a theoretical question about a Naive Bayes Classifier. Assume I have trained the classifier with the following training data:
class  word   count
-------------------
pos    good   1
       sun    1
neu    tree   1
neg    bad    1
       sad    1
Assume I now classify "good sun great". There are now two options:
1) Classify against the training data, which remains static. Both "good" and "sun" come from the positive category, so this string is classified as positive. After classification, the training table remains unchanged. All strings are thus classified against the static set of training data.
2) Classify the string, but then update the training data, as in the table below. Thus, the next string will be classified against a more "advanced" set of training data than this one. By the end of (automatic) classification, the table that started out as a simple training set will have grown in size, having been expanded with many words (and updated word counts).
class  word   count
-------------------
pos    good   2
       sun    2
       great  1
neu    tree   1
neg    bad    1
       sad    1
In my implementation of NMB I used the first method, but I'm now second-guessing I should have done the latter. Please enlighten me :-)

The method you've implemented is indeed the popular and accepted way of building classifiers (and not just Bayesian ones).
Using "unlabeled" data, i.e. data you have no ground truth for, to update the classifier is a more advanced and complicated technique, sometimes called "semi-supervised learning".
Using this class of algorithms might or might not be a good fit for your specific task; it's usually a matter of trial and error.
If you do decide to incorporate unlabeled data into your model, you should probably try out one of the popular algorithms for doing that, e.g. EM (expectation-maximization).
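For reference, the static approach (option 1 in the question) might look like the sketch below. It assumes uniform class priors and add-one (Laplace) smoothing, neither of which is stated in the question, and the function name classify is illustrative. The training array is only read, never updated.

```php
<?php
// Training counts copied from the first table in the question: class => [word => count].
$training = [
    'pos' => ['good' => 1, 'sun' => 1],
    'neu' => ['tree' => 1],
    'neg' => ['bad' => 1, 'sad' => 1],
];

function classify(string $text, array $training): string
{
    // Vocabulary size across all classes, used for add-one smoothing.
    $vocabulary = [];
    foreach ($training as $counts) {
        $vocabulary += $counts; // array union: adds words not seen yet
    }
    $vocabSize = count($vocabulary);

    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $bestClass = '';
    $bestScore = -INF;
    foreach ($training as $class => $counts) {
        $total = array_sum($counts);
        // With uniform priors the prior term is equal for every class, so it is omitted.
        $score = 0.0;
        foreach ($words as $word) {
            $count = $counts[$word] ?? 0;
            $score += log(($count + 1) / ($total + $vocabSize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestClass = $class;
        }
    }
    return $bestClass;
}
```

With these counts, "good sun great" scores highest for pos: the unseen word "great" is smoothed the same way in every class, so the two known positive words decide the result.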

Algorithm that creates "teams" based on a numeric skill value

I am building an application that helps manage frisbee "hat tournaments". The idea is that people sign up for this "hat tournament". When they sign up, they provide us with a numeric value between 1 and 6 which represents their skill level.
Currently, we are taking this huge list of people who signed up, and manually trying to create teams out of this based on the skill levels of each player. I figured, I could automate this by creating an algorithm that splits up the teams as evenly as possible.
The only data feeding into this is the array of "players" and a desired "number of teams". Generally speaking we are looking at 120 players and 8 teams.
My current thought process is to keep a running "score" for each team. This running score is the total of all assigned players' skill levels. I loop through each skill level, and inside each skill level loop I go through rounds of picks. The order of the picks is recalculated each round based on the running score of each team.
This actually works fairly well, but it's not perfect. For example, I had a range of 5 pts in my sample data array. I could very easily swap players around manually and make the discrepancy no more than 1 pt between teams. The problem is getting that done programmatically.
Here is my code thus far: http://pastebin.com/LAi42Brq
Snippet of what data looks like:
[2] => Array
    (
        [user__id] => 181
        [user__first_name] => Stephen
        [user__skill_level] => 5
    )
[3] => Array
    (
        [user__id] => 182
        [user__first_name] => Phil
        [user__skill_level] => 6
    )
Can anyone think of a better, easier, more efficient way to do this? Many thanks in advance!!

I think you're making things too complicated. If you have T teams, sort your players according to their skill level. Choose the top T players to be captains of the teams. Then, starting with captain 1, each captain in turn chooses the player (s)he wants on the team. This will probably be the person at the top of the list of unchosen players.
This algorithm has worked in playgrounds (and, I dare say on the frisbee fields of California) for aeons and will produce results as 'fair' as any more complicated pseudo-statistical method.

A simple solution could be to first generate a team selection order; each team then "selects" one of the highest-skilled players available. For the next round the order is reversed: the last team to select a player gets first pick and the first team gets the last pick. For each round you reverse the picking order.
First round picking order could be:
A - B - C - D - E
second round would then be:
E - D - C - B - A
and then
A - B - C - D - E etc.
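The reversing pick order could be sketched like this. It works on plain skill numbers rather than the player arrays from the question, and the function name snakeDraft is an assumption for illustration.

```php
<?php
/**
 * Distribute skill levels over $teamCount teams, reversing the pick
 * order every round.
 */
function snakeDraft(array $skills, int $teamCount): array
{
    rsort($skills); // strongest players are picked first
    $teams = array_fill(0, $teamCount, []);
    foreach ($skills as $i => $skill) {
        $round = intdiv($i, $teamCount);
        $pick = $i % $teamCount;
        // Even rounds go first-to-last team, odd rounds go last-to-first.
        $team = ($round % 2 === 0) ? $pick : $teamCount - 1 - $pick;
        $teams[$team][] = $skill;
    }
    return $teams;
}
```

Every team ends up with the same number of players (give or take one), which a purely score-driven assignment does not guarantee.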

It looks like this problem really is NP-hard, being a variant of the Multiprocessor scheduling problem.
"h00ligan"'s suggestion is equivalent to the LPT algorithm.
Another heuristic strategy would be a variation of this algorithm:
First round: pick the best, second round: pair the teams with the worst (add from the end), etc.
With the example "6,5,5,3,3,1" and 2 teams this would give the teams "6,1,5" (=12) and "5,3,3" (=11). The strategy of "h00ligan" would give the teams "6,3,3" (=12) and "5,5,1" (=11).
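For comparison, a minimal sketch of the LPT (longest processing time first) heuristic itself: sort the players by skill, strongest first, and always give the next player to the team with the lowest running total. The function name greedyTeams is hypothetical, and ties go to the first such team.

```php
<?php
/**
 * LPT-style greedy assignment: each player, strongest first, joins
 * the team whose running skill total is currently lowest.
 */
function greedyTeams(array $skills, int $teamCount): array
{
    rsort($skills);
    $teams = array_fill(0, $teamCount, []);
    $totals = array_fill(0, $teamCount, 0);
    foreach ($skills as $skill) {
        // Index of the first team with the minimum total.
        $weakest = array_search(min($totals), $totals, true);
        $teams[$weakest][] = $skill;
        $totals[$weakest] += $skill;
    }
    return $teams;
}
```

On the example "6,5,5,3,3,1" with 2 teams this produces "6,3,3" and "5,5,1". Note that this version balances totals only and does not force equal team sizes, so a size cap per team would have to be added for a real tournament.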

This problem is unfortunately NP-hard. Have a look at bin packing, which is probably a good place to start and includes an algorithm you can hopefully tweak. This may or may not be useful, depending on how "fair" two teams with the same score need to be.

Simple, tricky and interesting questions and exercises for PHP Beginners [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I'm giving a small PHP course over the next weekend and I would like to present a few questions and exercises to my students, so they can practice with an objective, and a fun one. I have already presented the basics to them; now it's time for some action.

Finding ways to implement simple algorithms always provides great practice. If you think they're ready for higher level data structures (linked lists, graphs, etc.) then you could give them a Depth-First Search problem. If they're not at that level yet, try working with arrays and for/while loops. You can iterate lots of functions over entire arrays very easily. For example, average the values of an array, sum the values, or create a new array of N-1 elements (where the first array had N elements), each of which is the difference of element N and element N+1 in the original array.
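As one possible reference solution for the last exercise (the function name differences is made up), each output element is an element of the input minus the element that follows it:

```php
<?php
/**
 * Given N values, return the N-1 differences between each element
 * and the one after it.
 */
function differences(array $values): array
{
    $values = array_values($values); // make sure keys are 0..N-1
    $result = [];
    for ($i = 0; $i < count($values) - 1; $i++) {
        $result[] = $values[$i] - $values[$i + 1];
    }
    return $result;
}
```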
If you want to tie any of the examples to the real world, try grade calculation algorithms (given a list of grades, find the GPA) or shopping carts (you bought 1 of item X, 3 of item Y, 2 of item Z... total price?)
You can also make it a bit more complicated by having weighted grades (a B in a 3-hour class and an A in a 1-hour class = a GPA of 3.25)
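The weighted-grade arithmetic works out as (3.0 × 3 + 4.0 × 1) / (3 + 1) = 3.25. A sketch of the exercise, assuming the common 4.0 grade-point scale (the post does not name one) and a hypothetical weightedGpa function:

```php
<?php
// Grade points on a 4.0 scale (an assumption for illustration).
const GRADE_POINTS = ['A' => 4.0, 'B' => 3.0, 'C' => 2.0, 'D' => 1.0, 'F' => 0.0];

/**
 * @param array $courses list of [letter grade, credit hours] pairs
 */
function weightedGpa(array $courses): float
{
    $points = 0.0;
    $hours = 0;
    foreach ($courses as [$grade, $credits]) {
        $points += GRADE_POINTS[$grade] * $credits;
        $hours += $credits;
    }
    // Avoid dividing by zero when no courses are given.
    return $hours > 0 ? $points / $hours : 0.0;
}
```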
I would also recommend doing a little bit of work with either databases or files input/output. The ability to save the results of your work and recall them later will GREATLY extend their understanding of complex larger systems like websites.
If you think it's not too complicated (I don't know the level of the students), one assignment I had in a class a couple of years ago (which we did in Perl) could be adapted. It involved the following text document:
1 | Billy | Bob | Kentucky | Yale
2 | Sally | Sue| Virginia | Harvard
...
We were told to assume the pattern id | first_name | last_name | state | university, but with a variable amount of whitespace. There were also some malformed entries, such as:
...
7 | Joe | 3 | Ohio | MIT
...
Clearly 3 isn't a last name. We were told to use regular expressions to verify that the ID was an integer less than 10000, the first and last names consisted only of letters, the state had to start with a capital letter and be followed by some number of lower-case letters, and the university had to consist only of letters. If there were any errors we had to say what the error was and what line of the file it was on. (For example: "Error on line 7: 3 is an invalid last name. Should be only letters")
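A PHP version of the validation step might look like the sketch below. The function name validateLine and the exact patterns are assumptions: in particular, the state is required to have at least one lower-case letter after the capital, and only unaccented letters A-Z are accepted.

```php
<?php
/**
 * Validate one "id | first | last | state | university" line.
 * Returns a list of error messages (empty if the line is valid).
 */
function validateLine(string $line, int $lineNumber): array
{
    // trim() absorbs the variable whitespace around each field.
    $fields = array_map('trim', explode('|', $line));
    if (count($fields) !== 5) {
        return ["Error on line $lineNumber: expected 5 fields, found " . count($fields)];
    }
    [$id, $first, $last, $state, $university] = $fields;
    $rules = [
        'ID'         => [$id, '/^\d{1,4}$/', 'Should be an integer less than 10000'],
        'first name' => [$first, '/^[A-Za-z]+$/', 'Should be only letters'],
        'last name'  => [$last, '/^[A-Za-z]+$/', 'Should be only letters'],
        'state'      => [$state, '/^[A-Z][a-z]+$/', 'Should be a capital letter followed by lower-case letters'],
        'university' => [$university, '/^[A-Za-z]+$/', 'Should be only letters'],
    ];
    $errors = [];
    foreach ($rules as $name => [$value, $pattern, $hint]) {
        if (!preg_match($pattern, $value)) {
            $errors[] = "Error on line $lineNumber: $value is an invalid $name. $hint";
        }
    }
    return $errors;
}
```

Running it over each line of the file (for example with file() and a foreach loop) and printing the returned messages reproduces the kind of report described above.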
After this we entered a loop (our program was interactive and ran from shell) where they could enter 1 for id, 2 for first name, 3 for last name, etc. They entered 0 to quit. Whatever they put in, they could then type a string to search for and it would find a student who matched that criteria and display their information. Instead of an interactive loop, if you're teaching PHP for use on a web server, maybe allow them to submit a form and check the $_POST information.

For example, you can give them a simple for loop and ask them to implement it with a while loop, or vice versa, and do the same for other statements like switch/case and if.

Arrays are a stumbling point for most beginners I know. I'd personally run them through single-and multi-dimensional array looping and stepping. With MVC frameworks becoming so prevalent, the foreach loop and array functions become vital to programming success.
