Find record with keywords from sentences - php

I have a problem with my search function. When a user types a sentence into the search field, I want to look up a result based on the keywords contained in that sentence. For example, I have a database table like this:
ID | Keywords | Answer
-----------------------------------------------------------------------------
1 | price, room | The price room is $150 / night
2 | credit card | Yes, you could pay with credit card
3 | location | The Hotel location is in the Los Angeles
4 | how to, way to, book | You could pay with credit card or wire transfer
5 | room, size | The room size is 50sqm
Here are some examples of sentences a user might enter:
What is the room price ?
From that sentence the system should find the keywords it contains; in this case the keywords are room and price.
From those keywords the system should show the answer: The price room is $150 / night
Can I pay with credit card ?
From that sentence the system should find the keywords it contains; in this case the keyword is credit card.
From that keyword the system should show the answer: Yes, you could pay with credit card
What is the room size ?
From that sentence the system should find the keywords it contains; in this case the keywords are room and size.
From those keywords the system should show the answer: The room size is 50sqm
Examples 1 and 3 both contain room. I also want the system to tell the keyword pair room and price apart from room and size.
How can I find the keywords in a sentence the user has entered?
How do I get the answer from the database using those keywords?
I would like to know how to do this with PHP and MySQL, or whether there is some other way to build it. If anybody knows how to do this, please help. Thanks in advance.

I would suggest not storing the keywords comma-separated in a single row; insert them as separate rows instead. Otherwise, any word you search for is compared against the whole value credit card or price, room. price and room are not treated as separate words; the column value is treated as one string.
For your question, try the following code (it assumes one keyword per row, as suggested above):
$que = 'What is the room price';
// Split into lowercase words and build one ? placeholder per word
$words = preg_split('/\W+/', strtolower($que), -1, PREG_SPLIT_NO_EMPTY);
$placeholders = implode(',', array_fill(0, count($words), '?'));
$sql = 'select answer from your_table where keywords IN (' . $placeholders . ')';
// Run $sql as a prepared statement, passing $words as the parameters
Or you can try FIND_IN_SET() to search the comma-separated keywords, one word at a time.
It may work.
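To tell "room price" apart from "room size", one option is to score each row by how many of its keywords occur in the sentence and keep the best score. The following is only a sketch under assumptions: the rows are copied into a PHP array by hand (in practice they would be fetched from MySQL), and the function name findAnswer is made up for illustration.

```php
<?php
// Hypothetical in-memory copy of the question's table.
$rows = [
    ['keywords' => 'price, room', 'answer' => 'The price room is $150 / night'],
    ['keywords' => 'credit card', 'answer' => 'Yes, you could pay with credit card'],
    ['keywords' => 'location', 'answer' => 'The Hotel location is in the Los Angeles'],
    ['keywords' => 'how to, way to, book', 'answer' => 'You could pay with credit card or wire transfer'],
    ['keywords' => 'room, size', 'answer' => 'The room size is 50sqm'],
];

/**
 * Return the answer of the row whose keywords match the sentence best,
 * or null when no keyword matches at all.
 */
function findAnswer(string $sentence, array $rows): ?string
{
    // Lowercase, turn punctuation into spaces, and pad with spaces
    // so that keywords can be matched as whole words or phrases.
    $normalized = ' ' . trim(preg_replace('/[^a-z0-9]+/', ' ', strtolower($sentence))) . ' ';

    $best = null;
    $bestScore = 0;
    foreach ($rows as $row) {
        $score = 0;
        foreach (explode(',', $row['keywords']) as $keyword) {
            $keyword = trim(strtolower($keyword));
            if ($keyword !== '' && strpos($normalized, ' ' . $keyword . ' ') !== false) {
                $score++;
            }
        }
        // Keep the first row with the strictly highest number of matched keywords.
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $row['answer'];
        }
    }
    return $best;
}
```

With this scoring, a sentence containing both room and price matches the first row with two keywords and the room, size row with only one, so the first row wins. Ties go to whichever row comes first, which may or may not be what you want.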

My approach would be to use the concept of stop words: remove all stop words from the user's query, then search only for the keywords that remain.
Data entry needs to be robust, which means stripping out most of what the user types. What if they intend to break your system by inserting code?
Stop words include 'the', 'a', 'of'.
The idea is to remove as much rubbish as you can and then to be very picky about the remaining words.
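A minimal sketch of that filtering step in PHP could look like the following. The function name extractKeywords and the stop-word list are assumptions for illustration; a real list would be much longer.

```php
<?php
/**
 * Strip stop words from a query and return the remaining lowercase keywords.
 */
function extractKeywords(string $query, array $stopWords): array
{
    // Keep runs of letters and digits only; quotes, angle brackets
    // and other symbols are discarded at this point.
    $words = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // array_diff() keeps the original keys, so re-index the result.
    return array_values(array_diff($words, $stopWords));
}

// Illustrative stop-word list only.
$stopWords = ['the', 'a', 'of', 'is', 'what', 'with', 'can', 'i'];
$keywords = extractKeywords('What is the room price ?', $stopWords);
```

Filtering like this reduces the noise in the query, but the remaining keywords should still be passed to the database through a prepared statement rather than concatenated into the SQL.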
Log the query data for inspection in case of failure. Log the access data that you think you are processing, and then set a timeout on the response time. E.g. if you know that the query should only ever take X ms, then anything that takes longer than that is suspect: it could have gotten past your protective layer. Do make sure you log the IP address and timestamp in the log files, preferably right at the start of the log entry.
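As a rough sketch of such a log entry, the function below puts the timestamp and IP address first and flags any query that ran longer than the expected limit. The function name formatLogEntry, the line layout, and the 50 ms limit are all assumptions for illustration.

```php
<?php
/**
 * Build one log line: timestamp (UTC) and IP address first, then a flag,
 * the elapsed time and the query text.
 */
function formatLogEntry(string $ip, string $query, float $elapsedMs, float $limitMs, ?int $now = null): string
{
    $timestamp = gmdate('Y-m-d H:i:s', $now ?? time());
    // Anything slower than the expected limit is marked for inspection.
    $flag = $elapsedMs > $limitMs ? 'SUSPECT' : 'OK';
    // Collapse control characters so a query cannot inject extra log lines.
    $safeQuery = preg_replace('/[\x00-\x1F\x7F]+/', ' ', $query);

    return sprintf('%s %s %s %.1fms "%s"', $timestamp, $ip, $flag, $elapsedMs, $safeQuery);
}

$start = microtime(true);
// ... run the search here ...
$elapsedMs = (microtime(true) - $start) * 1000;
$line = formatLogEntry($_SERVER['REMOTE_ADDR'] ?? '0.0.0.0', 'What is the room price ?', $elapsedMs, 50.0);
```

The resulting line could then be appended to a file with file_put_contents() and the FILE_APPEND flag, or handed to error_log().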
Then write scripts for handling a slice.
A slice is a nice way to help the system administrators, who may have to send you a slice of the log files.
The slice can be complicated: it may run from one day (YYYYMMDDmm.s) to another, and an overnight compression job may have been running, so your script needs to read both normal and compressed log files. Sometimes the files are split up by system failures, i.e. the system died for some reason.
Your slice info can be packaged up into an email etc. and sent to you for analysis.
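A slice script could be sketched roughly as below. It assumes that every log line starts with a fixed-width "YYYY-MM-DD HH:MM:SS" timestamp, which is an assumption of this example and not the format mentioned above; sliceLog is a made-up name.

```php
<?php
/**
 * Return the log lines whose leading timestamp falls within [$from, $to].
 * Assumes each line starts with "YYYY-MM-DD HH:MM:SS" (19 characters).
 */
function sliceLog(array $lines, string $from, string $to): array
{
    $slice = [];
    foreach ($lines as $line) {
        $stamp = substr($line, 0, 19);
        // Fixed-width timestamps in this format sort correctly as plain strings.
        if (strcmp($stamp, $from) >= 0 && strcmp($stamp, $to) <= 0) {
            $slice[] = $line;
        }
    }
    return $slice;
}
```

The lines can be collected with file() for plain log files and with gzfile() for gzip-compressed ones (gzfile() needs the zlib extension), then merged before slicing.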
Good luck.

Related

Fuzzy date match

I have a mysql db of clients and crawled a website retrieving all the reviews for the past few years. Now I am trying to match those reviews up with the clients so I can email them. The problem is that the review site allowed them to enter anything they wanted for the name, so in some cases I have full first name and last initial, and in some cases first initial and last full name. It also gives an approximate time it was posted such as "1 week ago", "6 months ago" and so on which we already have converted to an approximate date.
Now I need to try matching those up to the clients. Seems the best way would be to do a fuzzy search on the names, and then once I find all John B% I look for the one with a job completion date nearest the posting of the review naturally eliminating anything that was posted before jobs were completed.
I put together a small sample dataset where table1 is the clients, table2 is the review to match on here:
http://sqlfiddle.com/#!9/23928c/6/0
I was initially thinking of doing a date_diff, but then I need to sort by the lowest number. Before I tackle this on my own, I thought I would ask if anyone has any tricks they want to share.
I am using PHP / Laravel to query MySQL.
You can use DATEDIFF with absolute values:
ORDER BY ABS(DATEDIFF(`date`, $calculatedDate)) ASC
This puts the records closest to your estimated date first, whether they fall before or after it.
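The same nearest-date selection can also be done in PHP once the name filter has produced a few candidates. This is a sketch under assumptions: the array layout (name, completed) and the function name closestClient are invented for illustration, and it applies the question's rule that a review cannot be posted before the job was completed.

```php
<?php
/**
 * Pick the candidate whose job completion date is closest to the review
 * date without being later than it. Returns null if no candidate qualifies.
 * $clients is a hypothetical list of ['name' => ..., 'completed' => 'YYYY-MM-DD'].
 */
function closestClient(array $clients, string $reviewDate): ?array
{
    $review = new DateTimeImmutable($reviewDate);
    $best = null;
    $bestDays = null;
    foreach ($clients as $client) {
        $completed = new DateTimeImmutable($client['completed']);
        if ($completed > $review) {
            continue; // the job finished after the review was posted
        }
        // ->days is the total number of days between the two dates.
        $days = $completed->diff($review)->days;
        if ($bestDays === null || $days < $bestDays) {
            $bestDays = $days;
            $best = $client;
        }
    }
    return $best;
}
```

Because the review dates are only approximate ("1 week ago", "6 months ago"), it may be worth allowing a tolerance before excluding jobs that finished slightly after the estimated date.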

Managing quotas on CATI software - workaround

I'm trying to figure out a way to manage quotas in a CATI system (written in PHP + SQL and XML).
Let's say we have a population like this:
CITY | #MALE | #FEMALE | AGE CLUSTER (YOUNG) | AGE CLUSTER (OLD)
NY | 200 | 250 | 350 | 100
LA | 300 | 350 | 250 | 400
Then we have the database containing all the people to be interviewed:
(name, city, sex, age cluster, telephone)
This database will not necessarily be representative of the first table; we also have to consider wrong telephone numbers and any other situation that may force us to drop a record and move on.
So, how can we achieve good quota management by the end of the campaign? What's the best approach? It would also be great to maintain quotas over time: let's say my campaign lasts 1 year; I would like to run a checkpoint at the end of the first 2 months and find that the quotas are on track...
The queXS software (I am the author) implements quotas for telephone interviewing (it calls them row quotas). The code is available here.
Have a look at the admin/rowquota.php file and the functions/functions.operator.php file.
Basically what occurs is:
Setup:
You have a list of people to be interviewed (sample) as you describe
There should be 2 lists, split by area (LA, NY)
Each sample would have a quota of Males, Females, and Age cluster Young/Old
Running:
The system records the outcomes of contacts to each number
Where the outcome is "completed" the system finds all quotas that are fulfilled by that record and adds to the quota
Where the quota is reached - all records that match the query (e.g. Males in LA) will be excluded
Describing the code here would be a bit tedious as a lot of the code is specific to the database setup of the system, but if you require further explanation please let me know.
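Independently of the queXS code, the running steps above can be sketched in a few lines of PHP. This is not the queXS implementation; the array layout and the function names recordCompletion and isEligible are assumptions made for this example, with the targets taken from the NY row of the question's table.

```php
<?php
// Hypothetical quota targets for the NY sample.
$quotas = [
    'sex' => ['M' => 200, 'F' => 250],
    'age' => ['young' => 350, 'old' => 100],
];
// Completed interviews counted so far, per quota cell.
$completed = [
    'sex' => ['M' => 0, 'F' => 0],
    'age' => ['young' => 0, 'old' => 0],
];

/** Add a completed interview to every quota cell that the record falls into. */
function recordCompletion(array $completed, array $record): array
{
    foreach ($record as $dimension => $value) {
        if (isset($completed[$dimension][$value])) {
            $completed[$dimension][$value]++;
        }
    }
    return $completed;
}

/** A record stays in the call list only while none of its quota cells is full. */
function isEligible(array $record, array $quotas, array $completed): bool
{
    foreach ($record as $dimension => $value) {
        if (isset($quotas[$dimension][$value])
            && ($completed[$dimension][$value] ?? 0) >= $quotas[$dimension][$value]) {
            return false;
        }
    }
    return true;
}
```

Fields of the record that have no quota (name, telephone) are simply ignored. A checkpoint partway through the campaign then amounts to comparing $completed against $quotas.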

Regex to match from first uppercase to end of sentence of string to highlight array of words

I know the title is quite complicated to understand.
Basically I have a text of, let's say, around 20000 characters.
When I perform a search, I want to extract the sentences in which any of the search words are found and highlight those words.
I have an array of the words to highlight called $words, and let's call the main text $text.
My code is the following:
foreach($words as $word):
$regex = '/[^.!?\n]*\b'.preg_quote($word,"/").'\b[^.!?\n]*/i';
preg_match_all($regex, $text, $matches);
count($matches[0]) > 3 ? $search_q= 3 : $search_q=count($matches[0]);
for ($i=0; $i < $search_q; $i++):
echo preg_replace('/\b('.preg_quote($word,"/").')\b/i','<span class="highlighted">$1</span>',$matches[0][$i]).'[..] ';
endfor;
endforeach;
The problem with this code is that when 2 words belong to the same sentence, the sentence is printed twice. I want to print it just once with both words highlighted, but I don't have a clue how to do that.
Thanks for the help, guys.
UPDATE: TEST SCENARIO
Let's suppose that:
$text="A new holiday shopping tradition: Smartphones and social networks
Many consumers will take out their phones before their wallets this holiday season with even more visiting social media sites before tackling their gift lists.
More than one-quarter (27 percent) of smartphone owners plan to use their devices for holiday shopping to search for store locations (67 percent), compare prices (59 percent) and check product availability (46 percent). Additionally, 44 percent say they plan to use social media to seek discounts, read reviews and check family and friends’ gift lists.
“Consumers are using online and mobile platforms to make the most of their holiday budgets, and the survey indicates that they will do more than just compare prices,” said Paul. “Retailers that use mobile and online channels to show product availability, locations and pricing but add customized promotions and gift ideas may encourage shoppers to come in the door for a specific gift and take additional items to the register.”";
And the words are:
$words=array('social','media');
With my code I get this:
A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** media sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** media to seek discounts, read reviews and check family and friends’ gift lists[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting social **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use social **media** to seek discounts, read reviews and check family and friends’ gift lists[..]
Instead I want:
A new holiday shopping tradition: Smartphones and **social** networks[..]
Many consumers will take out their phones before their wallets this holiday season with even more visiting **social** **media** sites before tackling their gift lists[..]
Additionally, 44 percent say they plan to use **social** **media** to seek discounts, read reviews and check family and friends’ gift lists[..]
With fge's code I get:
social[..]
social[..]
social[..]
media[..]
media[..]
I hope the examples make this easy to understand. Thanks a lot.
Your head will probably hurt less if you split the text into an array of sentences and examine each sentence in turn. If the list of words isn't too long you could put the entire list into your regex. Something like:
/\b(\Qword1\E|\Qword2\E|\Qword3\E)\b/
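That suggestion could be sketched as follows: split the text into sentences once, then run a single alternation over each sentence so that it is printed at most once with every matching word highlighted. The function name highlightSentences is made up for this example, and preg_quote() is used for escaping in place of \Q...\E.

```php
<?php
/**
 * Return up to $limit sentences that contain any of $words, each sentence
 * only once, with every matched word wrapped in a highlight span.
 */
function highlightSentences(string $text, array $words, int $limit = 3): array
{
    if (count($words) === 0) {
        return [];
    }
    // Build one alternation such as \b(social|media)\b from the escaped words.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    // Split on the same sentence delimiters the question's regex uses.
    $sentences = preg_split('/[.!?\n]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    foreach ($sentences as $sentence) {
        if (count($result) >= $limit) {
            break;
        }
        $highlighted = preg_replace(
            $pattern,
            '<span class="highlighted">$1</span>',
            trim($sentence),
            -1,
            $count
        );
        if ($count > 0) {
            $result[] = $highlighted . '[..] ';
        }
    }
    return $result;
}
```

Note that the limit here applies to the total number of sentences returned, whereas the question's loop allows up to 3 sentences per word.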
First, I don't get why you use such a complicated regex: you already use word anchors, so why bother with the complemented character classes?
Second, this solution assumes that the words do not contain special regex characters...
Here is what you can do:
// preg_quote() already escapes regex metacharacters, so no \Q...\E wrapper is needed
$w = preg_quote($word, "/");
$fullword = '\b' . $w . '\b';
$regex = '/' . $fullword . '(?!.*' . $fullword . ')/i';
Explanation: preg_quote() makes every character of the word match literally (which means you are safe if the word contains a dot). So, you match your word (it is anchored), and then you say that the word must NOT match again later on: (?!.*\bwordhere\b).
This means that if a sentence contains the word several times, only the last occurrence will match!
Finally, to highlight, use:
// PHP has no "g" modifier; preg_replace() replaces every match by default
preg_replace('/(' . $fullword . ')/i', '<span class="highlighted">$1</span>', $text);

How to query a MSSQL database using a concatenated field

Is there anybody that can give advice on solving this issue that I am having. I am running PHP 5.3 with MSSQL. Just to explain what happens and what I need to be able to do...
The user selects a specific run (row) from a table on the home page. The columns in the table are:
date
division
start mile/yard (in the format 565.1211 i.e. mile 565 and yard 1211) [this column is made up of a concatenation of two separate columns "mile" and "yard" from my database]
end mile/yard
start lat
start long
end lat
end long
total
report available (yes or no)
The user can select a row by clicking on a cell in the report column where report=yes. This data is posted onto the next page. The next page allows the user to change the start and end mile/yard data so that they can see a specific section of that run.
For example the user has selected on the home page to view data from start mile/yard 565.1211 to end mile/yard 593.4321. The user can change the section that they want to see by typing into two text boxes. One box is a "start mile/yard" and the second box is "end mile/yard". So the user may type into the "start mile/yard" text box 570.2345 and "end mile/yard" 580.6543. What I want to happen is to query data from where the user has input...
SELECT id, CAST(mile AS varchar(6)) + '.' + CAST(yard AS varchar(6)) AS Mile, gps_lat, gps_long, rotten, split, broken, quality
FROM table
WHERE mile/yard BETWEEN start mile ??? AND end mile ???
ORDER BY Mile
My problem is: how would I go about querying this information from my database when the user types in a combination of mile/yard (580.6543)? I assume that I will have to split the data into mile and yard again (how would I do this?). Also, how do I retrieve the information? It would be simple to search by mile alone (e.g. WHERE mile BETWEEN 570 AND 580), but how do I search by yard and mile at the same time?
Unfortunately I will not be able to change the database structure as this is what I have to work with... If anyone can think of a better way of doing what I am doing... I am all ears!!
I understand this is a long question, so anything that is unclear, please let me know!
Cheers,
Neil
You shouldn't cast those mile/yard numbers to varchars. You lose the ability to compare them AS NUMBERS, which is what they were to start with.
Convert the user's mile/yard values to a number, then compare against those numbers in the database.
You're trying to force apples and grapes to be oranges, and comparing them to pears and plums... just make everything a pineapple, so to speak.
Besides, by doing the cast + concatenation, you lose any chance of ever possibly putting indexes on those fields. If your table grows "large", you'll kill performance by forcing full-table scans for every query.
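One way to follow that advice is sketched below: split the user's input back into two integers and compare (mile, yard) pairs numerically. The sketch assumes that mile and yard are integer columns and that the digits after the dot are the yard value as typed; parseMileYard is a hypothetical helper, and table is the placeholder table name from the question.

```php
<?php
/** Split user input such as "570.2345" into integer mile and yard parts. */
function parseMileYard(string $input): array
{
    $parts = explode('.', trim($input), 2);
    return [(int) $parts[0], (int) ($parts[1] ?? 0)];
}

[$startMile, $startYard] = parseMileYard('570.2345');
[$endMile, $endYard] = parseMileYard('580.6543');

// Compare (mile, yard) pairs as numbers instead of comparing concatenated strings.
$sql = 'SELECT id, mile, yard, gps_lat, gps_long, rotten, split, broken, quality
        FROM [table]
        WHERE (mile > ? OR (mile = ? AND yard >= ?))
          AND (mile < ? OR (mile = ? AND yard <= ?))
        ORDER BY mile, yard';

// Parameters in placeholder order, ready for a prepared statement.
$params = [$startMile, $startMile, $startYard, $endMile, $endMile, $endYard];
```

The concatenated mile.yard value can still be built for display after the rows are fetched, so the WHERE clause only ever touches the raw numeric columns and can use an index on them.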
As Marc B said I was trying to force apples and grapes to be oranges, and comparing them to pears and plums... just make everything a pineapple, so to speak. I have changed the structure of my script now.
Cheers,
Neil

Match high and low numbers based on number in-between

Greetings,
I am planning a PHP and MySQL based app that will help in locating the correct page for a particular address in a map book. The map book uses a high and low address range to place sections of a street or highway in the book, each section with its own page (or sub-section of a page).
The user will enter the street address, house number separate from street name, and the desired result is to print details including the map page. What would be the best way to determine the corresponding high and low range based on the house number given in MySQL?
The table will be similar to this:
id, street_name, low_address, high_address, map_page
An example entry would be:
1, Elm Street, 1, 100, 30
Thanks!
Hate it when I end up posting my own solution, but MySQL's BETWEEN operator fit the bill for this. Here is what I ended up with:
MySQL Prepared Statement
SELECT * FROM dispatch WHERE streetName = ? AND ? BETWEEN lowAddress AND highAddress LIMIT 1
PHP
$getInfo->execute(array($streetName, $houseNum));
