How to detect specific keyword from text in a large DB? - php

I have been at this issue for a day now, from a PHP + MySQL angle, but because of the amount of data, almost all the scripts that I've tried have timed out.
So we have two tables:
People with the column name - about 4,000 unique entries
Texts with the column message - about 24,000 entries
Messages have their own format, that names get put into [] tags, like so: [Jenna].
Sadly, not all entries in Texts are correctly formatted. However, I do have a lot of names in People. So I want to parse through the Texts message values and see if any name from People is matched. Of course I do not want to match [Somename], since it's already tagged.
Ultimately, the goal is to then run an UPDATE query, so that each freshly matched message is formatted correctly with the [] tag. I don't know if this could be achieved inside a single SQL query.
This is a regex example of what I want to detect, with an explanation of what is going on inside preg_match_all(): https://regex101.com/r/cQ6gK5/1
This is what I tried, as advanced MySQL is not my strongest side:
<?
function GetPeople () {
global $DB;
$results = $DB->query("SELECT `name` FROM People");
while ($result = $DB->fetch_array($results)) {
$return[] = $result['name'];
}
return implode('|', $return);
}
$people = GetPeople();
echo '<table><tr><th>Message raw</th><th>Matches</th>';
$results = $DB->query("SELECT `message` FROM Texts WHERE `message` NOT REGEXP '\[(.+?)\]'");
while ($result = $DB->fetch_array($results)) {
if (preg_match_all('/(?:(?:^|[\s])(' . $people . ')[\s|\n])/i', $result['message'], $matches)) {
echo '<tr><td>' . $result['message'] . '</td><td><pre>'; print_r($matches); echo '</pre></td></tr>';
}
}
echo '</table>';
I have indexed the name and message columns in MySQL, because I assume that makes them easier to search. I imagine that all of this could be done with an SQL query alone, without the PHP matching. Sadly, I could never get it as optimized as it should be on my own. Any help is highly appreciated, thank you.

You could try something like this:
SELECT texts.message
FROM texts
JOIN people on texts.message LIKE CONCAT('%', people.name, '%');
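This joins the two tables and performs a LIKE comparison against the name column of the people table, so a message is returned once for every name it contains. Note that the leading wildcard prevents MySQL from using an index on message, so every message is compared against every name.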
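Since the ultimate goal in the question is to wrap untagged names in [] tags, the tagging step itself can be sketched in PHP. This is only a sketch under assumptions: the function name tagNames is hypothetical, it needs PHP 7.4+ for arrow functions, and with roughly 4,000 names the alternation becomes large, so the names may have to be processed in batches.

```php
<?php
// Hypothetical helper: wraps every untagged occurrence of a known name in [].
function tagNames(string $message, array $names): string
{
    if ($names === []) {
        return $message;
    }
    // Try longer names first so "Mary Ann" is preferred over "Mary".
    usort($names, fn($a, $b) => strlen($b) <=> strlen($a));
    // preg_quote() keeps characters such as "." in a name from acting as regex syntax.
    $alternatives = implode('|', array_map(fn($n) => preg_quote($n, '/'), $names));
    // Lookbehind: not preceded by "[" or a word character.
    // Lookahead: not followed by "]" or a word character.
    $pattern = '/(?<![\[\w])(' . $alternatives . ')(?![\]\w])/i';
    $result = preg_replace($pattern, '[$1]', $message);
    // preg_replace() returns null on error; fall back to the original text.
    return $result ?? $message;
}
```

The returned string could then be written back with an UPDATE for each changed row. A name that sits in the middle of a longer tag, such as Ann in [Jo Ann Mary], would still be matched, so the lookarounds are a simplification.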

Related

Efficient 2 dimensional array search in MySql

I am trying to design an application, and part of it is to show users new articles in different categories published since their last visit to the web app. To do this I use MySQL and have a table that keeps track of last visits, which I can query to get a PHP array like the one below:
$array =[[user1,category1, datetime1],[user1,category2, datetime2],[user1,category3,datetime3]];
Where user is the user id and datetime is the visited datetime and category is the article category.
With the setup above, I am trying to get new articles from the articles table whose publish date is after the user's last visit to each category.
I can achieve this with multiple ORs in a query like the one below; however, it is not a nice-looking query, and probably not scalable. Is there a simpler and faster way of doing this?
$multiwhere=[];
foreach($array as $a){
$multiwhere[]="select article_id from articles where category=".$a[1]." and publish_date>".$a[2];
}
And the final query would be like this:
"Select * from articles where article_id in (".implode(" or ".$multiwhere.")";
I deeply appreciate any suggestion to improve the query above.
Your query is almost correct, apart from the fact that you first retrieve all the article_id you want, and then use them to query for those articles. You can do that in one step, like so:
$multiwhere = [];
foreach ($array as $a) {
$multiwhere[] = "(category = " . $a[1] . " AND publish_date >= " . $a[2] .")";
}
$query = "SELECT * FROM articles";
if (count($multiwhere) > 0) {
$query .= " WHERE " . implode(" OR ", $multiwhere);
}
One query will do.
I kept the way you use the $array, but it looks weird to me. Especially around publish_date. I cannot change that because I don't know the type of the field. And, of course, $array is quite a bad name. It tells you what the type of the variable is, not what it contains, as it should. A better name would be: $lastCategoryVisits, or something like that. Your loop should look something like this:
foreach ($lastCategoryVisits as $lastCategoryVisit) {
$category = $lastCategoryVisit["category"];
$lastVisit = $lastCategoryVisit["lastVisit"];
$QueryConditions[] = "(category = '$category' AND publish_date >= '$lastVisit')";
}
Don't be afraid to write out what your code actually does. It might be a bit longer, but now you can see what is going on. This will not slow down the execution of your code at all.
Finally, it would be better to always use prepared statements to prevent the possibility of SQL injection. If you get into the habit of always doing this, you don't have to use excuses like: "It is not important in this project.", "I'll do it later when the code works." or "The data for this query doesn't come from a user.".
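As a rough illustration of the prepared-statement advice, the loop above can build placeholders instead of interpolating values. This is a sketch, not the answerer's code: the function name buildNewArticlesQuery is hypothetical, and the array keys category and lastVisit are taken from the loop shown earlier.

```php
<?php
// Hypothetical helper: returns [SQL with ? placeholders, values to bind].
function buildNewArticlesQuery(array $lastCategoryVisits): array
{
    $conditions = [];
    $params = [];
    foreach ($lastCategoryVisits as $visit) {
        // One placeholder pair per category; values are bound, never concatenated.
        $conditions[] = '(category = ? AND publish_date >= ?)';
        $params[] = $visit['category'];
        $params[] = $visit['lastVisit'];
    }
    $sql = 'SELECT * FROM articles';
    if ($conditions !== []) {
        $sql .= ' WHERE ' . implode(' OR ', $conditions);
    }
    return [$sql, $params];
}

// Usage with PDO, assuming $pdo is an existing connection:
// [$sql, $params] = buildNewArticlesQuery($lastCategoryVisits);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

As in the answer's version, an empty list of visits produces a query without a WHERE clause, which returns every article; whether that is the desired behaviour depends on the application.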

php and mysql live search

I'm currently working on a live search that displays results directly from a mysql db.
The code works, but not really as i want it.
Let's start with an example so that it is easier to understand:
My database has 5 columns:
id, link, description, try, keywords
The script that runs the ajax request on key up is the following:
$("#searchid").keyup(function () {
var searchid = encodeURIComponent($.trim($(this).val()));
var dataString = 'search=' + searchid;
if (searchid != '') {
$.ajax({
type: "POST",
url: "results.php",
data: dataString,
cache: false,
success: function (html) {
$("#result").html(html).show();
}
});
}
return false;
});
});
The results.php file looks like this:
if ($db->connect_errno > 0) {
die('Unable to connect to database [' . $db->connect_error . ']');
}
if ($_REQUEST) {
$q = $_REQUEST['search'];
$sql_res = "select link, description, resources, keyword from _db where description like '%$q%' or keyword like '%$q%'";
$result = mysqli_query($db, $sql_res) or die(mysqli_error($db));
if (mysqli_num_rows($result) == 0) {
$display = '<div id="explainMessage" class="explainMessage">Sorry, no results found</div>';
echo $display;
} else {
while ($row = $result->fetch_assoc()) {
$link = $row['link'];
$description = $row['description'];
$keyword = $row['keyword'];
$b_description = '<strong>' . $q . '</strong>';
$b_keyword = '<strong>' . $q . '</strong>';
$final_description = str_ireplace($q, $b_description, $description);
$final_keyword = str_ireplace($q, $b_keyword, $keyword);
$display = '<div class="results" id="dbResults">
<div>
<div class="center"><span class="">Description :</span><span class="displayResult">' . $final_description . '</span></div>
<div class="right"><span class="">Keyword :</span><span class="displayResult">' . $final_keyword . '</span></div>
</div>
<hr>
</div>
</div>';
echo $display;
}
}
}
now, let's say that i have this row in my DB:
id = 1
link = google.com
description = it's google
totry = 0
keywords: google, test, search
if i type in the search bar:
google, test
i have the right result, but if i type:
test, google
i have no results, as obviously the order is wrong.
So basically, what I'd like to achieve is something a bit more like "tags", so that I can search for the right keywords without having to use the right order.
Can i do it with my current code (if yes, how?) or i need to change something?
thanks in advance for any suggestion.
PS: I know this is not the best way to read from a DB as it has some security issues, i'm going to change it later as this is an old script that i wrote ages ago, i'm more interested in have this to work properly, and i'm going to change method after.
Normalize your schema
The rules of relational database design are very simple (at least the first three).
keywords: google, test, search
...breaks the first rule (first normal form: each column holds a single atomic value). Each keyword should be in its own row in a related table. Then you can simply write your query as....
SELECT link, description, resources, keyword
FROM _db
INNER JOIN keywords
ON _db.id=keywords.db_id
WHERE keywords.value IN (" . atomize($q) . ")
(where atomize explodes the query string, applies mysqli_real_escape_string() to each entry to avoid breaking your code, encloses each term in single quotes and concatenates the result with commas).
Alternatively you could use MySQL's full text indexing which does this for you transparently.
Although hurricane makes some good points in his/her answer, they do not mention that none of the solutions proposed there scales to handle large volumes of data with any efficiency (decomposing the field into a new table or using full-text indexing does).
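The atomize step described above could also be done with bound placeholders instead of escaping and quoting. The following is a sketch under assumptions: splitKeywords and buildKeywordQuery are hypothetical names, the keywords table with db_id and value columns is the one proposed in this answer, and the lowercasing assumes keywords are stored in lower case.

```php
<?php
// Hypothetical helper: splits "test, google" into unique lowercase terms.
function splitKeywords(string $search): array
{
    $terms = preg_split('/[\s,]+/', $search, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_map('strtolower', $terms)));
}

// Hypothetical helper: returns [SQL with one ? per term, terms to bind].
function buildKeywordQuery(array $terms): array
{
    if ($terms === []) {
        throw new InvalidArgumentException('At least one keyword is required.');
    }
    $placeholders = implode(', ', array_fill(0, count($terms), '?'));
    // DISTINCT avoids returning a row once per matching keyword.
    $sql = 'SELECT DISTINCT _db.link, _db.description FROM _db'
         . ' INNER JOIN keywords ON _db.id = keywords.db_id'
         . " WHERE keywords.value IN ($placeholders)";
    return [$sql, $terms];
}
```

Because the terms are matched as a set, "google, test" and "test, google" produce equivalent queries, which is the order independence the question asks for. IN matches rows having any of the keywords; requiring all of them would need an additional GROUP BY with a HAVING count.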
Untested code but modify according to your needs,
$q = $_REQUEST['search'];
$conditions = [];
foreach (explode(",", $q) as $term) {
    $term = mysqli_real_escape_string($db, trim($term));
    if ($term !== '') {
        // keywords is a column name, so it must not be quoted as a string literal;
        // REPLACE strips the space after each comma so FIND_IN_SET sees clean items
        $conditions[] = "FIND_IN_SET('$term', REPLACE(keywords, ', ', ','))";
    }
}
$like = mysqli_real_escape_string($db, $q);
$conditions[] = "description like '%$like%'";
$sql_res = "select link, description, resources, keyword from _db where " . implode(" or ", $conditions);
There are 2 solutions I can think of:
Use a full-text index and search.
Split the search string into words in PHP, for example using explode(), and search for the words in separate criteria rather than a single one. This latter approach can be very resource intensive, since you are searching in multiple fields.
LIKE '%google, test%' will match id=1, but '%google,test%' (no space after the comma), '%google test%' (space delimiter) and '%test, google%' will not. Put each keyword in a separate table, or split the input into single keywords and combine them with the OR operator, such as keyword LIKE '%google%' OR keyword LIKE '%test%'.
Not an ideal solution, but instead of treating your search datastring as one element, you can have php treat it as an array of keywords separated by a comma (by using explode). You'd then build a query depending on how many keywords were sent.
For example, using "google, test" your query would be:
$sql_res = "select link, description, resources, keyword from _db where (description like '%$q1%' or keyword like '%$q1%') AND (description like '%$q2%' or keyword like '%$q2%')";
Where $q1 and $q2 are "google" and "test".
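To handle any number of keywords rather than exactly two, the query can be assembled in a loop. This is a sketch only: buildAllKeywordsQuery is a hypothetical name, the column names come from the question's query, and bound placeholders are used in place of the interpolated $q1 and $q2.

```php
<?php
// Hypothetical helper: one (description LIKE ? OR keyword LIKE ?) group per
// keyword, joined with AND so that every keyword must match somewhere.
function buildAllKeywordsQuery(string $search): array
{
    // Split on commas and drop surrounding whitespace and empty entries.
    $terms = preg_split('/\s*,\s*/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $groups = [];
    $params = [];
    foreach ($terms as $term) {
        $groups[] = '(description LIKE ? OR keyword LIKE ?)';
        // Escape LIKE wildcards so a literal % or _ in the input is not a pattern.
        $like = '%' . addcslashes($term, '%_\\') . '%';
        $params[] = $like;
        $params[] = $like;
    }
    $sql = 'SELECT link, description, resources, keyword FROM _db';
    if ($groups !== []) {
        $sql .= ' WHERE ' . implode(' AND ', $groups);
    }
    return [$sql, $params];
}
```

With mysqli, the statement could be prepared from the returned SQL and the values bound in order. An empty search string yields a query without a WHERE clause, so the caller should skip the query in that case, as the original script already does.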
First of all, as you say, this is not a good way to do it. I think you are writing an autocompleter.
Separators for words
"google, test" or "test, google" are joined words. First you need to define a separator for users. Usually it is a whitespace ' '.
Once you have defined it, you need to split the words.
$words = explode(" ",$q);
// now you get two words "google," and "test"
Then you need to build an SQL query that searches for each word.
There are a lot of examples in the question "MySQL LIKE IN()?".
Now you get your result.
Text similarity
Select all results from the DB and, in a while loop, compare the search text with each row's text. This gives you a floating-point similarity score. The row with the best score is your result.
Php Similarity Example
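The similarity idea can be sketched with PHP's built-in similar_text(), which reports a percentage through its third argument. This is an illustration only: rankBySimilarity is a hypothetical name, the rows are assumed to be already fetched as associative arrays, and arrow functions require PHP 7.4+.

```php
<?php
// Hypothetical helper: scores each row's field against the search text
// and returns the rows sorted from most to least similar.
function rankBySimilarity(string $search, array $rows, string $field): array
{
    $scored = [];
    foreach ($rows as $row) {
        $percent = 0.0;
        // similar_text() writes the similarity percentage into $percent.
        similar_text(strtolower($search), strtolower($row[$field]), $percent);
        $scored[] = ['row' => $row, 'score' => $percent];
    }
    usort($scored, fn($a, $b) => $b['score'] <=> $a['score']);
    return $scored;
}
```

Every row has to be fetched and compared in PHP, which is why this approach gets expensive as the table grows.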
Important Info
If you ask my opinion, don't do it like that because it is very expensive. Use autocompleters on the HTML side. Here is an example

MySQL: multiple search/select queries at the same time?

I have a question on how to go about the next phase of a project I am working on.
Phase I:
create a php script that scraped directory for all .txt file..
Open/parse each line, explode into array...
Loop through array picking out pieces of data that were needed and INSERTING everything into the database (120+ .txt files & 100k records inserted)..
this leads me to my next step,
Phase II:
I need to take a 'list' of several tens of thousands of numbers..
loop through each one, using that piece of data (number) as the search term to QUERY the database.. if a match is found I need to grab a piece of data in a different column of the same record/row..
General thoughts/steps I plan to take
scrape directory to find 'source' text file.
open/parse 'source file'.... line by line...
explode each line by its delimiting character.. and grab the 'target search number'
dump each number into a 'master list' array...
loop through my 'master list' array.. using each number in my search (SELECT) statement..
if a match is found, grab a piece of data in another column in the matching/returned row (record)...
output this data.. either to screen or .txt file (havent decided on that step yet,..most likely text file through each returned number on a new line)
Specifics:
I am not sure how to go about doing a 'multiple' search/select statement like this?
How can I do multiple SELECT statements each with a unique search term? and also collect the returned column data?
is the DB fast enough to return the matching value/data in a loop like this? Do I need to wait/pause/delay somehow for the return data before iterating through the loop again?
thanks!
current function I am using/trying:
this is where I am currently:
$harNumArray2 = implode(',', $harNumArray);
//$harNumArray2 = '"' . implode('","', $harNumArray) . '"';
$query = "SELECT guar_nu FROM placements WHERE har_id IN ($harNumArray2)";
echo $query;
$match = mysql_query($query);
//$match = mysql_query('"' . $query . '"');
$results = $match;
echo("<BR><BR>");
print_r($results);
I get these outputs respectively:
Array ( [0] => sample_source.txt )
Total FILES TO GRAB HAR ID's FROM: 1
TOAL HARS FOUND IN ALL FILES: 5
SELECT guar_nu FROM placements WHERE har_id IN ("108383442","106620416","109570835","109700427","100022236")
&
Array ( [0] => sample_source.txt )
Total FILES TO GRAB HAR ID's FROM: 1
TOAL HARS FOUND IN ALL FILES: 5
SELECT guar_nu FROM placements WHERE har_id IN (108383442,106620416,109570835,109700427,100022236)
Where do I stick this to actually execute it now?
thanks!
update:
this code seems to be working 'ok'.. but I don't understand how to handle the returned data correctly.. I seem to only be outputting (printing) the last variable/row's data.. instead of the entire list..
$harNumArray2 = implode(',', $harNumArray);
//$harNumArray2 = '"' . implode('","', $harNumArray) . '"';
//$query = "'SELECT guar_num FROM placements WHERE har_id IN ($harNumArray2)'";
$result = mysql_query("SELECT har_id, guar_num FROM placements WHERE har_id IN (" . $harNumArray2 . ")")
//$result = mysql_query("SELECT har_id, guar_num FROM placements WHERE har_id IN (0108383442,0106620416)")
or die(mysql_error());
// store the record of the "example" table into $row
$row = mysql_fetch_array($result);
$numRows = mysql_num_rows($result);
/*
while($row = #mysql_fetch_assoc($result) ){
// do something
echo("something <BR>");
}
*/
// Print out the contents of the entry
echo("TOTAL ROWS RETURNED : " . $numRows . "<BR>");
echo "HAR ID: ".$row['har_id'];
echo " GUAR ID: ".$row['guar_num'];
How do I handle this returned data properly?
thanks!
I don't know if this answers your question but I think you're asking about sub-queries. They're pretty straightforward and just look something like this
SELECT * FROM tbl1 WHERE id = (SELECT num FROM tbl2 WHERE id = 1);
That will only work if the subquery returns a single value. If it returns multiple rows, MySQL fails with a "Subquery returns more than 1 row" error rather than returning data. If you have to select multiple rows, research JOIN statements. This can get you started
http://www.w3schools.com/sql/sql_join.asp
I am not sure how to go about doing a 'multiple' search/select statement like this?
With regards to a multiple select, (and I'll assume that you're using MySQL) you can perform that simply with the "IN" keyword:
for example:
SELECT *
FROM YOUR_TABLE
WHERE COLUMN_NAME IN (LIST, OF, SEARCH, VALUES, SEPARATED, BY, COMMAS)
EDIT: following your updated code in the question.
just a point before we go on... you should avoid the mysql_ functions in PHP for new code, as they were deprecated in PHP 5.5 and removed in PHP 7.0. Think about using the generic PHP DB handler PDO or the newer mysqli_ functions. More help on choosing the "right" API for you is here.
How do I handle this returned data properly?
For handling more than one row of data (which you are), you should use a loop. Something like the following should do it (and my example will use the mysqli_ functions - which are probably a little more similar to the API you've been using):
$mysqli = mysqli_connect("localhost", "user", "pass");
mysqli_select_db($mysqli, "YOUR_DB");
// make a comma separated list of the $ids.
$ids = join(", ", $id_list);
// note: you need to pass the db connection to many of these methods with the mysqli_ API
$results = mysqli_query($mysqli, "SELECT har_id, guar_num FROM placements WHERE har_id IN ($ids)");
$num_rows = mysqli_num_rows($results);
while ($row = mysqli_fetch_assoc($results)) {
echo "HAR_ID: ". $row["har_id"]. "\tGUAR_NUM: " . $row["guar_num"] . "\n";
}
Please be aware that this is very basic (and untested!) code, just to show the bare minimum of the steps. :)
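With tens of thousands of numbers, one possible refinement is to split the list into batches and bind the values instead of joining them into the SQL string. The sketch below is an assumption-laden illustration: buildPlacementQueries is a hypothetical name and the batch size of 1000 is arbitrary; the table and column names are those from the question.

```php
<?php
// Hypothetical helper: returns a list of [SQL, values] pairs, one per batch.
function buildPlacementQueries(array $harIds, int $chunkSize = 1000): array
{
    $queries = [];
    // Duplicated IDs would only make the IN list longer.
    $unique = array_values(array_unique($harIds));
    foreach (array_chunk($unique, $chunkSize) as $chunk) {
        $placeholders = implode(', ', array_fill(0, count($chunk), '?'));
        $queries[] = [
            "SELECT har_id, guar_num FROM placements WHERE har_id IN ($placeholders)",
            $chunk,
        ];
    }
    return $queries;
}
```

Each pair can then be prepared and executed in turn, with the rows collected in the same while loop shown above. Binding the IDs as strings also keeps leading zeros such as 0108383442 intact if har_id is a text column.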

Specialized Search Query Refinement for Auto-Complete function

I am doing a query for an autocomplete function on a mysql table that has many instances of similar titles, generally things like different years, such as '2010 Chevrolet Lumina' or 'Chevrolet Lumina 2009', etc.
The query I am currently using is:
$result = mysql_query("SELECT * FROM products WHERE MATCH (name) AGAINST ('$mystring') LIMIT 10", $db);
The $mystring variable gets built as follows:
$queryString = addslashes($_REQUEST['queryString']);
if(strlen($queryString) > 0) {
$array = explode(' ', $queryString);
foreach($array as $var){
$ctr++;
if($ctr == '1'){
$mystring = '"' . $var . '"';
}
else {
$mystring .= ' "' . $var . '"';
}
}
}
What I need to be able to do is somehow group things so only one version of a very similar product actually shows in the autosuggest dropdown, leaving room for other products with chevrolet in them as well. Currently it is showing all 10 spots filled with the same product with different years, options, etc.
This one should give some of you brainiacs a good workout :)
I think the best way to do this would be to create a new field on the products table, something like classification. All the models would be entered with the same classification (e.g. "Chevrolet"). You could then still MATCH AGAINST name, but GROUP BY classification. Assuming you are using MySQL you can cheat a little and get away with selecting values and matching against values that you are not grouping by. Technically in SQL this gives undefined results and many SQL engines will not even let you try to do this, but MySQL lets you do it (provided the ONLY_FULL_GROUP_BY SQL mode is off; it is on by default since MySQL 5.7.5) and it returns a more-or-less random sample that matches. So, for example, if you did the above query, grouped by classification, only one model (picked pretty much at random) will show up in the auto-completer.
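If relying on MySQL's GROUP BY behaviour is not an option, the same "one product per classification" idea can be applied in PHP after fetching more than 10 matches. This is a sketch with assumptions: onePerClassification is a hypothetical name, and each row is assumed to carry the classification column proposed above.

```php
<?php
// Hypothetical helper: keeps the first row seen for each classification,
// stopping once $limit distinct classifications have been collected.
function onePerClassification(array $rows, int $limit = 10): array
{
    $picked = [];
    foreach ($rows as $row) {
        $key = $row['classification'];
        if (!isset($picked[$key])) {
            $picked[$key] = $row;
            if (count($picked) === $limit) {
                break;
            }
        }
    }
    return array_values($picked);
}
```

Because the rows are scanned in the order the query returned them, the product kept for each classification is the best-ranked one rather than an arbitrary one.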

Whats the best way to retrieve information from Sphinx (in PHP)?

I'm new to Sphinx, and I'm setting it up on a new website.
It's working fine, and when I search with the search tool in the console, everything works.
Using the PHP API and searchd gives me the same results as well. But it gives me only IDs and weights for the rows found. Is there some way to bring some text fields together with the 'matches' hash, for example?
If there is no way to do this, does anyone have a good idea about how to retrieve the records from the database (sql) in the sphinx weight sort order (searching all them at the same time)?
Yeah, Sphinx doesn't return the row data.
But I found a simple way to reorder the query results using the IN() clause, to bring it all together.
Querying something like
SELECT * FROM table WHERE id IN(id_list... )
then indexing the rows by their id in the table:
// $query_result is the resource returned by mysql_query() for the SELECT above
while ($row = mysql_fetch_object($query_result))
$result[$row->id] = $row;
and having the matching results from Sphinx, it's very easy to reorder:
$ordered_result = array();
foreach ($sphinxs_results['matches'] as $id => $content)
$ordered_result[] = $result[$id];
This works if your $sphinxs_results are in the correct order.
It's almost pat's answer, but with one less loop. That can make some difference with big result sets, I guess.
You can use a mysql FIELD() function call in your ORDER BY to ensure everything is in the order sphinx specified.
$idlist = array();
foreach ( $sphinx_result["matches"] as $id => $idinfo ) {
$idlist[] = "$id";
}
$ids = implode(", ", $idlist);
SELECT * FROM table WHERE id IN ($ids) ORDER BY FIELD(id, $ids)
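The FIELD() approach can be wrapped in a small function that also forces every ID to an integer before it reaches the SQL string. This is a sketch: buildSphinxOrderedQuery is a hypothetical name, and documents is a placeholder table name standing in for the generic "table" used above.

```php
<?php
// Hypothetical helper: builds the IN (...) ORDER BY FIELD(...) query from
// the 'matches' array, whose keys are document IDs in weight order.
function buildSphinxOrderedQuery(array $matches): ?string
{
    // intval() guarantees only integers are interpolated into the SQL.
    $ids = array_map('intval', array_keys($matches));
    if ($ids === []) {
        return null; // nothing matched, so there is nothing to query
    }
    $idList = implode(', ', $ids);
    return "SELECT * FROM documents WHERE id IN ($idList) ORDER BY FIELD(id, $idList)";
}
```

Returning null for an empty match list avoids sending IN () to MySQL, which is a syntax error.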
Unfortunately, Sphinx doesn't return the matched fields, only their IDs (the Sphinx index doesn't contain the data, only a hash of the data).
You can find a post about this issue on the sphinxsearch.com forum.
As Alex says, Sphinx doesn't return that information. You will have to use the IDs to query the database yourself - just loop through each ID, get your relevant data out, keeping the results in weighting order. To do it all in one query, you could try something like the following (pseudo-code - PHP ain't my language of choice):
results = db.query("SELECT * FROM table WHERE id IN (%s)", matches.join(", "));
ordered_results = [];
for (match in matches) {
for (result in results) {
if (result["id"] == match) {
ordered_results << result;
}
}
}
return ordered_results;
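The pseudo-code above might translate to PHP roughly as follows. It is a sketch rather than a tested implementation: orderBySphinxMatches is a hypothetical name, rows are assumed to be associative arrays with an id key, and the nested loop is replaced by a lookup table keyed by id.

```php
<?php
// Hypothetical helper: returns the database rows in the order of the
// Sphinx match IDs, skipping IDs that have no corresponding row.
function orderBySphinxMatches(array $matchIds, array $rows): array
{
    // Index the rows by id once, so each match is a constant-time lookup.
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row['id']] = $row;
    }
    $ordered = [];
    foreach ($matchIds as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}
```

The match IDs would typically come from the keys of the 'matches' array in the Sphinx result, which are already in weight order.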
