Search a string for millions of terms with php


I have a string that I need to search using PHP to see if any of 2-3 million terms are present. The strings could be 1000 words long and the search terms may be up to 5 words long.
I have tried strpos and strstr, but execution time is more than 60 seconds.
Can anyone suggest an alternative?
So far I have the following:
$query = "SELECT City FROM cities";
$result = mysql_query($query);
if ($row = mysql_fetch_array($result)) {
    do {
        $city = " " . $row['City'] . " ";
        if (strpos($string, $city) !== false) {
            echo $city . "<br />\n";
        }
    } while ($row = mysql_fetch_array($result));
}

Take the load off PHP and give it to MySQL.
Instead of doing the search in a PHP loop, you can use MySQL's LIKE operator to search for a string.
e.g.: SELECT City FROM cities WHERE City LIKE '%search text%'
If you have more search text, you can combine several conditions with OR, or use JOIN queries to find those.
This avoids transferring every row to PHP and should be quicker, although a LIKE pattern with a leading wildcard still cannot use an index.
Good luck
-- Sajith

I would suggest the following:
Using associative arrays, create two lookup tables, words and terms, so that each entry represents a single word or term.
You can then search the lookup tables with the O(1) average lookup cost that associative arrays give you.
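A minimal sketch of the lookup-table idea, under some assumptions not stated in the answer: the terms are loaded once into an associative array used as a hash set, the text is split into words, and every run of 1 to 5 consecutive words is checked with isset(). The function name findTerms and the sample terms are hypothetical, and case folding here is ASCII-only.

```php
<?php
/**
 * Return every term from $termSet that occurs in $text as a whole-word phrase.
 * $termSet is a hash set: keys are lower-cased terms, values are ignored.
 */
function findTerms(string $text, array $termSet, int $maxWords = 5): array
{
    // Split into lower-cased words; punctuation and whitespace act as separators.
    // strtolower() only folds ASCII letters, which is an assumption of this sketch.
    $words = preg_split('/[^\p{L}\p{N}\']+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $count = count($words);
    $found = [];

    for ($i = 0; $i < $count; $i++) {
        $phrase = '';
        // Grow the phrase one word at a time, up to $maxWords words.
        for ($len = 1; $len <= $maxWords && $i + $len <= $count; $len++) {
            $phrase = ($len === 1) ? $words[$i] : $phrase . ' ' . $words[$i + $len - 1];
            if (isset($termSet[$phrase])) {   // O(1) average hash lookup
                $found[$phrase] = true;
            }
        }
    }
    return array_keys($found);
}

// Hypothetical data: in practice the keys would be filled from the cities table.
$termSet = array_flip(['new york', 'paris', 'san francisco']);
$text = 'Flights from Paris to New York, then on to Boston.';
print_r(findTerms($text, $termSet));
```

For a 1000-word string this performs at most about 5000 lookups regardless of how many terms exist, instead of one strpos() call per term. The trade-off is memory: holding millions of terms in a PHP array can be expensive, so the set may need to live in a cache or be built once per process.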

Related

How to detect specific keyword from text in a large DB?

I have been at this issue for a day now, from a PHP + MySQL angle, but because of the amount of data, almost all the scripts I've tried have timed out.
So we have two tables:
People, with the column name: about 4,000 unique entries
Texts, with the column message: about 24,000 entries
Messages have their own format, in which names are put inside [] tags, like so: [Jenna].
Sadly, not all entries in Texts are correctly formatted. However, I do have a lot of names in People. So I want to parse through the Texts messages and see if any name from People is matched. Of course, I do not want to match [Somename], since it's already tagged.
Ultimately, the goal is to then run an UPDATE query, so that each freshly matched message is formatted correctly with the [] tag. I don't know if this could be achieved inside a single SQL query.
This is a regex example of what I want to detect, with an explanation of what is going on inside preg_match_all(): https://regex101.com/r/cQ6gK5/1
This is what I tried, as advanced MySQL is not my strongest side:
<?php
// Build one regex alternation ("Jenna|Mark|...") from all known names.
function GetPeople () {
    global $DB;
    $return = array();
    $results = $DB->query("SELECT `name` FROM People");
    while ($result = $DB->fetch_array($results)) {
        // Escape regex metacharacters so a name cannot break the pattern.
        $return[] = preg_quote($result['name'], '/');
    }
    return implode('|', $return);
}
$people = GetPeople();
echo '<table><tr><th>Message raw</th><th>Matches</th></tr>';
$results = $DB->query("SELECT `message` FROM Texts WHERE `message` NOT REGEXP '\[(.+?)\]'");
while ($result = $DB->fetch_array($results)) {
    if (preg_match_all('/(?:(?:^|[\s])(' . $people . ')[\s|\n])/i', $result['message'], $matches)) {
        echo '<tr><td>' . $result['message'] . '</td><td><pre>'; print_r($matches); echo '</pre></td></tr>';
    }
}
echo '</table>';
I have indexed the name and message columns in MySQL, because I assume that makes searching easier. I imagine all of this could be done with an SQL query alone, without the PHP matching. Sadly, I could never get it as optimized as it should be on my own. Any help is highly appreciated, thank you.

You could try something like this:
SELECT Texts.message
FROM Texts
JOIN People ON Texts.message LIKE CONCAT('%', People.name, '%');
This joins the two tables and performs a LIKE comparison of each message against the name column of the People table. Note that a message is returned once for every name it contains, and that names already inside [] tags match as well.
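For the question's final goal of wrapping untagged names in [] tags, one possible PHP-side sketch follows. The function name tagNames is hypothetical, and the sketch assumes that names should match as whole words, case-insensitively, and that a name directly after [ or directly before ] counts as already tagged.

```php
<?php
/**
 * Wrap every untagged occurrence of a known name in [] tags.
 * Names that already sit next to a bracket are left alone.
 */
function tagNames(string $message, array $names): string
{
    if ($names === []) {
        return $message;
    }
    // Try longer names first so "Mary Ann" wins over "Mary".
    usort($names, fn($a, $b) => strlen($b) <=> strlen($a));

    // Escape regex metacharacters in each name before joining them.
    $alternation = implode('|', array_map(fn($n) => preg_quote($n, '/'), $names));

    // (?<![\[\w])  not preceded by "[" or a word character
    // (?![\]\w])   not followed by "]" or a word character
    $pattern = '/(?<![\[\w])(' . $alternation . ')(?![\]\w])/i';

    // preg_replace() returns null on error; fall back to the original text.
    return preg_replace($pattern, '[$1]', $message) ?? $message;
}

echo tagNames('Hi Jenna and [Mark], meet jenna.', ['Jenna', 'Mark']), "\n";
```

Each rewritten message could then be written back with an UPDATE per changed row. With about 4000 names the alternation becomes long, so splitting the names into a few batches may be necessary if the pattern gets too large to compile.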

searching an array of strings in mysql

I have an array of strings and I need to check whether the strings in the array exist in the database. I am using the following query:
$count = 0;
foreach ($array as $s)
{
    if (preg_match('/^(\bpho)\w+\b$/', $s))
    {
        $query = "select * from dictionary where words REGEXP '^" . $s . "$'";
        $result = $db->query($query) or die($db->error);
        if ($result)
        {
            $count += $result->num_rows;
        }
    }
}
But this query takes a long time to execute. Please suggest a way to reduce the search time.

I don't think the problem is in your code; I think you should optimize your database.
I'm not very good at it, but I think you could add indexes to your database to speed up the search.

Combine all the search strings into a single regular expression using alternation.
$searches = array();
foreach ($array as $s) {
    if (preg_match('/^(\bpho)\w+\b$/', $s)) {
        $searches[] = "($s)";
    }
}
$regexp = '^(' . implode('|', $searches) . ')$';
$query = "select 1 from dictionary where words REGEXP '$regexp'";
$result = $db->query($query) or die($db->error);
$count = $result->num_rows;
If $array doesn't contain regular expressions, you don't need to use the SQL REGEXP operator. You can use IN:
$searches = array();
foreach ($array as $s) {
    if (preg_match('/^(\bpho)\w+\b$/', $s)) {
        $searches[] = "'$s'";
    }
}
$in_list = implode(',', $searches);
$query = "select 1 from dictionary where words IN ($in_list)";
$result = $db->query($query) or die($db->error);
$count = $result->num_rows;
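The IN variant can also be written with placeholders instead of quoting the values by hand. The sketch below only builds the SQL text and the parameter list, so it runs without a database. The function name buildDictionaryCountQuery is hypothetical, and the dictionary table and words column are taken from the question.

```php
<?php
/**
 * Filter the candidate strings with the question's "pho..." rule and build
 * one parameterized IN query for all of them.
 * Returns [sql, params], or null when nothing passes the filter.
 */
function buildDictionaryCountQuery(array $candidates): ?array
{
    // Keep only strings that start with "pho" followed by word characters.
    $words = array_values(array_filter(
        $candidates,
        fn($s) => is_string($s) && preg_match('/^pho\w+$/', $s) === 1
    ));
    if ($words === []) {
        return null;
    }
    // One "?" per value, e.g. "?,?,?".
    $placeholders = implode(',', array_fill(0, count($words), '?'));
    $sql = "SELECT COUNT(*) AS matches FROM dictionary WHERE words IN ($placeholders)";
    return [$sql, $words];
}

[$sql, $params] = buildDictionaryCountQuery(['phone', 'cat', 'photo', 'pho']);
echo $sql, "\n";
print_r($params);
```

The resulting pair can be passed to a prepared statement, for example with mysqli's prepare() followed by execute($params) on PHP 8.1 or later. Because IN is an equality comparison, an index on words can be used, which is not the case for REGEXP.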

Searching the whole database is a large job. I think a better way is to cache some parts of the database and then search in the cache. Redis is very good for this.

Modify your query so that it doesn't select all table columns, which is a waste of resources. Instead, let the database count the number of rows matching the search query and return only a single value (matches):
$query = "SELECT COUNT(id) AS matches FROM dictionary WHERE words REGEXP '^".$s."$'";
How are you indexing your database? If your words column is not indexed properly, then your regexp would take a long time. Examine your database structure and potentially add indexing to the words column.
P.S. Don't forget to fetch the matches column instead of using num_rows.

Is it faster to use php array count() than SQL row count?

In a nutshell: Is it faster to use PHP's array count() on a number of arrays vs. using SQL row count multiple times?
I'm having an issue with a slow query that I attribute to the COUNT(*) function. I will explain what I am doing and then what I'm anticipating might be significantly faster.
Currently I'm looping a function that does a count of about 20,000 rows each iteration. It returns the number of rows for each month in a year:
// CREATE MONTHLY LINKS
public function monthly_links() {
    $months = array('','January','February','March','April','May','June','July','August', 'September','October','November','December');
    for ($i=1 ; $i <= 12 ; $i++) {
        $array[] = "<a href='monthly.php?month=" . $i . "&status=3'>" . $months[$i] . " " . $this->chosen_year() . " (" . $this->number_positions_month_year($i, $this->chosen_year()) . ")</a>";
    }
    return $array;
}
// SHOW NUMBER OF POSITIONS FOR EACH MONTH/YEAR
public function number_positions_month_year($month, $year) {
    $sql = $this->client_positions_sql() . " AND MONTH(`exp_date`) = " . $month . " AND YEAR(`exp_date`) = " . $year;
    $res = $this->conn->query($sql);
    $count = $res->num_rows;
    return $count;
}
The code is not that important in the example above because essentially what I am asking is: Is it faster to do the following...?
Query the table once while dumping each month's corresponding ids to an array (there will be 12 arrays)
Using PHP count() on each array to get the number of entries for each month

You can use SQL's GROUP BY clause to group by month.
SELECT COUNT(*), MONTH(exp_date)
FROM theTableYouAreUsing
GROUP BY MONTH(exp_date)
Then in PHP, you read the count for the month you need from the returned rows. To count a single year only, as in your code, add a condition on YEAR(exp_date) before the GROUP BY.
Speed-wise, this should be a lot quicker than a separate query for each month.
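On the PHP side, the grouped result can be turned into a fixed 12-entry array so that months without rows still show 0. This is a sketch only: the function name monthlyCounts and the column aliases month and positions are assumptions, and the sample rows stand in for a real result set.

```php
<?php
/**
 * Turn rows such as ['month' => 3, 'positions' => 7] into an array
 * indexed 1..12, with 0 for months that returned no row.
 */
function monthlyCounts(array $rows): array
{
    $counts = array_fill(1, 12, 0);
    foreach ($rows as $row) {
        $month = (int) $row['month'];
        if ($month >= 1 && $month <= 12) {
            $counts[$month] = (int) $row['positions'];
        }
    }
    return $counts;
}

// Rows as they might come back from a query like:
//   SELECT MONTH(exp_date) AS month, COUNT(*) AS positions
//   FROM theTableYouAreUsing
//   WHERE YEAR(exp_date) = ?
//   GROUP BY MONTH(exp_date)
// Database drivers often return numbers as strings, hence the casts above.
$rows = [
    ['month' => '1', 'positions' => '12'],
    ['month' => '3', 'positions' => '7'],
];
$counts = monthlyCounts($rows);
echo $counts[3], "\n";
```

monthly_links() could then read $counts[$i] inside its loop, so the page needs one query instead of twelve.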

In short: premature optimization is the root of all evil :) So usually, in the end, it doesn't really matter. However, keep in mind that, depending on when and where you need the number of rows and on how you fetch and handle the result, you may not have the full data available in a single array, so there is no array to call count() on. Wanting to use count() is not a valid reason to create such an array, because it consumes memory unnecessarily.

I am going to have to agree with KingCrunch on this. What you really need to look at is the type of application you have. If it has a low volume of users, doing the count in the database will be faster. If you have lots of traffic and want to keep the database from being overloaded by something PHP can do, then counting in PHP may be faster at that scale. Something to always keep in mind is that if you send the result set over to PHP, it has to receive the data over the network and then count it, meaning more data and network latency, although that assumes a remote database. But try not to over-optimize.

PHP Search Using Multiple Words

I am trying to create a search function where a user can input two words into a text field and it will split the words and construct a MySQL query.
This is what I have so far.
$search = mysql_real_escape_string( $_POST['text_field']);
$search = explode(" ", $search);
foreach($search as $word)
{
    $where = "";
    $where .= "product_code LIKE '%". $word ."%'";
    $where .= "OR description LIKE '%". $word ."%'";
    $query = "SELECT * FROM customers WHERE $where";
    $result = mysql_query($query) or die();
    if(mysql_num_rows($result))
    {
        while($row = mysql_fetch_assoc($result))
        {
            $customer['value'] = $row['id'];
            $customer['label'] = "{$row['id']}, {$row['name']} {$row['age']}";
            $matches[] = $customer;
        }
    }
    else
    {
        $customer['value'] = "";
        $customer['label'] = "No matches found.";
        $matches[] = $customer;
    }
}
$matches = array_slice($matches, 0, 5); //return only 5 results
It constructs and runs the query, but returns funny results.
Any help would be appreciated.

MySQL has a LIMIT clause, so your last line would be unnecessary.
Use full-text search for this: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html - It's faster and more elegant

If your table uses the MyISAM storage engine (or InnoDB on MySQL 5.6 and later), you could do a full-text search on the columns you are interested in, as Sn0opy already mentioned.
Personally, I believe that if you actually want to build a great search engine on top of MySQL, you should use Sphinx (http://sphinxsearch.com/) or Solr (http://lucene.apache.org/solr/)
Both have a learning curve, but the results are professional.

Any chance of anything more specific than "funny results"? Off the cuff there are several possibilities but it really depends upon the results that are being returned. My PHP is a bit rusty so I will apologize up front if my brain throws in some Java rules instead, but at first blush...
Name the array something other than $search. It probably isn't the problem, but it looks odd to have the array created by explode() carry the name of the string being exploded. Try something like $searched = explode(" ", $search); and then use $searched in the subsequent foreach() loop.
What if the user only puts in one search term? explode() then returns a one-element array containing the whole string, so that case works. The trouble starts with an empty field or repeated spaces: explode() returns empty strings for those, and LIKE '%%' matches every row. You should at least trim the input and skip empty pieces before building the query. Likewise, what if the user enters two search terms, but one of the terms is two words separated by a space? Again you are going to get "funny results" because you will get results that you don't want along with duplicated results as the query extends itself to both of the words in a term individually.
Without knowing more of what you mean by "funny results" it is really difficult to troubleshoot this one.
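One likely contributor to the odd results is that the question's loop resets $where and runs a separate query for every word, so rows and "No matches found." entries pile up per word. A sketch of building a single query for all words follows. The function name buildSearchQuery is hypothetical, the table and column names come from the question, and joining the words with OR is an assumption (AND would require every word to match).

```php
<?php
/**
 * Build one parameterized query covering every word in the input.
 * Returns [sql, params], or null when the input has no usable words.
 */
function buildSearchQuery(string $input, int $limit = 5): ?array
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    if ($words === []) {
        return null;
    }
    $clauses = [];
    $params = [];
    foreach ($words as $word) {
        $clauses[] = '(product_code LIKE ? OR description LIKE ?)';
        // Escape LIKE wildcards typed by the user, then wrap in %...%.
        $like = '%' . addcslashes($word, '%_') . '%';
        $params[] = $like;
        $params[] = $like;
    }
    // LIMIT in SQL replaces the array_slice() call in the question.
    $sql = 'SELECT id, name, age FROM customers WHERE '
         . implode(' OR ', $clauses)
         . ' LIMIT ' . (int) $limit;
    return [$sql, $params];
}

[$sql, $params] = buildSearchQuery('  red   widget ');
echo $sql, "\n";
print_r($params);
```

The pair can be handed to a prepared statement in mysqli or PDO, which also removes the need for mysql_real_escape_string().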

Specialized Search Query Refinement for Auto-Complete function

I am running a query for an autocomplete function on a MySQL table that has many instances of similar titles, generally differing by things like the year, such as '2010 Chevrolet Lumina' or 'Chevrolet Lumina 2009', etc.
The query I am currently using is:
$result = mysql_query("SELECT * FROM products WHERE MATCH (name) AGAINST ('$mystring') LIMIT 10", $db);
The $mystring variable gets built as follows:
$queryString = addslashes($_REQUEST['queryString']);
if(strlen($queryString) > 0) {
    $ctr = 0;
    $array = explode(' ', $queryString);
    foreach($array as $var){
        $ctr++;
        if($ctr == '1'){
            $mystring = '"' . $var . '"';
        }
        else {
            $mystring .= ' "' . $var . '"';
        }
    }
}
What I need to be able to do is somehow group things so that only one version of a very similar product actually shows in the autosuggest dropdown, leaving room for other products with chevrolet in them as well. Currently it is showing all 10 spots filled with the same product with different years, options, etc.
This one should give some of you brainiacs a good workout :)

I think the best way to do this would be to create a new field on the products table, something like classification. All the models would be entered with the same classification (e.g. "Chevrolet"). You could then still MATCH AGAINST name, but GROUP BY classification. Assuming you are using MySQL you can cheat a little and get away with selecting values and matching against values that you are not grouping by. Technically in SQL this gives undefined results and many SQL engines will not even let you try to do this, but MySQL lets you do it -- and it returns a more-or-less random sample that matches. So, for example, if you did the above query, grouped by classification, only one model (picked pretty much at random) will show up in the auto-completer.
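If relying on MySQL's loose GROUP BY behaviour is not an option (the ONLY_FULL_GROUP_BY mode, enabled by default since MySQL 5.7.5, rejects such queries), the same effect can be approximated in PHP after fetching more rows than needed. This is a sketch under assumptions: the function name onePerClassification is hypothetical, and each row is assumed to carry the proposed classification field.

```php
<?php
/**
 * Keep only the first row seen for each classification, up to $limit rows.
 * Rows are assumed to arrive ordered by relevance, best match first.
 */
function onePerClassification(array $rows, int $limit = 10): array
{
    $picked = [];
    foreach ($rows as $row) {
        $key = $row['classification'];
        if (!isset($picked[$key])) {
            $picked[$key] = $row;
            if (count($picked) === $limit) {
                break;
            }
        }
    }
    return array_values($picked);
}

// Hypothetical rows, as a MATCH ... AGAINST query with a larger LIMIT might return.
$rows = [
    ['name' => '2010 Chevrolet Lumina', 'classification' => 'Chevrolet Lumina'],
    ['name' => 'Chevrolet Lumina 2009', 'classification' => 'Chevrolet Lumina'],
    ['name' => 'Chevrolet Impala 2008', 'classification' => 'Chevrolet Impala'],
];
foreach (onePerClassification($rows) as $row) {
    echo $row['name'], "\n";
}
```

Unlike the GROUP BY shortcut, this keeps the best-ranked row per classification rather than an arbitrary one, at the cost of fetching extra rows.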
