SQL LIKE Querying PHP Serialized data with Drupal 7 - php

I'm using Drupal 7 and trying to get results from the database with LIKE, but it doesn't seem to recognize my wildcards. I'm not sure if this is even a Drupal issue, or if I'm doing something wrong. Anyway, here's an example of the data I'm trying to match, along with my patterns:
Data to Match
a:2:{i:1;s:2:"17";i:2;s:1:"3";}
My like Queries
$pattern1 = 'a:2:{i:1;s:2:"17";i:2;s:1:"%";}'; // works
$pattern2 = 'a:2:{i:1;s:1:"%";i:2;s:1:"3";}'; // fails
$result = db_query(
"
SELECT pa.nid, pa.model, pa.combination
FROM {$Product_Adjustments} pa
WHERE pa.combination LIKE :pattern
",
array(
':pattern' => $pattern1
)
);
Additionally, I've tried the '_' wildcard, but that doesn't return anything either.

Are you sure the pattern is correct? In pattern 1 the first string is declared as 2 characters long (s:2), but pattern 2 looks for one that is only 1 character long (s:1), so it can never match the stored s:2:"17". Are the lengths of the individual pieces of that serialized data predictable enough to even query this way? It seems unlikely, and you'll probably have to store some normalized data instead.
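To see the length-prefix problem outside the database, the patterns can be compared against the serialized value directly in PHP. This is only a sketch: like_match() is a hypothetical helper (not a Drupal or MySQL function) that emulates the % and _ wildcards with a regular expression and ignores collations and ESCAPE clauses. The third pattern, which also wildcards the length prefix, is an assumption about what the query was meant to do.

```php
<?php
// Hypothetical helper: emulate SQL LIKE (% = any run of characters, _ = exactly one)
// with a regular expression. It ignores collations and ESCAPE clauses.
function like_match($pattern, $value)
{
    $regex = '';
    foreach (str_split($pattern) as $ch) {
        if ($ch === '%') {
            $regex .= '.*';
        } elseif ($ch === '_') {
            $regex .= '.';
        } else {
            $regex .= preg_quote($ch, '~');
        }
    }
    return preg_match('~^' . $regex . '$~s', $value) === 1;
}

// Produces a:2:{i:1;s:2:"17";i:2;s:1:"3";}
$stored = serialize(array(1 => '17', 2 => '3'));

$pattern1 = 'a:2:{i:1;s:2:"17";i:2;s:1:"%";}'; // s:1 agrees with the stored "3"
$pattern2 = 'a:2:{i:1;s:1:"%";i:2;s:1:"3";}';  // s:1 disagrees with the stored s:2:"17"
$pattern3 = 'a:2:{i:1;s:%:"%";i:2;s:1:"3";}';  // length prefix wildcarded as well

var_dump(like_match($pattern1, $stored)); // bool(true)
var_dump(like_match($pattern2, $stored)); // bool(false)
var_dump(like_match($pattern3, $stored)); // bool(true)
```

Even with the length wildcarded, this kind of matching stays fragile, which is why normalized storage is usually the better fix.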

Related

PHP - Optimising preg_match of thousands of patterns

So I wrote a script to extract data from raw genome files. Here's what the raw genome file looks like:
# rsid chromosome position genotype
rs4477212 1 82154 AA
rs3094315 1 752566 AG
rs3131972 1 752721 AG
rs12124819 1 776546 AA
rs11240777 1 798959 AG
rs6681049 1 800007 CC
rs4970383 1 838555 AC
rs4475691 1 846808 CT
rs7537756 1 854250 AG
rs13302982 1 861808 GG
rs1110052 1 873558 TT
rs2272756 1 882033 GG
rs3748597 1 888659 CT
rs13303106 1 891945 AA
rs28415373 1 893981 CC
rs13303010 1 894573 GG
rs6696281 1 903104 CT
rs28391282 1 904165 GG
rs2340592 1 910935 GG
The raw text file has hundreds of thousands of these rows, but I only need specific ones, I need about 10,000 of them. I have a list of rsids. I just need the genotype from each line. So I loop through the rsid list and use preg_match to find the line I need:
$rawData = file_get_contents('genome_file.txt');
$rsids = $this->get_snps();
while ($row = $rsids->fetch_assoc()) {
$searchPattern = "~rs{$row['rsid']}\t(.*?)\t(.*?)\t(.*?)\n~i";
if (preg_match($searchPattern,$rawData,$matchedGene)) {
$genotype = $matchedGene[3];
// Do something with genotype
}
}
NOTE: I stripped out a lot of code to just show the regexp extraction I'm doing. I'm also inserting each row into a database as I go along. Here's the code with the database work included:
$rawData = file_get_contents('genome_file.txt');
$rsids = $this->get_snps();
$query = "INSERT INTO wp_genomics_results (file_id,snp_id,genotype,reputation,zygosity) VALUES (?,?,?,?,?)";
$stmt = $ngdb->prepare($query);
$stmt->bind_param("iissi", $file_id,$snp_id,$genotype,$reputation,$zygosity);
$ngdb->query("START TRANSACTION");
while ($row = $rsids->fetch_assoc()) {
$searchPattern = "~rs{$row['rsid']}\t(.*?)\t(.*?)\t(.*?)\n~i";
if (preg_match($searchPattern,$rawData,$matchedGene)) {
$genotype = $matchedGene[3];
$stmt->execute();
$insert++;
}
}
$stmt->close();
$ngdb->query("COMMIT");
$rsids->free();
$ngdb->close();
}
So unfortunately my script runs very slowly. Running 50 iterations takes 17 seconds. So you can imagine how long running 18,000 iterations is gonna take. I'm looking into ways to optimise this.
Is there a faster way to extract the data I need from this huge text file? What if I explode it into an array of lines, and use preg_grep(), would that be any faster?
Something I tried is combining all 18,000 rsids into a single expression, i.e. (rs123|rs124|rs125), like this:
$rsids = get_rsids();
$rsid_group = implode('|',$rsids);
$pattern = "~({$rsid_group })\t(.*?)\t(.*?)\t(.*?)\n~i";
preg_match($pattern,$rawData,$matches);
But unfortunately it gave me an error message about exceeding the PCRE expression limit. The needle was way too big. Another thing I tried is adding the S modifier to the expression. I read that this analyses the pattern in order to increase performance. It didn't speed things up at all. Maybe my pattern isn't compatible with it?
So then the second thing I need to try and optimise is the database inserts. I added a transaction hoping that would speed things up but it didn't speed it up at all. So I'm thinking maybe I should group the inserts together, so that I insert multiple rows at once, rather than inserting them individually.
Then another idea is something I read about: using LOAD DATA INFILE to load rows from a text file. In that case, I just need to generate a text file first. I wonder whether generating a text file would work out faster in this case.
EDIT: It seems like what's taking up most of the time is the regular expressions. Running that part of the program by itself takes a really long time: 10 rows take 4 seconds.
This is slow because you're searching a vast array of data over and over again.
It looks like you have a text file, not a dbms table, containing lines like these:
rs4477212 1 82154 AA
rs3094315 1 752566 AG
rs3131972 1 752721 AG
rs12124819 1 776546 AA
It looks like you have some other data structure containing a list of values like rs4477212. I think that's already in a table in the dbms.
I think you want exact matches for the rsxxxx values, not prefix or partial matches.
I think you want to process many different files of raw data, and extract the same batch of rsxxxx values from each of them.
So, here's what you do, in pseudocode. Don't load the whole raw data file into memory, rather process it line by line.
Read your rows of rsid values from the dbms, just once, and store them in an associative array.
for each file of raw data....
  for each line of data in the file...
    split the line of data to obtain the rsid. In php, $array = explode(" ", $line, 2); will yield your rsid in $array[0], and do it fast.
    Look in your array of rsid values for this value. In php, if ( array_key_exists( $array[0], $rsid_array )) { ... will do this.
    If the key does exist, you have a match:
      extract the last column from the raw text line ('GC' or whatever)
      write it to your dbms.
Notice how this avoids regular expressions, and how it processes your raw data line by line. You only have to touch each line of raw data once. That's good, because your raw data is also your largest quantity of data. It exploits php's associative array feature to do the matching. All that will be much faster than your method.
To speed up inserting tens of thousands of rows into a table, read "Optimizing InnoDB Insert Queries".
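One common way to cut per-row overhead is to send several rows in a single multi-row INSERT. The sketch below only builds the SQL text and the flattened parameter list. build_batch_insert() is a hypothetical helper, the table and column names are taken from the question, and the sample values are made up for illustration. It assumes the table and column names are trusted (they are not escaped) and that the caller binds the parameters with mysqli or PDO.

```php
<?php
// Hypothetical helper: build one multi-row INSERT with "?" placeholders
// plus the flat list of values to bind, in row order.
function build_batch_insert($table, array $columns, array $rows)
{
    // One "(?,?,...)" group per row, with one "?" per column.
    $rowPlaceholder = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $sql = 'INSERT INTO ' . $table . ' (' . implode(',', $columns) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $rowPlaceholder));

    $params = array();
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    return array($sql, $params);
}

// Illustrative values only; they are not taken from a real genome file.
$columns = array('file_id', 'snp_id', 'genotype', 'reputation', 'zygosity');
$rows = array(
    array(7, 101, 'AA', 'n/a', 1),
    array(7, 102, 'AG', 'n/a', 2),
);
list($sql, $params) = build_batch_insert('wp_genomics_results', $columns, $rows);
echo $sql, "\n";
```

Batches of a few hundred rows per statement are a reasonable starting point, but the best size depends on the server's max_allowed_packet and should be measured.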
+1 to @Ollie Jones' answer. He posted while I was working on my answer. So here's some code to get you started.
$rsids = $this->get_snps();
while ($row = $rsids->fetch_assoc()) {
$key = 'rs' . $row['rsid'];
$rsidHash[$key] = true;
}
$rawDataFd = fopen('genome_file.txt', 'r');
while ($rawData = fgetcsv($rawDataFd, 80, "\t")) {
if (array_key_exists($rawData[0], $rsidHash)) {
$genotype = $rawData[3];
// do something with genotype
}
}
I wanted to give the LOAD DATA INFILE approach a try to see how well it works, so I came up with what I thought was a nice, elegant approach. Here's the code:
$file = 'C:/wamp/www/nutri/wp-content/plugins/genomics/genome/test';
$data_query = "
LOAD DATA LOCAL INFILE '$file'
INTO TABLE wp_genomics_results
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 18 ROWS
(@rsid,@chromosome,@locus,@genotype)
SET file_id = '$file_id',
snp_id = (SELECT id FROM wp_pods_snp WHERE rsid = SUBSTR(@rsid,2)),
genotype = @genotype
";
$ngdb->query($data_query);
I put a foreign key constraint on the snp_id column (that's the ID for my table of RSIDs) so that it only enters genotypes for rsids that I need. Unfortunately this foreign key constraint caused some kind of error which locked the tables. Ah well. It might not have been a good approach anyhow, since there are on average 200,000 rows in each of these genome files. I'll go with Ollie Jones' approach since that seems to be the most effective and viable approach I've come across.

How to implement an effective search algorithm when using PHP and a MySQL database?

I'm new to web design, especially backend design so I have a few questions about implementing a search function in PHP. I already set up a MySQL connection but I don't know how to access specific rows in the MySQL table. Also is the similar text function implemented correctly considering I want to return results that are nearly the same as the search term? Right now, I can only return results that are the exact same or it gives "no result." For example, if I search "tex" it would return results containing "text"? I realize that there are a lot of mistakes in my coding and logic, so please help if possible. Event is the name of the row I am trying to access.
$input = $_POST["searchevent"];
while ($events = mysql_fetch_row($Event)) {
$eventname = $events[1];
$eventid = $events[0];
similar_text($input, $eventname, $hold); // $hold receives the similarity as a percentage
if ($hold == 100) {
echo $eventname;
break;
} else {
echo "no result";
}
}
Thank you.
I've noticed some of the comments mentioned more efficient ways of performing the search than with the "similar text" function, if I were to use the LIKE function, how would it be implemented?
A couple of different ways of doing this:
The faster one (performance wise) is:
select * FROM Table where keyword LIKE '%value%'
The trick in this one is the placement of the % wildcards: with one on each side, the pattern matches any row whose keyword contains the value anywhere, not only at the beginning or the end.
A more flexible but (slightly) slower one could be the REGEXP function:
Select * FROM Table WHERE keyword REGEXP 'value'
This is using the power of regular expressions, so you could get as elaborate as you wanted with it. However, leaving as above gives you a "poor man's Google" of sorts, allowing the search to be bits and pieces of overall fields.
The sticky part comes in if you're trying to search names. For example, either would find the name "smith" if you searched SMI. However, neither would find "Jon Smith" if there was a first and last name field separated. So, you'd have to do some concatenation for the search to find either Jon OR Smith OR Jon Smith OR Smith, Jon. It can really snowball from there.
Of course, if you're doing some sort of advanced search, you'll have to condition your query accordingly. So, for instance, if you wanted to search first, last, address, then your query would have to test for each:
SELECT * FROM table WHERE first LIKE '%value%' OR last LIKE '%value%' OR address LIKE '%value%'
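When the search value comes from user input, any % or _ typed by the user would itself act as a wildcard. A minimal sketch of building the "contains" pattern safely is shown here. like_contains_pattern() is a hypothetical helper, and it assumes MySQL's default backslash escape character for LIKE. The resulting string is meant to be bound as a parameter (for example WHERE keyword LIKE ?), not concatenated into the SQL.

```php
<?php
// Hypothetical helper: turn raw user input into a "contains" LIKE pattern.
// Backslash, % and _ in the input are escaped so they match literally
// (assumes the default backslash escape character for LIKE in MySQL).
function like_contains_pattern($term)
{
    return '%' . addcslashes($term, '\\%_') . '%';
}

echo like_contains_pattern('tex'), "\n";       // %tex%
echo like_contains_pattern('100%_done'), "\n"; // %100\%\_done%
```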
Look at the example below:
$word2compare = "stupid";
$words = array(
'stupid',
'stu and pid',
'hello',
'foobar',
'stpid',
'upid',
'stuuupid',
'sstuuupiiid',
);
foreach ($words as $str) {
similar_text($str, $word2compare, $percent);
if ($percent > 90) { // Change percentage value to 80,70,60 and see changes
print "Comparing '$word2compare' with '$str': $percent%\n";
}
}
You can use the $percent value to decide how strong a match you want to require.

LIKE condition in PHP does not work correctly

I have a column in my database named "active_sizes" and I want to filter my website's items by size. For this, I use a LIKE condition in PHP:
AND active_sizes LIKE '%" . $_GET['size'] . "%'
but with this code I have a problem:
for example, when $_GET['size']=7.0 this code also shows items where active_sizes contains 17.0
my active_sizes values look like 17.0,5.0,6.5,7.5,,
Thanks
Using comma-separated values in a single field in a database is indicative of bad design. You should normalize things, and have a separate "item_sizes" table. As it stands now, you need a VERY ugly where clause to handle such sub-string mismatches:
$s = number_format((float) $_GET['size'], 1, '.', ''); // normalize to one decimal, e.g. "7.0"
... WHERE (active_sizes = '$s') // the only value in the field
OR (active_sizes LIKE '$s,%') // at the beginning of the field
OR (active_sizes LIKE '%,$s,%') // in the middle of the field
OR (active_sizes LIKE '%,$s') // at the end of the field
Or, if you normalized things properly and had these individual values in their own child table:
WHERE (active_sizes_child.size = $s)
I know which one I'd choose to go with...
You don't state which DB you're using, but if you're in MySQL, you can temporarily accomplish the same thing with
WHERE find_in_set($s, active_sizes)
at the cost of losing portability. Relevant docs here: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
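The difference between a substring match and a whole-value match can be reproduced in plain PHP. This is just a sketch of the idea behind find_in_set. size_in_list() is a hypothetical helper, and the sample list is the one from the question.

```php
<?php
// Hypothetical helper: exact membership test on a comma-separated list,
// ignoring empty entries such as the trailing ",," in the question's data.
function size_in_list($size, $csv)
{
    $tokens = array_filter(array_map('trim', explode(',', $csv)), 'strlen');
    return in_array($size, $tokens, true);
}

$active = '17.0,5.0,6.5,7.5,,';

// Substring search, like LIKE '%7.0%': finds "7.0" inside "17.0".
var_dump(strpos($active, '7.0') !== false); // bool(true)

// Whole-value comparison: only complete list entries count.
var_dump(size_in_list('7.0', $active));  // bool(false)
var_dump(size_in_list('17.0', $active)); // bool(true)
```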
You have % signs around your $_GET value. Combined with LIKE, this means that any string that simply contains your GET value will be returned. If you want an exact match, use the = operator instead, without the percentage signs.
This will solve your immediate issue:
AND active_sizes LIKE '" . mysql_real_escape_string($_GET['size']) . "%'
If you are using a database other than MySQL, use the corresponding escape function. Never trust input data.
Besides, I'd suggest using a numeric type (DECIMAL or NUMERIC) for the active_sizes field. This will speed up your queries, use less memory, let you write conditions like active_sizes BETWEEN 16.5 AND 17.5, and is generally a more appropriate data type for a shoe size.

php, how to get the keywords out of an url?

The question might be a bit confusing, so here is what I have:
I insert into the database the previous link that a person came from, like this:
$came_from = $_SERVER['HTTP_REFERER']; // get previous link
if the link is from google.com it will come like this:
http://www.google.com/#sclient=psy&hl=en&source=hp&q=this+is+a+test&pbx=1&oq=this+is+a+teat&aq=f&aqi=g-s1g-v1&aql=1&gs_sm=s&gs_upl=887l82702l3.10.3.1l17l0&bav=on.2,or.r_gc.r_pw.r_cp.&fp=c3d3303&biw=1920&bih=995
If we look inside, we can find q=this+is+a+test as being the keywords that I searched for.
My question is: how can I create a query to return http://www.google.com/ | this+is+a+test ?
I know that the keywords have the + sign in between them.
So far I came up with this, but it's not exactly what I wanted:
SELECT SUBSTRING_INDEX (table, '+', 1), table FROM table.table WHERE table LIKE '%+%' LIMIT 20
any ideas?
thanks
Edit: what happens is that sometimes I get other URLs that don't have q= but maybe seearch=, so I want to keep track of the + sign.
As it's been pointed out, you can't reliably get the keywords without supplying the parameters to look for. Here's what I would do:
$url = 'http://www.google.com/#sclient=psy&hl=en&source=hp&q=this+is+a+test&pbx=1&oq=this+is+a+teat&aq=f&aqi=g-s1g-v1&aql=1&gs_sm=s&gs_upl=887l82702l3.10.3.1l17l0&bav=on.2,or.r_gc.r_pw.r_cp.&fp=c3d3303&biw=1920&bih=995';
$possible = array('q', 'ssearch', 'oq');
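// Parse everything after the first '?' or '#' into the $arr array
$parts = preg_split('/[?#]/', $url, 2);
parse_str(isset($parts[1]) ? $parts[1] : '', $arr);
$query_str = NULL;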
foreach ($possible as $search) {
if (isset($arr[$search])) {
$query_str = $arr[$search];
break;
}
}
Basically all this does is parse the url using PHP's parse_str() and look for the parameter q. If it's not there, it uses ssearch, and then oq. You can add more of them if you need to. If by the end of it it's not found, $query_str will be NULL.
Unless you have a very compelling reason to do it with MySQL only, just process everything on the PHP side. Databases are made to store data, not process it. What I would do is have PHP figure out the search engine and the keywords used and insert those into the DB, as separate fields. ie, have a table like so:
search_engine | query_str
------------- | -----------
google | test
yahoo | something
...
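A rough sketch of that PHP-side processing, producing the two fields from the table above, could look like this. extract_search() is a hypothetical helper, and the list of parameter names is an assumption that would need to be extended per search engine. It relies only on PHP's parse_url() and parse_str().

```php
<?php
// Hypothetical helper: split a referrer URL into the host and the search keywords.
// $keys lists the parameter names to try, in order (an assumption; extend as needed).
function extract_search($referer, array $keys = array('q', 'ssearch', 'oq'))
{
    $host = parse_url($referer, PHP_URL_HOST);
    // Search parameters may sit in the query string (?q=...) or in the fragment (#q=...).
    $raw = parse_url($referer, PHP_URL_QUERY) . '&' . parse_url($referer, PHP_URL_FRAGMENT);
    parse_str($raw, $params); // also decodes "+" into spaces

    foreach ($keys as $key) {
        if (isset($params[$key]) && $params[$key] !== '') {
            return array('search_engine' => $host, 'query_str' => $params[$key]);
        }
    }
    return array('search_engine' => $host, 'query_str' => null);
}

$result = extract_search('http://www.google.com/#sclient=psy&hl=en&q=this+is+a+test&pbx=1');
echo $result['search_engine'], ' | ', $result['query_str'], "\n";
// www.google.com | this is a test
```

Storing the decoded keywords rather than the raw this+is+a+test form keeps later searches on that column simple.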
If you know that you need q=... then you can use regexp. I will update post if that's what you need.
As everyone is saying, you need to use the key value (in your example, q). In MySQL, you can do something like this:
SELECT SUBSTRING_INDEX(table, '?q=', -1), table FROM table.table WHERE table LIKE '%?q=%' LIMIT 20
I'd also suggest you rename your table column to something other than 'table'.

MySQL: How to search for spelling variants? ("murrays", "murray's" etc)

I want to search like this: the user inputs e.g. "murrays", and the search result will show both records containing "murrays" and records containing "murray's". What should I do in my query.pl?
What do you think about using the SOUNDEX function and the SOUNDS LIKE operator?
That way, you can simply do:
SELECT * from USERS WHERE name SOUNDS LIKE 'murrays'
I'm pretty sure it doesn't work for every case, and perhaps it is not the most efficient way to solve the problem, but it could fit your needs.
This won't help if you absolutely need to do these queries in SQL, but if you can set up a Lucene search index for it, you gain a lot of this kind of "fuzzy search" functionality. Note though that Lucene is quite a complex topic by itself.
What you could do is create an extra field in the database, which contains the data with all special characters stripped from it, and search there. A bit lame, I know. Looking forward to see smarter answers ;)
Quick and dirty:
SELECT * FROM myTable WHERE REPLACE(name, '\'', '') = 'murrays'
I would first build a search column which has the text without punctuation and then search on that. Otherwise you'll have to have a series of regular expressions to search against, or check individual records in PHP for matching, both of which are computationally intensive operations.
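As a sketch of how the value for such a search column might be produced in PHP: normalize_for_search() is a hypothetical helper, and keeping only lowercase ASCII letters, digits and spaces is an assumption that would need adjusting for names with accented characters. The same function would be applied to the user's input before comparing.

```php
<?php
// Hypothetical helper: build the punctuation-free value for a search column.
function normalize_for_search($text)
{
    // Lowercase first, then drop everything except ASCII letters, digits and spaces.
    return preg_replace('/[^a-z0-9 ]/', '', strtolower($text));
}

echo normalize_for_search("Murray's"), "\n"; // murrays
echo normalize_for_search('murrays'), "\n";  // murrays
```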
Maybe something like this: (untested!)
SELECT * FROM users WHERE REPLACE(user_name, '\'', '') = "murrays"
If this is for single word searching, you could try using Soundex or Metaphone functions? These would handle sounds-like as well as spelling
Not sure if MySQL has these, but PHP does (which would require separate columns to hold these values).
Otherwise, Richy's no-punctuation extra column seems best.
You could try adding a replace to your query like this
replace(name, '''','')
to temporarily get rid of the apostrophes for the match.
select name from nametable where replace(name,'''','') = 'murrays';
This query should be able to pick up "murrays" or "murray's".
var inputStr = "murrays";
inputStr = String.Replace("'", "\'", inputStr);
SELECT * FROM ATable WHERE Replace(AField, '\'', '') = inputStr OR AField = inputStr
Strip all non-letter characters from both the user input and the names in the database.
Use Levenshtein distance or soundex to find murrays with murray or marrays. This is optional, but your users would love that.
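For the edit-distance part, PHP ships a levenshtein() function, so a small filter over candidate names could be sketched as follows. fuzzy_matches() is a hypothetical helper and the maximum distance of 1 is an arbitrary choice. For large tables the candidates would need to be narrowed down in SQL first.

```php
<?php
// Hypothetical helper: keep the candidates within a small edit distance of the input.
function fuzzy_matches($input, array $candidates, $maxDistance = 1)
{
    $hits = array();
    foreach ($candidates as $candidate) {
        if (levenshtein(strtolower($input), strtolower($candidate)) <= $maxDistance) {
            $hits[] = $candidate;
        }
    }
    return $hits;
}

$names = array('murrays', "murray's", 'murray', 'marrays', 'smith');
print_r(fuzzy_matches('murrays', $names));
// Keeps murrays, murray's, murray and marrays; smith is too far away.
```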
