Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how Google's search is quite powerful: it takes text inside quotes as a literal phrase and treats words with a minus sign in front of them as excluded.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" on pages of the website examplesite.com that don't include the word donotfindme.
Obviously I'm not looking for something as complex as Google; I just wanted to show where my project is heading.
Anyway, I wanted to start with the basics, which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is PHP)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" (with the quotes) instead of the desired this, and it doesn't work at all for multiple phrases in quotes. Could someone point me in the right direction?
I don't necessarily need code; even a good site with tutorials would probably do the job.
Thanks!

Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It will still contain the full matched text in element 0.) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick them out one at a time with preg_match(). It takes an optional "offset" parameter that indicates where in the string it should start searching; to find multiple matches, pass the position right after the previous match as the offset. See the documentation for details.
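The offset loop described above can be sketched as follows. This is only a sketch under assumptions: the search string is made up, and the pattern uses [^"]* instead of .* (my substitution, not part of the answer) because a greedy .* would run from the first quote to the last one. preg_match_all() is shown as a one-call alternative.

```php
<?php
// Hypothetical input with two quoted phrases.
$search  = 'hello "this is" regular "expressions here" ok';
// [^"]* instead of .* so a match cannot run across a closing quote.
$pattern = '/"([^"]*)"/';

$phrases = array();
$offset  = 0;
// PREG_OFFSET_CAPTURE turns each entry of $m into array(text, byte offset).
while (preg_match($pattern, $search, $m, PREG_OFFSET_CAPTURE, $offset)) {
    $phrases[] = $m[1][0];
    // Continue searching right after the end of the full match.
    $offset = $m[0][1] + strlen($m[0][0]);
}
print_r($phrases);

// One-call alternative: $all[1] holds every captured phrase.
preg_match_all($pattern, $search, $all);
print_r($all[1]);
```

Both approaches collect the same phrases; the loop is mainly useful when you need the positions or want to stop early.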
You could also try searching Google for "regular expression tutorial" or something like that; there are plenty of good ones out there.

Sorry, my PHP is a bit rusty, but this code will probably do what you want:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the text matched by the full pattern.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.

Here is a complete answer for all the sorts of search terms (literal, minus, quotes, ...) WITH replacements (for Google visitors at least).
But maybe it should not be done with regular expressions alone.
A single huge and very complex regular expression would be hard for you or other developers to work on and extend,
and it might even be that this approach is faster.
It might still need a lot of improvement, but at least here is a complete working solution in a class. There is a bit more in here than the question asked for, but it illustrates the reasons behind some choices.
class mySearchToSql extends mysqli {
    // Collected WHERE fragments (not declared in the original)
    protected $W = array();

    protected function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            //Split into different desires: plain words [1], quoted phrases [2], -excluded words [3]
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i', $what, $split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Surround with SQL
            array_walk($split[1], array($this, 'sur'), array('`Field` LIKE "%', '%"'));
            array_walk($split[2], array($this, 'sur'), array('`Desc` REGEXP "[[:<:]]', '[[:>:]]"'));
            array_walk($split[3], array($this, 'sur'), array('`Desc` NOT LIKE "%', '%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Add AND or OR
            $this->where($split[3])
                 ->where(array_merge($split[1], $split[2]), true);
        }
    }

    // Escape a non-empty term and wrap it in the given SQL prefix/suffix
    protected function sur(&$v, $k, $sur) {
        if (!empty($v))
            $v = $sur[0] . $this->real_escape_string($v) . $sur[1];
    }

    function where($s, $OR = false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s = array_filter($s); // drop the empty slots left by non-matching groups
            if (empty($s)) return $this;
            if ($OR == true)
                $this->W[] = '(' . implode(' OR ', $s) . ')';
            else
                $this->W[] = '(' . implode(' AND ', $s) . ')';
        } else
            $this->W[] = $s;
        return $this;
    }

    function showSQL() {
        // PHP_EOL stands in for the undefined constant L of the original, presumably a line break
        echo $this->W ? 'WHERE ' . implode(PHP_EOL . ' AND ', $this->W) . PHP_EOL : '';
    }
}
Thanks for all stackoverflow answers to get here!

You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
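As a rough illustration of how that pattern could be used from PHP, here is a sketch with a made-up input string. The nowdoc syntax and the ~ delimiter are my choices (they avoid having to double every backslash); they are not part of the original answer.

```php
<?php
// Nowdoc keeps the backslashes exactly as written in the answer above;
// ~ is used as the delimiter so the quotes need no extra escaping.
$pattern = <<<'REGEX'
~(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1~
REGEX;

// Hypothetical input: a double-quoted phrase containing escaped quotes,
// followed by a single-quoted phrase.
$subject = 'say "a \"quoted\" word" and \'single\'';

preg_match_all($pattern, $subject, $m);
// $m[1] holds the quote character that was used,
// $m[2] the text between the quotes (escape sequences left intact).
print_r($m[2]);
```

Note that the captured text still contains the backslashes; stripping them (for example with stripslashes()) would be a separate step.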

Related

How to get a number from a html source page?

I'm trying to retrieve the "followed by" count from my Instagram page. I can't seem to get the regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 digits after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...

You haven't posted your strings, so what the regex should be is a guess... so I'll answer on why your code fails.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search in. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
is closer, but preg_replace is global, so this searches your whole file (or it would if not for the anchor) and replaces the found value with nothing. What you really want (I think) is preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was more or less correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made the \d match exactly 4 digits. The () is a capture group and stores whatever is matched inside it, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.

This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with the preg_match_all function to easily capture what you want into an array (you're using preg_replace, which isn't for retrieving data but for... well, replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also left out the quantifier (the plus sign in my example), so it would capture only the first digit anyway.
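Putting the two answers together, a minimal sketch could look like this. The $code fragment is invented for illustration (the real page source may look different), and anchoring on the "followed_by" key is an assumption based on the attempts quoted above.

```php
<?php
// Hypothetical fragment of the page source; the real markup may differ.
$code = '"followed_by":{"count":1234},"follows":{"count":56}';

$followedBy = null;
// Anchoring on the key name picks the right counter;
// ([0-9]+) captures all digits, however many there are.
if (preg_match('/"followed_by":\{"count":([0-9]+)\}/', $code, $match)) {
    $followedBy = (int) $match[1];
}
var_dump($followedBy);
```

Using [0-9]+ rather than \d{4} keeps the pattern working when the count grows past four digits.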

Regex to detect everything not between {} and then search from whats matched

I am very new to both Stack Overflow and regexes, so please forgive any mistakes.
I have been searching thoroughly for a regex that matches all text that is not between curly brackets {} and then finds certain words in that text. For example, from the following string:
$content = 'Hello world, { this } is a string with { curly brackets } and this is for testing';
I would like a search for the word this to return only the second occurrence of this, because it's in the area that is not inside curly brackets.
Even a regex that just matches the substrings outside the curly brackets would simplify things for me. I found this regex, /(}([^}]*){)/, but it cannot select the parts Hello world, and and this is for testing because they are not between } and {; it only selects the is a string with part.
I would also like to ask whether it is possible to combine two regexes for a single purpose like mine. For example, the first regex finds the strings outside {} and the second finds the specific words that are searched for.
I want to use this regex in PHP, and for now I am using a function which is more like a hack. The purpose is to find specific words that are not in {}, replace them reliably and write the result to text files.
Thanks in advance for your help.

(*SKIP)(*F)
You're in luck, as PHP's PCRE regex engine has a syntax that is wonderful for this kind of task. This tidy regex works like a charm (see demo):
{[^{}]*}(*SKIP)(*F)|\bthis\b
Okay, but how does it work?
Glad you asked. The left side of the alternation | matches complete {braces} and then deliberately fails, after which the engine skips to the position right after them in the string. The right side matches the this words you want, and we know they're the right ones because they weren't matched by the expression on the left...
How to use it in PHP
Just the usual, something like:
$regex = "~{[^{}]*}(*SKIP)(*F)|\bthis\b~";
$count = preg_match_all($regex,$string,$matches);
You'll want to have a look at $matches[0]
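As a sketch of the technique applied to the string from the question (the replacement word 'that' is made up, and the braces are escaped here only for readability), the same pattern can drive both the search and the replace:

```php
<?php
$content = 'Hello world, { this } is a string with { curly brackets } and this is for testing';

// Left branch: consume a complete {...} block, then (*SKIP)(*F) discards it.
// Right branch: the word we are actually looking for.
$regex = '~\{[^{}]*\}(*SKIP)(*F)|\bthis\b~';

// Only the "this" outside the braces is counted.
$count = preg_match_all($regex, $content, $matches);
echo $count, "\n";

// The same pattern works with preg_replace; 'that' is a hypothetical replacement.
$replaced = preg_replace($regex, 'that', $content);
echo $replaced, "\n";
```

The text inside the braces is left untouched because those blocks are never part of a successful match.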
Further reading about this and similar exclusion techniques
This situation is very similar to this question about "regex-matching a pattern unless...", which, if you're interested and enjoyed (*SKIP) power, you might like to read to fully understand the technique and how it can be extended.

If the strings are not very long, I'd use simple string manipulation functions to make them searchable:
$content = 'Hello world, { this } is a string with { curly brackets } and this is for testing';
function searchify($stack, $charStart = '{', $charEnd = '}') {
    $searchArea = '';
    // Every chunk after the first starts right after an opening brace
    $first = explode($charStart, $stack);
    foreach ($first as $string) {
        // Keep only the part after the closing brace; !== false also handles a brace at position 0
        list($void, $ok) = (strpos($string, $charEnd) !== false ? explode($charEnd, $string, 2) : array('', $string));
        $searchArea .= $ok;
    }
    return $searchArea;
}
This returns a string with the bracketed parts removed; then use strtr...
$replacing = array(
    'with' => 'this',
    "\n"   => '<br>',
    ' '    => "<br>",
);
$raw = searchify($content);
$replaced = strtr($raw, $replacing);
var_dump($replaced);
...to replace values in it.

regular expressions checking two strings

Hi, I wonder if anyone can help. I'm trying to check for an occurrence of one of two possible strings using regex, but my knowledge of regex is very limited, so I'm not having much success.
I'm trying to look for 'Email' and 'eMailConfirm'. This is what I have so far, and it works for Email.
$subject is the id of an input field, so it could be 'name', 'Email' or 'eMailConfirm'.
$subject = $getPromoOuter['label'];
$pattern = '/^Email/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 0);
I tried a number of potential expressions to incorporate the second string (plus a few guesswork ones based on others), but I can't seem to get it to work.
Any idea how I can combine those two strings and check for an occurrence of either?
Thanks for looking

I'll just place an answer here, as I do think I have a good idea what your requirement is.
Your current regex is /^Email/ which matches any string which starts with 'Email'. (whether or not it has to start with it is unclear to me).
In case you need to match either Email or eMailConfirm, not at the start of the string, you should go for
/Email|eMailConfirm/
If the matches do need to be at the start of the string, just prepend both alternatives with a '^' character: /^Email|^eMailConfirm/
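A small sketch of the anchored variant, run against the three ids mentioned in the question. Grouping the alternatives as ^(?:Email|eMailConfirm) is my shorthand for writing the ^ twice; it is equivalent to the pattern above.

```php
<?php
// (?:...) groups the alternatives so a single ^ applies to both.
$pattern = '/^(?:Email|eMailConfirm)/';

// The ids named in the question.
$labels = array('name', 'Email', 'eMailConfirm');

$hits = array();
foreach ($labels as $label) {
    // preg_match returns 1 on a match, 0 on no match.
    if (preg_match($pattern, $label) === 1) {
        $hits[] = $label;
    }
}
print_r($hits);
```

The match is case-sensitive, so 'email' would not be accepted unless the i modifier is added.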

Replacing a string inside a string in PHP

I have strings in my application that users can send via a form, and they can optionally replace words in that string with replacements that they also specify. For example if one of my users entered this string:
I am a user string and I need to be parsed.
And chose to replace and with foo the resulting string should be:
I am a user string foo I need to be parsed.
I need to somehow find the starting position of what they want to replace, replace it with the word they want and then tie it all together.
Could anyone write this up or at least provide an algorithm? My PHP skills aren't really up to the task :(
Thanks. :)

$result = preg_replace('/\band\b/i', 'foo', $subject);
will find all occurrences of and where it's a word on its own and replace it with foo. \b ensures that there is a word boundary before and after and; the i modifier makes the match case-insensitive.
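Since the question has users supplying both the word and its replacement, the one-liner could be wrapped roughly like this. The function name replaceWord is made up for illustration; preg_quote() and the callback are additions of mine so that regex metacharacters in the search word, or $1-style sequences in the replacement, are treated as plain text.

```php
<?php
// Hypothetical helper: replace whole-word occurrences of a user-supplied word.
function replaceWord($subject, $search, $replacement) {
    // preg_quote escapes regex metacharacters (and the / delimiter) in user input.
    $pattern = '/\b' . preg_quote($search, '/') . '\b/i';
    // A callback returns the replacement verbatim, so "$1" or "\1" in user
    // input is not interpreted as a backreference.
    return preg_replace_callback($pattern, function ($m) use ($replacement) {
        return $replacement;
    }, $subject);
}

echo replaceWord('I am a user string and I need to be parsed.', 'and', 'foo'), "\n";
echo replaceWord('sand and band', 'and', 'foo'), "\n";
```

The \b anchors only behave as expected when the search word starts and ends with a word character (letter, digit or underscore).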

Use preg_replace. You don't need to think so hard about this, though you will have to learn a little bit about regexes. :)

Read up on str_replace, or for more complex replacements on Regular Expressions and preg_replace.
Examples for both:
<?php
$str = 'I am a user string and I need to be parsed.';
echo str_replace( 'and', 'foo', $str ) . "\n";
echo preg_replace( '/and/', 'foo', $str ) . "\n";
?>
In response to the comments on this answer, note that both examples above will replace every occurrence of the search string (and), even when it happens to be within another word.
To take care of that, you could add the word separators to the str_replace call (see the comment for an example), but this gets quite complicated when you want to handle all common word separators (spaces, commas, dots, exclamation marks, question marks etc.).
An easier way to fix this problem is to use the power of regular expressions and make sure the search string is not found within another word. See Tim Pietzcker's answer for a possible solution.

preg_match returning weird results

I am searching a string for URLs, and my preg_match is giving me an incorrect number of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.

$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you would rather get all matches, using preg_match_all.
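To make the difference concrete, here is a sketch using the pattern from the question on a made-up string with two URLs. I have dropped the e modifier, which only ever applied to preg_replace and is not needed here.

```php
<?php
// Hypothetical input containing two URLs with a scheme.
$string  = 'Visit http://www.example.com or https://example.org today';
$pattern = '#(^|[\n ])([\w]+?://[\w]+[^ "\n\r\t<]*)#is';

// preg_match stops at the first match: $links holds the full match
// plus one entry per capture group, so the count is 3.
preg_match($pattern, $string, $links);
echo count($links), "\n";

// preg_match_all returns the number of full matches;
// $all[2] lists what the second group (the URL itself) captured each time.
$total = preg_match_all($pattern, $string, $all);
echo $total, "\n";
print_r($all[2]);
```

So count($links) reflects the number of groups in the pattern, not the number of URLs found.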

If you use preg_match_all (as Gumbo suggested), please note that if you run your regex against this string, it will match both the value of your anchor's "href" attribute and the linked text, which in this case happens to contain a URL. This makes TWO matches.
It would be wise to run array_unique on your result set :)

In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.
