Why does preg_match not find some literal words - php

Using PHP, I am trying to test for the presence of various words and patterns in a string but am not able to figure out why I am seeing odd behaviour when attempting to match certain words.
Example 1:
Why does the following not return 1?
$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';
$result = preg_match('/(string)/i', $test, $matches);
$result is always zero for the above even though the word "String" is present in the subject string.
Example 2:
However, let's say I slightly change my regex to the following:
$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';
$result = preg_match('/st.+(ring)/i', $test, $matches);
The above returns the value of 1 for $result. Seems like when I split up the word "string" into separate parts, I can get a match.
Example 3:
Once again when I slightly modify the regex in this example, it also returns zero but I'm not sure why:
$test = 'clen=a.le​ngth;for(i=0;i<clen;i++)b+=St​ring.fr​omCh​arCode(a.char​CodeAt(i)^2)';
$result = preg_match('/(tring)/i', $test, $matches);
Trying to match on the sequence of characters such as "tring" returns 0 but when matching on "ring" it returns 1. But "tring" doesn't sound like any type of special or reserved word!
This behaviour is also the same for various other words such as "document" and "unescape" and I'm sure there are many others.
I am assuming that some words are probably being treated differently by the regex engine because they might be reserved or special in some way but I have not been able to find an official explanation for the above behaviour.
I apologise if I am missing something really obvious and would really appreciate it if someone can please explain this to me.
Many thanks.

I think your first regex is fine. Look here:
https://regex101.com/r/tO9vN8/1
But there seems to be a problem with the character set: I had to rewrite the expression, because when I copied it from this site the regex did not match.
I hope this points you in the right direction.
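One way to check for the character-set problem mentioned above is to look for invisible characters in the subject. The following is a minimal sketch, assuming the pasted string contains zero-width spaces (U+200B) inside words such as "String"; the shortened `$test` value is hypothetical and the `\u{...}` escape needs PHP 7 or later.

```php
<?php
// Hypothetical, shortened subject: an invisible U+200B sits between "St" and "ring".
$test = "b+=St\u{200B}ring.fromCharCode(a)";

// The literal pattern fails because the invisible character splits the word.
$before = preg_match('/string/i', $test);

// \p{Cf} matches Unicode "format" characters such as U+200B; /u enables UTF-8 mode.
$clean = preg_replace('/\p{Cf}/u', '', $test);
$after = preg_match('/string/i', $clean);

var_dump($before, $after);
```

If that assumption holds, it would also explain the other two examples: `st.+(ring)` matches because `.+` absorbs the hidden bytes, while `tring` fails because the hidden character sits between the "t" and the "r".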

Related

Correct regex for this pattern

I've got some issues understanding this regex.
I tried doing a pattern but does not work like intended.
What I want is [A-Za-z]{2,3}[0-9]{2,30}
That is 2-3 letters in the beginning and 2-30 numbers after that
FA1321321
BFA18098097
I want to use it to validate an input field but can't figure out what the regex should look like.
Can anyone help me out and perhaps explain a bit about it?
Your regex is correct - just make sure to surround it with / in PHP, and perhaps ^, $ if you want it to strictly match the entire string (no extra characters before/after).
$pattern = '/^[A-Za-z]{2,3}[0-9]{2,30}$/';
$found = preg_match($pattern, $your_str);
From the PHP documentation:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
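Because preg_match() can return 1, 0 or FALSE, a strict comparison against 1 is the safest check. A small sketch follows; the function name `isValidCode` is made up for illustration, and the last two inputs are invented counterexamples.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function isValidCode(string $input): bool
{
    // Single quotes keep PHP from trying to interpolate anything after "$".
    $pattern = '/^[A-Za-z]{2,3}[0-9]{2,30}$/';
    // preg_match() returns 1, 0 or false, so compare strictly against 1.
    return preg_match($pattern, $input) === 1;
}

var_dump(isValidCode('FA1321321'));    // sample from the question
var_dump(isValidCode('BFA18098097'));  // sample from the question
var_dump(isValidCode('F1'));           // invented: only one letter
var_dump(isValidCode('ABCD12'));       // invented: four letters
```

Note that `$` also matches just before a trailing newline; if that matters for the input field, the `D` modifier or `\z` can be used instead.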

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so I can only guess what the regex should be. Instead I'll explain why your code fails.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
Is closer, but preg_replace is global, so this searches your whole file (or it would if not for the anchor) and replaces the found value with nothing. What you really want (I think) is to use preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made the \d match exactly 4 characters. The () is a capture group and stores whatever is found inside it, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with preg_match_all function to easily capture what you want into array (you're using preg_replace which isn't for retrieving data but for... well replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't put a quantifier on the digit class (the plus sign in my example), so it would only capture the first digit anyway.
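Putting the advice from both answers together, a hedged sketch might look like this. The `$code` value is a made-up fragment (the real page markup was never posted and may change at any time), and the `"followed_by"` key is taken from the asker's earlier attempt.

```php
<?php
// Hypothetical page fragment; the real markup may differ.
$code = '{"followed_by":{"count":1234},"follows":{"count":56}}';

$counted = null;
// Escaped braces, a + quantifier, and a capture group around the digits.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match) === 1) {
    $counted = (int) $match[1];
}

var_dump($counted);
```

Checking the return value before reading `$match[1]` avoids an undefined-index notice when the format changes and nothing matches.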

php regexp "AND"

I have to create translations for the project I work on. The simplest solution was to change all strings to a functioncall(string) so that I could get unique string hashes every time.
My code has the following different t() function uses:
<label for="anon"><?php echo t('anonymously? (20% extra)')?></label>
exit(t("Success! You made {amount} oranges out of it!", array('amount' => $oranges)));
echo t('You failed.');
My current regexp is:
$transMatches = preg_match_all('/[^a-z]t\([^)(]+/m', $contents, $matches);
The problem is that it fails on example #1, matching only "anonymously?".
What I really want to achieve is: "match t( then match either ' or " then match anything except what you matched for ' or " and )"
Idea: t\(['|"](.*?)[^'|"]\)?
I cannot make above regexp to work.
How could I do AND in regexp so that it matches "['|"] AND )" OR "['|"] AND, array"
Please help me on regexp and explain why it works.
Thank you!
Parsing function arguments can be quite complex, but you only need to parse the first argument, which (for simplicity) we can assume is always a string quoted with either ' or ". These regexps (written as they appear inside a single-quoted PHP string) match such strings:
"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"
\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'
Therefore you just need to match:
'~[^\w\d]t\(\s*("[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')~i'
[^\w\d] ensures that something like test1t( will not match, and \s* makes the pattern tolerant of whitespace after the opening parenthesis.
With this regexp you'll get results like:
'anonymously? (20% extra)'
"Success! You made {amount} oranges out of it!"
And I can't imagine a situation where you would need to parse out the array too; can you describe it in a comment or in the question?
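As a quick check, the pattern can be run against the three sample usages from the question. This is only a sketch; the `$contents` value stands in for the real file contents.

```php
<?php
// The three sample usages from the question, kept verbatim in a nowdoc.
$contents = <<<'EOT'
<label for="anon"><?php echo t('anonymously? (20% extra)')?></label>
exit(t("Success! You made {amount} oranges out of it!", array('amount' => $oranges)));
echo t('You failed.');
EOT;

$pattern = '~[^\w\d]t\(\s*("[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')~i';

// Group 1 holds the quoted first argument of each t() call, quotes included.
preg_match_all($pattern, $contents, $matches);
$strings = $matches[1];

print_r($strings);
```

Because the pattern requires one non-word character before `t(`, a call at the very first byte of the input would be missed.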
Is this what you need?
Using Backreferences in The Regular Expression - http://www.regular-expressions.info/brackets.html
But what you are doing looks strange. Why do it this way?
Are you replacing the function call with some result? Why don't you just let it call the function and return the translation from it?
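The backreference idea can be sketched as follows: capture the opening quote in a group, then require the same character to close the string. This is a simplified illustration that assumes the translated strings contain no backslash-escaped quotes; the second sample line is invented.

```php
<?php
// First line is a sample call from the question; the second is invented.
$line1 = "<?php echo t('anonymously? (20% extra)')?>";
$line2 = 'echo t("Don\'t stop");';

// ([\'"]) captures the opening quote; \1 requires the same quote to close the string.
$pattern = '/\bt\(\s*([\'"])(.*?)\1/';

preg_match($pattern, $line1, $m1);
preg_match($pattern, $line2, $m2);

// Group 2 is the text between the quotes.
var_dump($m1[2], $m2[2]);
```

The second line shows why the backreference helps: the apostrophe inside a double-quoted string does not end the match.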

RegEx pattern to match a string between two characters, but exclude the characters

Greetings all,
I have yet another RegEx question. I have done hours of searching but apparently I have missed the one key article I need.
I need to use preg_match() in PHP to match a string that is between parentheses, but NOT have the parentheses show up in the result. I have seen examples of similar issues, but I believe my problem is different because it's actually parentheses that I am dealing with, which are metacharacters in RegEx. Any merit to that?
Anyways...
String is:
"200 result=1 (SIP/100-00000033)"
Current code is:
preg_match("/\((.*)\)/s", $res, $matches);
$matches[0] becomes:
"(SIP/100-00000033)"
What I WANT is:
"SIP/100-00000033"
I apologize because I'm sure this is VERY simple but I'm just not grasping it. Would anyone care to educate me?
Thank you in advance!!
Well, it all refers to the way you group items in the regular expression. Your solution is actually correct, you're just using the wrong index for matches. Try:
$matches[1]
If that somehow gives errors, post'em and we'll fix.
If you really want the full match to exclude the parentheses, you can use look-ahead and look-behind assertions:
preg_match('/(?<=\().*(?=\))/s', $res, $matches);
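Both suggestions can be compared side by side on the string from the question. The variable names `$grouped` and `$asserted` are only for illustration.

```php
<?php
$res = '200 result=1 (SIP/100-00000033)';

// Capturing group: index 0 is the whole match, index 1 is the group.
preg_match('/\((.*)\)/s', $res, $grouped);

// Lookarounds: the parentheses are asserted but not consumed,
// so they stay out of the full match at index 0.
preg_match('/(?<=\().*(?=\))/s', $res, $asserted);

var_dump($grouped[0], $grouped[1], $asserted[0]);
```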

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how google's search is quite powerful and will take stuff inside quotes as a literal phrase and things with a minus sign in front of them as not included.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" in pages that don't include the word donotfindme on the website examplesite.com.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone lead me in the right direction?
I don't necessarily need code; even a really nice place with tutorials would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
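For instance, with a made-up search string the three groups come out like this:

```php
<?php
// Invented input containing one quoted phrase of four lowercase words.
$search = 'say "hello big wide world" now';
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';

preg_match($pattern, $search, $matches);

// [0] is the full match including quotes; [1]..[3] are the groups, left to right.
print_r($matches);
```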
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
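A rough sketch of that offset loop is shown below. Two things are assumptions rather than part of the answer: the sample `$search` string is invented, and the pattern uses `[^"]*` instead of `.*` so that a greedy match cannot run from the first quote to the last one. The `PREG_OFFSET_CAPTURE` flag makes preg_match() report where each match starts.

```php
<?php
// Invented input with two quoted phrases.
$search = 'find "first phrase" and "second phrase" here';
$pattern = '/"([^"]*)"/';

$offset = 0;
$phrases = [];
// With PREG_OFFSET_CAPTURE every entry of $m is [text, byte offset].
while (preg_match($pattern, $search, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    $phrases[] = $m[1][0];
    // Continue searching right after the end of the full match.
    $offset = $m[0][1] + strlen($m[0][0]);
}

print_r($phrases);
```

preg_match_all() collects the same matches in a single call, so the manual loop is mainly useful when each match needs separate handling.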
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.
Sorry, my PHP is a bit rusty, but this code will probably do what you request:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full matched text.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer covering all the kinds of search terms (literal, minus, quotes, ...) WITH replacements (for visitors arriving from Google, at least).
Maybe it should not be done with regular expressions alone, though.
A single regular expression for all of this would be huge and very complex, making it hard for you or other developers to maintain and extend,
and this approach might even be faster.
It may still need a lot of improvement, but here is a complete working solution in a class. There is a bit more in here than asked in the question, but it illustrates the reasons behind some choices.
class mySearchToSql extends mysqli {
    // Collected WHERE fragments (not declared in the original answer).
    protected $W = array();

    protected function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            // Split into plain terms [1], quoted phrases [2] and excluded terms [3]
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i', $what, $split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Surround with SQL
            array_walk($split[1], array($this, 'sur'), array('`Field` LIKE "%', '%"'));
            array_walk($split[2], array($this, 'sur'), array('`Desc` REGEXP "[[:<:]]', '[[:>:]]"'));
            array_walk($split[3], array($this, 'sur'), array('`Desc` NOT LIKE "%', '%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            // Add AND or OR
            $this->where($split[3])
                 ->where(array_merge($split[1], $split[2]), true);
        }
    }

    // Escape a non-empty term and wrap it in the given SQL prefix and suffix.
    protected function sur(&$v, $k, $sur) {
        if (!empty($v))
            $v = $sur[0] . $this->real_escape_string($v) . $sur[1];
    }

    function where($s, $OR = false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s = array_filter($s); // drop the empty entries left by unmatched groups
            if (empty($s)) return $this;
            if ($OR == true)
                $this->W[] = '(' . implode(' OR ', $s) . ')';
            else
                $this->W[] = '(' . implode(' AND ', $s) . ')';
        } else
            $this->W[] = $s;
        return $this;
    }

    function showSQL() {
        // The original used an undefined constant L here; PHP_EOL is assumed.
        echo $this->W ? 'WHERE ' . implode(PHP_EOL . ' AND ', $this->W) . PHP_EOL : '';
    }
}
Thanks for all stackoverflow answers to get here!
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
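To try that pattern from PHP without fighting string escaping, one option is nowdoc syntax, which leaves every backslash exactly as typed. The sketch below uses an invented subject line; `/` is chosen as the delimiter because the pattern itself contains no slashes.

```php
<?php
// Nowdoc syntax keeps every backslash exactly as typed.
$pattern = <<<'RE'
/(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1/
RE;

// Invented subject: a double-quoted literal containing escaped quotes.
$subject = <<<'TXT'
say "he said \"hi\"" now
TXT;

$found = preg_match($pattern, $subject, $m);

// $m[1] is the quote character, $m[2] the literal's contents (escapes left in place).
var_dump($found, $m[1], $m[2]);
```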
