php regex / preg_match - php

I'm trying to match the numbers inside ('')
$linkvar = "<a onclick=\"javascript:open('597967');\" class=\"links\">more</a>";
preg_match("^[0-9]$",$linkvar,$result);

Your regex only matches if the entire string is a single digit, because of the ^ and $ anchors. In plain language, your current regex reads:
^ means "this is the start of the string"
[0-9] means "match a single numeric character"
$ means "this is the end of the string"
Change it to (note the / delimiters, which preg_match requires around the pattern):
preg_match("/[0-9]+/",$linkvar,$result);
Or alternatively, the shorthand syntax for matching digits:
preg_match("/\d+/",$linkvar,$result);
The + quantifier means that "one or more" digits must be found in order for it to be a match.
The whole match is stored in $result[0]. If you want to capture only part of a larger match, wrap that part in parentheses and read it from $result[1].
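As a minimal sketch of the difference between the anchored and unanchored patterns (the helper name firstDigits is hypothetical, and the quotes in $linkvar are escaped so the string parses):

```php
<?php
// Hypothetical helper: returns the first run of digits in $text, or null if there is none.
function firstDigits(string $text): ?string
{
    // The slashes are the pattern delimiters; \d+ means one or more digits.
    if (preg_match('/\d+/', $text, $result) === 1) {
        return $result[0]; // index 0 holds the whole match
    }
    return null;
}

$linkvar = "<a onclick=\"javascript:open('597967');\" class=\"links\">more</a>";

var_dump(preg_match('/^[0-9]$/', $linkvar)); // int(0): the anchored pattern needs a one-digit string
var_dump(firstDigits($linkvar));             // string(6) "597967"
```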

Your regex will only match if the string is exactly one digit. To match only the digits inside the quotes, use:
preg_match("/'(\d+)'/", $linkvar, $result);
var_dump($result[1]);

The ^ and $ match the start and end of the string, which means you are currently searching for a string containing ONLY a single digit. Remove them, add a plus quantifier and the pattern delimiters, leaving just "/[0-9]+/", and it will find the first group of digits in the string.
preg_match("/[0-9]+/",$linkvar,$result);

Related

How to extract the last 2 delimitered numbers using regex

I have to extract the first instance of a number-number. For example I want to extract 8236497-234783 from the string bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg. The string has no apparent structure besides the number followed by a dash and followed by a number which is the thing I want to extract.
The thing I want to extract may be at the very start of the string, or the middle, or the end, or maybe the entire string itself is just a number-number.
$b = "bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg";
preg_match('\d-\d', $b, $matches);
echo($matches[0]);
// Expecting to print 8236497-234783
You're missing the delimiters around the regexp. PHP's preg functions require that the pattern begin with a delimiter (any character that is not alphanumeric, a backslash, or whitespace; / is the usual choice), and they look for the matching delimiter at the end of the regexp, because flags can be put after the second delimiter.
\d just matches a single digit. If you want to match a string of digits, you should write \d+.
You should also require that the numbers be surrounded by word boundaries with \b, otherwise it will match starting at the 3 at the end of fdgd3.
preg_match('/\b\d+-\d+\b/', $b, $matches);
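To see what the word boundaries change, here is a small sketch run against the string from the question (the helper name firstNumberPair is hypothetical):

```php
<?php
// Hypothetical helper: first "digits-digits" token bounded by word boundaries, or null.
function firstNumberPair(string $text): ?string
{
    return preg_match('/\b\d+-\d+\b/', $text, $matches) === 1 ? $matches[0] : null;
}

$b = "bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg";

// Without \b the match starts at the 3 that ends "fdgd3".
preg_match('/\d+-\d+/', $b, $loose);
echo $loose[0], "\n";           // 3-8236497
echo firstNumberPair($b), "\n"; // 8236497-234783
```

The \b before the first \d+ fails between "d" and "3" because both are word characters, so the engine moves on to the digits that follow the dash.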

How do I test if string matches integer:integer with preg_match?

I need a regular expression to test if string matches integer:integer (ex: 9:4).
I have tried
preg_match("[0-9]:[0-9]", $str)
but it's not correct.
You have to mark the start and end of the regular expression, usually with /.
Try this:
preg_match("/[0-9]:[0-9]/", $str)
One hint: you can use \d instead of [0-9].
If you want to make sure that the string only contains digit:digit, use ^ as the marker for the start of the string and $ for the end:
preg_match("/^[0-9]:[0-9]$/", $str)
Also, add + to match numbers of more than one digit:
preg_match("/^[0-9]+:[0-9]+$/", $str)
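A short sketch of the final pattern wrapped in a validation function (isIntPair is a hypothetical name):

```php
<?php
// Hypothetical helper: true only when the whole string is digits, a colon, then digits.
function isIntPair(string $str): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[0-9]+:[0-9]+$/', $str) === 1;
}

var_dump(isIntPair('9:4'));    // bool(true)
var_dump(isIntPair('12:345')); // bool(true)
var_dump(isIntPair('9:'));     // bool(false)
var_dump(isIntPair('a9:4'));   // bool(false)
```

Note that $ also matches just before a final newline, so "9:4\n" passes this check; adding the D modifier after the closing delimiter should tighten that if it matters.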
^[0-9](:[0-9])*$
^ matches the start of the string, and $ matches the end, ensuring that you're examining the entire string. It will match a single digit, followed by zero or more instances of a colon followed by a digit, so it also accepts 9 on its own and 9:4:3.

Explain the Regular Expression /^[a-zA-Z ]*/

I understand that the regex pattern must match a string which starts with the combination and the repetition of the following characters:
a-z
A-Z
a white-space character
And there is no limitation to how the string may end!
First Case
So a string such as uoiui897868 (any string that only starts with space, a-z or A-Z) matches the pattern... (Sure it does)
Second Case
But the problem is a string like 76868678jugghjiuh (any string that only starts with a character other than space, a-z or A-Z) matches too! This should not happen!
I have checked using the PHP function preg_match() too, which returns 1 (i.e. the pattern matches the string).
Also have used other online tools like regex101 or regexr.com. The string does match the pattern.
Can anybody help me understand why the pattern matches the string described in the second case?
/^[a-zA-Z ]*/
Your regex will match strings that "begin with" any number (including zero) of letters or spaces.
^ means "start of string" and * means "zero or more".
Both uoiui897868 and 76868678jugghjiuh start with 0 or more letters/spaces, so they both match.
You probably want:
/^[a-zA-Z ]+/
The + means "one or more", so it won't match zero characters.
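A quick sketch that shows what each quantifier actually matched (leadingMatch is a hypothetical helper):

```php
<?php
// Hypothetical helper: returns the text matched by $pattern in $s, or null on no match.
function leadingMatch(string $pattern, string $s): ?string
{
    return preg_match($pattern, $s, $m) === 1 ? $m[0] : null;
}

// With *, the match succeeds but is zero characters long.
var_dump(leadingMatch('/^[a-zA-Z ]*/', '76868678jugghjiuh')); // string(0) ""
// With +, at least one letter or space is required at the start.
var_dump(leadingMatch('/^[a-zA-Z ]+/', '76868678jugghjiuh')); // NULL
var_dump(leadingMatch('/^[a-zA-Z ]+/', 'uoiui897868'));       // string(5) "uoiui"
```

Inspecting $m[0] rather than only the return value makes the empty match visible, which is what tools like regex101 report as a zero-length match.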
Your regex is completely useless: it will trivially match any string (empty, non-empty, with numbers, without, ...), regardless of its structure.
This is because:
With ^, you anchor the match at the start of the string, and every string has a start.
You use a character class [A-Za-z ] with the * quantifier, meaning 0 or more repetitions. Thus even if the string does not begin with a character from [A-Za-z ], the matcher simply matches zero characters at the start and reports success.
You need to use + instead of * to enforce "at least one character".
The '*' quantifier on the end means zero or more matches of the character class, so all strings will match. Perhaps you want to drop the quantifier, or change it to a '+' quantifier, and add a '$' on the end to test the whole string.
What you really want is to match one or more of the preceding characters.
For that you use +
/^[a-zA-Z ]+/

Regex to match numbers, # # % signs

I am trying to write a regex that matches all numbers (0-9) and # # % signs.
I have tried ^[0-9#%#]$ , it doesn't work.
I want it to match, for example: 1234345, 2323, 1, 3#, %#, 9, 23743, #####, or whatever...
There must be something missing?
Thank you
You're almost right... All you're missing is something to tell the regular expression there may be more than one of those characters, like a * (0 or more) or a + (1 or more).
^[0-9#%#]+$
The ^ and $ are used to indicate the start and end of a string, respectively. Make sure that your string only contains those characters; otherwise it won't match (e.g. "The number is 89#1" wouldn't match because the string contains characters other than 0-9, #, %, or #).
Your pattern ^[0-9#%#]$ only matches strings that are one character long. The [] construct matches a single character, and the ^ and $ anchors mean that nothing can come before or after the character matched by the [].
If you just want to know if the string has one of those characters in it, then [0-9#%#] will do that. If you want to match a string that consists only of those characters and has at least one character in it, then use ^[0-9#%#]+$. The "+" means to match one or more of the preceding item. If you also want to match empty strings, then use ^[0-9#%#]*$. The "*" means to match zero or more of the preceding item.
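The variants described above can be compared side by side. This is only a sketch: the character class is copied verbatim from the answers, the function names are hypothetical, and the / delimiters are added because PHP's preg functions need them.

```php
<?php
// Exactly one allowed character (the original pattern).
function isOneAllowedChar(string $s): bool
{
    return preg_match('/^[0-9#%#]$/', $s) === 1;
}

// One or more allowed characters and nothing else.
function isAllAllowed(string $s): bool
{
    return preg_match('/^[0-9#%#]+$/', $s) === 1;
}

// At least one allowed character anywhere in the string.
function hasAllowedChar(string $s): bool
{
    return preg_match('/[0-9#%#]/', $s) === 1;
}

var_dump(isOneAllowedChar('2323'));             // bool(false): more than one character
var_dump(isAllAllowed('2323'));                 // bool(true)
var_dump(isAllAllowed('The number is 89#1'));   // bool(false): letters and spaces present
var_dump(hasAllowedChar('The number is 89#1')); // bool(true)
```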
It should be /^[0-9#%#]+$/. The + is a quantifier that means "one or more of the preceding".
The problem with your current regex is that it will only match one character that could either be a number or #, %, or #. This is because the ^ and $ characters match the beginning and the end of the line respectively. By adding the + quantifier, you are saying that you want to match one or more of the preceding character class, and that the entire line consists of one or more of the characters in the specified character class.
Remove the caret (^); it is used to match from the start of the string.
You forgot "+"
^[0-9#%#]+$ must work

How can I match occurrences of string not in another string using regular expressions?

I'm trying to match all occurrences of "string" in something like the following sequence except those inside @@
as87dio u8u u7o @string@ ou os8 string os u
i.e. the second occurrence should be matched but not the first
Can anyone give me a solution?
You can use negative lookahead and lookbehind:
(?<!@)string(?!@)
EDIT
NOTE: As per Mark's comments below, this would not match @string or string@.
You can try:
(?:[^@])string(?:[^@])
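A minimal sketch of the lookaround approach in PHP, assuming the marker character is the at-sign (the helper name findUnmarked and its $marker parameter are hypothetical, so another marker can be substituted):

```php
<?php
// Hypothetical helper: byte offsets of $word where it is neither
// preceded nor followed by $marker.
function findUnmarked(string $text, string $word, string $marker = '@'): array
{
    $w = preg_quote($word, '/');
    $m = preg_quote($marker, '/');
    // (?<!X) is a negative lookbehind and (?!X) a negative lookahead;
    // neither consumes characters, so only the word itself is matched.
    preg_match_all("/(?<!$m)$w(?!$m)/", $text, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is [matched text, offset]; keep the offsets.
    return array_column($matches[0], 1);
}

$text = 'as87dio u8u u7o @string@ ou os8 string os u';
print_r(findUnmarked($text, 'string')); // only the second occurrence, at offset 32
```

Because lookarounds assert without consuming, the pattern also works when the word sits at the very start or end of the input.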
OK,
If you want to NOT match a character you put it in a character class (square brackets) and start it with the ^ character which negates it, for example [^a] means any character but a lowercase 'a'.
So if you want NOT at-sign, followed by string, followed by another NOT at-sign, you want
[^@]string[^@]
Now, the problem is that the character classes will each match a character, so in your example we'd get " string " which includes the leading and trailing whitespace. You can wrap the ends in a non-capturing group, which is parens with a ?: in the beginning, (?: ). Note that such a group still consumes the character it matches, so the overall match still includes it; to get just "string", put it in a capturing group of its own and read that group.
(?:[^@])string(?:[^@])
OK, but now it doesn't match at the start of string (which, confusingly, is the ^ character doing double-duty outside a character class) or at the end of string $. So we have to use the OR character | to say "give me a non-at-sign OR start of string" and at the end "give me a non-at-sign OR end of string" like this:
(?:[^@]|^)string(?:[^@]|$)
EDIT: The negative lookbehind and lookahead is a simpler (and clever) solution, but not available in all regular expression engines.
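For an engine without lookarounds, the alternation pattern can be sketched like this. PHP is used only for illustration, the marker is assumed to be the at-sign this answer mentions, and the helper name unmarkedOffsets is hypothetical. The word is wrapped in a capturing group because the surrounding groups still consume their neighbouring character.

```php
<?php
// Hypothetical helper: offsets of "string" not touching an at-sign, without lookarounds.
function unmarkedOffsets(string $text): array
{
    // (?:[^@]|^) consumes one non-@ character or asserts start of input;
    // (?:[^@]|$) does the same at the end. Group 1 holds just the word.
    preg_match_all('/(?:[^@]|^)(string)(?:[^@]|$)/', $text, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[1] is [matched text, offset]; keep the offsets.
    return array_column($m[1], 1);
}

$text = 'as87dio u8u u7o @string@ ou os8 string os u';
print_r(unmarkedOffsets($text)); // only the second occurrence, at offset 32
```

One limitation of consuming the neighbours: in "string string" the space is used up by the first match, so the second occurrence is missed. The lookaround version does not have this problem.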
Now a follow-up question. If you had the word "astringent" would you still want to match the "string" inside? In other words, does "string" have to be a word by itself? (Despite my initial reaction, this can get pretty complicated :) )
