I have a string containing text, numbers, and symbols. I'm trying to extract the numbers and symbols from it, with limited success: instead of getting the entire run of numbers and symbols, I only get part of it. I explain my regex below to make it clearer.
\d : any number
[+,-,*,/,0-9]+ : 1 or more of any +,-,*,/, or number
\d : any number
Code:
$string = "text 1+1-1*1/1= text";
$regex = "~\d[+,-,*,/,0-9]+\d~siU";
preg_match_all($regex, $string, $matches);
echo $matches[0][0];
Expected Results
1+1-1*1/1
Actual Results
1+1
Remove the U flag. It's causing the + to be non-greedy in its matching. Also, you don't need commas between the characters in a character class (you only need a single , if you actually want to match a comma). You do need to escape the - so that it isn't treated as the start of a range.
The problem here is that your regex mixes up quite a few unescaped metacharacters. Your character class is [+,-,*,/,0-9]. You do not need to separate the characters with commas; that only tells the regex engine to include commas in the class. Furthermore, you need to escape the -, as it has a special meaning inside a character class. As it is, ,-, is interpreted as 'characters from "," to ","' instead of the literal character "-". The "/" character would cause a similar problem if / were the pattern delimiter, but with ~ as the delimiter it needs no escaping. The expression \d[+\-*/0-9]+\d should do the trick.
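As a quick check of the two fixes described above (no U flag, hyphen escaped, commas removed), here is a minimal sketch that reuses the string from the question. The variable name $result is only for illustration.

```php
<?php
// Same input as in the question.
$string = "text 1+1-1*1/1= text";

// No U flag, no commas, and the hyphen escaped inside the character class.
// With ~ as the delimiter, the / needs no escaping.
$regex = '~\d[+\-*/0-9]+\d~';

preg_match_all($regex, $string, $matches);

$result = $matches[0][0];
echo $result, "\n"; // 1+1-1*1/1
```

The greedy + first runs up to the last digit before "=", then gives back one character so that the trailing \d can match.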
Didn't test it with your code but should work :)
((?:[0-9]+[\+|\-|\*|\/]?)+)
For more detail on how the pattern works: https://regex101.com/r/mF0zO8/2
Related
Could you help me with a PHP function/regex that finds all words in a given text that start with the character ":"?
In other words, all substrings that start with ":" and are separated by " " (a space).
Since :word should probably be valid, and I guess :word:another should be considered two words, then you cannot say that there is always a space.
Words in natural languages can be followed by dots and other characters.
In digital input, they can be followed by end of line.
I suggest using this regexp:
~:\w+~
It matches a : character followed by at least one word character, and stops at the first character that is not a word character.
Example: on RegExr.com
You can also try ~:\w+\b~, where \b is a word boundary (here, the end of the word), but I don't think it is necessary here.
Note: \w stands for [a-zA-Z0-9_], meaning it matches underscores _ and digits 0-9 as well. It works pretty much like variable/function naming in PHP.
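To illustrate the edge cases mentioned above (two words with no space between them, a trailing dot, end of input, digits and underscores), here is a small sketch. The input text and the variable names are made up for the example.

```php
<?php
// Hypothetical input: ":second:third" has no space between the two words,
// ":user_42" is followed by a dot, and ":last" sits at the end of the input.
$input = "Send :first and :second:third to :user_42. Done :last";

preg_match_all('~:\w+~', $input, $matches);

$words = $matches[0];
print_r($words); // :first, :second, :third, :user_42, :last
```

Because \w stops at ":" and ".", neither a space nor a \b is needed to end each word.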
EDIT (some notes on usage):
You said that in a given text (I take that to mean input containing arbitrary content) you want to extract all words prefixed with :, for example :word. To do that easily, use the preg_match_all() function with the PREG_PATTERN_ORDER flag.
Example:
$regex = '~(:\w+)~';
if (preg_match_all($regex, $input, $matches, PREG_PATTERN_ORDER)) {
    foreach ($matches[1] as $word) {
        echo $word .'<br/>';
    }
}
regex: /:\w+\s/g
\w Matches any word character
\s Matches any whitespace character
This would work:
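// PHP has no g modifier; preg_match_all() already finds every match.
// The ^ and $ anchors are dropped so that words anywhere in the text are found.
preg_match_all('/:\w+\s/', $var, $matches);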
Sorry, I don't use PHP, but I suppose your problem might be that PHP reserves the character ":" for some reason in its regex implementation?
In that case, you can still catch any word beginning with ":" and ending with whitespace this way:
(...)
match('\x3A\S*\s');
("3A" is hexadecimal value for 58, which is the ASCII code for ":")
This should work, I think...
How can I extract https://domain.com/gamer?hid=.115f12756a8641 (i.e. the value of url) from the string below?
rrth:'http://www.google.co',cctp:'323',url:'https://domain.com/gamer?hid=.115f12756a8641',rrth:'https://another.com'
P.S.: I am new to regular expressions and still learning. The string above seems to have a regular format, so there must be some sort of shortcut.
If your input string is called $str:
preg_match('/url:\'(.*?)\'/', $str, $matches);
$url = $matches[1];
(.*?) captures everything between url:' and ' and can later be retrieved with $matches[1].
The ? is particularly important. It makes the repetition ungreedy, otherwise it would consume everything until the very last '.
If your actual input string contains multiple url:'...' sections, use preg_match_all instead. $matches[1] will then be an array of all the captured values.
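The difference between the lazy and the greedy quantifier, and the preg_match_all variant, can be sketched as follows. The input string is a hypothetical variation of the one in the question with a second url:'...' section added, and the variable names are illustrative only.

```php
<?php
// Hypothetical input with two url:'...' sections.
$str = "rrth:'http://www.google.co',url:'https://domain.com/gamer?hid=.115f12756a8641',"
     . "cctp:'323',url:'https://another.com'";

// Lazy (.*?): each match stops at the first closing quote after url:'
preg_match_all('/url:\'(.*?)\'/', $str, $lazy);
$urls = $lazy[1];

// Greedy (.*): the single match runs on to the very last quote in the string
preg_match('/url:\'(.*)\'/', $str, $greedy);
$greedyCapture = $greedy[1];

print_r($urls);
echo $greedyCapture, "\n";
```

With the lazy version, $urls holds the two URLs on their own. With the greedy version, the capture also swallows the cctp entry and the second url: prefix.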
Simple regex:
preg_match('/url\s*\:\s*\'([^\']+)/i',$theString,$match);
echo $match[1];//should be the url
How it works:
/url\s*\:\s*: matches url + [any number of spaces] + : (colon) + [any number of spaces]. But we don't need to keep this part; that's where the second part comes in.
\'([^\']+)/i: matches ', then the brackets (()) create a group that will be stored separately in the $match array. What will be matched is [^']+: any character except the apostrophe (the [] create a character class, and the ^ means: exclude these chars). So this class will match any character up to the point where it reaches the closing/delimiting apostrophe.
/i: in case the string might contain URL:'http://www.foo.bar', I've added that i, which is the case-insensitive flag.
That's about it. Perhaps you could sniff around here to get a better understanding of regexes.
Note: I've had to escape the single quotes because the pattern string uses single quotes as delimiters: "/url\s*\:\s*'([^']+)/i" works just as well. If you don't know whether you'll be dealing with single or double quotes, you could replace the quotes with another char class:
preg_match('/url\s*\:\s*[\'"]([^\'"]+)/i',$string,$match);
Obviously, in that scenario, you'll have to escape the delimiters you've used for the pattern string...
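A minimal sketch of the quote-agnostic variant, run against two made-up inputs (one single-quoted, one double-quoted with an upper-case key and extra spaces). The sample strings and the $found array are assumptions for illustration.

```php
<?php
// Hypothetical inputs: different quote styles, spacing and letter case.
$samples = [
    "cctp:'323',url:'https://domain.com/gamer?hid=.115f12756a8641'",
    'URL : "https://example.com/page"',
];

$found = [];
foreach ($samples as $sample) {
    // [\'"] accepts either quote; [^\'"]+ captures up to the next quote.
    if (preg_match('/url\s*:\s*[\'"]([^\'"]+)/i', $sample, $match)) {
        $found[] = $match[1];
    }
}

print_r($found);
```

Note that this simple class would also stop at a quote of the other kind inside the value, which is acceptable for URLs like these but may not be for arbitrary text.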
I have a few keywords, symbols, letters, etc. that I want to remove from my PHP string. I'm trying to add them to my regex, but it doesn't work too well.
$string = preg_replace("/(?![=$'%-mp4mp3])\p{P}/u","", $check['title']);
Pretty much, I want to remove mp3, mp4, ./, and apples from the string.
Please help guide me, thanks in advance!
First: [] in a regular expression introduces a character class. A hyphen is used to represent a character range between two symbols. So [=$'%-mp4mp3] means =, $, ', everything from % to m (73 characters, actually!), p, 3, and 4, not the words mp3 and mp4.
Second: your regular expression doesn't grab the "bad" characters/keywords at all. A negative lookahead is a zero-width assertion (it is not included in the match), so the pattern matches, and erases, every punctuation character that is not in that class.
Change your regex to:
"/[=$'%-]|mp3|mp4/u"
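Here is a small sketch of the corrected pattern applied to a made-up title (standing in for $check['title'], whose real content is not shown in the question). It also shows that the range %-m in the original class covers characters such as A.

```php
<?php
// Hypothetical title; the real $check['title'] is not shown in the question.
$title = "My-Song=Live'95 100%.mp3";

// Character class for the single symbols, alternation for the keywords.
$clean = preg_replace('/[=$\'%-]|mp3|mp4/u', '', $title);
echo $clean, "\n"; // MySongLive95 100.

// In the original class, %-m is a range, so it also covers e.g. "A".
$rangeHitsA = preg_match('/[%-m]/', 'A');
```

The pattern is written in single quotes here so that PHP does not try to interpolate the $ character; the regex itself is the same as above.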
You don't need regex for that.
$string = "Your original string here";
$keywords = array('mp3', 'mp4');
echo str_replace($keywords, '', $string);
I'm having a bit of difficulty converting some regex from being used in preg_match_all to being used in preg_replace.
Basically, via regex only, I would like to match uppercase characters that are preceded by either a space, the beginning of the text, or a hyphen. This is not a problem; I have the following for this, which works well:
preg_match_all('/(?<= |\A|-)[A-Z]/',$str,$results);
echo '<pre>' . print_r($results,true) . '</pre>';
Now, what I'd like to do, is to use preg_replace to only return the string with the uppercase characters that match my criteria above. If I port the regex straight into preg_replace, then it obviously replaces the characters I want to keep.
Any help would be much appreciated :)
Also, I'm fully aware regex isn't the best solution for this in terms of efficiency, but nonetheless I would like to use preg_replace.
According to De Morgan's laws,
if you want to keep letters that are
A-Z, and
preceded by [space], \A, or -
then you'd want to remove characters that are
not A-Z, or
not preceded by [space], \A, or -
Perhaps this (replace match with empty string)?
/[^A-Z]|(?<! |\A|-)./
See example here.
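The De Morgan reasoning above can be checked with a short sketch that runs the inverted pattern through preg_replace and compares it with the preg_match_all version from the question. The sample string and variable names are invented for the example.

```php
<?php
// Hypothetical sample: H, W and F qualify; the B in "barBaz" does not,
// because it is preceded by a letter.
$str = "Hello World-Foo barBaz";

// Remove everything that is not A-Z, or that is not preceded by space, \A or -.
$kept = preg_replace('/[^A-Z]|(?<! |\A|-)./', '', $str);
echo $kept, "\n"; // HWF

// Cross-check against the original preg_match_all pattern.
preg_match_all('/(?<= |\A|-)[A-Z]/', $str, $results);
$expected = implode('', $results[0]);
```

Both approaches should yield the same characters; the first alternative handles every non-uppercase character, and the second catches uppercase letters in the wrong position.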
I think it will be something like this:
$sString = preg_replace('#.*?(?<= |\A|-)([A-Z])([a-z]+)#m',"$1", $sString);
The strings look like hyperlinks, such as http://somethings. This is what I need:
They should be matched only if they don't start with the character "; only that character excludes them. If there is no character before the link, it must still be matched;
The somethings part can contain any kind of character (it is a link, after all) except whitespace, which marks the end of the link; I know whitespace is permitted by the RFC, but this is the only way I know to delimit it;
these strings are previously filtered with htmlentities($str, ENT_QUOTES, "UTF-8"), which is why any kind of character can be used. Is it secure? Or do I risk problems with XSS or broken HTML?
there can be multiple occurrences of this replacement, not only one, and matching must be case-insensitive;
This is my actual regex :
preg_replace('#\b[^"](((http|https|ftp)://).+)#', '<a class="lforum" href="$1">$1</a>', $str);
But it matches only those strings that START with ", and I want the opposite. Any help would be appreciated, thanks!
For both of your cases you'll want lookbehind assertions.
\b(?<!")(\w)\b - negative lookbehind to match only if not preceded by "
(?<=ThisShouldBePresent://)(.*) - positive lookbehind to match only if preceded by your string.
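As a rough sketch of how the negative lookbehind could be combined with the replacement from the question: the pattern below is one possible way to put the pieces together, not necessarily what the answer had in mind. It assumes the input still contains literal " characters (after htmlentities with ENT_QUOTES they would appear as &quot; instead), and the sample text is made up.

```php
<?php
// Hypothetical input with one quoted and two unquoted links.
$str = 'see http://example.com/a and "http://example.com/b" or HTTPS://example.org/c';

// (?<!") rejects links directly preceded by a double quote;
// \S+ ends the link at the first whitespace; i makes it case-insensitive.
$out = preg_replace(
    '#(?<!")\b((?:https?|ftp)://\S+)#i',
    '<a class="lforum" href="$1">$1</a>',
    $str
);

echo $out, "\n";
```

preg_replace replaces every occurrence by default, so multiple links in the same string are all handled, and the quoted link is left untouched.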
Something like this: preg_match('/\b[^"]/',$input_string);
This looks for a word-break (\b), followed by any character other than a double quote ([^"]).
Something like this: preg_match('~(((ThisShouldBePresent)://).+)~', $input_string);
I've assumed the brackets you specified in the question (and the plus sign) were intended as part of the regex rather than characters to search for.
I've also taken @ThiefMaster's advice and changed the delimiter to ~ to avoid having to escape the //.