I am looking for a regular expression in PHP to parse a string of the following pattern. The commands are wrapped in double square brackets, as in
[[a src="" desc=""]]
where a, src and desc are keywords (they won't change). src is required but desc is optional. The value of src or desc can be wrapped in double or single quotes, and src and desc can be given in any order. For example, the following patterns are all valid:
[[a src="http://a.c.d" desc ="hello"]]
[[a src ="http://a.c.d" desc= 'hello']]
[[a desc ="hello " src= 'http://a.c.d' ]]
[[a src = "http://a.c.d" ]]
[[a src="http://a.c.d" desc ="hello"]]
Any whitespace between the values and 'a', 'src', 'desc', '=' (outside the quotes) should be ignored. I am going to replace this command with an HTML tag like
SOMETHING_EXTRACT_FROM_DESC
It seems pretty tough to think of one regex to do the work. For now I have 3 regexes set up to handle the different cases separately. It looks like this:
$pattern = '/\[\[a[:blank:]+src[:blank:]*=[:blank:]*"(.*?)"[:blank:]+desc[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '${2}', $src);
$pattern = '/\[\[a[:blank:]+desc[:blank:]*=[:blank:]*"(.*?)"[:blank:]+src[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '${2}', $rtn);
$pattern = '/\[\[a[:blank:]+src[:blank:]*=[:blank:]+"(.*?)"\]\]/i';
$rtn = preg_replace($pattern, '${2}', $rtn);
But this doesn't work, regular expression is hard to learn :(
I wrote a regular expression that matches everything you requested, but it is slightly more permissive than you asked for, as I'll explain at the end. First, the regex:
\[\[a(\s+(src|desc)\s*=\s*('[^']*'|"[^"]*")){1,2}\s*\]\]
I'll break it down so you can understand it:
\[\[ ... \]\] matches [[ ... ]], the beginning and ending
\s matches any whitespace character (space, tab, newline), \s+ expects at least one
(src|desc) matches either the string src or the string desc. It's an OR operator: match src OR desc.
'[^']*' matches two single quotes and anything in between that is not a single quote
"[^"]*" same with double quotes
('[^']*'|"[^"]*") matches one of the above two
(src|desc)\s*=\s*('[^']*'|"[^"]*") matches a token like src='something'
{1,2} matches the preceding group once or twice; appended to the above expression, it matches one or two of those tokens
And that's pretty much it. The only problem is that it will also match this:
[[a src="http://a.c.d" src="http://a.c.d"]]
Which I think is a mismatch. If it doesn't bother you, you're good to go, otherwise you'll need to change the whole concept of using a big atom with ors (i.e.: |) and take a different approach. You could use look-aheads for example. But it will get real nasty pretty fast.
You can test it online HERE
The regex is much more readable if I remove the backslashes and the \s stuffs. This won't work, but I think it will help you understand it:
[[a ( (src|desc)=('[^']*'|"[^"]*") ){1,2} ]]
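A minimal sketch of how this pattern could be used from PHP, under stated assumptions: the question does not show the exact HTML it wants, so the <a href="...">...</a> output, the fallback to the URL when desc is missing, and the function name replaceLinkCommands are all hypothetical. Because a repeated group only keeps its last iteration, the sketch matches the whole command first and then reads the attributes in a second pass inside a callback.

```php
<?php
// Hypothetical helper; the <a href> output format is an assumption, not from the question.
function replaceLinkCommands(string $text): string
{
    // Same pattern as above, written with non-capturing groups.
    $command = '/\[\[a(?:\s+(?:src|desc)\s*=\s*(?:\'[^\']*\'|"[^"]*")){1,2}\s*\]\]/';

    $result = preg_replace_callback($command, function (array $m): string {
        // Second pass: collect every name=value pair inside the matched command.
        preg_match_all(
            '/(src|desc)\s*=\s*(?:\'([^\']*)\'|"([^"]*)")/',
            $m[0],
            $pairs,
            PREG_SET_ORDER
        );
        $attrs = [];
        foreach ($pairs as $p) {
            // Group 3 exists only for double-quoted values; otherwise use group 2.
            $attrs[$p[1]] = $p[3] ?? $p[2];
        }
        if (!isset($attrs['src'])) {
            return $m[0]; // src is mandatory: leave the command untouched
        }
        $label = $attrs['desc'] ?? $attrs['src']; // assumed fallback when desc is absent
        return '<a href="' . htmlspecialchars($attrs['src']) . '">'
            . htmlspecialchars($label) . '</a>';
    }, $text);

    return $result ?? $text; // preg_replace_callback returns null on a PCRE error
}

echo replaceLinkCommands('[[a desc ="hello " src= \'http://a.c.d\' ]]'), "\n";
```

Reading the attributes by name in the callback also sidesteps the ordering problem: src and desc end up in the right place whichever one comes first, and a command with two desc tokens and no src is simply left as it was.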
Here is my code
<img src="folder/img1.jpg?somestring">
<img src="folder/img2.jpg?somediffstring">
I want to replace somestring and somediffstring with another string throughout the HTML. Please suggest a regular expression in PHP.
example
change to using regular expression or anything
First of all, you shouldn't parse HTML with Regular Expressions.
Solution 1
Now, if you are exclusively parsing img tags, you could come up with a satisfying enough solution like this:
(\b\.jpg|\b\.png)\?(.+?)\"
That is:
(\b\.jpg|\b\.png) # 1st Capturing Group
\b\.jpg # 1st Alternative: match ``.jpg`` literally
\b\.png # 2nd Alternative: match ``.png`` literally
\? # Match the character ? literally
(.+?) # 2nd Capturing Group
.+? # Match any character between one and unlimited times,
# as few times as possible, expanding as needed.
\" # Match the character " literally
Problem
What's the problem? We are not checking if we are inside an img tag. This will match everywhere in the HTML.
Solution 2
Let's add the check for img > src:
<img.+?src=\".*?(\b\.jpg|\b\.png)\?(.+?)\"
That is:
<img # Match ``<img`` literally
.+? # Match any character between one and unlimited times,
# as few times as possible, expanding as needed.
# Needed in case there are rel or alt options inside the img tag.
src=\" # Match ``src="`` literally
... # The rest is same as before.
Problem
Does this really do its job? Apparently yes, but in reality no.
Consider the following HTML code
<img src="" />
<div style="background-image: url(../images/test-background.jpg?)">
blah blah
</div>
It shouldn't match, right? But it does (if you remove the line breaks). The regular expression above starts the match at <img src=" and stops at the closing " of the div's style attribute. The capturing group then contains the characters between ? and ", namely ), and substituting it will break the HTML.
This was just an example, but many other situations will match even if they should not.
Other solutions...?
No matter how many constraints you can add to your RegEx and how sophisticated it becomes... HTML is a Context-Free Language and it can't be captured by a Regular Expression, which only recognizes Regular Languages.
In PHP
Still sure you're gonna use Regular Expressions? Alright, then your PHP function is preg_replace. You only need to keep in mind that it will replace everything that matched, not only the capturing groups. Hence, you need to wrap what you want to "remember" into another capturing group:
$str = '<img src="folder/img1.jpg?foo">';
$pattern = '/(<img.+?src=\".*?(\b\.jpg|\b\.png)\?)(.+?)(\")/';
$replacement = '$1' . 'bar' . '$4';
$str_replaced = preg_replace($pattern, $replacement, $str);
// Now you have $str_replaced = '<img src="folder/img1.jpg?bar">';
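For comparison, here is a rough sketch of the non-regex route hinted at above, using PHP's bundled DOM extension so that only real img elements are touched. The function name replaceImgQuery and the choice to replace everything after the "?" are assumptions made for illustration. It also assumes the input is an ASCII fragment, since loadHTML does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper: replaces everything after "?" in each <img> src.
function replaceImgQuery(string $html, string $newQuery): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc = new DOMDocument();
    // Wrap in a <div> so the fragment has a single root; the flags stop
    // libxml from adding <html>/<body> wrappers and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $pos = strpos($src, '?');
        if ($pos !== false) {
            // Keep the path and the "?", swap only the query part.
            $img->setAttribute('src', substr($src, 0, $pos + 1) . $newQuery);
        }
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceImgQuery('<img src="folder/img1.jpg?somestring">', 'bar'), "\n";
```

With this approach the div with the background-image from the counter-example is never considered, because the loop only visits img elements.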
With reference to "How can I use the captured group in the same regex":
Suppose you want to change img1.jpg?somestring to img1.jpg?somestringAAA
and img2.jpg?somediffstring to img2.jpg?somediffstringAAA
Search for: src="([a-zA-Z.0-9_]*)[?]([a-zA-Z.0-9_]*)">
Replace with: src="$1?$2AAA">
Here $1 represents whatever is inside the first pair of round parentheses (), i.e., img1.jpg,
and $2 represents the second pair of parentheses.
UPDATE:
$string = 'img1.jpg?somestring';
$pattern = '/([a-zA-Z.0-9_]*)[?]([a-zA-Z.0-9_]*)/i';
$replacement = '$1?$2AAA';
echo preg_replace($pattern, $replacement, $string);
You can do it in this way :
<?php
$url_value = "folder/img2.jpg?somediffstring";
echo $url =substr($url_value, 0, strpos($url_value, "?"));
?>
You can use the regex \?(\w*)"
If you want to replace somestring and somediffstring with xx, search with the regex \?(\w*)" and use ?xx" as the replacement value (the pattern consumes the closing quote, so the replacement has to put it back).
https://regex101.com/r/S5pPuW/1
I'm trying to make a search string, which can accept a query like this:
$string = 'title -launch category:technology -tag:news -tag:"outer space"$';
Here's a quick explanation of what I want to do:
$ = suffix indicating that the match should be exact
" = double quotes indicate that the multi-word is taken as a single keyword
- = a prefix indicating that the keyword is excluded
Here's my current parser:
$string = preg_replace('/(\w+)\:"(\w+)/', '"${1}:${2}', $string);
$array = str_getcsv($string, ' ');
I was using this above code before, but it doesn't work as intended with the keywords starting on searches like -tag:"outer space". The code above doesn't recognize strings starting with - character and breaks the keyword at the whitespace between the outer and space, despite being enclosed with double quotes.
EDIT: What I'm trying to do with that code is to preg_replace -tag:"outer space" into "-tag:outer space" so that they won't be broken when I pass the string to str_getcsv().
You may use preg_replace like this:
preg_replace('/(-?\w+:)"([^"]+)"/', '"$1$2"', $str);
See the PHP demo online.
The regex matches:
(-?\w+:) - Capturing group 1: an optional - (? matches 1 or 0 occurrences), then 1+ letters/digits/underscores and a :
" - a double quote (it will be removed)
([^"]+) - Capturing group 2: one or more chars other than a double quote
" - a double quote
The replacement pattern is "$1$2": ", capturing group 1 value,
capturing group 2 value, and a ".
See the regex demo here.
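Put together with str_getcsv(), the idea above could look like the following sketch. The wrapper name parseSearchQuery is hypothetical, and the sample input deliberately leaves out the trailing $ suffix, which this sketch does not handle.

```php
<?php
// Hypothetical wrapper name; the regex is the one from the answer above.
function parseSearchQuery(string $query): array
{
    // Move the opening quote in front of the (optionally negated) prefix:
    // -tag:"outer space" becomes "-tag:outer space".
    $normalized = preg_replace('/(-?\w+:)"([^"]+)"/', '"$1$2"', $query);

    // Split on spaces; quoted fields stay together. The escape argument is
    // passed explicitly (empty string = no escape character, PHP 7.4+).
    return str_getcsv($normalized, ' ', '"', '');
}

print_r(parseSearchQuery('title -launch -tag:news -tag:"outer space"'));
```

The last element of the returned array is the single keyword -tag:outer space, so the exclusion prefix and the multi-word value stay together.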
Here's how I did it:
$string = preg_replace('/(\-?)(\w+?\:?)"(\w+)/', '"$1$2$3', $string);
$array = str_getcsv($string, ' ');
I considered formats like -"top ten" for quoted multi-word keywords that don't have a category/tag + colon prefix.
I'm sorry for being slow, I'm new on regex, php and programming in general and this is also my first post in stackoverflow. I'm trying to learn it as a personal hobby. I'm glad that I learned something new today. I'll be reading more about regex since it looks like it can do a lot of stuff.
I have txt file with content:
fggfhfghfghf
$config['website'] = 'Olpa';
asdasdasdasdasdas
And PHP script for replacing by preg_replace in file:
write_file('tekst.txt', preg_replace('/\$config\[\'website\'] = \'(.*)\';/', 'aaaaaa', file_get_contents('tekst.txt')));
But it doesn't work the way I want it to.
This script replaces the whole match, and after the change the file looks like this:
fggfhfghfghf
aaaaaa
asdasdasdasdasdas
And that's bad.
All I want is to not change whole match $config['website'] = 'Olpa'; But to just change this Olpa
As you can see it belongs not to Group 2. of match information.
And all I want is to just change this Group 2. one specific thing.
to finally after script it will look like:
fggfhfghfghf
$config['website'] = 'aaaaaa';
asdasdasdasdasdas
You need to change your preg_replace to
preg_replace('/(\$config\[\'website\'] = \').*?(\';)/', '$1aaaaaa$2', file_get_contents('tekst.txt'))
It means, capture what you need to keep (and then use backreferences to restore the text) and just match what you need to replace.
See the regex demo.
Pattern details:
(\$config\[\'website\'] = \') - Group 1 capturing a literal $config['website'] = ' substring (later referenced to with $1)
.*? - any 0+ chars other than line break chars as few as possible
(\';) - Group 2: a ' followed with ; (later referenced to with $2)
In case your aaa actually starts with a digit, you would need a ${1} backreference.
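To illustrate the digit case: with a new value such as 2024site, a replacement written as $12024site$2 would no longer refer to group 1, whereas ${1} keeps the group number separate. A small sketch follows; the helper name setWebsite and the sample value are made up for illustration, and it assumes the new value contains no $ or backslash characters, which preg_replace would interpret.

```php
<?php
// Hypothetical helper; $value is the new site name to write into the config line.
function setWebsite(string $fileContents, string $value): string
{
    $pattern = '/(\$config\[\'website\'] = \').*?(\';)/';
    // '${1}' keeps the group number separate from any digits that follow it.
    return preg_replace($pattern, '${1}' . $value . '$2', $fileContents);
}

echo setWebsite("\$config['website'] = 'Olpa';", '2024site'), "\n";
```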
I have a better, faster, leaner solution for you. No capture groups are required, it only requires careful attention to escaping the single quotes:
Pattern: \$config\['website'] = '\K[^']+
\K means "start the fullstring match here", this combined with the negated character class ([^']+) affords the omission of capture groups.
Pattern Demo (just 25 steps)
PHP Implementation:
$txt='fggfhfghfghf
$config[\'website\'] = \'Olpa\';
asdasdasdasdasdas';
print_r(preg_replace('/\$config\[\'website\'\] = \'\K[^\']+/','aaaaaa',$txt));
Using single quotes around the pattern is crucial so that $config isn't interpreted as a variable. As a result, all of the single quotes inside of the pattern must be escaped.
Output:
fggfhfghfghf
$config['website'] = 'aaaaaa';
asdasdasdasdasdas
I have these 2 strings and want to convert them to HTML tags
1 : bq. sometext /* bq.+space+sometext+space or return
This string starts with bq.+space and ends with a space or a return. I want to convert it to this:
<blockquote author="author" timestamp="unix time in secs">sometext</blockquote>
in this string
2: [quote author="author" date="unix time in secs"]
some text
[/quote] /* start with [quote and get the text of the author property, then get
sometext from between ']' and '[/quote]'
i want to convert them to this :
<blockquote author="author" timestamp="unix time in secs">sometext</blockquote>
This regex did not work:
#\bq(.| )(.*?)\n#
You've got your escaping a bit mixed up there. Escaping the b makes it a word boundary. Not escaping the . makes it an arbitrary character, and putting . and the space in an alternation means "either... or...". This regex should take care of your first example:
$str = preg_replace(
'#bq\. (\S+)#',
'<blockquote author="author" timestamp="unix time in secs">$1</blockquote>',
$str
);
The second one will cause you trouble if anyone ever nests this with quote markup. But supposing there are no other quotes between [quote...] and [/quote], you could use something like this:
$str = preg_replace(
'#\[quote(?=[^\]]*author="([^"]*))(?=[^\]]*date="([^"]*))[^\]]*\](.*?)\[/quote\]#s',
'<blockquote author="$1" timestamp="$2">$3</blockquote>',
$str
);
This uses two lookaheads to find the attributes and captures their values in capturing groups $1 and $2. And all that without advancing the actual position in the string. The good thing about lookaheads is that this works independently of the order of the two attributes. Then we match the rest of the opening tag, and then capture as little as possible (.*?) until we encounter [/quote].
Working demo.
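A short sketch of the lookahead approach in use, to show that attribute order does not matter. It assumes the input attribute is named date, as in the question's sample, and that it maps to timestamp in the output. The function name convertQuotes and the \s* trimming around the quoted text are additions for illustration.

```php
<?php
// Hypothetical helper; assumes the [quote] tag carries author="..." and date="..." in any order.
function convertQuotes(string $str): string
{
    return preg_replace(
        // Two lookaheads read the attributes without consuming input, then the
        // rest of the opening tag and the body up to [/quote] are matched.
        '#\[quote(?=[^\]]*author="([^"]*))(?=[^\]]*date="([^"]*))[^\]]*\]\s*(.*?)\s*\[/quote\]#s',
        '<blockquote author="$1" timestamp="$2">$3</blockquote>',
        $str
    );
}

// Attributes given in the opposite order from the question's sample.
echo convertQuotes("[quote date=\"123\" author=\"bob\"]\nsome text\n[/quote]"), "\n";
```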
Recently, I've been playing with BBCode in phpBB3. When I looked at a random post in the posts table of my database, I found that the image tag is written this way: [img:fcjsgy5j]. There are 8 random characters generated between [img: and ] for each post.
[img:fcjsgy5j]http://imageurl.jpg[/img]
My question is: how can I use preg_replace() to turn this, random characters included, into the following?
<img src="http://imageurl.jpg">
$output = preg_replace("`\[img:.+?\](.*?)\[/img\]`i", '<img src="$1"/>', $input);
[ begins a character set. We don't want that; we want to match the literal [ character, so we have to escape it with a \
. matches any character
+ means we match 1 or more of the previous thing (any character)
? makes the previous quantifier ungreedy (.+ would match everything, right to the very end of the string; that's not what we want, we want it to match as little as possible, just up to the next ])
(.*?) matches all the junk between the [img] tags. Ungreedy again. We put () around it to make it a capturing group
The ` (back-tick) at the start and the end could be any character... whatever character you start with, you have to end with. A lot of people use / but I prefer the back-tick because it rarely appears anywhere inside the regular expression, thus I don't need to escape it.
The i at the very end means the expression will be case insensitive (it will match img, IMG, ImG, etc.)
The $1 in the replace refers back to the () section we denoted earlier... it basically takes whatever was matched there, and plops it into the place of $1
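As a quick check of the points above, the same pattern can be wrapped in a small function and run on a string with two tags. The function name imgBbcodeToHtml and the second sample URL are made up for illustration.

```php
<?php
// Hypothetical wrapper around the pattern explained above.
function imgBbcodeToHtml(string $input): string
{
    // Back-tick delimiters, ungreedy quantifiers, case-insensitive flag.
    return preg_replace('`\[img:.+?\](.*?)\[/img\]`i', '<img src="$1"/>', $input);
}

echo imgBbcodeToHtml('[img:fcjsgy5j]http://imageurl.jpg[/img] and [IMG:abcd1234]http://b.png[/IMG]'), "\n";
```

Because both quantifiers are ungreedy, each tag is converted on its own instead of the first [img: being paired with the last [/img].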
$result = preg_replace('%\[img:[^]]+\]([^[]+)\[/img\]%', '<img src="\1">', $subject);
or, as a commented regex:
$result = preg_replace(
'%\[img: # match [img:
[^]]+ # match one or more non-] characters
\] # match ]
([^[]+) # match one or more non-[ characters
\[/img\] # match [/img]
%x',
'<img src="\1">', $subject);
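Try this code:
<?php
// Each pattern needs delimiters; the i flag makes it case insensitive.
$search = array(
    '/\[img:.+?\](.*?)\[\/img\]/i'
);
// $1 refers to the only capturing group, i.e. the image URL.
$replace = array(
    '<img src="$1">'
);
$result = preg_replace($search, $replace, $string);
?>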
I used the array form of preg_replace so that you can add more search and replace patterns in the future. I think you are trying to replace some BBCode tags. There are plenty of libraries on the net that handle BBCode correctly.
Edited
Like this one :
http://php.net/manual/en/book.bbcode.php