PHP preg_replace(): multiple different matches

I am writing a script in PHP, and I need to use preg_replace() or something similar to add tags before and after matches. For example, I have this pattern (a regular expression that I read from a file) and this text:
$pattern = 'aa*';
$string = "Example, exaaample";
Basically, I need to add tags before and after every match, so the result looks like this:
"Ex<t>a</t>mple, ex<t>aaa</t>mple"
Is there a way to do this? I'm pretty sure it's not that complicated, but I have been stuck on it for quite a while. Thanks

Sure. You can do it like this:
preg_replace('/(aa*)/', '<t>$1</t>', $string);
$1 is replaced by the text captured by the first parenthesized group, which here is the whole match.
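Because the pattern comes from a file without delimiters, you also have to add them yourself. A minimal sketch, assuming the file holds a bare pattern such as aa* that never contains the ~ character (the helper name tagMatches is hypothetical). It uses $0, which stands for the whole match, so no extra capture group is needed:

```php
<?php
// Hypothetical helper: wraps every match of a bare (delimiter-less) pattern in <t>...</t>.
// Assumes $pattern contains no "~", since "~" is used as the delimiter here.
function tagMatches(string $pattern, string $subject): string
{
    $regex = '~' . $pattern . '~';

    // $0 in the replacement is the entire matched text.
    $result = preg_replace($regex, '<t>$0</t>', $subject);

    if ($result === null) {
        // preg_replace() returns null if the pattern cannot be compiled or run.
        throw new InvalidArgumentException("Invalid pattern: $pattern");
    }
    return $result;
}

echo tagMatches('aa*', 'Example, exaaample'); // Ex<t>a</t>mple, ex<t>aaa</t>mple
```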

Related

Regular expression to filter links

I am using this regular expression to filter .pdf files from the webpage:
$regex='|<a.*?href="(.*pdf?)"|';
It does the job if the link is like this:
www.xyz.com/trgrrtr/ghtty.pdf
but if the links are something like this, it is unable to filter:
www.xyz.com/trgrrtr/ghtty.pdf?code=KksRHhdVXAoECBFCVFpeXBsBUgYMDQpxd3J2d3F2fDtzfnFuLiErNXNpIG5kYm16aGhpcmxoa05QV1VKUVFFUxQ%3D
What regular expression should I use to filter out this link from a webpage?
First of all, you need to escape the ? otherwise it just makes the f in front of it optional. Then you could do something like this:
$regex = '|<a.*?href="([^"]*\.pdf\?[^"]*)"|';
The negated character class [^"] makes sure the match cannot leave the attribute: .* could also consume the closing " of the attribute and carry on until it reaches another double quote further down the string.
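If you also want links that have no query string at all, the ?-part can itself be made optional. A sketch with preg_match_all(), using made-up sample HTML in a variable $html (a hypothetical name); it also uses [^>]*? instead of .*? so a match cannot run past the end of the <a> tag:

```php
<?php
// Sample input; in practice $html would hold the downloaded page.
$html = '<a href="www.xyz.com/trgrrtr/ghtty.pdf">plain</a>'
      . '<a class="dl" href="www.xyz.com/trgrrtr/ghtty.pdf?code=KksR%3D">query</a>'
      . '<a href="page.html">other</a>';

// [^"]* keeps each match inside one attribute value;
// (?:\?[^"]*)? makes the query string optional.
$regex = '|<a[^>]*?href="([^"]*\.pdf(?:\?[^"]*)?)"|i';

preg_match_all($regex, $html, $matches);
$pdfLinks = $matches[1]; // group 1 of every match

print_r($pdfLinks);
```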
But I really recommend using a DOM parser to find the link elements first. PHP has a built-in one (DOMDocument), and there is also a very convenient third-party alternative.
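A sketch of the DOM approach with PHP's built-in DOMDocument (the function name findPdfLinks is hypothetical). The parser deals with attributes and quoting, and parse_url() splits off the query string, so plain .pdf links and .pdf?code=... links are treated the same:

```php
<?php
// Hypothetical helper: returns the href of every <a> whose URL path ends in .pdf.
function findPdfLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // parse_url() separates the path from the ?query part.
        $path = parse_url($href, PHP_URL_PATH);
        if (is_string($path) && preg_match('/\.pdf$/i', $path) === 1) {
            $links[] = $href;
        }
    }
    return $links;
}
```

Calling findPdfLinks() on the page source would return both kinds of link from the question, while ignoring ordinary .html links.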
The blog post An Improved Liberal, Accurate Regex Pattern for Matching URLs may help.

Problem using regex to remove number formatting in PHP

I'm having this issue with a regular expression in PHP that I can't seem to crack. I've spent hours searching to find out how to get it to work, but nothing seems to have the desired effect.
I have a file that contains lines similar to the one below:
Total','"127','004"','"118','116"','"129','754"','"126','184"','"129','778"','"128','341"','"127','477"','0','0','0','0','0','0
These lines are inserted into INSERT queries. The problem is that values like "127','004" are actually supposed to be 127,004, or without any formatting: 127004. The latter is the actual value I need to insert into the database table, so I figured I'd use preg_replace() to detect values like "127','004" and replace them with 127004.
I played around with a Regular Expression designer and found that I could use the following to get my desired results:
Regular Expression
"(\d+)','(\d{3})"
Replace Expression
$1$2
With that, the line at the top of this post ends up like this (which is what I am after):
Total','127004','118116','129754','126184','129778','128341','127477','0','0','0','0','0','0
This, however, does not work in PHP. Nothing is being replaced at all.
The code I am using is:
$line = preg_replace("\"(\d+)','(\d{3})\"", '$1$2', $line);
Any help would be greatly appreciated!
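Your pattern has no proper delimiters. PHP needs them to tell the pattern apart from the modifiers that follow it (e.g. i for case-insensitive, U for ungreedy, ...). Without explicit delimiters, PHP takes the first character of the string as the delimiter; here that is the double quote, so the quotes you meant to match are used up as delimiters instead. Use a character that doesn't occur in your pattern; typically you'll see a slash '/' used.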
Try this:
$line = preg_replace("/\"(\d+)','(\d{3})\"/", '$1$2', $line);
You forgot to wrap your regular expression in forward slashes (delimiters). Try this instead:
"/\"(\d+)','(\d{3})\"/"
Use preg_replace("#\"(\d+)','(\d+)\"#", '$1$2', $s); instead of yours.
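A quick way to check the fix is to run it over a shortened version of the sample line from the question. This sketch uses ~ as the delimiter and a single-quoted pattern, so the embedded ',' has to be written as \',\':

```php
<?php
// Shortened sample line from the question; a nowdoc avoids escaping the quotes.
$line = <<<'TXT'
Total','"127','004"','"118','116"','"129','754"','0','0
TXT;

// "~" is the delimiter, so the double quotes are part of the pattern and are removed too.
$clean = preg_replace('~"(\d+)\',\'(\d{3})"~', '$1$2', $line);

echo $clean; // Total','127004','118116','129754','0','0
```

This only handles numbers with a single thousands separator; a value like "1','234','567" would need a pattern with repeated groups or a preg_replace_callback().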

PHP URL to Link with Regex

I know I've seen this done in a lot of places, but I need something a little different from the norm. Sadly, whenever I search for it, the answers are buried in posts about simply turning a URL into an HTML link. I want the PHP function to strip out the "http://" or "https://" from the link, as well as everything after the domain, so basically I am looking to turn A into B.
A: http://www.youtube.com/watch?v=spsnQWtsUFM
B: www.youtube.com
If it helps, here is my current PHP regex replace function.
ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "\\0", htmlspecialchars($body, ENT_QUOTES));
It would probably also help to mention that I have absolutely no understanding of regular expressions. Thanks!
EDIT: When I enter a comment like blahblah https://www.facebook.com/?sk=ff&ap=1 blah, I get HTML like this: <a class="bwl" href="blahblah https://www.facebook.com/?sk=ff&ap=1 blah">www.facebook.com</a>, which doesn't work at all because it takes the text around the link with it. It works great if someone comments only a link, though. This happens after I changed the function to this:
preg_replace("#^(.*)//(.*)/(.*)$#",'<a class="bwl" href="\0">\2</a>', htmlspecialchars($body, ENT_QUOTES));
This is the simplest and cleanest way:
$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);
$site_url = $matches[1];
EDIT: I assume $str has already been validated as a URL, so I left that out. I also assume that every URL contains either 'http://' or 'https://'. If the URL is formatted like www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the regex above won't work!
EDIT2: Sorry, I didn't realize you were trying to replace all URLs in a whole text. In that case, this should work the way you want:
$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\\1\\3', $str);
I am not a regex whizz either, but
^(.*)//(.*)/(.*)$
\2
is what worked for me when I used them as find and replace (pattern on the first line, replacement on the second) in Programmer's Notepad.
^(.*)// captures the protocol, e.g. http: (referred to as \1).
(.*)/ captures everything up to the last / (referred to as \2, the part the replacement keeps); for a URL with a single path segment, like the one in the question, that is the host.
(.*)$ captures everything up to the end of the string (referred to as \3).
Added later
^(.*)( )(.*)//(.*)/(.*)( )(.*)$
\1\2\4 \7
This should be a bit better, but it only replaces one URL.
\0 is replaced by the entire matched string, whereas \n (where n is a number from 1 upward) is replaced by the part of the match captured by the nth parenthesized group, counting groups in the order their opening parentheses appear. Your solution is as follows:
ereg_replace("[[:alpha:]]+://([^<>[:space:]]+[:alnum:]*)[[:alnum:]/]", "\\1
I haven't been able to test this though so let me know if it works.
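The same \0 versus \1 distinction applies to the preg_* functions, which are the ones to use today: the ereg_* family was deprecated in PHP 5.3 and removed in PHP 7.0. A minimal sketch; the pattern below is an illustration, not the one from the answer above:

```php
<?php
$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';

// Group 1 captures the host: everything after :// up to the first slash or whitespace.
$pattern = '~https?://([^/\s]+)\S*~';

$whole = preg_replace($pattern, '[$0]', $url); // $0 (or \0): the entire match
$host  = preg_replace($pattern, '$1', $url);   // $1 (or \1): the first group only

echo $whole, "\n"; // [http://www.youtube.com/watch?v=spsnQWtsUFM]
echo $host, "\n";  // www.youtube.com
```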
I think this should do it (I haven't tested it):
preg_match('/^http[s]?:\/\/(.+?)\/.*/i', $main_url, $matches);
$final_url = $matches[1];
I'm surprised no one remembers PHP's parse_url function:
$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
echo parse_url($url, PHP_URL_HOST); // displays "www.youtube.com"
I think you know what to do from there.
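For the case in the EDIT (URLs embedded in a longer comment), parse_url() can be combined with preg_replace_callback(): the regex only locates each URL, and parse_url() extracts the host. A sketch, assuming URLs always start with http:// or https:// and end at whitespace; the function name linkifyHosts is hypothetical and the class="bwl" markup is copied from the question:

```php
<?php
// Hypothetical helper: turns every http(s) URL in a comment into
// <a class="bwl" href="FULL URL">HOST</a>, leaving the surrounding text alone.
function linkifyHosts(string $body): string
{
    // Escape first, as in the question; "&" in a URL becomes "&amp;", which is valid inside href.
    $escaped = htmlspecialchars($body, ENT_QUOTES);

    $result = preg_replace_callback(
        // A URL runs from http:// or https:// up to the next whitespace or angle bracket.
        '~\bhttps?://[^\s<>]+~i',
        function (array $m): string {
            $host = parse_url($m[0], PHP_URL_HOST);
            if (!is_string($host) || $host === '') {
                return $m[0]; // parse_url() found no host; leave the text unchanged
            }
            return '<a class="bwl" href="' . $m[0] . '">' . $host . '</a>';
        },
        $escaped
    );

    return $result ?? $escaped;
}
```

Punctuation directly after a URL (e.g. a full stop) would be treated as part of the URL; trimming it is left out to keep the sketch short.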
$result = preg_replace('%(http[s]?://)(\S+)%', '\2', $subject);
The regex-based code doesn't handle every case.
I wrote this code instead. It is much more comprehensive, but it works:
See the result here: http://cht.dk/data/php-scripts/inc_functions_links.php
See the source code here: http://cht.dk/data/php-scripts/inc_functions_links.txt

Simple PHP Regex

I am setting up a Zend_Route (but it is still just a regex) and want to match a URL like
/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html
So I want to grab the
this-is-my-name-and-the-last-is-1-of
and the
123456
I tried
\w{2}/experience/(.+)?-(\d+)\.html
but that doesn't seem to work.
It would be easy if it were the other way around, i.e. if the id came before the name:
/en/experience/123456-this-is-my-name-and-the-last-is-1-of-id.html
I could use
\w{2}/experience/(\d+)-(.+)\.html
But that is a cop-out, so any advice on how to match the original format?
Try this one:
/\w{2}/experience/(.+?)-(\d+)\.html
Try this:
/\w{2}/experience/(.+)?-(\d+)\.html
Zend's route does this internally:
preg_match('#^/\w{2}/experience/(.+)?-(\d+)\.html$#i', '/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html', $matches);
So your pattern only matches if the URL starts with a slash.
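Note that with (.+?)-(\d+) or (.+)?-(\d+), the first group ends up as this-is-my-name-and-the-last-is-1-of-id, because nothing tells the regex that -id- is a separator. If -id- is a fixed part of every URL (an assumption based on the single example), matching it literally yields exactly the two values asked for. A sketch with plain preg_match(), outside of Zend; the helper name parseExperienceUrl is hypothetical:

```php
<?php
// Hypothetical helper: splits an experience URL into name and id.
// Assumes "-id-" always separates the two, as in the example URL.
function parseExperienceUrl(string $url): ?array
{
    // Greedy (.+) backtracks to the last "-id-<digits>.html", so hyphens
    // and digits inside the name (like "-1-") stay in group 1.
    $pattern = '#^/\w{2}/experience/(.+)-id-(\d+)\.html$#i';

    if (preg_match($pattern, $url, $m) !== 1) {
        return null;
    }
    return ['name' => $m[1], 'id' => $m[2]];
}

print_r(parseExperienceUrl('/en/experience/this-is-my-name-and-the-last-is-1-of-id-123456.html'));
```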

Regex equals condition except for certain condition

I have written the following Regex in PHP for use within preg_replace().
/\b\S*(.com|.net|.us|.biz|.org|.info|.xxx|.mx|.ca|.fr|.in|.cn|.hk|.ng|.pr|.ph|.tv|.ru|.ly|.de|.my|.ir)\S*\b/i
This regex removes all URLs from a string pretty effectively so far (though I am sure I could write a better one). However, I need to add an exclusion for a specific domain. The pseudo code looks like this:
IF the string contains .com or .net or .biz etc. and does NOT contain foo.com, THEN execute the replacement.
Any idea on how to do this?
Just add a negative lookahead assertion:
/(?<=\s|^)(?!\S*foo\.com)\S*\.(com|net|us|biz|org|info|xxx|mx|ca|fr|in|cn|hk|ng|pr|ph|tv|ru|ly|de|my|ir)\S*\b/im
Also, remember that you need to escape the dot, and that you can move it outside the alternation, since every alternative starts with a dot.
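Applied with preg_replace(), the pattern from this answer removes ordinary URLs but leaves any token containing foo.com. A small check with made-up sample text:

```php
<?php
$tlds = 'com|net|us|biz|org|info|xxx|mx|ca|fr|in|cn|hk|ng|pr|ph|tv|ru|ly|de|my|ir';

// The lookbehind anchors each match at the start of a token; the lookahead
// (?!\S*foo\.com) rejects any token that contains foo.com.
$regex = '/(?<=\s|^)(?!\S*foo\.com)\S*\.(' . $tlds . ')\S*\b/im';

$text = 'visit example.com or foo.com/page today';
echo preg_replace($regex, '', $text); // visit  or foo.com/page today
```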
Use preg_replace_callback instead.
Let your callback decide whether to replace.
It can give more flexibility if the requirements become too complicated for a simple regex.
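A sketch of the callback approach (stripUrlsExcept is a hypothetical name). The regex only finds candidate tokens; the decision about foo.com lives in plain PHP, which is easier to extend with more exceptions later:

```php
<?php
// Hypothetical helper: removes URL-like tokens unless they mention $keepDomain.
function stripUrlsExcept(string $text, string $keepDomain): string
{
    $tlds = 'com|net|us|biz|org|info|xxx|mx|ca|fr|in|cn|hk|ng|pr|ph|tv|ru|ly|de|my|ir';

    // A candidate is a run of non-whitespace that contains ".<tld>" followed by a word boundary.
    $regex = '/(?<!\S)\S*\.(?:' . $tlds . ')\b\S*/i';

    $result = preg_replace_callback(
        $regex,
        // The callback sees each candidate and decides: keep it, or replace it with ''.
        fn (array $m): string => stripos($m[0], $keepDomain) !== false ? $m[0] : '',
        $text
    );

    return $result ?? $text;
}

echo stripUrlsExcept('visit example.com or foo.com/page today', 'foo.com');
// visit  or foo.com/page today
```

The substring check also keeps tokens like notfoo.com; comparing parse_url()'s host against the allowed domain would be a stricter test.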
