Regular expressions in PHP behave strangely

So, I've got some kind of database, and I use regular expressions to process all of its lines. The problem is that the email field may contain no '#' symbol at all, or more than one. I decided to put a # before the domain (there are not many of them) and then just remove all the #'s I don't need.
I use online regular expression builders, like this one: http://www.phpliveregex.com/ . I got the following regular expression for putting # before the domain:
preg_replace("/(dodgit|trashymail|pookmail|spambob|mailinator)/", "#$1", $myline);
But it just doesn't work. For example:
CynthiaELopezdodgit.com
doesn't change after this script.
What can be wrong? I'm new to PHP so sorry if the problem is really stupid :)

You need to capture the return value:
$newLine = preg_replace("/(dodgit|trashymail|pookmail|spambob|mailinator)/", "#$1", $myline);
$newLine will contain the email with the #, while $myline will continue to hold the one without. preg_replace does not mutate the original variable.

Your regex works fine. I would check to make sure you're looking at the right variable. preg_replace doesn't overwrite the variable you pass in; it returns the modified string instead.
For a working example: http://codepad.org/PSxK7Jtv
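As a minimal sketch of what both answers describe, using the sample line from the question (the variable names are the ones already used above):

```php
<?php
$myline = 'CynthiaELopezdodgit.com';

// preg_replace returns the modified string; $myline itself is not changed.
$newLine = preg_replace('/(dodgit|trashymail|pookmail|spambob|mailinator)/', '#$1', $myline);

echo $myline, "\n";  // CynthiaELopezdodgit.com
echo $newLine, "\n"; // CynthiaELopez#dodgit.com
```

If you want the change to land in the same variable, assign the result back: $myline = preg_replace(...).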

Related

Dreamweaver Regex Find and Replace Using Regular Expression

I am using a regular expression to perform a find and replace with Dreamweaver, and I am running into some difficulty. This is what I have in my page (note that there is a syntax error because I need an additional parenthesis at the end of the string).
$email=htmlspecialchars(mysql_real_escape_string($_POST['email']);
$name=htmlspecialchars(mysql_real_escape_string($_POST['name']);
I am trying to perform a find and replace that will produce this:
$email=htmlspecialchars(mysql_real_escape_string($_POST['email']));
$name=htmlspecialchars(mysql_real_escape_string($_POST['name']));
This is what I am using to perform the find. It seems to be replacing too much text (it starts with the $_POST from the $email variable, but continues all the way down to the $_POST for the $name variable)
Find: \$_POST['([^<]*)']
Replace: $_POST['$1'])
I end up with this:
$email=htmlspecialchars(mysql_real_escape_string($_POST['email']);
$name=htmlspecialchars(mysql_real_escape_string($_POST['name']));
As you can see, it only fixes the last instance (this is because the find function is selecting both lines from $_POST['email'] all the way to $_POST['name']). Any ideas on how to fix this? Thank you!
Add a question mark to make it non-greedy. Also, you need to escape the [ and ] characters that you want to match.
Find: \$_POST\['([^<]*?)'\]
Replace: $_POST['$1'])
Or, alternatively, use a ' character instead of a < character to match the value within the quotes:
Find: \$_POST\['([^']*)'\]
Replace: $_POST['$1'])
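If you would rather check the pattern outside Dreamweaver, here is a rough PHP sketch of the same find and replace using the [^']* variant. Running it through preg_replace is my own assumption for testing purposes; the thread itself only uses Dreamweaver's dialog.

```php
<?php
// The two broken lines from the question, kept literal with a nowdoc.
$src = <<<'EOT'
$email=htmlspecialchars(mysql_real_escape_string($_POST['email']);
$name=htmlspecialchars(mysql_real_escape_string($_POST['name']);
EOT;

// [^']* cannot run past the closing quote, so each $_POST[...] is matched
// on its own instead of one match spanning both lines.
$fixed = preg_replace('/\$_POST\[\'([^\']*)\'\]/', '$_POST[\'$1\'])', $src);

echo $fixed, "\n";
// $email=htmlspecialchars(mysql_real_escape_string($_POST['email']));
// $name=htmlspecialchars(mysql_real_escape_string($_POST['name']));
```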

Regular expressions that removes only first "/"

I'm new to Regular expressions and can't seem to find out how I have to solve this:
I need a regular expressions that "allows" only numbers, letters and /. I wrote this:
/[^a-zA-Z0-9/]/g
I think it's possible to strip the first / off, but don't know how.
so #/register/step1 becomes register/step1
Who knows how I could get this result?
Thanks!
You can use a non-global match, if the pattern is contiguous in the string:
var rx=/(([a-zA-Z0-9]+\/*)+)/;
var s='#/register/step1';
var s1=(s.match(rx) || [])[0];
alert(s1); // returned value: (String) "register/step1"
"/register/step1".match(/[a-zA-Z0-9][a-zA-Z0-9/]*/); // ["register/step1"]
\w is equivalent to [A-Za-z0-9_] (note that it also matches the underscore), so:
"/register/step1".match(/\w[\w/]*/); // ["register/step1"]
Edit: I don't know why I didn't suggest this first, but if you're simply enforcing the pattern rather than replacing, you could just remove that leading slash (if it exists) before checking the pattern, using strpos(), substr(), or something similar. If you are already using preg_replace(), look at the examples in the function docs; they are quite relevant.
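Since the last answer mentions PHP functions, here is a small PHP sketch of both ideas: the \w match from above, and plain trimming with no regex. Using ltrim() is my own substitution for the strpos()/substr() suggestion, and it assumes the only unwanted leading characters are # and /.

```php
<?php
$path = '#/register/step1';

// Option 1: grab the first run that starts with a word character.
// Note that \w also allows "_", which [a-zA-Z0-9] does not.
preg_match('/\w[\w\/]*/', $path, $m);
echo $m[0], "\n";              // register/step1

// Option 2: no regex at all, just strip the leading "#" and "/" characters.
echo ltrim($path, '#/'), "\n"; // register/step1
```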

Problem using regex to remove number formatting in PHP

I'm having this issue with a regular expression in PHP that I can't seem to crack. I've spent hours searching to find out how to get it to work, but nothing seems to have the desired effect.
I have a file that contains lines similar to the one below:
Total','"127','004"','"118','116"','"129','754"','"126','184"','"129','778"','"128','341"','"127','477"','0','0','0','0','0','0
These lines are inserted into INSERT queries. The problem is that values like "127','004" are actually supposed to be 127,004, or without any formatting: 127004. The latter is the actual value I need to insert into the database table, so I figured I'd use preg_replace() to detect values like "127','004" and replace them with 127004.
I played around with a Regular Expression designer and found that I could use the following to get my desired results:
Regular Expression
"(\d+)','(\d{3})"
Replace Expression
$1$2
The line on the top of this post would end up like this: (which is what I am after)
Total','127004','118116','129754','126184','129778','128341','127477','0','0','0','0','0','0
This, however, does not work in PHP. Nothing is being replaced at all.
The code I am using is:
$line = preg_replace("\"(\d+)','(\d{3})\"", '$1$2', $line);
Any help would be greatly appreciated!
There are no delimiters in your regex. Delimiters are required in order for PHP to know what is the pattern to match and what is a pattern modifier (e.g. i - case-insensitive, U - ungreedy, ...). Use a character that doesn't occur in your pattern, typically you'll see a slash '/' used.
Try this:
$line = preg_replace("/\"(\d+)','(\d{3})\"/", '$1$2', $line);
You forgot to wrap your regular expression in forward slashes. Try this instead:
"/\"(\d+)','(\d{3})\"/"
use preg_replace("#\"(\d+)','(\d+)\"#", '$1$2', $s); instead of yours
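To see the effect of the delimiters, here is a short sketch on a shortened version of the sample line (the shortening is mine). As far as I can tell, PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a delimiter, so in the original call the outer double quotes are taken as delimiters rather than matched, which is why the quotes never get removed.

```php
<?php
$line = "Total','\"127','004\"','\"118','116\"','0','0";

// Original call: the leading and trailing " act as the delimiters,
// so the pattern that actually runs is (\d+)','(\d{3}) without the quotes.
$without = preg_replace("\"(\d+)','(\d{3})\"", '$1$2', $line);
echo $without, "\n"; // Total','"127004"','"118116"','0','0

// With explicit / delimiters the double quotes are part of the pattern.
$with = preg_replace("/\"(\d+)','(\d{3})\"/", '$1$2', $line);
echo $with, "\n";    // Total','127004','118116','0','0
```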

PHP URL to Link with Regex

I know I've seen this done in a lot of places, but I need something a little different from the norm. Sadly, when I search for this anywhere, it gets buried in posts about just turning the link into an HTML link tag. I want the PHP function to strip out the "http://" and "https://" from the link, as well as anything after the domain, so basically what I am looking for is to turn A into B.
A: http://www.youtube.com/watch?v=spsnQWtsUFM
B: www.youtube.com
If it helps, here is my current PHP regex replace function.
ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]", "\\0", htmlspecialchars($body, ENT_QUOTES)));
It would probably also be helpful to say that I have absolutely no understanding in regular expressions. Thanks!
EDIT: When I entered a comment like this: blahblah https://www.facebook.com/?sk=ff&ap=1 blah, I got HTML like this: <a class="bwl" href="blahblah https://www.facebook.com/?sk=ff&ap=1 blah">www.facebook.com</a>, which doesn't work at all, as it takes the text around the link with it. It works great if someone only comments a link, however. This happened after I changed the function to this:
preg_replace("#^(.*)//(.*)/(.*)$#",'<a class="bwl" href="\0">\2</a>', htmlspecialchars($body, ENT_QUOTES));
This is the simplest and cleanest way:
$str = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
preg_match("#//(.+?)/#", $str, $matches);
$site_url = $matches[1];
EDIT: I assume that the $str had been checked to be a URL in the first place, so I left that out. Also, I assume that all the URLs will contain either 'http://' or 'https://'. In case the url is formatted like this www.youtube.com/watch?v=spsnQWtsUFM or even youtube.com/watch?v=spsnQWtsUFM, the above regexp won't work!
EDIT2: I'm sorry, I didn't realize that you were trying to replace all URLs in a whole text. In that case, this should work the way you want it:
$str = preg_replace('#(\A|[^=\]\'"a-zA-Z0-9])(http[s]?://(.+?)/[^()<>\s]+)#i', '\\1\\3', $str);
I am not a regex whizz either,
^(.*)//(.*)/(.*)$
\2
was what worked for me when I tried to use as find and replace in programmer's notepad.
^(.*)// should extract the protocol, referred to as \1 in the second line.
(.*)/ should extract everything up to the next / (being greedy, up to the last / if there are several), referred to as \2 in the second line.
(.*)$ captures everything up to the end of the string, referred to as \3 in the second line.
Added later
^(.*)( )(.*)//(.*)/(.*)( )(.*)$
\1\2\4 \7
This should be a bit better, but will only replace just 1 URL
The \0 is replaced by the entire matched string, whereas \x (where x is a number other than 0 starting at 1) will be replaced by each subpart of your matched string based on what you wrap in parentheses and the order those groups appear. Your solution is as follows:
ereg_replace("[[:alpha:]]+://([^<>[:space:]]+[:alnum:]*)[[:alnum:]/]", "\\1
I haven't been able to test this though so let me know if it works.
I think this should do it (I haven't tested it):
preg_match('/^http[s]?:\/\/(.+?)\/.*/i', $main_url, $matches);
$final_url = ''.$matches[1].'';
I'm surprised no one remembers PHP's parse_url function:
$url = 'http://www.youtube.com/watch?v=spsnQWtsUFM';
echo parse_url($url, PHP_URL_HOST); // displays "www.youtube.com"
I think you know what to do from there.
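For the case in the question's edit, where the URL sits inside other text, one possible way to combine parse_url with a regex is sketched below. The helper name linkify_hosts is hypothetical, the class "bwl" is taken from the question, and the sketch assumes $text is the raw comment (not yet passed through htmlspecialchars) and that every URL starts with http:// or https://. Escaping of the surrounding text is left out.

```php
<?php
// Hypothetical helper (not from the thread): replace each URL in $text
// with a link whose visible label is just the host name.
function linkify_hosts($text)
{
    return preg_replace_callback(
        '#https?://\S+#i',
        function ($m) {
            $host = parse_url($m[0], PHP_URL_HOST);
            if (!is_string($host)) {
                return $m[0]; // leave anything parse_url cannot handle untouched
            }
            $href = htmlspecialchars($m[0], ENT_QUOTES);
            return '<a class="bwl" href="' . $href . '">'
                . htmlspecialchars($host, ENT_QUOTES) . '</a>';
        },
        $text
    );
}

echo linkify_hosts('blahblah https://www.facebook.com/?sk=ff&ap=1 blah'), "\n";
// blahblah <a class="bwl" href="https://www.facebook.com/?sk=ff&amp;ap=1">www.facebook.com</a> blah
```

The regex only has to find where a URL starts and ends; pulling out the host is left to parse_url, which avoids the greedy-group problems in the pure regex attempts.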
$result = preg_replace('%(http[s]?://)(\S+)%', '\2', $subject);
The code with regex does not work completely.
I made this code. It is much more comprehensive, but it works:
See the result here: http://cht.dk/data/php-scripts/inc_functions_links.php
See the source code here: http://cht.dk/data/php-scripts/inc_functions_links.txt

How can I match everything with a PHP regular expression?

How can I match everything with a PHP regular expression? I tried: /[.\r\n]*/, but it isn't working. Any ideas? Thanks.
This is for a method I made for a PHP class to parse e-mails:
public function getHeader($headerName) {
    preg_match('/[\r\n]' . $headerName . '[:][ ](.+)[\r\n][^ \t]/Uis', "\n" . ltrim($this->originalMessage), $matches);
    return preg_replace('/[\r\n]*/', '', $matches[1]);
}
/.*/s (see perl's docs). The s option means (quoting from that URL):
Treat string as single line. (Make . match a newline)
I assume, based on your inclusion of \n and \r above, that you want to match across multiple lines. In this case, use:
/.*/s
(note the explicit /s modifier, that is, change . to match any character whatsoever, even a newline, which it normally would not match.)
See http://www.perl.com/doc/manual/html/pod/perlre.html
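A quick PHP sketch of the difference, on a two-line sample string of my own:

```php
<?php
$text = "first line\nsecond line";

// Without /s the dot stops at the newline.
preg_match('/.*/', $text, $m);
$plain = $m[0];      // "first line"

// With /s the dot also matches newlines, so the whole string is matched.
preg_match('/.*/s', $text, $m);
$dotall = $m[0];     // "first line\nsecond line"

// Inside [...] the dot is a literal period, so [.\r\n]* matches only runs
// of periods and line breaks; here it matches the empty string at offset 0.
preg_match('/[.\r\n]*/', $text, $m);
$class = $m[0];      // ""

var_dump($plain === 'first line', $dotall === $text, $class === '');
```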
Why do you want to match everything? There's no point in using it as a condition because it's always true. If you want to capture the text you don't need a regex to do it because you just use the entire string. If you're trying to get around taint-checking, then shame on you (and ask a separate question about doing that right).
Note that we have a bit of the XY Problem here. You have some task X in mind, and think Y is part of the solution. You ask about Y but never tell us X. It's hard to answer your real question when we don't know what you are trying to do. :)
What about /.*/s?
In a character class ( the [] ), . just means period.
Does /[\.\r\n]+/ do what you want?
This kludge has also worked for me before:
my $abstract_text = /Abstract:([\s\S]+?)\nReferences/m;
It's useful if you want to capture patterns with arbitrary text included or intervening between multiple captures.
