php regex remove inline comment only - php

I have some simple code that looks like this:
function session(){
return 1; // this default value for session
}
I need a regex or some code to remove the comment // this default value for session, and only this type of comment: one or more spaces, then //, then the comment text up to the newline.
All other types of comments and all other cases should be left alone.

UPDATED (1)
And only remove this type of comment, which starts by a space or two or more, then //, then a newline after it
Try this one:
/\s+\/\/[^\n]+/m
\s+ one or more whitespace characters (spaces, but also tabs and newlines)
\/\/ the escaped //
[^\n]+ anything except a new line
UPDATE: to make it more likely that this only applies to lines of code, we can use a lookbehind (2) to check that there is a semicolon ; before the space(s) and the comment slashes //, so the regex becomes:
/(?<=;)\s+\/\/[^\n]+/m
where (?<=;) is the lookbehind, which tells the engine to match only if a ; comes immediately before the current position.
-----------------------------------------------------------------------
(1) preg_replace replaces every match by default, so there is no need for a g flag
(2) Lookbehind was not supported in JavaScript before ES2018
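As a minimal sketch, the lookbehind pattern above could be applied with preg_replace like this. The $source string reuses the snippet from the question, and the variable names are only illustrative.

```php
<?php
// Sample input taken from the question.
$source = "function session(){\nreturn 1; // this default value for session\n}";

// Whitespace, then //, then the rest of the line,
// but only when a semicolon comes right before the whitespace.
$pattern = '/(?<=;)\s+\/\/[^\n]+/m';

$clean = preg_replace($pattern, '', $source);
echo $clean;
```

A comment that sits on a line of its own is left alone by this pattern, because nothing before it ends in a semicolon followed directly by whitespace on the same statement.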

A purely regex solution would look something like this:
$result = preg_replace('#^(.*?)\s+//.*$#m', '\1', $source);
but that would still be wrong, because it can be tripped up by something like this:
$str = "This is a string // that has a comment inside";
A more robust solution would be to run the PHP code through token_get_all(), which parses it into tokens that you can then selectively drop when you re-emit the code:
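foreach (token_get_all($source) as $token)
{
    if (is_array($token))
    {
        // Emit every token except comments that start with //
        if ($token[0] != T_COMMENT || substr($token[1], 0, 2) != '//')
            echo $token[1];
    }
    else
        echo $token; // single-character tokens such as ; or {
}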
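The token-based idea can be sketched as a function that returns the cleaned source instead of echoing it. The name stripLineComments() is hypothetical, and the handling of the spaces before the comment and of the trailing newline is my own assumption about what the asker wants, not something the answer specifies. Note that token_get_all() only tokenizes PHP code after an opening <?php tag.

```php
<?php
// stripLineComments() is a hypothetical helper name, not part of the original answer.
function stripLineComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token;            // single-character tokens such as ; or {
            continue;
        }
        list($id, $text) = $token;
        if ($id === T_COMMENT && substr($text, 0, 2) === '//') {
            $out = rtrim($out, " \t"); // drop the spaces that preceded the comment
            // Before PHP 8 the comment token includes its trailing newline; keep it.
            $out .= substr($text, strlen(rtrim($text, "\r\n")));
            continue;
        }
        $out .= $text;
    }
    return $out;
}

// The // inside the string literal must survive; the real comment must go.
$source = "<?php\n\$str = \"a string // not a comment\";\nreturn 1; // real comment\n";
$clean = stripLineComments($source);
echo $clean;
```

Because the tokenizer knows that the first // is inside a string literal, only the real comment is removed, which is exactly the case the pure regex gets wrong.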

Related

Allow space into my Regex

I cannot find a way to allow a space in this regex, which extracts the text between the title tags:
<title>my exemple</title>
here is the regex
$pattern = "/<title>(.+)<\/title>/i";
I tried
/<title>(.+)<\/title>/i\s
/<title>(.+)<\/title>/i\S
/<title>\s(.+)<\/title>/i
/<title>(.+)\s<\/title>/i
Here is the full function:
function getSiteTitle(){
$RefURL = (is_null($_SERVER['HTTP_REFERER'])) ? 'Un know' : $_SERVER['HTTP_REFERER'];
if($RefURL != 'Un know'){
$con = file_get_contents($RefURL) or die (" can't open URL referer ");
$pattern = "/<title>(.+)<\/title>/i";
preg_match($pattern,$con,$match);
$result = array($match[1],$RefURL);
return $result;
I have verified that I receive a keyword in my referer, because it works pretty well with keywords that contain no spaces.
Thank you.
If you want to capture HTML on multiple lines (is that what you mean by "spaces"?), you'll need to turn on the s modifier, which allows the . character to match newline characters, as well.
This should work:
/<title>(.+)<\/title>/is
How about
$pattern = "/<title>\s*(.+?)\s*<\/title>/i";
The lazy (.+?) leaves the surrounding whitespace to the two \s*, so the first capturing group will contain only the keyword, which may itself contain spaces, like:
<title> key word </title>
// result is "key word"
Add the s modifier to the end (/.../is) if you want to allow newlines inside the title as well.
If I got what you want right, you could also use this approach:
$pattern = "/<title>(.+)<\/title>/is";
and then trim the first capturing group.
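A minimal sketch of the "match with the s modifier, then trim" approach. The $con content is made up for illustration, and the sketch uses a lazy .+? rather than the greedy .+ shown above, so that the match stops at the first closing tag.

```php
<?php
// Hypothetical page content with the title spread over several lines.
$con = "<html><head><TITLE>\n  my example\n</TITLE></head></html>";

$title = null;
// i: case-insensitive tags; s: let . match newlines; .+? stops at the first </title>.
if (preg_match('/<title>(.+?)<\/title>/is', $con, $match)) {
    $title = trim($match[1]); // strip the whitespace and newlines around the text
}
var_dump($title);
```

Checking the return value of preg_match() before reading $match[1] avoids an undefined index when the page has no title at all.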
Selecting the text between the title tags, together with the tags themselves:
/<title>(.+)<\/title>/
Doing the same even if they are spread over multiple lines:
/<title>(.+)<\/title>/s
Doing the same as above but ignoring cases (lower or upper case doesn't matter)
/<title>(.+)<\/title>/is
Now we are using lookbehind and lookahead in order to only select the text between the tags:
/(?<=<title>)(.+)(?=<\/title>)/is
Please change the flags (i and s) the way you need them.
If that doesn't solve your problem I don't know what will :)
Here you can see an example of how my last regex would work: http://regexr.com?37ukf
EDIT:
OK, try running this code somewhere:
<?php
$title = '<title> My Example </title>';
preg_match('/(?<=<title>)(.+)(?=<\/title>)/is', $title, $match);
var_dump($match);
?>
You'll see that it works perfectly fine. Now, with this knowledge, go ahead and check whether $con truly looks the way you think it should. And do a var_dump of your $match instead of looking for specific indices.

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// as well as the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a negative lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ then it would only match the double slashes that did have the : preceding them.
Here is a quick lookahead and lookbehind tutorial that can explain it all better than I can.
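To make the difference between the two lookbehinds concrete, here is a small sketch. The URL is a shortened, hypothetical one in the same shape as the question's, and the variable names are only illustrative.

```php
<?php
// Shortened, hypothetical URL in the same shape as the question's.
$url = 'http://somesite.com/directory//sites/9/somefilename.jpg';

// Negative lookbehind: collapse // unless a colon comes right before it.
$fixed = preg_replace('/(?<!:)\/\//', '/', $url);

// Positive lookbehind: match only the // that does follow a colon.
preg_match_all('/(?<=:)\/\//', $url, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1); // byte offsets of each match

echo $fixed, "\n";
print_r($offsets);
```

The positive version finds a single match, right after http:, which is precisely the one the negative version skips.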
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and mean the same thing a single slash does, so you don't really need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // array_shift() needs a variable, so index the result instead
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.
Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches the first pipe character and everything following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
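As a quick check of the single-regex approach, here it is run against the exact string from the question. Only the echo at the end is added for illustration.

```php
<?php
$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';

// First alternative: a slash not preceded by : and followed by another slash.
// Second alternative: the first pipe and everything after it.
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);

echo $sanitizedURL;
```

The two slashes in http:// both survive: the first is preceded by a colon, and the second is not followed by another slash.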

preg_replace returns unexpected results to $1

<?php
$data='123
[test=abc]cba[/test]
321';
$test = preg_replace("(\[test=(.+?)\](.+?)\[\/test\])is","$1",$data);
echo $test;
?>
I expect the above code to return
abc
but instead of returning abc it returns
123 abc 321
Please tell me what I am doing wrong.
You're only replacing the matched part (the BBcode section). You're leaving the rest of the string untouched.
If you also want to remove the leading/trailing text, include those in the expression:
$test = preg_replace("(.*\[test=(.+?)\](.+?)\[\/test\].*)is","$1",$data);
I don't know if you're aware of this, but the outermost set of parentheses in your regex does not form a group (capturing or otherwise). PHP is interpreting them as regex delimiters. If you are aware of that, and you're using them as delimiters on purpose, please don't. It's usually best to use a non-bracketing character that never has any special meaning in regexes (~, %, #, etc.).
I agree with Casimir that preg_match() is the tool you should be using, not preg_replace(). But his solution is trickier than it needs to be. Your original regex works fine; all you have to do is grab the contents of the first capturing group, like so:
if (preg_match('%\[test=(.+?)\](.+?)\[/test\]%si', $data, $match)) {
$test = $match[1];
}
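The difference between the two functions can be seen side by side in a short sketch. It uses the question's data and the ~ delimiter suggested above; $replaced and $attribute are illustrative names.

```php
<?php
$data = "123\n[test=abc]cba[/test]\n321";
$pattern = '~\[test=(.+?)\](.+?)\[/test\]~is';

// preg_replace() rewrites only the matched BBCode; the text around it stays.
$replaced = preg_replace($pattern, '$1', $data);

// preg_match() just reports what was captured.
$attribute = null;
if (preg_match($pattern, $data, $match)) {
    $attribute = $match[1];
}
var_dump($replaced, $attribute);
```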
You don't need to use a replace here, all that you need is to take something in the string. To do that preg_match is more useful:
$data='123
[test=abc]cba[/test]
321';
$test = preg_match('~\[test=\K[^\]]++~', $data, $match);
echo $match[0];

Regular Expressions, detecting a text pattern

I'm interacting with my users via SMS, if they send me an SMS with this pattern, I need to perform an action:
Pattern:
*TEXT*TEXT*TEXT#
In TEXT all the characters are allowed, so I have made this regex:
if (preg_match('/^\*([^*]*)\*([^*]*)\*([^#]*)\#$/', $text)){
// perform the action...
}
The above regex works well, but it does not allow newlines after the #. For example:
'*hello there*how you doing!?* and blah#' pass the regex, but:
'*hello there*how you doing!?* and blah#
'
does not pass the above regex (note the newlines after the #).
So I decided to:
$text = str_replace("\n\r", '', $text);
But the above example still does not pass :-(
How can I allow newlines here in the regex, or get rid of them?
Thanks for your help
To allow for optional spaces and/or newlines after the hash:
if (preg_match('/^\*([^*]*)\*([^*]*)\*([^#]*)\#\s*$/', $text)){
I've added \s* as the expression just before the end-of-subject.
You can also use trim beforehand:
if (preg_match('/^\*([^*]*)\*([^*]*)\*([^#]*)\#$/', trim($text))){
Update
As an added requirement that text between the stars cannot be empty:
if (preg_match('/^\*([^*]+)\*([^*]+)\*([^#]+)\#$/', trim($text))){
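A small sketch of the \s* variant with non-empty fields. The sample message is the question's, with Windows-style line endings appended as an assumption about what the SMS gateway sends. It is worth noting that str_replace("\n\r", '', $text) only removes the exact two-character sequence LF followed by CR, so it does nothing for "\r\n" or a lone "\n".

```php
<?php
$pattern = '/^\*([^*]+)\*([^*]+)\*([^#]+)\#\s*$/';

// Assumed input: the question's message followed by two CRLF line endings.
$text = "*hello there*how you doing!?* and blah#\r\n\r\n";

$ok = preg_match($pattern, $text, $m);   // 1: trailing whitespace is absorbed by \s*
$empty = preg_match($pattern, "**b*c#"); // 0: the first field is empty

var_dump($ok, $m, $empty);
```

The three captured fields are then available as $m[1], $m[2] and $m[3], without any trailing newlines in them.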
Oops, my mistake: as @Jack said, we must add \s* and forget about the m modifier.
I changed * to + so that it returns false when there is nothing between the stars:
$text = "*fddsfdsf*dfdfd*5f8ssfdssf8#
";
if (preg_match('/^\*([^*]+)\*([^*]+)\*([^#]+)\#\s*$/', $text)){
echo "yes";
}else{
echo "no";
}

RegEx to match specific words unless it's the last word in a sentence (titleize)

I'm capitalizing all words, and then lowercasing words like a, of, and and.
The first and last words should remain capitalized.
I've tried using \s instead of \b, and that caused some strange issues.
I've also tried [^$], but that doesn't seem to mean "not end of string".
function titleize($string){
return ucfirst(
preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b/uie",
"strtolower('$1')",
ucwords($string))
);
}
This is the only failing test I'm trying to fix. The "in" at the end should remain capitalized.
titleize("gotta give up, gotta give in");
//Gotta Give Up, Gotta Give In
These tests pass:
titleize('if i told you this was killing me, would you stop?');
//If I Told You This Was Killing Me, Would You Stop?
titleize("we're at the top of the world (to the simple two)");
//We're at the Top of the World (to the Simple Two)
titleize("and keep reaching for those stars");
//And Keep Reaching for Those Stars
You apply ucwords() before sending the string to the regex replace, and then ucfirst() again after it returns (for words appearing at the start of the string). This can be shortened by relying on the fact that a word at the very start or very end of your string is not surrounded by whitespace on both sides. With that in mind, we can use a regex like '/(?<=\s)( ... )(?=\s)/'. This simplifies your function somewhat:
function titleize2($str) {
$NoUc = Array('A','Of','An','At','The','With','In','To','And','But','Is','For');
$reg = '/(?<=\s)(' # set lowercase only if surrounded by whitespace
. join('|', $NoUc) # add OR'ed list of words
. ')(?=\s)/e'; # set regex-eval mode
return preg_replace( $reg, 'strtolower("\\1")', ucwords($str) );
}
If tested with:
...
$Strings = Array('gotta give up, gotta give in',
'if i told you this was killing me, would you stop?',
'we\'re at the top of the world (to the simple two)',
'and keep reaching for those stars');
foreach ($Strings as $s)
print titleize2($s) . "\n";
...
... this will return the correct results.
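The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so on current versions the same idea needs preg_replace_callback(). The following is a sketch of that substitution, not the answerer's code; titleizeCallback() is a hypothetical name.

```php
<?php
// titleizeCallback() is a hypothetical name; it mirrors titleize2() without /e.
function titleizeCallback(string $str): string
{
    $noUc = ['A', 'Of', 'An', 'At', 'The', 'With', 'In', 'To', 'And', 'But', 'Is', 'For'];
    // Lowercase a listed word only when whitespace surrounds it on both sides.
    $reg = '/(?<=\s)(' . implode('|', $noUc) . ')(?=\s)/';
    return preg_replace_callback(
        $reg,
        function (array $m): string {
            return strtolower($m[1]);
        },
        ucwords($str)
    );
}

$strings = [
    'gotta give up, gotta give in',
    'if i told you this was killing me, would you stop?',
    'we\'re at the top of the world (to the simple two)',
    'and keep reaching for those stars',
];
foreach ($strings as $s) {
    echo titleizeCallback($s), "\n";
}
```

Since the lookarounds do not consume the surrounding whitespace, two listed words in a row (such as "At The") are both lowercased.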
Try this regex:
/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)(?!$)\b/uie
The negative lookahead (?!$) excludes matches that are immediately followed by the end of the string.
Adding a negative lookahead for the end of line (?!$) should do what you want
function titleize($string){
return ucfirst(
preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b(?!$)/uie",
"strtolower('$1')",
ucwords(inflector::humanize($string)))
);
}
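Here is a sketch of the (?!$) lookahead on PHP 7 and later, where the /e modifier is no longer available and preg_replace_callback() takes its place. titleizeLookahead() is a hypothetical name, and the inflector::humanize() call is left out because that class is not shown in the question.

```php
<?php
// titleizeLookahead() is a hypothetical name for illustration only.
function titleizeLookahead(string $string): string
{
    // \b(?!$): the word must end here, but not at the very end of the string.
    $pattern = '/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b(?!$)/ui';
    $lowered = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return strtolower($m[1]);
        },
        ucwords($string)
    );
    return ucfirst($lowered); // restore the capital on the first word
}

echo titleizeLookahead('gotta give up, gotta give in'), "\n";
echo titleizeLookahead('and keep reaching for those stars'), "\n";
```

One limitation of this version: a listed word that is followed by closing punctuation at the end of the string is still lowercased, because the lookahead only checks for the end of the string itself.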
