Regex Match PHP Comment - php

I've been trying to match PHP comments using a regex.
//([^<]+)\r\n
That's what I've got, but it doesn't really work.
I've also tried
//([^<]+)\r
//([^<]+)\n
//([^<]+)
...to no avail.

In what program are you coding this regex? Your final example is a good sanity check if you're worried that the newline chars aren't working. (I have no idea why you don't allow less-than in your comment; I'm assuming that's specific to your application.)
Try
//[^<]+
and see if that works. As Draemon says, you might have to escape the slashes. You might also have to escape the parentheses. I can't tell if you know this, but parentheses are often used to enclose capturing groups. Finally, check whether there is indeed at least one character after the double slashes.

To match comments, keep in mind that there are two main types of comments in PHP 5:
comments that start with // and run to the end of the line (a leading # works the same way)
comments that start with /* and run to */
Assume you start with these two lines:
$filePath = '/home/squale/developpement/astralblog/website/library/HTMLPurifier.php';
$str = file_get_contents($filePath);
You could match the first kind with:
$matches_slashslash = array();
if (preg_match_all('#//(.*)$#m', $str, $matches_slashslash)) {
var_dump($matches_slashslash[1]);
}
And the second kind with:
$matches_slashstar = array();
if (preg_match_all('#/\*(.*?)\*/#sm', $str, $matches_slashstar)) {
var_dump($matches_slashstar[1]);
}
But you will probably run into trouble with '//' in the middle of a string (and what about heredoc syntax, did you think about that one?), or with "toggle comments" like this:
/*
echo 'a';
/*/
echo 'b';
//*/
(In case you don't know the trick: just add a slash at the beginning to "toggle" between the two blocks.)
So it is quite hard to detect comments with regex alone.
Another way would be to use the PHP Tokenizer, which, obviously, knows how to parse PHP code and comments.
For reference, see:
token_get_all
List of Parser Tokens
With that, you would run the tokenizer on your string of PHP code, iterate over all the tokens you get as a result, and detect which ones are comments.
Something like this would probably do:
$tokens = token_get_all($str);
foreach ($tokens as $token) {
    // Single-character tokens are plain strings, so check for an array first
    if (is_array($token)
        && ($token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT)) {
        // This is a comment ;-)
        var_dump($token);
    }
}
And, as output, you'll get a list of stuff like this :
array
0 => int 366
1 => string '/** Version of HTML Purifier */' (length=31)
2 => int 57
or this :
array
0 => int 365
1 => string '// :TODO: make the config merge in, instead of replace
' (length=55)
2 => int 117
(You "just" might have to strip the // and /* */, but that's up to you; at least you have extracted the comments ^^ )
If you really want to detect comments without any kind of strange error due to "strange" syntax, I suppose this would be the way to go ;-)
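As a rough sketch of that last stripping step: the helper below collects comment tokens and removes the markers. The name extract_comments() and the sample source are made up for illustration, and the source is assumed to start with an opening <?php tag, since the tokenizer treats everything before it as inline HTML.

```php
<?php
// Sketch: collect comment texts from PHP source with the tokenizer.
// extract_comments() is a hypothetical helper name, not a built-in.
function extract_comments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" are plain strings, not arrays.
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            $text = trim($token[1]);
            if (strncmp($text, '/*', 2) === 0) {
                // Block or doc comment: drop the leading "/*" or "/**" and the trailing "*/".
                $text = preg_replace('~^/\*+|\*+/$~', '', $text);
            } else {
                // Line comment: drop the leading "//" or "#".
                $text = preg_replace('~^(//|#)~', '', $text);
            }
            $comments[] = trim($text);
        }
    }
    return $comments;
}

// Sample input (made up): one line comment, one block comment,
// and two strings that only look like comments.
$code = <<<'PHP'
<?php
$url = 'http://example.com'; // real comment
/* block
   comment */
echo "// not a comment";
PHP;

print_r(extract_comments($code));
```

The two strings containing // are reported as string tokens, not comments, which is exactly the case the regex versions get wrong.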

You probably need to escape the "//":
\/\/([^<]+)

This will match comments in PHP (both /* */ and // format)
/(\/\*).*?(\*\/)|(\/\/).*?(\n)/s
To get all matches, use preg_match_all to get array of matches.
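For what it's worth, here is a minimal sketch of that preg_match_all call. It uses a slightly different pattern than the one above (an assumption on my part, not the original answer's regex): ~ as the delimiter so the slashes need no escaping, and no required trailing newline so a comment on the last line is still found. The sample source is made up.

```php
<?php
// Made-up sample source with two line comments and one block comment.
$source = "<?php\n// first\n\$a = 1; /* second\nspans lines */\n\$b = 2; // last";

// "~" as delimiter so the slashes need no escaping.
// The s modifier lets "." cross newlines inside /* ... */.
$pattern = '~/\*.*?\*/|//[^\r\n]*~s';

// $matches[0] holds every full match, in order of appearance.
$count = preg_match_all($pattern, $source, $matches);
print_r($matches[0]);
```

Like every regex-only approach here, it will still misfire on // or /* inside string literals.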

Related

PHP Huffman Decode Algorithm

I applied for a job recently and was sent a HackerRank exam with a couple of questions. One of them was a Huffman decoding algorithm. There is a similar problem available here which explains the formatting a lot better than I can.
The actual task was to take two arguments and return the decoded string.
The first argument is the codes, which is a string array like:
[
"a 00",
"b 101",
"c 0111",
"[newline] 1001"
]
Each entry has the form: single character, two tabs, Huffman code.
The newline was specified as being in this format due to the way that hacker rank is set up.
The second argument is a string to decode using the codes. For example:
101000111 = bac
This is my solution:
function decode($codes, $encoded) {
$returnString = '';
$codeArray = array();
foreach($codes as $code) {
sscanf($code, "%s\t\t%s", $letter, $code);
if ($letter == "[newline]")
$letter = "\n";
$codeArray[$code] = $letter;
}
print_r($codeArray);
$numbers = str_split($encoded);
$searchCode = '';
foreach ($numbers as $number) {
$searchCode .= $number;
if (isset($codeArray[$searchCode])) {
$returnString .= $codeArray[$searchCode];
$searchCode = '';
}
}
return $returnString;
}
It passed the two initial tests but there were another five hidden tests which it did not pass and gave no feedback on.
I realize that this solution would not pass if the character were whitespace, so I tried a less optimal solution that used substr to get the first character and regex matching to get the number, but this still passed the first two tests and failed the hidden five. I tried the function in the HackerRank platform with whitespace as input and the sandboxed environment could not handle it anyway, so I reverted to the above solution as it was more elegant.
I tried the code with special characters, characters from other languages, codes of various sizes and it always returned the desired solution.
I am just frustrated that I could not find the cases that caused this to fail as I found this to be an elegant solution. I would love some feedback both on why this could fail given that there is no white-space and also any feedback on performance increases.
Your basic approach is sound. Since a Huffman code is a prefix code, i.e. no code is a prefix of another, then if your search finds a match, then that must be the code. The second half of your code would work with any proper Huffman code and any message encoded using it.
Some comments. First, the example you provide is not a Huffman code, since the prefixes 010, 0110, 1000, and 11 are not present. Huffman codes are complete, whereas this prefix code is not.
This brings up a second issue, which is that you do not detect this error. You should be checking to see if $searchCode is empty after the end of your loop. If it is not, then the code was not complete, or a code ended in the middle. Either way, the message is corrupt with respect to the provided prefix code. Did the question specify what to do with errors?
The only real issue I would expect with this code is that you did not decode the code description generally enough. Did the question say there were always two tabs, or did you conclude that? Perhaps it was just any amount of spaces and tabs. Were there other character encodings you needed to convert, like [newline]? I presume you did in fact need to convert them if one of the examples that worked contained one. Did it? Otherwise, maybe you weren't supposed to convert.
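To make the two suggestions concrete, here is a sketch in PHP, not a known-passing solution. It assumes each entry is a symbol, then any run of spaces or tabs, then the bits, and that [newline] is the only special symbol; the hidden tests may have expected something else. The name decode_prefix_code() is made up.

```php
<?php
// Sketch: decode_prefix_code() is a hypothetical name. It assumes each entry is
// "<symbol><spaces or tabs><bits>" and that "[newline]" is the only special symbol.
function decode_prefix_code(array $codes, string $encoded): string
{
    $table = [];
    foreach ($codes as $entry) {
        // Symbol first (it may itself be a space), then whitespace, then a run of 0/1.
        if (!preg_match('/^(.+?)[ \t]+([01]+)\s*$/', $entry, $m)) {
            throw new InvalidArgumentException("Malformed code entry: $entry");
        }
        $symbol = ($m[1] === '[newline]') ? "\n" : $m[1];
        $table[$m[2]] = $symbol;
    }

    $decoded = '';
    $buffer = '';
    foreach (str_split($encoded) as $bit) {
        $buffer .= $bit;
        // A prefix code guarantees the first hit is the only possible one.
        if (isset($table[$buffer])) {
            $decoded .= $table[$buffer];
            $buffer = '';
        }
    }
    // Leftover bits mean the message is corrupt with respect to this code.
    if ($buffer !== '') {
        throw new UnexpectedValueException("Trailing bits match no code: $buffer");
    }
    return $decoded;
}

$codes = ["a\t\t00", "b\t\t101", "c\t\t0111", "[newline]\t\t1001"];
echo decode_prefix_code($codes, '101000111'); // prints "bac"
```

Throwing on leftover bits is just one choice; what to do with a corrupt message depends on what the question specified.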
I had the same question in a coding challenge, with one modification: the input was a List such as (a 111101, b 110010, [newline] 111111 ...).
I took a different approach using a HashMap, but I too had only the 2 sample test cases pass.
Below is my code:
public static String decode(List<String> codes, String encoded) {
// Write your code here
String result = "";
String buildvalue ="";
HashMap <String,String> codeMap= new HashMap<String,String>();
for(int i=0;i<codes.size();i++){
String S= codes.get(i);
String[] splitedData = S.split("\\s+");
String value=splitedData[0];
String key=(splitedData[1].trim());
codeMap.put(key, value);
}
for(int j=0;j<encoded.length();j++){
buildvalue+=Character.toString(encoded.charAt(j));
if(codeMap.containsKey(buildvalue)){
if(codeMap.get(buildvalue).contains("[newline]")){
result+="\n";
buildvalue="";
}
else{
result+=codeMap.get(buildvalue);
buildvalue="";
}
}
}
return result.toString();
}
}

Php preg_match doesnt work with variable

I have one array that contains multiple strings, and another array that contains shorter strings. My goal is to check, for every item in the smaller array, whether there is a partial match anywhere in the bigger array. However, preg_match doesn't work at all with variables. If I put in a literal pattern everything seems fine, but otherwise the result is false. I have tried almost every possible regex combination, without success. Sample code:
//Lets say $needle is 3333 and bigPatern has 10 records with 10 digits each, for example third record is 5125433331. I want to perform the partial match and get true
$needle = $smlPattern[0]; //debugging with first item from smaller array
$needle2 = "/$needle/"; // I tried [$needle], ^..&, to concatenate and etc
foreach ($bigPatern as $val)
{
if (preg_match($needle2, $val))
{
echo "YES";
}
}
Any tips on what I'm doing wrong?
Please escape your regex input!
$needle2 = "/".preg_quote($needle,'/')."/"; //
Don't blindly add user input to your regex, much for the same reason you need to escape user input in SQL queries. In regex, the biggest issue is usually the ReDoS problem, where a malicious user can create a specially crafted regex that will use hours, or more, to execute, stealing all the CPU from your server.
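A small illustration of what the escaping changes, using a made-up needle that contains regex metacharacters (the value 3.3+3 is only an example, not something from the question):

```php
<?php
// Hypothetical needle that contains regex metacharacters.
$needle = '3.3+3';

$unsafe = '/' . $needle . '/';                  // "." and "+" act as regex operators
$safe   = '/' . preg_quote($needle, '/') . '/'; // every character is matched literally

var_dump(preg_match($unsafe, '5125433331')); // int(1): "3333" satisfies 3, any char, 3+, 3
var_dump(preg_match($safe, '5125433331'));   // int(0): the literal text "3.3+3" is absent
var_dump(preg_match($safe, 'id 3.3+3 ok'));  // int(1)
```

The second argument to preg_quote is the delimiter, so a / inside the needle gets escaped as well.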
The main problem in your example is using a regex just to check for the presence of a substring. There is the strpos function for that.
if ( strpos($bigOne, $smallOne) !== false ) {
echo "bigOne contains smallOne";
}
You can also use the strpos function for the same purpose. It finds the position of the first occurrence of a substring in a string, and returns false if no match is found.
$needle = $smlPattern[0];
foreach ($bigPatern as $val) {
    if (strpos($val, $needle) !== false) {
        echo "YES";
    }
}
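Extending this to the stated goal (every item of the smaller array checked against the bigger one) could look like the sketch below. The helper name find_partial_matches() and the sample values are assumptions; the array names follow the question.

```php
<?php
// Sketch: find_partial_matches() is a hypothetical helper. For each short string
// it records the first long string that contains it, or null if none does.
function find_partial_matches(array $needles, array $haystacks): array
{
    $found = [];
    foreach ($needles as $needle) {
        $found[$needle] = null;
        foreach ($haystacks as $haystack) {
            // Cast to string so numeric entries such as 3333 are compared as text.
            if (strpos((string) $haystack, (string) $needle) !== false) {
                $found[$needle] = $haystack;
                break;
            }
        }
    }
    return $found;
}

// Made-up sample data in the shape described in the question.
$smlPattern = ['3333', '9999'];
$bigPatern  = ['1234567890', '0987654321', '5125433331'];
print_r(find_partial_matches($smlPattern, $bigPatern));
```

No regex is involved, so nothing in the needles needs escaping.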

evaluate string with array php

I have a string like
"subscription link :%list:subscription%
unsubscription link :%list:unsubscription%
------- etc"
AND
I have an array like
$variables['list']['subscription']='example.com/sub';
$variables['list']['unsubscription']='example.com/unsub';
----------etc.
I need to replace %list:subscription% with $variables['list']['subscription'], and so on.
Here list is the first index and subscription is the second index of $variables.
Is it possible to use eval() for this? I have no idea how to do this, please help me.
Str replace should work for most cases:
foreach($variables as $key_l1 => $value_l1)
foreach($value_l1 as $key_l2 => $value_l2)
$string = str_replace('%'.$key_l1.':'.$key_l2.'%', $value_l2, $string);
eval() is not needed here. It does not fork a new PHP process, but it does compile the string as PHP code every time it runs, which adds overhead, so unless you've got some serious work cut out for eval it's only going to slow you down.
Besides the speed issue, eval() can also be exploited if the code it evaluates originates from public users.
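A variation on the str_replace loop, as a sketch: build the placeholder map once and hand it to strtr(). The sample string and URLs are modeled on the question; the $map variable is my own naming.

```php
<?php
// Sample data modeled on the question; the URLs are placeholders.
$variables = [
    'list' => [
        'subscription'   => 'example.com/sub',
        'unsubscription' => 'example.com/unsub',
    ],
];
$string = "subscription link :%list:subscription%\nunsubscription link :%list:unsubscription%";

// Flatten the two-level array into a "%first:second%" => value map.
$map = [];
foreach ($variables as $group => $items) {
    foreach ($items as $name => $value) {
        $map['%' . $group . ':' . $name . '%'] = $value;
    }
}

// strtr() applies all replacements in one pass and does not rescan replaced text.
$result = strtr($string, $map);
echo $result;
```

Because replaced text is not rescanned, a value that happens to contain something like %list:subscription% is left alone, which is not guaranteed with repeated str_replace calls.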
You could write the string to a file, enclosing the string in a function definition within the file, and give the file a .php extension.
Then you include the php file in your current module and call the function which will return the array.
I would use a regular expression and do it like this:
$stringWithLinks = "";
$variables = array();
// your link pattern, in this case
// the i at the end makes it case-insensitive
$pattern = '/%([a-z]+):([a-z]+)%/i';
// http://cz2.php.net/manual/en/function.preg-replace-callback.php
$stringWithReplacedMarkers = preg_replace_callback(
    $pattern,
    // "use" imports $variables, which is otherwise not visible inside the closure
    function ($matches) use ($variables) {
        // important fact: $matches[0] contains the whole matched string
        return $variables[$matches[1]][$matches[2]];
    },
    $stringWithLinks);
You can obviously write the pattern inline; I simply wanted to make it clearer. Check the PHP manual for more regular expression possibilities. The method I used is here:
http://cz2.php.net/manual/en/function.preg-replace-callback.php

Using strpos() to match part of an array, PHP

I need some help with strpos().
I need a way to match any URL that contains /apple-touch, while also keeping the specific matches, such as "/favicon.gif" etc.
At the moment, the matches are listed out individually as part of an array:
<?php
$errorurl = $_SERVER['REQUEST_URI'];
$blacklist = array("/favicon.gif", "/favicon.png", "/apple-touch-icon-precomposed.png", "/apple-touch-icon.png", "/apple-touch-icon-72x72-precomposed.png", "/apple-touch-icon-72x72.png", "/apple-touch-icon-114x114-precomposed.png", "/apple-touch-icon-114x114.png", "/apple-touch-icon-57x57-precomposed.png", "/apple-touch-icon-57x57.png", "/crossdomain.xml");
if (in_array($errorurl, $blacklist)) {
    // do nothing
} else {
    // send an email about error
}
?>
Any ideas?
Many thanks for help
Instead of a regex, you could also remove all occurrences of your blacklist items with str_replace and compare the new string to the old one:
if ( str_replace($blacklist, '', $errorurl) !== $errorurl )
{
// do nothing
}
else
{
// send an email about error
}
If you want to use regex for this, and you want a single regex string that will capture all the values in your existing blacklist plus match any apple-touch string, then something like this would do it.
if (preg_match('/^\/(favicon|crossdomain|apple-touch.*)\.(gif|png|xml)$/', $_SERVER['REQUEST_URI'])) {
//matched the blacklist!
}
To be honest, though, that's far more complex than you need.
I'd say you'd be better off keeping the specific values like favicon.gif etc in the blacklist array you already have; it'd make it a lot easier when you come to adding more items to the list.
I'd only consider using regex for the apple-touch values, since you want to block any variant of them. But even with that, it would likely be simpler if you used strpos().
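A minimal sketch of that combination, with exact matches kept in a list and strpos() used for the apple-touch variants. The function name is_ignored_url() is made up, and the exact list is shortened from the question's blacklist.

```php
<?php
// Sketch: is_ignored_url() is a hypothetical helper name.
function is_ignored_url(string $url): bool
{
    // Exact matches stay in a plain list, which is easy to extend.
    $exact = ['/favicon.gif', '/favicon.png', '/crossdomain.xml'];
    if (in_array($url, $exact, true)) {
        return true;
    }
    // Any URL containing "/apple-touch" is ignored as well.
    return strpos($url, '/apple-touch') !== false;
}

var_dump(is_ignored_url('/favicon.gif'));                  // bool(true)
var_dump(is_ignored_url('/apple-touch-icon-144x144.png')); // bool(true)
var_dump(is_ignored_url('/missing-page.html'));            // bool(false)
```

In the original script the argument would be $errorurl, and the email would be sent only when the function returns false.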

How to choose between explode and regex

My string is to contain some "hotkeys" of the form [hotkey]. For example:
"This is a sample string [red] [h1]"
When I process this string with a php function, I'd like function to output the original string as follows;
<font color='red'><h1>This is a sample string</h1></font>
I'd like to use this function purely for convenience purposes easing some typing. I may use a font tag or div or whatever, let's not get into that. The point is this; a hotkey will cause the original string to be wrapped into
<something here>original string<and something there>
So the function first needs to determine whether there are any hotkeys. That's easy; just check whether the string contains a [
Then we need to process the string to determine which hotkeys exist, and get into the business logic as to which wrappers to deploy.
Finally, we have to strip the hotkeys from the original string and return the result.
My question is whether there is a regex that would do this more effectively than the following parsing method, which is how I am planning to implement the function.
step 1
explode the string into an array using the [ delimiter
step 2
go thru each array element to see if the closing ] is present and it forms one of the defined hotkeys, and if so, do the necessary.
Obviously, this method is not using any regex power. I'm wondering if regex could be of help here. Or, any better way to do it you may suggest?
If [ and ] are the only delimiters you need to worry about, you could probably use strtok.
I don't speak english well but I saw your example :
"This is a sample string [red] [h1]"
<font color='red'><h1>This is a sample string</h1></font>
If I were you:
$start = strpos($chaine, '[');
$red = substr($chaine, $start + 1, strpos($chaine, ']', $start) - $start - 1); // third argument is a length, not an end position
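On the regex side of the question, here is one possible sketch. The function name apply_hotkeys() and the $wrappers table are hypothetical; the two entries only mirror the [red] and [h1] examples from the question, and hotkeys are assumed to be single words in square brackets.

```php
<?php
// Sketch: apply_hotkeys() and the $wrappers table are hypothetical; the two
// entries mirror the [red] and [h1] examples from the question.
function apply_hotkeys(string $text): string
{
    $wrappers = [
        'red' => ["<font color='red'>", '</font>'],
        'h1'  => ['<h1>', '</h1>'],
    ];

    // Collect every [word] marker, then remove all markers (known or not) from the text.
    preg_match_all('/\[(\w+)\]/', $text, $matches);
    $clean = trim(preg_replace('/\s*\[\w+\]/', '', $text));

    // Wrap from the last hotkey to the first so the first one ends up outermost.
    foreach (array_reverse($matches[1]) as $hotkey) {
        if (isset($wrappers[$hotkey])) {
            [$open, $close] = $wrappers[$hotkey];
            $clean = $open . $clean . $close;
        }
    }
    return $clean;
}

echo apply_hotkeys('This is a sample string [red] [h1]');
// prints: <font color='red'><h1>This is a sample string</h1></font>
```

Compared with exploding on [, one preg_match_all call finds all markers and one preg_replace call cleans the string, so the business logic is reduced to the lookup table.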
