So I'm writing a PHP script that will read in a CSS file, then put the comments and actual CSS in separate arrays. The script will then build a page with the CSS and comments all nicely formatted.
The basic logic for the script is this:
Read in a new line
If it starts with a forward slash or ends with an opening bracket, set a bool for CSS or comments to true
Add that line to the appropriate element in the appropriate array
If the last character is a backslash (end of a comment) or the first character is a closing bracket (end of a CSS tag), set the necessary bool to false
Rinse, repeat
If someone sees an error in that, feel free to point it out, but I think it should do what I want.
The tricky part is the last if statement, checking if the last character is a backslash. Right now I have:
if ($line[strlen($line) - 3] == "\\") { /* do stuff */ }
where $line is the last line read in from the file. Not entirely sure why I have to go back 3 characters, but I'm guessing it's because there's a newline at the end of each string when reading it in from a file. However, this if statement is never true, even though there are definitely lines which end with slashes. This
echo "<br />str - 3: " . $line[strlen($line) - 3];
even returns a backslash, yet the if statement is never triggered.
That would be because $line[strlen($line) - 3] in your if statement is returning one backslash, while the if statement is looking for two. Try using
substr($line, -2)
instead. (You might have to change it to -3, because the newline character might be included at the end of the string.)
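A minimal sketch of that substr() check, assuming the goal is to detect the end of a CSS comment. Stripping the line ending first with rtrim() avoids guessing whether -2 or -3 is the right offset. The helper name endsComment() is hypothetical.

```php
<?php
// Hypothetical helper: does this line end a CSS comment?
function endsComment(string $line): bool
{
    // Drop any trailing "\n" or "\r\n" left by fgets(), so the check
    // does not depend on how many line-ending characters there are.
    $trimmed = rtrim($line, "\r\n");
    // CSS comments close with "*/" (a forward slash, not a backslash).
    return substr($trimmed, -2) === '*/';
}
```

For example, endsComment("/* a comment */\n") is true, while endsComment(".test1 {\n") is false.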
@mcritelli: CSS comments look like /* comment */, though, so just searching for a slash won't tell you whether it's starting or ending the comment. Here's a very basic script I tested that loops through a 'line' and can do something at the beginning and end of a comment:
<?php
$line = "/* test rule */";
$line .= ".test1 { ";
$line .= " text-decoration: none; ";
$line .= "}/* end of test rule */";
// Stop one character early so the two-character look-ahead stays in range.
for ($i = 0; $i < strlen($line) - 1; $i++)
{
    $pair = substr($line, $i, 2);
    if ($pair == "/*")
    {
        // start of a comment, do something
    }
    elseif ($pair == "*/")
    {
        // end of a comment, do something
    }
}
?>
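Building on that loop, here is a rough sketch of the separation the question asks for: it walks the text with strpos() and collects the comments and the CSS between them into two arrays. The function name splitCss() and the array keys are hypothetical, and it does not handle /* appearing inside quoted strings (for example in a content: value).

```php
<?php
// Hypothetical helper: split CSS text into comments and the CSS between them.
function splitCss(string $css): array
{
    $comments = [];
    $rules = [];
    $pos = 0;
    $len = strlen($css);

    while ($pos < $len) {
        $start = strpos($css, '/*', $pos);
        if ($start === false) {
            // No more comments: whatever is left is plain CSS.
            $rest = trim(substr($css, $pos));
            if ($rest !== '') {
                $rules[] = $rest;
            }
            break;
        }

        // CSS text between the previous comment (or the start) and this one.
        $before = trim(substr($css, $pos, $start - $pos));
        if ($before !== '') {
            $rules[] = $before;
        }

        $end = strpos($css, '*/', $start + 2);
        if ($end === false) {
            // Unterminated comment: treat the rest of the input as comment text.
            $comments[] = substr($css, $start);
            break;
        }

        $comments[] = substr($css, $start, $end + 2 - $start);
        $pos = $end + 2;
    }

    return ['comments' => $comments, 'css' => $rules];
}
```

Working on the whole file (for example the result of file_get_contents()) rather than line by line means comments that span several lines need no extra bookkeeping.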
In a file path:
dir1/dir2/../file.txt is the same as dir1/file.txt
I'm interested in whether something similar is available in PHP, for example:
$name= "Hello ". $variable . "World";
If I had $variable = "../Hi" (or something like that), would it remove the previous part (the way ../ does in a path) and print "Hi World"?
(P.S. I don't control the PHP file; I'm asking how attackers could achieve that.)
(P.S. 2: I have no words for the downvoters who voted to close this. I think you should analyse questions more carefully before you close them.)
In PHP there is no special ../ (or any other string) that, when concatenated to another string, produces anything other than the original string followed by the new one. Concatenation, regardless of the content of the strings, always results in:
"<String1><String2>" = "<String1>"."<String2>";
Nothing will 'erase' prior tokens in a string or anything like that; concatenation by itself is completely harmless.
Caveat: of course, this changes if the string is later used somewhere that interprets it in a specific way, where any character or group of characters in ../ is treated specially, such as:
In a string used as a regex pattern
In a string used as a file path (in that case, when it's evaluated, it will do exactly what you'd expect if you'd typed it)
In a string used in an SQL query without proper escaping (such as binding params/values via prepared statements)
etc.
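To illustrate the difference, a small sketch: plain concatenation keeps ../ verbatim, and the "go up one level" behavior only appears when something interprets the string as a path. normalizePath() is a hypothetical, simplified resolver (relative '/'-separated paths only, no file system access, leading .. segments are simply dropped); PHP's own realpath() only works for paths that actually exist on disk.

```php
<?php
// Concatenation never interprets "../": the characters are simply kept.
$variable = "../Hi";
$name = "Hello " . $variable . "World";
// $name is now "Hello ../HiWorld"

// Hypothetical helper: resolve ".." segments the way a path resolver would.
function normalizePath(string $path): string
{
    $parts = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            // Go up one level: undo the previous segment, if there is one.
            array_pop($parts);
        } elseif ($segment !== '' && $segment !== '.') {
            $parts[] = $segment;
        }
    }
    return implode('/', $parts);
}
```

normalizePath('dir1/dir2/../file.txt') returns 'dir1/file.txt', while $name still contains the literal ../.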
Now, if you want to remove the word before each occurrence of ../ at the start of a word in a sentence, roughly replicating how .. in a path means "go up one level" (in effect undoing the preceding directory step), here's a basic algorithm to start you out (if you are able to change the source code):
Use explode() with the delimiter " " on the string.
Create a new array.
Iterate over the returned array; if an entry doesn't start with ../, append it to the end of the new array.
If an entry starts with ../, remove the last element of the new array,
then append the entry, with the ../ replaced by an empty string "", to the end of the new array.
Once at the end of the array (all strings processed), implode() the new array with the delimiter " ".
Here's an example:
<?php
$variable = "../Hi";
$string = "Hello ". $variable . " World"; // Note: I added a space prior to the W
$arr = array();
foreach(explode(" ", $string) as $word) {
if (substr( $word, 0, 3 ) === "../") {
if(!empty($arr)){
array_pop($arr);
}
$arr[] = str_replace("../", "", $word);
} else {
$arr[] = $word;
}
}
echo implode(" ", $arr);
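The example above removes at most one preceding word per entry, and str_replace() strips every ../ in the word, so "../../Hi" still removes only one word. A variant that removes one preceding word per leading ../ might look like this; collapseDotDot() is a hypothetical name and this is only a sketch.

```php
<?php
// Hypothetical variant: each leading "../" on a word removes one preceding word.
function collapseDotDot(string $sentence): string
{
    $words = [];
    foreach (explode(' ', $sentence) as $word) {
        // Count and strip the leading "../" prefixes on this word.
        $levels = 0;
        while (strncmp($word, '../', 3) === 0) {
            $word = substr($word, 3);
            $levels++;
        }
        // Pop one earlier word per "../", stopping if none are left.
        for ($i = 0; $i < $levels && !empty($words); $i++) {
            array_pop($words);
        }
        $words[] = $word;
    }
    return implode(' ', $words);
}
```

For the question's input, collapseDotDot("Hello ../Hi World") gives the same "Hi World" as the code above, while "a b ../../c d" becomes "c d".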
Intro:
I'm fairly new to RegEx, so bear with me here. We have a client with an extremely large CSS file, verging on 27k lines in total: 20k lines or so are pure CSS and the rest is written in SCSS. I am attempting to cut this down, and although I've spent more than the allotted hours on it, I found it extremely interesting, so I wrote a little PHP script to do it for me! Unfortunately it's not quite there, because the RegEx is a little troublesome.
Context
remove.txt - Text file containing selectors, line by line that are redundant on our site and can be removed.
main.scss - The big SASS file.
PHP script - Basically reads the remove.txt file line by line, finds the selector in the main.scss file and adds a "UNUSED" string before each selector, so I can go down line by line and remove the rule.
Issue
So the main reason this is troublesome is that we have to account for lots of occurrences at the start of the CSS rules and towards the end as well. For example:
Example scenarios for .foo-bar (it should match in the first, second, fourth and fifth lines; in the third and sixth it is part of a longer selector and should not match):
.foo-bar {}
.foo-bar, .bar-foo {}
.foo-bar .bar-foo {}
.boo-far, .foo-bar {}
.foo-bar,.bar-foo {}
.bar-foo.foo-bar {}
PHP Script
<?php
$unused = 'main.scss';
if ($file = fopen("remove.txt", "r")) {
// Stop an endless loop if file doesn't exist
if (!$file) {
die('plz no loops');
}
// Begin looping through redundant selectors line by line
while(!feof($file)) {
$line = trim(fgets($file));
// Apply the regex to the selector
$line = $line.'([\s\S][^}]*})';
// Apply the global operators
$line = '/^'.$line.'/m';
// Echo the output for reference and debugging
echo ('<p>'.$line.'</p>');
// Find the rule, append it with UNUSED at the start
$dothings = preg_replace($line,'UNUSED $0',file_get_contents($unused), 1);
}
fclose($file);
} else {
echo ('<p>failed</p>');
}
?>
RegEx
From the above you can gather my RegEx will be -
/^REDUNDANTRULE([\s\S][^}]*})/m
It currently has a hard time dealing with the indentation that typically occurs within media queries, and also with preceding selectors applied to the same rule.
To address this, I tried adding the following to the start (to accommodate whitespace and cases where the selector is part of a longer selector):
^[0a-zA-Z\s]
And also adding this to the end (to accommodate commas separating selectors):
\,
Could any RegEx/PHP wizards point me in the right direction? Thank you for reading regardless!
Thanks @ctwheels for the fantastically explained answer. I encountered a couple of other issues, one being that full stops within the redundant selectors were not escaped. I've now updated my script to escape them before doing the find and replace, as seen below. This is now my most up-to-date, working script:
<?php
$unused = 'main.scss';
if ($file = fopen("remove.txt", "r")) {
if (!$file) {
die('plz no loops');
}
while(!feof($file)) {
$line = trim(fgets($file));
if( strpos( $line, '.' ) !== false ) {
echo ". found in $line, escaping characters";
$line = str_replace('.', '\.', $line);
}
$line = '/(?:^|,\s*)\K('.$line.')(?=\s*(?:,|{))/m';
echo ('<p>'.$line.'</p>');
var_dump(preg_match_all($line, file_get_contents($unused)));
$dothings = preg_replace($line,'UNUSED $0',file_get_contents($unused), 1);
var_dump(
file_put_contents($unused,
$dothings
)
);
}
fclose($file);
} else {
echo ('<p>failed</p>');
}
?>
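A sketch of the same idea with two changes, assuming remove.txt holds one plain selector per line: preg_quote() escapes every regex metacharacter (not only full stops, which matters for selectors containing characters such as +, * or [ ]), and blank lines are skipped, since an empty line would otherwise build a pattern that matches an empty selector. It uses the answer's updated regex, which also allows indentation. markUnused() is a hypothetical name, and unlike the script it marks every occurrence rather than only the first.

```php
<?php
// Hypothetical helper: prefix every listed selector in $scss with "UNUSED ".
function markUnused(string $scss, array $selectors): string
{
    foreach ($selectors as $selector) {
        $selector = trim($selector);
        if ($selector === '') {
            // Blank line (e.g. a trailing newline in remove.txt): nothing to mark.
            continue;
        }
        // Escape all regex metacharacters plus the '/' delimiter.
        $quoted = preg_quote($selector, '/');
        // Start of line or a comma, optional whitespace, the selector,
        // then optional whitespace followed by a comma or '{'.
        $pattern = '/(?:^|,)\s*\K(' . $quoted . ')(?=\s*(?:,|{))/m';
        $scss = preg_replace($pattern, 'UNUSED $1', $scss);
    }
    return $scss;
}
```

With files, usage could be file_put_contents('main.scss', markUnused(file_get_contents('main.scss'), file('remove.txt'))); since trim() removes the line endings that file() keeps. Reading and writing main.scss once, instead of once per selector, also avoids re-reading the whole file in every loop iteration.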
Answer
Brief
Based on the examples you provided, the following regex will work; however, it will not work for all CSS rules. If you add more cases, I can update the regex to accommodate them.
Code
Regex
(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))
Replacement
UNUSED $1
Note: The multiline m flag is used.
Usage
The following script was generated by regex101 (using its code generator):
$re = '/(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))/m';
$str = '.foo-bar {}
.foo-bar, .bar-foo {}
.foo-bar .bar-foo {}
.boo-far, .foo-bar {}
.foo-bar,.bar-foo {}
.bar-foo.foo-bar {}';
$subst = 'UNUSED $1';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;
Results
Input
.foo-bar {}
.foo-bar, .bar-foo {}
.foo-bar .bar-foo {}
.boo-far, .foo-bar {}
.foo-bar,.bar-foo {}
.bar-foo.foo-bar {}
Output
UNUSED .foo-bar {}
UNUSED .foo-bar, .bar-foo {}
.foo-bar .bar-foo {}
.boo-far, UNUSED .foo-bar {}
UNUSED .foo-bar,.bar-foo {}
.bar-foo.foo-bar {}
Explanation
(?:^|,\s*) Match either of the following
^ Assert position at the start of the line
,\s* Comma character , literally, followed by any number of whitespace characters
\K Resets starting point of the reported match (any previously consumed characters are no longer included in the final match)
(\.foo-bar) Capture into group 1: The dot character . literally, followed by foo-bar literally
(?=\s*(?:,|{)) Positive lookahead ensuring what follows matches the following
\s* Any whitespace character any number of times
(?:,|{) Match either of the following
, Comma character , literally
{ Left curly bracket { literally
Edit
The following regex is an update from the previous one and moves \s* outside the first group to match the possibility of whitespace after the caret ^ as well.
(?:^|,)\s*\K(\.foo-bar)(?=\s*(?:,|{))
I need to remove all comments from $string, which contains data from a C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions, but none of them work the way I need.
Here are the latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.
You would also want to add the s modifier so that . (and therefore .*?) also matches newlines. I always think of s as meaning "treat the input text as a single line".
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.
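For comparison, the same regex written with '#' as the delimiter and a single-quoted PHP string, which avoids the double escaping entirely. stripBlockComments() is a hypothetical wrapper. This sketch leaves out the u modifier on purpose: matching byte by byte is safe for UTF-8 here because the bytes of '/' and '*' never occur inside a multibyte sequence, and without u, preg_replace() will not fail on invalid UTF-8 input.

```php
<?php
// Sketch: remove /* ... */ block comments with a single regex.
function stripBlockComments(string $source): ?string
{
    // '#' delimiter: no need to escape '/'. Single quotes: PHP passes the
    // backslashes through unchanged. 's' lets '.' match newlines, and the
    // lazy '.*?' stops at the first '*/'. Returns null on a PCRE error.
    return preg_replace('#/\*.*?\*/#s', '', $source);
}
```

Note that the newline after the closing */ is kept, so the example from the question ends up with an empty line where the comment was.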
I don't think a regex is a good fit here. How about writing a very small parser to remove the comments? I haven't written PHP in a long time, so I'll just give you the idea (a simple algorithm). I haven't tested it; it's just to give you the idea:
buf = new String() // holds the source code without comments
pos = 0
while (string[pos] != EOF) {
    if (string[pos] == '/' && string[pos + 1] == '*') {
        pos += 2; // skip the opening "/*"
        // skip everything up to and including the closing "*/"
        while (string[pos] != EOF) {
            if (string[pos] == '*' && string[pos + 1] == '/') {
                pos += 2;
                break;
            }
            pos++;
        }
        continue; // copy nothing from the comment
    }
    buf[buf_index++] = string[pos++];
}
where:
string is the C source code
buf is a dynamically allocated string that expands as needed
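In PHP the same scanner could look roughly like this; it is a sketch, and stripCommentsScan() is a hypothetical name. strpos() replaces the inner loop, and the search for "*/" starts after the opening "/*" so that "/*/" is not mistaken for a complete comment. Like the pseudocode, it knows nothing about string literals or // comments.

```php
<?php
// Sketch of the scanner in PHP: drops /* ... */ block comments only.
function stripCommentsScan(string $src): string
{
    $out = '';
    $len = strlen($src);
    $pos = 0;
    while ($pos < $len) {
        if ($src[$pos] === '/' && $pos + 1 < $len && $src[$pos + 1] === '*') {
            // Jump past the matching "*/", or to the end if it is missing.
            $end = strpos($src, '*/', $pos + 2);
            $pos = ($end === false) ? $len : $end + 2;
            continue;
        }
        // Outside a comment: copy the byte unchanged.
        $out .= $src[$pos];
        $pos++;
    }
    return $out;
}
```

Working byte by byte is fine for UTF-8 input, since '/' and '*' are ASCII and never appear inside multibyte characters.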
It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
Multiline comment sequences (/*) should be ignored within double quotes, unless those double quotes are themselves part of a comment.
Single-line comment sequences (//) can appear inside double-quoted strings (including multiline ones) and inside multiline comments.
Here's one possibility that addresses some of those issues, though it's far from perfect:
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);
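One way to narrow the gap for string literals, still far from a real C parser: let the string alternative skip backslash-escaped characters (so \" does not end the string) and also keep single-quoted character literals. This is a sketch built on the regex above; stripCCommentsKeepStrings() is a hypothetical name, and it still ignores things like line continuations.

```php
<?php
// Sketch: keep "..." and '...' literals (with backslash escapes inside them)
// and remove //-comments and /* block comments */.
function stripCCommentsKeepStrings(string $code): ?string
{
    $pattern = '~("(?:[^"\\\\]|\\\\.)*"|\'(?:[^\'\\\\]|\\\\.)*\')|//[^\n]*|/\*.*?\*/~s';
    // Group 1 holds a string literal when one matched; for comments it is
    // unset, so they are replaced by an empty string.
    return preg_replace($pattern, '$1', $code);
}
```

With the original "[^"]*" alternative, an escaped quote such as "a \" /* b */" would end the string early and the /* b */ inside it would be treated as a comment.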
Try this:
$string = preg_replace("#\/\*\n?(.*?)\*\/\n?#ms", "", $string);
Use # as the regex delimiter, and replace the u modifier with the right ones: m (PCRE_MULTILINE) and s (PCRE_DOTALL).
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Note that the quantifier has to be lazy (.*?): with a greedy .* and "dot matches all", a single match would run from the first /* to the last */ in the string, deleting any code between separate comment blocks. Combining "dot matches all" with a greedy quantifier is generally not a good idea.
I have a PHP script that includes different pages for special referrers:
$ref_found = false;
// get referer if exists
$referer = false;
if ( isset($_SERVER['HTTP_REFERER']) ) {
$referer = $_SERVER['HTTP_REFERER'];
// get content of list.txt
$list = explode(chr(10), file_get_contents('list.txt'));
foreach ( $list as $l ) {
if ( strlen($l) > 0 ) {
if ( strpos( $referer, $l ) ) {
$ref_found = true;
}
}
}
}
// include the correct file
if ( $ref_found ) {
require_once('special_page.html');
} else {
require_once('regular_page.html');
}
The referrer DB is a simple text file (list.txt), and it looks like this:
domain1.com
domain2.com
domain3.com
Unfortunately, this script only works for the last domain in the list (domain3.com).
What should I add? \n?
Or is it a better idea to store the domain list in a different way?
The problem is that when you explode() your list of domain names, you end up with stray whitespace on the items. chr(10) is "\n", and the line breaks in your file are probably "\r\n", so every line except the last keeps a trailing carriage return ("\r"), which would explain why only the last domain matches.
So you're checking against something like "domain1.com\r" rather than "domain1.com". Since this extra whitespace doesn't exist in the referrer header, it doesn't match when you expect it to.
By calling trim() on each value you find, you'll get a clean domain name that you can use to do a more useful comparison:
$list = explode("\n", file_get_contents('list.txt'));
foreach ($list as $l) {
$l = trim($l);
if ((strlen($l) > 0) && (strpos($referer, $l) !== false)) {
$ref_found = true;
break;
}
}
I made a couple of other minor updates to your code as well:
I switched away from chr() and just used a string literal ("\n"). As long as you use double quotes, it's an actual newline character rather than a backslash followed by n, and the string literal is much easier to understand for somebody reading your code.
chr(10) is the same "\n" (line feed) character, while chr(13) is "\r", so this is a readability change only. There are several newline formats, but the most common are "\n" and "\r\n". Exploding on "\n" splits both formats, and the trim() call removes the "\r" left over from "\r\n" line endings.
I combined your two if statements. This is a very minor update that doesn't have much effect except to (in my opinion) make the code easier to read.
I updated your strpos() to do a strict comparison to false (!==). It's probably not an issue with this code, because the referrer value will start with http://, but it's a good habit to get into. If the substring happens to occur at the beginning of the string, strpos() returns 0, which your original code would treat as false.
I added a break statement in your loop if you found a matching domain name. Once you find one and set the flag, there's no reason to continue checking the rest of the domains in the list, and break allows you to cancel the rest of the foreach loop.
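The same check can be wrapped in a small function; this is a sketch, and refererIsListed() is a hypothetical name. preg_split() with the PCRE escape \R splits on "\r\n", "\n" or "\r", so the line-ending style of list.txt no longer matters. Like the original, it is a substring test, so "domain1.com" would also match "notdomain1.com"; comparing against parse_url($referer, PHP_URL_HOST) would be stricter.

```php
<?php
// Hypothetical helper: is the referrer one of the domains in the list text?
function refererIsListed(string $referer, string $listContents): bool
{
    // \R matches "\r\n", "\n" or "\r", so any line-ending style splits cleanly.
    foreach (preg_split('/\R/', $listContents) as $domain) {
        $domain = trim($domain);
        // Skip blank lines: an empty needle would match every referrer in PHP 8.
        if ($domain !== '' && strpos($referer, $domain) !== false) {
            return true;
        }
    }
    return false;
}
```

Usage could be $ref_found = isset($_SERVER['HTTP_REFERER']) && refererIsListed($_SERVER['HTTP_REFERER'], file_get_contents('list.txt'));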
chr(10) == "\n"
chr(13) == "\r"
"\n" is most likely what you want.
Question 1: How can I manually move the fgetc file pointer from its current location to the next line?
I'm reading in data character by character until a specified number of delimiters has been counted. Once the delimiter count reaches that number, it needs to copy the remainder of the line up to the newline (the record delimiter). Then I need to start copying character by character again, starting at the next record.
Question 2: Is manually moving the file pointer to the next line the right idea? I would just explode() on "\n", but I have to count the pipe delimiters first because "\n" isn't always the record delimiter.
Here's my code. It puts all the data into the correct record until it reaches the last delimiter '|' in the record. It then puts the rest of the line into the next record, because I haven't figured out how to make it look for the '\n' after the specified number of | characters have been counted:
$file=fopen("source_data.txt","r") or exit ("File Open Error");
$record_incrementor = 0;
$pipe_counter = 0;
while (!feof($file))
{
$char_buffer = fgetc($file);
$str_buffer[] = $char_buffer;
if($char_buffer == '|')
{
$pipe_counter++;
}
if($pipe_counter == 46) //Maybe Change to 46
{
$database[$record_incrementor] = $str_buffer;
$record_incrementor++;
$str_buffer = NULL;
$pipe_counter = 0;
}
}
Sample Data:
1378|2009-12-13 11:51:45.783000000|"Pro" |"B13F28"||""|1||""|""|""|||False|||""|""|""|""||""||||||2010-12-15 11:51:51.330000000|108||||||""||||||False|""|""|False|""|||False
1379|2009-12-13 12:23:23.327000000|"TLUG"|"TUG"||""|1||""|""|""|||False|||""|""|""|""||""||||||1943-04-19 00:00:00|||||||""||||||False|""|""|False|""|||False
I'd say that doing this with file handling functions is a bit clumsy when it could be done quite easily with a regular expression. Just read the entire file into a string using file_get_contents(); a regular expression like /^(([^|]*\|){47}([^\r\n]*))/m with preg_match_all() will then find all the rows, which you can explode() using | as the delimiter and 48 as the limit on the number of fields.
Here is a working example function. It takes the file name, the field delimiter and the number of fields per row as arguments, and returns a two-dimensional array where the first index is the data row number and the second is the field number.
function loadPipeData ($file, $delim = '|', $fieldCount = 48)
{
$contents = file_get_contents($file);
$d = preg_quote($delim, '/');
preg_match_all("/^(([^$d]*$d){" . ($fieldCount - 1) . '}([^\r\n]*))/m', $contents, $match);
$return = array();
foreach ($match[0] as $line)
{
$return[] = explode($delim, $line, $fieldCount);
}
return $return;
}
var_dump(loadPipeData('source_data.txt'));
(Note: this is a solution to the original problem)
You can read to the end of the line like this:
while (!feof($file) && fgetc($file) !== "\n"); // double quotes: '\n' would be a literal backslash and n
As for whether or not fgetc is the right way to do this... your format makes it difficult to use anything else. You can't split on \n, because there may be newlines within a field, and you can't split on |, because the end of the record doesn't have a pipe.
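The only other option I can think of is to use preg_match_all():
$buffer = file_get_contents('test.txt');
preg_match_all('/((?:[^|]*\|){45}[^\n]*\n)/', $buffer, $matches);
foreach ($matches[0] as $row) {
    // Drop the record's trailing line break, then split it into fields.
    $fields = explode('|', rtrim($row, "\r\n"));
    // ... process $fields for this record here ...
}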
Answer to the modified question:
To read from the file pointer to the end of the line, you can simply use the file reading function fgets(). It returns everything from the current file pointer position until it reaches the end of the line (and also returns the end of the line character(s)). After the function call, the file reading pointer has been moved to the beginning of the next line.
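Putting the two pieces together, a sketch of the fgetc()/fgets() approach: count pipes character by character, then let fgets() take the rest of the line and move the pointer to the next record. readPipeRecords() is a hypothetical name; pass the delimiter count your format actually uses (the question's code counts 46 pipes per record).

```php
<?php
// Sketch: read records made of a fixed number of '|' delimiters followed by
// the rest of the line. Fields before the last delimiter may contain newlines.
function readPipeRecords($handle, int $pipesPerRecord): array
{
    $records = [];
    while (true) {
        $record = '';
        $pipes = 0;
        // Character by character until the required number of pipes is seen.
        while ($pipes < $pipesPerRecord && ($char = fgetc($handle)) !== false) {
            $record .= $char;
            if ($char === '|') {
                $pipes++;
            }
        }
        if ($record === '') {
            break; // end of file
        }
        // fgets() returns the rest of the current line (newline included)
        // and leaves the pointer at the start of the next line.
        $rest = fgets($handle);
        if ($rest !== false) {
            $record .= $rest;
        }
        $record = rtrim($record, "\r\n");
        if ($record !== '') {
            $records[] = $record;
        }
    }
    return $records;
}
```

For example, $database = readPipeRecords(fopen("source_data.txt", "r"), 46); would return one string per record, which can then be split with explode('|', ...).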