Strings, regexp and files - php

<?php
$iprange = array(
    "/^12\.34\./", // preg_match() patterns need delimiters, here "/"
    "/^12\.35\./",
);
foreach ($iprange as $var) {
    if (preg_match($var, $_SERVER['REMOTE_ADDR'])) {
        // ... act on the match ...
    }
}
I'd like to keep the values of this array in a list file, let's call it iprange.txt, from which I would build the $iprange variable. I would also be updating the file with new ranges, and I'd want those strings converted to regular expressions if PHP needs that, as it does in the example above.
I'd appreciate help with the following two issues:
1. I understand that I would somehow need to include the file into an array, but I'm not sure how to implement it.
2. I would like to run a cron job that updates the text file and turns its entries into regular expressions usable in the example above, if you think regular expressions are a good idea and there isn't another option. I know how to set up a cron job in the DirectAdmin GUI, but I don't know what the file run by cron would look like.
EDIT
Thanks Mamsaac, very helpful. Right now I'm stuck on further issues that have arisen with cases and ob_file_callback, and if I start talking about them here I won't get anywhere, but they can be followed here: Problems with ob_file_callback and fwrite
To keep this thread on topic, I wanted to ask you: how would you go about loading a whole file into the array you suggested?
I no longer need the cron job if I don't have to convert the strings to regular expressions.

I will propose a different approach to the problem, if that's OK with you.
You could try using the ip2long() function for this, which turns each address into an integer and makes comparisons much faster. The advantage of doing it this way is that you can be very specific with each range, and in a natural way, where a range means "between two numbers".
So, you can do it something like this:
$ranges = array('10.20.8.0-10.20.14.254', '192.168.0.2-192.168.0.254');
// convert the visitor's address once, before the loop
$remoteip = ip2long($_SERVER['REMOTE_ADDR']);
foreach ($ranges as $iprange) {
    list($lowerip, $highip) = explode('-', $iprange);
    if (ip2long($lowerip) <= $remoteip && $remoteip <= ip2long($highip)) {
        //it is within this range! I don't know what you want to do with it.
    }
}
You could also use netmasks, but I will leave that as an exercise for you. To do it, you will play a bit with bitwise operations: negate the mask, then use a bitwise AND operation... Not what you requested! I might update this after I get some sleep.
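For the netmask (CIDR) variant left as an exercise above, a minimal sketch might look like the following. It assumes IPv4 only, blocks written as address/prefix (for example 12.34.0.0/16, which covers the same addresses as the ^12\.34\. pattern in the question), and a 64-bit PHP build so that ip2long() results are non-negative; the name ipInCidr() is hypothetical.

```php
<?php
// Hypothetical helper: is $ip inside the IPv4 block $cidr (e.g. "12.34.0.0/16")?
// Assumes a 64-bit PHP build.
function ipInCidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    // expect exactly "address/prefix" with a numeric prefix
    if (count($parts) !== 2 || !preg_match('/\A\d{1,2}\z/', $parts[1])) {
        return false;
    }
    $bits = (int) $parts[1];
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($parts[0]);
    if ($ipLong === false || $subnetLong === false || $bits > 32) {
        return false; // malformed input never matches
    }
    // Netmask with $bits leading one-bits: shift all-ones left, keep the low 32 bits
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    // Same network if both addresses agree on the masked (network) bits
    return ($ipLong & $mask) === ($subnetLong & $mask);
}
```

Usage would be something like ipInCidr($_SERVER['REMOTE_ADDR'], '12.34.0.0/16'), looping over a list of blocks the same way as the range list above.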
About the file and cron job: I am not at all sure why you want a cron job for this. How are you deciding which new ranges you will be accepting?
You can always read the file (with file_get_contents, if you so desire) and split the string using
$ranges = explode("\n", trim(file_get_contents("filename")));
and then you would have your array ready (notice I even gave it the same name as in the code block above).
If the file ever gets REALLY big, avoid the approach above and go with fopen and fgets instead:
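$file = @fopen("filename", "r"); // @ suppresses error messages; you probably don't want that
if (!$file) {
    // for some reason the file didn't open. Do error reporting or checking, then stop
    exit('Could not open the range file');
}
$ranges = array();
while (($line = fgets($file)) !== false) {
    $line = trim($line); // fgets() keeps the newline, which would make ip2long() fail
    if ($line !== '') {
        $ranges[] = $line;
    }
}
fclose($file);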
The only thing I'm still missing is why you want to use a cron job. Please elaborate on your criteria for deciding to add new IP ranges.
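A variant not taken from the answer above: PHP's built-in file() can do the splitting itself. FILE_IGNORE_NEW_LINES drops each line's trailing newline, FILE_SKIP_EMPTY_LINES skips blank lines, and an extra trim() takes care of Windows line endings. The one-range-per-line format (low-high) and the name loadRanges() are assumptions; like file_get_contents(), this reads the whole file into memory, so the fgets() loop remains the better fit for a REALLY big file.

```php
<?php
// Hypothetical helper: read one "low-high" range per line (e.g. from iprange.txt).
function loadRanges(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array(); // file missing or unreadable
    }
    // trim() removes stray "\r" and spaces, which would make ip2long() return false
    $lines = array_map('trim', $lines);
    // drop lines that became empty after trimming, then re-index
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}
```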

$ranges = array('12.34.1.0-12.34.14.254', '192.168.0.2-192.168.0.254');
foreach ($ranges as $iprange) {
list($lowerip, $highip) = explode('-', $iprange);
$remoteip = ip2long($_SERVER['REMOTE_ADDR']);
if (ip2long($lowerip) <= $remoteip && $remoteip <= ip2long($highip)) {
}
}
header("LOCATION: page1.php");
}
else
{
header("LOCATION: page2.php");
}
?>
I've isolated the if/else and that works fine. I've also tried placing the two closing '}' at the end of your script, both before and after my if/else, but no luck.
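A likely cause is the brace structure: in the snippet above the if block inside the loop closes right away, so the header() call and the else end up after the foreach, where the extra } has nothing to close and else has no matching if, which PHP rejects as a syntax error. One way to restructure it, assuming the goal is to send visitors from any listed range to page1.php and everyone else to page2.php, is to decide first and redirect once after the loop. This is a sketch; the helper name ipInRanges() is hypothetical and the ranges are assumed to be well-formed low-high strings.

```php
<?php
// Hypothetical helper: true if $ip falls inside any "low-high" range in $ranges.
function ipInRanges(string $ip, array $ranges): bool
{
    $remoteip = ip2long($ip);
    if ($remoteip === false) {
        return false; // not a valid IPv4 address
    }
    foreach ($ranges as $iprange) {
        list($lowerip, $highip) = explode('-', $iprange);
        if (ip2long($lowerip) <= $remoteip && $remoteip <= ip2long($highip)) {
            return true; // found a match, no need to check the rest
        }
    }
    return false;
}

// Only redirect when serving a web request.
if (isset($_SERVER['REMOTE_ADDR'])) {
    $ranges = array('12.34.1.0-12.34.14.254', '192.168.0.2-192.168.0.254');
    if (ipInRanges($_SERVER['REMOTE_ADDR'], $ranges)) {
        header("Location: page1.php");
    } else {
        header("Location: page2.php");
    }
    exit; // stop the script after sending the redirect
}
```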

Related

How can I quickly delete a value of less than two characters from a large array?

I want to delete values of fewer than two characters from my large array, which has 9,436,065 string values. I did it with preg_grep() using this code:
function delLess($array, $less)
{
return preg_grep('~\A[^qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM]{'.$less.',}\z~u', $array);
}
$words = array("ӯ","ӯро","ӯт","ғариб","афтода","даст", "ра");
echo "<pre>";
print_r(delLess($words,2));
echo "</pre>";
But it is slow. Is it possible to optimize this code?
I would go for the array_filter() function; performance should be better.
function filter($var)
{
    // keep strings with at least 2 characters; mb_strlen() counts characters, not bytes
    return mb_strlen($var, 'UTF-8') >= 2;
}
$newArray = array_filter($words, "filter");
Given the size of the dataset, I'd use a database, so it would probably look like this:
delete from mytable where length(myfield) < 2
Maybe something like SQLite?
You could try using the strlen function instead of regular expressions and see if that is faster. (Or mb_strlen for multibyte characters.)
$newArr = array();
foreach ($words as $val) {
    // mb_strlen() counts characters; strlen() would count UTF-8 bytes
    if (mb_strlen($val, 'UTF-8') >= 2) {
        $newArr[] = $val;
    }
}
echo "<pre>";
print_r($newArr);
echo "</pre>";
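Why the multibyte point matters here: the words in the question are Cyrillic, and in UTF-8 each of those letters takes two bytes, so strlen() counts bytes rather than characters. A small sketch, assuming UTF-8 input and the mbstring extension (the function name keepAtLeast() is hypothetical), that keeps only strings with at least $min characters:

```php
<?php
// strlen() counts bytes; mb_strlen() with an explicit encoding counts characters.
// "ӯ" is one character but two bytes in UTF-8.
var_dump(strlen("ӯ"), mb_strlen("ӯ", 'UTF-8')); // int(2) int(1)

// Hypothetical helper: keep strings that have at least $min characters.
function keepAtLeast(array $words, int $min): array
{
    $kept = array_filter($words, function ($word) use ($min) {
        return mb_strlen($word, 'UTF-8') >= $min;
    });
    return array_values($kept); // re-index 0..n-1
}

$words = array("ӯ", "ӯро", "ӯт", "ғариб", "афтода", "даст", "ра");
print_r(keepAtLeast($words, 2));
```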
Any work on 10 million strings will take time. In my opinion, this kind of operation is a one-off, so it does not really matter if it is not instantaneous.
Where are the strings coming from? You probably got them from a database; if so, do the work in the database, and at least you will never be polluted with them again. This kind of operation will be faster in a database than in PHP, but could still take time.
Again, if the data is stored in a database, it did not get there by magic... So you could also make sure that no new unwanted entries get in, so that this operation never needs to be redone.
I am aware that this does not really answer your question, because we should stick to PHP and you already have the best way to do it... Optimizing such a simple function would cost a lot of time and bring little, if any, gain. The only other suggestion I could make is to use another tool: if not a database, then a file-based one like sed, awk or anything else that reads and writes files. You'd have one string per line and process the file, reducing its size accordingly. But writing the file from PHP, executing the script and loading the file back into PHP would make things too complicated for nothing...

PHP Short-Circuit Evaluation (Good/Bad?)

This is a general question of sorts, but to explain it I will use a specific example.
I have a function that loads a document. If that document does not exist, it creates it; if it does exist, it decodes its JSON contents into an array. I always want this function to return an array of some sort, whether there is an issue with json_decode() or the file does not exist. Currently I am doing it like so...
function load($file) {
if( ! file_exists($file)) {
$handle = fopen($file, 'w');
fclose($handle);
}
$raw = file_get_contents($file);
$contents = json_decode($raw, TRUE);
return( ! $contents ? array() : $contents);
//can't use ternary shorthand "?:" in PHP 5.2, otherwise this would be shorter
}
Now, there is nothing wrong with the above code (at least I don't think there is, and it works fine). However, I'm always looking for ways to improve my code and condense it while keeping it perfectly legible, and that return statement has always bothered me because of how inefficient it seems. So today I got to thinking and something occurred to me. I remember seeing MySQL tutorials that do something to the effect of connect() or die(); so I thought, why not json_decode() or array();? Would this even work? So I rewrote my function to find out...
function load($file) {
if( ! file_exists($file)) {
$handle = fopen($file, 'w');
fclose($handle);
}
$raw = file_get_contents($file);
return json_decode($raw, TRUE) or array();
}
It seems to, and it even reads pleasantly enough. So on to my next bout of questions. Is this good practice? I understand it, but would anyone else? Does it really work or is this some bug with a happy ending? I got to looking around and found out that what I'm asking about is called short-circuit evaluation and not a bug. That was good to know. I used that new term to refine my search and came up with some more material.
Blog Entry
Wikipedia
There wasn't much, and almost everything I found that talked about using short-circuiting in the way I'm asking about referred to MySQL connections. Now, I know most people are against the or die() idiom, but only because it is an inelegant way to deal with errors. That isn't a problem for the method I'm asking about, because I'm not seeking to use or die(). Is there any other reason not to use this? Wikipedia seems to think so, but only in reference to C. I know PHP is written in C, so that is definitely pertinent information. But has this issue been weeded out in the PHP compilation? If not, is it as bad as Wikipedia makes it out to be?
Here's the snippet from Wikipedia.
Wikipedia - "Short-circuiting can lead to errors in branch prediction on modern processors, and dramatically reduce performance (a notable example is highly optimized ray with axis aligned box intersection code in ray tracing)[clarification needed]. Some compilers can detect such cases and emit faster code, but it is not always possible due to possible violations of the C standard. Highly optimized code should use other ways for doing this (like manual usage of assembly code)"
What do you all think?
EDIT
I've polled another forum and gotten some good results there. General consensus appears to be that this form of variable assignment, while valid, is not preferred, and may even be considered bad form in the real world. I'll continue to keep an ear to the ground and will update this if anything new comes around. Thank you Corbin and Matt for your input, especially Corbin for clearing up a few things. Here's a link to the forum post should you be interested.
There are a few different questions here, so I'll try to address them all.
Missed branch predictions: unless you're coding in C or assembly, don't worry about this. In PHP, you're so far from the hardware that thinking about branch prediction isn't going to help you. Either way, this would be a micro-optimization, especially in a function that does extensive string parsing to begin with.
Is there any other reason not to use this? Wikipedia seems to think so, but only in reference to C. I know PHP is written in C, so that is definitely pertinent information.
PHP likely parses it to a different execution structure. Unless you're planning on running this function millions of times, or you know it's a bottleneck, I wouldn't worry about it. In 2012, I find it very unlikely that using an or to short circuit would cause even a billionth of a second difference.
As for the formatting, I find $a or $b rather ugly. My mind doesn't comprehend the short-circuiting the same way it does in an if clause.
if (a() || b())
Is perfectly clear to my mind that b() will execute only if a() does not evaluate to true.
However:
return a() or b();
Doesn't have the same clarity to me.
That's obviously just an opinion, but I'll offer two alternatives as to how I might write it (which are, in my opinion, a very tiny bit clearer):
function load($file) {
if (!file_exists($file)) {
touch($file);
return array();
}
$raw = file_get_contents($file);
$contents = json_decode($raw, true);
if (is_array($contents)) {
return $contents;
} else {
return array();
}
}
If you don't care whether the file actually gets created, you could take it a step further:
function load($file) {
$raw = file_get_contents($file);
if ($raw !== false) {
$contents = json_decode($raw, true);
if ($contents !== null) {
return $contents;
}
}
return array();
}
I guess really these code snippets come down to personal preference. The second snippet is likely the one I would go with. The critical paths could be a bit clearer in it, but I feel like it maintains brevity without sacrificing comprehensibility.
Edit: If you're a 1-return-per-function type person, the following might be a bit more preferable:
function load($file) {
$contents = array();
$raw = file_get_contents($file);
if ($raw !== false) {
$contents = json_decode($raw, true);
if ($contents === null) {
$contents = array();
}
}
return $contents;
}
Condensing your code into the fewest lines possible isn't always the best approach: compacted code often looks pretty cool, but it is usually hard to read. If you have any doubts about your code's readability, I'd suggest adding some standard comments so that anyone can understand the code from your comments alone.
In terms of best practice, that's a matter of opinion, and if you are happy with it, go with it; you can always revisit the code later in the project's life if need be.
I do like short-circuit statements, as they are a way to do one-line variable checks.
I prefer:
isset($value) or $value = 0;
Rather than:
if (!isset($value)) {
$value = 0;
}
But I haven't used it directly in returns, and this post made me want to try.
And sadly, it does not work properly, at least for me:
return $data[$key] or $data[1];
Will return the value 1 in all cases while I'm expecting an array.
The following works smoothly:
// Make sure $key is valid.
$data[$key] or $key = 1;
return $data[$key];
But I'm surprised PHP is not throwing any error when $key doesn't exist in $data.
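A likely explanation for both results above: unlike JavaScript's || or Python's or, PHP's or (like ||) always evaluates to a boolean, not to one of its operands. So return json_decode($raw, TRUE) or array(); returns true or false rather than the decoded array, and true prints as 1. A small demonstration with hypothetical sample data; for "value or default" behaviour, the ?: operator (PHP 5.3+) or ?? (PHP 7+) returns the operand itself.

```php
<?php
function withOr(array $data, $key)
{
    // `or` produces a boolean, so this returns true/false, never the array
    return $data[$key] or $data[1];
}

function withElvis(array $data, $key)
{
    // `?:` returns the left operand itself when it is truthy, else the right one
    return $data[$key] ?: $data[1];
}

function withCoalesce(array $data, $key)
{
    // `??` also avoids the "undefined index" notice when $key is missing
    return $data[$key] ?? $data[1];
}

$data = array(1 => array('default'), 2 => array('chosen'));
var_dump(withOr($data, 2));       // bool(true)
var_dump(withElvis($data, 2));    // the 'chosen' array
var_dump(withCoalesce($data, 3)); // the 'default' array
```

The same precedence rules explain why isset($value) or $value = 0; does work: = binds tighter than or, so the assignment is simply the right-hand operand that runs when isset() is false.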

Undefine PHP defines?

I'm working on converting an old define()-based language/translation system to a more flexible one (probably JSON-based, but it's still open).
As part of this conversion, I will need to convert from 42 .php files with several thousand strings each to whatever format I'll be using. Some of the defined strings reference other defines or use PHP code. I don't need to keep this dynamic behaviour (it's never really dynamic anyway), but I will need to have the "current" values at time of conversion.
One define might look like this:
define('LNG_Some_string', 'Page $s of $s of our fine '.LNG_Product_name);
Since all defines have an easily recognizable 'LNG_' prefix, converting a single file is trivial. But I'd like to make a small script which handles all 42 in one run.
Ideally I'd be able to either undefine or redefine the define()'s, but I can't find a simple way of doing that. Is this at all possible?
Alternatively, what would be a good way of handling this conversion? The script will be one-off, so it doesn't need to be maintainable or fast. I just want it fully automated to avoid human error.
If speed is not important, you can use the get_defined_constants() function:
$constants = get_defined_constants(true);
$myconst = isset($constants['user']) ? $constants['user'] : array();
$myconst will contain all constants defined by your script :-)
P.S.: I'm not a good PHP coder, it was just a suggestion :-)
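Since constants cannot be undefined, one way to convert all 42 files in one automated run (a sketch, not taken from the answers here) is to evaluate each file in a fresh PHP process, so every file starts with an empty constant table and files that define the same LNG_ names do not collide. It assumes the PHP CLI, a Unix-like shell and that shell_exec() is not disabled; convertLanguageFile(), the /path/ directory and the JSON output are hypothetical choices. If a file relies on constants defined in another file, that other file would have to be included first in the child's code.

```php
<?php
// Hypothetical helper: load one language file in a separate PHP process and
// return its LNG_* constants (with their evaluated values) as an array.
function convertLanguageFile(string $file): array
{
    // Code run by the child process: include the file, dump LNG_* constants as JSON.
    $code = 'include ' . var_export($file, true) . ';'
        . '$user = get_defined_constants(true)["user"] ?? array();'
        . '$out = array();'
        . 'foreach ($user as $name => $value) {'
        . '  if (strpos($name, "LNG_") === 0) { $out[$name] = $value; }'
        . '}'
        . 'echo json_encode($out);';
    // PHP_BINARY is the path of the running PHP executable (CLI assumed).
    $json = shell_exec(escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code));
    $result = json_decode((string) $json, true);
    return is_array($result) ? $result : array();
}

// Example driver: write one JSON file next to each language file.
foreach (glob('/path/*.php') ?: array() as $file) {
    $json = json_encode(convertLanguageFile($file), JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    file_put_contents(dirname($file) . '/' . basename($file, '.php') . '.json', $json);
}
```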
You can't undefine constants, but you can generate your new scripts by utilising them and the constant() function:
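<?php
/* presuming all the .php files are in the same directory */
foreach (glob('/path/*.php') as $file) {
    $contents = file_get_contents($file);
    $matches = array();
    // preg_match_all() collects every LNG_ name, not just the first one
    if (!preg_match_all('/define\(\s*\'(LNG_\w+)\'/', $contents, $matches)) {
        echo "No defines found in $file\n";
        continue;
    }
    include_once $file;
    $newContents = array();
    foreach ($matches[1] as $name) {
        // constant() returns the value as evaluated when the file was included;
        // a name already defined by an earlier file keeps that earlier value
        $newContents[$name] = constant($name);
    }
    // JSON is one possible target format, e.g. /path/english.php -> /path/new_english.json
    file_put_contents(dirname($file) . '/new_' . basename($file, '.php') . '.json', json_encode($newContents));
}
?>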
Defined constants can't be undefined. They're immutable.
Perhaps what you can do is get in right before they're defined and modify them in certain circumstances.

Handle $_GET safely PHP

I have a code like this:
$myvar=$_GET['var'];
// a bunch of code without any connection to DB where $myvar is used like this:
$local_directory=dirname(__FILE__).'/images/'.$myvar;
if ($myvar && $handle = opendir($local_directory)) {
$i=0;
while (false !== ($entry = readdir($handle))) {
if(strstr($entry, 'sample_'.$language.'-'.$type)) {
$result[$i]=$entry;
$i++;
}
}
closedir($handle);
} else {
echo 'error';
}
I'm a little confused by the number of stripping and escaping functions, so the question is: what do I need to do with $myvar for this code to be safe? In my case I don't make any database connections.
You are trying to prevent directory traversal attacks, so you don't want the person putting in ./../../../ or something, hoping to read out files or filenames, depending on what you are doing.
I often use something like this:
$myvar = preg_replace("/[^a-zA-Z0-9-]/","",$_GET['var']);
This replaces anything that isn't in a-zA-Z0-9- with an empty string, so if the variable contains, say, *, this code deletes it.
I then change the a-zA-Z0-9- to match which characters I want to be allowed in the string. I can then lock it down to only containing numbers or whatever I need.
It's really, really dangerous to do something like: opendir($local_directory) where $local_directory is a value which could come from the outside.
What if someone passes in something like ../../../../../../../../../etc ...or something like that? You risk compromising the security of your host.
You can take a glance here, to start:
http://php.net/manual/en/book.filter.php
IMHO, if you don't create anything on the fly, you should have something like:
$allowed_dirs = array('dir1','dir2', 'dir3');
if (!in_array($myvar, $allowed_dirs)) {
// throw an error and log what has happened
}
You can do this right after you receive your input from "outside". If that's impractical because the number of image dirs can vary over time and you're afraid of getting out of sync with your codebase, you could instead populate the array of valid values by first scanning the subdirectories of the images folder.
So, at the end, you could have something like:
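$images_dir = dirname(__FILE__) . '/images';
$allowed_dirs = array();
if ($handle = opendir($images_dir)) {
    while (false !== ($entry = readdir($handle))) {
        // skip "." and ".." (current and parent directory) and anything that isn't a directory
        if ($entry !== '.' && $entry !== '..' && is_dir($images_dir . '/' . $entry)) {
            $allowed_dirs[] = $entry;
        }
    }
    closedir($handle);
}
$myvar = isset($_GET['var']) ? $_GET['var'] : '';
// you can deny access to dirs you want to protect like this
// ($allowed_dirs is a plain list, so remove by value; unset() by key would do nothing)
$allowed_dirs = array_diff($allowed_dirs, array('private_stuff'));
// rest of code
$local_directory = $images_dir . '/' . $myvar;
if (in_array($myvar, $allowed_dirs, true) && $handle = opendir($local_directory)) {
    $result = array();
    while (false !== ($entry = readdir($handle))) {
        if (strstr($entry, 'sample_'.$language.'-'.$type)) {
            $result[] = $entry;
        }
    }
    closedir($handle);
} else {
    echo 'error';
}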
The code above is NOT optimized, but let's avoid premature optimization in this case (stating this to avoid another "nice" downvote); the snippet is just meant to give you the idea of explicitly allowing values, vs. the alternative approach of allowing everything unless it matches a certain pattern. I think the former is more secure.
Let me just note for completeness that, if you can be sure your code will only be run on Unixish systems (such as Linux), the only things you need to ensure are that:
$myvar does not contain any slash ("/", U+002F) or null ("\0", U+0000) characters, and that
$myvar is not empty and is not equal to "." or ".." (or, equivalently, that ".$myvar" is not equal to ".", ".." or "...").
That's because, on a Unix filesystem, the only directory separator character (and one of the two characters not allowed in filenames, the other being the null character "\0") is the slash, and the only special directory entries pointing upwards in the directory tree are "." and "..".
However, if your code might someday be run on Windows, then you'll need to disallow more characters (at least the backslash, "\\", and probably others too). I'm not familiar enough with Windows filesystem conventions to say exactly which characters you'd need to disallow there, but the safe approach is to do as Rich Bradshaw suggests and only allow characters that you know are safe.
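The Unix rules described above (no "/" or "\0", and not one of the special names "", "." or ".."), plus the Windows backslash, can be written as a small check. This is a sketch under those stated assumptions only; the function name isSafeDirName() is hypothetical, and an allow-list of known-safe characters remains the more conservative option.

```php
<?php
// Hypothetical helper applying the rules above to a single directory name.
function isSafeDirName(string $name): bool
{
    // Reject "/" (Unix separator), "\\" (Windows separator) and "\0" (null byte).
    if (strpos($name, '/') !== false
        || strpos($name, '\\') !== false
        || strpos($name, "\0") !== false) {
        return false;
    }
    // Reject the empty name and the special entries "." and "..".
    return !in_array($name, array('', '.', '..'), true);
}
```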
As with any data that comes from an untrusted source: validate it before use, and encode it properly when passing it to another context.
As for the former, you first need to specify what properties the data must have to be considered valid. This primarily depends on the purpose of its use.
In your case, the value of $myvar should probably be at least a valid directory name but it could also be a valid relative path composed of directory names, depending on your requirements. At this point, you are supposed to specify these requirements.
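For example, if the requirement is "a relative path made of one or more directory names, each using only ASCII letters, digits, underscores and hyphens", the check can be written as a whitelist. The allowed character set and the function name isValidImagePath() are assumptions to adapt to your own specification.

```php
<?php
// Hypothetical validator: accept "name" or "name/name/..." where every segment
// consists only of ASCII letters, digits, "_" and "-".
function isValidImagePath(string $path): bool
{
    // \A and \z anchor the whole string (unlike $, \z does not allow a trailing newline)
    return preg_match('#\A[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*\z#', $path) === 1;
}
```

Since dots are not allowed, no segment can be "." or "..", so the path cannot climb out of the images folder. Call it on $_GET['var'] before building the path, and reject the request when it returns false rather than trying to repair the input.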

How can I safely use eval in php?

I know some people may just respond "never" as long as there's user input. But suppose I have something like this:
$version = $_REQUEST['version'];
$test = 'return $version > 3;';
$success = eval($test);
This is obviously a simplified case, but is there anything that a user can input as version to get this to do something malicious? If I restrict the type of strings that $test can take on to comparing the value of certain variables to other variables, is there any way anybody can see to exploit that?
Edit
I've tried running the following script on the server and nothing happens:
<?php
$version = "exec('mkdir test') + 4";
$teststr = '$version > 3;';
$result = eval('return ' . $teststr);
var_dump($result);
?>
All I get is bool(false). No new directory is created. If I have a line that actually calls exec('mkdir test') before that, it does create the directory. It seems to be working correctly, in that it's just comparing a string converted to a number with another number and finding that the result is false.
Ohhhh boy!
$version = "exec('rm -rf /...') + 4"; // Return 4 so the return value is "true"
// after all, we're gentlemen!
$test = "return $version > 3;";
eval($test);
:)
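The difference between this snippet and the test in the question is the quoting: the question's '$version > 3;' is single-quoted, so eval() sees the variable name and compares its value as data, while the double-quoted "return $version > 3;" pastes the attacker's text into the code that eval() would run. A sketch that only builds the two strings, without calling eval(), shows what each version would execute:

```php
<?php
$version = "exec('mkdir test') + 4"; // attacker-controlled input, as in the question

// Single quotes: no interpolation, so the code refers to the variable itself.
$safeCode = 'return $version > 3;';
// Double quotes: the input becomes part of the code itself.
$unsafeCode = "return $version > 3;";

echo $safeCode, "\n";   // prints: return $version > 3;
echo $unsafeCode, "\n"; // prints: return exec('mkdir test') + 4 > 3;
```

Keeping input out of the code string is what protects the single-quoted version; as soon as any part of the eval()ed string is built from input, the attacker's text runs as code.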
You would have to do at least a filter_var() or is_numeric() on the input value in this case.
By the way, assigning eval()'s result to $success only works because the eval()ed string contains a return statement; without one, eval() returns NULL.
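Following the filter_var() suggestion, and assuming the version is meant to be a whole number, the comparison can be done on a validated integer without eval() at all. checkVersion() is a hypothetical name, and returning null for invalid input is just one possible design choice.

```php
<?php
// Hypothetical helper: validate the raw input as an integer, then compare directly.
function checkVersion($raw): ?bool
{
    $version = filter_var($raw, FILTER_VALIDATE_INT);
    if ($version === false) {
        return null; // not an integer: reject the request instead of evaluating it
    }
    return $version > 3; // plain PHP comparison, no eval() involved
}

$raw = isset($_REQUEST['version']) ? $_REQUEST['version'] : '';
var_dump(checkVersion($raw));
```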
If you do this, only accept ints.
If you must accept strings, don't.
If you still think you must: don't!
And lastly, if you still, after that, think you need strings: JUST DON'T!
Yes, anything. I would use $version = (int)$_REQUEST['version']; to validate the data.
You need to be more precise with your definitions of "malicious" or "safe". Consider for example
exec("rm -rf /");
echo "enlarge your rolex!";
while(true) echo "*";
All three snippets are "malicious" from a common-sense point of view, but technically they are totally different. Protection techniques that may apply to #1 won't work with the other two, and vice versa.
The way to make this safe would be to ensure that $version is a number BEFORE you try to eval.
Use this code to remove everything except numbers (0-9): preg_replace('/[^0-9]+/', '', $version);
