Match Array Values to URL String - php

I have an array:
$blacklist = array("asdf.com", "fun.com", "url.com");
I have an input string:
$input = "http://asdf.com/asdf/1234/";
I am trying to see if string $input matches any values in $blacklist.
How do I accomplish this?

Sounds like a decent use for parse_url():
<?php
$blacklist = array("asdf.com", "fun.com", "url.com");
$input = "http://asdf.com/asdf/1234/";
$url = parse_url($input);
echo (in_array($url['host'], $blacklist) ? '(FAIL)' : '(PASS)') . $url ['host'];
?>
Output:
(FAIL)asdf.com

Using foreach is probably the best solution for what you're trying to achieve.
$blacklist = array("/asdf\.com/", "/fun\.com/", "/url\.com/");
foreach($blacklist as $bl) {
if (preg_match($bl, $input)){return true;}
}

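One way could be the following (I didn't measure performance; note that preg_replace() expects every blacklist entry to be a delimited pattern such as "/asdf\.com/"):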
$san = preg_replace($blacklist, '', $input);
if($san !== $input) {
//contained something from the blacklist
}
If the input does not contain any string from the blacklist, the string will be returned unchanged.
Another, probably better suited and definitely more efficient, approach is to extract the host part from the input and build the blacklist as an associative array:
$blacklist = array(
"asdf.com" => true,
"fun.com" => true,
"url.com" => true
);
Then testing would be O(1) with:
if (isset($blacklist[$host])) { // isset() avoids an "undefined index" notice for hosts that are not listed
//contained something from the blacklist
}
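To make the lookup idea above concrete, here is a minimal sketch that combines parse_url() with an isset() check. The function name isBlacklistedUrl and the variable $blacklistLookup are made up for illustration, and lowercasing the host is an assumption about how the blacklist is written.

```php
<?php
// Hypothetical helper (not from the answers above): true when the URL's host is blacklisted.
function isBlacklistedUrl($url, array $blacklistLookup)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    // Assumption: blacklist entries are stored in lowercase.
    return isset($blacklistLookup[strtolower($host)]);
}

// array_flip() turns the plain list into host => index, so isset() can do a hash lookup.
$blacklistLookup = array_flip(array("asdf.com", "fun.com", "url.com"));

var_dump(isBlacklistedUrl("http://asdf.com/asdf/1234/", $blacklistLookup)); // bool(true)
var_dump(isBlacklistedUrl("http://example.org/", $blacklistLookup));        // bool(false)
```

This only matches the host exactly, so "www.asdf.com" would pass unless it is listed as well.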

in_array() is of no use on the full URL, as it only matches the exact string.
You have to loop through the array and search for each entry:
foreach ($blacklist as $str)
{
if (stristr($input, $str) !== false)
{
//found
}
}

This code should work:
$blacklist = array("asdf.com", "fun.com", "url.com");
$input = "http://asdf.com/asdf/1234/";
if (in_array(parse_url($input,PHP_URL_HOST),$blacklist))
{
// The website is in the blacklist.
}

Related

wildcard array comparison - improving efficiency

I have two arrays that I'm comparing and I'd like to know if there is a more efficient way to do it.
The first array is user submitted values, the second array is allowed values some of which may contain a wildcard in the place of numbers e.g.
// user submitted values
$values = array('fruit' => array(
'apple8756apple333',
'banana234banana',
'apple4apple333',
'kiwi435kiwi'
));
//allowed values
$match = array('allowed' => array(
'apple*apple333',
'banana234banana',
'kiwi*kiwi'
));
I need to know whether or not all of the values in the first array, match a value in the second array.
This is what I'm using:
// the number of values to validate
$valueCount = count($values['fruit']);
// the number of allowed to compare against
$matchCount = count($match['allowed']);
// the number of values passed validation
$passed = 0;
// update allowed wildcards to regular expression for preg_match
foreach($match['allowed'] as &$allowed)
{
$allowed = str_replace(array('*'), array('([0-9]+)'), $allowed);
}
// for each value match against allowed values
foreach($values['fruit'] as $fruit)
{
$i = 0;
$status = false;
while($i < $matchCount && $status == false)
{
$result = preg_match('/' . $match['allowed'][$i] . '/', $fruit);
if ($result)
{
$status = true;
$passed++;
}
$i++;
}
}
// check all passed validation
if($passed === $valueCount)
{
echo 'hurray!';
}
else
{
echo 'fail';
}
I feel like I might be missing out on a PHP function that would do a better job than a while loop within a foreach loop. Or am I wrong?
Update: Sorry I forgot to mention, numbers may occur more than 1 place within the values, but there will only ever be 1 wildcard. I've updated the arrays to represent this.
If you don't want to have a loop inside another, it would be better if you grouped your $match regex.
You could get the whole functionality with a lot less code, which might arguably be more efficient than your current solution:
// user submitted values
$values = array(
'fruit' => array(
'apple8756apple',
'banana234banana',
'apple4apple',
'kiwi51kiwi'
)
);
$match = array(
'allowed' => array(
'apple*apple',
'banana234banana',
'kiwi*kiwi'
)
);
$allowed = '('.implode(')|(',$match['allowed']).')';
$allowed = str_replace(array('*'), array('[0-9]+'), $allowed);
foreach($values['fruit'] as $fruit){
if(preg_match('#'.$allowed.'#',$fruit))
$matched[] = $fruit;
}
print_r($matched);
See here: http://codepad.viper-7.com/8fpThQ
Try replacing /\d+/ in the first array with '*', then do array_diff() between the 2 arrays
Edit: after clarification, here's a more refined approach:
<?php
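$allowed = array_map(function ($pattern) {
return '/' . str_replace('*', '\d+', $pattern) . '/'; // preg_replace() needs delimited patterns
}, $match['allowed']);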
$passed = 0;
foreach ($values['fruit'] as $fruit) {
$count = 0;
preg_replace($allowed, "", $fruit, -1, $count); //preg_replace accepts an array as 1st argument and stores the replaces done on $count;
if ($count) $passed++;
}
if ($passed == sizeof($values['fruit'])) {
echo 'hurray!';
} else {
echo 'fail';
}
?>
The solution above does not remove the nested loop; it merely lets PHP run the inner loop internally, which may be faster (benchmark it to be sure).
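As a rough sketch of the "group the patterns into one regex" idea from this thread, the following builds a single anchored expression and stops at the first value that matches nothing. The function names buildAllowedRegex and allValuesAllowed are invented for this example, and anchoring with ^ and $ is an assumption: the original code matches anywhere inside the value.

```php
<?php
// Hypothetical helper: builds one anchored regex from the allowed patterns,
// where "*" stands for one or more digits.
function buildAllowedRegex(array $patterns)
{
    $alternatives = array();
    foreach ($patterns as $pattern) {
        // Escape regex metacharacters first, then turn the escaped "*" into \d+.
        $quoted = preg_quote($pattern, '/');
        $alternatives[] = str_replace('\*', '\d+', $quoted);
    }
    return '/^(?:' . implode('|', $alternatives) . ')$/';
}

// Hypothetical helper: true only if every value matches at least one allowed pattern.
function allValuesAllowed(array $values, array $patterns)
{
    $regex = buildAllowedRegex($patterns);
    foreach ($values as $value) {
        if (preg_match($regex, $value) !== 1) {
            return false; // stop at the first value that matches nothing
        }
    }
    return true;
}

$allowed = array('apple*apple333', 'banana234banana', 'kiwi*kiwi');
$values  = array('apple8756apple333', 'banana234banana', 'apple4apple333', 'kiwi435kiwi');

echo allValuesAllowed($values, $allowed) ? 'hurray!' : 'fail';
```

Whether this is faster than the nested loop depends on the data, so it is worth measuring with the real lists.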

Using Regex to find if the string is inside the array and replace it + PHP

$restricted_images = array(
"http://api.tweetmeme.com/imagebutton.gif",
"http://stats.wordpress.com",
"http://entrepreneur.com.feedsportal.com/",
"http://feedads.g.doubleclick.net"
);
This is the list of image URLs. I want to know whether a given string contains any of them.
For example:
$string = "http://api.tweetmeme.com/imagebutton.gif/elson/test/1231adfa/";
Since "http://api.tweetmeme.com/imagebutton.gif" is in the $restricted_images array and also occurs inside $string, the whole $string should be replaced with just the word "replace".
Do you have any idea how to do that one? I'm not a master of RegEx, so any help would be greatly appreciated and rewarded!
Thanks!
why regex?
$restricted_images = array(
"http://api.tweetmeme.com/imagebutton.gif",
"http://stats.wordpress.com",
"http://entrepreneur.com.feedsportal.com/",
"http://feedads.g.doubleclick.net"
);
$string = "http://api.tweetmeme.com/imagebutton.gif/elson/test/1231adfa/";
$restrict = false;
foreach($restricted_images as $restricted_image){
if(strpos($string,$restricted_image) !== false){
$restrict = true;
break;
}
}
if($restrict) $string = "replace";
maybe this can help
foreach ($restricted_images as $key => $value) {
if (strpos($string, $value) !== false){ // ">= 0" is also true when strpos() returns false
$string = 'replace';
}
}
You don't really need regex because you're looking for direct string matches.
You can try this:
foreach ($restricted_images as $url) // Iterate through each restricted URL.
{
if (strpos($string, $url) !== false) // See if the restricted URL substring exists in the string you're trying to check.
{
$string = 'replace'; // Reset the value of variable $string.
}
}
You don't have to use regex'es for this.
$test = "http://api.tweetmeme.com/imagebutton.gif/elson/test/1231adfa/";
foreach($restricted_images as $restricted) {
if (substr_count($test, $restricted)) {
$test = 'FORBIDDEN';
}
}
// Prepare the $restricted_images array for use by preg_replace()
$func = function($value)
{
// Pass the delimiter so that the slashes in the URLs are escaped too.
return '/'.preg_quote($value, '/').'/';
};
$restricted_images = array_map($func, $restricted_images);
$string = preg_replace($restricted_images, 'replace', $string);
Edit:
If you decide that you don't need regular expressions (and your example does not really need them), here's a shorter alternative to all of those foreach() answers:
$string = str_replace($restricted_images, 'replace', $string);
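One thing to keep in mind: str_replace() swaps only the matched substring, so the example string would become "replace/elson/test/1231adfa/". If the whole string should turn into "replace", as the question describes, a small wrapper around strpos() does it. This is only a sketch; the function name replaceIfRestricted is made up for illustration.

```php
<?php
// Hypothetical helper: returns $replacement when $string contains any restricted URL,
// otherwise returns $string unchanged.
function replaceIfRestricted($string, array $restricted, $replacement = 'replace')
{
    foreach ($restricted as $url) {
        // strpos() returns false when not found and 0 for a match at the start,
        // so the comparison has to be strict.
        if (strpos($string, $url) !== false) {
            return $replacement;
        }
    }
    return $string;
}

$restricted_images = array(
    "http://api.tweetmeme.com/imagebutton.gif",
    "http://stats.wordpress.com",
    "http://entrepreneur.com.feedsportal.com/",
    "http://feedads.g.doubleclick.net"
);

echo replaceIfRestricted("http://api.tweetmeme.com/imagebutton.gif/elson/test/1231adfa/", $restricted_images); // replace
```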

Filter a set of bad words out of a PHP array

I have a PHP array of about 20,000 names, I need to filter through it and remove any name that has the word job, freelance, or project in the name.
Below is what I have started so far, it will cycle through the array and add the cleaned item to build a new clean array. I need help matching the "bad" words though. Please help if you can
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
// freelance
// job
// project
$cleanArray = array();
foreach ($data1 as $name) {
# if a term is matched, we remove it from our array
if(preg_match('~\b(freelance|job|project)\b~i',$name)){
echo 'word removed';
}else{
$cleanArray[] = $name;
}
}
Right now it only matches whole words: if "freelance" is a name in the array, that item is removed, but something like ImaFreelancer is not. I need to remove anything that contains one of the matching words anywhere.
A regular expression is not really necessary here — it'd likely be faster to use a few stripos calls. (Performance matters on this level because the search occurs for each of the 20,000 names.)
With array_filter, which only keeps elements in the array for which the callback returns true:
$data1 = array_filter($data1, function($el) {
return stripos($el, 'job') === FALSE
&& stripos($el, 'freelance') === FALSE
&& stripos($el, 'project') === FALSE;
});
Here's a more extensible / maintainable version, where the list of bad words can be loaded from an array rather than having to be explicitly denoted in the code:
$data1 = array_filter($data1, function($el) {
$bad_words = array('job', 'freelance', 'project');
$word_okay = true;
foreach ( $bad_words as $bad_word ) {
if ( stripos($el, $bad_word) !== FALSE ) {
$word_okay = false;
break;
}
}
return $word_okay;
});
I'd be inclined to use the array_filter function and change the regex to not match on word boundaries
$data1 = array('Phillyfreelance' , 'PhillyWebJobs', 'web2project', 'cleanname');
$cleanArray = array_filter($data1, function($w) {
return !preg_match('~(freelance|project|job)~i', $w);
});
Use of the preg_match() function and some regular expressions should do the trick; this is what I came up with and it worked fine on my end:
<?php
$data1=array('JoomlaFreelance','PhillyWebJobs','web2project','cleanname');
$cleanArray=array();
$badWords='/(job|freelance|project)/i';
foreach($data1 as $name) {
if(!preg_match($badWords,$name)) {
$cleanArray[]=$name;
}
}
echo(implode(',', $cleanArray)); // separator first; the reversed argument order no longer works in PHP 8
?>
Which returned:
cleanname
Personally, I would do something like this:
$badWords = ['job', 'freelance', 'project'];
$names = ['JoomlaFreelance', 'PhillyWebJobs', 'web2project', 'cleanname'];
// Escape characters with special meaning in regular expressions.
$quotedBadWords = array_map(function($word) {
return preg_quote($word, '/');
}, $badWords);
// Create the regular expression.
$badWordsRegex = implode('|', $quotedBadWords);
// Filter out any names that match the bad words.
$cleanNames = array_filter($names, function($name) use ($badWordsRegex) {
return preg_match('/' . $badWordsRegex . '/i', $name) === 0; // 0 means "no match"; FALSE is only returned on error
});
This should be what you want:
if (!preg_match('/(freelance|job|project)/i', $name)) {
$cleanArray[] = $name;
}
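For comparison, the same filtering can be sketched with preg_grep() and its PREG_GREP_INVERT flag, which returns the entries that do not match. This is a different technique from the array_filter() answers above, not a correction of them; the variable names are illustrative.

```php
<?php
$badWords = array('job', 'freelance', 'project');
$names = array('Phillyfreelance', 'PhillyWebJobs', 'web2project', 'cleanname');

// Escape each word, then join them into one case-insensitive alternation.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $badWords);
$regex = '/' . implode('|', $quoted) . '/i';

// PREG_GREP_INVERT keeps the entries that do NOT match; keys are preserved,
// so array_values() is used to reindex the result.
$cleanNames = array_values(preg_grep($regex, $names, PREG_GREP_INVERT));

print_r($cleanNames); // only "cleanname" remains
```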

Finding tags in query string with regular expression

I have to set some routing rules in my php application, and they should be in the form
/%var/something/else/%another_var
In other words, I need a regex that returns every URI piece marked by the % character. Strings marked by % represent variable names, so they can be almost any string.
another example:
from /%lang/module/controller/action/%var_1
i want the regex to extract lang and var_1
i tried something like
/.*%(.*)[\/$]/
but it doesn't work.....
Seeing as it's routing rules, and you may need all the pieces at some point, you could also split the string the classical way:
$path_exploded = explode("/", $path);
foreach ($path_exploded as $fragment) if ($fragment !== "" && $fragment[0] == "%") // skip the empty fragment before the leading slash
echo "Found $fragment";
$str='/%var/something/else/%another_var';
$s = explode("/",$str);
$whatiwant = preg_grep("/^%/",$s);
print_r($whatiwant);
I don’t see the need to slow down your script with a regex … trim() and explode() do everything you need:
function extract_url_vars($url)
{
if ( FALSE === strpos($url, '%') )
{
return array(); // no placeholders: return an empty list so the return type stays an array
}
$found = array();
$parts = explode('/%', trim($url, '/') );
foreach ( $parts as $part )
{
$tmp = explode('/', $part);
$found[] = ltrim( array_shift($tmp), '%');
}
return $found;
}
// Test
print_r( extract_url_vars('/%lang/module/controller/action/%var_1') );
// Result:
Array
(
[0] => lang
[1] => var_1
)
You can use:
$str = '/%lang/module/controller/action/%var_1';
if(preg_match('#/%(.*?)/[^%]*%(.*?)$#',$str,$matches)) {
echo "$matches[1] $matches[2]\n"; // prints lang var_1
}
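The pattern above assumes exactly two placeholders. If a route can contain any number of them, preg_match_all() is one way to collect them all. This is a minimal sketch; the function name extractPlaceholders is made up, and it assumes a placeholder always follows a slash and runs up to the next slash.

```php
<?php
// Hypothetical helper: returns every %name segment of a route, without the "%".
function extractPlaceholders($route)
{
    // "/%" starts a placeholder; the capture group takes everything up to the next slash.
    preg_match_all('#/%([^/]+)#', $route, $matches);
    return $matches[1];
}

print_r(extractPlaceholders('/%lang/module/controller/action/%var_1'));
// contains "lang" and "var_1"
```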

How to recursively create a multidimensional array?

I am trying to create a multi-dimensional array whose parts are determined by a string. I'm using . as the delimiter, and each part (except for the last) should be an array
ex:
config.debug.router.strictMode = true
I want the same results as if I were to type:
$arr = array('config' => array('debug' => array('router' => array('strictMode' => true))));
This problem's really got me going in circles, any help is appreciated. Thanks!
Let’s assume we already have the key and value in $key and $val, then you could do this:
$key = 'config.debug.router.strictMode';
$val = true;
$path = explode('.', $key);
Building the array from left to right:
$arr = array();
$tmp = &$arr;
foreach ($path as $segment) {
$tmp[$segment] = array();
$tmp = &$tmp[$segment];
}
$tmp = $val;
And from right to left:
$arr = array();
$tmp = $val;
while (($segment = array_pop($path)) !== null) { // strict check so a "0" segment does not end the loop early
$tmp = array($segment => $tmp);
}
$arr = $tmp;
I say split everything up, start with the value, and work backwards from there, each time through, wrapping what you have inside another array. Like so:
$s = 'config.debug.router.strictMode = true';
list($parts, $value) = explode(' = ', $s);
$parts = explode('.', $parts);
while($parts) {
$value = array(array_pop($parts) => $value);
}
print_r($value); // $parts is empty once the loop has finished; the nested array is in $value
Definitely rewrite it so it has error checking.
Gumbo's answer looks good.
However, it looks like you want to parse a typical .ini file.
Consider using library code instead of rolling your own.
For instance, Zend_Config handles this kind of thing nicely.
I really like JasonWolf's answer to this.
As to the possible errors: yes, but he supplied a great idea, and now it is up to the reader to make it bulletproof.
My need was a bit more basic: from a delimited list, create a multidimensional array. I slightly modified his code to give me just that. This version will give you an array whether or not a value is defined, and even for a string that does not contain the delimiter.
I hope someone can make this even better.
$parts = "config.debug.router.strictMode";
$parts = explode(".", $parts);
$value = null;
while($parts) {
$value = array(array_pop($parts) => $value);
}
print_r($value);
// The attribute to the right of the equals sign
$rightOfEquals = true;
$leftOfEquals = "config.debug.router.strictMode";
// Array of identifiers
$identifiers = explode(".", $leftOfEquals);
// How many 'identifiers' we have
$numIdentifiers = count($identifiers);
// Iterate through each identifier backwards
// We do this backwards because we want the "innermost" array element
// to be defined first.
for ($i = ($numIdentifiers - 1); $i >=0; $i--)
{
// If we are looking at the "last" identifier, then we know what its
// value is. It is the thing directly to the right of the equals sign.
if ($i == ($numIdentifiers - 1))
{
$a = array($identifiers[$i] => $rightOfEquals);
}
// Otherwise, we wrap the array built so far in a new array keyed by this identifier.
else
{
$a = array($identifiers[$i] => $a);
}
}
print_r($a);
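The answers above each build a fresh array from one key. When several dotted keys have to end up in the same array, the left-to-right reference approach can be wrapped in a function. This is a sketch under that assumption; the function name setByPath and the second example key config.debug.level are made up for illustration.

```php
<?php
// Hypothetical helper: sets $value at the position described by a delimited path,
// creating intermediate arrays as needed and keeping existing siblings.
function setByPath(array &$target, $path, $value, $delimiter = '.')
{
    $node = &$target;
    foreach (explode($delimiter, $path) as $segment) {
        // Create the next level if it is missing (or is not an array yet).
        if (!isset($node[$segment]) || !is_array($node[$segment])) {
            $node[$segment] = array();
        }
        $node = &$node[$segment];
    }
    $node = $value; // the last segment receives the value itself
}

$config = array();
setByPath($config, 'config.debug.router.strictMode', true);
setByPath($config, 'config.debug.level', 3);
print_r($config);
```

Note that an existing scalar on an intermediate segment is overwritten with an array; whether that is acceptable depends on the input format.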
