preg_match must end with "/"?


In the preg_match below, I'm comparing against two static strings, $url and $my_folder...
$url = get_bloginfo('url');
//$url = 'http://site.com'
$my_folder = get_option('my_folder');
//$my_folder = 'http://site.com/somefolder';
I'm getting a match when the $my_folder string has a trailing slash
http://somefolder/go/
But this does not create a match...
http://somefolder/go
However, another problem is that this also matches...
http://somefolder/gone
Code is...
$my_folder = get_option('rseo_nofollow_folder');
if($my_folder !=='') $my_folder = trim($my_folder,'/');
$url = trim(get_bloginfo('url'),'/');
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if($my_folder !=='')
{
//HERES WHERE IM HAVING PROBLEMS
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '/?$~', $matches[0][$i])
|| !preg_match( '~'. $url .'/?$~',$matches[0][$i])))
{
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
else
{
//THIS WORKS FINE, NO PROBLEMS HERE
if ( !preg_match( '~nofollow~is',$matches[0][$i]) && (!preg_match( '~'.$url.'~',$matches[0][$i])))
{
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
}
return $content;

~^http://somefolder/go(?:/|$)~
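A minimal sketch of how that pattern behaves on the three URLs from the question. The helper name matchesFolder is hypothetical, and it assumes the string being tested is a bare URL rather than a whole <a> tag; preg_quote is added here so that characters such as "." in the folder are matched literally.

```php
<?php
// Hypothetical helper: true when $candidate is $folder itself or a path beneath it.
function matchesFolder(string $folder, string $candidate): bool
{
    // Escape regex metacharacters in the fixed string, using "~" as the delimiter.
    $quoted = preg_quote(rtrim($folder, '/'), '~');
    // (?:/|$) requires a slash or the end of the string right after the folder.
    return preg_match('~^' . $quoted . '(?:/|$)~', $candidate) === 1;
}

var_dump(matchesFolder('http://somefolder/go', 'http://somefolder/go/'));  // bool(true)
var_dump(matchesFolder('http://somefolder/go', 'http://somefolder/go'));   // bool(true)
var_dump(matchesFolder('http://somefolder/go', 'http://somefolder/gone')); // bool(false)
```

The non-capturing group is what rejects "gone": after "go" the next character must be a slash or nothing at all.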

First remove the trailing slash, then add '/?' at the end of your regexp:
$my_folder = trim($my_folder,'/');
$url = trim(get_bloginfo('url'),'/');
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '/?$~', $matches[0][$i])
|| !preg_match( '~'. $url .'/?$~',$matches[0][$i])))

This is a shot in the dark, but try:
preg_match( '/' . preg_quote( get_bloginfo('url'), '/' ) . '?/', $matches[0][$i] )
You can use whatever delimiter char you want in place of the / chars. I'm guessing that you're using WordPress and that get_bloginfo('url') is normalized to always have a trailing slash. If that is the case, the ? at the end of the regex makes that last slash optional.
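A small sketch of what preg_quote changes, assuming the site URL is http://site.com as in the question's comment. The two sample tags are made up for illustration. Without quoting, the "." in the URL acts as a regex wildcard.

```php
<?php
$url = 'http://site.com'; // assumed value, taken from the question's comment

// Unquoted: "." matches any character, so "siteXcom" slips through.
$loose = '~' . $url . '~';
// Quoted: every character of $url must match literally.
$strict = '~' . preg_quote($url, '~') . '~';

$lookalike = '<a href="http://siteXcom/page">'; // made-up sample tag
$real      = '<a href="http://site.com/page">'; // made-up sample tag

var_dump(preg_match($loose, $lookalike));  // int(1)
var_dump(preg_match($strict, $lookalike)); // int(0)
var_dump(preg_match($strict, $real));      // int(1)
```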

You should just use strstr() or strpos() if these are fixed strings anyway.
Your example rewritten (the parentheses keep the same grouping as your && / || version):
if (!strstr($matches[0][$i], "nofollow")
and (strstr($matches[0][$i], $my_folder)
or !strstr($matches[0][$i], $url)))
strpos works similarly, but you need an explicit comparison against FALSE:
if (strpos($matches[0][$i], "nofollow") === FALSE
and (strpos($matches[0][$i], $my_folder) !== FALSE
or strpos($matches[0][$i], $url) === FALSE))

Related

Php preg_match with wild card characters

I have some zip code in an array which includes some wild card characters like this
$zip_codes = array( '12556', '765547', '234*', '987*' );
$target_zip = '2347890';
To check whether the target zip is already present in the array, I am doing this:
foreach( $zip_codes as $zip ) {
if ( preg_match( "/{$target_zip}.*$/i", $zip ) ) {
echo 'matched';
break;
}
else {
echo 'not matched';
}
}
But it's not matching the zip at all. Can someone tell me what the issue is here?

You need to turn your $zip values into valid regular expressions by converting * into .* (or perhaps \d*); then you can test them against $target_zip:
$zip_codes = array( '12556', '765547', '234*', '987*' );
$target_zip = '2347890';
foreach( $zip_codes as $zip ) {
echo $zip;
if (preg_match('/' . str_replace('*', '.*', $zip) . '/', $target_zip)) {
echo ' matched'. PHP_EOL;
break;
}
else {
echo ' not matched' . PHP_EOL;
}
}
Output:
12556 not matched
765547 not matched
234* matched
You haven't indicated whether you want the value in $zip_codes to match the entire $target_zip value or just part of it. The code above will work for just part (i.e. 234 will match 12345); if you don't want that, change the regex construction to:
if (preg_match('/^' . str_replace('*', '.*', $zip) . '$/', $target_zip)) {
The anchors will ensure that $zip matches the entirety of $target_zip.
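A sketch of the same conversion wrapped in a function. The name zipMatches is hypothetical; it assumes * is the only wildcard, that it stands for any run of digits (the \d* option mentioned above), and that the whole zip must match. preg_quote is applied first so that any other special character in a stored zip is treated literally.

```php
<?php
// Hypothetical helper: does $zip match the stored $pattern, where "*" is a wildcard?
function zipMatches(string $pattern, string $zip): bool
{
    // Quote everything, then turn the now-escaped "\*" into "zero or more digits".
    $regex = str_replace('\*', '\d*', preg_quote($pattern, '/'));
    // Anchor both ends so the pattern must cover the entire zip.
    return preg_match('/^' . $regex . '$/', $zip) === 1;
}

$zip_codes = array('12556', '765547', '234*', '987*');
foreach ($zip_codes as $zip) {
    echo $zip, zipMatches($zip, '2347890') ? ' matched' : ' not matched', PHP_EOL;
}
```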

One of the problems is that using 234* in a regex will match any number of 4's.
The other problem is that you match to the end (using $) but not the start, so 789 (with the .* appended) will also match (as it's in the middle). In this code, I use ^{$zip}$, with * replaced with .* to match any trailing characters...
$zip_codes = array( '12556', '765547', '789', '234', '234*', '987*' );
$target_zip = '2347890';
foreach( $zip_codes as $zip ) {
$zip = str_replace("*", ".*", $zip);
if ( preg_match( "/^{$zip}$/i", $target_zip ) ) {
echo $zip.' matched'.PHP_EOL;
break;
}
else {
echo $zip.' not matched'.PHP_EOL;
}
}

You don't need regex here.
You can check whether the entry contains * and, if it does, compare only its leading characters (its length minus 1, i.e. without the *) against the start of the target.
$zip_codes = array( '12556', '765547', '234*', '987*' );
$target_zip = '2347890';
foreach($zip_codes as $zip){
if(strpos($zip, "*") !== false){
// e.g. "234*" without the "*" is "234", compared with substr("2347890", 0, 3)
if(substr($zip, 0, -1) == substr($target_zip, 0, strlen($zip)-1)){
echo "wildcard match";
}
}else{
if($zip == $target_zip){
echo "exact match";
}
}
}
https://3v4l.org/hIGER
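The same regex-free idea can be sketched as a reusable function that returns a boolean. The name zipStartsWith is hypothetical, and it assumes a wildcard only ever appears as a single trailing *.

```php
<?php
// Hypothetical helper: prefix comparison for "234*"-style entries, exact otherwise.
function zipStartsWith(string $pattern, string $zip): bool
{
    if (substr($pattern, -1) === '*') {
        // Drop the trailing "*" and compare only that many leading characters.
        $prefix = substr($pattern, 0, -1);
        return strncmp($zip, $prefix, strlen($prefix)) === 0;
    }
    // No wildcard: the two strings must be identical.
    return $pattern === $zip;
}

var_dump(zipStartsWith('234*', '2347890'));  // bool(true)
var_dump(zipStartsWith('987*', '2347890'));  // bool(false)
var_dump(zipStartsWith('12556', '2347890')); // bool(false)
```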

The issue is that in the loop, the pattern is always the same:
if ( preg_match( "/2347890.*$/i", $zip ) ) {
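I think you meant to use the value of $zip as part of the pattern. For '234*' that gives a pattern in which the * is a quantifier that repeats the last digit 0+ times:
if ( preg_match( "/234*.*$/i", $zip ) ) {
Here 4* matches zero or more 4s; it does not act as a wildcard.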
As an alternative, you could also extract the digits from $zip_codes using a capturing group and match optional following *
^(\d+)\**$
Then use strpos to check if the target_zip starts with the extracted digits.
$zip_codes = array( '12556', '765547', '234*', '987*', '237' );
$target_zip = '2347890';
foreach($zip_codes as $zip ) {
$digitsOnly = preg_replace("~^(\d+)\**$~", "$1", $zip);
if (strpos($target_zip, $digitsOnly) === 0) {
echo "$zip matched $target_zip" . PHP_EOL;
break;
}
else {
echo "$zip not matched $target_zip" . PHP_EOL;
}
}
Output
12556 not matched 2347890
765547 not matched 2347890
234* matched 2347890

How to remove path after domain name from string

I have the following code :
function removeFilename($url)
{
$file_info = pathinfo($url);
return isset($file_info['extension'])
? str_replace($file_info['filename'] . "." . $file_info['extension'], "", $url)
: $url;
}
$url1 = "http://website.com/folder/filename.php";
$url2 = "http://website.com/folder/";
$url3 = "http://website.com/";
echo removeFilename($url1); //outputs http://website.com/folder/
echo removeFilename($url2);//outputs http://website.com/folder/
echo removeFilename($url3);//outputs http:///
Now my problem is that when there is only a domain without folders or filenames, my function removes website.com too.
Is there a way in PHP to make my function work only on what comes after the third slash, or is there another solution you think would be useful?

UPDATED : ( working and tested )
<?php
function removeFilename($url)
{
$parse_file = parse_url($url);
$file_info = pathinfo($parse_file['path']);
return isset($file_info['extension'])
? str_replace($file_info['filename'] . "." . $file_info['extension'], "", $url)
: $url;
}
$url1 = "http://website.com/folder/filename.com";
$url2 = "http://website.org/folder/";
$url3 = "http://website.com/";
echo removeFilename($url1); echo '<br/>';
echo removeFilename($url2); echo '<br/>';
echo removeFilename($url3);
?>
Output:
http://website.com/folder/
http://website.org/folder/
http://website.com/
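A variation on the same parse_url idea, sketched under two assumptions: the URL carries no query string or fragment, and a last segment counts as a filename only if it contains a dot. The name stripFilename is hypothetical. It also covers a URL with no path at all, where parse_url returns no path component.

```php
<?php
// Hypothetical helper: remove a trailing "name.ext" segment, leave everything else alone.
function stripFilename(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    // No path (bare domain), unparsable URL, or a path ending in "/": nothing to strip.
    if ($path === null || $path === false || substr($path, -1) === '/') {
        return $url;
    }
    $basename = basename($path);
    if (strpos($basename, '.') === false) {
        return $url; // last segment has no extension, so treat it as a folder
    }
    // Assumes the URL ends with the path (no query string or fragment).
    return substr($url, 0, strlen($url) - strlen($basename));
}

echo stripFilename('http://website.com/folder/filename.php'), PHP_EOL; // http://website.com/folder/
echo stripFilename('http://website.com/'), PHP_EOL;                    // http://website.com/
echo stripFilename('http://website.com'), PHP_EOL;                     // http://website.com
```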

Sounds like you are wanting to replace a substring and not the whole thing. This function might help you:
http://php.net/manual/en/function.substr-replace.php

Since the filename comes after the last slash, you can use strrchr and substr to extract it and str_replace to remove it from the path.
$PATH = "http://website.com/folder/filename.php";
$file = substr( strrchr( $PATH, "/" ), 1) ;
echo $dir = str_replace( $file, '', $PATH ) ;
OUTPUT
http://website.com/folder/

pathinfo can't tell a bare domain from a filename. But if a URL without a filename always ends with a slash, you can split on the slashes instead:
$a = array(
"http://website.com/folder/filename.php",
"http://website.com/folder/",
"http://website.com",
);
foreach ($a as $item) {
$item = explode('/', $item);
if (count($item) > 3)
$item[count($item)-1] = '';
echo implode('/', $item) . "\n";
}
result
http://website.com/folder/
http://website.com/folder/
http://website.com

Close to the answer of splash58
function getPath($url) {
$item = explode('/', $url);
if (count($item) > 3) {
if (strpos($item[count($item) - 1], ".") === false) {
return $url;
}
$item[count($item)-1] ='';
return implode('/', $item);
}
return $url;
}

How to format a URL in php?

For example, I enter this URL:
http://www.example.com/
And I want it to return me:
http://www.example.com
How can I format the URL like so? Is there a built-in PHP function for doing this?

This should do it:
$url = 'http://parkroo.com/';
if ( substr ( $url, 0, 11 ) !== 'http://www.' )
$url = str_replace ( 'http://', 'http://www.', $url );
$url = rtrim ( $url, '/' );
Ok this one should work better:
$urlInfo = parse_url ( $url );
$newUrl = $urlInfo['scheme'] . '://';
if ( substr ( $urlInfo['host'], 0, 4 ) !== 'www.' )
$newUrl .= 'www.' . $urlInfo['host'];
else
$newUrl .= $urlInfo['host'];
if ( isset ( $urlInfo['path'] ) && isset ( $urlInfo['query'] ) )
$newUrl .= $urlInfo['path'] . '?' . $urlInfo['query'];
else
{
if ( isset ( $urlInfo['path'] ) && $urlInfo['path'] !== '/' )
$newUrl .= $urlInfo['path'];
if ( isset ( $urlInfo['query'] ) )
$newUrl .= '?' . $urlInfo['query'];
}
echo $newUrl;

You can parse_url to get parts of the URL and then build the URL.
Or even easier, trim('http://example.com/', '/');
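A short sketch of the trimming approach using rtrim, which only strips from the right-hand end (trim would also strip a leading slash if there were one). The function name removeTrailingSlash and the sample URLs are assumptions for illustration.

```php
<?php
// Hypothetical helper: drop any trailing slashes from a URL string.
function removeTrailingSlash(string $url): string
{
    return rtrim($url, '/');
}

$urls = array(
    'http://www.example.com/',
    'http://www.example.com',
    'http://www.example.com/path/',
);
foreach ($urls as $url) {
    echo removeTrailingSlash($url), PHP_EOL;
}
```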

<?php
function changeURL($url){
if(empty($url)){
return false;
}
else{
$u = parse_url($url);
/*
possible keys are:
scheme
host
user
pass
path
query
fragment
*/
foreach($u as $k => $v){
$$k = $v;
}
//start rebuilding the URL
if(!empty($scheme)){
$newurl = $scheme.'://';
}
if(!empty($user)){
$newurl.= $user;
}
if(!empty($pass)){
$newurl.= ':'.$pass.'@';
}
if(!empty($host)){
if(substr($host, 0, 4) != 'www.'){
$host = 'www.'. $host;
}
$newurl.= $host;
}
if(empty($path) && empty($query) && empty($fragment)){
$newurl.= '/';
}else{
if(!empty($path)){
$newurl.= $path;
}
if(!empty($query)){
$newurl.= '?'.$query;
}
if(!empty($fragment)){
$newurl.= '#'.$fragment;
}
}
return $newurl;
}
}
echo changeURL('http://yahoo.com')."<br>";
echo changeURL('http://username:password@yahoo.com/test/?p=2')."<br>";
echo changeURL('ftp://username:password@yahoo.com/test/?p=2')."<br>";
/*
http://www.yahoo.com/
http://username:password@www.yahoo.com/test/?p=2
ftp://username:password@www.yahoo.com/test/?p=2
*/
?>

How to escape url for fopen

It looks like fopen can't open files with spaces.
For example:
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
fopen($url, 'r');
returns false (mind the space in the url), but file is accessible by browsers.
I've also tried to escape the url by urlencode and rawurlencode with no luck. How to properly escape the spaces?

You can use this code:
$arr = parse_url ( 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg' );
$parts = explode ( '/', $arr['path'] );
$fname = $parts[count($parts)-1];
unset($parts[count($parts)-1]);
$url = $arr['scheme'] . '://' . $arr['host'] . join('/', $parts) . '/' . urlencode ( $fname );
var_dump( $url );
Alternative & Shorter Answer (Thanks to @Dziamid)
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
$parts = pathinfo($url);
$url = $parts['dirname'] . '/' . urlencode($parts['basename']);
var_dump( $url );
OUTPUT:
string(76) "http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main+616x200.jpg"

rawurlencode is the way to go, but do not escape the full URL. Only escape the filename. You will then end up with http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main%20616x200.jpg
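A minimal sketch of that idea, applied to every path segment rather than just the last one. The helper name encodePathSegments is hypothetical, and it assumes an absolute URL made of scheme, host and path only, whose path has not been encoded already (otherwise it would be encoded twice).

```php
<?php
// Hypothetical helper; assumes scheme://host/path with no port, user, query or fragment.
function encodePathSegments(string $url): string
{
    $parts = parse_url($url);
    // Encode each segment on its own so the "/" separators stay intact.
    $segments = explode('/', $parts['path'] ?? '');
    $encoded = implode('/', array_map('rawurlencode', $segments));
    return $parts['scheme'] . '://' . $parts['host'] . $encoded;
}

$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
echo encodePathSegments($url), PHP_EOL;
// http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main%20616x200.jpg
```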

All solutions proposed here are wrong because they don't escape the query string part and the base directory part. Additionally, they don't take into consideration the user, pass and fragment parts of the URL.
To correctly escape a valid URL you have to separately escape the path parts and the query parts.
So the solution is to extract the url parts, escape each part and rebuild the url.
Here is a simple code snippet:
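function safeUrlEncode( $url ) {
$urlParts = parse_url($url);
// parse_url only returns the keys that are present in the URL
$urlParts['path'] = safeUrlEncodePath( isset( $urlParts['path'] ) ? $urlParts['path'] : '' );
if( isset( $urlParts['query'] ) ) {
$urlParts['query'] = safeUrlEncodeQuery( $urlParts['query'] );
}
return http_build_url($urlParts);
}
function safeUrlEncodePath( $path ) {
if( strlen( $path ) == 0 || strpos($path, "/") === false ){
return "";
}
$pathParts = explode( "/" , $path );
// encode each path segment separately so the "/" separators are preserved
$pathParts = array_map( 'rawurlencode', $pathParts );
return implode( "/", $pathParts );
}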
function safeUrlEncodeQuery( $query ) {
$queryParts = array();
// parse_str decodes the query string into an array
parse_str($query, $queryParts);
// http_build_query url-encodes keys and values itself, so encoding them beforehand would double-encode
return http_build_query( $queryParts );
}
Usage would simply be:
$encodedUrl = safeUrlEncode( $originalUrl );
Side note
In my code snippet I'm making use of http_build_url (http://php.net/manual/it/function.http-build-url.php), which is available through a PECL extension. If you don't have that extension on your server, you can simply include the pure PHP implementation: http://fuelforthefire.ca/free/php/http_build_url/
Cheers :)

$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
fopen(urlencode($url), 'r');

Regex help to add rel="nofollow" to links matching a predefined URL pattern

The function below parses through the content passed from the filter and adds rel="nofollow" to all external links it finds and currently skips over all internal links.
Assuming that I have a folder path defined in a variable, say...
$my_folder = "http://mysite.com/recommends/";
How would I augment the function so that it also adds the nofollow to links that match this pattern? These will likely be internal links, and they would be the only internal links I want to nofollow in this example, so they need to be excluded from the internal-link regex check somehow.
add_filter('wp_insert_post_data', 'new_content' );
function new_content($content) {
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& !preg_match( '~'.get_bloginfo('url').'~',$matches[0][$i]) ){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
PS: Thanks to Backie from WSX for the current function code...

I have no way to try this out since I'm not a WP user, but assuming your inner loop defines $matches[0][$i] as the current url to compare:
add_filter('wp_insert_post_data', 'new_content' );
function new_content($content) {
$my_folder = "http://mysite.com/recommends/";
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i < sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '~', $matches[0][$i])
|| !preg_match( '~'.get_bloginfo('url').'~',$matches[0][$i]))){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
Edit: Might have to add a preg_quote around $my_folder, I'm just a bit confused since it's not done for get_bloginfo('url')
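The decision inside the loop can be sketched on its own, with preg_quote applied to both fixed strings as the edit note suggests. The function name needsNofollow and the sample tags are hypothetical; the site URL and folder are passed in as plain strings so the sketch runs outside WordPress. It additionally requires a slash or a closing quote right after the folder, which is an assumption about how the links are written.

```php
<?php
// Hypothetical helper: should this <a ...> tag get rel="nofollow"?
function needsNofollow(string $tag, string $siteUrl, string $folder): bool
{
    if (stripos($tag, 'nofollow') !== false) {
        return false; // already marked
    }
    // The folder must be followed by "/" or the closing quote of the href.
    $folderPattern = '~' . preg_quote(rtrim($folder, '/'), '~') . '(?:/|["\'])~';
    $inFolder = preg_match($folderPattern, $tag) === 1;
    $internal = preg_match('~' . preg_quote($siteUrl, '~') . '~', $tag) === 1;
    // Nofollow the special folder and every external link.
    return $inFolder || !$internal;
}

$site   = 'http://mysite.com';
$folder = 'http://mysite.com/recommends/';
var_dump(needsNofollow('<a href="http://mysite.com/recommends/foo">', $site, $folder)); // bool(true)
var_dump(needsNofollow('<a href="http://mysite.com/about">', $site, $folder));          // bool(false)
var_dump(needsNofollow('<a href="http://other.com/">', $site, $folder));                // bool(true)
```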
