PHP Check if url in username and replace it

PHP Check if url in username and replace it - php

i want to check if a user has a domain or website in his name.
Much times, user will make advertise for own or other sites on that way.
So i want to replace the URL than with *
I found that
$url = 'Testusername google.com';
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
if(preg_match("/^$regex$/i", $url)) // `i` flag for case-insensitive
{
$url = str_replace($url, $url, '*');
echo $url;
return true;
} else {
echo $url;
}
It works good to find the url, but how to replace only the domain in that name to * and not everything?
So i want
Testusername *

To replace a regular expression, use preg_replace instead of preg_match and use groups to extract only the valid parts of the username.
$username = 'Testusername google.com';
$regex = '/^(.*?)';
$regex .= "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
$regex .= '(.*?)$/i';
$username = preg_replace($regex, '$1 * $13', $username);
echo $username; // Testusername *
If you need to know that the username contained a url and a replacement was made, you can use the $count argument to determine how many replacements occurred.
$count = 0;
$username = preg_replace($regex, '$1 * $13', $username, -1, $count);
if ($count > 0) {
// url replaced in username
}

Try using preg_replace() like so:
$url = 'Testusername google.com';
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
echo preg_replace("/$regex$/i", '*', $url);
more about preg_replace can be found here

this should do it - you need to separate the username and url, as well as a few other changes.....
$username = 'Testusername google.com';
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
if(preg_match("/^$regex$/i", $username,$matches)) // `i` flag for case-insensitive
{
foreach($matches as $key => $url){
$username = str_replace($url, '*', $username);
}
echo $username;
return true;
} else {
echo $username;
}

Related

Could any one say what is the my mistake in this regex

I want to use this regex for validating my urls in php with preg_match function but when i use it it says "Unknown modifier '&'"
what is the problem ?
$urlregex = "/^(http|ftp|https)\:\/\/";
// USER AND PASS (optional)
$urlregex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)?";
// HOSTNAME OR IP
$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)+"; // http://x.x = minimum
// PORT (optional)
$urlregex .= "(\:[0-9]{2,5})?";
// PATH (optional)
$urlregex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
// GET Query (optional)
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
// ANCHOR (optional)
$urlregex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?\$/";
if(preg_match($urlregex, $url) === 1)
{
$errors[] = "URL_ISNOTVALID";
$ok = false;
}

Looks like you forgot to escape a forward slash:
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
should be
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#\/&%=+\$_.-]*)?";

The / (slash) in your GET Query is seen as the termination of the regex. And not the / at the end of your regex added in the ANCHOR line.
So you need to escape that / in front of the &.
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
bvecomes
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#\/&%=+\$_.-]*)?";
thats all.

You could save some trouble by using filter_var instead.
if (false !== ($url = filter_var($url, FILTER_VALIDATE_URL))) {
echo "$url is a valid url";
}
You can optionally add these options as the third parameter (use binary or to combine them):
FILTER_FLAG_PATH_REQUIRED
FILTER_FLAG_QUERY_REQUIRED

Preg-replace - replace all URLs except a domain and its subdomains

I've a Glype proxy and I want not parse external URLs. All URLs on the page are automatically converted to: http://proxy.com/browse.php?u=[URL HERE]. Example: If I visit The Pirate Bay on my proxy, then I want not to parse the following URLs:
ByteLove.com (Not to: http://proxy.com/browse.php?u=http://bytelove.com&b=0)
BayFiles.com (Not to: http://proxy.com/browse.php?u=http://bayfiles.com&b=0)
BayIMG.com (Not to: http://proxy.com/browse.php?u=http://bayimg.com&b=0)
PasteBay.com (Not to: http://proxy.com/browse.php?u=http://pastebay.com&b=0)
Ipredator.com (Not to: http://proxy.com/browse.php?u=https://ipredator.se&b=0)
etc.
Of course I want to keep the internal URLs, so:
thepiratebay.se/browse (To: http://proxy.com/browse.php?u=http://thepiratebay.se/browse&b=0)
thepiratebay.se/top (To: http://proxy.com/browse.php?u=http://thepiratebay.se/top&b=0)
thepiratebay.se/recent (To: http://proxy.com/browse.php?u=http://thepiratebay.se/recent&b=0)
etc.
Is there a preg_replace to replace all URL's except thepiratebay.se and there subdomains (as in the example)? An other function is also welcome. (Such as domdocument, querypath, substr or strpos. Not str_replace because then I should define all URLs)
I've found something, but I'm not familiar with preg_replace:
$exclude = '.thepiratebay.se';
$pattern = '(https?\:\/\/.*?\..*?)(?=\s|$)';
$message= preg_replace("~(($exclude)?($pattern))~i", '$2$5$6', $message);

I'll guess you would need to provide a whitelist to tell which domains should be proxied:
$whitelist = array();
$whitelist[] = "internal1.se";
$whitelist[] = "internal2.no";
$whitelist[] = "internal3.com";
// and so on...
$string = 'External link 1<br>';
$string .= 'Internal link 1<br>';
$string .= 'Internal link 2<br>';
$string .= 'External link 2<br>';
//Assuming the URL always is inside '' or "" you can use this pattern:
$pattern = '#(https?://proxy\.org/browse\.php\?u=(https?[^&|\"|\']*)(&?[^&|\"|\']*))#i';
$string = preg_replace_callback($pattern, "my_callback", $string);
//I had only PHP 5.2 on my server, so I decided to use a callback function.
function my_callback($match) {
global $whitelist;
// set return bypass proxy URL
$returnstring = urldecode($match[2]);
foreach ($whitelist as $white) {
// check if URL matches whitelist
if (stripos($match[2], $white) > 0) {
$returnstring = $match[0];
break; } }
return $returnstring;
}
echo "NEW STRING[:\n" . $string . "\n]\n";

you can use preg_replace_callback() to execute a callback function for every match. In that function you can determine if the matched string should be converted or not.
<?php
$string = 'http://foobar.com/baz and http://example.org/bumm';
$pattern = '#(https?\:\/\/.*?\..*?)(?=\s|$)#i';
$string = preg_replace_callback($pattern, function($match) {
if (stripos($match[0], 'example.org/') !== false) {
// exclude all URLs containing example.org
return $match[0];
} else {
return 'http://proxy.com/?u=' . urlencode($match[0]);
}
}, $string);
echo $string, "\n";
(Example is using PHP 5.3 closure notation)

Php link active

Well this is the function which active the link if someone put a link in message box with text.
My question is it doesn't show more than 1 link if someone put many link ex: www.yahoo.com www.gmail.com www.facebook.com, Then it's show only first link www.yahoo.com
function txt2link($text){
// force http: on www.
$text = ereg_replace( "www\.", "http://www.", $text );
// eliminate duplicates after force
$text = ereg_replace( "http://http://www\.", "http://www.", $text );
$text = ereg_replace( "https://http://www\.", "https://www.", $text );
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
$text = preg_replace($reg_exUrl, ''.$url[0].'', $text);
} // if no urls in the text just return the text
return ($text);
}
$url = "Alter pot waer it your pot http://css-tricks.com/snippets/php/find-urls-in-
text-make-links/ you may click the link www.yahoo.com or you may see what is the
http://www.youtube.com say, is it right?";
echo txt2link($url);
you can run this code to see the result.
Any Idea ?

This is one that someone here helped me build up a while back, I dunno where it is on stack anymore but I still use this function to date.. This handles many urls in one shot, with or without http: with or without www. in most cases, its true this could us a bit of refining, but in all does the job really nice.
//for finding URLs within body of text and converting them to a clickable link
//checks DNS to see if its valid and if the link HTML already exists ignores it.
function titleHyper($text){
$text = preg_replace( "/(www\.)/is", "http://", $text);
$text = str_replace(array("http://http://","http://https://"), "http://", $text);
$text = str_replace(array("<a href='", "<a href=\"", "</a>", "'>", "\">"), "", $text);
$reg_exUrl = "/(http|https|ftp|ftps|)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($reg_exUrl, $text, $matches);
$usedPatterns = array();
$context = stream_context_create(array(
'http' => array(
'timeout' => 5
)
));
foreach($matches[0] as $pattern){
if(!array_key_exists($pattern, $usedPatterns)){
$usedPatterns[$pattern]=true;
$the_contents = #file_get_contents($pattern, 0, $context);
if(substr(trim($pattern), 0, 8) != "https://"){
$color = "#FF0000";
}
if (empty($the_contents)) {
$title = $pattern;
} else {
preg_match("/<title>(.*)<\/title>/Umis", $the_contents, $title);
$title = $title[1];
$color = "#00FF00";
//$title = htmlspecialchars($title, ENT_QUOTES); //saving data to database
}
$text = str_ireplace($pattern, "<a style='font-size: 14px; background-color: #FFFFFF; color: $color;' href='$pattern' rel='nofollow' TARGET='_blank'> $title </a>", $text);
}
}
return $text;
}
// titleHyper() in action example:
//$text = "Some sample text with WWW.AOL.com<br />http://www.youtube.com/watch?v=YaxKiZfQcX8 <br />Anyone use www.myspace.com? <br />Some people are nuts, look at this stargate link at http://www.youtube.com/watch?v=ZKoUm6z5SzU&feature=grec_index , like aliens exist or something. http://www.youtube.com/watch?v=sfN-7HczmOU&feature=grec_index and here's a secure site https://familyhistory.hhs.gov, unless you use curl or allow secure connections it will never get a title. <br /> This is a not valid site http://zzzzzzz and this is a dead site http://zwzwzwxzw.com.<br /> Lastly lets try an already made hyperlink and see what it does <a href='http://tacobell.com'>taco bell</a>";
//echo titleHyper($text);

Remove protocl and subdomain from URL

I have a string like this:
http://www.downlinegoldmine.com/viralmarketing
I need to remove http://www. from the string if it exists, as well as http:// if www is not included.
In few words I just need the domain name without any protocol.

parse_url is the perfect tool for the job. You would first call it to split the url in parts, then check the hostname part to see if it starts with www. and strip it, then assemble the url back.
Update: code
echo normalize_url('http://www.downlinegoldmine.com/viralmarketing');
function normalize_url($url) {
$parts = parse_url($url);
unset($parts['scheme']);
if (substr($parts['hostname'], 0, 4) == 'www.') {
$parts['hostname'] = substr($parts['hostname'], 4);
}
if (function_exists('http_build_url')) {
// This PECL extension makes life a lot easier
return http_build_url($parts);
}
// Otherwise it's the hard way
$result = null;
if (!empty($parts['username'])) {
$result .= $parts['username'];
if (!empty($parts['password'])) {
$result .= ':'.$parts['password'];
}
$result .= '#';
}
$result .= $parts['host'].$parts['path'];
if (!empty($parts['query'])) {
$result .= '?'.$parts['query'];
}
if (!empty($parts['fragment'])) {
$result .= '#'.$parts['fragment'];
}
return $result;
}
See it in action.

Just use parse_url (see: http://php.net/manual/de/function.parse-url.php ). It will also incorporate different protocols and paths etc.

$nvar = preg_replace("#http://(www\.)?#i", "", "http://www.downlinegoldmine.com/viralmarketing");
Test:
php> echo preg_replace("#http://(www\.)?#i", "", "http://www.downlinegoldmine.com/viralmarketing");
downlinegoldmine.com/viralmarketing
php> echo preg_replace("#http://(www\.)?#i", "", "http://downlinegoldmine.com/viralmarketing");
downlinegoldmine.com/viralmarketing

There's probably a better way, but:
$url = preg_replace("#^(http://)?(www\\.)?#i", "", $url);

$url = strncmp('http://', $url, 7) ? $url : substr($url, 7);
$url = strncmp('www.', $url, 4) ? $url : substr($url, 4);

You can use the following to remove the https://, http://, and www. from a url.
$url = 'http://www.downlinegoldmine.com/viralmarketing';
echo preg_replace('/https?:\/\/|www./', '', $url);
above returns downlinegoldmine.com/viralmarketing
and you can use the following to remove the urls path as well as the https://, http://, and www..
$url = 'http://www.downlinegoldmine.com/viralmarketing';
echo implode('/', array_slice(explode('/',preg_replace('/https?:\/\/|www./', '', $url)), 0, 1));
above returns downlinegoldmine.com

How to separate possible URI from other content in PHP?

What is the simplest and fastest way to check if string is single URL or TEXT (that might contain urls)
possible scenarios:
// successful scenario
$example[] = 'http://sub-domain.my-domain.com/folder/file.php?some=param';
// successful scenario
$example[] = '/assets/scripts/jquery.min.js?v=1.4';
// successful scenario
$example[] = 'jquery.min.js';
// this scenario should fail validation
$example[] = "http://www.domain.com welcome text\n and some other http://www.domain.com";
// this scenario should fail validation
$example[] = "scriptVar=50;";
I have tried to use native php functions like parse_url, filter_var but non of them work as expected.
UPDATE 1
To make it more clear, I'm trying to separate possible URI from script content that would be inserted as DOM element. All urls would go as SRC attribute and rest as content, example:
<script type="text/javascript" src="{$string}"></script>
<script type="text/javascript">{$string}</script>
UPDATE 2
By analysing possible content I come to conclusion that string containing white space character or semicolon mean that string could not be URI, I presume that this pattern could solve my problem:
preg_match('/[\s]|[;]/', $string);
would it cover all possible javascript/css code?

$exampleData = Array(
'http://sub-domain.my-domain.com/folder/file.php?some=param',
'/assets/scripts/jquery.min.js?v=1.4',
'<a href="/assets/scripts/jquery.min.js?v=1.4">',
'<a href="assets/scripts/jquery.min.js?v=1.4">',
'http://www.domain.com welcome text\n and some other http://www.domain.com',
);
foreach($exampleData as $example)
{
echo "Trying \"" . $example . "\" -> ";
echo (preg_match('%((http(s)?://|www\.)[^ \r\n]+|<a.+?href=(\'|")(http(s)?://|www\.|[^#])[^\4\r\n]*?\4.*?>)%i', $example)) ?
"Match" : "No match";
echo "\r\n";
}
This would produce:
Trying "http://sub-domain.my-domain.com/folder/file.php?some=param" -> Match
Trying "/assets/scripts/jquery.min.js?v=1.4" -> No match
Trying "<a href="/assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "<a href="assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "http://www.domain.com welcome text\n and some other http://www.domain.com" -> Match
Update:
After reading your last update. If you want to parse HTML. Use a DOM-parser like:
http://simplehtmldom.sourceforge.net/
Example:
include_once('simple_html_dom.php');
$dom = file_get_html('http://www.stackoverflow.com/');
foreach($dom->find('script') as $scriptElement)
{
if(strlen(trim($scriptElement->src)) > 0)
{
// Script with URI set
echo "<strong>Found script with URI</strong>";
echo "<p>" . $scriptElement->src . "</p>";
}
else
{
// Script with content
echo "<strong>Found script with content</strong>";
echo("<p>" . nl2br(htmlspecialchars($scriptElement->innertext)) . "</p>");
}
}
Would output something like(HTML stripped):
Found script with URI
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
Found script with URI
http://sstatic.net/js/master.min.js?v=afc76d4deac3
Found script with content
var imagePath='http://sstatic.net/stackoverflow/img/';
var inboxUnviewedCount = -1;
...etc

This function will return true if the passed text is an URL. It is based on a regex seen here on SO.
function validate_url ($url)
{
$regex = '/^(https?|ftp):\/\/'; //protocol
$regex .= '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'; //username
$regex .= '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'; //password
$regex .= '#)?'; //auth requires #
$regex .= '((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*'; //domain segments AND
$regex .= '[a-z][a-z0-9-]*[a-z0-9]'; //top level domain OR
$regex .= '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}';
$regex .= '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'; //IP address
$regex .= ')(:\d+)?'; //port
$regex .= ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
You can try it here: http://www.exorithm.com/algorithm/view/validate_url
EDIT in response to comment, this function will validate URL fragments like /index.php or index.php
function validate_url_fragment ($url)
{
$regex = '/^(((\/?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
if (validate_url_fragment($url) || validate_url($url)) {
//is url
} else {
//not url
}
(note that the empty string is valid, so you may want a special case for that)

filter_var should do what you want for a single URL:
<?php
$safe_url = filter_var( $unsafe_url, FILTER_SANITIZE_URL );
?>

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP Check if url in username and replace it - php

Related

Could any one say what is the my mistake in this regex

Preg-replace - replace all URLs except a domain and its subdomains

Php link active

Remove protocl and subdomain from URL

How to separate possible URI from other content in PHP?

Categories

Resources