How to find url using preg_match in php [duplicate] - php

I was wondering how I could check a string broken into an array against a preg_match to see if it started with www. I already have one that check for http://www.
function isValidURL($url)
{
return preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}
$stringToArray = explode(" ",$_POST['text']);
foreach($stringToArray as $key=>$val){
$urlvalid = isValidURL($val);
if($urlvalid){
$_SESSION["messages"][] = "NO URLS ALLOWED!";
header("Location: http://www.domain.com/post/id/".$_POST['postID']);
exit();
}
}
Thanks!
Stefan

You want something like:
%^((https?://)|(www\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i
this is using the | to match either http:// or www at the beginning. I changed the delimiter to % to avoid clashing with the |

John Gruber of Daring Fireball has posted a very comprehensive regex for all types of URLs that may be of interest. You can find it here:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls

I explode the string at first as the url might be half way through it e.g. hello how are you www.google.com
Explode the string and use a foreach statement.
Eg:
$string = "hello how are you www.google.com";
$string = explode(" ", $string);
foreach ($string as $word){
if ( (strpos($word, "http://") === 0) || (strpos($word, "www.") === 0) ){
// Code you want to excute if string is a link
}
}
Note you have to use the === operator because strpos can return, will return a 0 which will appear to be false.

I used this below which allows you to detect url's anywhere in a string. For my particular application it's a contact form to combat spam so no url's are allowed. Works very well.
Link to resource: https://css-tricks.com/snippets/php/find-urls-in-text-make-links/
My implementation;
<?php
// Validate message
if(isset($_POST['message']) && $_POST['message'] == 'Include your order number here if relevant...') {
$messageError = "Required";
} else {
$message = test_input($_POST["message"]);
}
if (strlen($message) > 1000) {
$messageError = "1000 chars max";
}
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
if (preg_match($reg_exUrl, $message)) {
$messageError = "Url's not allowed";
}
// Validate data
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
?>

Try implode($myarray, '').strstr("www.")==0. That implodes your array into one string, then checks whether www. is at the beginning of the string (index 0).

Related

Find URL in string and turn into a link

I'm using the code given on this page to look through a string and turn the URL into an HTML link.
It works quite well, but there is a little issue with the "replace" part of it.
The problem occurs when I have almost identical links. For example:
https://example.com/page.php?goto=200
and
https://example.com/page.php
Everything will be fine with the first link, but the second will create a <a> tag in the first <a> tag.
First run
https://example.com/page.php?goto=200
Second
https://example.com/page.php?goto=200">https://example.com/page.php?goto=200</a>
Because it's also replacing the html link just created.
How do I avoid this?
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = ''.$link.'';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
?>
You need to add a whitespace identifier \s in your regex at the start, also remove \b because \b only returns the last match.
You regex can written as:
$reg_exUrl = "/(?i)\s((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/"
check this one: https://regex101.com/r/YFQPlZ/1
I have change the replace part a bit, since I couldn't get the suggested regex to work.
Maybe it can be done better, but I'm still learning :)
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $key => $newLinks){
if(strstr( $newLinks, ":" ) === false){
$url = 'https://'.$newLinks;
}else{
$url = $newLinks;
}
// Create Search and Replace strings
$replace .= ''.$url.',';
$newLinks = '/'.preg_quote($newLinks, '/').'/';
$string = preg_replace($newLinks, '{'.$key.'}', $string, 1);
}
$arr_replace = explode(',', $replace);
foreach ($arr_replace as $key => $link) {
$string = str_replace('{'.$key.'}', $link, $string);
}
}
//Return result
return $string;
}

PHP code to create a negative word dictionary and search if a post has negative words

I'm trying to develop a PHP application where it takes comments from users and then match the string to check if the comment is positive or negative. I have list of negative words in negative.txt file. If a word is matched from the word list, then I want a simple integer counter to increment by 1. I tried the some links and created the a code to check if the comment has is negative or positive but it is only matching the last word of the file.Here's the code what i have done.
<?php
function teststringforbadwords($comment)
{
$file="BadWords.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
$newline_ele = "\n";
$data_split = explode($newline_ele, $fread);
$new_tab = "\t";
$outoutArr = array();
//process uploaded file data and push in output array
foreach ($data_split as $string)
{
$row = explode($new_tab, $string);
if(isset($row['0']) && $row['0'] != ""){
$outoutArr[] = trim($row['0']," ");
}
}
//---------------------------------------------------------------
foreach($outoutArr as $word) {
if(stristr($comment,$word)){
return false;
}
}
return true;
}
if(isset($_REQUEST["submit"]))
{
$comments = $_REQUEST["comments"];
if (teststringforbadwords($comments))
{
echo 'string is clean';
}
else
{
echo 'string contains banned words';
}
}
?>
Link Tried : Check a string for bad words?
I added the strtolower function around both your $comments and your input from the file. That way if someone spells STUPID, instead of stupid, the code will still detect the bad word.
I also added trim to remove unnecessary and disruptive whitespace (like newline).
Finally, I changed the way how you check the words. I used a preg_match to split about all whitespace so we are checking only full words and don't accidentally ban incorrect strings.
<?php
function teststringforbadwords($comment)
{
$comment = strtolower($comment);
$file="BadWords.txt";
$fopen = fopen($file, "r");
$fread = strtolower(fread($fopen,filesize("$file")));
fclose($fopen);
$newline_ele = "\n";
$data_split = explode($newline_ele, $fread);
$new_tab = "\t";
$outoutArr = array();
//process uploaded file data and push in output array
foreach ($data_split as $bannedWord)
{
foreach (preg_split('/\s+/',$comment) as $commentWord) {
if (trim($bannedWord) === trim($commentWord)) {
return false;
}
}
}
return true;
}
1) Your storing $row['0'] only why not others index words. So problem is your ignoring some of word in text file.
Some suggestion
1) Insert the text in text file one by one i.e new line like this so you can access easily explode by newline to avoiding multiple explode and loop.
Example: sss.txt
...
bad
stupid
...
...
2) Apply trim and lowercase function to both comment and bad string.
Hope it will work as expected
function teststringforbadwords($comment)
{
$file="sss.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
foreach(explode("\n",$fread) as $word)
{
if(stristr(strtolower(trim($comment)),strtolower(trim($word))))
{
return false;
}
}
return true;
}

PHP Search for a string in remote webpage

I am trying to write a script with PHP where it'll open up a text file ./urls.txt and check each domain for a specific word. I am really new to PHP.
Example:
Look for the word "Hello" in the following domains.
List:
Domain1.LTD
Domain2.LTD
Domain3.LTD
and just simply print out domain name + valid/invalid.
<?PHP
$link = "http://yahoo.com"; //not sure how to loop to read each line from a file.
$linkcontents = file_get_contents($link);
$needle = "Hello";
if (strpos($linkcontents, $needle) == false) {
echo "Valid";
} else {
echo "Invalid";
}
?>
$arrayOfLinks = array(
"http://example.com/file.txt",
"https://www.example-site-2.com/files/file.txt"
);
$needle = "Hello";
foreach($arrayOfLinks as $link){ // loop through the array
$linkcontents = file_get_contents($link);
if (stripos($linkcontents , $needle) !== false) { // stripos is case-insensitive search
// the needle exists in $linkcontents
// !== false instead of != false since stripos can return 0 meaning the needle is the first word of the contents
echo "Valid";
} else {
// the word does not exist in the given text
echo "Invalid";
}
}
First of all use CURL in favor of file_get_contents() because of security.
This would be a correct strpos example:
//your previous code
if (strpos($linkcontents, $needle) !== false) { // see the !==
echo "Valid"; //Needle found
} else {
echo "Invalid"; //Needle not found
}
For more complex crawling you could use some regexp instead of strpos.

Check and get the text after "near " in php

When someone writes:
"Near Tokyo"
I would like to check first if the $search contains "near" and if it does then take the "Tokyo" into a variable $location.
I tried this:
if(strpos($search, 'near') == true){
$search = explode("near ", $location);
echo $location;
exit();
}
did not work, it does not execute the if statement
You have multiple bugs here:
strpos may return 0, which signifies a match but will not compare equal to true
strpos is case-sensitive, which would make your example not work (look into stripos instead)
explode is also case-sensitive
It would probably be easiest to use a regex for this:
$input = "Near Tokyo";
if (preg_match('/near\s+(\w+)/i', $input, $matches)) {
echo "Near: ".$matches[1]."\n";
}
else {
echo "No match.\n";
}
See it in action.
This particular regex will only match the next word after "near", but this can be modified to suit your requirements.
returntype of strpos is int, not bool
http://php.net/manual/en/function.strpos.php
so use (according to manual pages) this:
if(strpos($search, 'near') !== false)
change it to:
if(strpos($search, 'near') !== false){
$search = explode("near ", $location);
echo $location;
exit();
}
just take a look at the documentation where this behaviour is explained:
It is easy to mistake the return values for "character found at
position 0" and "character not found". Here's how to detect the
difference:
<?php
$pos = strrpos($mystring, "b");
if ($pos === false) { // note: three equal signs
// not found...
}
?>
Yes, strpos, cumbersome boolean result handling. You probably should or should want to use stristr instead, which is also case-insensitive:
if (stristr($search, "Near")) {
And since you are extracting text anyway, why not use a regex? (People are using the awful explode workaround way too often.)
if (preg_match("'Near (\S+)'i", $search, $match)) {
echo $match[1];
}
use this to get a result:
if(strpos($search, 'near') !== false){
$location = explode("near ", $search);
print_r($location);
exit();
}
$search = "Near Tokyo";
if(strpos($search, 'Near') === 0){
$location = explode("Near ", $search);
echo $location[1];
exit();
}
<?php
$search = "Near Tokoyo";
if(preg_match("/near ([a-z]+)/i", $search, $match))
{
$location = $match[1];
echo $location;
}
?>
EDIT:
Fixed some bugs in your code.
if(stripos($search, 'near') !== false){
$location = explode("near ", $search);
echo $location[0];
exit();
}

Detecting a url using preg_match? without http:// in the string

I was wondering how I could check a string broken into an array against a preg_match to see if it started with www. I already have one that check for http://www.
function isValidURL($url)
{
return preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}
$stringToArray = explode(" ",$_POST['text']);
foreach($stringToArray as $key=>$val){
$urlvalid = isValidURL($val);
if($urlvalid){
$_SESSION["messages"][] = "NO URLS ALLOWED!";
header("Location: http://www.domain.com/post/id/".$_POST['postID']);
exit();
}
}
Thanks!
Stefan
You want something like:
%^((https?://)|(www\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i
this is using the | to match either http:// or www at the beginning. I changed the delimiter to % to avoid clashing with the |
John Gruber of Daring Fireball has posted a very comprehensive regex for all types of URLs that may be of interest. You can find it here:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
I explode the string at first as the url might be half way through it e.g. hello how are you www.google.com
Explode the string and use a foreach statement.
Eg:
$string = "hello how are you www.google.com";
$string = explode(" ", $string);
foreach ($string as $word){
if ( (strpos($word, "http://") === 0) || (strpos($word, "www.") === 0) ){
// Code you want to excute if string is a link
}
}
Note you have to use the === operator because strpos can return, will return a 0 which will appear to be false.
I used this below which allows you to detect url's anywhere in a string. For my particular application it's a contact form to combat spam so no url's are allowed. Works very well.
Link to resource: https://css-tricks.com/snippets/php/find-urls-in-text-make-links/
My implementation;
<?php
// Validate message
if(isset($_POST['message']) && $_POST['message'] == 'Include your order number here if relevant...') {
$messageError = "Required";
} else {
$message = test_input($_POST["message"]);
}
if (strlen($message) > 1000) {
$messageError = "1000 chars max";
}
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
if (preg_match($reg_exUrl, $message)) {
$messageError = "Url's not allowed";
}
// Validate data
function test_input($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
?>
Try implode($myarray, '').strstr("www.")==0. That implodes your array into one string, then checks whether www. is at the beginning of the string (index 0).

Categories