php script to recognize text from link


I am trying to create a PHP script for displaying messages. If a message includes a web address, I want that address to be displayed as a link. This is my code, which works successfully:
<?php
if( (substr( $message, 0, 8 ) === "https://") || (substr( $message, 0, 7 ) === "http://") ){
echo "<a href='$message' target='_blank'> $message </a>";
}else{
echo " $message ";
}
?>
It works perfectly if the user enters only a web address as the message, like "http://google.com". The problem starts when the user enters text before or after the web address. For example, for "visit http://google.com site" it either turns the whole phrase into a link or does not recognise the web address among the words at all. Any idea how to fix this problem?

You may use filter_var with FILTER_VALIDATE_URL:
$words = explode(" ", $message);
$_words = array();
foreach($words as $word){
if(filter_var($word, FILTER_VALIDATE_URL) === false){
$_words[] = $word;
}
else{
$_words[] = "<a href='$word' target='_blank'>$word</a>";
}
}
echo implode(" ", $_words);
Demo: http://phpfiddle.org/main/code/mu8-vg5

I use this within a class:
public static function CreateLinks($text) {
    return preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w\-.~:/?#\[\]\@!$&\'()*+,;=%]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $text);
}
To use it without a class, do this:
$message = preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w\-.~:/?#\[\]\@!$&\'()*+,;=%]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $message);
So in a test case, this:
$message = "Hello, take a look at http://www.google.com or wait! Maybe you were looking for http://www.bing.com";
$message = preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w\-.~:/?#\[\]\@!$&\'()*+,;=%]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $message);
echo $message;
will output:
Hello, take a look at <a href="http://www.google.com" target="_blank">http://www.google.com</a> or wait! Maybe you were looking for <a href="http://www.bing.com" target="_blank">http://www.bing.com</a>
So, in the end, your code can be replaced by a single line. Replace this:
if( (substr( $message, 0, 8 ) === "https://") || (substr( $message, 0, 7 ) === "http://") ){
echo "<a href='$message' target='_blank'> $message </a>";
}else{
echo " $message ";
}
by this:
$message = preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w\-.~:/?#\[\]\@!$&\'()*+,;=%]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $message);
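One thing the one-line replacement does not do is escape the rest of the message, so markup typed by the user ends up in the page as-is. Below is a minimal sketch of one way to handle that, assuming the message is untrusted input. The function name linkify and the simplified URL pattern are my own illustrative choices, not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer above): split the text into
// URL and non-URL pieces, escape every piece, and wrap only the URLs.
function linkify(string $text): string
{
    // Simplified pattern: http(s) scheme, then non-space characters,
    // not ending in common sentence punctuation.
    $pattern = '~(https?://[^\s<>"\']*[^\s<>"\'.,!?;:)])~i';

    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, odd-indexed
    // pieces are the URLs and even-indexed pieces are plain text.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" target="_blank" rel="noopener">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}

echo linkify('visit http://google.com site');
```

Escaping after the split means a "&" inside a URL becomes "&amp;" in the href, which is what HTML expects, while text such as "<b>" outside a URL is shown literally instead of being rendered.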

Jan Goyvaerts (Regex Guru) describes this pretty well on his blog:
http://www.regexguru.com/2008/11/detecting-urls-in-a-block-of-text/
To find all matches in a multi-line string, use
preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
You can use the preg_match function to get a single match.

Related

preg_replace replaces url with hyperlink between [link] and [/link] but link can't contain ? or ()

My code:
preg_match_all('(\[(link)\](.*?)\[/(link)\])', $message, $matches);
$matches = $matches[2];
foreach($matches as $match){
//CHECK LINK AND VERIFY
$message = preg_replace('(\[(link)\]('.$match.')\[/(link)\])', '<a href="'.$match.'">'.$match.'</a>', $message);
}
As you can see here: https://mcskripts.dk/forum/id/286
The script works, but it can't replace links containing () or ?
Is there any way I can fix that?
Sorry if this is a repost; I just don't know whether I can comment on old posts and still get a response.

You could use a single preg_replace:
$message = preg_replace('~\[link\](.+?)\[/link\]~', '<a href="$1">$1</a>', $message);
If you want to validate the links before replacing them, use preg_replace_callback:
$message = preg_replace_callback(
    '~\[link\](.+?)\[/link\]~',
    function($match) {
        // call your function to validate the link
        if (validate_link($match[1])) {
            return '<a href="'.$match[1].'">'.$match[1].'</a>';
        } else {
            return 'What you want when validation fails!';
        }
    },
    $message
);
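The validate_link() call above is only a placeholder. A minimal sketch of what it could look like, assuming that only http and https links should be accepted, is shown below; the body is my own illustration, not the answerer's function. As a side note, the loop in the question breaks on ? and () because the link text is interpolated into the pattern without preg_quote(), so those characters act as regex metacharacters; the callback version never builds a pattern from the link text.

```php
<?php
// Hypothetical implementation of the validate_link() placeholder.
function validate_link(string $url): bool
{
    // FILTER_VALIDATE_URL checks the overall URL syntax.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Accept web schemes only, so "javascript:" and similar are rejected.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

$message = 'See [link]https://example.com/a?b=(c)[/link] and [link]javascript:alert(1)[/link]';
$message = preg_replace_callback(
    '~\[link\](.+?)\[/link\]~',
    function (array $m): string {
        // Escape the link text in both branches before it reaches the page.
        $safe = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
        return validate_link($m[1])
            ? '<a href="' . $safe . '">' . $safe . '</a>'
            : $safe; // invalid links stay plain text
    },
    $message
);
echo $message;
```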

Regular expressions in PHP / find "<a> </a>"

I want to check a textarea. If the user enters links in the textarea, PHP should automatically tag them. I'm using this code:
$message = "text with some link within";
$url = '#(?!<a[^>]*?>)(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])(?![^<]*?</a>)#';
if(preg_match($url, $message) == 1){
$message = preg_replace($url, '$0', $message);
}
The problem is that when there is already a tagged link (with an "a" tag), the regex destroys it.
Here is an example:
first input from textarea: Hello .... test.com
changed by regex: Hello ... test.com
this is working fine, but if you update this:
Hello ... http://test.com" target="_blank" rel="nofollow" title="test.com" target="_blank" rel="nofollow" title="test.com">test.com">test.com">test.com
Thanks for your help!

I'm not familiar with PHP, and this may not be a good pattern for URL validation, but the point is that if there is already an "a" tag, the text is not replaced.
<?php
$message = array(
'Hello ... test.com',
"Hello .... http://www.test.com ..."
);
$url = '#(<a[^>]*>[^<]+</a>|((https?://)?[\w\.-]+\.[a-zA-Z]{2,3}[^\s\W]*))#';
foreach ($message as $msg) {
preg_match($url, $msg, $matches);
if(preg_match($url, $msg) == 1 && count($matches) > 2) {
$msg = preg_replace($url, '$0', $msg);
}
echo $msg.PHP_EOL;
}
// Output:
// Hello ... test.com
// Hello .... http://www.test.com ...
Hope it helps.
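The same idea (match an existing anchor first, so the URL inside it is never reached) can also be written with preg_replace_callback. The sketch below is only an illustration under my own assumptions: the function name linkUnlinkedUrls is made up, and it only handles URLs that start with http:// or https://, not bare domains like test.com.

```php
<?php
// Hypothetical helper: link bare URLs, leave existing <a>...</a> alone.
function linkUnlinkedUrls(string $html): string
{
    // Alternative 1 matches a complete anchor element.
    // Alternative 2 (capture group 1) matches a bare http(s) URL.
    $pattern = '~<a\b[^>]*>.*?</a>|(https?://[^\s<>"\']+)~is';

    return preg_replace_callback($pattern, function (array $m): string {
        if (empty($m[1])) {
            return $m[0]; // an existing anchor: return it unchanged
        }
        return '<a href="' . $m[1] . '" target="_blank" rel="nofollow">' . $m[1] . '</a>';
    }, $html);
}

$once  = linkUnlinkedUrls('Hello http://test.com');
$twice = linkUnlinkedUrls($once); // running it again changes nothing
echo $once, PHP_EOL, $twice, PHP_EOL;
```

Because the regex engine scans from left to right, the anchor alternative consumes the whole existing element before the URL alternative gets a chance to match inside it, which is what makes repeated saves of the textarea safe.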

Php Regex - How to pick if equals something

class Something
{
public static function compile(&$subject, $replace, $with) {
$placeholders = array_combine($replace, $with);
$condition = '{[a-z0-9\_\- ]+:[a-z_]+}';
$inner = '((?:(?!{/?if).)*?)';
$pattern = '#{if\s?('.$condition.')}'.$inner.'{/if}#is';
while (preg_match($pattern, $subject, $match)) {
$placeholder = $match[1];
$content = $match[2];
// if empty value remove whole line
// else show line but remove pseudo-code
$subject = preg_replace($pattern,
empty($placeholders[$placeholder]) ? '' : addcslashes($content, '$'),
$subject,
1);
}
}
}
I have an HTML code area to work with; Dompdf handles converting my form to PDF. This class came almost ready-made, and it lets me write something like this in the HTML area:
{if {dil:value}} <div class="english">ENGLISH</div> {/if}
However, it only checks whether the radio button value is empty. I want to know which option is selected, so I would like to write something like this:
{if {dil:value}=='ENGLISH'} <div class="english">ENGLISH</div> {/if}
To see whether it works, I replaced the empty check with a hard-coded equality check:
// if empty value remove whole line
// else show line but remove pseudo-code
$subject = preg_replace($pattern,
$placeholders[$placeholder]=='english' ? '' : addcslashes($content, '$'),
$subject,
1);
That worked, but without any flexibility, of course (only when the radio value equals english). I'm new to regex, so I couldn't figure out the general case. I tried adding an $equality pattern so that I could reuse the code with different checks:
class Something
{
public static function compile(&$subject, $replace, $with) {
$placeholders = array_combine($replace, $with);
$condition = '{[a-z0-9\_\- ]+:[a-z_]+}';
$inner = '((?:(?!{/?if).)*?)';
$equality = '(?<=~)[^}]*(?=~)';
$pattern = '#{if\s?('.$condition.')'.$equality.'}'.$inner.'{/if}#is';
while (preg_match($pattern, $subject, $match)) {
$placeholder = $match[1];
$content = $match[2];
// if empty value remove whole line
// else show line but remove pseudo-code
$subject = preg_replace($pattern,
$placeholders[$placeholder]==$equality ? '' : addcslashes($content, '$'),
$subject,
1);
}
}
}
with this code in the HTML area:
{if {dil:value}~'ENGLISH'~} <div class="english">ENGLISH</div> {/if}
I really believed it would work, but it didn't :) The pseudo-code wasn't even removed during conversion, so I can read it in the generated PDF file.
I may be missing something about $match, too. With the equality pattern included, $match should get a third element, I guess, so I also tried this, but it didn't work either:
while (preg_match($pattern, $subject, $match)) {
$placeholder = $match[1];
$equality = $match[2];
$content = $match[3];
My goal is to put check icons on specific boxes of a pre-designed form image. I have handled all the text boxes with absolute positioning; the only thing missing is finding out which radio button is checked.
Thank you for all your help.

I solved it like this:
class Something
{
public static function compile(&$subject, $replace, $with) {
$placeholders = array_combine($replace, $with);
$condition = '{[a-z0-9\_\- ]+:[a-z_]+}';
$inner = '((?:(?!{\/?if).)*?)';
$equality = '(~[A-ZÇŞĞÜÖİçşğüöı]+~)';
$pattern = '#{if\s?('.$condition.')}'.$equality.''.$inner.'{\/if}#is';
while (preg_match($pattern, $subject, $match)) {
$placeholder = $match[1];
$equality = $match[2];
$content = $match[3];
$equality = substr($equality, 1, -1);
// if empty value remove whole line
// else show line but remove pseudo-code
$subject = preg_replace($pattern,
$placeholders[$placeholder] == $equality ? '' : addcslashes($content, '$'),
$subject,
1);
}
}
}
I changed my equality regex to support UTF-8 characters and adjusted the other regexes a bit; there were unescaped slashes. Finally, I can use it in the HTML like this:
{if {language:value}}~FRENCH~<div class="english">ENGLISH</div>{/if}
So the first match is the radio button value, the second is my equality check, and the third is what will be written if the values are not equal. Yes, I know it sounds a bit reversed :) It probably won't work with radio buttons that have more than two options, but in my case all radio buttons have exactly two.
So far so good. I kind of learned regex today :)
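For the non-reversed form asked about in the question ({if {dil:value}=='ENGLISH'} ... {/if}), here is a minimal sketch using preg_replace_callback. It is only an illustration under my own assumptions: the function name compileConditionals is made up, the compared value has to be wrapped in single quotes, and nested {if} blocks are not handled.

```php
<?php
// Hypothetical variant of compile(): keep the block only when the
// placeholder's value equals the quoted literal in the condition.
function compileConditionals(string $subject, array $placeholders): string
{
    // Group 1: the placeholder, e.g. {dil:value}
    // Group 2: the literal between the single quotes
    // Group 3: the content up to the matching {/if}
    $pattern = '#\{if\s?(\{[a-z0-9_\- ]+:[a-z_]+\})==\'([^\']*)\'\}((?:(?!\{/?if).)*?)\{/if\}#is';

    return preg_replace_callback($pattern, function (array $m) use ($placeholders): string {
        [, $placeholder, $expected, $content] = $m;
        $actual = $placeholders[$placeholder] ?? '';
        return $actual === $expected ? $content : '';
    }, $subject);
}

$html = "{if {dil:value}=='ENGLISH'}<div class=\"english\">ENGLISH</div>{/if}"
      . "{if {dil:value}=='FRENCH'}<div class=\"french\">FRENCH</div>{/if}";

echo compileConditionals($html, ['{dil:value}' => 'ENGLISH']);
// prints: <div class="english">ENGLISH</div>
```

Since each block carries its own expected value, this also works for radio groups with more than two options: every option gets its own {if} block and only the matching one survives.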

PHP find all links in the text

I want to find all links in the text like this:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
I know I need to use preg_match_all, but I only have a rough idea: start matching at http|https|ftp and stop at a space or at the end of the text. That is really all I need for all the links to be found properly.
Can anyone help me with a PHP regexp pattern?
I think I need to use assertions at the end of the pattern, but I don't understand how to use them properly yet.
Any ideas? Thanks!

I'd go with something simple like ~[a-z]+://\S+~i
[a-z]+:// matches the protocol
\S+ then matches one or more non-whitespace characters, where \S is shorthand for [^ \t\r\n\f]
the i modifier (PCRE_CASELESS) makes the match case-insensitive (possibly not really necessary)
So it could look like this:
$pattern = '~[a-z]+://\S+~i';
$str = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
if($num_found = preg_match_all($pattern, $str, $out))
{
echo "FOUND ".$num_found." LINKS:\n";
print_r($out[0]);
}
outputs:
FOUND 4 LINKS:
Array
(
[0] => http://hello.world
[1] => http://google.com/file.jpg
[2] => https://hell.o.wor.ld/test?qwe=qwe
[3] => http://test.test/test
)
Test on eval.in

function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$link.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}

<?php
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://google.com";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, '<a href="$0">$0</a> ', $text);
} else {
// if no urls in the text just return the text
echo $text;
}
?>
Reference:http://css-tricks.com/snippets/php/find-urls-in-text-make-links/

Works like a charm. Use this:
$str= "Test text http://hello.world";
preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $str, $result, PREG_PATTERN_ORDER);
print_r($result[0]);

The suggested answers are great, but one of them misses the www. case and the other misses the http:// case.
So, let's combine them:
$text = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
preg_match_all('/(((http|https|ftp|ftps)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\:[0-9]+)?(\/\S*)?/', $text, $results, PREG_PATTERN_ORDER);
print_r($results[0]);
The return value for PREG_PATTERN_ORDER will be Array of Arrays (results) so that $results[0] is an array of full pattern matches, $results[1] is an array of strings matched by the first parenthesized subpattern, and so on.
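To make the difference in result layout concrete, here is a small sketch comparing PREG_PATTERN_ORDER with PREG_SET_ORDER. The sample text and the deliberately simple pattern are my own and are not the combined pattern from this answer.

```php
<?php
$text = 'a http://one.example b https://two.example/x';
$pattern = '~(https?)://(\S+)~';

// PREG_PATTERN_ORDER: one array per group, across all matches.
preg_match_all($pattern, $text, $byPattern, PREG_PATTERN_ORDER);
print_r($byPattern[0]); // all full matches
print_r($byPattern[1]); // all schemes (first parenthesized subpattern)

// PREG_SET_ORDER: one array per match, holding all of its groups.
preg_match_all($pattern, $text, $bySet, PREG_SET_ORDER);
print_r($bySet[1]);     // full match and groups of the second URL only
```

PREG_PATTERN_ORDER is convenient when you only want the list of URLs; PREG_SET_ORDER is easier to loop over when each match needs its own groups.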

function turnUrlIntoHyperlink($string)
{
    // The regular expression filter ("~" delimiters, so the slashes need no escaping)
    $reg_exUrl = '~(http|https|ftp|ftps)://[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(/\S*)?~';
    // Check if there is a URL in the text
    if (preg_match($reg_exUrl, $string)) {
        // Turn every URL into a hyperlink; $0 is the text of each individual match
        echo preg_replace($reg_exUrl, '<a target="_blank" href="$0">$0</a>', $string);
    } else {
        // If there are no URLs in the text, just print the text
        echo $string;
    }
}

For converting URLs to <a> tags, and recognizing URLs without http/https, try the code below. It uses preg_replace_callback to avoid the issue in one of the other answers, where the same URL appearing multiple times gets replaced more than once:
private function convertUrls($string) {
$url_pattern = '/(((http|https)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\:[0-9]+)?(\/\S*)?/';
return preg_replace_callback($url_pattern,
function($matches) {
$match = $matches[0];
if (strstr($match, ":") === false) {
$url = "https://$match";
} else {
$url = $match;
}
return '<a href="' . $url . '">' . $url . '</a>';
},
$string);
}

An alternative to regexp is to use this library.
It works very well, but not for very complex code.
foreach($html->find('a') as $element)
echo $element->href . '<br>';
And it is easy to use. No regular expression skills required :-)

Not regexp, but it finds them all and makes sure that they are not already wrapped in a tag. It also handles links that are enclosed in (), [], "" or any other pair of opening and closing punctuation.
$txt = "Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27";
$holder = explode("http",$txt);
for($i = 1; $i < (count($holder));$i++) {
if (substr($holder[$i-1],-6) != 'href="') { // this means that the link is not alread in an a tag.
if (strpos($holder[$i]," ")!==false) //if the link is not the last item in the text block, stop at the first space
$href = substr($holder[$i],0,strpos($holder[$i]," "));
else //else it is the last item, take it
$href = $holder[$i];
if (ctype_punct(substr($holder[$i-1],strlen($holder[$i-1])-1)) && ctype_punct(substr($holder[$i],strlen($holder[$i])-1)))
$href = substr($href,0,-1); //if both the fron and back of the link are encapsulated in punctuation, truncate the link by one
$holder[$i] = implode("$href\" target=\"_blank\" class=\"link\">http$href</a>",explode($href,$holder[$i]));
$holder[$i-1] .= "<a href=\"";
}
}
$txt = implode("http",$holder);
echo $txt;
Output:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27

I use this function:
<?php
function deteli($string){
    $pos = strpos($string, 'http');
    if ($pos === false) {
        return $string; // no link in the text
    }
    $spos = strpos($string, ' ', $pos);
    if ($spos === false) {
        $spos = strlen($string); // the link runs to the end of the text
    }
    $bef = substr($string, 0, $pos);
    $link = substr($string, $pos, $spos - $pos);
    $aft = substr($string, $spos);
    // Only the first link in the text is replaced
    return $bef . "<a href='" . $link . "' class='link' target='_blank'>link</a>" . $aft;
}?>

How to separate possible URI from other content in PHP?

What is the simplest and fastest way to check whether a string is a single URL or text (that might contain URLs)?
Possible scenarios:
// successful scenario
$example[] = 'http://sub-domain.my-domain.com/folder/file.php?some=param';
// successful scenario
$example[] = '/assets/scripts/jquery.min.js?v=1.4';
// successful scenario
$example[] = 'jquery.min.js';
// this scenario should fail validation
$example[] = "http://www.domain.com welcome text\n and some other http://www.domain.com";
// this scenario should fail validation
$example[] = "scriptVar=50;";
I have tried native PHP functions such as parse_url and filter_var, but none of them work as expected.
UPDATE 1
To make it clearer: I'm trying to separate a possible URI from script content that will be inserted as a DOM element. All URLs would go into the SRC attribute and everything else into the content, for example:
<script type="text/javascript" src="{$string}"></script>
<script type="text/javascript">{$string}</script>
UPDATE 2
After analysing the possible content, I came to the conclusion that a string containing a whitespace character or a semicolon cannot be a URI. I presume this pattern could solve my problem:
preg_match('/[\s]|[;]/', $string);
Would it cover all possible JavaScript/CSS code?
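As a minimal sketch, the heuristic from UPDATE 2 could be wrapped like this and run against the scenarios listed earlier. The function name looksLikeUri is hypothetical, and treating an empty string as "not a URI" is my own assumption.

```php
<?php
// Hypothetical helper implementing the heuristic from UPDATE 2:
// any whitespace or semicolon means "inline code", otherwise "URI".
function looksLikeUri(string $string): bool
{
    $string = trim($string);
    return $string !== '' && preg_match('/[\s;]/', $string) === 0;
}

$examples = [
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    '/assets/scripts/jquery.min.js?v=1.4',
    'jquery.min.js',
    "http://www.domain.com welcome text\n and some other http://www.domain.com",
    'scriptVar=50;',
];
foreach ($examples as $example) {
    echo looksLikeUri($example) ? 'URI : ' : 'CODE: ', $example, PHP_EOL;
}
```

It gives the expected answer for the five scenarios, but it does not cover all code: a one-statement snippet without whitespace or a semicolon, such as alert(1), would be classified as a URI.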

$exampleData = Array(
'http://sub-domain.my-domain.com/folder/file.php?some=param',
'/assets/scripts/jquery.min.js?v=1.4',
'<a href="/assets/scripts/jquery.min.js?v=1.4">',
'<a href="assets/scripts/jquery.min.js?v=1.4">',
'http://www.domain.com welcome text\n and some other http://www.domain.com',
);
foreach($exampleData as $example)
{
echo "Trying \"" . $example . "\" -> ";
echo (preg_match('%((http(s)?://|www\.)[^ \r\n]+|<a.+?href=(\'|")(http(s)?://|www\.|[^#])[^\4\r\n]*?\4.*?>)%i', $example)) ?
"Match" : "No match";
echo "\r\n";
}
This would produce:
Trying "http://sub-domain.my-domain.com/folder/file.php?some=param" -> Match
Trying "/assets/scripts/jquery.min.js?v=1.4" -> No match
Trying "<a href="/assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "<a href="assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "http://www.domain.com welcome text\n and some other http://www.domain.com" -> Match
Update:
After reading your last update: if you want to parse HTML, use a DOM parser such as:
http://simplehtmldom.sourceforge.net/
Example:
include_once('simple_html_dom.php');
$dom = file_get_html('http://www.stackoverflow.com/');
foreach($dom->find('script') as $scriptElement)
{
if(strlen(trim($scriptElement->src)) > 0)
{
// Script with URI set
echo "<strong>Found script with URI</strong>";
echo "<p>" . $scriptElement->src . "</p>";
}
else
{
// Script with content
echo "<strong>Found script with content</strong>";
echo("<p>" . nl2br(htmlspecialchars($scriptElement->innertext)) . "</p>");
}
}
Would output something like(HTML stripped):
Found script with URI
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
Found script with URI
http://sstatic.net/js/master.min.js?v=afc76d4deac3
Found script with content
var imagePath='http://sstatic.net/stackoverflow/img/';
var inboxUnviewedCount = -1;
...etc

This function returns true if the passed text is a URL. It is based on a regex seen here on SO.
function validate_url ($url)
{
$regex = '/^(https?|ftp):\/\/'; //protocol
$regex .= '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'; //username
$regex .= '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'; //password
$regex .= '@)?'; //auth requires @
$regex .= '((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*'; //domain segments AND
$regex .= '[a-z][a-z0-9-]*[a-z0-9]'; //top level domain OR
$regex .= '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}';
$regex .= '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'; //IP address
$regex .= ')(:\d+)?'; //port
$regex .= ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
You can try it here: http://www.exorithm.com/algorithm/view/validate_url
EDIT: in response to a comment, this function validates URL fragments like /index.php or index.php:
function validate_url_fragment ($url)
{
$regex = '/^(((\/?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
if (validate_url_fragment($url) || validate_url($url)) {
//is url
} else {
//not url
}
(Note that the empty string is valid, so you may want a special case for that.)

filter_var should do what you want for a single URL:
<?php
$safe_url = filter_var( $unsafe_url, FILTER_SANITIZE_URL );
?>
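Note that FILTER_SANITIZE_URL only strips characters that are not allowed in a URL; it does not tell you whether the string is a URL. For a yes/no check, FILTER_VALIDATE_URL is the related filter. A short sketch of both follows; the sample strings are my own, partly taken from the question.

```php
<?php
// FILTER_SANITIZE_URL removes disallowed characters such as spaces.
$unsafe_url = "http://exa mple.com/pa th";
$safe_url = filter_var($unsafe_url, FILTER_SANITIZE_URL);
echo $safe_url, "\n";

// FILTER_VALIDATE_URL returns the URL on success and false on failure.
$candidates = [
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    'jquery.min.js',
    'http://www.domain.com welcome text',
];
foreach ($candidates as $candidate) {
    $isUrl = filter_var($candidate, FILTER_VALIDATE_URL) !== false;
    echo var_export($isUrl, true), '  ', $candidate, "\n";
}
```

FILTER_VALIDATE_URL requires a scheme, so relative references such as jquery.min.js or /assets/scripts/jquery.min.js are rejected, which is probably why it did not work as expected for the scenarios in the question.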
