Firstly, sorry my bad English.
I'm taking text from a site with Curl. After that, I am searching in text some words with strpos().
But; when text coming with quotes, my function is not working. For example. Text is coming with curl without quotes;
$text = "This is my text.";
$intext = "This is";
if (strpos($text, $intext) !== false) {
echo "OK";
}
Okey, page give me "OK", now my codes working.
But, when text coming with quotes like this:
$text = "This's my text.";
$intext = "This's my";
if (strpos($text, $intext) !== false) {
echo "OK";
} else {
echo "NO";
}
The page gives me: "NO"!
Why? I think the quotation mark data from the website is different. How can I fix this problem? I need to compare without clearing punctuation.
I fixed problem with this code;
$str = str_replace(' ', ' ', $text);
$str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
$str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
$str = html_entity_decode($str);
$str = htmlspecialchars_decode($str);
$text = strip_tags($str);
Thanks.
I want to find all links in the text like this:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
I know i need to use preg_match_all, but have only idea in the head: start search from http|https|ftp and end search where space or end of the text appears, thats all i need really, so all links wiil be found properly.
Anyone can help me with php regexp pattern?
I think i need to use assertions in the end of pattern, but can`t understand their properly usage for now.
Any ideas? Thanx!
I'd go with something simple like ~[a-z]+://\S+~i
starts with protocol [a-z]+://
\S+ followed by one or more non-whitespaces where \S is a shorthand for [^ \t\r\n\f]
used modifier i (PCRE_CASELESS) (possibly not really necessery)
So it could look like this:
$pattern = '~[a-z]+://\S+~';
$str = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
if($num_found = preg_match_all($pattern, $str, $out))
{
echo "FOUND ".$num_found." LINKS:\n";
print_r($out[0]);
}
outputs:
FOUND 4 LINKS:
Array
(
[0] => http://hello.world
[1] => http://google.com/file.jpg
[2] => https://hell.o.wor.ld/test?qwe=qwe
[3] => http://test.test/test
)
Test on eval.in
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = ''.$link.'';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
<?php
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://google.com";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "{$url[0]} ", $text);
} else {
// if no urls in the text just return the text
echo $text;
}
?>
Reference:http://css-tricks.com/snippets/php/find-urls-in-text-make-links/
Works like a charm. use this.
$str= "Test text http://hello.world";
preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&##\/%=~_|$?!:,.]*[A-Z0-9+&##\/%=~_|$]/i', $str, $result, PREG_PATTERN_ORDER);
print_r($result[0]);
The suggested answers are great, but one of them miss www. case, the other http://
So, let's combine all of those:
$text = Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
preg_match_all('/(((http|https|ftp|ftps)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\:[0-9]+)?(\/\S*)?/', $text, $results, PREG_PATTERN_ORDER);
print_r($results[0]);
The return value for PREG_PATTERN_ORDER will be Array of Arrays (results) so that $results[0] is an array of full pattern matches, $results[1] is an array of strings matched by the first parenthesized subpattern, and so on.
function turnUrlIntoHyperlink($string)
{
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)://[a-zA-Z0-9-.]+.[a-zA-Z]{2,3}(/\S*)?/";
// Check if there is a url in the text
if (preg_match($reg_exUrl, $string, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "<a target='_blank' href='{$url[0]}'>{$url[0]}</a>", $string);
} else {
// if no urls in the text just return the text
echo $string;
}
}
For converting URLs to tags, and recognizing URLs without http/https, try the below. It uses preg_replace_callback to avoid the issue in one of the other answers with the same URL appearing multiple times:
private function convertUrls($string) {
$url_pattern = '/(((http|https)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\:[0-9]+)?(\/\S*)?/';
return preg_replace_callback($url_pattern,
function($matches) {
$match = $matches[0];
if (strstr($match, ":") === false) {
$url = "https://$match";
} else {
$url = $match;
}
return '' . $url . '';
},
$string);
}
Alternative to regexp it´s use this library
Works very good, butnot for very complex codes.
foreach($html->find('a') as $element)
echo $element->href . '<br>';
And easy to use. No regular expressions skills required:-)
Not regexp, but finds it all and makes sure that they are not already encompassed in a tag already. It also checks to make sure that the link isn't encapsulated in (), [], "" or anything else with an open and close.
$txt = "Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27";
$holder = explode("http",$txt);
for($i = 1; $i < (count($holder));$i++) {
if (substr($holder[$i-1],-6) != 'href="') { // this means that the link is not alread in an a tag.
if (strpos($holder[$i]," ")!==false) //if the link is not the last item in the text block, stop at the first space
$href = substr($holder[$i],0,strpos($holder[$i]," "));
else //else it is the last item, take it
$href = $holder[$i];
if (ctype_punct(substr($holder[$i-1],strlen($holder[$i-1])-1)) && ctype_punct(substr($holder[$i],strlen($holder[$i])-1)))
$href = substr($href,0,-1); //if both the fron and back of the link are encapsulated in punctuation, truncate the link by one
$holder[$i] = implode("$href\" target=\"_blank\" class=\"link\">http$href</a>",explode($href,$holder[$i]));
$holder[$i-1] .= "<a href=\"";
}
}
$txt = implode("http",$holder);
echo $txt;
Output:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27
i use this function
<?php
function deteli($string){
$pos = strpos($string, 'http');
$spos = strpos($string, ' ', $pos);
$lst = $spos - $pos;
$bef = substr($string, 0, $pos);
$aft = substr($string, $spos);
if ($pos == true || $pos == 0) {
$link = substr($string, $pos, $lst);
$res = $bef . "<a href='" . $link . "' class='link' target='_blank'>link</a>" . $aft . "";
return $res;
}
else{
return $string;
}
}?>
I have a sample code:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$pattern = '/http:\/\/www\.youtube\.com\/watch\?(.*?)v=([a-zA-Z0-9_\-]+)(\S*)/i';
$replace = $pattern.'&w=550';
$string = preg_replace($pattern, $replace, $url);
?>
How to result is http://www.youtube.com/watch?v=KTRPVo0d90w&w=550
You can just append using the . operator:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$string = $url.'&w=550';
?>
Use preg_match instead:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w&s=222';
$pattern = '/v=[^&]+/i';
preg_match($pattern, $url, $match);
echo 'http://www.youtube.com/watch?'.$match[0].'&w=550';
?>
Like below?
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$bit = '&w=550';
echo "${url}${bit}";
Don't get me wrong, I'm not looking to gain any points here, but just thought I would add to this question and include a few options. I love toying with ideas like this every once in a while.
Using jh314's idea to concatenate the strings, thought that this could be used for future use, to actually replace a string inside the video's YouTube number, should the occasion ever present itself.
Such as $number for instance.
<?php
$url = 'http://www.youtube.com/watch?v=';
$number = 'KTRPVo0d90w';
$string = $url.$number.'&w=550';
// Output to screen
echo $string;
echo "<br>";
// Link to video
echo "Click for the video";
?>
The same could easily be done for the video's width.
See i have an url in a html code
play
Now i want to print this url as it is written in a php page
http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
You can see that between the url 05 Dabangg Reloaded their is space. I made this program to print url from this html code..
$str = "play";
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
if (preg_match_all($pattern,$str,$matches))
foreach($matches[1] as $data)
{
$str=$data;
echo $str;
}
Then i am getting this
http://b48.ve.vc/b/data/48/3746/05
please do not mention on foreach($matches[1] as $data) line bcoz i am using it with so many urls.. I just want to know how to print the whole url in this format.
http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
Spaces are become a huge matter.. Do not know how to fix it..
What i need to add inside
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
For making it completely workable.
Please suggest me any idea.
$str = 'play';
$arr = explode("\"", $str);
$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?#.-]+)[^\w#$&+,\/:;=?#.-]*?`i';
$url = preg_grep($pattern,$arr);
$url = implode('',$url);
Output: $url = 'http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3'
Update: 2nd Solution [Reference-DOMElement].
$str = 'play';
$DOM = new DOMDocument;
$DOM->loadHTML($str);
$search_item = $DOM->getElementsByTagName('a');
foreach($search_item as $search_item) {
$url = $search_item->getAttribute('href');
}
echo $url; //Output: http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3
You can str_replace each one -space- with %20 for encoding your URL
<?php
$url_org = 'http://b48.ve.vc/b/data/48/3746/05 Dabangg Reloaded_-_www.DjPunjab.Com.mp3';
$url_edited = str_replace(" ", '%20', $url_org);
?>
HERE
This will work.
What is the simplest and fastest way to check if string is single URL or TEXT (that might contain urls)
possible scenarios:
// successful scenario
$example[] = 'http://sub-domain.my-domain.com/folder/file.php?some=param';
// successful scenario
$example[] = '/assets/scripts/jquery.min.js?v=1.4';
// successful scenario
$example[] = 'jquery.min.js';
// this scenario should fail validation
$example[] = "http://www.domain.com welcome text\n and some other http://www.domain.com";
// this scenario should fail validation
$example[] = "scriptVar=50;";
I have tried to use native php functions like parse_url, filter_var but non of them work as expected.
UPDATE 1
To make it more clear, I'm trying to separate possible URI from script content that would be inserted as DOM element. All urls would go as SRC attribute and rest as content, example:
<script type="text/javascript" src="{$string}"></script>
<script type="text/javascript">{$string}</script>
UPDATE 2
By analysing possible content I come to conclusion that string containing white space character or semicolon mean that string could not be URI, I presume that this pattern could solve my problem:
preg_match('/[\s]|[;]/', $string);
would it cover all possible javascript/css code?
$exampleData = Array(
'http://sub-domain.my-domain.com/folder/file.php?some=param',
'/assets/scripts/jquery.min.js?v=1.4',
'<a href="/assets/scripts/jquery.min.js?v=1.4">',
'<a href="assets/scripts/jquery.min.js?v=1.4">',
'http://www.domain.com welcome text\n and some other http://www.domain.com',
);
foreach($exampleData as $example)
{
echo "Trying \"" . $example . "\" -> ";
echo (preg_match('%((http(s)?://|www\.)[^ \r\n]+|<a.+?href=(\'|")(http(s)?://|www\.|[^#])[^\4\r\n]*?\4.*?>)%i', $example)) ?
"Match" : "No match";
echo "\r\n";
}
This would produce:
Trying "http://sub-domain.my-domain.com/folder/file.php?some=param" -> Match
Trying "/assets/scripts/jquery.min.js?v=1.4" -> No match
Trying "<a href="/assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "<a href="assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "http://www.domain.com welcome text\n and some other http://www.domain.com" -> Match
Update:
After reading your last update. If you want to parse HTML. Use a DOM-parser like:
http://simplehtmldom.sourceforge.net/
Example:
include_once('simple_html_dom.php');
$dom = file_get_html('http://www.stackoverflow.com/');
foreach($dom->find('script') as $scriptElement)
{
if(strlen(trim($scriptElement->src)) > 0)
{
// Script with URI set
echo "<strong>Found script with URI</strong>";
echo "<p>" . $scriptElement->src . "</p>";
}
else
{
// Script with content
echo "<strong>Found script with content</strong>";
echo("<p>" . nl2br(htmlspecialchars($scriptElement->innertext)) . "</p>");
}
}
Would output something like(HTML stripped):
Found script with URI
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
Found script with URI
http://sstatic.net/js/master.min.js?v=afc76d4deac3
Found script with content
var imagePath='http://sstatic.net/stackoverflow/img/';
var inboxUnviewedCount = -1;
...etc
This function will return true if the passed text is an URL. It is based on a regex seen here on SO.
function validate_url ($url)
{
$regex = '/^(https?|ftp):\/\/'; //protocol
$regex .= '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'; //username
$regex .= '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'; //password
$regex .= '#)?'; //auth requires #
$regex .= '((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*'; //domain segments AND
$regex .= '[a-z][a-z0-9-]*[a-z0-9]'; //top level domain OR
$regex .= '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}';
$regex .= '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'; //IP address
$regex .= ')(:\d+)?'; //port
$regex .= ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
You can try it here: http://www.exorithm.com/algorithm/view/validate_url
EDIT in response to comment, this function will validate URL fragments like /index.php or index.php
function validate_url_fragment ($url)
{
$regex = '/^(((\/?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
if (validate_url_fragment($url) || validate_url($url)) {
//is url
} else {
//not url
}
(note that the empty string is valid, so you may want a special case for that)
filter_var should do what you want for a single URL:
<?php
$safe_url = filter_var( $unsafe_url, FILTER_SANITIZE_URL );
?>