Firstly, sorry my bad English.
I'm taking text from a site with Curl. After that, I am searching in text some words with strpos().
But; when text coming with quotes, my function is not working. For example. Text is coming with curl without quotes;
$text = "This is my text.";
$intext = "This is";
if (strpos($text, $intext) !== false) {
echo "OK";
}
Okey, page give me "OK", now my codes working.
But, when text coming with quotes like this:
$text = "This's my text.";
$intext = "This's my";
if (strpos($text, $intext) !== false) {
echo "OK";
} else {
echo "NO";
}
The page gives me: "NO"!
Why? I think the quotation mark data from the website is different. How can I fix this problem? I need to compare without clearing punctuation.
I fixed problem with this code;
$str = str_replace(' ', ' ', $text);
$str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
$str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
$str = html_entity_decode($str);
$str = htmlspecialchars_decode($str);
$text = strip_tags($str);
Thanks.
I have a script written in PHP but there is a problem
this code does not accept Unicode text. when I pass Unicode text to this code, it returns some invalid characters.
public static function compile(&$subject, $replace, $with) {
$placeholders = array_combine($replace, $with);
$condition = '{[a-z0-9\_\- ]+:[a-z_]+}';
$inner = '((?:(?!{/?if).)*)';
$pattern = '#{if ('.$condition.')}'.$inner.'{/if}#is';
while (preg_match($pattern, $subject, $match)) {
$placeholder = $match[1];
$content = $match[2];
// if empty value remove whole line
// else show line but remove pseudo-code
$subject = preg_replace($pattern,
empty($placeholders[$placeholder]) ? '' : addcslashes($content, '$'),
$subject,
1);
}
}
please help me.
AFAIK, for unicode you need to use multibyte string methods:
http://php.net/manual/en/ref.mbstring.php
I want to find all links in the text like this:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
I know i need to use preg_match_all, but have only idea in the head: start search from http|https|ftp and end search where space or end of the text appears, thats all i need really, so all links wiil be found properly.
Anyone can help me with php regexp pattern?
I think i need to use assertions in the end of pattern, but can`t understand their properly usage for now.
Any ideas? Thanx!
I'd go with something simple like ~[a-z]+://\S+~i
starts with protocol [a-z]+://
\S+ followed by one or more non-whitespaces where \S is a shorthand for [^ \t\r\n\f]
used modifier i (PCRE_CASELESS) (possibly not really necessery)
So it could look like this:
$pattern = '~[a-z]+://\S+~';
$str = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
if($num_found = preg_match_all($pattern, $str, $out))
{
echo "FOUND ".$num_found." LINKS:\n";
print_r($out[0]);
}
outputs:
FOUND 4 LINKS:
Array
(
[0] => http://hello.world
[1] => http://google.com/file.jpg
[2] => https://hell.o.wor.ld/test?qwe=qwe
[3] => http://test.test/test
)
Test on eval.in
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = ''.$link.'';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
<?php
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://google.com";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "{$url[0]} ", $text);
} else {
// if no urls in the text just return the text
echo $text;
}
?>
Reference:http://css-tricks.com/snippets/php/find-urls-in-text-make-links/
Works like a charm. use this.
$str= "Test text http://hello.world";
preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&##\/%=~_|$?!:,.]*[A-Z0-9+&##\/%=~_|$]/i', $str, $result, PREG_PATTERN_ORDER);
print_r($result[0]);
The suggested answers are great, but one of them miss www. case, the other http://
So, let's combine all of those:
$text = Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
preg_match_all('/(((http|https|ftp|ftps)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\:[0-9]+)?(\/\S*)?/', $text, $results, PREG_PATTERN_ORDER);
print_r($results[0]);
The return value for PREG_PATTERN_ORDER will be Array of Arrays (results) so that $results[0] is an array of full pattern matches, $results[1] is an array of strings matched by the first parenthesized subpattern, and so on.
function turnUrlIntoHyperlink($string)
{
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)://[a-zA-Z0-9-.]+.[a-zA-Z]{2,3}(/\S*)?/";
// Check if there is a url in the text
if (preg_match($reg_exUrl, $string, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "<a target='_blank' href='{$url[0]}'>{$url[0]}</a>", $string);
} else {
// if no urls in the text just return the text
echo $string;
}
}
For converting URLs to tags, and recognizing URLs without http/https, try the below. It uses preg_replace_callback to avoid the issue in one of the other answers with the same URL appearing multiple times:
private function convertUrls($string) {
$url_pattern = '/(((http|https)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\:[0-9]+)?(\/\S*)?/';
return preg_replace_callback($url_pattern,
function($matches) {
$match = $matches[0];
if (strstr($match, ":") === false) {
$url = "https://$match";
} else {
$url = $match;
}
return '' . $url . '';
},
$string);
}
Alternative to regexp it´s use this library
Works very good, butnot for very complex codes.
foreach($html->find('a') as $element)
echo $element->href . '<br>';
And easy to use. No regular expressions skills required:-)
Not regexp, but finds it all and makes sure that they are not already encompassed in a tag already. It also checks to make sure that the link isn't encapsulated in (), [], "" or anything else with an open and close.
$txt = "Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27";
$holder = explode("http",$txt);
for($i = 1; $i < (count($holder));$i++) {
if (substr($holder[$i-1],-6) != 'href="') { // this means that the link is not alread in an a tag.
if (strpos($holder[$i]," ")!==false) //if the link is not the last item in the text block, stop at the first space
$href = substr($holder[$i],0,strpos($holder[$i]," "));
else //else it is the last item, take it
$href = $holder[$i];
if (ctype_punct(substr($holder[$i-1],strlen($holder[$i-1])-1)) && ctype_punct(substr($holder[$i],strlen($holder[$i])-1)))
$href = substr($href,0,-1); //if both the fron and back of the link are encapsulated in punctuation, truncate the link by one
$holder[$i] = implode("$href\" target=\"_blank\" class=\"link\">http$href</a>",explode($href,$holder[$i]));
$holder[$i-1] .= "<a href=\"";
}
}
$txt = implode("http",$holder);
echo $txt;
Output:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27
i use this function
<?php
function deteli($string){
$pos = strpos($string, 'http');
$spos = strpos($string, ' ', $pos);
$lst = $spos - $pos;
$bef = substr($string, 0, $pos);
$aft = substr($string, $spos);
if ($pos == true || $pos == 0) {
$link = substr($string, $pos, $lst);
$res = $bef . "<a href='" . $link . "' class='link' target='_blank'>link</a>" . $aft . "";
return $res;
}
else{
return $string;
}
}?>
This is a followup from another post at here.
Problem: The code below works good with the exception of strings that contain double quotes which will render strange characters
Sample string:
“Walter Isaacson http://t.co/vaLxVduA”
Rendered as:
“Walter Isaacson http://t.co/vaLxVduA���
t.co/vaLxVduA���
I believe the problem is in the regex. What could I try to make this work?
Code:
function makeLink($match) {
// Parse link.
$substr = substr($match, 0, 6);
if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
$url = 'http://' . $match;
} else {
$url = $match;
}
return '' . $match . '';
}
function makeHyperlinks($text) {
// Find links and call the makeLink() function on them.
return preg_replace('/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:#=.+?,#%&~-]*[^.\'# !(?,><;\)])/e', "makeLink('$1')", $text);
}
The problem is die unicode character ”. When you add the u modifier, to treat every string as UTF-8, it works, but also catches the quote as part of the URL. You would need to exclude this quote also:
preg_replace('/((www\.|http|https|ftp|news|file):\/\/[\w.-]+\.[\w\/:#=.+?,#%&~-]*[^.\'# !(?,>”<;\)])/eu', "makeLink('$1')", $text);
But your regex looks kinda huge, I did a quick search for a URL regex and found this one, it seems to work also, and don't need all the exclusions
preg_replace('#(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)#eu', "makeLink('$1')", $text);
Im using prepared statements to descramble BBcode but for some reason it puts \ before ' when posting. I dont know what causes it, but Im sure it happens when i change the BBcode to html to be put in the database, the code looks like this:
$text = $membership->remove_HTML($text);
//convert line breaks to <br /> tags.
$text = nl2br($text);
//cleans up by removing white space.
$text = trim($text);
//now lets replace things BASIC EDITOR
$text = preg_replace("/\[b\](.*)\[\/b\]/", "<strong>\\1</strong>", $text);
$text = preg_replace("/\[i\](.*)\[\/i\]/", "<em>\\1</em>", $text);
$text = preg_replace("/\[u\](.*)\[\/u\]/", "<span style='text-decoration:underline;'>\\1</span>", $text);
$text = preg_replace("/\[s\](.*)\[\/s\]/", "<del>\\1</del>", $text);
$text = preg_replace("/\[url\](.*)\[\/url\]/", "<a target='_blank' href='\\1'>\\1</a>", $text);
$text = preg_replace("/\[url=(.*)\](.*)\[\/url\]/", "<a target='_blank' rel='\\1' href='\\1'>\\2</a>", $text);
//now lets replace MORE things EXPANDED EDITOR
$text = preg_replace("/\[img\](.*)\[\/img\]/", "<img>\\1</img>", $text);
$text = str_ireplace("[hr]","<hr>", $text);
$text = preg_replace("/\[justify\](.*)\[\/justify\]/", "<p style='text-align:justify;'>\\1</p>", $text);
$text = preg_replace("/\[center\](.*)\[\/center\]/", "<p style='text-align:center;'>\\1</p>", $text);
$text = preg_replace("/\[left\](.*)\[\/left\]/", "<p style='text-align:left;'>\\1</p>", $text);
$text = preg_replace("/\[right\](.*)\[\/right\]/", "<p style='text-align:right;'>\\1</p>", $text);
$text = preg_replace("/\[h1\](.*)\[\/h1\]/", "<h4>\\1</h4>", $text);
$text = preg_replace("/\[h2\](.*)\[\/h2\]/", "<h5>\\1</h5>", $text);
$text = preg_replace("/\[h3\](.*)\[\/h3\]/", "<h6>\\1</h6>", $text);
$updatenews = $mysql->add_news($_SESSION['user'][0], $headline, $text, $time);
The quickest thing to check is making sure PHP's Magic Quotes are disabled.
If you don't want to get right into mucking around with your PHP configuration, then check to see if the slashes are present before you roll $text through that series of preg_replaces().