replace same url in text with regex - php

I am using the following code to add links to urls in text...
if (preg_match_all("#((http(s?)://)|www\.)?([a-zA-Z0-9\-\.])(\w+[^\s\)\<]+)#i", $str, $matches))
{
?><pre><?php
print_r($matches);
?></pre><?php
for ($i = 0; $i < count($matches[0]); $i++)
{
$url = $matches[0][$i];
$parsed = parse_url($url);
$prefix = '';
if (!isset($parsed["scheme"])){
$prefix = 'http://';
}
$url = $prefix.$url;
$replace = '<a href="'.$url.'">'.$matches[0][$i].'</a>';
$str = str_replace($matches[0][$i], $replace, $str);
}
}
The problem appears when I enter the same URL twice anywhere in the text.
For example:
google.com text text google.com
It adds a link to the first one, then searches for google.com again, finds it inside the link it just created, and tries to add another link there.
How can I make sure each link is added separately, without this problem?

You can use preg_replace_callback() to reliably work on individual matches.
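A minimal sketch of that idea follows. The function name linkifyOnce and the simplified pattern are assumptions made for illustration, not the pattern from the question; the point is only that preg_replace_callback() visits each match once and never rescans text it has already replaced, so a repeated URL cannot end up nested inside an earlier link.

```php
<?php
// Hypothetical helper: wraps every URL-like token in an <a> tag in a single pass.
function linkifyOnce(string $text): string
{
    // Deliberately simple pattern (an assumption for this sketch):
    // an optional scheme or "www.", a dotted host name, then an optional path.
    $pattern = '~(?:https?://|www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[^\s<>()]*)?~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $shown = $m[0];
        // Prepend a scheme only when the match does not already start with one.
        $href = preg_match('~^https?://~i', $shown) ? $shown : 'http://' . $shown;
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($shown, ENT_QUOTES) . '</a>';
    }, $text);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo linkifyOnce('google.com text text google.com');
```

With the input from the question, both occurrences of google.com are linked separately. The pattern is loose and will also match things like file names with a dot, so it would need tightening for real text.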

Related

Find URL in string and turn into a link

I'm using the code given on this page to look through a string and turn the URL into an HTML link.
It works quite well, but there is a little issue with the "replace" part of it.
The problem occurs when I have almost identical links. For example:
https://example.com/page.php?goto=200
and
https://example.com/page.php
Everything will be fine with the first link, but the second will create a <a> tag in the first <a> tag.
First run
https://example.com/page.php?goto=200
Second
https://example.com/page.php?goto=200">https://example.com/page.php?goto=200</a>
Because it's also replacing the html link just created.
How do I avoid this?
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$newLinks.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
?>
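Add a whitespace token \s at the start of your regex in place of \b. \b is a zero-width word boundary, so it also matches right after the quote or > of a tag that has already been inserted, whereas \s requires the URL to be preceded by whitespace.
Your regex can be written as: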
$reg_exUrl = "/(?i)\s((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/"
check this one: https://regex101.com/r/YFQPlZ/1
I have changed the replace part a bit, since I couldn't get the suggested regex to work.
Maybe it can be done better, but I'm still learning :)
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// One replacement per match, indexed by the match key
$replace = array();
// Loop through all matches
foreach($url[0] as $key => $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'https://'.$newLinks;
}else{
$link = $newLinks;
}
// Store the link and swap the first remaining occurrence for a placeholder
$replace[$key] = '<a href="'.$link.'">'.$newLinks.'</a>';
$pattern = '/'.preg_quote($newLinks, '/').'/';
$string = preg_replace($pattern, '{'.$key.'}', $string, 1);
}
// Swap each placeholder for its link
foreach ($replace as $key => $link) {
$string = str_replace('{'.$key.'}', $link, $string);
}
}
//Return result
return $string;
}
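Another way to avoid the nested replacement, sketched here under assumptions, is strtr() with an array. Unlike a loop of str_replace() calls, strtr() tries the longest keys first and does not rescan text it has already replaced, so https://example.com/page.php cannot be replaced inside https://example.com/page.php?goto=200. The function name linkUrlsWithStrtr and the shortened pattern are hypothetical and only serve the example.

```php
<?php
// Hypothetical variant of turnUrlIntoHyperlink() that replaces all matches at once.
function linkUrlsWithStrtr(string $string): string
{
    // Simplified pattern used only for this sketch (an assumption).
    $pattern = '~\bhttps?://[^\s<>"]+~i';

    if (!preg_match_all($pattern, $string, $found)) {
        return $string;
    }

    // Map each distinct URL to its link markup.
    $map = [];
    foreach (array_unique($found[0]) as $url) {
        $safe = htmlspecialchars($url, ENT_QUOTES);
        $map[$url] = '<a href="' . $safe . '">' . $safe . '</a>';
    }

    // strtr() with an array tries the longest keys first and never
    // rescans text it has already replaced.
    return strtr($string, $map);
}

echo linkUrlsWithStrtr('https://example.com/page.php?goto=200 and https://example.com/page.php');
```

Each of the two almost identical URLs ends up in its own <a> tag. The simplified pattern keeps trailing punctuation as part of the URL, which the original, longer pattern handles better.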

How to change href property checking urls

I need to process a text before showing it on a page of a website. I need to turn every URL that belongs to the same website (not URLs of other websites) into a link by wrapping it in an <a> tag. The problem is the href attribute, which must contain the correct URL. I am trying to scan the whole text and, whenever I find a URL, check whether it contains the substring "http://". If it does not, I must add that prefix in the href attribute. I have made some attempts, but none of them works yet :( . Any idea how I can do this?
My function is below:
$string = "This is a url from my website: http://www.mysite.com.br and I have a article interesting there, the link is http://www.mysite.com.br/articles/what-is-psychology/205967. I need that the secure url link works too https://www.mysite.com.br/articles/what-is-psychology/205967. the following urls must be valid too: www.mysite.com.br and mysite.com.br";
function urlMySite($string){
$verifyUrl = '';
$urls = array("mysite.com.br");
$text = explode(" ", $string);
$alltext = "";
for($i = 0; $i < count($text); $i++){
foreach ($urls as $value){
$pos = strpos($text[$i], $value);
if (!($pos === false)){
$verifyUrl = " <a href='".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
if (strpos($verifyUrl, 'http://') !== true) {
$verifyUrl = " <a href='http://".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
}
$alltext .= $verifyUrl;
} else {
$alltext .= " ".$text[$i]." ";
}
}
}
return $alltext;
}
You should use preg_match_all to find all occurrences of the URL and replace each of the matches with a clickable link.
You could use this function:
function augmentText($text){
// The hyphen goes last in the character class so that it is a literal, not the range %-_
$pattern = "~(https?|file|ftp)://[a-z0-9./&?:=%_-]*~i";
preg_match_all($pattern, $text, $matches);
if( count($matches[0]) > 0 ){
foreach($matches[0] as $match){
$text = str_replace($match, "<a href='" . $match . "' target='_blank'>" . $match . "</a>", $text);
}
}
return $text;
}
Change the regular expression pattern to match only the URLs you want to make clickable.
Good luck
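As a rough sketch of what "only the URLs you want" could look like for the question above, the pattern can be built from a list of allowed hosts and applied with preg_replace_callback(), which also covers the www.mysite.com.br and mysite.com.br forms that have no scheme. The function name linkOwnSiteUrls and the exact pattern are assumptions for illustration; the pattern does not guard against every edge case (for example, a host that continues with further labels).

```php
<?php
// Hypothetical helper: link only URLs whose host is one of $hosts.
function linkOwnSiteUrls(string $text, array $hosts): string
{
    // Escape each host so that its dots are matched literally.
    $alternatives = implode('|', array_map(
        function (string $host): string { return preg_quote($host, '~'); },
        $hosts
    ));

    // Optional scheme, optional "www.", one of the hosts, optional path.
    // The lookbehind keeps "notmysite.com.br" from matching.
    $pattern = '~(?<![\w.-])(https?://)?(?:www\.)?(?:' . $alternatives . ')(?:/[^\s<>"\']*)?~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Keep sentence punctuation that trails the URL outside the link.
        $url = rtrim($m[0], '.,;:!?');
        $tail = substr($m[0], strlen($url));
        // Group 1 is the scheme; add one when it is missing.
        $href = empty($m[1]) ? 'http://' . $url : $url;
        return "<a href='" . $href . "' target='_blank'>" . $url . '</a>' . $tail;
    }, $text);

    return $result ?? $text;
}

echo linkOwnSiteUrls(
    'Visit mysite.com.br or http://other.com today',
    array('mysite.com.br')
);
```

Only mysite.com.br is linked here; http://other.com stays plain text because its host is not in the list.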

preg_match_all How to get all links?

I'm trying to get all image links with preg_match_all, namely those that begin with http://i.ebayimg.com/ and end with .jpg, from a page that I'm scraping. I can't get it right :( I tried this, but it is not what I need:
preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $contentas, $img_link);
I have the same problem with normal links. I don't know how to write a preg_match_all pattern for this:
<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&daysAfterCreation=7&isSearchRequest=true&withImage=true&scopeId=C&categories=Limousine&damageUnrepaired=NO_DAMAGE_UNREPAIRED&zipcode=&fuels=DIESEL&ambitCountry=DE&maxPrice=11000&minFirstRegistrationDate=2006-01-01&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=20&pageNumber=1" data-touch="hover" data-touch-wrapper=".cBox-body--resultitem">
Thank you very much!!!
UPDATE
I'm trying from here:
http://suchen.mobile.de/fahrzeuge/search.html?isSearchRequest=true&scopeId=C&makeModelVariant1.makeId=1900&makeModelVariant1.modelId=10&makeModelVariant1.modelDescription=&makeModelVariantExclusions%5B0%5D.makeId=&categories=Limousine&minSeats=&maxSeats=&doorCount=&minFirstRegistrationDate=2006-01-01&maxFirstRegistrationDate=&minMileage=&maxMileage=&minPrice=&maxPrice=11000&minPowerAsArray=&maxPowerAsArray=&maxPowerAsArray=PS&minPowerAsArray=PS&fuels=DIESEL&minCubicCapacity=&maxCubicCapacity=&ambitCountry=DE&zipcode=&q=&climatisation=&airbag=&daysAfterCreation=7&withImage=true&adLimitation=&export=&vatable=&maxConsumptionCombined=&emissionClass=&emissionsSticker=&damageUnrepaired=NO_DAMAGE_UNREPAIRED&numberOfPreviousOwners=&minHu=&usedCarSeals=
I want to get the car links, the image links and all the other information. Getting the information works fine with my script, but I have a problem with scraping the images and links. Here is my script:
<?php
$id= $_GET['id'];
$user= $_GET['user'];
$login=$_COOKIE['login'];
$query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new from mobile where vartotojas='$user' and id='$id'");
$rezultatas=mysql_fetch_row($query);
$url = "$rezultatas[1]";
$info = file_get_contents($url);
function scrape_between($data, $start, $end){
$data = stristr($data, $start);
$data = substr($data, strlen($start));
$stop = stripos($data, $end);
$data = substr($data, 0, $stop);
return $data;
}
//turinio iskirpimas
$turinys = scrape_between($info, '<div class="g-col-9">', '<footer class="footer">');
//filtravimas naikinami mokami top skelbimai
$contentas = preg_replace('/<div class="cBox-body cBox-body--topResultitem".*?>(.*?)<\/div>/', '' ,$turinys);
//filtravimas baigtas
preg_match_all('/<span class="h3".*?>(.*?)<\/span>/',$contentas,$pavadinimas);
preg_match_all('/<span class="u-block u-pad-top-9 rbt-onlineSince".*?>(.*?)<\/span>/',$contentas,$data);
preg_match_all('/<span class="u-block u-pad-top-9".*?>(.*?)<\/span>/',$contentas,$miestas);
preg_match_all('/<span class="h3 u-block".*?>(.*?)<\/span>/', $contentas, $kaina);
preg_match_all('/<a[A-z0-9-_:="\.\/ ]+href="(http:\/\/suchen.mobile.de\/fahrzeuge\/[^"]*)"[A-z0-9-_:="\.\/ ]\s*>\s*<div/s', $contentas, $matches);
print_r($pavadinimas);
print_r($data);
print_r($miestas);
print_r($kaina);
print_r($result);
print_r($matches);
?>
1. To capture the src attribute of every img tag when it starts with http://i.ebayimg.com/:
regex: /src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/[^\"]+?\\.jpg)\"/i
Here is an example:
$re = "/src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/[^\"]+?\\.jpg)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);
If you want to be sure that the URL is captured from an img tag, use this regex (keep in mind that performance will decrease if the page is very long):
$re = "/<img(?:.*?)src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/[^\"]+?\\.jpg)\"/i";
2. To capture the href attribute of every a tag when it starts with http://suchen.mobile.de/fahrzeuge/:
regex: /href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/[^\"]+)\"/i
Here is an example:
$re = "/href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/[^\"]+)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);
If you want to be sure that the URL is captured from an a tag, use this regex (keep in mind that performance will decrease if the page is very long):
$re = "/<a(?:.*?)href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/[^\"]+)\"/i";
More handy with DOMDocument:
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile($yourURL);
$imgNodes = $dom->getElementsByTagName('img');
$result = [];
foreach ($imgNodes as $imgNode) {
$src = $imgNode->getAttribute('src');
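// Ask parse_url() for single components so that relative URLs do not cause notices
$host = parse_url($src, PHP_URL_HOST);
$path = parse_url($src, PHP_URL_PATH);
$ext = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';
if ($ext == 'jpg' && $host == 'i.ebayimg.com')
$result[] = $src;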
}
print_r($result);
To get "normal" links, use the same way (DOMDocument + parse_url).
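For the "normal" links, a sketch along the same lines might look like this. The function name extractLinksForHost is hypothetical, and the HTML is passed in as a string (for example the result of file_get_contents()) rather than loaded from a URL; both are assumptions made to keep the example self-contained.

```php
<?php
// Hypothetical helper: collect the href values of <a> elements that point at $host.
function extractLinksForHost(string $html, string $host): array
{
    libxml_use_internal_errors(true); // scraped HTML is rarely valid
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // parse_url() gives null for a missing host and false for a malformed URL.
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&amp;pageNumber=1">car</a>'
      . '<a href="http://example.com/">other</a>';
print_r(extractLinksForHost($html, 'suchen.mobile.de'));
```

getAttribute() returns the attribute value with entities already decoded, so the &amp; in the markup comes back as a plain & in the URL.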

Replacing URLs with link using a for loop

I've been searching around for this but all I could find was broken scripts and plus, I might have a method that is quite simple.
I'm trying to use a for () loop for this one.
This is what I've got:
<?php
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";
if(preg_match_all($reg_exUrl, $makerepstring, $url)) {
// make the url into link
for($i=0; $i < count(array_keys($url[0])); $i++){
$makerepstring = preg_replace($reg_exUrl, '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ', $makerepstring);
}
}
echo $makerepstring;
?>
However this fails brutally for some reason I can't comprehend.
The output from echo $makerepstring; as follows(from source code):
http://google.com " target="_blank" rel="nofollow">http://google.com </a> http://google.com " target="_blank" rel="nofollow">http://google.com </a>
I'd really like to do it with a for()... Could somebody try and figure out how to get this to work with me?
Thanks in advance!
/J
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "http://youtube.com http://google.com";
$url = array();
$instances = preg_match_all($reg_exUrl, $makerepstring, $url);
if ($instances > 0) {
// make the url into link
for($i=0; $i < count(array_keys($url[0])); $i++){
$makerepstring = preg_replace($reg_exUrl, '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ', $makerepstring);
/*echo $url[0][$i]."<br />";
echo $i."<br />";
print_r($url);
echo "<br />";*/
}
}
echo $makerepstring;
This does not work either, although I'm not quite sure how you meant I should do this.
EDIT:
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makeurl = "http://google.com http://youtube.com";
if(preg_match($reg_exUrl, $makeurl, $url)) {
echo preg_replace($reg_exUrl, '<a href="'.$url[0].'" target="_blank" rel="nofollow">'.$url[0].'</a> ', $makeurl);
} else {
echo $makeurl;
}
Would give:
http://google.com http://google.com
That's not how preg_match_all works. The manual (http://php.net/manual/en/function.preg-match-all.php) shows that the matches go into an array that you pass along, and that the function itself returns the number of matches. So first call
...
$matches = array();
$instances = preg_match_all(..., $matches);
if ($instances > 0) {
// and then your code
}
...
And then iterate over the $matches array, which now has content.
You are performing the match twice:
first in the preg_match_all function,
then again in preg_replace, which should not happen here.
Use string concatenation instead:
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";
$new_str = '';
if(preg_match_all($reg_exUrl, $makerepstring, $url)) {
var_dump($url[0]);
// make the url into link
for($i=0; $i < count(array_keys($url[0])); $i++){
$new_str .= '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ';
}
}
echo $new_str;
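If the surrounding text should be kept and a for() loop is still wanted, one possible sketch is to ask preg_match_all for the offset of each match and replace from the end of the string backwards. The link attributes used here (target and rel) are taken from the output shown in the question; the rest of the structure is an assumption about what was intended.

```php
<?php
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";

// PREG_OFFSET_CAPTURE turns every match into an array of [text, byte offset].
if (preg_match_all($reg_exUrl, $makerepstring, $url, PREG_OFFSET_CAPTURE)) {
    // Walk backwards so that the offsets of earlier matches stay valid
    // after each replacement changes the length of the string.
    for ($i = count($url[0]) - 1; $i >= 0; $i--) {
        list($found, $offset) = $url[0][$i];
        $link = '<a href="' . $found . '" target="_blank" rel="nofollow">' . $found . '</a>';
        $makerepstring = substr_replace($makerepstring, $link, $offset, strlen($found));
    }
}

echo $makerepstring;
```

Each URL is replaced exactly once at its own position, so the same URL appearing twice, or one URL being a prefix of another, does not cause nested links.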

Using preg_replace to get id

How do I get the ID from a URL with preg_replace?
This is the link:
http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id
How do I get the ID? In this case it would be: 5b87f8eaa7c20f79c3257eb3ec0a35e0
In this case I recommend not using preg_match (preg_replace would be used to replace something).
Simply use
$array = explode('/',$_SERVER['REQUEST_URI']);
// REQUEST_URI starts with "/", so index 0 is empty, index 1 is "photo" and index 2 is the ID
$id = $array[2];
If you must use preg_match:
$array = array();
preg_match('#^/photo/([0-9a-f]{32})/id$#',$_SERVER['REQUEST_URI'],$array);
$id = $array[1];
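If the ID is the last path segment, you can do this easily using strripos to find the last / in the URL. Note that in the example URL above the last segment is the literal "id", so this only applies when the ID itself comes last.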
$url = $_SERVER['REQUEST_URI'];
if (($pos = strripos($url, '/')) !== false) {
$id = substr($url, $pos + 1);
}
else {
trigger_error('You must supply a valid photo ID');
}
If you would like to just extract that id string, you can use:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
preg_match($pattern, $id_url, $output_array);
echo $output_array[1];
Or, to make the replacement:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
$replacement = "your replacement";
$replaced_url = preg_replace($pattern, $replacement, $id_url);
echo $replaced_url;
PHP Live Regex - a useful tool for testing your patterns
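A regex-free alternative, sketched under the assumption that the ID is always the path segment right after "photo" and always 32 hexadecimal characters as in the example, is to combine parse_url() with explode(). The function name extractPhotoId is hypothetical.

```php
<?php
// Hypothetical helper: return the path segment that follows "photo", or null.
function extractPhotoId(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // "/photo/<id>/id" becomes ["photo", "<id>", "id"].
    $segments = explode('/', trim($path, '/'));
    $index = array_search('photo', $segments, true);
    if ($index === false || !isset($segments[$index + 1])) {
        return null;
    }

    $id = $segments[$index + 1];
    // Assumption: a valid ID is exactly 32 hexadecimal characters.
    return (strlen($id) === 32 && ctype_xdigit($id)) ? $id : null;
}

echo extractPhotoId('http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id');
```

This works for a full URL as well as for a path such as $_SERVER['REQUEST_URI'], and returns null instead of a wrong value when the URL has a different shape.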
