check string for youtube link or image - php

I have a small piece of code that checks a string for a URL and adds an <a href> tag to create a link. I also have it check the string for a YouTube link and then add rel="youtube" to the <a> tag.
How can I get the code to add rel only to the YouTube links?
How can I get it to add a different rel to any type of image link?
$text = "http://site.com a site www.anothersite.com/ http://www.youtube.com/watch?v=UyxqmghxS6M here is another site";
$linkstring = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '<a href="\0">\4</a>', $text );
if(preg_match('/http:\/\/www\.youtube\.com\/watch\?v=[^&]+/', $linkstring, $vresult)) {
    $linkstring = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '<a rel="youtube" href="\0">\4</a>', $text );
    $type= 'youtube';
}
else {
    $type = 'none';
}
echo $text;
echo $linkstring, "<br />";
echo $type, "<br />";

Try http://simplehtmldom.sourceforge.net/.
Code:
<?php
include('simple_html_dom.php');
$html = str_get_html('Link');
$html->find('a', 0)->rel = 'youtube';
echo $html;
Output:
[username@localhost dom]$ php dom.php
Link
You can build an entire page DOM or a simple single link with this library.
Detecting the hostname of a URL:
Pass the URL to parse_url(), which returns an array of the URL's components.
Code:
print_r(parse_url('http://www.youtube.com/watch?v=UyxqmghxS6M'));
Output:
Array
(
[scheme] => http
[host] => www.youtube.com
[path] => /watch
[query] => v=UyxqmghxS6M
)
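Building on parse_url(), here is a minimal sketch of a host check that could decide whether a link gets rel="youtube", instead of matching the whole string with one regex. The function name isYoutubeUrl() is hypothetical, and it assumes only the hosts youtube.com and www.youtube.com with a /watch?v=... path count as YouTube links.

```php
<?php
// Hypothetical helper: true when the URL's host is YouTube and the
// path is /watch with a non-empty v= query parameter.
function isYoutubeUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false; // malformed, or no scheme/host (e.g. "www.anothersite.com/")
    }
    $host = strtolower($parts['host']);
    if ($host !== 'youtube.com' && $host !== 'www.youtube.com') {
        return false;
    }
    if (($parts['path'] ?? '') !== '/watch') {
        return false;
    }
    // Turn the query string into an array and require a v parameter
    parse_str($parts['query'] ?? '', $query);
    return isset($query['v']) && $query['v'] !== '';
}

var_dump(isYoutubeUrl('http://www.youtube.com/watch?v=UyxqmghxS6M')); // bool(true)
var_dump(isYoutubeUrl('http://site.com'));                            // bool(false)
```

Note that parse_url() only reports a host when the string has a scheme (or starts with //), so bare links such as www.anothersite.com/ fall through to false.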

Try the following:
// Text to scan
$text = "http://site.com/bounty.png a site www.anothersite.com/ http://www.youtube.com/watch?v=UyxqmghxS6M&featured=true here is another site";
// YouTube links: group 3 captures the 11-character video ID
$pattern = "/(http:\/\/)?(www\.)?youtube\.com\/watch\?v=([a-z0-9\-_\|]{11})[^\s]*/i";
$replacement = '<a rel="youtube" href="http://www.youtube.com/watch?v=\3">\0</a>';
$text = preg_replace($pattern, $replacement, $text);
// Image links: the host part must not contain whitespace, otherwise the
// match could start at an earlier word in the sentence
$pattern = "/(http:\/\/)?(www\.)?[^\/\s]+\/[^\s]+\.(png|jpg|jpeg|bmp|gif)[^\s]*/i";
$replacement = '<a rel="image" href="\0">\0</a>';
$text = preg_replace($pattern, $replacement, $text);
Note that the image pattern can only detect links to images that have a file extension. Links like www.example.com?image=3 will therefore not be detected.
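Because the two preg_replace() calls run one after the other, the image pattern scans text that the YouTube pattern may already have turned into HTML. A sketch of a single-pass alternative: each http(s) URL is matched once with preg_replace_callback() and the rel value is chosen per match. The function name linkify() and the rel values are illustrative; like the patterns above, it assumes images are recognized by file extension only, and it only links URLs that start with http:// or https://.

```php
<?php
// Hypothetical single-pass linkifier: every http(s) URL becomes one <a> tag,
// with the rel attribute chosen from the URL itself.
function linkify(string $text): string
{
    return preg_replace_callback(
        '~https?://\S+~i',
        function (array $m): string {
            $url = $m[0];
            $rel = '';
            if (preg_match('~^https?://(www\.)?youtube\.com/watch\?v=[\w-]{11}~i', $url)) {
                $rel = ' rel="youtube"';
            } elseif (preg_match('~\.(png|jpe?g|gif|bmp)(\?\S*)?$~i', $url)) {
                // Image detection by extension only
                $rel = ' rel="image"';
            }
            // Escape & and quotes so the URL is safe inside the attribute
            $safe = htmlspecialchars($url, ENT_QUOTES);
            return '<a' . $rel . ' href="' . $safe . '">' . $safe . '</a>';
        },
        $text
    );
}

echo linkify('http://site.com/bounty.png a site http://www.youtube.com/watch?v=UyxqmghxS6M here');
```

Since each URL is consumed exactly once, a link can never be wrapped twice. Trailing punctuation (for example a full stop right after a URL) would still be included in the match.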

Related

Find specific domain name and append url in string PHP

Let's say I have the following string:
<?php
$str = 'To subscribe go to <a href="http://foo.com/subscribe">Here</a>';
?>
What I'm trying to do is find the URLs within the string that have a specific domain name ("foo.com" in this example) and append a query string to them.
What I want to accomplish:
<?php
$str = 'To subscribe go to <a href="http://foo.com/subscribe?package=2">Here</a>';
?>
If the domain name in the urls isn't foo.com, I don't want them to be appended.
You can use the parse_url() function and PHP's DOMDocument class to manipulate the URLs, like this:
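$str = 'To subscribe go to <a href="http://foo.com/subscribe">Here</a>';
$dom = new DOMDocument();
$dom->loadHTML($str);
$urls = $dom->getElementsByTagName('a');
foreach ($urls as $url) {
    $href = $url->getAttribute('href');
    $components = parse_url($href);
    // Only rewrite absolute links whose host is exactly foo.com
    if (isset($components['scheme'], $components['host']) && $components['host'] == "foo.com") {
        $path = isset($components['path']) ? $components['path'] : '';
        $url->setAttribute('href', $components['scheme'] . "://" . $components['host'] . $path . "?package=2");
    }
}
// Serialize the document once, after every link has been updated
$str = $dom->saveHTML();
echo $str;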
Output:
To subscribe go to [Here]
^ href="http://foo.com/subscribe?package=2"
Here are the references:
The DOMDocument class
parse_url()
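The loop above rebuilds the href from scheme, host and path only, so an existing query string, port or fragment on a foo.com link would be dropped. A sketch of a helper that keeps them, assuming the new parameters should be merged into any existing query; the name appendQueryForHost() is hypothetical.

```php
<?php
// Hypothetical helper: add $params to $href only when its host matches $host,
// keeping any existing query parameters, port and fragment.
function appendQueryForHost(string $href, string $host, array $params): string
{
    $parts = parse_url($href);
    if ($parts === false || !isset($parts['host']) || strcasecmp($parts['host'], $host) !== 0) {
        return $href; // other domains and relative links are left untouched
    }
    // Merge the new parameters into the existing query string
    parse_str($parts['query'] ?? '', $query);
    $query = array_merge($query, $params);

    $url = ($parts['scheme'] ?? 'http') . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    $url .= $parts['path'] ?? '';
    $url .= '?' . http_build_query($query);
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

echo appendQueryForHost('http://foo.com/subscribe', 'foo.com', ['package' => 2]);
// http://foo.com/subscribe?package=2
```

Inside the DOMDocument loop it could replace the manual string building, e.g. $url->setAttribute('href', appendQueryForHost($url->getAttribute('href'), 'foo.com', ['package' => 2]));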

PHP - 'preg_replace regex' all images' URL

I tried to replace all image URLs with another image URL, but I didn't manage to write the regex correctly.
My images are not necessarily in an img tag with src="".
They are mostly enclosed in ="image url".
Example content to replace:
[side_section poster="image.jpg" position="left" bgrepeat="no-repeat" bgcolor="#f6f6f6" paddingtop="70" paddingbot="70" txtcolor="" ]
$content = (string) preg_replace('/(?[!=")(http:\\/\\/.+(png|jpeg|jpg|gif|bmp))/Ui', './images/placeholder.png', (string) $content);
Here is what you need:
$content = '[side_section poster="image.jpg" position="left" bgrepeat="no-repeat" bgcolor="#f6f6f6" paddingtop="70" paddingbot="70" txtcolor="" ]';
$newContent = (string) preg_replace('/="([^"]*\.(?:png|jpeg|jpg|gif|bmp))"/', '="./images/placeholder.png"', (string) $content);
echo $newContent;
The regex used is: ="([^"]*\.(?:png|jpeg|jpg|gif|bmp))"
You can test it here: DEMO
Note that the replacement string must also include the =" prefix and the closing quote, because the pattern consumes them: '="./images/placeholder.png"'
Alternatively, use this function:
function replaceImg($content, $path)
{
return (string) preg_replace('/="([^"]*\.(?:png|jpeg|jpg|gif|bmp))"/', '="'.$path.'"', (string) $content);
}
example:
$content = '[side_section poster="image.jpg" position="left" bgrepeat="no-repeat" bgcolor="#f6f6f6" paddingtop="70" paddingbot="70" txtcolor="" ]';
echo replaceImg($content, './images/placeholder.png');
OUTPUT
[side_section poster="./images/placeholder.png" position="left" bgrepeat="no-repeat" bgcolor="#f6f6f6" paddingtop="70" paddingbot="70" txtcolor="" ]
example 2:
$content = 'position="left" poster="image.jpg"';
echo replaceImg($content, './images/placeholder.png');
OUTPUT
position="left" poster="./images/placeholder.png"
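One caveat with building the replacement string from $path: preg_replace() interprets sequences such as $1 or \1 in the replacement as backreferences, so a path containing them would be altered. A sketch of a variant (the name replaceImgLiteral() is hypothetical) that inserts the path literally through preg_replace_callback(), also accepts single-quoted values, and matches the extensions case-insensitively:

```php
<?php
// Hypothetical variant of replaceImg(): replaces ="...image" and ='...image'
// values, inserting $path verbatim and keeping the original quote character.
function replaceImgLiteral(string $content, string $path): string
{
    return preg_replace_callback(
        '/=(["\'])[^"\']*\.(?:png|jpe?g|gif|bmp)\1/i',
        function (array $m) use ($path): string {
            // $m[1] is the opening quote, reused as the closing quote
            return '=' . $m[1] . $path . $m[1];
        },
        $content
    );
}

echo replaceImgLiteral("poster='image.jpg' position=\"left\"", './images/placeholder.png');
// poster='./images/placeholder.png' position="left"
```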

get meta description, title and image from URL like Facebook link sharing

My code is:
function getTitle($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
        return $title[1];
    }
    else
    {
        return false;
    }
}
function getMetas($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        // preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
        preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$str,$title);
        // preg_match( '<meta name="description".*content="([^"]+)">siU', $str, $title);
        return $title[1];
    }
    else
    {
        return false;
    }
}
//Example:
$url=$_POST['url'];
echo getTitle($url);
echo "<br><br>";
echo getMetas($url);
This does not show results for all URLs, for example http://google.com.
Why are you using regular expressions to parse the <meta> tags?
PHP has a built-in function for parsing meta information: get_meta_tags().
Illustration :
<?php
$tags = get_meta_tags('http://www.stackoverflow.com/');
echo "<pre>";
print_r($tags);
OUTPUT:
Array
(
[twitter:card] => summary
[twitter:domain] => stackoverflow.com
[og:type] => website
[og:image] => http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon#2.png?v=fde65a5a78c6
[og:title] => Stack Overflow
[og:description] => Q&A for professional and enthusiast programmers
[og:url] => http://stackoverflow.com/
)
As you can see, the title, image and description you want are all being parsed.
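get_meta_tags() keys its result on the name attribute of <meta> tags and does not return the page's <title> element; Open Graph tags are often written with property="og:..." instead of name=. A minimal sketch that works on an HTML string you have already fetched (for example with file_get_contents()) using DOMDocument and DOMXPath. The function name getPageMeta(), the returned keys and the fallback order (og: tag first, then <title> or name="description") are assumptions for illustration.

```php
<?php
// Hypothetical helper: pull title, description and image out of an HTML string.
function getPageMeta(string $html): array
{
    if (trim($html) === '') {
        return ['title' => null, 'description' => null, 'image' => null];
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Content attribute of the first <meta> matching $query, or null
    $meta = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0)
            ? $nodes->item(0)->getAttribute('content')
            : null;
    };

    $titleNodes = $dom->getElementsByTagName('title');
    return [
        'title'       => $meta('//meta[@property="og:title"]')
                         ?? ($titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : null),
        'description' => $meta('//meta[@property="og:description"]')
                         ?? $meta('//meta[@name="description"]'),
        'image'       => $meta('//meta[@property="og:image"]'),
    ];
}

$html = '<html><head><title>Example</title>'
      . '<meta name="description" content="A test page">'
      . '<meta property="og:image" content="http://example.com/a.png">'
      . '</head><body></body></html>';
print_r(getPageMeta($html));
```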
I know the question is 1.5 years old, but if you are still looking for a solution, you can use https://urlmeta.org. It's a free API to extract URL metadata.
You can make sure the URL starts with http:// or https:// like this:
$url='stackoverflow.com';
$http_check='http://';
$https_check='https://';
// Prepend http:// only when the URL has neither prefix
if(substr($url,0,7)!=$http_check && substr($url,0,8)!=$https_check){
    $url=$http_check.$url;
}
Then you can use the answer above:
<?php
$tags = get_meta_tags($url);
echo "<pre>";
print_r($tags);

preg_match and preg_replace for youtube url

I have this code
preg_match_all('%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $asd, $match);
to find the YouTube video ID in URLs like
http://www.youtube.com/watch?v=ZiJRPREeQ1Q
<br />
http://www.youtube.com/watch?v=GaEZgqxPHLs&feature=related
This works well for finding the IDs ZiJRPREeQ1Q and GaEZgqxPHLs. Now I want to replace each whole HTML link (<a href=*>*</a>) that contains such a URL with new code.
I want to use preg_replace to find the whole YouTube link and replace it with the new code. How can I do that?
--------------adds--------------
After I get the YouTube IDs from the URLs with preg_match_all,
I use this code to loop over them:
foreach($match[1] as $youtube){
    // $youtube; // this holds the YouTube ID
    $match = ""; // what can I write here relative to $youtube?
    $str .= preg_replace($match, 'new code',$content); // $content holds the whole thread that contains the YouTube URL <a href=*>*</a>
}
The only thing I need is the regular expression I can use to replace the YouTube link.
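Instead of building one pattern per ID inside the loop, the ID regex from the question can be wrapped in a pattern that matches the whole <a ...>...</a> element and passed to preg_replace_callback(), which receives the ID of each match. A sketch assuming the links are <a> tags with a double-quoted href and that the "new code" is a YouTube embed iframe; the function name youtubeLinksToEmbeds() and the iframe markup are illustrative assumptions. The question's . wildcards are narrowed to [^"] so a match cannot run past the end of the href.

```php
<?php
// Hypothetical helper: replace each <a href="...youtube...">...</a> with an embed.
function youtubeLinksToEmbeds(string $content): string
{
    $pattern = '%<a\s[^>]*href="[^"]*'                                  // opening tag up to the URL
             . '(?:youtube\.com/(?:[^/"]+/[^"]+/|(?:v|e(?:mbed)?)/|[^"]*[?&]v=)|youtu\.be/)'
             . '([^"&?/ ]{11})'                                         // group 1: the video ID
             . '[^"]*"[^>]*>.*?</a>%is';                                // rest of the tag and link text
    return preg_replace_callback($pattern, function (array $m): string {
        $id = htmlspecialchars($m[1], ENT_QUOTES);
        return '<iframe src="https://www.youtube.com/embed/' . $id . '"></iframe>';
    }, $content);
}

echo youtubeLinksToEmbeds('Watch <a href="http://www.youtube.com/watch?v=ZiJRPREeQ1Q">this</a>');
```

The % delimiter means the slashes in the pattern need no escaping, and the lazy .*? stops at the first </a>, so several links in one $content are replaced independently.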
$html = file_get_contents($url); //or curl function
$re="<link itemprop=\"embedURL\" href=\"(.+)\">";
preg_match_all("/$re/siU", $html, $matches);
$youtube = $matches[1][0];
or
$html = file_get_contents($url); //or curl function
$re="<link itemprop=\"url\" href=\"(.+)\">";
preg_match_all("/$re/siU", $html, $matches);
$youtube = $matches[1][0];

php regex to get string inside href tag

I need a regex that will give me the string inside an href attribute, i.e. the text inside the quotes.
For example, I need to extract theurltoget.com from the following:
<a href="theurltoget.com">URL</a>
Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html I only want http://www.mydomain.com/
Don't use a regex for this. You can use XPath and PHP's built-in functions to get what you want:
$xml = simplexml_load_string($myHtml);
// Select every href attribute in the document
$list = $xml->xpath("//@href");
$preparedUrls = array();
foreach($list as $item) {
    $item = parse_url((string) $item);
    $preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
}
print_r($preparedUrls);
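simplexml_load_string() expects well-formed XML, so it may reject ordinary HTML (unclosed tags, unquoted attributes). A sketch of the same idea with DOMDocument, which has an HTML parser, plus DOMXPath for the //a/@href query. The function name getBaseUrls() is hypothetical, and links without both a scheme and a host (relative values such as theurltoget.com) are skipped because they have no base part to extract.

```php
<?php
// Hypothetical helper: list "scheme://host/" for every absolute link in $html.
function getBaseUrls(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy HTML without warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $bases = [];
    foreach ($xpath->query('//a/@href') as $attr) {
        $parts = parse_url($attr->value);
        // Relative links have no scheme/host, so there is no base URL to report
        if ($parts !== false && isset($parts['scheme'], $parts['host'])) {
            $bases[] = $parts['scheme'] . '://' . $parts['host'] . '/';
        }
    }
    return $bases;
}

print_r(getBaseUrls('<a href="http://www.mydomain.com/page.html" class="myclass">URL</a>'));
```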
$html = '<a href="http://www.mydomain.com/page.html">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com
This expression handles three cases:
no quotes
double quotes
single quotes
'/href=["\']?([^"\'>]+)["\']?/'
Use the answer by @Alec if you're only looking for the base URL part (the second part of the question by @David)!
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html" class="myclass" rel="myrel
)
So you can use $href = $info["scheme"] . "://" . $info["host"]
Which gives you:
// http://www.mydomain.com
If you want the entire URL inside the href, you should use a different regex, for instance the one provided by @user2520237.
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);
this will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html
)
Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"];
Which gives you:
// http://www.mydomain.com/page.html
http://www.the-art-of-web.com/php/parse-links/
Let's start with the simplest case: a well-formatted link with no extra attributes:
/<a href=\"([^\"]*)\">(.*)<\/a>/iU
To replace all href values:
function replaceHref($html, $replaceStr)
{
    // Capture the tag opening, the double-quoted href value and its closing quote,
    // then rewrite each value exactly once
    return preg_replace_callback(
        '/(<a [^>]*href=")([^"]+)(")/',
        function ($m) use ($replaceStr) {
            return $m[1] . $replaceStr . urlencode($m[2]) . $m[3];
        },
        $html
    );
}
$html = '<a href="http://www.mydomain.com/page.html">URL</a>'; // hypothetical example input
$replaceStr = "http://affilate.domain.com?cam=1&url=";
$replaceHtml = replaceHref($html, $replaceStr);
echo $replaceHtml;
This will handle the case where there are no quotes around the URL.
/<a [^>]*href="?([^">]+)"?>/
But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.
#href="(https?://[^/]*)#
I think you should be able to handle the rest.
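Because positive lookbehind and lookahead are cool:
/(?<=href=\")[^\"]+(?=\")/
It matches only what you want, without the quotation marks. Using [^\"]+ instead of .+ stops the match at the closing quote even when the tag has more attributes.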
Array (
[0] => theurltoget.com )
