I found this code (Swap all youtube urls to embed via preg_replace()) to swap YouTube URLs (http://www.youtube.com/watch?v=CfDQ92vOfdc, or http://www.youtube.com/v/CfDQ92vOfdc) into YouTube embed URLs (http://www.youtube.com/embed/CfDQ92vOfdc), but it doesn't seem to be working. Any ideas? I don't know much about regular expressions.
Here's the code:
$string = 'http://www.youtube.com/watch?v=CfDQ92vOfdc';
$search = '#<a (?:.*?)href=["\\\']http[s]?:\/\/(?:[^\.]+\.)*youtube\.com\/(?:v\/|watch\?(?:.*?\&)?v=|embed\/)([\w\-\_]+)["\\\']#ixs';
$replace = 'http://www.youtube.com/embed/$2';
$url = preg_replace($search,$replace,$string);
but it's still displaying as:
http://www.youtube.com/watch?v=CfDQ92vOfdc
instead of:
http://www.youtube.com/embed/CfDQ92vOfdc
Thanks in advance.
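One problem is that your expression expects an <a href="..."> tag around the address, but your string is a bare URL.
Another issue is that your pattern has only one capturing group, so $2 is always empty; the video ID is in $1. (Single versus double quotes makes no difference here, because PHP does not interpolate $1 or $2 in either.)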
This simpler expression should work:
$string = 'http://www.youtube.com/watch?v=CfDQ92vOfdc';
$search = '/youtube\.com\/watch\?v=([\w-]+)/i'; // video IDs may contain - and _
$replace = "youtube.com/embed/$1";
$url = preg_replace($search,$replace,$string);
echo $url;
Either change
$string = 'http://www.youtube.com/watch?v=CfDQ92vOfdc';
to
$string = '<a href="http://www.youtube.com/watch?v=CfDQ92vOfdc" ></a>';
OR
$search = '#<a (?:.*?)href=["\\\']http[s]?:\/\/(?:[^\.]+\.)*youtube\.com\/(?:v\/|watch\?(?:.*?\&)?v=|embed\/)([\w\-\_]+)["\\\']#ixs';
to
$search = '#(.*?)(?:href="https?://)?(?:www\.)?(?:youtu\.be/|youtube\.com(?:/embed/|/v/|/watch\?.*?v=))([\w\-]{10,12}).*#x';
If anyone is still looking for a more direct solution, here is one. I played with your code until I got something simple that turns the links into embedded players.
$string = $content;
$search = '/www\.youtube\.com\/watch\?v=([\w-]+)/i';
$replace = "<iframe width='560' height='315' src='https://youtube.com/embed/$1' frameborder='0' allowfullscreen></iframe>\n";
$content = preg_replace($search,$replace,$string);
NOTE: to choose which links are processed, just edit the $search part.
If you are processing links from www.youtube.com, use:
$search = '/www\.youtube\.com\/watch\?v=([\w-]+)/i';
If you also want to match plain youtube.com links, just remove the www\. part:
$search = '/youtube\.com\/watch\?v=([\w-]+)/i';
Here is a function I wrote; echo its return value to print the result:
function youtube_url_to_embed($youtube_url) {
$search = '/youtube\.com\/watch\?v=([\w-]+)/i'; // video IDs may contain - and _
$replace = "youtube.com/embed/$1";
$embed_url = preg_replace($search,$replace,$youtube_url);
return $embed_url;
}
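The answers above each handle one URL shape. A minimal sketch of a helper that accepts the watch?v=, /v/, /embed/ and youtu.be forms might look like the following. The function name youtube_embed_url is hypothetical, and the pattern assumes video IDs are exactly 11 characters drawn from letters, digits, - and _.

```php
<?php
// Hypothetical helper: returns the embed URL, or null when no video ID is found.
// Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
function youtube_embed_url(string $url): ?string
{
    // Accepts youtu.be/ID, youtube.com/watch?...v=ID, youtube.com/v/ID and youtube.com/embed/ID.
    $pattern = '~(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?(?:[^#\s]*?&)?v=))([\w-]{11})~i';

    if (preg_match($pattern, $url, $m)) {
        return 'https://www.youtube.com/embed/' . $m[1];
    }
    return null;
}

echo youtube_embed_url('http://www.youtube.com/watch?v=CfDQ92vOfdc'), "\n";
echo youtube_embed_url('http://www.youtube.com/v/CfDQ92vOfdc'), "\n";
```

Using preg_match() and building the result by hand, instead of preg_replace(), means the caller can tell "no video found" (null) apart from a successful rewrite.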
I am working with an editor that works purely with internal relative links for files which is great for 99% of what I use it for.
However, I am also using it to insert links to files within an email body and relative links don't cut the mustard.
Instead of modifying the editor, I would like to search the string from the editor and replace the relative links with external links as shown below
Replace
files/something.pdf
With
https://www.someurl.com/files/something.pdf
I have come up with the following but I am wondering if there is a better / more efficient way to do it with PHP
<?php
$string = '<a href="files/something.pdf">A link</a>, some other text, <a href="files/somethingelse.pdf">A different link</a>';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $string, $result);
if (!empty($result['href'])) {
// Found a link.
$baseUrl = 'https://www.someurl.com';
$newUrls = array();
$newString = '';
foreach($result['href'] as $url) {
$newUrls[] = $baseUrl . '/' . $url;
}
$newString = str_replace($result['href'], $newUrls, $string);
echo $newString;
}
?>
Many thanks
Lee
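You can simply use preg_replace to replace all the URLs starting with files inside double quotes:
$string = '<a href="files/something.pdf">A link</a>, some other text, <a href="files/somethingelse.pdf">A different link</a>';
$string = preg_replace('/"(files.*?)"/', '"https://www.someurl.com/$1"', $string);
The result would be:
<a href="https://www.someurl.com/files/something.pdf">A link</a>, some other text, <a href="https://www.someurl.com/files/somethingelse.pdf">A different link</a>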
You really should use DOMDocument for this kind of job, but if you want to use a regex, this one does the job:
$string = '<a some_attribute href="files/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="files/somethingelse.pdf" attr="xyz">A different link</a>';
$baseUrl = 'https://www.someurl.com';
$newString = preg_replace('/(<a[^>]+href=([\'"]))(.+?)\2/i', "$1$baseUrl/$3$2", $string);
echo $newString,"\n";
Output:
<a some_attribute href="https://www.someurl.com/files/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="https://www.someurl.com/files/somethingelse.pdf" attr="xyz">A different link</a>
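For comparison, here is a rough sketch of the DOMDocument route mentioned in this answer. The function name absolutize_links is hypothetical, the code needs the dom extension, and it assumes the fragment is ASCII or otherwise safe for loadHTML()'s default encoding handling.

```php
<?php
// Hypothetical helper: prefix relative hrefs with a base URL using DOMDocument.
function absolutize_links(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one root element so it survives without <html>/<body> being added.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip empty hrefs, anything with a scheme (http:, mailto:), protocol-relative URLs and anchors.
        if ($href !== '' && !preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href)) {
            $a->setAttribute('href', rtrim($baseUrl, '/') . '/' . ltrim($href, '/'));
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo absolutize_links(
    '<a some_attribute href="files/something.pdf" class="abc">A link</a>, some other text',
    'https://www.someurl.com'
), "\n";
```

Unlike the regex, this leaves links that are already absolute untouched and does not care about attribute order or quoting style.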
I have a link being output on my site. What I want to do is replace the visible text that the user sees, while the link itself always remains the same.
There will be many different dynamic URLs whose text needs changing, and all the example regexes I have found so far only use exact tags like '/.*/' or something similar.
Edited for a better example
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$pattern = '/#(<a.*?>).*?(</a>)#/';
$new_text = 'New text';
$new_link = preg_replace($pattern, $new_text, $link);
When printing the output, the following is what I am looking for, compared with my actual result.
Desired
<a href='some-dynamic-link'>New text</a>
Actual
'New text'
As you're already using capture groups, why not actually use them?
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$newText = "Replaced!";
$result = preg_replace('/(<a.*?>).*?(<\/a>)/', '$1'.$newText.'$2', $link);
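One caveat with concatenating $newText into the replacement string: preg_replace() still scans it for references such as $1 or \1, so user-supplied text containing a dollar sign followed by digits can be mangled. A small sketch that sidesteps this with preg_replace_callback(), whose return value is inserted literally (the sample text and the htmlspecialchars() call are illustrative assumptions):

```php
<?php
$link = "<a href='some-dynamic-link'>Text to replace</a>";
$newText = 'Only $10 today'; // "$10" would be read as a group reference by preg_replace()

$result = preg_replace_callback(
    '/(<a\b[^>]*>).*?(<\/a>)/s',
    function (array $m) use ($newText) {
        // $m[1] is the opening tag, $m[2] the closing tag; the new text is escaped for HTML.
        return $m[1] . htmlspecialchars($newText) . $m[2];
    },
    $link
);

echo $result, "\n";
```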
If you need to get everything between tags, you can use the function below:
<?php
function getEverything_inBetween_tags(string $htmlStr, string $tagname)
{
$pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
preg_match_all($pattern, $htmlStr, $matches);
return $matches[1]; // array with the inner content of every match
}
$str = '<a href="#">see here for more details about test.com</a>'; // the href is a placeholder
echo implode("\n", getEverything_inBetween_tags($str, 'a'));
//output:- see here for more details about test.com
?>
If you need to extract the whole HTML tag and get an array of the matches:
<?php
function extractHtmlTag_into_array(string $htmlStr, string $tagname)
{
preg_match_all("#<\s*?$tagname\b[^>]*>.*?</$tagname\b[^>]*>#s", $htmlStr , $matches);
return $matches[0]; // array with the full markup of every match
}
$str = '<p>test</p><a href="amazon.in/xyz/abc?test=abc">test.com</a><span>testing string</span>';
$res = extractHtmlTag_into_array($str, 'a');
print_r($res);
//output:- Array([0] => <a href="amazon.in/xyz/abc?test=abc">test.com</a>)
?>
I have the following regex :
$string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</A>",$string);
Using it to parse this string : http://www.ttt.com.ar/hello_world
Produces this new string :
<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/hello_world</A>
So far, so good. What I want to do is make the link text a substring of $1, producing output like:
<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/...</A>
Pseudocode of what I mean:
$string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">substring($1,0,24)..</A>",$string);
Is this even possible? I'm probably just doing it all wrong :)
Thanks in advance.
Check out preg_replace_callback():
$string = 'http://www.ttt.com.ar/hello_world';
$string = preg_replace_callback(
"/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/i",
function($matches) {
$link = $matches[1];
$substring = substr($link, 0, 24) . '..';
return "<a target=\"_blank\" href=\"$link\">$substring</a>";
},
$string
);
echo $string;
// <a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/he..</a>
Note that older PHP versions also let you use the e modifier to execute code in the preg_replace() replacement. It was deprecated in PHP 5.5.0 and removed in PHP 7.0.0 in favor of preg_replace_callback().
You can use a capturing group inside of a lookahead like this:
preg_replace(
"/((?=(.{24}))[\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/i",
"<a target=\"_blank\" href=\"$1\">$2..</A>",
$string);
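This will capture the entire URL in group 1, but it will also capture the first 24 characters of it in group 2. Note that the lookahead needs at least 24 characters remaining on the line from the start of the URL, so shorter URLs at the end of the text are not matched at all.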
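To illustrate the trade-off between the two approaches, here is a short sketch. The first call shows that a URL shorter than 24 characters is left unlinked by the lookahead pattern; the hypothetical link_urls() helper then uses a callback and only shortens the label when the URL is actually longer than the limit. The character class is a lightly tidied copy of the one in the question.

```php
<?php
// Lookahead version: the pattern cannot match when fewer than 24 characters remain.
$lookahead = '/((?=(.{24}))\w+:\/\/[\w\-?&;#~=.\/]+[\w\/])/i';
$short = preg_replace($lookahead, '<a href="$1">$2..</a>', 'http://a.io/x');
echo $short, "\n"; // unchanged: http://a.io/x

// Hypothetical helper: link every URL, shortening the visible text to $max characters.
function link_urls(string $text, int $max = 24): string
{
    return preg_replace_callback(
        '/\w+:\/\/[\w\-?&;#~=.\/]+[\w\/]/i',
        function (array $m) use ($max) {
            $url = $m[0];
            // Only truncate (and add the dots) when the URL exceeds the limit.
            $label = strlen($url) > $max ? substr($url, 0, $max) . '..' : $url;
            return '<a target="_blank" href="' . htmlspecialchars($url) . '">'
                . htmlspecialchars($label) . '</a>';
        },
        $text
    );
}

echo link_urls('http://a.io/x'), "\n";
echo link_urls('http://www.ttt.com.ar/hello_world'), "\n";
```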
This is bad practice. Regexes should not be used to parse or modify XML/HTML content in application code.
Suggestions:
Use a DOM parser to read and modify the value
Use parse_url() to get the protocol and domain name
Example:
$doc = new DOMDocument();
$doc->loadHTML(
'<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/hello_world</A>'
);
$link = $doc->getElementsByTagName('a')->item(0);
$url = parse_url($link->nodeValue);
$link->nodeValue = $url['scheme'] . '://' . $url['host'] . '/...';
echo $doc->saveHTML();
I am still relatively new to regular expressions and feel my code is being too greedy. I am trying to add an id attribute to existing links in a piece of code. My function looks like this:
function addClassHref($str) {
//$str = stripslashes($str);
$preg = "/<[\s]*a[\s]*href=[\s]*[\"\']?([\w.-]*)[\"\']?[^>]*>(.*?)<\/a>/i";
preg_match_all($preg, $str, $match);
foreach ($match[1] as $key => $val) {
$pattern[] = '/' . preg_quote($match[0][$key], '/') . '/';
$replace[] = "<a id='buttonRed' href='$val'>{$match[2][$key]}</a>";
}
return preg_replace($pattern, $replace, $str);
}
This adds the id attribute like I want, but it breaks the hyperlink. For example:
If the original code is: <a href="http://www.google.com">Link</a>
Instead of <a id="class" href="http://www.google.com">Link</a>
It is giving
<a id="class" href="http">Link</a>
Any suggestions or thoughts?
Do not use regular expressions to parse XML or HTML.
$doc = new DOMDocument();
$doc->loadHTML($html);
$all_a = $doc->getElementsByTagName('a');
$firsta = $all_a->item(0);
$firsta->setAttribute('id', 'idvalue');
echo $doc->saveHTML($firsta);
You've got some overcomplications in your regex :)
Also, there's no need for the loop as preg_replace() will hit all the instances of the search pattern in the relevant string. The first regex below will take everything in the a tag and simply add the id attribute on at the end.
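$str = '<a href="http://www.google.com">Link</a>' . "\n" .
'<a href="http://www.google.com">Link</a>' . "\n" .
'<a href="http://www.google.com">Link</a>';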
$p = "{<\s*a\s*(href=[^>]*)>([^<]*)</a>}i";
$r = "<a $1 id=\"class\">$2</a>";
echo preg_replace($p, $r, $str);
If you only want to capture the href attribute you could do the following:
$p = '{<\s*a\s*href=["\']([^"\']*)["\'][^>]*>([^<]*)</a>}i';
$r = "<a href='$1' id='class'>$2</a>";
Your first subpattern ([\w.-]*) doesn't match :, thus it stops at "http".
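A quick way to see this is to run the narrow character class and a "anything up to the closing quote" class side by side. This is only a demonstration snippet; the variable names are made up for the example.

```php
<?php
$html = '<a href="http://www.google.com">Link</a>';

// [\w.-] has no ':' or '/', so the capture ends right after "http".
preg_match('/href=["\']?([\w.-]*)/', $html, $narrow);

// [^"\']* accepts everything up to the closing quote.
preg_match('/href=["\']([^"\']*)["\']/', $html, $wide);

echo $narrow[1], "\n"; // http
echo $wide[1], "\n";   // http://www.google.com
```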
Couldn't you just use a simple str_replace() for this? Regex seems like overkill if this is all you're doing.
$str = str_replace('<a ', '<a id="someID" ', $str);
Ok, basically I have an array of bad urls and I would like to search through a string and strip them out. I want to strip everything from the opening tag to the closing tag, but only if the url in the hyperlink is in the array of bad urls. Here is how I would picture it working but I don't understand regular expressions well.
foreach($bad_urls as $bad_url){
$pattern = "/<a*$bad_url*</a>/";
$replacement = ' ';
preg_replace($pattern, $replacement, $content);
}
Thanks in advance.
Assuming that your 'bad urls' are properly formatted URLs, I would suggest doing something like this:
foreach ($bad_urls as $bad_url) {
// [^>]* keeps the href inside one opening tag; the U modifier makes .* lazy
$pattern = '/<a\s[^>]*href="' . convert_to_pattern($bad_url) . '"[^>]*>.*<\/a>/isU';
$replacement = ' ';
$content = preg_replace($pattern, $replacement, $content);
}
and separately
function convert_to_pattern($url)
{
// preg_quote() escapes every regex metacharacter as well as the / delimiter
return preg_quote($url, '/');
}
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, find all the <a> tags and check the href property. Much simpler and fool-proof.
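A rough sketch of that DOM approach, under a few assumptions: the function name strip_bad_links is hypothetical, the dom extension is available, "bad" means the href matches an entry of the array exactly, and the input is ASCII or otherwise safe for loadHTML()'s default encoding handling.

```php
<?php
// Hypothetical helper: remove every <a> element whose href is in $badUrls.
function strip_bad_links(string $html, array $badUrls): string
{
    $doc = new DOMDocument();

    $previous = libxml_use_internal_errors(true); // collect markup warnings quietly
    // A single wrapper element keeps the fragment intact without <html>/<body> being added.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first; removing nodes while iterating it skips elements.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (in_array($a->getAttribute('href'), $badUrls, true)) {
            $a->parentNode->removeChild($a);
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = '<p>Keep <a href="http://good.example/">this</a>, drop <a href="http://bad.example/">that</a>.</p>';
echo strip_bad_links($content, ['http://bad.example/']), "\n";
```

Exact string comparison is the simplest rule; matching by host via parse_url() would be a natural variation if the bad list holds domains instead of full URLs.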