I want to convert a string in PHP. The string may contain URLs, image tags, or other tags, and I want to turn the plain URLs into links without touching the src value of image tags. For example:
We have a link https://youtube.com/watch/8374h87shdv which needs to be converted but this is not to be <image class="emoji" alt="emoji" src="https://icloud.com/png/sdsdv234f.png"
The URL in the text above should be converted into a link, but the src URL should be left alone.
I am using this currently:
function convert_strings( $content ){
$url = '~(?:(https?)://([^\s<]+)|(www\.[^\s<]+?\.[^\s<]+))(?<![\.,:])~i';
$content = preg_replace($url, '$0', $content);
$content = preg_replace('/(?<!\S)#([0-9a-zA-Z]+)/', '#$1', $content);
$content = convert_smilies( $content );
return $content;
}
But this converts every URL, including the one inside the src attribute. How can I convert only the URLs that appear outside tags?
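One possible approach, offered as a sketch rather than a tested drop-in: let the pattern match whole tags first and discard them with the PCRE verbs (*SKIP)(*FAIL), so that only URLs outside tags reach the replacement. The helper name linkify_outside_tags, the link markup, and the closed <img> tag in the sample string are assumptions for illustration; the hashtag and convert_smilies() steps from the question are left out.

```php
<?php
// Hypothetical helper: link bare URLs, but leave anything inside <...> alone.
function linkify_outside_tags(string $content): string
{
    // The first alternative consumes a whole tag and then forces a failure,
    // so URLs in attributes such as src are never offered to the second one.
    $pattern = '~<[^>]*>(*SKIP)(*FAIL)|https?://[^\s<"\']+~i';
    return preg_replace($pattern, '<a href="$0">$0</a>', $content);
}

$input = 'We have a link https://youtube.com/watch/8374h87shdv here '
       . '<img class="emoji" alt="emoji" src="https://icloud.com/png/sdsdv234f.png">';
$output = linkify_outside_tags($input);
echo $output, "\n";
```

This only protects URLs inside a tag's angle brackets. A URL that already appears as the visible text of an existing <a> element would still be wrapped a second time, so that case would need separate handling.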
Hello, I have code that copies the HTML from an external URL and echoes it on my page.
Some of the HTML contains links and/or picture src attributes.
I need some help truncating them (from absolute URLs to relative URLs inside $data).
For example, inside the HTML there is an href:
<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
or a src:
<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
I would like to keep only the path:
/products/score-vs-ibd/
/Filters/MinDUp1.gif
Maybe this can be done with preg_replace, but I'm not familiar with regular expressions.
This is my original code, which works very well, but now I'm stuck on truncating the links.
<?php
$post_tags = get_the_tags();
if ( $post_tags ) {
$tag = $post_tags[0]->name;
}
$html= file_get_contents('https://www.trade-ideas.com/ticky/ticky.html?symbol='. "$tag");
$start = strpos($html,'<div class="span3 height-325"');
$end = strpos($html,'<!-- /span -->',$start);
$data= substr($html,$start,$end-$start);
echo $data ;
?>
Here is the code:
function getUrlPath($url) {
$re = '/(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)/';
preg_match($re, $url, $matches);
return $matches[1];
}
Example: getUrlPath("http://myassets.com:80/files/images/image.gif") returns files/images/image.gif (without a leading slash).
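As an alternative to the regex, PHP's built-in parse_url() can split a URL into its components. The sketch below assumes a hypothetical helper name, get_url_path, and, unlike the function above, keeps the leading slash and any query string.

```php
<?php
// Hypothetical alternative to the regex version, built on parse_url().
function get_url_path(string $url): string
{
    $path  = parse_url($url, PHP_URL_PATH);   // null when there is no path
    $query = parse_url($url, PHP_URL_QUERY);  // null when there is no query
    $path  = is_string($path) ? $path : '';
    return is_string($query) ? $path . '?' . $query : $path;
}

echo get_url_path('http://myassets.com:80/files/images/image.gif'), "\n";
echo get_url_path('https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value'), "\n";
```

Keeping the leading slash makes the result usable directly as a root-relative href or src value.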
You can locate all the URLs in the html string with a regex using preg_match_all().
The regex:
'/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i'
will capture both the entire URL and the path/query string for every occurrence of ="http://domain/path" or ='https://domain/path?query' (http/https, single or double quotes, with/without query string).
Then you can just use str_replace() to update the html string.
<?php
$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >
<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">
<img src=\'https://static.trade-ideas.com/Filters/MinDUp1.gif?param=value\'>';
$pattern = '/=[\'"](https?:\/\/.*?(\/.*))[\'"]/i';
$urls = [];
preg_match_all($pattern, $html, $urls);
//var_dump($urls);
foreach($urls[1] as $i => $uri){
$html = str_replace($uri, $urls[2][$i], $html);
}
echo $html;
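Note that this changes every absolute URL that is enclosed in quotes and immediately follows an =. Because str_replace() replaces every occurrence of each matched URL, the same URL appearing elsewhere in the HTML (for example as visible text) is shortened as well.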
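The same rewrite can also be sketched as a single preg_replace() call, which touches only the quoted attribute values themselves. The pattern below is an assumption of mine, not the one from the answer: it stops at the first slash after the host and at the closing quote, so several attributes on one line are handled separately.

```php
<?php
$html = '<a href="https://www.trade-ideas.com/products/score-vs-ibd/" >' . "\n"
      . '<img src="http://static.trade-ideas.com/Filters/MinDUp1.gif">';

// $1 keeps the = and the opening quote, $2 keeps the path (and any query string);
// the scheme and host in between are dropped.
$pattern  = '~(=["\'])https?://[^/"\'\s]+(/[^"\']*)~i';
$relative = preg_replace($pattern, '$1$2', $html);
echo $relative, "\n";
```

A URL without any path, such as ="https://example.com", does not match this pattern and is left unchanged.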
I'm using file_get_contents to display other websites on my website,
and I want all links inside this content to be like this:
not like this:
I would use str_replace() on your output string. For example,
$newOutput = str_replace('<a href="/', '<a href="www.google.com/', $output);
Where $output is the data from your file_get_contents(). For example,
$output = file_get_contents('example.txt');
EDIT:
To meet the new requirement from your comment (changing only the links in the body), split $output into two parts, the head and the body, and apply the replacement only to the second part. The code would then be:
$output = file_get_contents('example.txt'); //get all content
$outputArray = explode('</head>', $output); //split the content
$outputBody = $outputArray[1]; //get everything after the </head>
$newOutput = str_replace('<a href="/', '<a href="www.google.com/', $outputBody);
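The split-and-replace idea can be sketched as a small function that also puts the head back afterwards. The name prefix_body_links and the sample page are hypothetical, and the base URL is assumed to include the scheme, because an href such as www.google.com/about without one is resolved by browsers as a relative path.

```php
<?php
// Hypothetical helper; $base is assumed to include the scheme, e.g. https://www.google.com
function prefix_body_links(string $html, string $base): string
{
    $parts = explode('</head>', $html, 2);  // split once, at the end of the head
    if (count($parts) < 2) {
        // No </head> found: treat the whole string as body.
        return str_replace('<a href="/', '<a href="' . $base . '/', $html);
    }
    $body = str_replace('<a href="/', '<a href="' . $base . '/', $parts[1]);
    return $parts[0] . '</head>' . $body;   // reassemble head and rewritten body
}

$page = '<html><head><link href="/style.css"></head>'
      . '<body><a href="/about">About</a></body></html>';
$result = prefix_body_links($page, 'https://www.google.com');
echo $result, "\n";
```

Protocol-relative links (href="//host/path") also start with a slash and would be rewritten incorrectly by this simple string match, so they would need an extra check.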
How to find and replace all URL paths in an HTML file?
I have an HTML file with links from Wayback Machine, like these:
"/web/2016***/http://blog.mydomain.com/archive/img.jpg"
"/web/2016***/http://blog.mydomain.com/archive/img2.jpg"
"/web/2016***/http://blog.mydomain.com/archive/page2.html"
The 2016*** part is dynamic. How do I extract these elements:
"/archive/img.jpg"
"/archive/img2.jpg"
"/archive/page2.html"
I have tried:
$html = $url;
$content = file_get_contents($html);
$newhtml = preg_replace( 'web/-[^-.]*\./' , '/' , $content);
file_put_contents('post1.html', $newhtml);
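Try this regular expression: \/web.*blog\.mydomain\.com(.*)
In PHP the pattern needs delimiters, and the result of preg_replace() has to be assigned:
$newhtml = preg_replace('/\/web.*blog\.mydomain\.com(.*)/', '\1', $content);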
Check it out in action: https://regex101.com/r/m5ZaRo/3
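A slightly stricter variant, sketched under assumptions: instead of capturing the rest of the line, it removes only the Wayback prefix and the host, which also works when several links share one line. The timestamps in the sample input are made-up stand-ins for the dynamic 2016*** part.

```php
<?php
// Sample input; the timestamps are invented placeholders for the dynamic part.
$content = '<img src="/web/20160101000000/http://blog.mydomain.com/archive/img.jpg">' . "\n"
         . '<a href="/web/20160315120000/http://blog.mydomain.com/archive/page2.html">next</a>';

// Remove "/web/<anything up to the next slash>/http://blog.mydomain.com".
$pattern = '~/web/[^/"\']+/https?://blog\.mydomain\.com~i';
$newhtml = preg_replace($pattern, '', $content);
echo $newhtml, "\n";
```

The result can then be written out with file_put_contents() as in the question.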
I'm looking to separate the text and images in my Wordpress post. I want to be able to put them in different areas on my page.
I currently have only been able to get the content in tags but can't isolate and separate the content.
current html
<?php
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
$content = preg_replace('/<p>\s*(<a .*>)?\s*(<img .* \/>)\s*(<\/a>)?\s*<\/p>/iU', '\1\2\3', $content); // remove paragraphs around img tags
echo $content;
?>
Updated based on your comment.
You can get all the text content without the images:
$content = get_the_content();
$text = wp_strip_all_tags( $content );
To get the images, you can collect all their URLs in an array:
$re = '/<img\s(?:[^>]*?\s)?src="([^"]+)"/i';
preg_match_all( $re, $content, $images );
// the image URLs are in $images[1]
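Outside WordPress, the same separation can be sketched with plain PHP. The sample markup is invented, strip_tags() stands in for wp_strip_all_tags(), and the image URLs are read with DOMDocument (which assumes the dom extension is enabled) instead of a regex.

```php
<?php
// Invented sample; in WordPress this would come from get_the_content().
$content = '<p>Hello <strong>world</strong></p>'
         . '<p><img class="a" src="/uploads/one.jpg" alt=""><img src="/uploads/two.png"></p>';

// Text only: strip_tags() is used here in place of wp_strip_all_tags().
$text = trim(strip_tags($content));

// Image URLs: read the src attribute of every <img> element.
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($content);
$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

echo $text, "\n";
print_r($images);
```

The text and the image list can then be echoed in different areas of the template.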
I want to pull the list (ul) element out of my WordPress post(s) so I can put it in a different location.
My current code strips out the images and blockquotes and outputs just the text.
html
<?php
$content = preg_replace('/<blockquote>(.*?)<\/blockquote>/', '', get_the_content());
$content = preg_replace('/(<img [^>]*>)/', '', $content);
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
echo $content;
?>
Just a friendly reminder: parsing HTML with regex is generally not recommended.
If you would like to do it anyway, you could try this pattern:
$pattern = '~<ul>(.*?)</ul>~s';
The ~ characters are the delimiters, so the pattern must not be wrapped in a second pair of slashes. In your code it would look like this:
preg_match_all('~<ul>(.*?)</ul>~s', $content, $ulElements);
And then, to remove the lists from the original string:
$content = preg_replace('~<ul>(.*?)</ul>~s', '', $content);
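Put together, the two calls can be tried on a small sample. The sample markup and the variable names $lists and $rest are assumptions for illustration; in the theme, $content would come from get_the_content().

```php
<?php
// Invented sample content with one list between two paragraphs.
$content = "<p>Intro</p>\n<ul>\n<li>One</li>\n<li>Two</li>\n</ul>\n<p>Outro</p>";

// Collect every <ul>...</ul> block; the s flag lets . match newlines.
preg_match_all('~<ul>(.*?)</ul>~s', $content, $ulElements);
$lists = $ulElements[0];   // full matches, including the <ul> tags

// Remove the same blocks from the remaining content.
$rest = preg_replace('~<ul>(.*?)</ul>~s', '', $content);

echo implode("\n", $lists), "\n---\n", $rest, "\n";
```

This pattern only matches a bare <ul> tag. A list with attributes (for example a class) or a nested list would need a different pattern or a DOM parser.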