I'm using file_get_contents to display other websites on my website
and I want all links inside this content to be like this:
not like this:
I would use str_replace() on your output string. For example:
$newOutput = str_replace('<a href="/', '<a href="https://www.google.com/', $output);
where $output is the data returned by file_get_contents(), for example:
$output = file_get_contents('example.txt');
EDIT:
To match the new requirement from your comment (rewrite only the links in the body), split $output into two parts, the head and the body, and run the replacement only on the second part. The code would then be:
$output = file_get_contents('example.txt'); // get all content
$outputArray = explode('</head>', $output, 2); // split once, at the first </head>
$outputBody = $outputArray[1]; // everything after </head>
$newOutput = str_replace('<a href="/', '<a href="https://www.google.com/', $outputBody);
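The split-and-replace idea above can be sketched as a small function that also reattaches the head, so the full document is returned. This is only a sketch: the function name prefixBodyLinks and the sample markup are made up for illustration, and it assumes links are written exactly as <a href="/... with double quotes.

```php
<?php
// Hypothetical helper: prefix root-relative links, but only after </head>.
function prefixBodyLinks(string $html, string $baseUrl): string
{
    $search  = '<a href="/';
    $replace = '<a href="' . $baseUrl . '/';

    // Split once at the first </head>; without one, treat everything as body.
    $parts = explode('</head>', $html, 2);
    if (count($parts) < 2) {
        return str_replace($search, $replace, $html);
    }

    [$head, $body] = $parts;
    // Reattach the untouched head to the rewritten body.
    return $head . '</head>' . str_replace($search, $replace, $body);
}

$html = '<html><head><link href="/style.css"></head>'
      . '<body><a href="/about">About</a></body></html>';

echo prefixBodyLinks($html, 'https://www.google.com'), "\n";
```

Note that a protocol-relative link such as <a href="//cdn.example.com/x"> also starts with <a href="/ and would be rewritten too, so real input may need a stricter check.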
Related
I am working with an editor that works purely with internal relative links for files which is great for 99% of what I use it for.
However, I am also using it to insert links to files within an email body and relative links don't cut the mustard.
Instead of modifying the editor, I would like to search the string from the editor and replace the relative links with external links as shown below
Replace
files/something.pdf
With
https://www.someurl.com/files/something.pdf
I have come up with the following but I am wondering if there is a better / more efficient way to do it with PHP
<?php
$string = '<a href="files/something.pdf">A link</a>, some other text, <a href="files/somethingelse.pdf">A different link</a>';
preg_match_all('/<a[^>]+href=([\'"])(?<href>.+?)\1[^>]*>/i', $string, $result);
if (!empty($result['href'])) {
// Found a link.
$baseUrl = 'https://www.someurl.com';
$newUrls = array();
$newString = '';
foreach($result['href'] as $url) {
$newUrls[] = $baseUrl . '/' . $url;
}
$newString = str_replace($result['href'], $newUrls, $string);
echo $newString;
}
?>
Many thanks
Lee
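You can simply use preg_replace to replace every double-quoted URL that starts with files:
$string = '<a href="files/something.pdf">A link</a>, some other text, <a href="files/somethingelse.pdf">A different link</a>';
$string = preg_replace('/"(files.*?)"/', '"https://www.someurl.com/$1"', $string);
The result would be:
<a href="https://www.someurl.com/files/something.pdf">A link</a>, some other text, <a href="https://www.someurl.com/files/somethingelse.pdf">A different link</a>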
You really should use DOMDocument for this kind of job, but if you want to use a regex, this one does the job:
$string = '<a some_attribute href="files/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="files/somethingelse.pdf" attr="xyz">A different link</a>';
$baseUrl = 'https://www.someurl.com';
$newString = preg_replace('/(<a[^>]+href=([\'"]))(.+?)\2/i', "$1$baseUrl/$3$2", $string);
echo $newString,"\n";
Output:
<a some_attribute href="https://www.someurl.com/files/something.pdf" class="abc">A link</a>, some other text, <a class="def" href="https://www.someurl.com/files/somethingelse.pdf" attr="xyz">A different link</a>
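For comparison, here is a rough sketch of the DOMDocument route mentioned above. The function name absolutizeLinks and the wrapper div are assumptions made for this example. It also assumes the dom extension is available and that the input is ASCII or otherwise safe for loadHTML's default encoding handling.

```php
<?php
// Hypothetical helper: make relative <a href> values absolute using DOMDocument.
function absolutizeLinks(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);          // tolerate imperfect markup
    $doc->loadHTML('<div>' . $html . '</div>'); // wrap the fragment in one element
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip empty values, URLs with a scheme, protocol-relative URLs and anchors.
        if ($href === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href)) {
            continue;
        }
        $a->setAttribute('href', rtrim($baseUrl, '/') . '/' . ltrim($href, '/'));
    }

    // Serialize only the children of the wrapper div, not the added html/body.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$string = '<a href="files/something.pdf" class="abc">A link</a>, some other text, '
        . '<a class="def" href="https://example.org/x.pdf">A different link</a>';

$newString = absolutizeLinks($string, 'https://www.someurl.com');
echo $newString, "\n";
```

Unlike the regex, this does not care about attribute order or quote style, at the cost of the parser possibly normalizing the markup it writes back.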
I'm trying to remove script tags from the source code using regular expression.
/<\s*script[^>]*[^\/]>(.*?)<\s*\/\s*script\s*>/is
But I run into a problem when the code I need to remove is nested inside other code.
Please see this screenshot
I tested it at https://regex101.com/r/R6XaUT/1
How do I correctly create a regular expression so that it can cover all the code?
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
Output: sample text with tags
Result for strip_tags_content($text):
Output: text with
Result for strip_tags_content($text, '<b>'):
Output: <b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE):
Output: text with <div>tags</div>
I hope someone finds this useful :)
source link
Simply use the PHP function strip_tags. See
http://php.net/manual/de/function.strip-tags.php
$string = "<div>hello</div>";
echo strip_tags($string);
Will output
hello
You can also provide a list of tags to keep.
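As a small illustration of the allowed-tags argument, the following sketch keeps <b> and strips everything else. The sample string is made up for this example.

```php
<?php
// Sample input invented for illustration.
$string = '<div><b>hello</b> <i>world</i></div>';

// The second argument lists the tags that strip_tags() should keep.
$kept = strip_tags($string, '<b>');

echo $kept, "\n"; // <b>hello</b> world
```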
==
Another approach is this:
// Load a file into $html
$html = file_get_contents('scratch.html');
$matches = [];
// Capture the tag name of every opening or closing tag, with or without attributes
preg_match_all('/<\/?([a-z][a-z0-9]*)\b[^>]*>/i', $html, $matches);
// Have a list of all Tags only once
$tags = array_unique($matches[1]);
// Find the script index and remove it
$scriptTagIndex = array_search("script", $tags);
if($scriptTagIndex !== false) unset($tags[$scriptTagIndex]);
// The tag list must be a string of the form <tagname1><tagname2>...
$allowedTags = array_map(function ($s) { return "<$s>"; }, $tags);
// Strip the HTML and keep all tags except the removed ones (script)
$noScript = strip_tags($html,join("", $allowedTags));
echo $noScript;
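If the goal is to drop script elements together with their content, a DOM-based sketch is an alternative to both the regex and the allowed-tags list. The function name removeScripts and the sample markup are assumptions for this example. It needs the dom extension, and saveHTML() returns a full document (doctype, html and body included).

```php
<?php
// Hypothetical helper: remove every <script> element, including its content.
function removeScripts(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Copy the live node list first; removing nodes while iterating it skips elements.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    return $doc->saveHTML();
}

$html = '<html><body><p>Hello</p>'
      . '<script type="text/javascript">if (1 < 2) { alert("x"); }</script>'
      . '<div><script>var nested = true;</script>World</div></body></html>';

$clean = removeScripts($html);
echo $clean;
```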
I want to pull out the list (ul) element from my wordpress post(s) so I can put it in a different location.
My current code strips out the images and blockquotes and outputs just the text:
<?php
$content = preg_replace('/<blockquote>(.*?)<\/blockquote>/', '', get_the_content());
$content = preg_replace('/(<img [^>]*>)/', '', $content);
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
echo $content;
?>
Just a friendly reminder: parsing HTML with regex is generally not recommended.
If you would like to do it anyway, you could try this pattern:
$pattern = '~<ul>(.*?)</ul>~s';
So in your code it would look like this:
preg_match_all('~<ul>(.*?)</ul>~s', $content, $ulElements);
And then, to remove the lists from the original string:
$content = preg_replace('~<ul>(.*?)</ul>~s', '', $content);
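Putting the two steps together, a minimal sketch might look like this. The sample $content is invented, and the pattern is loosened slightly (an assumption on my part) so that it also matches ul tags with attributes and in any letter case. Nested lists would still trip it up, since the lazy match stops at the first closing tag.

```php
<?php
// Sample content invented for illustration.
$content = '<p>Intro</p><ul class="x"><li>One</li><li>Two</li></ul><p>Outro</p>';

// Matches <ul ...> through the first </ul>; "s" lets . span newlines.
$pattern = '~<ul\b[^>]*>.*?</ul>~is';

// Step 1: collect the lists so they can be printed somewhere else.
preg_match_all($pattern, $content, $matches);
$lists = $matches[0];

// Step 2: remove them from the original content.
$contentWithoutLists = preg_replace($pattern, '', $content);

echo implode("\n", $lists), "\n";
echo $contentWithoutLists, "\n";
```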
I'm struggling with replacing text in each link.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
foreach($urls[0] as $url)
{
echo $replace = str_replace($url,'http://www.sometext'.$url, $text);
}
}
With the code above I get the same text three times, with the links changed one at a time: each iteration replaces only one link, because I use foreach, I know.
But I don't know how to replace them all at once.
Your help would be great!
You shouldn't use regexes on HTML; use DOM instead. That being said, your bug is here:
$replace = str_replace(...., $text);
^^^^^^^^                     ^^^^^
You never update $text, so every iteration of the loop starts again from the original string and discards the previous replacement. You probably want
$text = str_replace(...., $text);
instead, so the changes "propagate".
If you want the final variable to contain all replacements, change it to something like this.
Basically, you are not passing the replaced string back in as the "subject". I assume that is what you expect, since the question is a bit difficult to understand.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
$replace = $text;
foreach($urls[0] as $url) {
$replace = str_replace($url,'http://www.sometext'.$url, $replace);
}
echo $replace;
}
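As a possible alternative to the loop, preg_replace can do all replacements in a single pass, which also avoids prefixing a URL twice when one matched URL is a substring of another. This is only a sketch: the example.com URLs are placeholders, and the path part of the pattern is tightened (an assumption) so that it stops at quotes and angle brackets.

```php
<?php
// Placeholder links; the second URL contains the first one as a prefix.
$text = '<p><a href="http://example.com/a">one</a></p>'
      . '<p><a href="http://example.com/a/b">two</a></p>';

// Same idea as the original pattern, but the path stops at quotes and < >.
$regEx = '/https?:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,}(\/[^\s"\'<>]*)?/';

// $0 is the whole matched URL; every match is replaced exactly once.
$replaced = preg_replace($regEx, 'http://www.sometext$0', $text);

echo $replaced, "\n";
```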
I have some text containing HTML tags. I would like to replace all links with another one, but only local links, not those that start with http://
example :
test link
==> test link
Video
==> Video
I tried this preg_replace, but it is not working:
$exclude = '<a href=\"http://.*?';
$pattern = '<a href=\".*?';
$content=preg_replace("~(($exclude)?($pattern))~i",'<a href="/action.php?url=$4',$content);
Thanks!
What about something like this:
$content = preg_replace('#<a href="([^":]*)">#i', '<a href="/action.php?url=$1">', $content);
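A slightly more explicit variant, sketched below, uses a negative lookahead to skip any href that starts with a scheme (http:, https:, mailto: and so on) or with //. The sample links are invented, and the sketch assumes href is the first attribute and uses double quotes.

```php
<?php
// Sample links invented for illustration.
$content = '<a href="page.php">test link</a> '
         . '<a href="http://example.com/video">Video</a>';

// (?!...) rejects hrefs that begin with "scheme:" or "//"; the rest are local.
$pattern = '#<a\s+href="(?![a-z][a-z0-9+.-]*:|//)([^"]*)"#i';

$rewritten = preg_replace($pattern, '<a href="/action.php?url=$1"', $content);

echo $rewritten, "\n";
```

Because the pattern stops at the closing quote of href, any attributes that follow it are left untouched.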