How to replace all specific strings between specific strings? [duplicate] - php

This question already has answers here:
replace all "foo" between ()
(3 answers)
Closed 7 years ago.
I would like to replace every \n inside <pre></pre> with a placeholder. This is what I have so far:
<?php
$html = "<div>\n<pre id=foo>Foo\n\nBar Bar\nFoo Foo</pre>\n\n</div>";
echo preg_replace("/(<pre[^>]*>[^<]*)(\n)([^<]*<\/pre)/", "$1{NEWLINE}$3", $html);
?>
It replaces only one \n as expected. Do I need to use preg_replace_callback() and a separate function to replace the linebreaks or is it possible with one regex alone?
EDIT: Any solution available for this, too?
$html2 = "<div>\n<pre id=foo><b>Foo\n\n</b>Bar Bar\nFoo Foo</pre>\n\n</div>";

You can do this using a callback as you suggested.
$html = preg_replace_callback('~<pre[^>]*>\K.*?(?=</pre>)~si',
function($m) {
return str_replace(array("\r\n", "\n", "\r"), '{NEWLINE}', $m[0]);
}, $html);
Although, I would recommend using DOM to perform this task.
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML; @ suppresses parser warnings
$nodes = $doc->getElementsByTagName('pre');
$find = array("\r\n", "\n", "\r");
foreach ($nodes as $node) {
$node->nodeValue = str_replace($find, '{NEWLINE}', $node->nodeValue);
}
echo $doc->saveHTML();
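For the $html2 case from the EDIT, where <pre> contains a nested <b>, assigning to the nodeValue of the <pre> element would replace its children with plain text. A minimal sketch that edits only the text nodes instead, assuming the markup parses cleanly with DOMDocument (the variable names $xpath, $textNode and $result are illustrative, not from the original answer):

```php
<?php
$html2 = "<div>\n<pre id=foo><b>Foo\n\n</b>Bar Bar\nFoo Foo</pre>\n\n</div>";

$doc = new DOMDocument();
$doc->loadHTML($html2);

// Select every text node below a <pre>, however deeply it is nested.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//pre//text()') as $textNode) {
    // Only the text is rewritten; the surrounding elements such as <b> stay in place.
    $textNode->nodeValue = str_replace(array("\r\n", "\n", "\r"), '{NEWLINE}', $textNode->nodeValue);
}

$result = $doc->saveHTML();
echo $result;
```

Line breaks outside the <pre> are left alone because the XPath query never selects them.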

My question is duplicate:
https://stackoverflow.com/a/5756032/318765
This is what I need:
<?php
echo preg_replace("/(\r\n|\n\r|\n|\r)(?=[^<>]*<\/pre)/", "{NEWLINE}", $html);
?>

Related

PHP preg_replace words in array but skip links (href) [duplicate]

This question already has answers here:
php regex to match outside of html tags
(4 answers)
Closed 3 years ago.
I'm working on some code to replace words inside the WordPress content for links. For example: the word "example" needs to be replaced for a text link: example.
I've got this working with the following code:
function word_replace($text){
$site = esc_url( home_url() );
$replace = array(
'example' => 'example',
'word' => 'word',
);
$text = str_replace(array_keys($replace), $replace, $text);
return $text;
}
The only issue is that words inside a href="" attribute also get replaced and this breaks the HTML. How do I avoid words from being replaced inside a href="" attribute or inside a class="" attribute? What regex do I need to skip these attributes? A piece of example code would be a big help :-)
Try this:
$site = 'http://example.com';
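$html = '<a href="#">Link</a>'; // the anchor markup was stripped from the post; a placeholder href is assumed here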
$dom = new DomDocument;
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('a');
$node = $nodes[0];
$node->setAttribute('href', 'page.html');
echo $dom->saveHTML($node);
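If the goal is only to keep replacements out of attributes such as href="" and class="", one option is to split the content on tags and replace words in the pieces between them. This is a rough sketch, not a full HTML parser: the helper name replace_outside_tags and the link target /example are made up for illustration, it assumes no attribute value contains a literal >, and it will still replace words inside the text of existing links.

```php
<?php
// Hypothetical helper: applies the replacements to text between tags only.
function replace_outside_tags(string $html, array $replace): string
{
    // The captured delimiter keeps the tags in the result:
    // even indexes hold text, odd indexes hold the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = strtr($part, $replace);
        }
    }
    return implode('', $parts);
}

$replace = array('example' => '<a href="/example">example</a>'); // assumed link target
$html = '<p class="example">An example <a href="/example-page">here</a>.</p>';
$result = replace_outside_tags($html, $replace);
echo $result;
// <p class="example">An <a href="/example">example</a> <a href="/example-page">here</a>.</p>
```

strtr() is used rather than str_replace() so that text inserted by one replacement is not scanned again by the next one.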

replacing next occurence of tag [duplicate]

This question already has an answer here:
find and replace keywords by hyperlinks in an html fragment, via php dom
(1 answer)
Closed 8 years ago.
I've got a large string with some markup in it that I want to change so that it works with fpdf.
<span style="text-decoration: underline;">some text</span>
I need to replace the tags here with
<i>some text</i>
However, a simple str_replace() won't work because there are span tags that should not be replaced. I need something that finds <span style="text-decoration: underline;">
and then looks for the next occurrence of </span> and replaces only that. I haven't got the slightest clue how to do this. I've looked at http://us.php.net/strpos but I'm not sure how to implement it, or whether it is the right solution. Can anyone give me some pointers?
Thanks.
This should do the trick:
<?php
$in = '<span>Invalid</span><span style="text-decoration: underline;">some text</span><span>Invalid</span>';
$out = preg_replace('#<span style="[^"]*">([^<]+)<\/span>#', '<i>\1</i>', $in); // [^"]* keeps the match inside one style attribute
echo $out;
?>
You can also restrict what text you'll look for in the tag, for example, only alphanumerics and whitespaces:
<?php
$in = '<span>Invalid</span><span style="text-decoration: underline;">some text</span><span>Invalid</span>';
$out = preg_replace('#<span style="[^"]*">([\w\s]+)<\/span>#', '<i>\1</i>', $in); // inside a character class | is a literal, so it is dropped
echo $out;
?>
$dom = new domDocument;
$dom->loadHTML($html);
$spans = $dom->getElementsByTagName('span');
foreach ($spans as $node){
$text = $node->textContent;
$node->removeChild($node->firstChild);
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('<i>'.$text.'</i>');
$node->appendChild($fragment);
}
$out = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $dom->saveHTML()));
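The loop above rewrites every span. A sketch that touches only the underlined spans, assuming the style attribute contains the literal text "text-decoration: underline" (the wrapper <div> and the variable names are illustrative additions):

```php
<?php
$in = '<span>Keep</span><span style="text-decoration: underline;">some text</span><span>Keep</span>';

$dom = new DOMDocument();
// Wrap the fragment so its nodes can be read back without <html>/<body>.
$dom->loadHTML('<div>' . $in . '</div>');
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//span[contains(@style, "text-decoration: underline")]') as $span) {
    $italic = $dom->createElement('i');
    // appendChild() moves each child out of the span and into the new <i>.
    while ($span->firstChild) {
        $italic->appendChild($span->firstChild);
    }
    $span->parentNode->replaceChild($italic, $span);
}

// Serialize only the children of the wrapper <div>.
$out = '';
foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out; // <span>Keep</span><i>some text</i><span>Keep</span>
```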

grab text in the middle to a variable [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
PHP DOMDocument - get html source of BODY
I have the following code in a variable and am trying to grab everything between the body tags (while keeping the p tags etc.). What's the best way of doing this?
preg_match
strpos / substr
<head>
<title></title>
</head>
<body>
<p>Services Calls2</p>
</body>
Neither. You can use a DOM parser such as DOMDocument:
$dom = new DOMDocument();
$dom->loadHTML($var);
$body = $dom->getElementsByTagName('body')->item(0);
$content = '';
foreach($body->childNodes as $child)
$content .= $dom->saveXML($child);
Try this, $html has the text:
$s = strpos($html, '<body>') + strlen('<body>');
$f = '</body>';
echo trim(substr($html, $s, strpos($html, $f) - $s));
I recommend using preg_match, because the content between <p> and </p> can change at any time, and handling that with substr or strpos requires quite convoluted code.
Example:
$a = '<h2><p>Services Calls2</p></h2>';
preg_match("/<p>(?:\w|\s|\d)+<\/p>/", $a, $ar);
var_dump($ar);
The regex allows only letters, digits, underscores and whitespace between the tags.
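For the original goal of grabbing everything between the body tags, a capture group is one way to do it with preg_match. This is a sketch that assumes a single, well-formed <body> element; the DOM approach above is more robust against unusual markup.

```php
<?php
$var = "<head>\n<title></title>\n</head>\n<body>\n<p>Services Calls2</p>\n</body>";

// s: let . match newlines; i: match <BODY> as well; .*? stops at the first </body>.
if (preg_match('~<body[^>]*>(.*?)</body>~is', $var, $m)) {
    $content = trim($m[1]); // $m[1] is the first capture group
} else {
    $content = '';
}
echo $content; // <p>Services Calls2</p>
```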

preg_replace - How to remove contents inside a tag?

Say I have this.
$string = "<div class=\"name\">anyting</div>1234<div class=\"name\">anyting</div>abcd";
$regex = "#([<]div)(.*)([<]/div[>])#";
echo preg_replace($regex,'',$string);
The output is
abcd
But I want
1234abcd
How do I do it?
Like this:
preg_replace('/(<div[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
If you want to remove the divs too:
preg_replace('/<div[^>]*>.*?<\/div>/i', '', $string);
To replace only the content in the divs with class name and not other classes:
preg_replace('/(<div.*?class="name"[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
echo preg_replace('%<div.*?</div>%i', '', $string); // echo's 1234abcd
Live example:
http://codepad.org/1XEC33sc
Add a ? after the quantifier to make it lazy, so the match stops at the first occurrence of </div> instead of the last:
preg_replace('~<div .*?>(.*?)</div>~','', $string);
http://sandbox.phpcode.eu/g/c201b/3
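The difference between the greedy and the lazy quantifier can be seen side by side in a short sketch (the variable names $greedy and $lazy are illustrative):

```php
<?php
$string = '<div class="name">anything</div>1234<div class="name">anything</div>abcd';

// Greedy: .* runs to the LAST </div>, so "1234" is swallowed as well.
$greedy = preg_replace('~<div.*</div>~', '', $string);

// Lazy: .*? stops at the FIRST </div>, so each div is removed separately.
$lazy = preg_replace('~<div.*?</div>~', '', $string);

echo $greedy, "\n"; // abcd
echo $lazy, "\n";   // 1234abcd
```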
This might be a simple example, but if you have a more complex one, use an HTML/XML parser. For example with DOMDocument:
$doc = new DOMDocument(); // loadHTML() is an instance method; calling it statically is an error in PHP 8
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$query = "//body/text()";
$nodes = $xpath->query($query);
$text = "";
foreach($nodes as $node) {
$text .= $node->wholeText;
}
Which query you have to use or whether you have to process the DOM tree in some other way, depends on the particular content you have.

Parsing HTML and replacing strings [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I have a large quantity of partial HTML stored in a CMS database.
I'm looking for a way to go through the HTML and find any <a></a> tags that don't have a title and add a title to them based on the contents of the tags.
So if I had some text I'd like to modify the tag to look like:
<a title="some text" href="somepage"></a>
Some tags already have a title and some anchor tags have nothing between them.
So far I've managed to make some progress with php and regex.
But I can't seem to be able to get the contents of the anchors, it just displays either a 1 or a 0.
<?php
$file = "test.txt";
$handle = fopen("$file", "r");
$theData = fread($handle, filesize($file));
$line = explode("\r\n", $theData);
$regex = '/^.*<a ((?!title).)*$/'; //finds all lines that don't contain an anchor with a title
$regex2 = '/<a .*><\/a>/'; //finds all lines that have nothing between the anchors
$regex3 = '/<a.*?>(.+?)<\/a>/'; //finds the contents of the anchors
foreach ($line as $lines)
{
if (!preg_match($regex2, $lines) && preg_match($regex, $lines)){
$tags = $lines;
$contents = preg_match($regex3, $tags);
$replaced = str_replace("<a ", "<a title=\"$contents\" ", $lines);
echo $replaced ."\r\n";
}
else {
echo $lines. "\r\n";
}
}
?>
I understand regex is probably not the best way to parse HTML so any help or alternate suggestions would be greatly appreciated.
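On the "1 or 0" symptom: preg_match() returns the number of matches (1 or 0, or false on error), not the matched text. The text is written to the array passed as the third argument. A minimal sketch using the question's $regex3 on one assumed sample line:

```php
<?php
$line = '<a href="somepage">some text</a>'; // assumed sample input
$regex3 = '/<a.*?>(.+?)<\/a>/';

// $count receives 1 or 0; the captured text lands in $matches.
$count = preg_match($regex3, $line, $matches);

// $matches[1] is the first capture group, i.e. the anchor's contents.
$title = $count ? htmlspecialchars($matches[1], ENT_QUOTES) : '';
$replaced = str_replace('<a ', '<a title="' . $title . '" ', $line);
echo $replaced; // <a title="some text" href="somepage">some text</a>
```

htmlspecialchars() is applied so that quotes in the link text cannot break the title attribute.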
Use PHP's built-in DOM parsing. Much more reliable than regex. Be aware that loading HTML into the PHP DOM will normalize it.
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress parsing errors with @
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
if ($link->getAttribute('title') == '') {
$link->setAttribute('title', $link->nodeValue);
}
}
$html = $doc->saveHTML();
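Because the CMS stores partial HTML, it may help to serialize only the children of <body> so the added doctype and <html> wrapper are not written back. A sketch under these assumptions: the sample fragment is invented, empty anchors are skipped, and libxml_use_internal_errors() stands in for the @ operator.

```php
<?php
$html = '<p>See <a href="somepage">some text</a> and <a href="x" title="kept">other</a>.</p>'; // assumed sample fragment

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('a') as $link) {
    $text = trim($link->textContent);
    // Add a title only when none exists and the anchor has some text.
    if (!$link->hasAttribute('title') && $text !== '') {
        $link->setAttribute('title', $text);
    }
}

// Write back only what sits inside <body>, i.e. the original fragment.
$out = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}
echo $out;
```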
If the markup were consistent, you could use a simplistic regex. But it will fail if your anchors have classes or any other attributes. It also doesn't correctly encode the title= attribute:
$html = preg_replace('#<(a\s+href="[^"]+")>([^<>]+)</a>#ims', '<$1 title="$2">$2</a>', $html);
Therefore phpQuery/querypath is likely the more robust approach:
$html = phpQuery::newDocument($html);
foreach ($html->find("a") as $a) {
if (empty($a->attr("title"))) {
$a->attr("title", $a->text());
}
}
print $html->getDocument();
Avoid using regex to parse HTML. In PHP, use DOM.
Here's a simpler one: http://simplehtmldom.sourceforge.net/
