PHP: Echo Content between Two Points in an HTML Document

I've already found this code for dealing with content between tags:
$content_processed = preg_replace_callback(
'#<pre>(.+?)</pre>#s',
function ($matches) {
return '<pre>' . htmlentities($matches[1]) . '</pre>';
},
$content );
But how can I get it to return just one section of the HTML? The bit I'm looking for starts with:
click here</a></p><p><span class='title'>Soups<br />
and ends at
<div style='font-size:0.8em;'>
(The parts I've chosen are quite long because that way they are unique in the HTML.)

Do not parse HTML with regex; it is a bad idea. It is better to use an XML or DOM parser to turn the page into a nested object/array. That way you will be much safer.
However, if your page only contains static markup (i.e. markup that never changes), you can just explode on the first delimiter to chop the page in two, then explode the second half again on the closing delimiter.
Example:
$html = file_get_contents('path/to/page.phtml');
$text = explode('click here</a></p><p><span class=\'title\'>Soups<br />', $html);
$text = explode('<div style=\'font-size:0.8em;\'>', $text[1]);
$text = $text[0];
echo $text;
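If you would rather not rely on array indexes from explode(), the same idea can be sketched with strpos() and substr(). This is only a sketch: the helper name extract_between() and the sample page are made up for illustration, and it returns null when either marker is missing instead of raising a notice.

```php
<?php
// Hypothetical helper: returns the text between the first $start marker and
// the next $end marker after it, or null when either marker is missing.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);            // skip past the start marker itself
    $to = strpos($html, $end, $from);   // search for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up sample page standing in for the real document.
$html = "<p><a href='#'>click here</a></p><p><span class='title'>Soups<br />"
      . "Tomato, Leek</span></p><div style='font-size:0.8em;'>footer</div>";

$section = extract_between(
    $html,
    "click here</a></p><p><span class='title'>Soups<br />",
    "<div style='font-size:0.8em;'>"
);
echo $section === null ? 'markers not found' : $section;
```

The null return makes the "marker not found" case explicit, which the explode() version silently turns into an undefined index.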

Related

I need to find a string in a string then replace that and text around it

I have a string that contains markers, which I need to replace with text from a database. The string itself is stored in a database, and the markers are auto-filled with data from a different part of the database.
$text = '<span data-field="la_lname" data-table="user_properties">
{Listing Agent Last Name}
</span>
<br>RE: The new offer<br>Please find attached....';
If I can find the data marker with:
strpos($text, 'la_lname');
can I use that to select everything in and between the <span> and </span> tags,
so that the new string looks like this?
'Sommers<br>RE: The new offer<br>Please find attached....'
I thought I could explode the string on the <span> tags, but that opens up a lot of problems because I need to keep the text intact and formatted as it is. I just want to insert the data and leave everything else untouched.
To get what's between two parts of a string, for example if you have
<span>SomeText</span>
and you want to get SomeText, I suggest a function that returns whatever lies between two delimiters passed as parameters:
<?php
function getbetween($content,$start,$end) {
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $r[0];
}
return '';
}
$text = '<span>SomeText</span>';
$start = '<span>';
$end = '</span>';
$required_text = getbetween($text,$start,$end);
$full_line = $start.$required_text.$end;
$text = str_replace($full_line, 'WHAT TO REPLACE IT WITH HERE',$text);
You could try preg_replace, or use a DOM parser, which is far more useful for navigating HTML-like structures.
I should add that while regular expressions should work just fine in this example, you may need to do more complex things in the future or traverse more intricate DOM structures for your replacements, so a DOM parser is the way to go in this case.
Using PHP Simple HTML DOM Parser:
$html = str_get_html('<span data-field="la_lname" data-table="user_properties">{Listing Agent Last Name}</span><br>RE: The new offer<br>Please find attached....');
// find('span', 0) returns the first matching element instead of an array
$html->find('span', 0)->innertext = 'New value of span';
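The same kind of replacement can also be sketched with PHP's bundled DOM extension instead of a third-party parser. The helper name fill_markers() and the $values array are assumptions for illustration; 'Sommers' stands in for whatever the database returns. Unlike the snippet above, this swaps the whole span for the value, which is what the question asks for.

```php
<?php
// Hypothetical helper: replaces every <span data-field="..."> with the plain
// text value stored under that field name in $values.
function fill_markers(string $fragment, array $values): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    // Wrap the fragment so there is exactly one known root element.
    $dom->loadHTML('<div id="wrap">' . $fragment . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//span[@data-field]') as $span) {
        $field = $span->getAttribute('data-field');
        if (array_key_exists($field, $values)) {
            $text = $dom->createTextNode($values[$field]);
            $span->parentNode->replaceChild($text, $span);
        }
    }

    // Serialise only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$text = '<span data-field="la_lname" data-table="user_properties">'
      . '{Listing Agent Last Name}</span>'
      . '<br>RE: The new offer<br>Please find attached....';

// 'Sommers' stands in for the value fetched from the database.
$result = fill_markers($text, ['la_lname' => 'Sommers']);
echo $result;
```

Spans whose data-field has no entry in $values are left untouched, so the rest of the text stays exactly as it was.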

Finding and replacing attributes using preg_replace

I am trying to redo some forms that have uppercase field names containing spaces. There are hundreds of fields and 50+ forms, so I decided to write a PHP script that parses the HTML of each form.
So now I have a textarea that I paste the HTML into, and I want to change all the field names from
name="Here is a form field name"
to
name="here_is_a_form_field_name"
How could I, in one command, go through the HTML and change every name attribute so that it is lowercase with spaces replaced by underscores?
I am assuming preg_replace with an expression?
Thanks!
I would suggest not using regex to manipulate HTML. I would use DOMDocument instead, something like the following:
$dom = new DOMDocument();
$dom->loadHTMLFile('filename.html');
// loop each textarea
foreach ($dom->getElementsByTagName('textarea') as $item) {
// setup new values ie lowercase and replacing space with underscore
$newval = $item->getAttribute('name');
$newval = str_replace(' ','_',$newval);
$newval = strtolower($newval);
// change attribute
$item->setAttribute('name', $newval);
}
// output the modified document (saveHTML() only returns the markup)
echo $dom->saveHTML();
An alternative would be to use something like Simple HTML DOM Parser for the job - there are some good examples on the linked site
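The loop above only touches textarea elements. A variation that covers the usual form controls in one pass can be sketched with DOMXPath; the function name normalise_field_names() and the sample form are assumptions for illustration.

```php
<?php
// Hypothetical helper: lowercases the name attribute of common form controls
// and replaces spaces with underscores.
function normalise_field_names(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = '//input[@name] | //select[@name] | //textarea[@name]';
    foreach ($xpath->query($query) as $field) {
        $name = $field->getAttribute('name');
        $field->setAttribute('name', str_replace(' ', '_', strtolower($name)));
    }
    // Note: saveHTML() returns a full document, including doctype and <html>/<body>.
    return $dom->saveHTML();
}

// Made-up sample form.
$form = '<form><input type="text" name="Here is a form field name">'
      . '<textarea name="Your Message"></textarea></form>';

$converted = normalise_field_names($form);
echo $converted;
```

Because the output is a complete document rather than the original fragment, it is still worth diffing the result against the input before overwriting any files.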
I agree that preg_replace(), or rather preg_replace_callback(), is the right tool for the job. Here's an example of how to use it for your task:
$file_contents = preg_replace_callback('/ name="([^"]*)"/', function ($matches) {
// only the attribute value is changed, not the leading space or the quotes
return ' name="' . str_replace(' ', '_', strtolower($matches[1])) . '"';
}, $file_contents);
You should, however, check the results afterwards using a diff tool and fine-tune the pattern if necessary.
The reason why I would recommend against a DOM parser is that they usually choke on invalid HTML or files that contain for example tags for templating engines.
This is your solution:
<?php
$nameStr = "Here is a form field name";
// str_replace() already replaces every occurrence, so no loop is needed
$nameStr = str_replace(' ', '_', strtolower($nameStr));
echo $nameStr;
?>

preg_replace only OUTSIDE tags ? (... we're not talking full 'html parsing', just a bit of markdown)

What is the easiest way of applying highlighting of some text excluding text within OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
$t =
preg_replace(
"/(markdown)/",
"<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).
I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.
Actually, this seems to work ok:
<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"
//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
// this preserves the text before ($1), after ($4)
// and inside <strong>..</strong> ($2), but without the tags ($3)
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
A string like $item="odd|string" would cause some problems, because | is also the pattern delimiter here, but I won't be using that kind of string anyway... (it probably needs preg_quote($item, '|') or the like...)
You could split the string into tag/no-tag parts using preg_split:
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then you can iterate over the parts, skipping every second one (the odd indexes, i.e. the tag parts), and apply your replacement to the rest:
for ($i=0, $n=count($parts); $i<$n; $i+=2) {
$parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
At the end put everything back together with implode:
$str = implode('', $parts);
But note that this is really not the best solution. You would be better off using a proper HTML parser like PHP's DOM library. See for example these related questions:
Highlight keywords in a paragraph
Regex / DOMDocument - match and replace text not in a link
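For comparison, a DOM-based version can be sketched roughly as follows. It only ever looks at text nodes, so attribute values such as href are never touched. The helper name highlight_text() and the wrapper div are assumptions for illustration, not code from the linked questions.

```php
<?php
// Hypothetical helper: wraps every occurrence of $word found in text nodes
// in <strong>, leaving tags and attribute values alone.
function highlight_text(string $fragment, string $word): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    $dom->loadHTML('<div id="wrap">' . $fragment . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $pattern = '/(' . preg_quote($word, '/') . ')/';
    foreach ($xpath->query('//div[@id="wrap"]//text()') as $textNode) {
        $pieces = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($pieces) < 2) {
            continue;                   // the word does not occur in this node
        }
        $replacement = $dom->createDocumentFragment();
        foreach ($pieces as $i => $piece) {
            if ($i % 2) {               // odd indexes hold the captured matches
                $strong = $dom->createElement('strong');
                $strong->appendChild($dom->createTextNode($piece));
                $replacement->appendChild($strong);
            } else {
                $replacement->appendChild($dom->createTextNode($piece));
            }
        }
        $textNode->parentNode->replaceChild($replacement, $textNode);
    }

    // Serialise only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$t = 'This is plain text with some simplified markdown rules: '
   . '<a href=markdown.html>[see here]</a>';
$highlighted = highlight_text($t, 'markdown');
echo $highlighted;
```

Note that the parser normalises the markup on output, so the unquoted href comes back quoted.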
First prepend a dummy tag so that every piece of text follows a tag, then replace your string only where it comes after a tag:
$show=preg_replace("|(>[^<]*)(markdown)|i",'$1<strong>$2</strong>',"<null>$t");
Then delete the dummy tag:
$show=preg_replace("|<null>|",'',$show);
You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in the entries that are not tag contents. Afterwards you combine the array back into a string using implode().
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex,"",$test);
Actually, this is not very efficient, but it worked for me:
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
$your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
echo $your_string;

Stripping html tags using php

How can I strip HTML tags everywhere except inside the pre tag?
Code:
$content="
<div id="wrapper">
Notes
</div>
<pre>
<div id="loginfos">asdasd</div>
</pre>
";
When I use strip_tags($content, ''), the HTML inside the pre tag is stripped too, but I don't want the HTML inside pre to be stripped.
Try :
echo strip_tags($text, '<pre>');
You may do the following:
Use preg_replace with the 'e' modifier to replace the contents of pre tags with placeholder strings like ###1###, ###2###, etc., while storing the contents in an array (on PHP 7 and later the 'e' modifier no longer exists, so use preg_replace_callback instead).
Run strip_tags().
Run preg_replace with the 'e' modifier again to restore ###1###, etc. to the original contents.
A bit kludgy, but it should work.
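The three steps above can be sketched with preg_replace_callback() in place of the 'e' modifier. The function name strip_tags_keep_pre() and the ###n### token format are assumptions for illustration, and the sketch assumes the tokens do not already occur in the content.

```php
<?php
// Hypothetical helper: strips all tags except <pre> blocks, whose inner
// markup is preserved untouched.
function strip_tags_keep_pre(string $content): string
{
    $saved = [];
    // 1. Swap each <pre>...</pre> block for a numbered token and remember it.
    $masked = preg_replace_callback(
        '#<pre\b[^>]*>.*?</pre>#is',
        function (array $m) use (&$saved): string {
            $token = '###' . count($saved) . '###';
            $saved[$token] = $m[0];
            return $token;
        },
        $content
    );
    // 2. Strip every remaining tag.
    $stripped = strip_tags($masked);
    // 3. Put the saved blocks back in place of their tokens.
    return $saved ? strtr($stripped, $saved) : $stripped;
}

$content = '<div id="wrapper">Notes</div>'
         . '<pre><div id="loginfos">asdasd</div></pre>';
$clean = strip_tags_keep_pre($content);
echo $clean;
```

strtr() is used for the restore step because it treats the saved blocks as literal text, so a $ or backslash inside a pre block cannot be mistaken for a backreference.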
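<?php
$document = html_entity_decode($content);
// Each pattern uses ' as its delimiter; the backslashes in [\r\n], [\s] and \\1 are required.
$search = array ("'<script[^>]*?>.*?</script>'si","'<[/!]*?[^<>]*?>'si","'([\r\n])[\s]+'","'&(quot|#34);'i","'&(amp|#38);'i","'&(lt|#60);'i","'&(gt|#62);'i","'&(nbsp|#160);'i","'&(iexcl|#161);'i","'&(cent|#162);'i","'&(pound|#163);'i","'&(copy|#169);'i");
$replace = array ("","","\\1","\"","&","<",">"," ",chr(161),chr(162),chr(163),chr(169));
$text = preg_replace($search, $replace, $document);
// The 'e' modifier was removed in PHP 7, so numeric entities are decoded with a callback.
$text = preg_replace_callback("'&#(\d+);'", function ($m) { return chr((int) $m[1]); }, $text);
echo $text;
?>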
$text = 'YOUR CODE HERE';
$org_text = $text;
// hide content within pre tags
$text = preg_replace( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', '$1###pre###$3', $text );
// filter content
$text = strip_tags( $text, '<pre>' );
// insert back content of pre tags
if ( preg_match_all( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', $org_text, $parts ) ) {
foreach ( $parts[2] as $code ) {
$text = preg_replace( '/###pre###/', $code, $text, 1 );
}
}
print_r( $text );
OK, that leaves you with only one choice: regular expressions... Nobody likes 'em, but they sure get the job done. First, replace the problematic text with something weird, like this:
$content = preg_replace("#<pre>(.+?)</pre>#s", "||k||", $content);
This will effectively swap your
<pre> blah, blah, bllah....</pre>
for something else. Then call
$content = strip_tags($content);
After that, you can just put the original value back in place of ||k|| (or whatever you chose) and you'll get the desired result.
I think your content is not stored correctly in the $content variable.
Could you check by converting the inner double quotes to single quotes?
$content="
<div id='wrapper'>
Notes
</div>
<pre>
<div id='loginfos'>asdasd</div>
</pre>
";
strip_tags($content, '<pre>');
Could you please write the full code? I understand the idea, but something goes wrong when I try it.

Regex Replace with Backreference modified by functions

I want to replace the class with the div text like this :
This: <div class="grid-flags" >FOO</div>
Becomes: <div class="iconFoo" ></div>
So the class is changed to "icon". ucfirst(strtolower(FOO)) and the text is removed
Test HTML
<div class="grid-flags" >FOO</div>
Pattern
'/class=\"grid-flags\" \>(FOO|BAR|BAZ)/e'
Replacement
'class="icon'.ucfirst(strtolower($1).'"'
This is one example of a replacement string I've tried out of seemingly hundreds. I read that the /e modifier evaluates the PHP code but I don't understand how it works in my case because I need the double quotes around the class name so I'm lost as to which way to do this.
I tried variations on the backref eg. strtolower('$1'), strtolower('\1'), strtolower('{$1}')
I've tried single and double quotes and various escaping etc and nothing has worked yet.
I even tried preg_replace_callback() with no luck
function callback($matches){
return 'class="icon"'.ucfirst(strtolower($matches[0])).'"';
}
It was difficult for me to try to work out what you meant, but I think you want something like this:
preg_replace('/class="grid-flags" \>(FOO|BAR|BAZ)/e',
'\'class="icon\'.ucfirst(strtolower("$1")).\'">\'',
$text);
Output for your example input:
<div class="iconFoo"></div>
If this isn't what you want, could you please give us some example inputs and outputs?
And I have to agree that this would be easier with an HTML parser.
Instead of using the e(valuate) option you can use preg_replace_callback().
$text = '<div class="grid-flags" >FOO</div>';
$pattern = '/class="grid-flags" >(FOO|BAR|BAZ)/';
$myCB = function($cap) {
return 'class="icon'.ucfirst(strtolower($cap[1])).'" >';
};
echo preg_replace_callback($pattern, $myCB, $text);
But instead of using regular expressions you might want to consider a more suitable parser for html like simple_html_dom or php's DOM extension.
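A DOM-extension version of the same replacement might look like the sketch below. The helper name flags_to_icons() and the wrapper div are assumptions for illustration; the class name is built from whatever text the div contains, so it is not limited to FOO, BAR and BAZ.

```php
<?php
// Hypothetical helper: turns <div class="grid-flags">FOO</div> into
// <div class="iconFoo"></div> for every matching div in the fragment.
function flags_to_icons(string $fragment): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup
    $dom->loadHTML('<div id="wrap">' . $fragment . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[@class="grid-flags"]') as $div) {
        $flag = trim($div->textContent);
        $div->setAttribute('class', 'icon' . ucfirst(strtolower($flag)));
        // Remove the text (and any other children) from the div.
        while ($div->firstChild) {
            $div->removeChild($div->firstChild);
        }
    }

    // Serialise only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$icons = flags_to_icons('<div class="grid-flags" >FOO</div>');
echo $icons;
```

The parser normalises whitespace inside the tag, so the stray space before > in the input does not need special handling.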
This works for me:
$html = '<div class="grid-flags" >FOO</div>';
echo preg_replace_callback(
'/class *= *"grid-flags" *>(FOO|BAR|BAZ)/'
, function ($matches) { return 'class="icon' . ucfirst(strtolower($matches[1])) . '">'; }
, $html
);
Just be aware of the problems of parsing HTML with regex.
