grab text in the middle to a variable [duplicate] - php

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
PHP DOMDocument - get html source of BODY
I have the following code in a variable, and I'm trying to grab everything between the body tags (while keeping the p tags, etc.). What's the best way of doing this?
preg_match?
strpos / substr?
<head>
<title></title>
</head>
<body>
<p>Services Calls2</p>
</body>

Neither. You can use an XML/HTML parser such as DOMDocument:
$dom = new DOMDocument();
$dom->loadHTML($var); // parse the markup stored in $var
$body = $dom->getElementsByTagName('body')->item(0);
$content = '';
// append the serialised markup of every child node of <body>
foreach ($body->childNodes as $child) {
    $content .= $dom->saveXML($child);
}
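Here is a self-contained sketch of the same DOMDocument idea, wrapped in a function and run on the sample markup from the question. The function name `bodyInnerHtml` is my own and is not part of PHP. It uses `saveHTML()` with a node argument, so the output is HTML rather than XML, and it silences libxml warnings about sloppy markup:

```php
<?php
// Sketch: return the inner HTML of <body> using DOMDocument (needs ext-dom).
// bodyInnerHtml() is a hypothetical helper name, not a built-in.
function bodyInnerHtml(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // don't print parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // no <body> element found
    }

    $content = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serialises just that node
        $content .= $dom->saveHTML($child);
    }
    return trim($content); // drop the whitespace text nodes around the content
}

$var = "<head>\n<title></title>\n</head>\n<body>\n<p>Services Calls2</p>\n</body>";
echo bodyInnerHtml($var);
```

The `<p>` tags survive because whole nodes are serialised, not just their text.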

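Try this, where $html holds the text:
// position right after the opening <body> tag
$s = strpos($html, '<body>') + strlen('<body>');
$f = '</body>';
// take everything up to </body> and strip the surrounding whitespace
echo trim(substr($html, $s, strpos($html, $f) - $s));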
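The strpos/substr version breaks silently when a tag is missing, because `strpos()` returns `false` and that is treated as position 0. Below is a guarded sketch. The helper name `textBetween` is hypothetical, and it assumes the tags appear literally as `<body>` and `</body>` (no attributes, same case):

```php
<?php
// Sketch: text between the first $open and the following $close, or null.
// textBetween() is a hypothetical helper; tags must match exactly.
function textBetween(string $html, string $open, string $close): ?string
{
    $start = strpos($html, $open);
    if ($start === false) {
        return null; // opening tag not found
    }
    $start += strlen($open); // skip past the opening tag itself

    // search for the closing tag only after the opening one
    $end = strpos($html, $close, $start);
    if ($end === false) {
        return null; // closing tag not found
    }
    return trim(substr($html, $start, $end - $start));
}

$html = "<head>\n<title></title>\n</head>\n<body>\n<p>Services Calls2</p>\n</body>";
echo textBetween($html, '<body>', '</body>');
```

Returning `null` instead of a wrong slice makes the "tag not found" case easy to check for.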

I recommend using preg_match, because the contents between <p> and </p> can change all the time, and then substr or strpos is going to require rather convoluted code.
Example:
$a = '<h2><p>Services Calls2</p></h2>';
preg_match("/<p>(?:\w|\s|\d)+<\/p>/", $a, $ar);
var_dump($ar);
The regex allows only letters, digits, underscores and whitespace between the tags.
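A variant of the same regex that adds a capture group, so the text between `<p>` and `</p>` is available on its own in `$ar[1]`. Since `\w` already covers letters, digits and the underscore, `[\w\s]+` matches the same strings as `(?:\w|\s|\d)+`:

```php
<?php
// Sketch: same character class as above, plus a capture group for the inner text.
$a = '<h2><p>Services Calls2</p></h2>';
if (preg_match('/<p>([\w\s]+)<\/p>/', $a, $ar)) {
    echo $ar[0], "\n"; // the whole match, tags included
    echo $ar[1], "\n"; // only the captured text between the tags
}
```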

Related

Transforming <span style="font-weight:bold">some text</span> into <b>some text</b> in PHP [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
As the title says: if I have some HTML <p><span style="font-style:italic">abcde</span><span style="font-weight:bold">abcde</span></p>, I want to strip the style attributes and turn them into HTML tags, so that it becomes <p><i>abcde</i><b>abcde</b></p>. How can I do that in PHP?
I notice that when I open the HTML in CKEditor, this kind of transformation is done automatically, but I want to do it in the backend with PHP. Thanks.
$string = '<p><span style="font-style:italic;font-weight:bold">abcde</span><span style="font-weight:bold">abcde</span></p>';
$dom = new DOMDocument();
$dom->loadHTML($string);
$xp = new DOMXPath($dom);
$str = '';
$results = $xp->query('//span');
if ($results->length > 0) {
    foreach ($results as $result) {
        // split "font-style:italic;font-weight:bold" into single declarations
        $style = $result->getAttribute("style");
        $style_arr = explode(";", $style);
        $style_template = '%s';
        foreach ($style_arr as $style_item) {
            $style_item = trim($style_item);
            if ($style_item == 'font-style:italic') {
                $style_template = '<i>' . $style_template . '</i>';
            }
            if ($style_item == 'font-weight:bold') {
                $style_template = '<b>' . $style_template . '</b>';
            }
        }
        // wrap the span's (escaped) text in the collected tags
        $str .= sprintf($style_template, htmlspecialchars($result->nodeValue));
    }
}
$str = '<p>' . $str . '</p>';
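The loop above builds a brand-new string from the spans' text, so anything outside the spans is lost and markup nested inside a span is flattened. A sketch that edits the DOM in place instead: each styled `<span>` is replaced by `<b>`/`<i>` wrappers and its children are moved into them. Assumptions: only the `font-weight:bold` and `font-style:italic` declarations are recognised, other declarations are dropped, and `spansToTags` is a hypothetical name:

```php
<?php
// Sketch: replace <span style="..."> with <b>/<i> wrappers in place.
// Only bold/italic are handled (an assumption); spansToTags() is hypothetical.
function spansToTags(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrapper <div> plus flags: no implied <html>/<body>, no doctype.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $map = ['font-weight:bold' => 'b', 'font-style:italic' => 'i'];
    $xp = new DOMXPath($dom);
    // Copy the results to an array because the loop changes the tree.
    foreach (iterator_to_array($xp->query('//span[@style]')) as $span) {
        // "font-style: italic; FONT-WEIGHT:bold" -> ["font-style:italic", "font-weight:bold"]
        $decls = array_map(
            fn ($d) => strtolower(preg_replace('/\s+/', '', $d)),
            explode(';', $span->getAttribute('style'))
        );

        $parent = $span->parentNode;
        $outer = null; // outermost new wrapper
        $inner = null; // innermost new wrapper
        foreach ($map as $decl => $tag) {
            if (in_array($decl, $decls, true)) {
                $el = $dom->createElement($tag);
                if ($inner === null) {
                    $outer = $el;
                } else {
                    $inner->appendChild($el);
                }
                $inner = $el;
            }
        }
        if ($outer !== null) {
            $parent->insertBefore($outer, $span);
        }
        // Move the span's children into the innermost wrapper (or in front of
        // the span if no supported style was found), then drop the span.
        while ($span->firstChild !== null) {
            if ($inner !== null) {
                $inner->appendChild($span->firstChild);
            } else {
                $parent->insertBefore($span->firstChild, $span);
            }
        }
        $parent->removeChild($span);
    }

    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child); // children of the wrapper <div>
    }
    return $out;
}

echo spansToTags('<p><span style="font-style:italic">abcde</span><span style="font-weight:bold">abcde</span></p>');
```

Working on nodes rather than strings means surrounding markup and nested elements are kept as they are.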
You can also output HTML tags from PHP by echoing them between the PHP opening and closing tags, like this:
<?php
echo "<h1>Here is Heading h1</h1>";
?>
Or you can put your HTML code in double quotes after echo,
like this:
<?php
echo "Your HTML code here";
?>
$output = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $input);
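This matches a < followed by one or more characters other than >, up to a space followed by style="anything". The /i modifier makes it match an uppercase STYLE as well, and $1 puts back the part of the tag before the style attribute. Tags without a style="" attribute are not matched and stay as they are. To also handle single-quoted style='' values, use this (\2 makes the value end with the same kind of quote it started with):
(<[^>]+) style=("|').*?\2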
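A runnable sketch of the style-stripping replacement that accepts both quote styles. It uses the backreference `\2` so that a value such as `"font-family:'Arial'"` is not cut off at the inner single quote. The function name `stripStyles` is hypothetical, and this is still a regex over HTML, so unusual markup can defeat it:

```php
<?php
// Sketch: remove style="..." and style='...' attributes from tags.
// \2 refers back to the opening quote, so the value ends at the matching quote.
function stripStyles(string $input): string
{
    return preg_replace('/(<[^>]+) style=("|\').*?\2/i', '$1', $input);
}

echo stripStyles('<p><span style="font-family:\'Arial\'">abc</span><b STYLE=\'color:red\'>x</b></p>');
```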

How to extract title from php file get contents? [duplicate]

This question already has answers here:
Fastest way to retrieve a <title> in PHP
(7 answers)
Closed 3 years ago.
<?php
$content = file_get_contents('https://example.com');
// it would return HTML such as <head>..... <title>Example.com</title>
I want to extract Example.com from the title, something like:
$title = pick('<title>', '</title>', $content);
echo $title;
which would show Example.com.
You can use substr to extract part of the HTML content and stripos to find the title tags.
I add 7 (the length of "<title>") to the position to skip past the opening tag.
$html = file_get_contents('https://example.com');
$pos = stripos($html, "<title>") + 7;
echo substr($html, $pos, stripos($html, "</title>") - $pos);
Example:
https://3v4l.org/qvC40
This assumes there is only one title tag on the page; if there are more, it will get the first one.
You can use file_get_contents() instead of $string.
$string = "<title>MY TITLE</title>";
// i: also match <TITLE>; s: allow line breaks inside the title
$pattern = "/<title>(.*?)<\/title>/is";
preg_match($pattern, $string, $matches);
echo "RESULT : " . $matches[1];
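Try using PHP's SimpleXML parser to read the title node. Note that simplexml_load_string() only accepts well-formed XML, so this works for XHTML pages but fails on most ordinary HTML.
$xml = simplexml_load_string(file_get_contents('https://example.com'));
echo $xml->head->title;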
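For pages that are not well-formed XML, DOMDocument's HTML parser is more forgiving than SimpleXML. A sketch that reads the first `<title>` element. `pageTitle` is a hypothetical helper, and the sample input stands in for the `file_get_contents()` result:

```php
<?php
// Sketch: read the first <title> with DOMDocument, which tolerates
// real-world HTML (unclosed tags, <br> and so on).
// pageTitle() is a hypothetical helper name.
function pageTitle(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $title = $dom->getElementsByTagName('title')->item(0);
    return $title === null ? null : trim($title->textContent);
}

// With a live page: pageTitle(file_get_contents('https://example.com'))
echo pageTitle('<html><head><title> Example.com </title></head><body><p>Hi<br></body></html>');
```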

How to replace all specific strings between specific strings? [duplicate]

This question already has answers here:
replace all "foo" between ()
(3 answers)
Closed 7 years ago.
I'd like to replace all \n inside <pre></pre> with a placeholder. This is what I came up with:
<?php
$html = "<div>\n<pre id=foo>Foo\n\nBar Bar\nFoo Foo</pre>\n\n</div>";
echo preg_replace("/(<pre[^>]*>[^<]*)(\n)([^<]*<\/pre)/", "$1{NEWLINE}$3", $html);
?>
It replaces only one \n as expected. Do I need to use preg_replace_callback() and a separate function to replace the linebreaks or is it possible with one regex alone?
EDIT: Is there a solution for this, too?
$html2 = "<div>\n<pre id=foo><b>Foo\n\n</b>Bar Bar\nFoo Foo</pre>\n\n</div>";
You can do this using a callback, as you suggested.
$html = preg_replace_callback('~<pre[^>]*>\K.*?(?=</pre>)~si',
function($m) {
return str_replace(array("\r\n", "\n", "\r"), '{NEWLINE}', $m[0]);
}, $html);
However, I would recommend using DOM for this task.
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML, suppressing parser warnings with @
$nodes = $doc->getElementsByTagName('pre');
$find = array("\r\n", "\n", "\r");
foreach ($nodes as $node) {
    $node->nodeValue = str_replace($find, '{NEWLINE}', $node->nodeValue);
}
echo $doc->saveHTML();
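Assigning to `$node->nodeValue` replaces all of a `<pre>`'s children with one text node, so for `$html2` from the EDIT the `<b>` element would be lost. A sketch that rewrites only the text nodes below each `<pre>` via XPath, so nested elements survive. The function name `placeholderNewlines` is hypothetical:

```php
<?php
// Sketch: replace line breaks only in text nodes under <pre>, keeping nested
// elements such as <b>. placeholderNewlines() is a hypothetical helper.
function placeholderNewlines(string $html, string $placeholder = '{NEWLINE}'): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrapper <div> plus flags: no implied <html>/<body>, no doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);
    // every text node anywhere below a <pre>, at any depth
    foreach ($xp->query('//pre//text()') as $text) {
        $text->nodeValue = str_replace(["\r\n", "\n", "\r"], $placeholder, $text->nodeValue);
    }

    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child); // children of the wrapper <div>
    }
    return $out;
}

$html2 = "<div>\n<pre id=foo><b>Foo\n\n</b>Bar Bar\nFoo Foo</pre>\n\n</div>";
echo placeholderNewlines($html2);
```

Newlines outside `<pre>` are left untouched because only text nodes matched by `//pre//text()` are changed.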
My question is a duplicate:
https://stackoverflow.com/a/5756032/318765
This is what I need:
<?php
echo preg_replace("/(\r\n|\n\r|\n|\r)(?=[^<>]*<\/pre)/", "{NEWLINE}", $html);
?>

PHP parse HTML tags [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
I'm pretty new to PHP.
I have the text of a body tag of some page in a string variable.
I'd like to know if it contains some tag ... where the tag name tag1 is given, and if so, take only that tag from the string.
How can I do that simply in PHP?
Thanks!!
You would be looking at something like this:
<?php
$content = "";
$doc = new DOMDocument();
$doc->loadHTMLFile("example.html"); // load() expects XML; use the HTML loader
$items = $doc->getElementsByTagName('tag1');
if ($items->length > 0) // only if tag1 items are found
{
foreach ($items as $tag1)
{
// Do something with $tag1->nodeValue and save your modifications
$content .= $tag1->nodeValue;
}
}
else
{
$content = $doc->saveHTML();
}
echo $content;
?>
DOMDocument represents an entire HTML or XML document and serves as the root of the document tree. The parser normalizes the markup, and getElementsByTagName() returns only element nodes, so tags that appear inside comments won't be found.
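The loop above collects `$tag1->nodeValue`, which is only the text content. To take the tag itself, markup included, `saveHTML()` accepts a node argument. A sketch working on a string, as in the question. `extractTags` is a hypothetical name, and `tag1` is the question's placeholder tag:

```php
<?php
// Sketch: outer HTML of every <$tagName> element in an HTML string.
// extractTags() is hypothetical; 'tag1' is the question's placeholder name.
function extractTags(string $html, string $tagName): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // unknown tags like <tag1> trigger warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        $found[] = $doc->saveHTML($node); // the element including its own tags
    }
    return $found;
}

$body = '<div>intro <tag1 class="x">wanted <em>text</em></tag1> outro</div><!-- <tag1>not me</tag1> -->';
print_r(extractTags($body, 'tag1'));
```

The `<tag1>` inside the HTML comment is not returned, since comments are not element nodes.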
Another possibility is a regex:
$matches = null;
// $html is the string to search
$returnValue = preg_match_all('#<li.*?>(.*?)</li>#', $html, $matches);
$matches[0][x] contains the whole matches, such as <li class="small">list entry</li>, and $matches[1][x] contains only the inner HTML, such as list entry.
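A runnable sketch of that `preg_match_all()` call on a sample subject string. Two small changes are my own: `<li\b[^>]*>` keeps the pattern from also matching tags such as `<link>`, and the `s` flag lets entries span several lines:

```php
<?php
// Sketch: preg_match_all() returns the number of matches and fills $matches.
$html = '<ul><li class="small">list entry</li><li>second</li></ul>';
$count = preg_match_all('#<li\b[^>]*>(.*?)</li>#s', $html, $matches);

echo $count, "\n";         // number of <li> elements matched
echo $matches[0][0], "\n"; // first whole match, tags included
echo $matches[1][1], "\n"; // inner HTML of the second match
```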
Fast way:
Look for the index position of <tag1>, then look for the index position of </tag1>. Then cut the string between those two indexes. Look up strpos and substr on php.net.
Also, this might not work if your string is too long.
$pos1 = strpos($bigString, '<tag1>');
$pos2 = strpos($bigString, '</tag1>');
// start right after '<tag1>' and stop right before '</tag1>'
$resultingString = substr($bigString, $pos1 + strlen('<tag1>'), $pos2 - $pos1 - strlen('<tag1>'));
(This also assumes you don't have comments with tag1 inside of them, sigh.)
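The strpos()/substr() idea only finds the first pair. A sketch that collects every `<tag1>...</tag1>` by moving the `strpos()` offset forward after each match. It assumes literal, unnested tags without attributes, and `allBetween` is a hypothetical helper name:

```php
<?php
// Sketch: inner text of every $open...$close pair, in order of appearance.
// allBetween() is hypothetical; nested or attributed tags are not handled.
function allBetween(string $s, string $open, string $close): array
{
    $out = [];
    $offset = 0;
    while (($start = strpos($s, $open, $offset)) !== false) {
        $start += strlen($open); // skip the opening tag
        $end = strpos($s, $close, $start);
        if ($end === false) {
            break; // unclosed tag: stop rather than guess
        }
        $out[] = substr($s, $start, $end - $start);
        $offset = $end + strlen($close); // continue after this closing tag
    }
    return $out;
}

print_r(allBetween('a<tag1>one</tag1>b<tag1>two</tag1>c', '<tag1>', '</tag1>'));
```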
The right way:
Look up HTML parsers.

Parsing HTML and replacing strings [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I have a large quantity of partial HTML stored in a CMS database.
I'm looking for a way to go through the HTML and find any <a></a> tags that don't have a title and add a title to them based on the contents of the tags.
So if I had <a href="somepage">some text</a>, I'd like to modify the tag to look like:
<a title="some text" href="somepage">some text</a>
Some tags already have a title and some anchor tags have nothing between them.
So far I've made some progress with PHP and regex.
But I can't seem to get the contents of the anchors; it just displays either a 1 or a 0.
<?php
$file = "test.txt";
$handle = fopen("$file", "r");
$theData = fread($handle, filesize($file));
$line = explode("\r\n", $theData);
$regex = '/^.*<a ((?!title).)*$/'; //finds all lines that don't contain an anchor with a title
$regex2 = '/<a .*><\/a>/'; //finds all lines that have nothing between the anchors
$regex3 = '/<a.*?>(.+?)<\/a>/'; //finds the contents of the anchors
foreach ($line as $lines)
{
if (!preg_match($regex2, $lines) && preg_match($regex, $lines)){
$tags = $lines;
$contents = preg_match($regex3, $tags);
$replaced = str_replace("<a ", "<a title=\"$contents\" ", $lines);
echo $replaced ."\r\n";
}
else {
echo $lines. "\r\n";
}
}
?>
I understand regex is probably not the best way to parse HTML so any help or alternate suggestions would be greatly appreciated.
Use PHP's built-in DOM parsing. Much more reliable than regex. Be aware that loading HTML into the PHP DOM will normalize it.
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress parsing errors with @
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
if ($link->getAttribute('title') == '') {
$link->setAttribute('title', $link->nodeValue);
}
}
$html = $doc->saveHTML();
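Because the CMS stores partial HTML, a plain `saveHTML()` returns the fragment wrapped in a doctype, `<html>` and `<body>`. A sketch that avoids this by loading the fragment inside a wrapper `<div>` and serialising only that div's children. As the question requires, it also skips anchors that have no text. `addAnchorTitles` is a hypothetical name:

```php
<?php
// Sketch for CMS fragments: give each <a> with an empty/missing title a title
// taken from its text; anchors with no text are skipped.
// addAnchorTitles() is a hypothetical helper name.
function addAnchorTitles(string $fragment): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrapper <div> plus flags: no implied <html>/<body>, no doctype.
    $doc->loadHTML('<div>' . $fragment . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $link) {
        $text = trim($link->textContent);
        if ($link->getAttribute('title') === '' && $text !== '') {
            // setAttribute() stores the raw text; the serializer handles quoting.
            $link->setAttribute('title', $text);
        }
    }

    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child); // children of the wrapper <div>
    }
    return $out;
}

echo addAnchorTitles('<p><a href="somepage">some text</a> <a href="x"></a> <a title="keep" href="y">z</a></p>');
```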
If your markup were consistent, you could use a simplistic regex. But it will fail if your anchors have classes or any other attributes, and it doesn't correctly encode the title= attribute:
$html = preg_replace('#<(a\s+href="[^"]+")>([^<>]+)</a>#ims', '<$1 title="$2">$2</a>', $html);
Therefore phpQuery/QueryPath is likely the more robust approach:
$html = phpQuery::newDocument($html);
foreach ($html->find("a") as $a) {
    // iteration yields plain DOM nodes, so wrap each one with pq()
    if (empty(pq($a)->attr("title"))) {
        pq($a)->attr("title", pq($a)->text());
    }
}
print $html->getDocument();
Never use regex to parse HTML. In PHP, use DOM.
Here's a simpler option: http://simplehtmldom.sourceforge.net/
