Scrape an H1 element on the current page in PHP

I'm currently working with WordPress. I have a hook that runs before the <title> element is populated with the text a user enters in the dashboard.
Now I want to set the default title of each page to the text of the <h1> element on the current page. A fragment of the callback function for the hook I'm working with looks like this:
if (!$seoTitle) {
$seoTitle = '<....>';
}
return $seoTitle;
I want $seoTitle to default to the text of the <h1> element on the current page. Is this doable, and how can I achieve it?

I'm not sure how you get your HTML, but you could parse it with the built-in DOM parser (DOMDocument).
<?php
$html = "<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<h1>This is a Heading one</h1>
<p>This is a paragraph.</p>
<h1>This is a Heading two</h1>
<p>This is a paragraph.</p>
<h1>This is a Heading three</h1>
<p><a href='testwww'> This is a paragraph.</a></p>
</body>
</html>";
$dom = new DOMDocument();
$dom->loadHTML($html);
//If you want to get it from a website you could do the following:
//$dom->loadHTML(file_get_contents('http://www.w3schools.com/'));
// iterate through the html to get all h1 text
foreach($dom->getElementsByTagName('h1') as $heading) {
$h1 = $heading->nodeValue;
echo $h1 . "<br>";
}
?>
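Building on the DOMDocument approach, here is a minimal sketch of the fallback from the question. It assumes the page HTML is already available as a string in `$html`. In a typical WordPress theme the `<title>` is printed in the `<head>` before the body (and its `<h1>`) is rendered, so inside the hook you would have to take the HTML from another source, such as the post content. The helper name `firstH1Text()` and the `'Untitled'` default are hypothetical.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <h1> in $html,
// or null when there is no <h1> (or it is empty).
function firstH1Text(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept an empty string
    }
    $dom = new DOMDocument();
    // Silence libxml warnings about imperfect markup, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $dom->getElementsByTagName('h1')->item(0);
    if ($h1 === null) {
        return null;
    }
    $text = trim($h1->textContent);
    return $text === '' ? null : $text;
}

// Mirrors the question's fragment; $html is assumed to hold the page markup.
$html = '<html><body><h1> Hello World </h1><h1>Second</h1></body></html>';
$seoTitle = '';
if (!$seoTitle) {
    $seoTitle = firstH1Text($html) ?? 'Untitled'; // hypothetical default
}
echo $seoTitle; // Hello World
```

Only the first `<h1>` is used, which matches the usual convention of one main heading per page.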

Assuming you have your HTML content in a variable and run this after the page has been fully generated, take a look at the example below:
<?php
$htmlContent = '<html><body><h1>HELLO</h1></body></html>'; // change this to what you need
// Capture the text of the first <h1> (attributes allowed); (.*?) is non-greedy, so it stops at the first </h1>
$seoTitle = preg_match('/<h1[^>]*>(.*?)<\/h1>/is', $htmlContent, $m) ? trim(strip_tags($m[1])) : '';
echo $seoTitle; // will output: HELLO
?>

echo "<h1>".(string)$seoTitle."</h1>";
This should work. You can also leave PHP mode with ?>, write regular HTML, and switch back into PHP with <?php (or <?= for a short echo) when you want to output the variable.
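A small sketch of the "break out of PHP" suggestion. The title value is hypothetical, and `htmlspecialchars()` is an addition so that characters such as `&` or `<` in the title cannot break the markup. Output buffering is used here only so the rendered snippet can be inspected.

```php
<?php
$seoTitle = 'Tom & Jerry'; // hypothetical value; in the question it comes from the hook
ob_start(); // capture the HTML written below
?>
<h1><?= htmlspecialchars($seoTitle, ENT_QUOTES, 'UTF-8') ?></h1>
<?php
// Back in PHP mode: collect what the HTML section produced
$rendered = ob_get_clean();
echo $rendered; // <h1>Tom &amp; Jerry</h1>
```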

Related

How to stop adding new values to an array after one value is added to that array?

I have some HTML files that contain the same tags with different strings between them. I want to get the string from a specific tag, and once the first match is found, only that string should be added to the array. For more details, see the code below.
The html:
<!DOCTYPE html>
<html>
<head></head>
<body>
<h1>Some Text</h1>
<p>This is the first Paragraph</p>
<ul>
<li></li>
<li></l1>
</ul>
<p>This is the second Pharagraph</p>
</body>
</html>
The HTML files will contain more elements.
I want to get the text inside the first <p> only, without wasting time searching the whole HTML file when I only need one value from a specific tag.
The PHP:
//Loop through all the HTML files in a folder
$files = glob("files/*.html");
foreach($files as $file){
//Get the whole content of each HTML file
$content = file_get_contents($file);
//Search for a specific tag
preg_match_all('#<p>(.*?)</p>#', $content, $matches);
}
I only want to add the value of the first match to $matches.
I can't edit the HTML to add a class or id to the tags I want values from, because I didn't create the files and can't edit them all manually.
I don't mind using another approach, as long as it does what I want: take only the first match and then stop searching the file.
You can do this with DOMDocument.
<?php
$html = '<!DOCTYPE html>
<html>
<head></head>
<body>
<h1>Some Text</h1>
<p>This is the first Paragraph</p>
<ul>
<li></li>
<li></l1>
</ul>
<p>This is the second Pharagraph</p>
</body>
</html>';
$err = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($err);
// find all p tags, select the first, get its value
$pValue = $dom->getElementsByTagName('p')->item(0)->nodeValue;
//This is the first Paragraph
echo $pValue;
https://3v4l.org/kjFoC
If you want to add this to your code, you could do something like:
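<?php
function getFirstParagraph($src) {
    if (trim($src) === '') {
        return null; // loadHTML() does not accept an empty string
    }
    $err = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($src);
    libxml_clear_errors();
    libxml_use_internal_errors($err);
    // item(0) is null when the file has no <p>, so check before reading nodeValue
    $p = $dom->getElementsByTagName('p')->item(0);
    return $p ? $p->nodeValue : null;
}
$matches = [];
//Loop through all the HTML files in a folder
$files = glob("files/*.html");
foreach($files as $file){
    //Get the whole content of each HTML file
    $content = file_get_contents($file);
    // Store only the text of the first <p> of this file
    $matches[] = getFirstParagraph($content);
}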
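If you prefer to stay with regular expressions, note that `preg_match()` (unlike `preg_match_all()`) stops searching after the first match, which is what the question asks for. This is a sketch; the helper name `firstParagraphRegex()` is hypothetical, and a regex is more fragile than DOMDocument on messy HTML.

```php
<?php
// Hypothetical helper: inner text of the first <p> in $html, or null.
function firstParagraphRegex(string $html): ?string
{
    // (?:\s[^>]*)? allows attributes such as <p class="x"> but not <pre> or <param>;
    // (.*?) is non-greedy so it ends at the first </p>; "s" lets . cross newlines; "i" accepts <P>
    if (preg_match('#<p(?:\s[^>]*)?>(.*?)</p>#is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

$matches = [];
// glob() can return false on some systems when nothing matches
foreach (glob('files/*.html') ?: [] as $file) {
    $first = firstParagraphRegex(file_get_contents($file));
    if ($first !== null) {
        $matches[$file] = $first; // one value per file, keyed by file name
    }
}
print_r($matches);
```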

How to select the content of all divs with PHP

I want to select the contents of every DIV tag in PHP.
Imagine we have this HTML page:
<html>
<body>
<div class="one">Content1</div>
<span>blah..</span>
<div class="two">Content2</div>
</body>
</html>
Now I want the content of every DIV tag. For example, from that HTML I want Content1 in one variable, Content2 in another, and so on.
I just need to access the parts easily.
Every page has a random number of DIV tags, so I need flexible code that detects DIV tags and puts the content of each one in an array or some other variable.
How to do it ?
Use DOMDocument:
$divs = array();
$HTML = '<html>
<body>
<div class="one">Content1</div>
<span>blah..</span>
<div class="two">Content2</div>
</body>
</html>';
$doc = new DOMDocument();
$doc->loadHTML($HTML);
foreach($doc->getElementsByTagName('div') as $div) {
array_push($divs, $div->textContent);
}
var_dump($divs);
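If you need to tell the divs apart, a variation of the DOMDocument answer can key each entry by its `class` attribute. This is a sketch: `$byClass` and `$unnamed` are illustrative names, a repeated class overwrites the earlier entry, and for nested divs the outer div's `textContent` also includes the inner div's text.

```php
<?php
$html = '<html><body><div class="one">Content1</div><span>blah..</span>'
      . '<div class="two">Content2</div><div>Content3</div></body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // tolerate imperfect markup
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$byClass = []; // divs that have a class attribute, keyed by that class
$unnamed = []; // divs without a class, in document order
foreach ($doc->getElementsByTagName('div') as $div) {
    $class = $div->getAttribute('class'); // '' when the attribute is missing
    if ($class !== '') {
        $byClass[$class] = trim($div->textContent);
    } else {
        $unnamed[] = trim($div->textContent);
    }
}
echo $byClass['one'], PHP_EOL; // Content1
echo $byClass['two'], PHP_EOL; // Content2
print_r($unnamed);
```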
Try the strip_tags() function:
http://php.net/manual/en/function.strip-tags.php
You can download the PHP Simple HTML DOM Parser and access the div tags like this:
$html = file_get_html('urltopage.com');
foreach($html->find('div') as $e)
echo $e->innertext . '<br>';

PHP: php and .html file separation

I'm currently working on separating HTML and PHP code. Here's my code, which currently works for me.
code.php
<?php
$data['#text#'] = 'A';
$html = file_get_contents('test.html');
echo $html = str_replace(array_keys($data),array_values($data),$html);
?>
test.html
<html>
<head>
<title>TEST HTML</title>
</head>
<body>
<h1>#text#</h1>
</body>
</html>
OUTPUT: A
It searches for #text# and replaces it with the array value A, which works for me.
Now I'm working on code that searches for "id" attributes in an HTML file. If it finds the id in the .html file, it should put the array value between the element's opening and closing tags.
EX: <div id="test"> **array_values here** </div>
test.php
<?php
$data['id="test"'] = 'A';
$html = file_get_contents('test.html');
foreach ($data as $search => $value)
{
if (strpos($html, $search) !== false) // strpos() returns 0 for a match at the very start, so compare with false
{
echo 'FOUND';
echo $value;
}
}
?>
test.html
<html>
<head>
<title>TEST</title>
</head>
<body>
<div id="test" ></div>
</body>
</html>
My problem is that I don't know how to put the array values between the > and </ of each matching element in the .html file.
Desired OUTPUT: <div id="test" >A</div>
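$data['test'] = 'A';
$html = file_get_contents('test.html');
// A single pass is enough: the pattern matches every element with an id attribute,
// and the callback decides from $data whether to fill it
$html = preg_replace_callback(
    '#(<([a-zA-Z]+)[^>]*id=")(.*?)("[^>]*>)([^<]*?)(</\\2>)#ism',
    function ($matches) use ($data) {
        // Leave elements whose id has no entry in $data unchanged
        if (!isset($data[$matches[3]])) {
            return $matches[0];
        }
        // Keep the opening and closing tags, replace the inner text with the mapped value
        return $matches[1] . $matches[3] . $matches[4] . $data[$matches[3]] . $matches[6];
    },
    $html
);
echo $html; // <div id="test" >A</div> for the test.html above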
Warning: code is not tested and could be improved - re global keyword and what items are allowed between > and
Regular expression explanation:
(<([a-zA-Z]+) - the start of any HTML tag, up to the last letter of the tag name (captured as group 2)
[^>]* - any other content inside the tag <>
id=")(.*?)(" - the id attribute and its value
[^>]* - any other content inside the tag <>
>) - the end of the opening tag
([^<]*?) - the element's text, which must not contain < (so no nested tags)
(</\\2>) - the closing tag, which must match the tag name captured in group 2
Use view files (.phtml) to generate content dynamically. This is native to PHP (no third-party library required).
See this answer: What is phtml, and when should I use a .phtml extension rather than .php?
and this:
https://stackoverflow.com/questions/62617/whats-the-best-way-to-separate-php-code-and-html
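A minimal sketch of the view-file idea, assuming a template that mixes plain HTML with `<?= ... ?>` echo tags. The `render()` helper is hypothetical, not a built-in; it combines `extract()` with output buffering, which is one common way to do this. A temporary file stands in for something like `test.phtml` so the example is self-contained.

```php
<?php
// Hypothetical helper: run $templateFile with $vars as local variables and
// return whatever it outputs instead of sending it to the browser.
function render(string $templateFile, array $vars): string
{
    extract($vars, EXTR_SKIP); // expose e.g. $text to the template without overwriting locals
    ob_start();
    include $templateFile;     // the template writes into the output buffer
    return ob_get_clean();
}

// Stand-in for a view file such as test.phtml
$template = tempnam(sys_get_temp_dir(), 'view');
file_put_contents(
    $template,
    '<div id="test"><?= htmlspecialchars($text, ENT_QUOTES, \'UTF-8\') ?></div>'
);

$output = render($template, ['text' => 'A']);
echo $output; // <div id="test">A</div>
unlink($template);
```

Compared with `str_replace()` placeholders, the template decides where values go, so no searching for `id` attributes is needed.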

How to display title of page in body content under h1 tag

I have used the RSSEO plugin to optimize my Joomla site. However, I want the h1 tag in custom components and pages to match the page title. I tried the following:
<h1>
<script type="text/javascript">
<!--
document.write(document.title);
//-->
</script></h1>
The script above displays the title in the h1 tag, but the page source shows only the script, which is not SEO friendly.
I think I need server-side PHP code. I have tried using:
<h1><?php echo $PageTitle ?></h1>
But this doesn't display any value; it only produces empty h1 tags.
Can anyone advise how to do this effectively?
Thanks
Try this:
HTML:
<h1 id="pagetitle"></h1>
JavaScript:
document.getElementById('pagetitle').innerHTML = document.title;
If you want the script inline:
<h1 id="pagetitle"></h1>
<script>
document.getElementById('pagetitle').innerHTML = document.title;
</script>
Joomla's $document object contains the title, so you can output it directly in your component or template:
<?php
$document = JFactory::getDocument();
echo "<h1>".$document->getMetaData('title')."</h1>";
?>
This should do the trick.
Try this :
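<?php
// Build the URL of the current request
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
// Read at most the first 2 KB (fread() on a network stream may return fewer bytes than asked).
// Caution: if this code is part of the page at $url, every view requests the page again, which runs this code again.
$page = file_get_contents($url, false, null, 0, 2048);
if ($page !== false && preg_match("/<title>(.*?)<\/title>/is", $page, $result)) {
    echo "The title of $url is $result[1]";
} else {
    echo "The page doesn't have a title tag";
}
?>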
Reading only the first 2 KB keeps the request fast.
This works in Joomla! 2.5.x:
<?php
$document = JFactory::getDocument();
?>
<h1><?php echo $document->getTitle(); ?></h1>

Find whether an img has an "alt" attribute and, if not, add one from an array (server-side)

First I need to find all img elements on the site and then check whether each one has an "alt" attribute. If the image has the attribute, it is skipped; if it has none, or the alt is empty, a random string from a list or array is added to it.
Here is how to do it with JavaScript:
find if a img have alt in jquery if not then add from array
But it did not help me because, according to this:
How do search engines crawl Javascript?
search bots can't read it, so instead of JavaScript I need to use a server-side language to add keywords to the img alt.
What next? PHP? Can I do it with some simple code?
Import the HTML into a DOMDocument object and find all the images inside it.
It's fairly straightforward; see the DOMDocument class.
Here's my code for the problem:
<?php
$html = <<<HTML
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
<p>
<img src="test.png">
<img src="test.jpg" alt="Testing">
<img src="test.gif">
</p>
</body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
if (!$image->hasAttribute("alt")) {
$altAttribute = $dom->createAttribute("alt");
$altAttribute->value = "Ready Value!";
$image->appendChild($altAttribute);
}
}
echo $dom->saveHTML();
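The code above only handles a missing `alt` and always uses one fixed value. The question also asks for empty `alt` values to be filled, with a random string from an array. A sketch of that variation, assuming a non-empty list of texts; the function name `fillMissingAlts()` and the sample texts are hypothetical.

```php
<?php
// Hypothetical helper: give every <img> whose alt is missing or blank a random
// entry from $altTexts (must be non-empty); non-empty alts are left untouched.
function fillMissingAlts(string $html, array $altTexts): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        // getAttribute() returns '' for a missing attribute, so this covers both cases
        if (trim($image->getAttribute('alt')) === '') {
            $image->setAttribute('alt', $altTexts[array_rand($altTexts)]);
        }
    }
    return $dom->saveHTML();
}

$html = '<html><body><p><img src="test.png"><img src="test.jpg" alt="Testing">'
      . '<img src="test.gif" alt=""></p></body></html>';
$result = fillMissingAlts($html, ['Red shoes', 'Blue shoes', 'Green shoes']);
echo $result;
```

`setAttribute()` both creates and overwrites the attribute, so it replaces the separate `createAttribute()`/`appendChild()` steps.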
