getting last image from post - php

I'm looking to extend my DOM code so that it also targets the last image in my WordPress posts.
EDIT - the code I have only pulls the blockquote out of the content. I want to be able to use the last image in my WordPress post as the background for a specific div.
<?php
$content = get_the_content();
$content = wpautop($content);
$doc = new DOMDocument();
$doc->loadHTML(get_the_content(), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//blockquote') as $node) {
$node->parentNode->removeChild($node);
}
// My attempt
foreach ($xpath->query('//img') as $node) {
$node->parentNode->removeChild($node);
}
$content = $doc->saveHTML($doc);
?>
My attempt removes all the images altogether instead of targeting only the last one.

Try preg_match_all():
// Run the content filters first so shortcodes are expanded.
$content = apply_filters('the_content', $content);
preg_match_all('/<img([^>]+)>/', $content, $images);
$last_image = array_pop($images[0]);
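Since the question already uses DOMDocument, the same thing can be done with XPath instead of a regex. This is only a sketch: the sample markup and the variable names ($lastImage, $lastImageSrc) are made up for illustration, and in a real template $content would come from WordPress rather than a hard-coded string.

```php
<?php
// Hypothetical post content; in WordPress this would come from
// apply_filters('the_content', get_the_content()).
$content = '<p>Intro</p><p><img src="first.jpg" alt=""></p>'
         . '<blockquote>Quote</blockquote><p><img src="last.jpg" alt=""></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// (//img)[last()] selects the final <img> in document order.
$lastImage = $xpath->query('(//img)[last()]')->item(0);

// Fall back to an empty string when the post has no images.
$lastImageSrc = $lastImage instanceof DOMElement ? $lastImage->getAttribute('src') : '';

echo $lastImageSrc, PHP_EOL;
```

The src value could then be written into the div's inline style as a background-image URL, escaped with esc_url() on the WordPress side.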


adding text to a file put contents (PHP)

I'm very new to PHP. I understand that echo is how you output text, but I'm not sure how to apply it in the scenario below, where data is scraped and written to a file. Is there a way to have file_put_contents add some text to the output? The text I'm trying to add is a "%". The output of the code below is a number that changes daily, and it is in fact a percentage, so I'd like to add the sign to the end of the output every time.
Thanks so much for any assistance.
// get japanchange
function getJapanchange(){
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('http://________________//global-indices/');
$xpath = new DOMXPath($doc);
$query = "//div[@class='MT10']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$result = trim($entry->textContent);
$ret_ = explode(' ', $result);
//make sure every element in the array don't start or end with blank
foreach ($ret_ as $key=>$val){
$ret_[$key]=trim($val);
}
//delete the empty elements and the elements that are only blank "\n" "\r" "\t"
//I modify this line
$ret_ = array_values(array_filter($ret_, 'deleteBlankInArray'));
//write the selected element to the cache file
file_put_contents(globalVars::$_cache_dir . "japanchange", $ret_[56]);
}
}
If you just want to add a % to the end of the output written to the file you're already using, you could simply do:
file_put_contents(globalVars::$_cache_dir . "japanchange", $ret_[56] . '%');
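A self-contained sketch of the same idea, for trying it outside the scraper. The value and the temp-file path are stand-ins for $ret_[56] and globalVars::$_cache_dir, which aren't available here, and stripping an existing % first is an extra precaution I added, not something the original code does.

```php
<?php
// Hypothetical scraped value; stands in for $ret_[56] from the question.
$value = " 1.25\n";

// Hypothetical cache location; stands in for globalVars::$_cache_dir . "japanchange".
$cacheFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'japanchange';

// Trim whitespace, drop any % that is already there, then append exactly one.
$formatted = rtrim(trim($value), '%') . '%';

// LOCK_EX prevents a concurrent reader from seeing a half-written file.
file_put_contents($cacheFile, $formatted, LOCK_EX);

$stored = file_get_contents($cacheFile);
echo $stored, PHP_EOL;
```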

How to scrape multiple divs?

Hello, I've got a bunch of divs I'm trying to scrape the content values from, and I've managed to pull out one of the values successfully. Now I want to pull out the one after it, using the code I already have, but I've hit a brick wall and would appreciate any help.
Here is the bit of code I'm currently using.
foreach ($arr as &$value) {
$file = $DOCUMENT_ROOT. $value;
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*[contains(@class, 'covGroupBoxContent')]//div[3]//div[2]");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
$maps = $node->nodeValue;
echo $maps;
}
}
}
}
I simply want them all to have separate outputs that I can echo out.
I recommend you use Simple HTML DOM. Beyond that I need to see a sample of the HTML you are scraping.
If you are scraping a website outside your domain I'd recommend saving the source HTML to a file for review and testing. Some websites combat scraping, thus what you see in the browser is not what your scraper would see.
Also, I'd recommend setting a random user agent via ini_set(). If you need a function for this I have one.
<?php
$html = file_get_html($url);
if ($html) {
$myfile = fopen("testing.html", "w") or die("Unable to open file!");
fwrite($myfile, $html);
fclose($myfile);
}
?>
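On the "separate outputs" part: without seeing the real HTML I can only guess at the structure, but one way is to collect each matched value into an array instead of echoing inside the loop. The markup, the span selector and the $values name below are all assumptions for illustration, using the native DOMDocument rather than Simple HTML DOM.

```php
<?php
// Hypothetical markup; the real page structure was not shown in the question.
$html = '<div class="covGroupBoxContent">'
      . '<div class="row"><span>Map A</span><span> Map B </span><span></span></div>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect every non-empty value separately instead of echoing them in one run.
$values = [];
foreach ($xpath->query("//*[contains(@class, 'covGroupBoxContent')]//span") as $node) {
    $text = trim($node->textContent);
    if ($text !== '') {
        $values[] = $text;
    }
}

// Each value can now be addressed on its own, e.g. $values[0], $values[1].
print_r($values);
```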

parsing html document for anchor tag

Say I have
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
in an HTML page like this. I want to parse it with the PHP Simple HTML DOM parser and get "Download MP4 144p 19.1" as the output. I tried this code:
foreach ($displaybody->find('a') as $element) {
echo $element->innertext . '<br/>';
}
It returned only "Download MP4". How do I parse the remaining values to get "Download MP4 144p 19.1"? Please help me out.
You can't rely on the <a> tag alone, since some of the text you're trying to access isn't inside it. Target the document itself and then use ->plaintext:
$html = <<<EOT
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
EOT;
$displaybody = str_get_html($html);
echo $displaybody->plaintext;
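If Simple HTML DOM isn't available, a rough equivalent of ->plaintext for a small fragment like this is PHP's built-in strip_tags(). This is a sketch only: the anchor markup and its href are placeholders, since the real link wasn't shown, and strip_tags() is a blunt tool that works here only because the fragment is simple.

```php
<?php
// Hypothetical fragment; the <a> wrapper and its href are placeholders.
$html = '<a href="#">» Download MP4 «</a> - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />';

// Remove the tags, then collapse runs of whitespace into single spaces.
$text = trim(preg_replace('/\s+/u', ' ', strip_tags($html)));

echo $text, PHP_EOL;
```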
Here is another way of accessing each row through DOMDocument with XPath:
// load the sites html page in DOMDocument
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$html_page = file_get_contents('http://www.mohammediatechnologies.in/download/downloadtest.php?name=8KPEiGqDQHg');
$dom->loadHTML(mb_convert_encoding($html_page, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$data = array();
// target elements that have a following <a> sibling and a preceding <br> sibling (treat each run as a row)
$links = $xpath->query('//*[following-sibling::a and preceding-sibling::br]');
$temp = '';
foreach($links as $link) { // for each element of a row
$temp .= $link->textContent . ' '; // get all text contents
if($link->tagName == 'br') {
$unit = $xpath->evaluate('string(./preceding-sibling::text()[1])', $link);
$data[] = $temp . $unit; // push them inside an array
$temp = '';
}
}
echo '<pre>';
print_r($data);

Need a help on PHP domDocument

I want to append some text to all divs that share the same class.
$dom = new DOMDocument();
$dom->formatOutput = true;
@$dom->loadHTMLFile('first.html');
$xpath = new DOMXPath($dom);
$after = new DOMText('Newly appended text');
$elements = $xpath->query('//div[@class="mix"]');
foreach($elements as $element)
{
$element->appendChild($after);
//echo $dom->saveHTML();
}
$dom->saveHTMLFile('first.html');
But when I open first.html, the text is only appended to the last div of that class.
If I uncomment saveHTML() it shows the correct result. The problem only appears after saving.
You cannot append the same DOM node to multiple points in the tree, which is what you are doing here. You need to create a separate (but identical) node each time:
foreach($elements as $element)
{
$after = new DOMText('Newly appended text'); // moved this inside the loop
$element->appendChild($after);
}
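A runnable sketch of the same fix, using an inline HTML string instead of first.html so it can be tried on its own. The sample markup is made up, and I've used $dom->createTextNode() here, which is just another way of getting a fresh text node on each iteration.

```php
<?php
// Hypothetical markup standing in for first.html.
$html = '<div class="mix">One</div><div class="mix">Two</div><div class="other">Three</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//div[@class="mix"]') as $element) {
    // A new text node per div: a single node can only live in one place in the tree.
    $element->appendChild($dom->createTextNode(' Newly appended text'));
}

$result = $dom->saveHTML();

// Count how many divs received the text.
$count = substr_count($result, 'Newly appended text');
echo $count, PHP_EOL;
```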

PHP: regex search a pattern in a file and pick it up

I am really confused with regular expressions for PHP.
Anyway, I can't read a whole tutorial right now, because I have a bunch of HTML files in which I have to find links ASAP. I came up with the idea of automating it with PHP, which is the language I know.
So I think I can use this script:
$address = "file.txt";
$input = @file_get_contents($address) or die("Could not access file: $address");
$regexp = "??????????";
if(preg_match_all("/$regexp/siU", $input, $matches)) {
// $matches[2] = array of link addresses
// $matches[3] = array of link text - including HTML code
}
My problem is with $regexp
My required pattern is like this:
href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF
I want to search for and extract /content/r807215r37l86637/fulltext.pdf from lines like the one above, of which there are many in the files.
Any help?
==================
edit
The title attributes are important to me, and all of the links I want are titled
title="Download PDF"
Once again, regexes are bad for parsing HTML.
Save your sanity and use the built-in DOM libraries.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
$data = array();
foreach($x->query("//a[@title='Download PDF']") as $node)
{
$data[] = $node->getAttribute("href");
}
Edit
Updated code based on ircmaxell comment.
That's easier with phpQuery or QueryPath:
foreach (qp($html)->find("a") as $a) {
if ($a->attr("title") == "Download PDF") {
print $a->attr("href");
print $a->innerHTML();
}
}
With regexps it depends on some consistency of the source:
preg_match_all('#<a[^>]+href="([^>"]+)"[^>]+title="Download PDF"[^>]*>(.*?)</a>#sim', $input, $m);
Looking for a fixed title="..." attribute is doable, but more difficult as it depends on the position before the closing bracket.
Try something like this. If it does not work, show some examples of the links you want to parse.
<?php
$address = "file.txt";
$input = @file_get_contents($address) or die("Could not access file: $address");
$regexp = '#<a[^>]*href="([^"]*)"[^>]*title="Download PDF"#';
if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
printf("Url: %s<br/>", $match[1]);
}
}
edit: updated so it searches for "Download PDF" entries only
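To check the pattern without a file, here is a small sketch that runs it against an inline string. The anchor markup is assumed (the question only showed the href and title attributes), the second link is a made-up negative case, and I added the i flag so tag and attribute case doesn't matter.

```php
<?php
// Hypothetical input; only the href/title pair of the first link comes from the question.
$input = '<a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a>'
       . '<a href="/content/other.html" title="View HTML">HTML</a>';

// href must come before title inside the same tag for this pattern to match.
$regexp = '#<a[^>]*href="([^"]*)"[^>]*title="Download PDF"#i';

$urls = [];
if (preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $urls[] = $match[1]; // first capture group: the href value
    }
}

print_r($urls);
```

Note the limitation: a link written with title before href would be missed, which is one reason the DOM-based answers are more robust.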
The best way is to use DomXPath to do the search in one step:
$dom = new DomDocument();
$dom->loadHTML($html);
$xpath = new DomXPath($dom);
$links = array();
foreach($xpath->query('//a[contains(@title, "Download PDF")]') as $node) {
$links[] = $node->getAttribute("href");
}
Or even:
$links = array();
$query = '//a[contains(@title, "Download PDF")]/@href';
foreach($xpath->evaluate($query) as $attr) {
$links[] = $attr->value;
}
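Put together as a runnable sketch, with made-up sample markup around the href and title from the question. The XPath selects the href attribute nodes directly, so there is no getAttribute() call in the loop.

```php
<?php
// Hypothetical input; only the href/title pair of the first link comes from the question.
$html = '<p><a href="/content/r807215r37l86637/fulltext.pdf" title="Download PDF">PDF</a> '
      . '<a href="/content/other.html" title="View HTML">HTML</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$links = [];
// Each result is a DOMAttr node; ->value holds the attribute's text.
foreach ($xpath->query('//a[contains(@title, "Download PDF")]/@href') as $attr) {
    $links[] = $attr->value;
}

print_r($links);
```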
href="([^"]+)" will get you all the links of that form.
