How to find specific query in the URL and display whole link - php

Current code is like this :
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It will crawl and find tag like this :
<a href="http://news.example.com/node">
And will output all link it found on the website.
Example
http://news.example.com.my/node/321072
http://news.example.com.my/taxonomy/term/2
http://news.example.com.my/node/321060?tid=2
I want to search url that contain only ?tid= as you can see on the 3rd URL in the example.
http://news.example.com.my/node/321060?tid=2
I replace echo $element->href="*?tid but that just return error. Can someone help me with this?

You can use preg_match or you can check all urls taken if they contain ?tid
<?php
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links
foreach($html->find('a') as $element) {
$search = '?tid';
if(strpos($element->href,$search)) {
echo $element->href . '<br>';
}
}
?>

Use parse_url() to parse each url and then only select ones you want based on PHP_URL_QUERY

Related

Getting content from external div

I need to get content from external page.
For example:
Let's use this site: https://en.wikipedia.org/wiki/Main_Page I need to get
only content of "On this day..." so it means div with id="mp-otd"
How can I do that with PHP?
You can do this by suing PHP DOM parser
include_once('simple_html_dom.php');
$html = file_get_html('https://en.wikipedia.org/wiki/Main_Page');
$div_content = $html->find('div[id=mp-otd]', 0);
Need to download library from http://simplehtmldom.sourceforge.net/
for example
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find specific
foreach($html->find('div #mp-otd') as $element)
echo $element->innertext . '<br>';

Issue with php simple html DOM parser in Joomla

I try to isert a stock-chart-module from an orther site into my own website.
As i use:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#agrarfenster') as $element)
echo $element->innertext;
The Output will work. But i need this Code for the required output:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#boersenfenster_bf_4562') as $element)
echo $element->innertext;
This Code would'nt work. But why?
My guess is that there are those underscores in the "boersenfenster_bf_4562".
Can somebody help me?

parsing html page using php to find out text on which link is assiged

say i have html code like this
$html = "This is some stuff right here. OH MY GOSH";
i am trying to get values of href and also on which anchor work i mean check this out text i am able to get href value by following this code
$displaybody->find('a ') as $element;
echo $element;
well it works for me but how do i get value of check this out could you guys help me out. i did search but i am not able to find it out . thanks in advance
my actual html look like this
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
my href look like this above code return download mp4 and i want it like downloadmp4 114p (video only) 19.1 mb how do i do that
If what you are using now is the SimpleHTMLDOM, then ->innertext works fine on that anchor elements that you have found:
include 'simple_html_dom.php';
$html = "This is some stuff right here. OH MY GOSH";
$displaybody = str_get_html($html);
foreach($displaybody->find('a ') as $element) {
echo $element->innertext . '<br/>';
}
If you were referring to PHP's DOMDocument, then its not find() function you need to use, to target each anchor element, you need to use ->getElementsByTagName(), then each selected elements you need to use ->nodeValue:
$html = "This is some stuff right here. OH MY GOSH";
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue . '<br/>';
}

PHP extract specific DIV from target URL

I'm using Simple HTML DOM to try and extract a div and all of it's contents from a target URL, here is my code:
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
foreach($html->find('.accordion') as $element)
echo $element . '<br>';
?>
The problem I have is that the above code only extracts the plain text of the div. There are also images in the div that I need to extract. If I use this following code, then all images are extracted but so is everything else in the page.
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
echo $html;
?>
So my question is, how can I use the first bit of code to extract the contents + images from .accordion?
Thanks
You could always try;
$imgs = array();
foreach($html->find('.accordion',0)->find('img') as $img){
$imgs[] = $img->src;
}
print_r($imgs);
This should populate the $imgs variable with all of the image links from the .accordion div.
:)

how to crawl and download all pdf files from html link?

This is my code to crawl all pdf links but it doesn't work. How to download from those links and save to a folder on my computer?
<?php
set_time_limit(0);
include 'simple_html_dom.php';
$url = 'http://example.com';
$html = file_get_html($url) or die ('invalid url');
//extrack pdf links
foreach($html->find('a[href=[^"]*\.pdf]') as $element)
echo $element->href.'<br>';
?>
foreach($htnl->find('a[href=[^"]*\.pdf]') as element)
^---typo. should be an 'm' ^---typo. need a $ here
How does your code "not work", other than because of above typo?
Have you looked into into phpquery?
http://code.google.com/p/phpquery/
More simple solution here will be:
foreach ($html->find('a[href$=pdf]') as $element)
https://simplehtmldom.sourceforge.io/manual.htm
[attribute$=value] Matches elements that have the specified attribute
and it ends with a certain value.

Categories