I am trying to scrape http://spys.one/free-proxy-list/ but I only want the "Proxy by ip:port" column.
I checked the website and found 3 tables.
Can anyone help me out?
<?php
require "scrapper/simple_html_dom.php";
$html=file_get_html("http://spys.one/free-proxy-list/");
$html=new simple_html_dom($html);
$rows = array();
$table = $html->find('table',3);
var_dump($table);
Try the below script. It should fetch you only the required items and nothing else:
<?php
include 'simple_html_dom.php';
$url = "http://spys.one/free-proxy-list/";
$html = file_get_html($url);
foreach($html->find("table[width='65%'] tr[onmouseover]") as $file) {
$data = $file->find('td', 0)->plaintext;
echo $data . "<br/>";
}
?>
It produces output like:
176.94.2.84
178.150.141.93
124.16.84.208
196.53.99.7
31.146.161.238
I don't really know what your Simple HTML DOM library does. In any case, PHP nowadays ships with everything you need for parsing specific DOM elements. Just use PHP's own DOMXPath class to query them.
Here's a short example for getting the first column of a table.
$dom = new \DOMDocument();
// loadHTML() expects markup, not a URL; loadHTMLFile() fetches the address
$dom->loadHTMLFile('https://your.url.goes.here');
$xpath = new \DomXPath($dom);
// query the first cell of each row (class "value") in the table with class "attributes"
$elements = $xpath->query('//table[@class="attributes"]//tr/td[@class="value"][1]');
// iterate through all found td elements
foreach ($elements as $element) {
echo $element->nodeValue;
}
This is only an example. It does not solve your exact problem with http://spys.one/free-proxy-list/, but it shows how you could easily get the first column of a specific table. All that remains is to find the right query for the table you want. The site uses an old, complex table-based layout and the table you want to parse has no unique id or similar hook, so you will have to work out the query by inspecting the DOM.
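To make the idea concrete, here is a minimal sketch that runs against an inline HTML string instead of a live page. The markup, the "attributes" class and the addresses in it are made up for illustration; the real site will need a different query.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$html = '<table class="attributes">
  <tr><td>10.0.0.1</td><td>8080</td></tr>
  <tr><td>10.0.0.2</td><td>3128</td></tr>
</table>';

libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// td[1] matches the first cell of every row, i.e. the first column.
$firstColumn = [];
foreach ($xpath->query('//table[@class="attributes"]//tr/td[1]') as $cell) {
    $firstColumn[] = trim($cell->nodeValue);
}

print_r($firstColumn);
```

Changing td[1] to td[2] would pick the second column in the same way.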
There is this website
http://www.oxybet.com/france-vs-iceland/e/5209778/
What I want is to scrape not the full table but PARTS of this table.
For example, I only want to display the rows for sportingbet, stoiximan and mybet, and only the 1, x and 2 columns rather than all of them. The numbers shown in red must also be scraped as they are, with the red box, or at least marked with an asterisk in the scrape. Can this be done, or do I need to load the whole table into a database first and then query it?
What I have now is this code, borrowed from another similar question on this forum:
<?php
require('simple_html_dom.php');
$html = file_get_html('http://www.oxybet.com/france-vs-iceland/e/5209778/');
$table = $html->find('table', 0);
$rowData = array();
foreach($table->find('tr') as $row) {
// initialize array to store the cell data from each row
$flight = array();
foreach($row->find('td') as $cell) {
// push the cell's text to the array
$flight[] = $cell->plaintext;
}
$rowData[] = $flight;
}
echo '<table>';
foreach ($rowData as $row => $tr) {
echo '<tr>';
foreach ($tr as $td)
echo '<td>' . $td .'</td>';
echo '</tr>';
}
echo '</table>';
?>
This returns the full table. Mainly, I want to detect the numbers in the red box (in the 1, x and 2 columns) and display an asterisk next to them in my scrape. Secondly, I want to know whether it is possible to scrape specific columns and rows rather than everything. Do I need to use XPath?
I would be grateful if someone could point me in the right direction. I have spent hours on this, and the manual doesn't explain much: http://simplehtmldom.sourceforge.net/manual.htm
The link is dead. However, you can do this with XPath and reference the cells you want by their colour and order, among other ways.
This snippet will give you the general gist; it is taken from a project I'm working on at the moment:
function __construct($URL)
{
// make new DOM for nodes
$this->dom = new DOMDocument();
// set error level
libxml_use_internal_errors(true);
// Grab and set HTML Source
$this->HTMLSource = file_get_contents($URL);
// Load HTML into the dom
$this->dom->loadHTML($this->HTMLSource);
// Make xPath queryable
$this->xpath = new DOMXPath($this->dom);
}
function xPathQuery($query){
return $this->xpath->query($query);
}
Then simply pass a query to your DOMXPath, like //tr[1]
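As a rough illustration of picking rows and columns, here is a sketch against made-up markup, since the original page is gone. It assumes the bookmaker name sits in the first cell, the 1, x and 2 odds in cells 2 to 4, and that highlighted odds carry a class named "red"; all of these are assumptions, not the real site's structure.

```php
<?php
// Hypothetical markup; the real table layout and class names are unknown.
$html = '<table>
  <tr><td>sportingbet</td><td>1.50</td><td class="red">3.90</td><td>7.00</td></tr>
  <tr><td>otherbook</td><td>1.45</td><td>4.00</td><td>7.50</td></tr>
  <tr><td>mybet</td><td class="red">1.55</td><td>3.80</td><td>6.50</td></tr>
</table>';
$wanted = ['sportingbet', 'stoiximan', 'mybet'];

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$odds = [];
foreach ($xpath->query('//tr[td]') as $row) {
    // The second argument makes the expression relative to this row.
    $name = trim($xpath->evaluate('string(td[1])', $row));
    if (!in_array($name, $wanted, true)) {
        continue; // skip bookmakers we are not interested in
    }
    // Cells 2 to 4 are assumed to hold the 1, x and 2 odds.
    foreach ($xpath->query('td[position() >= 2 and position() <= 4]', $row) as $cell) {
        $classes = ' ' . $cell->getAttribute('class') . ' ';
        $mark = strpos($classes, ' red ') !== false ? '*' : '';
        $odds[$name][] = trim($cell->nodeValue) . $mark;
    }
}

print_r($odds);
```

No database is needed for this: the filtering happens while walking the rows, and the asterisk is appended whenever the assumed highlight class is present.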
I'm trying to scrape data from a website. I am stuck on the ratings.
They have something like this:
<div class="rating-static rating-10 margin-top-none margin-bottom-sm"></div>
<div class="rating-static rating-13 margin-top-none margin-bottom-sm"></div>
<div class="rating-static rating-46 margin-top-none margin-bottom-sm"></div>
Here rating-10 is actually one star, rating-13 is two stars, and rating-46 is five stars in my script.
The rating can range from 0 to 50.
My plan is to create a switch: if I get a class value from 1-10, I will know that is one star, from 11-20 two stars, and so on.
Any ideas or help will be appreciated.
Try this
<?php
$data = '<div class="rating-static rating-10 margin-top-none margin-bottom-sm"></div>';
$dom = new DOMDocument;
$dom->loadHTML($data);
$xpath = new DomXpath($dom);
$div = $dom->getElementsByTagName('div')->item(0);
$div_style = $div->getAttribute('class');
$final_data = explode(" ",$div_style);
echo $final_data[1];
?>
This prints rating-10, the class that carries the rating.
I had a similar project; this should be the way to do it if you want to parse the whole HTML page:
$dom = new DOMDocument();
$dom->loadHTML($html); // The HTML Source of the website
foreach ($dom->getElementsByTagName('div') as $node){
if($node->getAttribute("class") == "rating-static"){
$array = explode(" ", $node->getAttribute("class"));
$ratingArray = explode("-", $array[1]); // $array[1] is rating-10
//$ratingArray[1] would be 10
// do whatever you like with the information
}
}
You may need to change the if condition to a strpos check. I haven't tested this script, but I think getAttribute("class") returns all classes. The if statement would then be:
if(strpos($node->getAttribute("class"), "rating-static") !== false)
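Once the class string is in hand, the switch from the question can probably be replaced with a little arithmetic. This is only a sketch: starsFromClass is a made-up helper name, and it assumes the 1-10, 11-20, ... mapping described in the question.

```php
<?php
// Hypothetical helper: turns a class attribute into a star count,
// assuming 1-10 is one star, 11-20 two stars, and so on up to 50.
function starsFromClass(string $class): ?int
{
    // Look for a whole class token of the form "rating-<digits>".
    if (!preg_match('/(?:^|\s)rating-(\d+)(?:\s|$)/', $class, $m)) {
        return null; // no numeric rating class found
    }
    $score = (int) $m[1];

    // ceil() maps 1..10 to 1, 11..20 to 2, ...; a score of 0 gives 0 stars.
    return (int) ceil($score / 10);
}

$samples = [
    'rating-static rating-10 margin-top-none margin-bottom-sm',
    'rating-static rating-13 margin-top-none margin-bottom-sm',
    'rating-static rating-46 margin-top-none margin-bottom-sm',
];
foreach ($samples as $class) {
    echo starsFromClass($class), "\n";
}
```

For the three divs from the question this gives 1, 2 and 5 stars. The regular expression ignores rating-static because it has no digits after the hyphen.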
FYI, try using QueryPath for future parsing needs. It's just a wrapper around PHP's DOM parser and works really well.
There is a table on the website Goal.com that I have attached to this question. I want to know how to store the strings in the column Player Name into a variable or database somehow.
The reason for this is because I have a variable in my code called $player. This variable stores a different string every 24 hours and is printed onto my site. This is done by using a custom made function.
I want the code to re-run the function whenever '$player' equals any string in the 'Player Name' column from goal.com, so that a different string is stored in the variable and printed on my website.
TABLE : http://www.goal.com/en/scores/transfer-zone?ICID=TZ_DD1_VA
PHP Simple HTML DOM Parser can do the job for you. http://simplehtmldom.sourceforge.net/
Download simple_html_dom.php here: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download
Here is a full example.
<?php
include("simple_html_dom.php");
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.goal.com/en/scores/transfer-zone?ICID=TZ_DD1_VA");
$xpath = new DOMXPath($doc);
$player_names = $xpath->query("//td[@class='player_name_col']");
foreach ($player_names as $player_name) {
echo $player_name->nodeValue . "<br />";
}
?>
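For the comparison part of the question, one option is to collect the names into an array and test it with in_array. The sketch below uses an inline table with made-up names so it runs without the live page; the player_name_col class comes from the answer above, everything else is an assumption.

```php
<?php
// Hypothetical markup standing in for the goal.com table.
$html = '<table>
  <tr><td class="player_name_col"> Player A </td><td>Club X</td></tr>
  <tr><td class="player_name_col">Player B</td><td>Club Y</td></tr>
</table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Store every player name in a plain array, trimmed of stray whitespace.
$names = [];
foreach ($xpath->query("//td[@class='player_name_col']") as $cell) {
    $names[] = trim($cell->nodeValue);
}

$player = 'Player B'; // stands in for the value your own function produced

// True when $player already appears in the scraped column.
$needsNewPlayer = in_array($player, $names, true);
var_dump($needsNewPlayer);
```

When $needsNewPlayer is true, you would call your own function again and repeat the check until it returns a name that is not in the list.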
I tried all sorts of things but couldn't find a solution.
I want to retrieve elements from HTML code using XPath in PHP.
Ex:
<div class='student'>
<div class='name'>Michael</div>
<div class='age'>26</div>
</div>
<div class='student'>
<div class='name'>Joseph</div>
<div class='age'>27</div>
</div>
I want to retrieve the information and put them in an array as follows:
$student[0]['name'] = 'Michael';
$student[0]['age'] = 26;
$student[1]['name'] = 'Joseph';
$student[1]['age'] = 27;
In other words, I want the matching ages to stay with the names.
I tried the following:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpathDom = new DomXPath($dom);
$homepostcontentNodes = $xpathDom->query("//*[contains(@class, 'student')]//*[contains(@class, 'name')]");
However, this only grabs the 'name' nodes.
How can I get the matching age nodes?
Of course it is only grabbing the name nodes - you are telling it to!
What you will need to do is in two steps:
Pick out all the student nodes
For each student node, pick out the columns
This is a pretty standard step in linearization of data, and the XPath queries are simple:
Step 1
You pretty much have it:
$studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
This will return all your student nodes.
Step 2
This is where the magic happens. We have our nodes, we can loop through them (DOMNodeList implements Iterator, so we can foreach-loop through them). What we need to figure out is how to find its children...
...Oh wait. DOMNode implements a method called getNodePath which returns the full, direct XPath path to the node. This allows us to simply append /div to get all the div elements that are direct children of the node!
Another quick foreach, and we get this code:
$studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
$result = array();
foreach ($studentNodes as $v) {
// Child nodes: student
$r = array();
$columns = $xpathDom->query($v->getNodePath()."/div");
foreach ($columns as $v2) {
// Attributes allows me to get the 'class' property of the node. Bit clunky, but there's no alternative
$r[$v2->attributes->getNamedItem("class")->textContent] = $v2->textContent;
}
$result[] = $r;
}
var_dump($result);
Full fiddle: http://codepad.viper-7.com/t868Wh
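A possible variation on the same two steps: DOMXPath::query accepts a context node as its second argument, so the inner query can be written relative to each student node without building a path string, and DOMElement::getAttribute reads the class directly. This is a sketch of that alternative using the sample markup from the question, not a correction of the code above.

```php
<?php
$html = "<div class='student'><div class='name'>Michael</div><div class='age'>26</div></div>"
      . "<div class='student'><div class='name'>Joseph</div><div class='age'>27</div></div>";

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpathDom = new DOMXPath($dom);

$students = [];
foreach ($xpathDom->query("//div[contains(@class, 'student')]") as $studentNode) {
    $row = [];
    // 'div' is evaluated relative to $studentNode, so only its direct children match.
    foreach ($xpathDom->query('div', $studentNode) as $field) {
        // Use the child's class name ('name', 'age') as the array key.
        $row[$field->getAttribute('class')] = trim($field->textContent);
    }
    $students[] = $row;
}

var_dump($students);
```

Note that the values come back as strings, so the ages are '26' and '27' unless you cast them.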
I've recently been playing with DOMXPath in PHP and have had success with it. To get more experience, I've been grabbing certain elements from different sites. I am having trouble getting the weather marker from http://www.theweathernetwork.com/weather/cape0005.
Specifically I want
//*[@id='theTemperature']
Here is what I have
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $tag){
echo $tag->nodeValue;
}
Is there something I am doing wrong here? I am able to produce actual results on other tags on the page but specifically not this one.
Thanks in advance.
You might want to improve your DOMDocument debugging skills. Here are some hints (Demo):
<?php
header('Content-Type: text/plain;');
$url = file_get_contents('http://www.theweathernetwork.com/weather/cape0005');
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//*[@id='theTemperature']");
foreach ($tags as $i => $tag){
echo $i, ': ', var_dump($tag->nodeValue), ' HTML: ', $dom->saveHTML($tag), "\n";
}
Output the number of the found node, I do it here with $i in the foreach.
var_dump the ->nodeValue, it helps to show what exactly it is.
Output the HTML by making use of the saveHTML function which shows a better picture.
The actual output:
0: string(0) ""
HTML: <p id="theTemperature"></p>
You can easily spot that the element is empty, so the temperature must go in from somewhere else, e.g. via javascript. Check the Network tools of your browser.
What happens is straightforward: the page contains an empty id="theTemperature" element, a placeholder that is populated by JavaScript. file_get_contents() just downloads the page without executing JavaScript, so the element remains empty. Try loading the page in a browser with JavaScript disabled to see it for yourself.
The element you're trying to select is indeed empty. The page loads the temperature into that id through ajax. Specifically this script:
http://www.theweathernetwork.com/common/js/master/citypage_ajax.js?cb=201301231338
But when you do a file_get_contents, those scripts obviously don't get executed. I'd go with guido's solution of using the RSS.