Scrape Text With PHP & Display On Website - php

I am a complete beginner with PHP. I understand the concepts but am struggling to find a tutorial I understand. My goal is this:
Use the xpath addons for Firefox to select which piece of text I would like to scrape from a site
Format the scraped text properly
Display the text on a website
Example)
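<?php
// Get the HTML source code
$url = 'http://steamcommunity.com/profiles/76561197967713768';
$source = file_get_contents($url);
// Create the DOM document (collect parser warnings instead of printing them)
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($source);
// Create the XPath helper
$xpath = new DOMXPath($doc);
// Select the node that should hold the username
$username = $xpath->query('//html/body/div[3]/div[1]/div/div/div/div[3]/div[1]');
// query() returns a DOMNodeList, so print the text of the first match
echo $username->length > 0 ? $username->item(0)->textContent : '';
?>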
In this example, I would like to scrape the username (which at the time of writing is mopar410).
Thank you for your help - I am so lost :( Right now I managed to use xpath with importXML in Google doc spreadsheets and that works, but I would like to be able to do this on my own site with PHP to learn how.
This is code I found online and edited the URL and the variable - as I am not aware of how to write this myself.

They have a public API.
Simply use http://steamcommunity.com/profiles/STEAM_ID/?xml=1
<?php
$profile = simplexml_load_file('http://steamcommunity.com/profiles/76561197967713768/?xml=1', 'SimpleXMLElement', LIBXML_NOCDATA);
echo (string)$profile->steamID;
Outputs: mopar410 (at time of writing)
This also provides other information such as mostPlayedGame, hoursPlayed, etc (look for the xml node names).
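A minimal sketch of reading more than one field from that XML, assuming the response looks roughly like the sample below. The sample string is a hypothetical stand-in so the snippet runs without a network call, and the steamID64 node name is an assumption; check the real response for the actual node names.

```php
<?php
// Hypothetical stand-in for the XML returned by .../?xml=1
$sampleXml = <<<XML
<profile>
  <steamID64>76561197967713768</steamID64>
  <steamID><![CDATA[mopar410]]></steamID>
</profile>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes
$profile = simplexml_load_string($sampleXml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($profile === false) {
    exit('Could not parse the profile XML');
}

// Cast each node to string to get its text content
$username = (string) $profile->steamID;
$steamId64 = (string) $profile->steamID64;

echo $username, PHP_EOL;
```

To run this against the live profile, swap simplexml_load_string($sampleXml, ...) for the simplexml_load_file(...) call shown above.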

Related

Sending url parameters through file_get_contents returns nothing

I am trying to scrape a website to get the latitude and longitude of counties in the US (there are 3306 of them, which is why I want to do it in code and not manually).
I am using the code below
function GetLatitude($countyName,$stateShortName){
//Create DOM from url
$page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?$countyName,$stateShortName");
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById("display_lat");
var_dump($doc);
}
GetLatitude("Guilford County","NC");
This returns nothing. If I change the URL to omit the parameters, e.g. "https://www.mapdevelopers.com/geocode_tool.php", then $doc does contain some information, but that is not useful because the value I need (latitude) depends on the parameters passed in the URL.
How do I solve this issue?
EDIT:
Based on the suggestion to encode the parameters, I changed my code to the following. The document now contains information, but it appears to ignore the parameters:
<?php
function GetLatitude($countyName,$stateShortName){
$countyName = urlencode($countyName);
$stateShortName = urlencode($stateShortName);
//Create DOM from url
$page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?address=$countyName,$stateShortName");
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById("display_lat");
var_dump($doc);
}
GetLatitude("Clarke County","AL");
?>
Your issue is that the latitude and related values are not present in the HTML when the page loads; JavaScript fills them in afterwards.
You're going to have a hard time scraping a page that depends on JS from PHP without something in the middle. Consider retrying this project with a headless browser tool such as Puppeteer or PhantomJS, so you can run your script against a real browser.
Searching the page shows an AJAX request to https://www.mapdevelopers.com/data.php
Sending a POST or GET request to that URL, with the same parameters the page sends, should give you the response you are looking for.
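A rough sketch of building such a GET request from PHP. The parameter name address is borrowed from the question's edit and is only an assumption for data.php; the real endpoint may expect different parameters, a POST body, or extra headers, and a JSON response is also assumed. Check the request in the browser's network tab before relying on any of this.

```php
<?php
// Build the request URL; http_build_query() URL-encodes the values for us.
// The "address" parameter name is an assumption, not confirmed for data.php.
function buildLookupUrl(string $endpoint, string $countyName, string $stateShortName): string
{
    $query = http_build_query(['address' => "$countyName,$stateShortName"]);
    return $endpoint . '?' . $query;
}

// Fetch a URL and decode the body as JSON (assumed response format).
// Returns null when the request fails or the body is not a JSON object/array.
function fetchJson(string $url): ?array
{
    $body = @file_get_contents($url);
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}

$url = buildLookupUrl('https://www.mapdevelopers.com/data.php', 'Guilford County', 'NC');
echo $url, PHP_EOL;
```

Calling fetchJson($url) would then perform the request; it is left out here so the snippet runs without network access.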

Taking a div from other website with PHP DOM

Yesterday I tried to pull a div from another website into my own site.
I want PHP to read the text inside that div and compare it with a string that I provide.
Here is my code:
//Blah, blah, blah
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://www.habbo.'.$hotel.'/home/'.$habbo));
$xpath = new DomXpath($dom);
$motto = $xpath->query('//*[@class="profile-motto"]')->item(0)->textContent;
echo $motto;
if($code !== $motto){
$num_habbo = 2;
}
//Blah, blah, blah
An example of a page:
http://www.habbo.es/home/iEnriqueSP
The string I want to take is in "Mi perfil", between "Añadir amigo" and the avatar of the user.
When I try to show the string with echo $motto, PHP shows nothing.
I don't know if cURL is necessary with PHP DOM, but my hosting's PHP Info shows cURL as enabled.
Thanks for your attention
There's a nice library for tasks like this.
PHP Simple HTML DOM Parser
It lets you parse HTML files and fetch both inner and outer text. The documentation should also be quite simple.
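For comparison, here is a minimal sketch using only the built-in DOM extension. It assumes the motto sits in an element whose class list contains profile-motto; the inline HTML is a hypothetical stand-in for the fetched page. XPath attribute tests use @class, and the contains(concat(...)) form also matches when the element carries more than one class.

```php
<?php
// Hypothetical stand-in for the HTML returned by file_get_contents()
$html = '<html><body><div class="profile-motto box">Hola mundo</div></body></html>';

// Collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match any element whose space-separated class list includes "profile-motto"
$nodes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " profile-motto ")]');

// Guard against "no match" before reading textContent
$motto = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $motto ?? '(motto not found)', PHP_EOL;
```

If $motto is still null on the real page, the text is probably inserted by JavaScript or sits under a different class name.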

DOMDocument : access the next following tag in PHP

I have installed a JSON plugin and retrieved the content of an HTML page. Now I want to parse it and find a particular table, which has only a class and no id. I parse it using the PHP class DOMDocument. My idea is to locate the tag just before the table and then somehow access the next tag (my table) using DOMDocument.
Example:
<a name="Telefonliste" id="Telefonliste"></a>
<table class="wikitable">
So I first get the <a>, and after that I get the <table>.
So far I fetch all the tables with getElementsByTagName() and then access item(2), which is where my table is:
$dom = new DOMDocument();
//load html source
$html = $dom->loadHTML($myHtml);
//discard white space
$dom->preserveWhiteSpace = false;
//the table by its tag name
$table = $dom->getElementsByTagName('table');
$rows = $table->item(2)->getElementsByTagName('tr');
This works, but I want to make it more general. Right now I know the table is at item(2), but that position can change, e.g. if a new table is added to the HTML page before mine; my table would then be at item(3). So I want to parse the page in a way that still reaches this table without changing my code. Can I do this using DOMDocument as a DOM parser?
You can use DOMXPath, and make the expression as general as you need it.
For example:
$dom = new DOMDocument();
//discard white space
$dom->preserveWhiteSpace = false;
//load html source
$dom->loadHTML($myHtml);
$domxpath = new DOMXPath($dom);
$table = $domxpath->query('//table[@class="wikitable" and not(@id)]')->item(0);
$elementBeforeTable = $table->previousSibling;
$rows = $table->getElementsByTagName('tr');
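The idea from the question (find the anchor, then take the table that follows it) can also be written directly in XPath with the following-sibling axis. This is a sketch under the assumption that the table is a sibling of the <a id="Telefonliste"> element; the sample HTML, including the names and numbers in it, is made up for illustration.

```php
<?php
// Hypothetical page: an unrelated table first, then the anchor and the wanted table
$myHtml = <<<HTML
<html><body>
<table class="wikitable"><tr><td>other</td></tr></table>
<a name="Telefonliste" id="Telefonliste"></a>
<table class="wikitable">
<tr><td>Alice</td><td>123</td></tr>
<tr><td>Bob</td><td>456</td></tr>
</table>
</body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($myHtml);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// First <table> that follows the anchor among its siblings (XPath positions start at 1)
$table = $xpath->query('//a[@id="Telefonliste"]/following-sibling::table[1]')->item(0);

// Collect the cell text of every row into a plain array
$data = [];
if ($table !== null) {
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $data[] = $cells;
    }
}

print_r($data);
```

Because the lookup is anchored on the id, adding more tables earlier in the page does not change which table is selected.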
I've started writing a simple extension of this for the purpose of web scraping. I'm not 100% on the direction I want to take with it yet, but you can see an example of how to get the original HTML back in the response of the search rather than just raw text.
https://github.com/WolfeDev/PageScraper
EDIT: I plan on implementing basic table parsing soon.

how to get page contents

I'm trying to build a "recent news" feature for my site. For this I've made a web crawler, and so far I have been able to collect the links from a page by doing the following:
$dom = new DOMDocument;
@$dom->loadHTML(file_get_contents($url));
$dom->preserveWhiteSpace = false;
$linksToStore = $dom->getElementsByTagName('a');
foreach($linksToStore as $tag){
$links[$tag->getAttribute('href')]= $tag->childNodes->item(0)->nodeValue;
}
How can I get the contents of the pages those links point to, limited to a particular domain, which in my case is 'Medical'?
Use the http://simplehtmldom.sourceforge.net/ library to extract contents from the page. Its selectors work the same way as jQuery's, which makes it familiar and efficient for extracting content.
Also, check this http://davidwalsh.name/php-notifications to know more
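As a rough sketch of the filtering step with the built-in DOM extension: collect the links, then keep only those whose link text mentions a keyword. Treating 'Medical' as a keyword in the link text is an assumption about what the question means, and the function names and sample HTML are hypothetical.

```php
<?php
// Collect href => link text pairs from an HTML string
function collectLinks(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = $tag->getAttribute('href');
        if ($href !== '') {
            $links[$href] = trim($tag->textContent);
        }
    }
    return $links;
}

// Keep only the links whose text contains the keyword (case-insensitive)
function filterByKeyword(array $links, string $keyword): array
{
    return array_filter($links, fn(string $text): bool => stripos($text, $keyword) !== false);
}

// Hypothetical sample page
$sample = '<html><body><a href="/a">Medical news today</a>'
    . '<a href="/b">Sports</a><a href="/c">New medical device</a></body></html>';

$medical = filterByKeyword(collectLinks($sample), 'medical');
print_r($medical);
```

Each remaining href can then be fetched with file_get_contents() and parsed the same way; relative links need to be resolved against the page URL first.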

Read XML file and write to a php array/file

I need to update the country list of my website and I want to automate the process. Country list can be found here
http://www.iso.org/iso/country_codes...code_lists.htm // Edit : Can't find the good link...
I tried it this way –
http://www.w3schools.com/php/php_xml_parser_expat.asp (PHP XML Expat Parser)
However, this didn't seem to work well as I was confused where to actually 'get' the data and print it to my own array for later use.
Now I want to try it using XML DOM.
Just want to check with everyone, if I had a simple XML file to read, that contained a country code and country name as follows:
<Entry>
<Country_name>AFGHANISTAN</Country_name>
<Code_element>AF</Code_element>
</Entry>
I want to read this file (DOM method), and then feed the data into a separate file/array of mine that will be accessed by my website. What PHP XML functions would you use or recommend for this simple task?
Any help in this regard is appreciated.
Use SimpleXML
how about
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Select every <Entry> element, wherever it sits in the document
$res = $xpath->query("//Entry");
$allres = array();
foreach ($res as $node) {
    $result = array();
    $result['country'] = $node->getElementsByTagName("Country_name")->item(0)->nodeValue;
    $result['code'] = $node->getElementsByTagName("Code_element")->item(0)->nodeValue;
    // Append this entry (not the whole node list) to the results
    $allres[] = $result;
}
In the end, the $allres array will contain all your country codes and names.
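The SimpleXML suggestion above could look roughly like this. It is a sketch that assumes the <Entry> elements sit directly under a single root element; the root name Entries and the second entry are made up for the example.

```php
<?php
// Hypothetical sample file content; only the <Entry> structure comes from the question
$xml = <<<XML
<Entries>
  <Entry><Country_name>AFGHANISTAN</Country_name><Code_element>AF</Code_element></Entry>
  <Entry><Country_name>ALBANIA</Country_name><Code_element>AL</Code_element></Entry>
</Entries>
XML;

$entries = simplexml_load_string($xml);
if ($entries === false) {
    exit('Could not parse the country list');
}

// Build a code => name map
$countries = [];
foreach ($entries->Entry as $entry) {
    $countries[(string) $entry->Code_element] = (string) $entry->Country_name;
}

// Render the map as PHP source that could be saved to a file and loaded with include
$phpSource = "<?php\nreturn " . var_export($countries, true) . ";\n";
echo $phpSource;
```

Writing $phpSource to a file with file_put_contents() gives the site a ready-made array without re-parsing the XML on every request.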
