I'm trying to build a "recent news" feature for my site. For this I've written a web crawler, and so far I have been able to collect the links from a page by doing the following:
$dom = new domDocument;
@$dom->loadHTML(file_get_contents($url));
$dom->preserveWhiteSpace = false;
$linksToStore = $dom->getElementsByTagName('a');
foreach($linksToStore as $tag){
$links[$tag->getAttribute('href')]= $tag->childNodes->item(0)->nodeValue;
}
How can I get the contents of the pages those links point to, limited to a particular domain (in my case, 'Medical')?
Use the http://simplehtmldom.sourceforge.net/ library to extract content from the page. Its selectors work the same way as jQuery's, which makes it familiar and efficient for extracting content.
Also, see http://davidwalsh.name/php-notifications for more.
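One way to narrow the collected links to a topic is to filter them by keyword before fetching the target pages. The following is a minimal sketch using PHP's built-in DOM extension rather than the library above; the function names, the keyword list, and the sample HTML are assumptions made for illustration.

```php
<?php
// Sketch only: collectLinks() and filterByKeywords() are hypothetical helper names.

// Return an array of href => link text for every <a> in the HTML.
function collectLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $tag) {
        $href = $tag->getAttribute('href');
        if ($href !== '') {
            $links[$href] = trim($tag->textContent);
        }
    }
    return $links;
}

// Keep only links whose text or URL contains one of the keywords (case-insensitive).
function filterByKeywords(array $links, array $keywords): array
{
    return array_filter($links, function (string $text, string $href) use ($keywords): bool {
        foreach ($keywords as $keyword) {
            if (stripos($text, $keyword) !== false || stripos($href, $keyword) !== false) {
                return true;
            }
        }
        return false;
    }, ARRAY_FILTER_USE_BOTH);
}

// Illustrative input; in the crawler this would come from file_get_contents($url).
$html = '<p><a href="/news/medical-trial">New medical trial</a> <a href="/sports">Sports</a></p>';
$medical = filterByKeywords(collectLinks($html), ['medical', 'health']);
print_r($medical);
```

Each remaining URL can then be fetched and parsed the same way. Keyword matching is a crude relevance test, so the keyword list will probably need tuning.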
I am trying to scrape a website to get the latitude and longitude of counties in the US (there are 3306, which is why I am trying to do it in code rather than manually).
I am using the code below:
function GetLatitude($countyName,$stateShortName){
//Create DOM from url
$page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?$countyName,$stateShortName");
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById("display_lat");
var_dump($doc);
}
GetLatitude("Guilford County","NC");
This returns nothing, but if I change the URL to omit the parameters, like "https://www.mapdevelopers.com/geocode_tool.php", then I can see that $doc has some information in it. That is not useful, though, because the value I need (latitude) depends on the parameters passed in the URL.
How do I solve this issue?
EDIT:
Based on the suggestion to encode the parameters, I changed my code to this. Now the document contains information, but it appears to be ignoring the parameters:
<?
function GetLatitude($countyName,$stateShortName){
$countyName = urlencode($countyName);
$stateShortName = urlencode($stateShortName);
//Create DOM from url
$page = file_get_contents("https://www.mapdevelopers.com/geocode_tool.php?address=$countyName,$stateShortName");
$doc = new DOMDocument();
$doc->loadHTML($page);
$node = $doc->getElementById("display_lat");
var_dump($doc);
}
GetLatitude("Clarke County","AL");
?>
Your issue is that the latitude information isn't present when the page loads; JavaScript puts it there afterwards.
You're going to have a hard time running a web page with JS and scraping it from PHP without something in the middle. Maybe retry this project with something like Puppeteer or PhantomJS so you can run your script against a real browser.
Searching the page, there is an AJAX request to https://www.mapdevelopers.com/data.php
Sending a POST or GET request to that URL will give you the response you are looking for.
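A rough sketch of that approach follows. The parameter name `address` and the `lat`/`lng` keys in the response are assumptions, not something confirmed for this endpoint, so check the real request and response in the browser's network tab first. The sample JSON uses made-up values.

```php
<?php
// Sketch only: parameter name and response shape are hypothetical.

// Build a safely encoded query string (spaces and commas are escaped).
function buildQuery(string $county, string $state): string
{
    return http_build_query(['address' => "$county, $state"]);
}

// Decode a JSON response and return [lat, lng], or null if the shape is unexpected.
function parseLatLng(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['lat'], $data['lng'])) {
        return null;
    }
    return [(float) $data['lat'], (float) $data['lng']];
}

$query = buildQuery('Guilford County', 'NC');
echo $query, "\n";   // address=Guilford+County%2C+NC

// The real call would look something like this (not executed here):
// $response = file_get_contents('https://www.mapdevelopers.com/data.php?' . $query);

// Made-up response, used only to show the parsing step.
$sample = '{"lat": 1.5, "lng": -2.5}';
var_dump(parseLatLng($sample));
```

Using http_build_query also avoids the unencoded space in the original URL, which is one likely reason the first attempt returned nothing.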
I'm using Laravel to create a simple social network. Users can type @ in the post area to get a list of their friends to mention. Every mention is a link like this (using zurb/tribute from GitHub):
<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>
Normal links other than mentions won't have type="mention".
Now, when I receive the post and insert it into the database, I need to get a list of the users mentioned in it. I'm looking for links that have type="mention", and if there are any, I want to get the title of each one to insert into the notification system. What PHP code do I need to add in this if statement?
if(stristr(request('post'),' type="mention" ')){
}
Aside from using an AST (Abstract Syntax Tree), your best bet would be to use DOM on the PHP side, e.g.:
$string = '<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>';
$doc = new \DOMDocument();
$doc->loadHTML($string);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query("//a[@type='mention']") as $a) {
$href = $a->getAttribute('href');
$title = $a->getAttribute('title');
echo sprintf("Found mention of %s with href of %s\n", $title, $href);
}
However, I probably wouldn't send the A node back to the server. You should consider working out some way to make it a display-only feature implemented on the browser side, and simply send the "@jordan" string back to the server.
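If only the plain mention text reaches the server, the names can be pulled out with a regular expression. This is a minimal sketch, assuming a mention is an @ sign followed by letters, digits, or underscores; extractMentions() is a hypothetical helper name.

```php
<?php
// Sketch only: return the unique user names mentioned in a post, in order of appearance.
function extractMentions(string $post): array
{
    // The lookbehind rejects an @ that follows a word character, so e-mail addresses are skipped.
    preg_match_all('/(?<![\w@])@(\w+)/u', $post, $matches);
    return array_values(array_unique($matches[1]));
}

$mentions = extractMentions('Hi @Jordan and @sam, mail me at me@example.com. Thanks @Jordan!');
print_r($mentions);   // Jordan, sam
```

The returned names would still need to be checked against the user's friend list before any notification is created, since anyone can type an arbitrary name after the @ sign.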
I am a complete beginner with PHP. I understand the concepts but am struggling to find a tutorial I understand. My goal is this:
Use the xpath addons for Firefox to select which piece of text I would like to scrape from a site
Format the scraped text properly
Display the text on a website
Example)
// Get the HTML Source Code
$url='http://steamcommunity.com/profiles/76561197967713768';
$source = file_get_contents($url);
// DOM document Creation
$doc = new DOMDocument;
$doc->loadHTML($source);
// DOM XPath Creation
$xpath = new DOMXPath($doc);
// Get all events
$username = $xpath->query('//html/body/div[3]/div[1]/div/div/div/div[3]/div[1]');
echo $username;
?>
In this example, I would like to scrape the username (which at the time of writing is mopar410).
Thank you for your help - I am so lost :( Right now I managed to use xpath with importXML in Google doc spreadsheets and that works, but I would like to be able to do this on my own site with PHP to learn how.
This is code I found online and edited the URL and the variable - as I am not aware of how to write this myself.
They have a public API.
Simply use http://steamcommunity.com/profiles/STEAM_ID/?xml=1
<?php
$profile = simplexml_load_file('http://steamcommunity.com/profiles/76561197967713768/?xml=1', 'SimpleXMLElement', LIBXML_NOCDATA);
echo (string)$profile->steamID;
Outputs: mopar410 (at time of writing)
This also provides other information such as mostPlayedGame, hoursPlayed, etc (look for the xml node names).
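Reading several nodes works the same way. The sketch below parses a made-up XML string so it runs without network access; the nesting of mostPlayedGame inside mostPlayedGames and the gameName element are assumptions about the feed's layout, so compare them with the real XML before relying on them.

```php
<?php
// Sketch only: made-up document, shaped loosely like a profile feed.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<profile>
  <steamID><![CDATA[example_user]]></steamID>
  <mostPlayedGames>
    <mostPlayedGame><gameName><![CDATA[Game A]]></gameName></mostPlayedGame>
    <mostPlayedGame><gameName><![CDATA[Game B]]></gameName></mostPlayedGame>
  </mostPlayedGames>
</profile>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$profile = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($profile === false) {
    exit("Could not parse the XML\n");
}

$name = (string) $profile->steamID;

// Iterating a repeated child element visits each occurrence in document order.
$games = [];
foreach ($profile->mostPlayedGames->mostPlayedGame as $game) {
    $games[] = (string) $game->gameName;
}

echo $name, ': ', implode(', ', $games), "\n";
```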
Yesterday I tried to pull a div from another website into my own site.
I want PHP to read the text in that div and compare it with a string I supply.
Here is my code:
//Blah, blah, blah
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://www.habbo.'.$hotel.'/home/'.$habbo));
$xpath = new DomXpath($dom);
$motto = $xpath->query('//*[@class="profile-motto"]')->item(0)->textContent;
echo $motto;
if($code !== $motto){
$num_habbo = 2;
}
//Blah, blah, blah
An example of a page:
http://www.habbo.es/home/iEnriqueSP
The string I want to take is in "Mi perfil", between "Añadir amigo" and the avatar of the user.
When I try to show the string with echo $motto, PHP shows nothing.
I don't know if cURL is necessary for PHP DOM, but my hosting's PHP info shows cURL as enabled.
Thanks for your attention
There's a nice library for tasks like this.
PHP Simple HTML DOM Parser
It lets you parse HTML files and fetch both inner and outer text. The documentation should also be quite simple.
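If you would rather stay with the built-in DOM extension, a common cause of an empty result is an element that carries more than one class, which an exact class comparison does not match. The following sketch matches one class among several; textByClass() is a hypothetical helper and the sample HTML is made up. It also does not help if the text is inserted by JavaScript after the page loads.

```php
<?php
// Sketch only: return the trimmed text of the first element carrying the given class, or null.
// $class is inserted into the XPath expression as-is, so it must not contain quotes.
function textByClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so the class matches as a whole word.
    $query = sprintf("//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]", $class);
    $node = $xpath->query($query)->item(0);

    return $node === null ? null : trim($node->textContent);
}

// Made-up markup standing in for the fetched profile page.
$html = '<div class="box profile-motto"> Hello world </div><div class="profile-motto-extra">x</div>';
$motto = textByClass($html, 'profile-motto');
var_dump($motto);   // string(11) "Hello world"
```

Checking for null before comparing avoids the error that occurs when item(0) finds nothing.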
What I want to accomplish might be a little hardcore, but I want to know if it's possible:
The question:
My question is the same as PHP-Retrieve content from page, but I want to use it on multiple pages.
The situation:
I'm using a website about TV shows. All the TV shows have the same URL and then the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line which tells you if the show is cancelled or still running. I want to retrieve that line to make an overview of the cancelled shows (the website only supports an overview of running shows, so I want to make an extra functionality).
The real question:
How can I tell DOM to retrieve all the shows and check for the status of the show?
(http://bierdopje.com/shows/*).
The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).
Use this code to fetch only the links from a single website.
include_once('simple_html_dom.php');
$html = file_get_html('http://www.couponrani.com/');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
I use phpQuery to fetch data from a web page; it works like jQuery on the DOM.
For example, to get the list of all shows, you can do this :
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';
$doc = phpQuery::newDocumentHTML(
file_get_contents('http://www.bierdopje.com/shows')
);
foreach (pq('.listing a') as $key => $a) {
$url = pq($a)->attr('href'); // will give "/shows/07-ghost"
$show = pq($a)->text(); // will give "07 Ghost"
}
Now you can process each show individually: create a new phpQuery::newDocumentHTML for each show and use a selector to extract the information you need.
Get the status of a show
$html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
$doc = phpQuery::newDocumentHTML($html);
$status = pq('.content>span:nth-child(6)')->text();
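The two steps can be combined into one loop. This is a minimal sketch that keeps the fetching separate from the filtering so it can be tried without hitting the site; findCancelled() is a hypothetical helper, and matching on the substring "cancel" is an assumption about how the status line is worded.

```php
<?php
// Sketch only.
// $shows maps show name => URL; $fetchStatus takes a URL and returns the status text.
function findCancelled(array $shows, callable $fetchStatus): array
{
    $cancelled = [];
    foreach ($shows as $name => $url) {
        $status = $fetchStatus($url);
        // Assumed wording: matches both "Canceled" and "Cancelled", case-insensitively.
        if (stripos($status, 'cancel') !== false) {
            $cancelled[] = $name;
        }
    }
    return $cancelled;
}

// Stand-in fetcher with made-up data. A real one would download the page
// and read the status with the selector shown above.
$fake = ['/shows/a' => 'Status: Running', '/shows/b' => 'Status: Cancelled'];
$fetchStatus = fn (string $url): string => $fake[$url] ?? '';

$result = findCancelled(['Show A' => '/shows/a', 'Show B' => '/shows/b'], $fetchStatus);
print_r($result);
```

With one request per show, it is worth caching the results and pausing between requests so the site is not flooded.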