HTML parser to a website chart [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
So, there is this website that publishes some statistics about itself, updated daily. What I want to do is scrape that website, retrieve that information for my own website, and put it into a line chart. Something like these http://code.google.com/apis/ajax/playground/?type=visualization#line_chart
or these http://code.google.com/apis/ajax/playground/?type=visualization#image_line_chart
The only problem is that I don't know how to do it.

Get Permission
If you need to crawl a remote page, you should start by making sure you're not doing anything illegal. I'm in no way capable of counseling you about the legality of what you're looking to do, but I can say with assurance that some people don't like having their data scraped - you really should check with the site administrators to see if they have any problem with this.
Got Permission? Good...
When you're cleared, consider using the DOMDocument class to create a representation of the DOM in a series of nested objects that you can interact with pretty trivially:
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents("http://foo.com/data/statistics"));
If you're at all familiar with DOM-traversal and selection methods in JavaScript, the DOMDocument interface will seem pretty familiar. In order to get a particular element by ID, you'd use the appropriate method:
$statistics = $doc->getElementById("statistics");
And to get all TD elements within that element:
$cells = $statistics->getElementsByTagName("td");
Without going into too much detail, you could continue to use the methods provided to you by the DOMDocument class to traverse and select data. When you get to the actual nodes you're seeking, you could easily save their values into an array, and then output that array as a string with some JavaScript to show the Google graph.
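As a minimal sketch of that last step, the snippet below walks the rows of a table and collects the cell values into an array before encoding it as JSON. The markup, the "statistics" id, and the date/value column layout are assumptions made for illustration; the real page will differ.

```php
<?php
// Hypothetical markup standing in for the remote statistics page.
$html = '<html><body><table id="statistics">'
      . '<tr><td>2012-11-01</td><td>120</td></tr>'
      . '<tr><td>2012-11-02</td><td>135</td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true); // tolerate malformed real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);

$rows = array();
foreach ($doc->getElementById("statistics")->getElementsByTagName("tr") as $tr) {
    $cells = $tr->getElementsByTagName("td");
    // Assumption: the first cell holds the date, the second the value.
    $rows[] = array(
        trim($cells->item(0)->nodeValue),
        (int) $cells->item(1)->nodeValue
    );
}

$json = json_encode($rows);
echo $json; // [["2012-11-01",120],["2012-11-02",135]]
```

The resulting string can be echoed into a script block on your page and handed to whichever charting code you choose.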
Be Nice!
It would be wise to cache the results of this operation so that you don't hit the remote server every time the script runs. Save the output to a local file with a timestamp, and check that timestamp for expiration - when it's expired, remove it, and create a new cached result in its place.
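One way to sketch this, assuming the cache file's modification time is used as the timestamp, is a small helper that only calls the expensive builder when the file is missing or too old. The function name cached(), the file name, and the stand-in builder are all made up for the example.

```php
<?php
// Return the cached content if it is younger than $maxAge seconds;
// otherwise call $build, store its result, and return that instead.
function cached(string $file, int $maxAge, callable $build): string {
    if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
        return file_get_contents($file);
    }
    $fresh = $build();
    file_put_contents($file, $fresh, LOCK_EX);
    return $fresh;
}

// Demo with a stand-in builder instead of a real remote request.
$file = sys_get_temp_dir() . "/stats-cache-demo.json";
@unlink($file); // start from a clean slate for the demo
$calls = 0;
$build = function () use (&$calls) {
    $calls++;
    return '{"visits":42}';
};

$first  = cached($file, 86400, $build); // cache miss: builder runs
$second = cached($file, 86400, $build); // cache hit: builder is skipped
echo $calls; // 1
unlink($file);
```

Here the stale file is simply overwritten rather than removed first, which has the same effect.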
What This Might Look Like
Here is a very basic implementation of what your solution may resemble when it is complete. Note that we only hit the remote server at most once per day.
// Silence errors due to malformed HTML
libxml_use_internal_errors(true);
// This function builds our data, when necessary
function build_output () {
// Create a DOMDocument, grab debt-clock HTML
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents("http://brillig.com/debt_clock/"));
// Find element representing total National Debt
$total = $doc->getElementsByTagName("img")->item(0);
// Grab value from the alt attribute
$total = $total->attributes->getNamedItem("alt")->nodeValue;
// Second paragraph has two more values of interest
$parag = $doc->getElementsByTagName("p")->item(1);
// Build out resulting array of data: Natl. Debt, Population, Ind. Debt
$data = Array(
"ntl" => str_replace(" ", "", $total),
"pop" => $parag->getElementsByTagName("b")->item(0)->nodeValue,
"pay" => $parag->getElementsByTagName("b")->item(1)->nodeValue
);
// Return a JSON string of this data
return json_encode($data);
}
// Most recent cache file (today)
$cache_name = date("Y-m-d");
if (!file_exists($cache_name)) {
// Today's cache doesn't exist, create it.
$output = build_output();
$message = "Fresh: " . $output;
file_put_contents($cache_name, $output);
} else {
// Today has already been cached, load it.
$message = "Cached: " . file_get_contents($cache_name);
}
// Show the user the output
echo $message;
When loading from cache, the output from the above script looks similar to this:
Cached: {
"ntl":"$16,293,644,693,599.87",
"pop":"313,929,808",
"pay":"$51,902.19"
}
You can load that JSON data into any service you like now to generate an image representing its data.
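Most charting tools want plain numbers rather than formatted strings, so the values probably need cleaning first. The sketch below assumes you have one cached JSON string per day (the two sample days and their figures are invented) and turns them into date/value rows suitable for a line chart.

```php
<?php
// Turn a formatted figure such as "$51,902.19" into a plain float.
function to_number(string $formatted): float {
    return (float) str_replace(array('$', ',', ' '), '', $formatted);
}

// Hypothetical per-day cache contents, keyed by the cache file's date.
$days = array(
    "2012-11-26" => '{"pay":"$51,898.02"}',
    "2012-11-27" => '{"pay":"$51,902.19"}',
);

$rows = array();
foreach ($days as $date => $json) {
    $data = json_decode($json, true);
    // One row per day: the date label and the numeric value to plot.
    $rows[] = array($date, to_number($data["pay"]));
}

echo json_encode($rows);
```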

Related

Fetch data from site Page by Page & go through sub links

URL : http://www.sayuri.co.jp/used-cars
Example : http://www.sayuri.co.jp/used-cars/B37753-Toyota-Wish-japanese-used-cars
Hey guys, I need some help with one of my personal projects. I've already written the code to fetch data from each single car URL (see the example) and post it on my site.
Now I need to go through the main URL, sayuri.co.jp/used-cars, and:
1) Make an array / list / nodes of all the URLs for all the single cars on it, then run my internal code on each one to fetch its data, then move on to the next one.
I already have the code to save each URL into a log file when completed (I don't think it will be necessary if it goes link by link without starting from the top, but it will ensure no repetition).
2) When all links are done for the page, it should move to the next page and do the same thing until the end (there are 5-6 pages max).
I've been stuck on this part since last night and would really appreciate any help. Thanks
My code to get data from the main url :
$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
// echo $content;
and
$dom = new DOMDocument;
$dom->loadHTML($content);
//echo $dom;

I'm guessing you already know this since you say you've gotten data from the car entries themselves, but a good point to start is by dissecting the page's DOM and seeing if there are any elements you can use to jump around quickly. Most browsers have page inspection tools to help with this.
In this case, <div id="content"> serves nicely. You'll note it contains a collection of tables with the required links and a <div> that contains the text telling us how many pages there are.
Disclaimer: it's been years since I've done PHP and I have not tested this, so it is probably neither correct nor optimal, but it should get you started. You'll need to tie the functions together (what's the fun in me doing it?) to achieve what you want, but these should grab the data required.
You'll be working with the DOM on each page, so here is a convenience function to grab the DOMDocument:
function get_page_document($index) {
$content = file_get_contents("http://www.sayuri.co.jp/used-cars/page:{$index}");
$document = new DOMDocument;
$document->loadHTML($content);
return $document;
}
You need to know how many pages there are in total in order to iterate over them, so grab it:
function get_page_count($document) {
$content = $document->getElementById('content');
$count_div = $content->childNodes->item($content->childNodes->length - 4);
$count_text = $count_div->firstChild->textContent;
if (preg_match('/Page \d+ of (\d+)/', $count_text, $matches) === 1) {
return (int) $matches[1];
}
return -1;
}
It's a bit ugly, but the links are available inside each <table> in the contents container. Rip 'em out and push them in an array. If you use the link itself as the key, there is no concern for duplicates as they'll just rewrite over the same key-value.
function get_page_links($document) {
$content = $document->getElementById('content');
$tables = $content->getElementsByTagName('table');
$links = array();
foreach ($tables as $table) {
if ($table->getAttribute('class') === 'itemlist-table') {
// table > tbody > tr > td > a
$link = $table->firstChild->firstChild->firstChild->firstChild->getAttribute('href');
// No duplicates because they just overwrite the same entry.
$links[$link] = "http://www.sayuri.co.jp{$link}";
}
}
return $links;
}
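The chain of firstChild calls above depends on there being no whitespace text nodes or extra wrappers between the table and the link. As a possibly sturdier variant, here is a sketch that uses DOMXPath instead. It is not the code above: it collects every link inside a matching table rather than only the first one, and the sample markup and the extract_links() name are assumptions for the demo.

```php
<?php
// Collect links from listing HTML, keyed by href so duplicates collapse.
function extract_links(string $html, string $base): array {
    libxml_use_internal_errors(true); // ignore warnings from sloppy markup
    $document = new DOMDocument;
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);

    $links = array();
    // Any <a href> inside a table whose class attribute is exactly "itemlist-table".
    foreach ($xpath->query('//table[@class="itemlist-table"]//a[@href]') as $a) {
        $href = $a->getAttribute('href');
        $links[$href] = $base . $href;
    }
    return $links;
}

// Hypothetical markup modelled on the structure described above.
$sample = '<div id="content">'
        . '<table class="itemlist-table"><tr><td><a href="/used-cars/B1-demo">One</a></td></tr></table>'
        . '<table class="itemlist-table"><tr><td><a href="/used-cars/B1-demo">One again</a></td></tr></table>'
        . '<table class="other"><tr><td><a href="/ignored">Skip</a></td></tr></table>'
        . '</div>';

$found = extract_links($sample, 'http://www.sayuri.co.jp');
print_r($found);
```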
Perhaps also obvious, but these will break if this site changes their formatting. You'd be better off asking if they have a REST API or some such available for long term use, though I'm guessing you don't care as much if it's just a personal project for tinkering.
Hope it helps prod you in the right direction.

PHP and MySQL vote system OOP [closed]

Closed 9 years ago.
To begin with we're beginners to PHP, we're studying Multimedia Design and we have been assigned to make a website in plain HTML. Furthermore we also have to include some PHP (which must be object-oriented). Our idea is to call the URL from our Youtube videos in our database and each video should have a vote button attached.
We can easily call our videos to a specific page in a div box on our website. This is our video_class.php:
<?php class Video {
private $db;
public function insertVideo($videoId) {
$row = $this->db->query("SELECT url FROM video WHERE id = ".$videoId);
$ost = $this->db->loadRows($row);
echo '<iframe width="200" height="200" src="https://www.youtube.com/embed/' . $ost[0]['url'] . '" frameborder="0" allowfullscreen></iframe>';
}
public function setDatabaseConnection($db) {
$this->db = $db;
} } ?>
And the page we're loading it to:
<?php // Create database connection
// Load Database class file
require_once 'db_class.php';
//Creating new object instance from Database class
$db = new event();
// Run initiate function and provide credentials.
$db->initiate("localhost","root","","event");
$db->connect(); // Connect to MySQL database
// Load Video class file
require_once 'video_class.php';
$video = new Video;
$video->setDatabaseConnection($db);
$row=$db->query("SELECT url FROM video WHERE id = 1");
$ost=$db->loadRows($row);
//var_dump($ost);
$row1=$db->query("SELECT url FROM video WHERE id = 2");
$ost1=$db->loadRows($row1);
//var_dump($ost1);
$row2=$db->query("SELECT url FROM video WHERE id = 3");
$ost2=$db->loadRows($row2);
//var_dump($ost2); ?>
HTML:
<center><div class="video_clip">
<?php echo '<iframe width="200" height="200" src="https://www.youtube.com/embed/' . $ost[0]['url'] . '" frameborder="0" allowfullscreen></iframe>'; ?>
<img src="images/vote.png">
</div><!--video_clip end-->
But the real problem is next:
We have 3 videos you can vote on by clicking the vote button under each video. Each button must count the clicks and store the count in our database. We have absolutely no clue how to make this possible. Our teacher told us to link to a subpage (for example, "vote.php"). On that page we should use:
$_GET[id]
fetch id from $get
get current votes from video where id = 1/2/3
add+1
save votes in video where id=1
and finish with a redirect
Can someone help us? We have found a few possible solutions on the forums, but still no luck! Sorry for the long post and too much text :)
DATABASE STRUCTURE:
Table name: users (table comments: users)
Column    Type      Null
id        int(11)   No
videoId   int(11)   No
Table name: video (table comments: video)
Column    Type          Null
id        int(11)       No
url       varchar(50)   No

If you want to keep it simple, you might want to skip the part where the page doesn't reload. You can make a button do all sorts of JavaScript tricks (google jQuery and Ajax), but there's no need for this.
Supposing your URL is yourfile.php:
Make a link called upvote beneath each video, linking it to yourfile.php?voteid=xx
where xx is the id of the video.
If you click the link, you get redirected to the same page, but now you have a GET parameter.
In your code, before you show the page, check whether a vote is being cast:
if (isset($_GET['voteid'])) {
//save vote!
}
Now you are on the same page; you retrieve the votes (one is higher than it was before), and you can just keep on going.
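To give the "save vote" step some shape, here is a rough sketch. It makes two assumptions that are not in the question: that the video table gains a votes INT column, and that a PDO connection is used in place of the custom database class. The function names are made up. The id is validated before use and the increment happens in one UPDATE, so there is no separate read-then-write step.

```php
<?php
// Return the video id from a query array, or null if it is missing
// or not a positive integer.
function parse_vote_id(array $query): ?int {
    if (!isset($query['voteid'])) {
        return null;
    }
    $id = filter_var(
        $query['voteid'],
        FILTER_VALIDATE_INT,
        array('options' => array('min_range' => 1))
    );
    return $id === false ? null : $id;
}

// Add one vote in a single UPDATE. Assumes a `votes` INT column on the
// video table, which the schema in the question does not have yet.
function cast_vote(PDO $pdo, int $videoId): bool {
    $stmt = $pdo->prepare('UPDATE video SET votes = votes + 1 WHERE id = ?');
    return $stmt->execute(array($videoId));
}

var_dump(parse_vote_id(array('voteid' => '2')));   // int(2)
var_dump(parse_vote_id(array('voteid' => 'abc'))); // NULL
```

In the page itself you would call parse_vote_id($_GET), pass a non-null result to cast_vote(), and then redirect with header('Location: yourfile.php') followed by exit, so that a reload does not cast the vote again.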

This is quite open-ended, so I'll add some clues to get you going.
Firstly, since your votes will affect the database, you should use post and not get (see here for more details)1. Once you've dealt with the operation, you can then do your redirect.
So, under your <iframe>, set up a <form> with a post method. In it, add three input tags each having a type of submit, and each having a different name attribute. For your action, you can aim it at a different page if you want to, but since it is simple I would point it at itself. Thus, use <?php echo $_SERVER[ 'PHP_SELF' ] ?> for the time being.
OK, so this will send the post data to the same page. Thus, in your page PHP, just after your database initialisation, catch the post op like so:
// Your existing code
$db->connect(); // Connect to MySQL database
// New code
if ($_POST) {
print_r($_POST);
exit();
// #todo Parse the result in your POST array
// #todo Save the result in the database
// #todo Redirect to self
}
// Load Video class file
require_once 'video_class.php';
What that will do is dump the post data on the screen, and then exit immediately. This is a good prototyping approach to see that you're on the right track.
Adding #todo notes is quite a good approach too - do these in order, and delete the comment when that piece is written and tested. Don't forget to add new comments explaining code, if appropriate.
1 If using the $_GET array is an essential component of the exercise, then you could use three post forms, each with their own button, and with the action containing a separate query string that will appear in the $_GET array. However I'd argue that's a bit convoluted, and probably not the best way to achieve this in practice.

update PHP variable on click of html class [closed]

Closed 9 years ago.
I need to run PHP, specifically PHP I can't do it any other language, on click of a link with the class .URL
Specifically the PHP I need to run is this:
$list[4]+=10;
and the link that I need it to run on click of looks like this:
Some site's URL
I have heard about jQuery's ajax() function and its derivatives. But how can I do those to update the value of a PHP variable on click on .URL ?

First things first: most of your question is not possible in the way you want it done, specifically incrementing a variable in PHP such as $list[4] += 10. I say this because once the script that rendered the page has finished running, that variable no longer exists; you'd have to load it from wherever you happen to be storing data (I'm assuming a DB).
So, for a short example of what you're trying to achieve, you'll need a couple of files:
index.php - This is where your code happens that renders the page with the links on it.
link_clicked.php - This is called when a link is clicked.
You'll also need this basic JavaScript in your code (it uses jQuery because you mentioned it in your question). To explain what is going on, I've broken this snippet into many pieces, which is not how you'd normally write or see jQuery written.
$(function() {
// Select all elements on the page that have 'URL' class.
var urls = $(".URL");
// Tell the elements to perform this action when they are clicked.
urls.click(function() {
// Wrap the current element with jQuery.
var $this = $(this);
// Fetch the 'href' attribute of the current link
var url = $this.attr("href");
// Make an AJAX POST request to the URL '/link_clicked.php' and we're passing
// the href of the clicked link back.
$.post("/link_clicked.php", {url: url}, function(response) {
if (!response.success)
alert("Failed to log link click.");
});
});
});
Now, what should our PHP look like to handle this?
<?php
// Tell the requesting client we're responding with JSON
header("Content-Type: application/json");
// If the URL was not passed back then fail.
if (!isset($_REQUEST["url"]))
die('{"success": false}');
$url = $_REQUEST["url"];
// Assume $dbHost, $dbUser, $dbPass, and $dbDefault is defined
// elsewhere. And open an connection to a MySQL database using mysqli
$conn = new mysqli($dbHost, $dbUser, $dbPass, $dbDefault);
// Escape url for security
$url = $conn->real_escape_string($url);
// Try to update the click count in the database, if this returns a
// falsy value then we assume the query failed.
if ($conn->query("UPDATE `link_clicks` SET `clicks` = `clicks` + 1 WHERE url = '$url';"))
echo '{"success": true}';
else
echo '{"success": false}';
// Close the connection.
$conn->close();
// end link_clicked.php
This example is simplistic in nature and uses some unrecommended methods for performing tasks. I'll leave finding how to go about doing this properly per your requirements up to you.
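One of the things usually done differently is the query itself: a prepared statement with a bound parameter instead of escaping and string interpolation. The sketch below shows that variant with mysqli, keeping the link_clicks table from the example above; the function names record_click() and click_response() are invented here. Note that recent PHP versions make mysqli throw exceptions on errors by default, so the false check mainly matters on older setups.

```php
<?php
// Increment the click counter for one URL using a bound parameter.
// Returns true only if the statement ran and changed an existing row.
function record_click(mysqli $conn, string $url): bool {
    $stmt = $conn->prepare(
        "UPDATE `link_clicks` SET `clicks` = `clicks` + 1 WHERE `url` = ?"
    );
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param("s", $url); // "s" marks the parameter as a string
    $ok = $stmt->execute() && $stmt->affected_rows > 0;
    $stmt->close();
    return $ok;
}

// Build the JSON body with json_encode rather than by hand.
function click_response(bool $success): string {
    return json_encode(array("success" => $success));
}

echo click_response(false); // {"success":false}
```

In link_clicked.php this would reduce the tail of the script to echo click_response(record_click($conn, $url)).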

How to grab data on website? [closed]

Closed 10 years ago.
So, often, I check my accounts for different numbers. For example, my affiliate accounts: I check for cash increase.
I want to program a script where it can login to all these websites and then grab the money value for me and display it on one page. How can I program this?

You should take a look at curl.
You should be able to write a script that retrieves a web page easily.
Also take a look at SimpleXML and DOM; they will help you extract information from (X)HTML files.
Zend_Http could also be a good alternative to curl.
Cheers

Well, sort of a vague question... I'd suggest the following steps:
send the login credentials via POST
grab and parse the response
do this for all relevant accounts / sites you wanna check
if you face specific problems feel free to comment on this answer
EDIT: I'd agree to RageZ in his technical approach. curl would be the 'weapon of choice' for me too... ^^
hth
K

First of all, check whether the services you want to log in to have APIs.
It'd be much easier, as that's a format specifically made for the purpose of getting the data and using it in another application.
If there is an API, you can look at its documentation to see how to retrieve and use the data.
If there isn't any, you need to scrape the HTML pages.
You can start by taking a look at curl: http://php.net/curl
The idea is to simulate your own visit to the website by sending the login POST request and getting the returned data.
After retrieving the page's data, you can parse it with tools like DOM.
http://php.net/dom
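The two steps can be sketched roughly as follows, assuming a site with a plain form login and a session cookie. The URLs, form field names, and the "balance" element id are placeholders; every real site will need its own values, and many add tokens or JavaScript that this does not handle.

```php
<?php
// Log in with a POST, keep the session cookie in memory, then fetch
// a second page on the same handle. Returns the body, or false on error.
function fetch_after_login(string $loginUrl, array $fields, string $pageUrl) {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => "", // empty string enables in-memory cookies
    ));
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        return false;
    }
    // Same handle, so the cookies from the login are sent along.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    return curl_exec($ch);
}

// Pull the text of the element with a given id out of an HTML page.
function extract_text_by_id(string $html, string $id): ?string {
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $node = $doc->getElementById($id);
    return $node === null ? null : trim($node->textContent);
}

// Parsing demo on a hypothetical fragment (no network involved).
echo extract_text_by_id('<p>Balance: <span id="balance"> $12.50 </span></p>', "balance");
```

A real run would chain the two, for example extract_text_by_id(fetch_after_login($loginUrl, $fields, $pageUrl), "balance"), after checking that the fetch did not return false.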

Use TestPlan, it was designed as a web automation system and makes such tasks very simple.

I would really have a look into Snoopy if I were you; it's more user-friendly than curl to use in your PHP scripts. Here is some sample code.
<?php
/*
You need the snoopy.class.php from
http://snoopy.sourceforge.net/
*/
include("snoopy.class.php");
$snoopy = new Snoopy;
// need an proxy?:
//$snoopy->proxy_host = "my.proxy.host";
//$snoopy->proxy_port = "8080";
// set browser and referer:
$snoopy->agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
$snoopy->referer = "http://www.jonasjohn.de/";
// set some cookies:
$snoopy->cookies["SessionID"] = '238472834723489';
$snoopy->cookies["favoriteColor"] = "blue";
// set an raw-header:
$snoopy->rawheaders["Pragma"] = "no-cache";
// set some internal variables:
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
// set username and password (optional)
//$snoopy->user = "joe";
//$snoopy->pass = "bloe";
// fetch the text of the website www.google.com:
if($snoopy->fetchtext("http://www.google.com")){
// other methods: fetch, fetchform, fetchlinks, submittext and submitlinks
// response code:
print "response code: ".$snoopy->response_code."<br/>\n";
// print the headers:
print "<b>Headers:</b><br/>";
foreach ($snoopy->headers as $key => $val) { // each() was removed in PHP 8
print $key.": ".$val."<br/>\n";
}
print "<br/>\n";
// print the texts of the website:
print "<pre>".htmlspecialchars($snoopy->results)."</pre>\n";
}
else {
print "Snoopy: error while fetching document: ".$snoopy->error."\n";
}
?>

Use VietSpider Web Data Extractor.
VietSpider Web Data Extractor: software that crawls data from websites (data scraper), formats it to the XML standard (Text, CDATA), then stores it in a relational database. The product supports various RDBMSs such as Oracle, MySQL, SQL Server, H2, HSQL, Apache Derby, Postgres... VietSpider Crawler supports sessions (login, query by form input), multi-downloading, JavaScript handling, proxies (and multi-proxy by auto-scanning proxies from websites)...
Download from http://binhgiang.sourceforge.net

Updating the XML file using PHP script

I'm making an interface-website to update a concert-list on a band-website.
The list is stored as an XML file and has this structure:
I already wrote a script that enables me to add a new gig to the list, this was relatively easy...
Now I want to write a script that enables me to edit a certain gig in the list.
Every Gig is Unique because of the first attribute : "id" .
I want to use this reference to edit the other attributes in that Node.
My PHP is very poor, so I hope someone could put me on the good foot here...
My PHP script :

Well i dunno what your XML structure looks like but:
<gig id="someid">
<venue></venue>
<day></day>
<month></month>
<year></year>
</gig>
$xml = new SimpleXMLElement('gig.xml', 0, true);
// Attributes are selected with @ in XPath, and xpath() returns an array of matches
$matches = $xml->xpath('//gig[@id="'.$_POST['id'].'"]');
$gig = $matches[0];
$gig->venue = $_POST['venue'];
$gig->month = $_POST['month'];
// etc..
$xml->asXML('gig.xml'); // save back to file
Now, if instead all these data points are attributes, you can use $gig->attributes()->venue to access them.
There is no need for the loop, really, unless you are doing multiple updates with one post - you can get at any specific record via an XPath query. SimpleXML is also a lot lighter and a lot easier to use for this type of thing than DOMDocument - especially as you aren't using the features of DOMDocument.
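Here is a self-contained sketch of the XPath approach that can be run without a file or a form post. The gig list is made up, since the real structure is not shown, and casting the id to an integer assumes the ids are numeric (it also keeps user input out of the XPath expression).

```php
<?php
// Hypothetical gig list; the real file's structure is not shown in the question.
$source = '<gigs>'
        . '<gig id="1"><venue>Old Hall</venue><month>May</month></gig>'
        . '<gig id="2"><venue>Club 9</venue><month>June</month></gig>'
        . '</gigs>';

$xml = new SimpleXMLElement($source);
$id = "2"; // would come from $_POST['id'] in the real script

// xpath() returns an array of matches, so take the first one if present.
$matches = $xml->xpath('//gig[@id="' . (int) $id . '"]');
if ($matches) {
    // The matched element is a view onto $xml, so this edits the document.
    $matches[0]->venue = "New Arena";
}

echo $xml->asXML();
```

With a real file you would load it as shown above and finish with $xml->asXML('gig.xml') to write the change back.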

You'll want to load the xml file in a domdocument with
<?php
$xml = new DOMDocument();
$xml->load("xmlfile.xml");
//find the tags that you want to update
$tags = $xml->getElementsByTagName("GIG");
//find the tag with the id you want to update
foreach ($tags as $tag) {
if($tag->getAttribute("id") == $id) { //found the tag, now update the attribute
$tag->setAttribute("[attributeName]", "[attributeValue]");
}
}
//save the xml
$xml->save("xmlfile.xml");
?>
code is untested, but it's a general idea
