I'm trying to capture the content of a div from an HTML page with this code:
$content = file_get_contents('http://player.rockfm.fm/');
// Collapse line breaks and tabs so the pattern can match across lines
$content = preg_replace("/\r\n+|\r+|\n+|\t+/", " ", $content);
preg_match('/<div id="metadata_player">(.*?)<\/div>/', $content, $matches);
print_r($matches);
The result is empty because the div's content is filled in by JavaScript (an AJAX call) after the page loads, so it is not in the HTML that file_get_contents() receives.
Is there any way other than using https://github.com/neorai/php-webdriver?
Solution:
$result = file_get_contents("http://bo.cope.webtv.flumotion.com/api/active?format=json&podId=78");
// Outer object: {"id": ..., "uuid": ..., "value": "<JSON-encoded string>"}
$array_full = json_decode($result, true);
// "value" is itself JSON, so decode it a second time instead of stripping symbols and splitting by hand
$track = json_decode($array_full['value'], true);
echo "Author: " . $track['author'];
echo "<br>Title: " . $track['title'];
Thanks to @urban and the question "How to use cURL to get JSON data and decode the data?"
This page loads strangely (it seems to fire three loadFinished events!). Anyhow, the following PhantomJS script works:
// "Normal" JS: read the div and poll until it has content
function waitForMetadata() {
// Runs inside the page; return plain strings rather than a DOM node
var meta = page.evaluate(function() {
var el = document.getElementById("metadata_player");
return el ? { inner: el.innerHTML, outer: el.outerHTML } : null;
});
console.log("meta: '" + (meta ? meta.outer : "") + "'");
if (meta && meta.inner !== "") {
phantom.exit(0);
} else {
// Still empty: check again in one second
setTimeout(waitForMetadata, 1000);
}
}
// PhantomJS
var page = require('webpage').create();
// Register the handler before opening the page
page.onLoadFinished = function(status) {
console.log("Status: " + status);
if (status !== "success") {
console.log("FAIL!");
phantom.exit(1);
return;
}
waitForMetadata();
};
page.open('http://player.rockfm.fm/');
The first part is a function that reads the div, prints it, and exits once the div has content. If the div is still empty, the function schedules itself to run again one second later. The second part is straight out of the PhantomJS tutorial: it creates a page, registers an onLoadFinished handler and loads the URL.
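The poll-and-reschedule loop in waitForMetadata can be factored into a reusable helper that also gives up after a fixed number of attempts. The script above would poll forever if the div never filled. Below is a minimal sketch for a modern JavaScript runtime such as Node. PhantomJS's older engine may not support Promises or arrow functions, so treat this as an illustration of the pattern. waitFor, fakeDiv and the retry limit are illustrative assumptions, not part of PhantomJS.

```javascript
// Poll check() every intervalMs until it returns a truthy value,
// giving up after maxTries attempts. Resolves with the value found.
function waitFor(check, intervalMs, maxTries) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    function attempt() {
      const value = check();
      if (value) {
        resolve(value);
      } else if (++tries >= maxTries) {
        reject(new Error("gave up after " + tries + " tries"));
      } else {
        // Not ready yet: reschedule, just like waitForMetadata does
        setTimeout(attempt, intervalMs);
      }
    }
    attempt();
  });
}

// Demo: a stand-in for the div that only gets content on the third check.
let calls = 0;
const fakeDiv = () => (++calls >= 3 ? "GUNS N' ROSES" : "");
waitFor(fakeDiv, 10, 10).then((text) => console.log("meta: " + text));
```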
Example output:
urban@kde-2:/tmp$ phantomjs ./test.js
Status: success
meta: '<div id="metadata_player"></div>'
Status: success
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player"></div>'
meta: '<div id="metadata_player">GUNS N' ROSES<br><span id="artist">KNOCKIN' ON HEAVEN'S DOOR</span></div>'
NOTE: once the content is loaded, you can do whatever you like with it in JS instead of printing it. Also, I think you will want to use the span with id="artist" later on...
UPDATE 1:
This made me stubborn... I could not make it work with PhantomJS. However, I inspected the AJAX call this page makes, and it seems that you can get the currently playing song with:
$ curl 'http://bo.cope.webtv.flumotion.com/api/active?format=json&podId=78'
{"id": null, "uuid": "DFLT", "value": "{\"image\": \"\", \"author\": \"AEROSMITH\", \"title\": \"AMAZING\"}"}
This means you can use any language you like and call json_decode twice: (1) on the outer map containing id, uuid and value, and (2) on the value itself. My only concern would be whether podId changes, but it seems to be static.
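In JavaScript, JSON.parse plays the role of json_decode, so the same two-step decode looks like the sketch below. It parses the sample response shown above, hard-coded so it runs offline. In a real script you would fetch the URL first, for example with fetch() in a browser or Node 18+. parseNowPlaying is just an illustrative name.

```javascript
// Sample response copied from the curl output above.
const body = '{"id": null, "uuid": "DFLT", "value": "{\\"image\\": \\"\\", \\"author\\": \\"AEROSMITH\\", \\"title\\": \\"AMAZING\\"}"}';

// Decode twice: first the outer object, then the JSON string stored in "value".
function parseNowPlaying(text) {
  const outer = JSON.parse(text);
  const track = JSON.parse(outer.value);
  return { author: track.author, title: track.title };
}

const nowPlaying = parseNowPlaying(body);
console.log("Author: " + nowPlaying.author);
console.log("Title: " + nowPlaying.title);
```

Unlike splitting the string on "," and ":", this also works when an artist or title contains those characters.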
Hope it helps
I'm trying to figure out how to update my live data. I found some AJAX examples on Google, but I can't seem to get them to work.
The part that reads the live data and places it in a paragraph is:
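$file = "Data.txt";
// read the file without trailing newlines and take its last line (the bus position)
$data = file($file, FILE_IGNORE_NEW_LINES);
$line = (int) end($data);
// place the bus at stop 1-5 based on that value
if ($line >= 1 && $line <= 5) {
echo "<p class='bus" . $line . "'><img id='bus' src='bus.png'></p>";
}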
This is the full HTML file:
<!DOCTYPE html>
<html>
<head>
<title>Bus</title>
<link rel="stylesheet" href="stijlenbestand.css">
</head>
<body>
<?php
// create 5 bus stops
echo '<figure>';
for($i=1;$i<6;$i++){
echo "<img src = 'bushalte.png'>";
}
echo '</figure>';
// last line of the text file
$file = "Data.txt";
$data = file($file, FILE_IGNORE_NEW_LINES);
$line = (int) end($data);
// place the bus at stop 1-5 based on that value
if ($line >= 1 && $line <= 5) {
echo "<p class='bus" . $line . "'><img id='bus' src='bus.png'></p>";
}
?>
</body>
</html>
For a live update you need two parts: the page itself, and the script your data comes from.
PHP runs on the server once per request. Once your script has finished, it won't do anything more.
For a "live" website you need JavaScript.
If you want to use jQuery, I would recommend the jQuery.post() function.
jQuery code in your website:
$.post( "test.php", { name: "John", time: "2pm" })
.done(function( data ) {
alert( "Data Loaded: " + data );
});
Your test.php:
if(isset($_POST['name'])) {
//Do Some Stuff
$a = 'var a';
echo json_encode($a);
}
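By default, $.post() sends its data object form-encoded (application/x-www-form-urlencoded), which is why test.php can read it with isset($_POST['name']). A rough equivalent without jQuery uses the standard fetch() and URLSearchParams APIs. formBody and postLikeJQuery are hypothetical helper names, and this sketch is not a drop-in replacement for every $.post option.

```javascript
// Serialize a flat object in the same format $.post() uses by default
// (application/x-www-form-urlencoded), which PHP parses into $_POST.
function formBody(data) {
  return new URLSearchParams(data).toString();
}

// A rough fetch() counterpart of $.post(url, data).done(...) (browser or Node 18+).
function postLikeJQuery(url, data) {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8" },
    body: formBody(data),
  }).then((response) => response.text());
}

console.log(formBody({ name: "John", time: "2pm" })); // name=John&time=2pm
```

In the page, that would be used as postLikeJQuery("test.php", { name: "John", time: "2pm" }).then((data) => alert("Data Loaded: " + data)).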
This is not AJAX. AJAX means front-end code fetching new information in the background, which then gets inserted into the DOM. (Usually the information is transferred as JSON-encoded data, but let's keep that out of scope.)
For this you need two files:
A front-end file (for instance a static index.html with some content)
A backend file that provides the data
The flow is then:
1. The front-end file runs some JavaScript, which requests the backend file.
2. The backend file responds and returns some output.
3. The JavaScript adds that output to the DOM.
There are many ways of doing this, and I don't have the time to explain all of them here, but you might want to have a look at: http://www.w3schools.com/jquery/jquery_ajax_intro.asp
This provides a simple example based on jQuery.
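Applied to the bus page, the front-end half could look roughly like the sketch below. It assumes you split the PHP so that a separate, hypothetical bus.php endpoint echoes only the last line of Data.txt. It also assumes the page has an element with the hypothetical id bus-container. It uses plain fetch() and setInterval() instead of jQuery.

```javascript
// Hypothetical endpoint that echoes only the last line of Data.txt.
const BUS_ENDPOINT = "bus.php";

// Build the same markup the PHP page prints for stops 1-5 ("" otherwise).
function busMarkup(lastLine) {
  const stop = parseInt(String(lastLine).trim(), 10);
  if (!(stop >= 1 && stop <= 5)) return "";
  return "<p class='bus" + stop + "'><img id='bus' src='bus.png'></p>";
}

// Ask the backend for the latest line and put the markup into the container.
function refreshBus(container) {
  return fetch(BUS_ENDPOINT)
    .then((response) => response.text())
    .then((text) => {
      container.innerHTML = busMarkup(text);
    });
}

// Only runs in a browser: poll every 2 seconds (an arbitrary interval).
if (typeof document !== "undefined") {
  const container = document.getElementById("bus-container"); // hypothetical id
  if (container) {
    setInterval(() => refreshBus(container), 2000);
  }
}
```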
I'm pulling data from a MySQL database using PHP and echoing a json_encoded array.
Using AJAX, I pull in the results and set the values of various DOM elements. One string contains some HTML tags, e.g. <p></p>.
When I set the element with $("#element").html(data['text']), double quotes are added around the text and all of the HTML tags appear as text.
I can't seem to remove the quotes using replace. Oddly, when I alert the value there are no quotes; they only appear in the HTML when I view the code.
What is the best way to include HTML with text? And how do I get jQuery to render this as HTML and not as text?
Many thanks!
PHP / MySQL
//Article by id
if(isset($_GET['do']) && $_GET['do']=='get_art') {
$content = array();
$id = clean_input($_GET['id']);
$q = "SELECT * FROM articles WHERE id = '$id'";
$r = $conn->query($q);
$row = $r->fetch_assoc();
$content['title'] = str_replace("€", '€', $row['title']);
$content['img'] = $row['img'];
$content['text'] = str_replace('€', '€', $row['text']);
$content['text'] = htmlentities($row['text']);
echo json_encode($content);
}
//end article by id
jQuery
//load article onclick
$('body').on('click', '.get_art', function (e) {
e.preventDefault();
var href = $(this).attr('href').replace('#', ' ');
$.ajax({
url: 'actions.inc.php?do=get_art&id=' + href,
dataType: 'json',
type: 'post',
success: function(data) {
var art_text = data['text'];
art_text = art_text.replace('€', '€');
$("#art_img").attr("src", "images/" + data['img']);
$("#art_title").html(data['title']);
$("#art_text").html($.parseHTML(art_text));
} //end success
});//end ajax
});
//end load
PHP's htmlentities() was making a mess of things: it converted the HTML tags into entities such as &lt;p&gt;, so jQuery displayed them as text. I got it working after logging the PHP file's output to the console.
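To see why, here is roughly what htmlentities() does to the characters that matter in this case, sketched in JavaScript. escapeHtml is an illustrative helper. The real PHP function also converts other characters (for example € to &euro;), depending on its flags.

```javascript
// Escape the markup characters the way htmlentities() does for these cases.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const fromDb = "<p>Price: 5 euro</p>";
console.log(escapeHtml(fromDb)); // &lt;p&gt;Price: 5 euro&lt;/p&gt;
```

The escaped string contains no real tags, so .html() shows <p> as literal text. json_encode() already makes the raw HTML safe to transport inside JSON. Dropping htmlentities() therefore lets .html(data['text']) render the markup, which is fine as long as the article content is trusted.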
I have a function that creates a clickable link:
function makeClickableLinks($text) {
$notIncludedonLink = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $text); // remove the URL, leaving only the text that is not part of the link
$urlLink = str_replace($notIncludedonLink,'',$text);
$finalText = str_replace($urlLink,'<a href="'.$urlLink.'">'.$urlLink.'</a>',$text);
return $finalText;
}
But instead of returning plain clickable link:
http://docs.google.com/
it displays:
http://docs.google.com/
I tried using htmlentities but it doesn't work.
Here's the JS code that sends data to the server:
function checkNewLink() {
var latestId = $("input[name=latestLink]").val();
$('.newReply').load("links/ajax.php?action=newreply&msgid=<?php echo $msgId; ?>&latestid=" + latestId);
}
setInterval("checkNewLink()", 200);
where latestId contains the submitted link, which is sent to ajax.php. Every 200 ms, the code checks whether a new link has been submitted.
<?php
function makeClickableLinks($text){
return preg_replace('!(((f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $text);
}
echo makeClickableLinks('test http://docs.google.com/ test');
Output (http://codepad.org/EZE1HFZ4):
test <a href="http://docs.google.com/">http://docs.google.com/</a> test
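If you would rather linkify on the client (for example after .load() has inserted the text), the same regex replace can be written in JavaScript. This sketch mirrors the PHP pattern. It does not escape the surrounding text, so only use it on trusted input.

```javascript
// Wrap every http(s)/ftp URL in an <a> tag, mirroring the PHP pattern above.
function makeClickableLinks(text) {
  return text.replace(
    /((?:f|ht)tps?:\/\/[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;\/=]+)/gi,
    '<a href="$1">$1</a>'
  );
}

console.log(makeClickableLinks("test http://docs.google.com/ test"));
// test <a href="http://docs.google.com/">http://docs.google.com/</a> test
```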
AFTER UPDATE
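Change
setInterval("checkNewLink()", 200);
to
setInterval(function(){ checkNewLink() }, 200);
Read up on the setInterval() method: a string argument is evaluated as code in the global scope (much like eval()). It therefore breaks if checkNewLink() is not a global function, for example when it is defined inside $(document).ready(). Passing a function avoids that.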
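Here is a small runnable sketch of the function-reference form. It uses a hypothetical checkNewLink() that only counts calls (instead of calling .load()), and a clearInterval() so the demo stops after three ticks. Passing the function directly, as in setInterval(checkNewLink, 200), works just as well as the anonymous wrapper.

```javascript
// Hypothetical stand-in for checkNewLink(): it only counts how often it runs.
let checks = 0;
function checkNewLink() {
  checks += 1;
}

// Pass a function, not a string, so nothing is evaluated as code.
const timer = setInterval(function () {
  checkNewLink();
  if (checks === 3) clearInterval(timer); // stop after three checks for this demo
}, 200);
```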
There is a Dutch news website at: nu.nl
I am very interested in getting the URL of the first headline, which is located here:
<h3 class="hdtitle">
<a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">
Griekse hotels ontruimd om bosbranden <img src="/images/i18n/nl/slideshow/bt_fotograaf.png" class="vidlinkicon" alt=""> </a>
</h3>
So my question is: how do I get this URL? Can I do this with jQuery? I would think not, because the page is not on my server. So maybe I would have to use PHP? Where do I start?
Tested and working
Because http://www.nu.nl is not your site, you need to do the cross-domain GET through a PHP proxy. Otherwise you will get this kind of error:
XMLHttpRequest cannot load http://www.nu.nl/. Origin
http://yourdomain.com is not allowed by Access-Control-Allow-Origin.
First of all, put this file on your server (PHP side):
proxy.php (Updated)
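<?php
// Only proxy sites you explicitly allow; otherwise anyone could use this script
// to fetch arbitrary URLs or read local files from your server
$allowed = array('http://www.nu.nl');
if (isset($_GET['site']) && in_array($_GET['site'], $allowed, true)) {
$html = file_get_contents($_GET['site']);
if ($html !== false) {
echo $html;
}
}
?>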
Now, on the JavaScript side, you can do the following with jQuery:
(Note: I am using prop() because I use jQuery 1.7.2. If you are using a version older than 1.6, try attr() instead.)
$(function(){
var site = 'http://www.nu.nl';
$.get('proxy.php', { site:site }, function(data){
var href = $(data).find('.hdtitle').first().children(':first-child').prop('href');
var url = href.split('/');
href = href.replace(url[2], 'nu.nl');
// Put the 'href' inside your div as a link
$('#myDiv').html('<a href="' + href + '">' + href + '</a>');
}, 'html');
});
As you can see, the browser only sends the request to your own domain (the proxy fetches nu.nl on the server), so you won't get the Access-Control-Allow-Origin error again!
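The split/replace step is needed because .prop('href') returns the link already resolved against your own page's URL, so the host is your domain. .attr('href'), by contrast, returns the raw value from the markup, such as /buitenland/.... One alternative is to read the raw attribute and resolve it against the scraped site with the standard URL constructor. absoluteHref is a hypothetical helper name, and the sample path comes from the question.

```javascript
// Resolve a raw href (as returned by .attr('href')) against the scraped site.
function absoluteHref(rawHref, site) {
  return new URL(rawHref, site).href;
}

console.log(absoluteHref("/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html", "http://www.nu.nl"));
// http://www.nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
```

Inside the $.get callback, that would be absoluteHref($(data).find('.hdtitle').first().children(':first-child').attr('href'), site). This keeps the www. prefix instead of rewriting the host to nu.nl.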
Update
If you want to get the href of every headline, as you wrote in the comments, just change the jQuery code like this:
$(function(){
var site = 'http://www.nu.nl';
$.get('proxy.php', { site:site }, function(data){
// get all headline elements
var headlines = $(data).find('.hdtitle');
// get the 'href' of each headline's link and append it to the div
headlines.each(function(){
var href = $(this).children(':first-child').prop('href');
var url = href.split('/');
href = href.replace(url[2], 'nu.nl');
$('#myDiv').append('<a href="' + href + '">' + href + '</a><br/>');
});
}, 'html');
});
and use the updated proxy.php file (it works for both cases: one headline or all of them).
Hope this helps :-)
You can use the Simple HTML DOM library to get that link, something like this:
$html = file_get_html('website_link');
// hdtitle is a class, not an id: take the first <a> inside the first h3.hdtitle
echo $html->find('h3.hdtitle a', 0)->href;
read more here
I would have suggested RSS, but unfortunately the headline you're looking for doesn't seem to appear there.
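<?php
$f = fopen('http://www.nu.nl', 'r');
$html = '';
// Read in chunks until the first headline's marker appears (or the stream ends)
while (!feof($f) && strpos($html, 'position1_article_1') === FALSE)
$html .= fread($f, 24000);
fclose($f);
$pos = strpos($html, 'position1_article_1');
// Skip past 'position1_article_1" href="' (19 + 8 = 27 characters) to the start of the URL
$urlleft = substr($html, $pos + 27);
$url = substr($urlleft, 0, strpos($urlleft, '"'));
echo 'http://www.nu.nl' . $url;
?>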
Outputs: http://www.nu.nl/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
Use cURL to retrieve the page. Then use the following preg_match call to parse the string you've provided:
preg_match("/<a.*?href\=\"(.*?)\".*?>/is",$text,$matches);
The URL will be in $matches[1] (the first capture group).
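The same pattern works in JavaScript, where the s flag makes . match newlines like PHP's /s modifier. Below is a sketch run against the markup from the question. Regular expressions are brittle for HTML in general: changes in attribute order or quoting will break this. A DOM parser is the more robust choice.

```javascript
// The sample markup from the question.
const text = `<h3 class="hdtitle">
<a style="" onclick="NU.AT.internalLink(this, event);" xtclib="position1_article_1" href="/buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html">
Griekse hotels ontruimd om bosbranden <img src="/images/i18n/nl/slideshow/bt_fotograaf.png" class="vidlinkicon" alt=""> </a>
</h3>`;

// Same pattern as the preg_match call: i = case-insensitive, s = "." also matches newlines.
const matches = text.match(/<a.*?href="(.*?)".*?>/is);
const url = matches ? matches[1] : null;
console.log(url); // /buitenland/2880252/griekse-hotels-ontruimd-bosbranden.html
```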
If you want to set up a jQuery bot to scrape the page through a browser (Google Chrome extensions allow for this functionality):
// print out the found anchor link's href attribute
console.log($('.hdtitle').find('a').attr('href'));
If you want to use PHP, you'll need to scrape the page for this href. Libraries such as SimpleTest (which includes a scriptable web browser) can do this. To scrape periodically, run your PHP script from a cron job.
SimpleTest: http://www.lastcraft.com/browser_documentation.php
cronjob: http://net.tutsplus.com/tutorials/php/managing-cron-jobs-with-php-2/
Good luck!
I have two XML sources to retrieve data from, and I want to use them alternately on each page load. The first time someone visits the page, the first source should be used; the next time they visit, the other source should be used. Here is the AJAX request I am using to get one data source:
$(document).ready(function() {
$.ajax({
type: "GET",
url: "source1.xml", //how do I alternately load two different xml data sources?
dataType: "xml",
success: function(xml) {
var counter = 0;
var output = '<li>';
$(xml).find('person').each(function(){
counter++;
var image = $(this).find('image').text();
var name = $(this).find('name').text();
var title = $(this).find('title').text();
var company = $(this).find('company').text();
output = output + '<div><img src="img/' + image + '.jpg" />' + '<br /><label><span>' + name + '</span><br />' + title + '<br />' + company + '</label><br /></div>';
if(counter % 3 === 0){
output = output + '</li><li>';
}
});
output = output + '</li>';
$('#update-target ul').html(output);
}
});
});
For context, here is how I alternately load two Flash files using PHP:
if(isset($_SESSION['rotation'])){
$picker = $_SESSION['rotation'];
}else{
$picker = rand(0,1);
}
if($picker == 0){
echo '<script type="text/javascript">
var video1 = new SWFObject("somefile1.swf", "p1", "151", "590", "9", "#ffffff");
video1.addParam("wmode","transparent");
video1.write("meh");
</script>';
$_SESSION['rotation'] = ++$picker;
} else {
echo '<script type="text/javascript">
var video1 = new SWFObject("somefile2.swf", "p1", "151", "590", "9", "#ffffff");
video1.addParam("wmode","transparent");
video1.write("meh");
</script>';
$_SESSION['rotation'] = --$picker;
}
I realize I could just put the jQuery document-ready code right where I have the JS that loads the Flash, but that doesn't seem like a very efficient way of handling this. What is the best way to do this?
You can just use a variable to keep it short, like this:
echo '<script type="text/javascript">var xmlSource = "source1.xml";</script>';
Set it in the same if clause, then just reference it in your code:
url: xmlSource,
There are other ways, of course: using a cookie (via the cookie plugin), putting the value right in the document.ready handler, etc. Whichever seems most elegant to you, I suppose.
I recommend the variable from the PHP side or a cookie. Both options let the document.ready code stay outside the page in an external script, which the browser can cache instead of downloading it with the page each time.
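For the cookie option, the alternation itself is simple: remember which source was used last and pick the other one. Below is a minimal sketch in plain JavaScript that uses document.cookie instead of the cookie plugin. The cookie name xmlSource and the helper names are assumptions, and source2.xml stands in for your second file.

```javascript
// Assumed file names; replace with your real XML sources.
const SOURCES = ["source1.xml", "source2.xml"];

// Read one cookie value from a document.cookie-style string ("a=1; b=2").
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return decodeURIComponent(rest.join("="));
  }
  return null;
}

// Pick the source that was NOT used last time (first visit: the first source).
function nextSource(lastUsed) {
  const i = SOURCES.indexOf(lastUsed);
  return SOURCES[(i + 1) % SOURCES.length];
}

// In the browser: choose a source, remember it, then use it as the AJAX url.
if (typeof document !== "undefined") {
  const xmlSource = nextSource(readCookie(document.cookie, "xmlSource"));
  document.cookie = "xmlSource=" + encodeURIComponent(xmlSource) + "; path=/";
  // e.g. $.ajax({ type: "GET", url: xmlSource, dataType: "xml", success: ... });
}
```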