Warning when retrieving Metadata from remote webpage - php

I am getting these two errors when retrieving meta data from a remote webpage. Is this an escaping issue or maybe a cURL issue?
Warning: get_meta_tags(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://...@import url( "http://www.zymic.com/forum/style_images/v6/folder_editor_images/css_rte.css" ); </style> </head> <body> <div id="ipbwrapper"> <!--ipb.javascript.start--> <script type="text/javascript"> //<![CDATA[ var ipb_var_st = "0"; var ipb_lang_tpl_q1 = "Please enter a page number to jump to between 1 and"; var ipb_var_s = "f2e0d2b492f248ec27ef34ae291a1db4"; var ipb_var_phpext = "php"; var ipb_var_base_url = "http://www.zymic.com/forum/index.php?s=f2e0d2b492f248ec27ef34ae291a1db4&"; var ipb_var_image_url = "style_images/v6"; var ipb_input_f = "34"; var ipb_input_t = "5188"; var ipb_input_p = ""; var ipb_var_cookieid = ""; var ipb_var_cookie_ in public_html/list/main/output.php on line 22
retrieve pagetitle
Warning: file_get_contents(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://...@import url( "http://www.zymic.com/forum/style_images/v6/folder_editor_images/css_rte.css" ); </style> </head> <body> <div id="ipbwrapper"> <!--ipb.javascript.start--> <script type="text/javascript"> //<![CDATA[ var ipb_var_st = "0"; var ipb_lang_tpl_q1 = "Please enter a page number to jump to between 1 and"; var ipb_var_s = "f2e0d2b492f248ec27ef34ae291a1db4"; var ipb_var_phpext = "php"; var ipb_var_base_url = "http://www.zymic.com/forum/index.php?s=f2e0d2b492f248ec27ef34ae291a1db4&"; var ipb_var_image_url = "style_images/v6"; var ipb_input_f = "34"; var ipb_input_t = "5188"; var ipb_input_p = ""; var ipb_var_cookieid = ""; var ipb_var_coo in /public_html/list/main/output.php on line 27
Here is the code:
////Use Curl Library to get page content for security
$url = 'http://en.wikipedia.org/wiki/Category:Lists_of_lists';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, 'ListBot 1.0: Used for compiling a DB of lists across the internet.');
$str = curl_exec($curl);
curl_close($curl);
//get metadata
$tags = get_meta_tags($str);
//Get page title
function get_page_title($str){
if( !($data = file_get_contents($str)) ) return false;
if( preg_match("#<title>(.+)<\/title>#iU", $data, $t)) {
return trim($t[1]);
} else {
return false;
}
}
///////////
echo('retrieve pagetitle');
$tags['title'] = get_page_title($str);

get_meta_tags() expects a filename or URL, not the HTML itself. You are passing it the page content that cURL returned, so PHP tries to open that content as a file and prints it in the warning. The file_get_contents() call inside get_page_title() fails for the same reason.
You could pass the URL to get_meta_tags() directly and let it fetch the page itself, but you'd probably get better results doing a regular expression match on the string you already retrieved with cURL.
You have a nice bit of code that grabs the title (drop the file_get_contents() call and match against $str directly). Simply modify that to grab all the meta tags.
On the php.net page describing get_meta_tags(), jstel at 126 dot com contributed this function call:
preg_match_all("/<meta[^>]+(http-equiv|name)=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>/i", $v, $split_content[], PREG_PATTERN_ORDER);
It searches the string $v for meta tags and appends the matches to $split_content. In his sample he does a bunch of looping that seems unneeded, but I'd suggest looking at his code and seeing if you can adapt it.
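As a hedged alternative to the regular expression above, the string that cURL already returned can be parsed with PHP's DOM extension instead. This is only a sketch: the function name extract_meta_from_html() is hypothetical (it is not a PHP built-in), and it assumes the dom extension is enabled.

```php
<?php
// Sketch: read <meta> tags and the <title> from an HTML string that has
// already been downloaded. extract_meta_from_html() is a made-up name.
function extract_meta_from_html(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse (e.g. the download failed)
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml errors silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Use the name attribute, falling back to http-equiv.
        $name = $meta->getAttribute('name') ?: $meta->getAttribute('http-equiv');
        if ($name !== '') {
            $tags[strtolower($name)] = $meta->getAttribute('content');
        }
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $tags['title'] = trim($titles->item(0)->textContent);
    }

    return $tags;
}
```

With the code from the question, this would be called as $tags = extract_meta_from_html($str); right after curl_close($curl), replacing both get_meta_tags($str) and get_page_title($str).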

Related

Attempting to stimulate page to produce server sent event with POST data from another page not working

When I open update.php on its own (with self-supplied test variables), it sends the SSE to testsse.php just fine and there are no issues (everything I need printed shows up in Inspect Element). However, I am trying to have POST data from another page (in this case mootssetest.php) received by update.php so that it can send out an SSE containing that data. I am not sure what I am doing wrong, but this test rig is not working. Guidance would be appreciated.
testsse.php (front end page meant to receive SSE and print)
<!DOCTYPE html>
<html lang="en">
<head>
<title>Using Server-Sent Events</title>
<script>
window.onload = function() {
var link = new EventSource("update.php");
var antispam;
var inputthing = event.data;
var splitted;
link.onmessage = function(event) {
inputthing = event.data;
splitted = inputthing.split(" ");
if (splitted[0] != antispam && splitted[1] == <?php echo $page; ?>) {
document.getElementById("livemsg").innerHTML += "<div id=\"post-" + splitted[0] + "\" class=\"reply\">" + "</div>";
antispam = splitted[0];
};
};
};
</script>
</head>
<body>
<div id="livemsg">
<!--Server response will be inserted here-->
</div>
</body>
</html>
update.php (SSE sender, post receiver)
<?php
$data = json_decode(file_get_contents('php://input'), true);
$postnum = $data[0];
$bread = $data[1];
postnum = 32;
bread = 4;
function liveupdate($postnum, $bread)
{
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
echo "data: " . $postnum . " " . $bread . "\n\n";
flush();
}
liveupdate($postnum, $bread);
?>
mootssetest.php (POST sender)
function httppost($postnum, $bread)
{
$url = "http://localhost/update.php";
$data = array($postnum, $bread);
$curl = curl_init($url);
$jsondata = json_encode($data);
curl_setopt( $ch, CURLOPT_POSTFIELDS, $jsondata );
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt( $ch, CURLOPT_HTTPHEADER, array('Content-Type:application/json'));
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
httppost(420, 4);
?>
(For context, I am trying to have this print a new post in some forum software every time a function is called without refreshing the page for the user)
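In testsse.php, the line var inputthing = event.data; runs directly inside window.onload, before any message has arrived, so there is no SSE event to read from at that point. event.data is only available inside link.onmessage. Declare inputthing without a value, fix that first, and try again.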
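Separately from the front-end fix, keep in mind that the cURL POST and the browser's EventSource connection are two independent requests to update.php. Values decoded while handling the POST are not visible to the script instance that streams events to the page. The following is a minimal sketch, assuming the latest post is handed from one request to the other through a small JSON file. All function names and the storage path are hypothetical.

```php
<?php
// Hypothetical helpers for update.php; names and storage file are assumptions.

// Persist the latest [postnum, bread] pair so another request can read it.
function save_latest_post(string $path, array $pair): void
{
    file_put_contents($path, json_encode(array_values($pair)), LOCK_EX);
}

// Return the stored pair, or null if nothing valid has been stored yet.
function load_latest_post(string $path): ?array
{
    if (!is_file($path)) {
        return null;
    }
    $pair = json_decode((string) file_get_contents($path), true);
    return (is_array($pair) && count($pair) === 2) ? $pair : null;
}

// Format one SSE message in the "postnum bread" shape the page splits on.
function format_sse(array $pair): string
{
    return "data: " . $pair[0] . " " . $pair[1] . "\n\n";
}

// POST: store the JSON body. Any other method: return the SSE text to send.
function handle_request(string $method, string $body, string $path): string
{
    if ($method === 'POST') {
        $pair = json_decode($body, true);
        if (is_array($pair) && count($pair) === 2) {
            save_latest_post($path, $pair);
        }
        return '';
    }
    $pair = load_latest_post($path);
    return $pair === null ? '' : format_sse($pair);
}
```

In update.php this could be wired up by sending the two text/event-stream headers for non-POST requests and then echoing handle_request($_SERVER['REQUEST_METHOD'], file_get_contents('php://input'), $path), where $path is a writable file of your choice. The script sends at most one message per connection and relies on EventSource reconnecting by itself. The antispam check in testsse.php then filters out repeats.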

What should I do to get all http links with cURL

I created a PHP program using cURL that fetches the content of any site and displays it in the browser. The program also saves the data to a file, and after saving it I want to find all the http links inside the body tag of the saved file. My code displays the fetched site in the browser, but I cannot find all the http links.
Please help me with this problem.
PHP code:
<!DOCTYPE html>
<html>
<head>
<title>Display links using Curl</title>
</head>
<body>
<?php
$GetData = curl_init();
$url = "http://www.ucertify.com/";
curl_setopt($GetData, CURLOPT_URL, $url);
curl_setopt($GetData, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($GetData);
curl_close($GetData);
$file=fopen("content.txt","w");
fputs($file,$data);
fclose($file);
echo $data;
function links() {
$file_content = file_get_contents("http://www.ucertify.com/");
$dom_obj = new DOMDocument();
@$dom_obj->loadHTML($file_content);
$xpath = new DOMXPath($dom_obj);
$links_href = $xpath->evaluate("/html/body//a");
for ($i = 0; $i<$links_href->length; $i++) {
$href = $links_href->item($i);
$url = $href->getAttribute("href");
if(strstr($url,"#")||strstr($url,"javascript:void(0)")||$url=="javascript:;"||$url=="javascript:"){}
else {
echo "<div>".$url."<div/>";
}
}
}
echo links();
?>
</body>
</html>
You can use a regex like this, where $file_data holds the HTML you fetched or saved:
preg_match("/<body[^>]*>(.*?)<\/body>/is", $file_data, $body_content);
preg_match_all("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i",$body_content[1],$matches);
foreach($matches[0] as $d) {
echo $d."<br>";
}
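If only the links in <a> tags are wanted, a DOM-based variant may be easier to maintain than the regex above. The sketch below assumes the HTML is already in a string (for example the $data returned by curl_exec() in the question) and that the dom extension is available. extract_http_links() is a hypothetical name.

```php
<?php
// Sketch: collect absolute http/https links from <a href> inside <body>.
// extract_http_links() is a made-up helper, not a PHP built-in.
function extract_http_links(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();
    // Tolerate invalid markup without emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//body//a[@href]') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Keep only http:// and https:// URLs; this skips "#..." and "javascript:".
        if (preg_match('#^https?://#i', $href)) {
            $links[$href] = true; // array keys remove duplicates
        }
    }

    return array_keys($links);
}
```

Calling extract_http_links($data) after curl_close($GetData) reuses the page that was already downloaded, so the second request through file_get_contents() in links() would no longer be needed.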

jQuery send file chunks to PHP upload to Ooyala

I have been stuck on this for over a week, and I think I am long overdue for asking here. I am trying to let my users upload their video files using the jQuery File Upload Plugin. We do not want to save the file on our server. The end result should be the file saved in our Backlot using the Ooyala API. I have tried various approaches, and I am successful in creating the asset in Backlot and getting my upload URLs, but I do not know how to upload the file chunks to Backlot using those URLs. I have tried FileReader(), FormData(), etc. I am pasting the last code I had, which created the asset and gave me the upload URLs but did not save any chunks into Backlot. I assume I may be getting stuck in one of my AJAX calls, but I am not sure.
I keep getting:
Uncaught InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable.
Here is my page with the JS for the jQuery File Upload widget by BlueImp:
<html>
<head>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script type="text/javascript" src="<?php print base_path() . path_to_theme() ?>/res/js/jQuery-File-Upload/js/vendor/jquery.ui.widget.js"></script>
<script type="text/javascript" src="<?php print base_path() . path_to_theme() ?>/res/js/jQuery-File-Upload/js/jquery.iframe-transport.js"></script>
<script type="text/javascript" src="<?php print base_path() . path_to_theme() ?>/res/js/jQuery-File-Upload/js/jquery.fileupload.js"></script>
</head>
<body>
<input id="fileupload" type="file" accept="video/*">
<script>
//var reader = FileReader();
var blob;
$('#fileupload').fileupload({
forceIframeTransport: true,
maxChunkSize: 500000,
type: 'POST',
add: function (e, data) {
var goUpload = true;
var ext = ['avi','flv','mkv','mov','mp4','mpg','ogm','ogv','rm','wma','wmv'];
var uploadFile = data.files[0];
var fileName = uploadFile.name;
var fileExtension = fileName.substring(fileName.lastIndexOf('.') + 1);
if ($.inArray( fileExtension, ext ) == -1) {
alert('You must upload a video file only');
goUpload = false;
}
if (goUpload == true) {
$.post('../sites/all/themes/episcopal/parseUploadJSON.php', 'json=' + JSON.stringify(data.files[0]), function (result) {
var returnJSON = $.parseJSON(result);
data.filechunk = data.files[0].slice(0, 500000);
data.url = returnJSON[0];
//reader.onloadend = function(e) {
//if (e.target.readyState == FileReader.DONE) { // DONE == 2
//data.url = returnJSON[0];
// }
//}
//$.each(returnJSON, function(i, item) {
//data.url = returnJSON[0];
//blob = data.files[0].slice(0, 500000);
//console.log(blob);
//reader.readAsArrayBuffer(blob);
//data.submit();
//});
data.submit();
});
}
},//end add
submit: function (e, data) {
console.log(data); //Seems fine
//console.log($.active);
$.post('../sites/all/themes/episcopal/curlTransfer.php', data, function (result) { //fails
console.log(result);
});
return false;
}
});
</script>
</body></html>
Then there is the parseUploadJSON.php code. Please keep in mind that my real code has the right Backlot keys; I am sure of this:
<?php
if(isset($_POST['json'])){
include_once('OoyalaAPI.php');
$OoyalaObj = new OoyalaApi("key", "secret",array("baseUrl"=>"https://api.ooyala.com"));
$expires = time()+15*60; //Adding 15 minutes in seconds to the current time
$file = json_decode($_POST['json']);
$responseBody = array("name" => $file->name,"file_name"=> $file->name,"asset_type" => "video","file_size" => $file->size,"chunk_size" => 500000);
$response = $OoyalaObj->post("/v2/assets",$responseBody);
$upload_urls = $OoyalaObj->get("/v2/assets/".$response->embed_code."/uploading_urls");
$url_json_string = "{";
foreach($upload_urls as $key => $url){
if($key+1 != count($upload_urls)){
$url_json_string .= '"' . $key . '":"' . $url . '",';
}else {
$url_json_string .= '"' . $key . '":"' . $url . '"';
}
}
$url_json_string .= "}";
echo $url_json_string;
}
?>
Then I have the curlTransfer.php:
<?php
echo "starting curl transfer";
echo $_POST['filechunk'] . " is the blob";
if(isset($_FILES['filechunk']) && isset($_POST['url'])){
echo "first test passed";
$url = $_POST['url'];
//print_r(file_get_contents($_FILES['filechunk']));
$content = file_get_contents($_FILES['filechunk']);
print_r($content);
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt ($ch, CURLOPT_HTTPHEADER, Array("Content-Type: multipart/mixed"));
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
curl_setopt($ch, CURLOPT_POSTFIELDS, $content);
try {
//echo 'success';
return httpRequest($ch);
}catch (Exception $e){
throw $e;
}
}
/****Code from Ooyala****/
function httpRequest($ch){
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
if(curl_error($ch)){
curl_close($ch);
return curl_error($ch);
}
$head=curl_getinfo($ch);
$content = $head["content_type"];
$code = $head["http_code"];
curl_close($ch);
}
?>
And the OoyalaApi.php is here (I saved a copy on my server):
https://github.com/ooyala/php-v2-sdk/blob/master/OoyalaApi.php
I apologize in advance if the code is messy and there's a lot of parts commented out. I have changed this code so much and I cannot get it. I appreciate all of your time and effort.
EDIT
I went back to trying FileReader, as the post "Send ArrayBuffer with other string in one Ajax call through jQuery" kind of worked for me. I think it would be safer to read the file using readAsArrayBuffer, but now I am having trouble saving the array buffer chunks in some sort of array...
We implemented Ooyala chunked file upload in Ruby on Rails by referring to the library linked below.
We used its entire JS file as is.
https://github.com/ooyala/backlot-ingestion-library

Get input value from parent document

I'm using CURL to scrape a website like this:
<?php
$url = "http://www.bbc.com/news/";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
$curl_scraped_page = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#",'$1http://www.bbc.com/news/$2$3', $curl_scraped_page);
echo $curl_scraped_page;
?>
As you can see the URL is set for BBC news. However, I would like the URL to be a variable instead. The variable would have to be the value of parent.document. In JQuery for example I would do this:
var value = $("input", parent.document.body).val();
How do I set something like that in PHP? I have Googled but I couldn't find anything about parent.document in PHP.
PHP is a server-side scripting language and therefore has no access to the current HTML page. It is processed before the HTML is sent to the client's browser, therefore parent.document doesn't even exist at the time the script is being processed.
If you would like to pass data from an HTML page to a PHP script, you can do so using an HTML <form> or through JavaScript/JQuery AJAX requests.
For example, the following code will pass the value of input to the PHP script:
<html>
<head>
<script type="text/javascript" src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script type="text/javascript">
function pass(){
var value = $("input", parent.document.body).val();
$.ajax({
type: "POST",
url: "myscript.php",
data: { mydata: value }
}).done(function( msg ) {
alert( "Data Saved: " + msg );
});
}
</script>
</head>
<body>
<input type="text" />
<button onclick="pass();return false;">Pass Value</button>
</body>
</html>
And the revised script (myscript.php):
<?php
$url = isset($_POST['mydata']) ? $_POST['mydata'] : '';
$curl_scraped_page = '';
if(!empty($url)){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
$curl_scraped_page = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#",'$1'.$url.'$2$3', $curl_scraped_page);
}
echo $curl_scraped_page;
?>
I would recommend using an id selector, $('#id'), to retrieve the value of an <input> instead of $("input", context).
E.g.
var value = $('#txt').val();
And in the HTML:
<input type="text" id="txt" />
For more information, see the jQuery.ajax() documentation.

unable to parse xml with javascript loadXMLString()

I am trying to parse XML from a web service. I am using the JavaScript loadXMLString function to parse the XML into HTML. It worked fine locally when I put the XML into a variable, but to get the XML from an external link I used PHP like this:
<?php
$request = "http://www.somewebsite.com/feeds/get-cities.php?vendor_key=xxx";
$response = file_get_contents($request);
$xmlstring = htmlspecialchars($response, ENT_QUOTES);
?>
<script language="javascript">
function loadXMLString(txt)
{
if (window.DOMParser)
{
parser=new DOMParser();
xmlDoc=parser.parseFromString(txt,"text/xml");
}
else // Internet Explorer
{
xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async=false;
xmlDoc.loadXML(txt);
}
return xmlDoc;
}//function loadXMLString ends
text = <?php $xmlstring;?>
xmlDoc=loadXMLString(text);
document.write("<table border='1'>");
var x=xmlDoc.getElementsByTagName("city");
for (i=0;i<x.length;i++)
{
document.write("<tr style='background:#dddddd;'><td>");
document.write(x[i].getElementsByTagName("name")[0].childNodes[0].nodeValue);
document.write("</td><td>");
document.write(x[i].getElementsByTagName("country")[0].childNodes[0].nodeValue);
document.write("</td></tr>");
}
document.write("</table>");
</script>
In the code above I am trying to insert the XML from the PHP variable $xmlstring into the JavaScript variable text, but it displays nothing. If I put the XML inside the script like below, it works perfectly:
text="<cities>"
text=text+"<city>";
text=text+"<name>bulga</name>";
text=text+"<country>Giada De Laurentiis</country>";
text=text+"<city_id>2005</city_id>";
text=text+"</city>";
text=text+"</cities>";
Does anybody know how I can parse it? If somebody has a better solution, please suggest that as well.
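The line text = <?php $xmlstring;?> evaluates the variable but never prints it. Try changing it to:
text = <?php echo $xmlstring;?>
so that the variable's value is echoed into the page. Note that the echoed value still has to form a valid JavaScript string literal (for example, wrapped in quotes).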
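One way to get the XML into the script as a valid string literal is json_encode(), which adds the quotes and escapes newlines and special characters. This is a sketch rather than the answerer's method; js_string_literal() and render_text_assignment() are hypothetical names. It also assumes the raw response is used, because htmlspecialchars() turns < into &lt; and leaves DOMParser with no tags to parse.

```php
<?php
// Hypothetical helper: turn any PHP string into a JavaScript string literal
// that is safe to print inside a <script> block.
function js_string_literal(string $value): string
{
    // The JSON_HEX_* flags escape <, >, &, ' and " as \uXXXX sequences, so the
    // XML cannot terminate the surrounding <script> element or the literal.
    $literal = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
    if ($literal === false) {
        throw new InvalidArgumentException(
            'Could not encode value: ' . json_last_error_msg()
        );
    }
    return $literal;
}

// Build the assignment line that the page would print.
function render_text_assignment(string $xml): string
{
    return 'var text = ' . js_string_literal($xml) . ';';
}
```

In the page from the question, the assignment would then be printed with <?php echo render_text_assignment($response); ?>, where $response is the unescaped result of file_get_contents().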
With the help of GBD I have written the following code, and it has started displaying the city list. But when I try it with different XML, it does not work. Maybe someone has a better solution for this.
<?php
function curl_get_file_contents($URL)
{
$c = curl_init();
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_URL, $URL);
$contents = curl_exec($c);
curl_close($c);
if ($contents) return $contents;
else return FALSE;
}
$xmlString = curl_get_file_contents("http://www.somesite.com/feeds/get-cities.php?vendor_key=xxx");
?>
<script language="javascript">
function loadXMLString(text)
{
if (window.DOMParser)
{
parser=new DOMParser();
xmlDoc=parser.parseFromString(text,"text/xml");
}
else // Internet Explorer
{
xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async=false;
xmlDoc.loadXML(text);
}
return xmlDoc;
}
var text = "<?php echo substr_replace($xmlString,"",0,39);?>";
xmlDoc=loadXMLString(text);
document.write("<table border='1'>");
var x=xmlDoc.getElementsByTagName("city");
for (i=0;i<x.length;i++)
{
document.write("<tr style='background:#dddddd;'><td>");
document.write(x[i].getElementsByTagName("name")[0].childNodes[0].nodeValue);
document.write("</td><td>");
document.write(x[i].getElementsByTagName("country")[0].childNodes[0].nodeValue);
document.write("</td></tr>");
}
document.write("</table>");
</script>
