I'm trying to create a social bookmarking site using php and mysql.
When I save a website's URL, I want to be able to save the site's title, favicon and description in a table in my database, then print them on my page using ajax.
How can I extract those elements from a website?
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<?php
$myServer = "localhost";
$myUser = "root";
$myPass = "'100pushups'";
$myDB = "social_bookmarking";
//connection to the database
$connect = mysqli_connect($myServer,$myUser, $myPass)
or die("Couldn't connect to SQLServer on $myServer");
//select a database to work with
$selected = mysqli_select_db($connect, $myDB)
or die("Couldn't open database $myDB");
var_dump($_POST);
//declare the SQL statement that will query the database
$url = "INSERT INTO url (url ) VALUES ('$_POST[url]')";
if (isset($_POST['value']))
{
// Instructions if $_POST['value'] exist
echo 'Your url is ' .$url;
}
$data = get_meta_tags($url);
print_r($data);
if (!mysqli_query($connect, $url)) {
die('Error: ' . mysql_error());
}
else
{
echo "Your information was added to the database";
}
mysqli_close($connect);
?>
</body>
</html>
I know I'm doing something wrong with my url there, but I don't know how to use a variable as an argument in get_meta_tags, since the function only accepts filenames or strings.
You can get the title by using: (courtesy of https://stackoverflow.com/users/54680/jonathan-sampson)
<?php
if ( $_POST["url"] ) {
$doc = new DOMDocument();
@$doc->loadHTML( file_get_contents( $_POST["url"] ) ); // @ suppresses warnings from malformed HTML
$xpt = new DOMXPath( $doc );
$output = $xpt->query("//title")->item(0)->nodeValue;
} else {
$output = "URL not provided";
}
echo $output;
?>
You can get the favicon using:
<?php
$url = $_POST['url'];
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
echo $arr[0]['href'];
?>
Finally for the description you can use:
<?php
$tags = get_meta_tags($_POST['url']);
$description = $tags['description'];
echo $description;
?>
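The three snippets above each download and parse the page separately. As a rough sketch, they could be combined so the HTML is parsed once. The function name extractPageInfo and the sample markup are made up for illustration, and the HTML is passed in as a string so the fetching step stays separate:

```php
<?php
// Hypothetical helper: pulls title, favicon href and description out of an HTML string.
function extractPageInfo(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $titleNode = $xpath->query('//title')->item(0);
    // Matches rel="icon" and rel="shortcut icon" (and any other rel containing "icon")
    $iconNode = $xpath->query('//link[contains(@rel, "icon")]/@href')->item(0);
    // Attribute match is case-sensitive: only name="description" is found
    $descNode = $xpath->query('//meta[@name="description"]/@content')->item(0);

    return [
        'title'       => $titleNode ? trim($titleNode->textContent) : null,
        'favicon'     => $iconNode ? $iconNode->nodeValue : null,
        'description' => $descNode ? $descNode->nodeValue : null,
    ];
}

// Made-up sample page, used instead of a live URL
$html = '<html><head><title>Example page</title>'
      . '<link rel="shortcut icon" href="/favicon.ico">'
      . '<meta name="description" content="A short description."></head>'
      . '<body></body></html>';
$info = extractPageInfo($html);
echo $info['title'], "\n";
```

Each value is null when the page does not provide it, so the caller can decide on a fallback (for example, trying /favicon.ico at the site root) before inserting into the database.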
There are very capable scripts and classes out there that help with getting content from the DOM, for instance with CSS-style selectors. I recommend using one of those.
This is a nice example:
http://simplehtmldom.sourceforge.net/
To get the content of the page, use file_get_contents or an equivalent function.
You can use the file_get_contents() function to download a site's favicon (though this may fail for https URLs). Example:
$icon = file_get_contents("http://stackoverflow.com/favicon.ico");
// now save it
Another option is using cURL. It's an awesome PHP extension if you know how to use it.
Using these methods, you can fetch the HTML content of the sites too, and then parse it with any PHP HTML parser library. You could also use regular expressions (which experts generally don't recommend).
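For the cURL route, a minimal sketch might look like the following. The function name fetchUrl, the timeout default and the user-agent string are assumptions made for illustration; only standard cURL options are used:

```php
<?php
// Hypothetical helper: returns the response body, or null if the request failed.
function fetchUrl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects (e.g. http to https)
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'BookmarkFetcher/0.1', // made-up name
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

It could then be used as $icon = fetchUrl("http://stackoverflow.com/favicon.ico"); with a null check before saving the result.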
I'm trying to write a Joomla plugin that adds width and height attributes to each <img> in an HTML file.
Some image file names are Persian, and getimagesize fails on them.
This is the code:
@$dom->loadHTML('<?xml version="1.0" encoding="UTF-8"?>' . "\n" . '
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<img src="images\banners\س.jpg" style="max-width: 90%;" >
</body>
</html>
');
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
$imgtag = $node->getAttribute("src");
$imgtag = pathinfo($imgtag);
$imgtag = $imgtag['dirname'].'\\'.$imgtag['basename'];
$imgtag = getimagesize($imgtag);
$node->setAttribute("width",$imgtag[0]);
$node->setAttribute("height",$imgtag[1]);
}
$newHtml = urldecode($dom->saveHtml($dom->documentElement));
And when Persian characters exist in file name, getimagesize shows:
Warning: getimagesize(images\banners\س.jpg): failed to open stream: No such file or directory in C:\wamp64\www\plugin.php
How can I solve this?
Thanks to all.
I couldn't get this to work on a WAMP server (a local server on Windows),
but after I migrated to a Linux server, this code finally worked properly.
$html = $app->getBody();
setlocale(LC_ALL, '');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach($x->query("//img") as $node)
{
$imgtag = $node->getAttribute("src");
if(strpos($imgtag,"data:image")===false)
{
$imgtag = getimagesize($imgtag);
$node->setAttribute("width",$imgtag[0]);
$node->setAttribute("height",$imgtag[1]);
}
}
$bodytag = $x->query("//body");
$node = $dom->createElement("script", ' /* java script which may be necessary on client */ ');
$bodytag[0]->appendChild($node);
$html = '<!DOCTYPE html>'."\n" . $dom->saveHtml($dom->documentElement);
Some hints:
The code shouldn't touch base64 image sources, so I added a condition for that.
If a script (or any other element: div, p, ...) needs to be added to the body tag, you can use the appendChild method.
<!DOCTYPE html> has to be prepended to the final DOM output :)
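The hints above can be sketched as one function that also skips images whose size cannot be read, instead of writing empty attributes. The name addImageDimensions and the $sizeOf callback are hypothetical; the callback stands in for getimagesize so the logic can be tried without real image files:

```php
<?php
// Hypothetical helper. $sizeOf maps a src to [width, height] or false,
// mirroring the first two elements that getimagesize() returns.
function addImageDimensions(string $html, callable $sizeOf): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Common workaround: the XML declaration prefix asks libxml to read the input as UTF-8
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '' || strpos($src, 'data:image') === 0) {
            continue; // leave inline base64 images alone
        }
        $size = $sizeOf($src);
        if ($size === false) {
            continue; // file missing or unreadable: skip it
        }
        $img->setAttribute('width', (string) $size[0]);
        $img->setAttribute('height', (string) $size[1]);
    }
    return '<!DOCTYPE html>' . "\n" . $dom->saveHTML($dom->documentElement);
}
```

In the plugin itself the callback could simply wrap the real function, for example function ($src) { return @getimagesize($src); }, so a file name that cannot be opened no longer aborts the loop.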
So I'm a bit stuck, and I've been given various solutions, none of which work. Any hotshot PHP folks out there? Here's the deal, I'm trying to get an image to display on my website, from another website, that has a randomly generated IMG. Though I'm actually trying to do this off a personal art site of mine, this example will serve perfectly.
http://commons.wikimedia.org/wiki/Special:Random/File
A random image page pops up with that link. Now, I'd like to display THAT random image, or whatever image comes up, on another site. One of the two possible solutions I have encountered is gathering an array of URL links from a given link, and then redisplaying that array as images on another site, like a: < a href="https
The code I get back from what I'm talking about looks like this:
Array
(
[0] => https ://kfjhiakwhefkiujahefawef/awoefjoiwejfowe.jpg
[1] => https ://oawiejfoiaewjfoajfeaweoif/awoeifjao;iwejfoawiefj.png
)
Instead of the print out however, I'd like the actual images displayed, well specifically array [0], but one thing at a time. The code that's actually doing this is:
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/
$url = 'http://commons.wikimedia.org/wiki/Special:Random/File';
// Fetch page
$string = FetchPage($url);
// Regex that extracts the images (full tag)
$image_regex_src_url = '/<img[^>]*'.
'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex, $string, $out, PREG_PATTERN_ORDER);
$img_tag_array = $out[0];
echo "<pre>"; print_r($img_tag_array); echo "</pre>";
// Regex for SRC Value
$image_regex_src_url = '/<img[^>]*'.
'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);
$images_url_array = $out[1];
echo "<pre>"; print_r($images_url_array); echo "</pre>";
// Fetch Page Function
function FetchPage($path)
{
$file = fopen($path, "r");
if (!$file)
{
exit("The was a connection error!");
}
$data = '';
while (!feof($file))
{
// Extract the data from the file / url
$data .= fgets($file, 1024);
}
return $data;
}
for($i=0; $i<count($arr1); $i++) {
echo '<img src="'.$arr1[$i].'">';
}
?>
Solution two: use file_get_contents, like this:
<?php
$html =
file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$image_src = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img')
[0]->getAttribute('src') ;
echo "<img src='$image_src'><br>";
?>
However, there's unfortunately an error message I get: Fatal error: Cannot use object of type DOMNodeList as array in /home/wilsons888/public_html/wiki.php on line 11. Or, if I remove a "}" at the end, I just get a blank page.
I have been told that the above code will work, but with openssl extension included. Problem is, I have no idea how to do this. (I'm very new to PHP). Anyone know how to plug it in, so to speak? Thank you so much! I feel like I'm close, just missing the last element.
I was able to load the random image, and "print it" as an image directly (so you can embed the php file directly on the IMG tag) using this code:
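<?php
$html = file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
libxml_use_internal_errors(true); // keep warnings about imperfect HTML out of the image output
$dom = new DOMDocument();
$dom->loadHTML($html);
// The first attribute of the first child of the "file" element holds the image URL
$remoteImage = $dom->getElementById("file")->firstChild->attributes[0]->textContent;
// filesize() does not work on HTTP URLs, so download the image first and measure the data
$imageData = file_get_contents($remoteImage);
header("Content-type: image/png");
header('Content-Length: ' . strlen($imageData));
echo $imageData;
?>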
Save that as test.php. Then create a new file called showImage.php and put this code in it:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<img src="test.php">
</body>
</html>
Next, open showImage.php in your browser, and it will show a random image from the site you asked about.
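As for the "Cannot use object of type DOMNodeList as array" error from the question: a DOMNodeList can always be read with its item() method, which avoids the array syntax that the installed PHP version rejected. A small sketch, using made-up markup in place of the live page:

```php
<?php
// Made-up markup that mimics the structure the XPath query expects
$html = '<html><body><div class="fullImageLink"><a href="#">'
      . '<img src="https://example.org/sample.jpg"></a></div></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//div[contains(@class,"fullImageLink")]/a/img');
$first = $nodes->item(0); // null when nothing matched
$imageSrc = $first ? $first->getAttribute('src') : null;

if ($imageSrc !== null) {
    echo '<img src="' . htmlspecialchars($imageSrc) . '"><br>';
}
```

Checking for null before calling getAttribute also avoids a second fatal error when the page layout changes and the query matches nothing.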
I'm quite new to webservices and soap, and I followed a tutorial and came with this code:
SOAP Server :
<?php
include("lib/nusoap.php");
include("getDB.php");
function getUsers()
{
$user_id = $_GET['user_id'];
$result = mysql_query("SELECT * FROM -table name- WHERE user_id = '$user_id'");
$try = mysql_fetch_array($result);
return join(",", array(
$result['username'], $result['password']
));
}
$server = new soap_server();
$server->register("getUsers");
$server->service($HTTP_RAW_POST_DATA);
?>
SOAP Client :
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
<!-- Error Reporting -->
<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
?>
</head>
<body>
<?php
include("lib/nusoap.php");
$client = new nusoap_client("http://localhost/wp-content/themes/blackbird/phpwizard/HTML5Application/public_html/Webservice.php?user_id=4");
$error = $client->getError();
if ($error)
{
echo "<h2>Constructor error</h2><pre>" . $error . "</pre>";
}
$result = $client->call("getUsers", array("category" => "books"));
if ($client->fault)
{
echo "<h2>Fault</h2><pre>";
print_r($result);
echo "</pre>";
}
else
{
$error = $client->getError();
if ($error)
{
echo "<h2>Error</h2><pre>" . $error . "</pre>";
}
else
{
echo "<h2>Books</h2><pre>";
echo $result;
echo "</pre>";
}
}
?>
</body>
</html>
Now when loading the SOAP client I'm getting the error:
XML error parsing SOAP payload on line 3: Reserved XML Name
I have no idea why this is happening.
Try removing the whitespace before <?xml, as mentioned in this question:
XML error parsing SOAP payload: Reserved XML Name
Also, could you paste the dump of the NuSOAP client, as in that question, so we can see what's being rendered? Let's start the debugging there; respond with what you see for your example.
Additionally, here is a tutorial that I have used in the past. PHP by itself works well with SOAP, so give that a try before adding a separate library layer, unless you need something fancy from it. Try this example to see if it works for you.
IBM Opensource Php SoapServerClient example.
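To illustrate why the whitespace matters, here is a small sketch using SimpleXML rather than NuSOAP (so the error text differs from the one in the question). The payload is made up; the point is only that an XML declaration is rejected unless it is the very first thing in the document:

```php
<?php
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings

// Made-up payload standing in for the SOAP response
$payload = '<?xml version="1.0" encoding="UTF-8"?><response><user>demo</user></response>';

// Blank lines before the declaration, as stray output from an included file would produce
$broken = "\n\n" . $payload;

$parsedBroken = simplexml_load_string($broken);         // fails to parse
$parsedClean  = simplexml_load_string(ltrim($broken));  // parses once the whitespace is gone

var_dump($parsedBroken === false);
var_dump((string) $parsedClean->user);
```

In the server script, the usual suspects are a blank line before the opening PHP tag or after a closing one in lib/nusoap.php, getDB.php or the service file itself.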
I am trying to get the content of the title element that is contained in an echo statement of a PHP file.
I am using a PHP file for a website that returns only part of the page when accessed by an Ajax call, but the entire page when accessed directly.
That much is working fine. But I would like to change the title of the page when it is accessed via the Ajax call; the innerHTML of the title tag is what I'm trying to get.
if (empty($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
echo '
<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Products at Avrent</title>
<meta http-equiv="content-type" content="text/htmlcharset=utf-8" />
With a HTML file this code works.
<?php
if(isset($_GET['url'])) {
$url = $_GET['url'];
$html = file_get_html($url);
/* get page's title */
preg_match("/<title>(.+)<\/title>/siU", $html, $matches);
$title = $matches[1];
echo $title;
}
?>
But it returns gibberish when I try using it with a PHP file.
Can someone help me find a PHP script that will work on a PHP file?
Here's what I've gathered: you have a bunch of HTML pages. You have an index.php script that takes a URL, loads up the HTML from that URL, swaps out the title, then spits the HTML back out?
First of all, why do you have things set up like that? If you insist...
You (at the very least) should do this:
index.php
Remove the RegEx. You're using an HTML parser; use that!
<?php
if(isset($_GET['url'])) {
$url = $_GET['url'];
$html = file_get_html($url);
/* get page's title */
$title = $html->find('title', 0)->innertext;
echo $title;
}
?>
ajax_page.php
Set title from variable.
if (empty($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
echo '
<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>' . $page_title . '</title>
<meta http-equiv="content-type" content="text/htmlcharset=utf-8" />
Then, from index.php:
$page_title = "INSERT THE PAGE TITLE HERE";
require "ajax_page.php";
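A side note on the condition used above: empty(...) == 'xmlhttprequest' compares a boolean with a string, so it effectively only tests whether the header is missing. A more explicit check might look like this sketch; the function name isAjaxRequest is made up, and it assumes the client library sends the X-Requested-With header (jQuery does for same-origin requests):

```php
<?php
// Hypothetical helper; takes the server array as a parameter so it can be tried without a web server.
function isAjaxRequest(array $server): bool
{
    $header = $server['HTTP_X_REQUESTED_WITH'] ?? '';
    return strtolower($header) === 'xmlhttprequest';
}

if (!isAjaxRequest($_SERVER)) {
    // Direct visit: this is where the full page, including the <title>, would be echoed
}
```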
I am trying to capture the contents of my php page using output buffering:
<?php
function connect() {
$dbh = mysql_connect ("localhost", "user", "password") or die ('I cannot connect to the database because: ' . mysql_error());
mysql_select_db("PDS", $dbh);
return $dbh;
}
session_start();
if(isset($_SESSION['username'])){
if(isset($_POST['entryId'])){
//do something
$dbh = connect();
$ide = $_POST['entryId'];
$usertab = $_POST['usertable'];
$answertable = $usertab . "Answers";
$entrytable = $usertab . "Entries";
$query = mysql_query("SELECT e.date, q.questionNumber, q.question, q.sectionId, a.answer FROM $answertable a, Questions q, $entrytable e WHERE a.entryId = '$ide' AND a.questionId = q.questionId AND e.entryId = '$ide' ORDER BY q.questionNumber ASC;") or die("Error: " . mysql_error());
if($query){
//set variables
$sectionOne = array();
while($row=mysql_fetch_assoc($query)){
$date = $row['date'];
$sectionOne[] = $row;
}
}else{
//error - sql failed
}
}
?>
<?php
ob_start();
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<script src = "jQuery.js"></script>
<script>
$(document).ready(function(){
$("#export").click(function(e){
//post to html2pdfconverter.php
$("#link").val("<?php echo(ob_get_contents()); ?>"); //THIS DOESN'T WORK
$("#nm").val("Entry Report.pdf");
$("form#sendanswers").submit();
});
});
</script>
<title>Personal Diary System - Entry Report - <?php echo($date); ?></title>
</head>
<body>
<h1>Entry Report - <?php echo($date); ?></h1>
<div id = "buttons">
<form id = "sendanswers" name = "sendanswers" action="html2pdfconverter.php" method="post">
<input type = "hidden" name = "link" id = "link" value = "">
<input type = "hidden" name = "nm" id = "nm" value = "">
<input type = "button" name = "export" id = "export" value = "Export As PDF"/>
</form>
</div>
<h3>Biological Information</h3>
<?php
echo('<p>');
$i = 0;
foreach($sectionOne as &$value){
if($i == 1 || $i == 3){
$image = "assets/urine".$i.".png";
echo("<br/>");
echo($value['question']." <br/> "."<img src = \"$image\"/>");
echo("<br/>");
}else{
echo($value['question'].' : '.$value['answer']);
}
echo("<br/>");
$i++;
}
echo('</p>');
?>
</body>
</html>
<?php
}
$contents = ob_get_contents(); //THIS WORKS
ob_end();
?>
I assign the contents of ob to $contents using ob_get_contents(); This works, and echoing $contents duplicates the html page.
However, in my jQuery, I am trying to assign this to a hidden text field ('link') using:
$("#link").val("<?php echo($contents); ?>");
This doesn't work, however. I have a feeling it's because I am accessing $contents too early, but I'm not sure. Any ideas?
$("#link").val("<?php echo(ob_get_contents()); ?>"); //THIS DOESN'T WORK
At the point you make that ob_get_contents call, you've only output about 10 lines of JavaScript and HTML. PHP will NOT reach back in time and magically fill in the rest of the document where you do this ob_get_contents().
You're basically ripping the page out of the laser printer the moment the page starts emerging, while the printer is still printing the bottom half of the page.
I fail to see why you want to embed the contents of your page into an input field. If you want to somehow cache the page's content in an input field, you can just use JS to grab the .innerHTML of $('body').
Well, you have two problems.
The first is what you suspect. You can't access that stuff until later. The second problem which you may not realize is that you will have quoting issues in JavaScript even if you manage to find a way to reorder this and make it work. It's recursive, in a bad way.
What you should do instead is change your $('#export').click handler to do an Ajax call, render the HTML you need to appear in the link on the server in a separate PHP script (no output buffering necessary) and then have your code inject the result of that call into the page the way you're trying to do in your click handler now.
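Both problems can be seen in a small sketch: the markup is rendered completely into a buffer first, and only then turned into a JavaScript string literal with json_encode, which takes care of the quoting. The function renderReport and the sample row are made up and merely stand in for the report markup from the question:

```php
<?php
// Hypothetical stand-in for the report markup built in the question.
function renderReport(array $rows): string
{
    ob_start();
    echo '<h3>Biological Information</h3><p>';
    foreach ($rows as $row) {
        echo htmlspecialchars($row['question']), ' : ', htmlspecialchars($row['answer']), '<br/>';
    }
    echo '</p>';
    return ob_get_clean(); // returns the buffer and stops buffering in one call
}

// Made-up row containing quotes and a newline, the characters that break naive embedding
$report = renderReport([
    ['question' => 'Hours of "deep" sleep', 'answer' => "7\n(approx.)"],
]);

// json_encode() produces a valid JavaScript string literal with quotes and newlines escaped.
// JSON_HEX_TAG also escapes < and > so the value cannot close the surrounding script element.
$jsLiteral = json_encode($report, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script>$("#link").val(' . $jsLiteral . ');</script>';
```

Because the literal already carries its own quotes, it is placed inside val(...) without adding any. The same function could also back a separate PHP script that the click handler requests via Ajax, as suggested above.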