I'm trying to figure out how to take information from another site (with a different domain name) and place it in my PHP program.
Explanation:
The user inputs a URL from another site.
jQuery or PHP takes the information from the entered URL. I know where the information is (I know its div's ID).
That value is then stored in my PHP program as a variable, e.g. $kaina.
EX:
The user enters the URL: http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358
and I want to get the price (27,99).
What language should I use? PHP, jQuery, or something else?
What function should I use?
What should the program look like?
Thank you for your answers :)
I'd say you have to use PHP (cURL or file_get_contents) to download the page to your server, then parse it or use a regular expression to get the price. In this case it will be even trickier, though, because it looks like this link leads to a page that uses JavaScript.
Either way, you have to know the format of the data you are going to extract. PHP will do the job.
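A minimal sketch of that approach, assuming the price shows up in the static HTML (the regex below is a guess, not the site's real structure; if the price is injected by JavaScript, this will find nothing):
<?php
// Fetch the raw HTML (file_get_contents() needs allow_url_fopen; cURL works just as well).
$html = file_get_contents('http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358');

// Assumption: the price appears somewhere in the markup as something like "27,99".
if ($html !== false && preg_match('/(\d+[.,]\d{2})/', $html, $matches)) {
    $kaina = $matches[1];
    echo "Price: $kaina\n";
} else {
    echo "Price not found in the static HTML (it may be rendered by JavaScript).\n";
}
?>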
PHP's cURL library should do the trick for you: http://php.net/manual/en/book.curl.php
<?php
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>
You need to research each of the steps mentioned below.
One thing you can do is post the message entered by the user to the server, i.e. your PHP file, where you can extract the URL the user entered.
To extract the URL from the user's post, you can use a regex search.
Check this link out:
Extract URLs from text in PHP
Now you can cURL to the URL extracted from the user input:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $extracted_url);
$html = curl_exec($ch);
curl_close($ch);
The cURL output will contain the complete HTML of the page; you can then use an HTML parser
$DOM = new DOMDocument;
$DOM->loadHTML($html);
to parse until the required div is found and read its value.
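A rough sketch of that last step, continuing from the $DOM created above (the id product-price is made up; substitute the real div's id):
$div = $DOM->getElementById('product-price');   // placeholder id
if ($div !== null) {
    echo trim($div->textContent);                // the div's text content, e.g. the price
}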
I would probably do something like this (a rough sketch follows the steps):
get the contents of the page: $contents = file_get_contents("http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358");
convert the contents you just got to XML: $xml = new SimpleXMLElement($contents);
search the XML for the node with the attribute itemprop="price" using an XPath query
read the contents of that node, et voilà, you have your price
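A sketch of those steps. One caveat: real pages are rarely well-formed XML, so new SimpleXMLElement($contents) will often throw; the sketch runs the HTML through DOMDocument first, which is a slight change to step 2. The itemprop="price" selector comes straight from step 3, but whether the page actually uses that attribute is not something I have verified.
<?php
// Step 1: get the contents of the page.
$contents = file_get_contents('http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358');

// Step 2: load it into something you can query with XPath. Going through DOMDocument
// tolerates the broken markup that new SimpleXMLElement($contents) would reject.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($contents);
$xml = simplexml_import_dom($dom);

// Steps 3 and 4: find the node with itemprop="price" and read its contents.
$nodes = $xml->xpath('//*[@itemprop="price"]');
if (!empty($nodes)) {
    echo trim((string) $nodes[0]);
}
?>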
Related
I am building a PHP script that needs to use content loaded from an external webpage, but I don't know how to send/receive data.
The external webpage is http://packer.50x.eu/ .
Basically, I want to send a script (which is manually done in the first form) and receive the output (from the second form).
I want to learn how to do it because it can surely be a useful thing in the future, but I have no clue where to start.
Can anyone help me? Thanks.
You can use cURL to receive data from an external page. Look at this example:
$url = "http://packer.50x.eu/";
$ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // you can use some options for this request
// curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // or not to use them
// you can set many others options. read about them in manual
$data = curl_exec($ch);
curl_close($ch);
var_dump($data); // <-- here is the received page data
curl_setopt manual
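Since the page is driven by a form, you will probably need a POST rather than a plain GET to send your script. A rough sketch (the script field name is a guess; check the form's actual input names in your browser's dev tools):
$url = "http://packer.50x.eu/";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
// "script" is an assumed field name; inspect the form to find the real one.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'script' => 'your javascript source here',
)));
$data = curl_exec($ch);
curl_close($ch);
var_dump($data); // the packed output should be somewhere in this response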
Hope this helps.
You may want to look at file_get_contents($url), as it is very simple to use, simpler than cURL (though more limited). Your code could look like:
$url = "http://packer.50x.eu/";
$url_content=file_get_contents($url);
echo $url_content;
Look at the documentation as you could use offset and other tricks.
I'm using some simple PHP to scrape information from a website to allow reading it offline. The code seems to be working fine, but I am worried about undefined behaviour. The site is a bit poorly coded and some of the elements I'm grabbing share the same id with another element. I'd imagine that getElementById traverses the DOM from top to bottom, and the reason I'm not having an issue is that the element I need is the first instance with that id. Is there any way to ensure this behaviour? The element has no other real way of being distinguished, so selecting it by id seems to be the best option. I have included a stripped-back example of the code I'm using below.
Thanks.
<?php
$curl_referer = "http://example.com/";
$curl_url = "http://example.com/content.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Scraper/0.9');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_REFERER, "$curl_referer");
curl_setopt($ch, CURLOPT_URL, "$curl_url");
$output = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($output); // the @ silences warnings from malformed HTML
$content = $dom->getElementById('content');
echo $content->nodeValue;
?>
Try using an XPath expression to get the first element with that id.
Like this: //*[@id="content"][1]
The PHP code will look like this:
$xpath = new DOMXPath($dom);
echo $xpath->query('//*[@id="content"][1]')->item(0)->nodeValue;
And a tip: use libxml_use_internal_errors(true); you can collect the errors later for logging or try tidying up the document.
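A minimal sketch of that tip, using the $output variable from your code above:
libxml_use_internal_errors(true);   // suppress warnings from sloppy markup
$dom = new DOMDocument();
$dom->loadHTML($output);

// Collect the parse errors for logging, then clear them.
foreach (libxml_get_errors() as $error) {
    error_log(trim($error->message));
}
libxml_clear_errors();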
Edit
Hey, in your code you're setting the UA as "Scraper/0.9". Most people who run a badly written website don't look at that and don't log incoming requests, but I don't recommend a UA like that; just use a browser UA, like Chrome's user agent, because if they are monitoring and see requests containing this user agent, they may blacklist you in the future.
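For example (the UA string below is just an illustrative Chrome-style value):
// Present a browser-like user agent instead of "Scraper/0.9".
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36');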
Is it possible to get the information displayed at the page link given below using PHP? I want all the text content displayed on the page to be copied to a variable or to a file.
http://www.ncbi.nlm.nih.gov/nuccore/24655740?report=fasta&format=text
I have tried cURL too, but it didn't work, whereas cURL worked with a few other sites I know. But if there are solutions with cURL, do post them; I may not have tried every way in which cURL can be used.
Use cURL to get the page content and then parse it - extract the <pre> section.
$ch = curl_init();
// Full URL (endpoint plus query data) taken from the link in the update below.
curl_setopt($ch, CURLOPT_URL, 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
// show ALL the content
print $content;
$start_index = strpos($content, '<pre>')+5;
$end_index = strpos($content, '</pre>');
$your_text = substr($content, $start_index, $end_index-$start_index);
UPDATE
Using the link from @ovitinho's answer, it now works :)
You need to request the URL that the form uses to fetch this result via JavaScript.
I found this final URL:
http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000
Note that the 24655740 from your first link is used in this request.
You can use cURL.
I have just started a project that involves sending data using POST in HTML forms to another company's server. This returns XML. I need to process this XML to display certain information on a web page.
I am using PHP and have no idea where to start with how to access the XML. Once I know how to get at it, I know how to access it using XPath.
Any tips on how to get started, or links to sites with information on this, would be very useful.
You should check out the DOMDocument class; it comes as part of the standard PHP installation on most systems.
http://us3.php.net/manual/en/class.domdocument.php
Ohhh, I see. You should set up a PHP script that the user's form posts to. If you want to process the XML response, you should then pass those fields on to the remote server using cURL.
http://us.php.net/manual/en/book.curl.php
A simple example would be something like:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://the_remote_server");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // needed so curl_exec() returns the response instead of TRUE
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $_POST);
$YourXMLResponse = curl_exec($ch);
curl_close($ch);
?>
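From there you can hand the response to DOMDocument and query it with XPath, which you said you already know. A sketch, where //item is only a placeholder for whatever the real response contains:
<?php
$dom = new DOMDocument();
$dom->loadXML($YourXMLResponse);   // assumes the server really returned well-formed XML

$xpath = new DOMXPath($dom);
// "//item" is a placeholder expression; adjust it to the actual response structure.
foreach ($xpath->query('//item') as $node) {
    echo $node->nodeValue, "\n";
}
?>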
I was wondering how I could download a webpage in PHP for parsing.
You can use something like this
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
Since you will likely want to parse the page with DOM, you can load it directly with:
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com'); // use load() instead if the target is XML rather than HTML
provided your PHP has allow_url_fopen enabled.
But basically, any function that supports HTTP stream wrappers can be used to download a page.
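For instance, plain fopen() over the HTTP stream wrapper works too (again assuming allow_url_fopen is on):
$handle = fopen('http://www.example.com/', 'r');
$page = stream_get_contents($handle);
fclose($handle);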
With the curl library.
Just to add another option because it is there, while not the best: just use file(). It's another option that I don't see anyone has listed here.
$array = file("http://www.stackoverflow.com");
It's nice if you want the page as an array of lines, whereas the already-mentioned file_get_contents() will put it in a string.
Just another thing you can do.
Then you can loop through each line, if that matches your goal:
foreach ($array as $line) {
    echo $line;
    // do other stuff here
}
This comes in handy sometimes when certain APIs spit out plain text or html with a new entry on each line.
You can use this code
$url = 'your url';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
// you can do something with $data, like explode() or a preg_match regex, to get the exact information you need
//$data = strip_tags($data);
echo $data;
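As an illustration of that last comment, here is a regex pull of the page title; what you actually extract will depend on the page:
// Example only: grab the <title> tag from the downloaded HTML.
if (preg_match('/<title>(.*?)<\/title>/is', $data, $m)) {
    echo trim($m[1]);
}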