Hi, I am trying to identify key Twitter influencers for a client, and I have a list of 170 Twitter IDs that I need to learn more about.
I would like the script to loop through the list of Twitter IDs and save the output to a single XML file:
http://twitter.com/users/show/mattmuller.xml
http://twitter.com/users/show/welovecrowds.xml
http://twitter.com/users/show/jlyon.xml
etc
In essence I need to write a script that requests each URL and saves the combined output as a single XML file on the server. Any ideas on how to do this with PHP, and do I need to use cURL?
Thanks for any help.
Cheers
Jonathan
This is a simple example of how you could achieve this using cURL:
// array of twitter accounts
$ids = array('mattmuller', 'welovecrowds', 'jlyon' /* (...) */);
$ch = curl_init();
$url = 'http://twitter.com/users/show/';
// open a root element so the combined file stays well-formed XML
$xml = '<?xml version="1.0" encoding="utf-8"?><users>';
// make curl return the contents instead of outputting them
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach ($ids as $id) {
    // set the url based on the account id
    curl_setopt($ch, CURLOPT_URL, "$url$id.xml");
    // fetch the url contents and strip the xml declaration of each response
    $xml .= preg_replace('/<\?xml .*\?>/i', '', curl_exec($ch));
}
curl_close($ch);
$xml .= '</users>';
// save the contents of $xml into a file
file_put_contents('users.xml', $xml);
Related
I'm making a website where I'd like the user to be able to start typing in a band name (for example, "Rad") and have the Discogs API display the 10 most similar suggestions (for example, "Radical Face", "Radiohead", etc.). These suggestions could be sorted either alphabetically or, ideally, by popularity.
The problem is that I don't know how to make such a request to the Discogs API. Here's the code I'm working with now, which retrieves the content of http://api.discogs.com/releases/1 and parses it.
Any insight would be appreciated. Thank you.
<?php
$url = "http://api.discogs.com/releases/1"; // add the resource info to the url. Ex. releases/1
//initialize the session
$ch = curl_init();
//Set the User-Agent Identifier
curl_setopt($ch, CURLOPT_USERAGENT, 'SiteName/0.1 +http://your-site-here.com');
//Set the URL of the page or file to download.
curl_setopt($ch, CURLOPT_URL, $url);
//Ask cURL to return the contents in a variable instead of simply echoing them
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//Execute the curl session
$output = curl_exec($ch);
//close the session
curl_close ($ch);
function textParser($text, $css_block_name){
    $end_pattern = '], "';
    switch($css_block_name){
        # Add your pattern here to grab any specific block of text
        case 'description':
            $end_pattern = '", "';
            break;
    }
    # Name of the block to find
    $needle = "\"{$css_block_name}\":";
    # Find start position to grab text
    $start_position = stripos($text, $needle) + strlen($needle);
    $text_portion = substr($text, $start_position, stripos($text, $end_pattern, $start_position) - $start_position + 1);
    $text_portion = str_ireplace("[", "", $text_portion);
    $text_portion = str_ireplace("]", "", $text_portion);
    return $text_portion;
}
$blockStyle = textParser($output, 'styles');
echo $blockStyle. '<br/>';
$blockDescription = textParser($output, 'description');
echo $blockDescription. '<br/>';
?>
With the Discogs API you can easily execute a search. I think you have already viewed the documentation: https://www.discogs.com/developers/#page:database,header:database-search
There you can even specify that you only want to search for artists. When you retrieve the results you must either sort them alphabetically yourself or rely on the order of the results. As far as I can see from the documentation, that order is already some kind of popularity ranking by Discogs, and it is the same implementation as the site's integrated search.
You should keep in mind that the result set can be very large, so sorting alphabetically wouldn't be the best idea, as you would have to retrieve all result pages. You should, however, increase the per_page parameter to its maximum of 100 items per page.
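For reference, here is a minimal, non-authoritative sketch of such a search request with cURL. The endpoint and parameters follow the documentation linked above; the token placeholder and the User-Agent string are assumptions, since the search endpoint requires authentication:
// hypothetical search term typed by the user
$query = 'Rad';
// database search endpoint, artists only, maximum page size
$url = 'https://api.discogs.com/database/search?' . http_build_query(array(
    'q'        => $query,
    'type'     => 'artist',
    'per_page' => 100,
    'token'    => 'YOUR_DISCOGS_TOKEN', // placeholder; the search endpoint needs authentication
));
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'SiteName/0.1 +http://your-site-here.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
// the search endpoint returns JSON; each result carries a "title" field
$data = json_decode($response, true);
foreach (array_slice($data['results'], 0, 10) as $result) {
    echo $result['title'] . '<br/>';
}
The first ten "title" values could then be fed back to the autocomplete field on your site.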
Wasn't sure what to call this, so I will quickly elaborate.
I have a screen scraper I am trying to build, using the YQL console. The query provides the user with a choice of XML or JSON. I am targeting the YQL>data>html aspect of the console, and chose XML as my output format.
My YQL Query:
SELECT * FROM html WHERE url="http://google.com"
This will provide you with a readout of the Google.com document tree in XML. Too much output to paste into this post, so just click the link.
My problem comes with traversing the XML tree with PHP to properly display the output from this request. I don't know how to effectively create a foreach statement (or any other statement) to scrape the XML output, collect the document tree, and re-display it for my own needs.
My PHP:
$searchUrl = "google.com";
if(isset($_REQUEST['searchUrl'])) {
    $searchUrl = $_REQUEST['searchUrl'];
}
$query = "select * from html where url=\"http://".$searchUrl."\"";
$url = "http://query.yahooapis.com/v1/public/yql";
// Get Subcategory Article Data
$parameterData = "q=".urlencode($query);
$parameterData .= "&diagnostics=true";
// setup CURL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $parameterData);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
// send
$response = trim(urldecode(curl_exec($ch)));
// parse response
$xmlObjects = #simplexml_load_string($response);
foreach ($xmlObjects->diagnostics as $diagnostics) {
    echo "<a href=".$diagnostics->url." target='_blank'>".$diagnostics->url."</a>";
}
foreach ($xmlObjects->results as $result) {
    // here is where I would go echo $result->body or something along those lines
}
I suppose I am a bit stumped at this point because I don't know how to navigate an XML tree with this type of format. After query > results > body in the XML, I am unsure where to go to collect the remaining objects and output them into my document in a pre tag or something of that nature.
I would like to provide an input field for users to enter their own domain, and my PHP will submit the query, iterate over the response, and return the Document tree to the user for HTML viewing and debugging.
I am familiar with PHP and XML in the context of iterating a large number of parent elements with the same internal structure like an RSS feed or something of that nature. In this case I am dealing with a dynamic XML tree, with one large response object, and a fluctuating internal structure.
The following code will display the result body as html page:
<?php
// ... the code you posted in the question
// !without the diagnostics output!
// read comments of the answer to know why
?>
<html>
<head>
</head>
<?php
foreach ($xmlObjects->results as $result) {
    // asXml() will return the content of body as an xml string
    echo $result->body->asXml();
    break;
}
?>
</html>
Note that as you won't get the <head> element of the page via YQL the output will in most cases look messy.
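If the goal is to inspect the markup for debugging rather than render it, a small variation of the loop above escapes the XML and wraps it in a <pre> tag, as the question mentions:
foreach ($xmlObjects->results as $result) {
    echo '<pre>' . htmlspecialchars($result->body->asXml()) . '</pre>';
    break;
}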
Is it bad practice or will it be slower if I use curl within a foreach loop?
I'm planning on having an autocomplete input field, and the query in the input would be sent to an API call.
I'm getting an id from a certain link (ie: http://api.linke1.com/names)
foreach($json as $j){
    $id = $j->id; //from http://api.linke1.com/names
    $url = "https://api.site/{$id}/photos";

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $output = curl_exec($ch);
    curl_close($ch);

    $jsonDecode = json_decode($output);
    $results = $jsonDecode->results;
    foreach($results as $result)
    {
        $photoURL = $result->photo->url; //from https://api.site/{$id}/photos
    }
}
So every time I type in a name, it goes into the foreach, searching for an id from http://api.linke1.com/names, and then it looks for the photo URL from the other link. I want to output an array, so eventually I'll have a list of data showing information such as name, photo, etc.
Will this slow down dramatically because every letter typed into the input field runs through this foreach loop? Would there be an easier way?
Thanks!
Initialize cURL, and set the options that don't change, before the loop, and close the handle afterwards.
That will speed things up a little bit.
You can also use curl_multi_*, which can fetch several URLs in parallel.
http://se2.php.net/manual/en/ref.curl.php
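For reference, a rough curl_multi_* sketch along the lines of the code in the question (the id list and the api.site URL are placeholders taken from it, so treat this as a sketch rather than a drop-in implementation):
// hypothetical: ids already fetched from http://api.linke1.com/names
$ids = array('id1', 'id2', 'id3');

$mh = curl_multi_init();
$handles = array();

foreach ($ids as $id) {
    $ch = curl_init("https://api.site/{$id}/photos");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_multi_add_handle($mh, $ch);
    $handles[$id] = $ch;
}

// run all requests in parallel
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

// collect the responses
$photoURLs = array();
foreach ($handles as $id => $ch) {
    $decoded = json_decode(curl_multi_getcontent($ch));
    foreach ($decoded->results as $result) {
        $photoURLs[$id][] = $result->photo->url;
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);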
I need to know if it is possible to load an HTML page and submit the form inside this page using PHP. So something like:
<?php
$html = fopen("http://www.mysite.com","r");
//get position of form...
...
//submit it
?>
Is this possible? Can someone help me? Thanks!!!
EDIT:
I have to submit this form:
https://annunci.ebay.it/pubblica-annuncio
My problem is that this page contains an image upload and I don't know how to handle that using PHP (scraping it).
You can also use curl to POST to any URL, for instance the form's action url.
$ch = curl_init('http://example.com/form_action.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('your' => 'data', 'goes' => 'here'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
This will call the URL http://example.com/form_action.php as if it were called from a form with 'your' set to the value 'data' and 'goes' set to the value 'here'.
To find out the URL you need to POST to, you can inspect the source code. When doing that, check the "name" attribute on the <input> tags you want to send.
EDIT: If the POST URL and the fields can change, you should check #Adan's answer.
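Since the form in the question includes an image upload, a file can also be attached to such a POST by putting a CURLFile object into the fields array. This is only a hedged sketch; the 'photo' field name and the file path are placeholders, not taken from the actual eBay form:
$ch = curl_init('http://example.com/form_action.php');
curl_setopt($ch, CURLOPT_POST, true);
// passing an array (rather than a query string) makes cURL send multipart/form-data
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'your'  => 'data',
    // 'photo' is a placeholder for whatever the file input is named
    'photo' => new CURLFile('/path/to/image.jpg', 'image/jpeg', 'image.jpg'),
));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);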
Basically this is what you need to do:
1- Get the content of the HTML page using file_get_contents() (bearing in mind the security risks).
2- Parse the HTML using DOMDocument.
3- Get the form's attributes, most importantly ACTION and METHOD, using DOMDocument.
4- Get the form's fields' names using DOMDocument.
5- Send a request to the ACTION URL, using the METHOD method, with the data you want in place of the fields, using cURL (a rough sketch follows below).
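A rough, non-authoritative sketch of the five steps, assuming a simple form with only text inputs (the URL and the field names are placeholders):
// 1 - get the page (placeholder URL)
$html = file_get_contents('http://www.example.com/page-with-form');

// 2 - parse it; the @ suppresses warnings caused by real-world markup
$doc = new DOMDocument();
@$doc->loadHTML($html);

// 3 - read the first form's ACTION and METHOD
$form   = $doc->getElementsByTagName('form')->item(0);
$action = $form->getAttribute('action');
$method = strtoupper($form->getAttribute('method')) ?: 'GET';

// 4 - collect the field names (and any preset values)
$fields = array();
foreach ($form->getElementsByTagName('input') as $input) {
    $fields[$input->getAttribute('name')] = $input->getAttribute('value');
}
$fields['title'] = 'my value'; // placeholder: overwrite the fields you want to submit

// 5 - send the request with cURL
$ch = curl_init();
if ($method === 'POST') {
    curl_setopt($ch, CURLOPT_URL, $action);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
} else {
    curl_setopt($ch, CURLOPT_URL, $action . '?' . http_build_query($fields));
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);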
You can use cURL for getting the page in PHP, as mentioned in #Lumbendil's answer. For parsing the HTML you can use libraries like
http://simplehtmldom.sourceforge.net/
Or you can use
http://code.google.com/p/phpquery/
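If you go the simplehtmldom route, the usage looks roughly like this (a sketch based on the library's documented helpers; the URL is a placeholder):
// include the library downloaded from the first link above
include 'simple_html_dom.php';

// load the remote page into a DOM object
$html = file_get_html('http://www.example.com');

// grab the first form on the page and read its action attribute
$form = $html->find('form', 0);
echo $form->action;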
As another option, which would be cleaner, you could use the eBay API. It provides methods to add new items, and it probably already has libraries built for PHP, such as the PHP Accelerator toolkit for eBay.
I am providing code that I got from the net to get the contents of a page. After that you could (maybe) use jQuery to force the submit.
$url = "URL OF YOUR PAGE"; // I have tested page from same server
$lines = file( $url );
foreach( $lines as $line_num => $line ) {
$line = htmlspecialchars( $line );
$line = str_replace( "<", '<span><', $line );
$line = str_replace( ">", '></span>', $line );
$line = str_replace( "<!–", '<em><!–', $line );
$line = str_replace( "–>", '–></em>', $line );
echo "<span class=\"linenumber\">Line <strong>$line_num </strong></span> : " . $line . "<br/>\n";
}
The above code gave me the contents of another page on the same server. Now you have to find a way to check whether a form exists and then force-submit that form.
How can I query a particular website with some fields and get the results to my webpage using PHP?
Let's say website xyz.com will give you the name of the city if you give them the zipcode. How can I achieve this easily in PHP? Any code snapshot would be great.
If I understand what you mean (You want to submit a query to a site and get the result back for processing and such?), you can use cURL.
Here is an example:
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
?>
You can grab the Lat/Long from this site with some regexp like this:
if ( preg_match_all( "#<td>\s+-?(\d+\.\d+)\s+</td>#", $output, $coords ) ) {
    list( $lat, $long ) = $coords[1];
    echo "Latitude: $lat\nLongitude: $long\n";
}
Just put that after the curl_close() function.
That will return something like this (numbers changed):
Latitude: 53.5100
Longitude: 60.2200
You can use file_get_contents (and other similar fopen-class functions) to do this:
$result = file_get_contents("http://other-site.com/query?variable=value");
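If the value comes from user input, it may be worth building the query string with http_build_query first (a small sketch; the parameter name is a placeholder):
// escape user input before putting it into the query string
$query  = http_build_query(array('zipcode' => $_GET['zipcode']));
$result = file_get_contents("http://other-site.com/query?" . $query);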
Do you mean something like:
include 'http://www.google.com?q=myquery'; ? or which fields do you want to get?
Can you be a bit more specific, please? :)
If you want to import the HTML into your page and analyze it, you probably want to use cURL.
You have to have the extension loaded (it's usually part of PHP; I think it has to be compiled in? The manual can answer that).
Here is a cURL function. Set up your URL like this:
$param = 'fribby';
$param2 = 'snips';
$url = "http://www.example.com?data=$param&data2=$param2";

function curl_page($url)
{
    $response = false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

$page_data = curl_page($url);
Then, you can get data out of the page using DOM parsing or grep/sed/awk-type string handling.
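For the DOM route, here is a minimal sketch using DOMDocument and DOMXPath on the fetched page (the XPath query is only a placeholder):
// $page_data comes from curl_page($url) above
$doc = new DOMDocument();
@$doc->loadHTML($page_data); // the @ suppresses warnings from sloppy markup

$xpath = new DOMXPath($doc);
// e.g. grab the text of every <h2> on the page (placeholder query)
foreach ($xpath->query('//h2') as $node) {
    echo trim($node->textContent) . "\n";
}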