How can I query a particular website with some fields and get the results onto my webpage using PHP?
Let's say website xyz.com will give you the name of the city if you give it the zipcode. How can I achieve this easily in PHP? Any code snippet would be great.
If I understand what you mean (You want to submit a query to a site and get the result back for processing and such?), you can use cURL.
Here is an example:
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
?>
You can grab the Lat/Long from this site with a regexp like this:
if (preg_match_all("#<td>\s+-?(\d+\.\d+)\s+</td>#", $output, $coords)) {
    list($lat, $long) = $coords[1];
    echo "Latitude: $lat\nLongitude: $long\n";
}
Just put that after the curl_close() call.
That will return something like this (numbers changed):
Latitude: 53.5100
Longitude: 60.2200
You can use file_get_contents() (and other similar fopen-style functions) to do this:
$result = file_get_contents("http://other-site.com/query?variable=value");
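For the zipcode example in the question, a minimal sketch might look like this (the endpoint, parameter name and markup are made up; substitute whatever xyz.com actually uses):
$zip = '90210';
// Hypothetical endpoint and parameter name; replace with the real ones.
$url = 'http://xyz.com/citylookup?' . http_build_query(array('zipcode' => $zip));
$response = file_get_contents($url);
if ($response === false) {
    die('Request failed');
}
// If the site returns HTML, pull the city out with a regex or DOM parsing.
// The class name here is a placeholder.
if (preg_match('#<span class="city">([^<]+)</span>#i', $response, $m)) {
    echo 'City: ' . $m[1];
}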
Do you mean something like:
include 'http://www.google.com?q=myquery'; ? Or which fields do you want to get?
Can you be a bit more specific, please? :)
If you want to import the HTML into your page and analyze it, you probably want to use cURL.
You have to have the extension loaded for your page (it's usually part of PHP; I think it has to be compiled in, but the manual can answer that).
Here is a cURL function. Set up your URL like this:
$param = 'fribby';
$param2 = 'snips';
$url = "http://www.example.com?data=$param&data2=$param2";
function curl_page($url)
{
    $response = false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
$page_data = curl_page($url);
Then you can get data out of the page using DOM parsing or regular-expression matching.
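For example, a rough sketch of DOM parsing on the fetched page (the table id is a placeholder, not something the target site is known to use):
$dom = new DOMDocument();
@$dom->loadHTML($page_data); // @ suppresses warnings about sloppy real-world HTML
$xpath = new DOMXPath($dom);
// Grab the text of every <td> inside a table with id="results" (placeholder id)
foreach ($xpath->query('//table[@id="results"]//td') as $cell) {
    echo trim($cell->textContent) . "\n";
}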
I have a search index on my Cloudant database that I query using AngularJS and PHP.
So far I'm not getting specific enough results.
For instance, on a search with fair:'Fair 2017', I'm getting all the results that include Fair, including Fair 2016 and so on.
I've tried different search types (simple, standard, classic), and it happens with all of them.
A typical object:
doc: Object
    exhibitortype: "Project Space"
    fair: "Fair 2017"
    ...
Here's my AngularJS code:
$scope.loadexhibitors = function(fair) {
    $scope.searchindex = fair.doc.fairname;
    var $promisefairexh = $http({
        url: 'databaseconnect/getexhibitors.php',
        method: "GET",
        params: {search: $scope.searchindex}
    });
    ...
The PHP bit looks like this:
<?php
$search = $_GET["search"];
$newsearch = str_replace(' ', '+', $search);
$url = "https://user:pass.#user.cloudant.com/db/_design/fairs/_search/by_fair?q='$newsearch'&include_docs=true";
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
?>
And my Cloudant search function:
function (doc) {
index("default", doc.fair);
}
On the other hand, in the Cloudant user interface, when I test the search index and include double quotes in the search input (for example: "Fair 2016" instead of Fair 2016), I get the desired results.
Any tips?
Try using double quotes in your search instead of single quotes, for example:
$url = "https://user:pass.#user.cloudant.com/db/_design/fairs/_search/by_fair?q=\"$newsearch\"&include_docs=true";
Note the change to the q param:
q=\"$newsearch\"
I'm making a website where I'd like the user to be able to start typing a band name (for example, "Rad") and have the Discogs API display the 10 most similar suggestions (for example, "Radical Face", "Radiohead", etc.). These suggestions could be sorted either alphabetically or, ideally, by popularity.
The problem is that I don't know how to make such a request to the Discogs API. Here's the code I'm working with now, which retrieves the content of http://api.discogs.com/releases/1 and parses it.
Any insight would be appreciated. Thank you.
<?php
$url = "http://api.discogs.com/releases/1"; // add the resource info to the url. Ex. releases/1
//initialize the session
$ch = curl_init();
//Set the User-Agent Identifier
curl_setopt($ch, CURLOPT_USERAGENT, 'SiteName/0.1 +http://your-site-here.com');
//Set the URL of the page or file to download.
curl_setopt($ch, CURLOPT_URL, $url);
//Ask cURL to return the contents in a variable instead of simply echoing them
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//Execute the curl session
$output = curl_exec($ch);
//close the session
curl_close ($ch);
function textParser($text, $css_block_name) {
    $end_pattern = '], "';
    switch ($css_block_name) {
        # Add your pattern here to grab any specific block of text
        case 'description':
            $end_pattern = '", "';
            break;
    }
    # Name of the block to find
    $needle = "\"{$css_block_name}\":";
    # Find start position to grab text
    $start_position = stripos($text, $needle) + strlen($needle);
    $text_portion = substr($text, $start_position, stripos($text, $end_pattern, $start_position) - $start_position + 1);
    $text_portion = str_ireplace("[", "", $text_portion);
    $text_portion = str_ireplace("]", "", $text_portion);
    return $text_portion;
}
$blockStyle = textParser($output, 'styles');
echo $blockStyle. '<br/>';
$blockDescription = textParser($output, 'description');
echo $blockDescription. '<br/>';
?>
With the Discogs API you can easily execute a search. I think you have already viewed the documentation: https://www.discogs.com/developers/#page:database,header:database-search
There you can even specify that you only want to search for artists. When you retrieve the results you must either sort them alphabetically yourself or rely on the order of the results. As far as I can see from the documentation, that order is already some kind of popularity ranking by Discogs, and it is the same implementation as the search integrated into the website.
You should keep in mind that the result set can be very large, so sorting alphabetically wouldn't be the best idea, as you would have to retrieve all result pages. If you do, you should increase the per_page parameter to its maximum of 100 items per page.
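As a rough sketch of what such a call could look like (the token is a placeholder; /database/search requires authentication, as described in the documentation):
$query = urlencode('Rad');
$url = "https://api.discogs.com/database/search?q={$query}&type=artist&per_page=10&token=YOUR_TOKEN";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'SiteName/0.1 +http://your-site-here.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$json = curl_exec($ch);
curl_close($ch);

$data = json_decode($json);
foreach ($data->results as $artist) {
    echo $artist->title . '<br/>';   // e.g. "Radiohead", "Radical Face", ...
}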
Hi, I'm attempting to crawl Google search results, just for my own learning, but also to see if I can speed up access to direct URLs (I'm aware of their API, but I just thought I'd try this for now).
It was working fine but it seems to have stopped; it simply returns nothing now. I'm unsure if it's something I did, but I can say that I had this in a for loop to allow the start parameter to increase, and I'm wondering whether that may have caused problems.
Is it possible Google can block an IP from crawling?
Thanks.
$url = "https://www.google.ie/search?q=adrian+de+cleir&start=1&ie=utf-8&oe=utf-8&rls=org.mozilla:en-US:official&client=firefox-a&channel=fflb&gws_rd=cr&ei=D730U7KgGfDT7AbNpoBY#channel=fflb&q=adrian+de+cleir&rls=org.mozilla:en-US:official";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);
# Create a DOM parser object
$dom = new DOMDocument();
# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($html);
# Iterate over all the <h3> tags
foreach ($dom->getElementsByTagName('h3') as $link) {
    $actual_link = $link->getElementsByTagName('a');
    foreach ($actual_link as $single_link) {
        # Show the <a href>
        echo '<pre>';
        print_r($single_link->getAttribute('href'));
        echo '</pre>';
    }
}
Given below is a program I have written in Python. It is not complete yet; right now it only gets the first page and prints all the href links found in the results.
We can use a set to remove redundant links from the result set.
import requests
from bs4 import BeautifulSoup

def search_spider(max_pages, search_string):
    page = 0
    search_string = search_string.replace(' ', '+')
    while page <= max_pages:
        url = ('https://www.google.com/search?num=10000&q=' + search_string +
               '#q=' + search_string + '&start=' + str(page))
        print("URL to search - " + url)
        source_code = requests.get(url)
        count = 1
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a", {"class": ""}):
            href = link.get('href')
            input_string = slice_string(href)
            print(input_string)
            count += 1
        page += 10

def slice_string(input_string):
    # Remove the leading "/url?q=" prefix (lstrip would strip characters, not a prefix)
    if input_string.startswith("/url?q="):
        input_string = input_string[len("/url?q="):]
    index_c = input_string.find('&')
    if index_c != -1:
        input_string = input_string[:index_c]
    return input_string

search_spider(1, "bangalore cabs")
This program will search Google for "bangalore cabs".
Thanks,
Karan
You can check whether Google has blocked you with the following simple curl command:
curl -sSLA Mozilla "http://www.google.com/search?q=linux" | html2text -width 80
You may need to install html2text to convert the HTML into plain text.
Normally you should use the Custom Search API provided by Google to avoid such limitations; it lets you retrieve search results more easily and in different formats (such as XML or JSON).
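A minimal sketch of calling the Custom Search JSON API from PHP (the key and cx values are placeholders obtained from the Google API console):
$query = urlencode('adrian de cleir');
$url = "https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_SEARCH_ENGINE_ID&q={$query}";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$json = curl_exec($ch);
curl_close($ch);

$results = json_decode($json);
foreach ($results->items as $item) {
    echo $item->link . "\n";   // the direct URL of each result
}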
Is it bad practice or will it be slower if I use curl within a foreach loop?
I'm planning on having an autocomplete input field, and the query in the input would be sent to an API call.
I'm getting an id from a certain link (i.e. http://api.linke1.com/names):
foreach ($json as $j) {
    $id = $j->id; // from http://api.linke1.com/names
    $url = "https://api.site/{$id}/photos";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $output = curl_exec($ch);
    curl_close($ch);
    $jsonDecode = json_decode($output);
    $results = $jsonDecode->results;
    foreach ($results as $result) {
        $photoURL = $result->photo->url; // from https://api.site/{$id}/photos
    }
}
So every time I type in a name, it will go into the foreach loop, searching for an id from http://api.linke1.com/names, and then it will look up the photo URL from the other link. I want to output an array, so eventually I'll have a list of data showing information such as name, photo, etc.
Will this slow down dramatically because each letter typed in the input field runs through this foreach loop? Would there be an easier way?
Thanks!
Initialize cURL, and set the options that don't change, before the loop, and close the handle afterwards.
That will speed things up a little.
You can also use curl_multi_*, which can fetch several URLs in parallel.
http://se2.php.net/manual/en/ref.curl.php
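A rough sketch of reusing a single handle across the loop, based on the code in the question:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

foreach ($json as $j) {
    curl_setopt($ch, CURLOPT_URL, "https://api.site/{$j->id}/photos"); // only the URL changes per iteration
    $output = curl_exec($ch);
    $results = json_decode($output)->results;
    foreach ($results as $result) {
        $photoURL = $result->photo->url;
    }
}
curl_close($ch); // close once, after the loop
For the autocomplete use case it also helps to debounce the requests on the client side, so the loop isn't triggered on every single keystroke.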
I've written a PHP function to get the +1 count for a URL:
function makeApiCall($destinationUrl, $stringOfParams) {
    $curl = curl_init();
    echo $destinationUrl . $stringOfParams . "<br>";
    curl_setopt($curl, CURLOPT_URL, $destinationUrl . $stringOfParams);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($curl);
    curl_close($curl);
    echo $result;
}
When I pass https://plusone.google.com/u/0/_/+1/fastbutton as the destination URL along with the correct string of params, what I receive in $result is HTML.
The problem is that I would like to get the count using PHP, not JavaScript.
How can I do that?
You can achieve this using preg_match.
Assuming you are calling a URL like this:
https://plusone.google.com/_/+1/fastbutton?bsv=pr&url=http://www.google.com
You are looking for:
<div id="aggregateCount" class="t1">118k</div>
or
<div id="aggregateCount" class="t1">12</div>
So you can perform:
preg_match('/\<div id=\"aggregateCount\" class=\"t1\"\>\>?([0-9]*k?)\<\/div\>/i', $result, $matches);
And $matches will be:
Array
(
[0] => <div id="aggregateCount" class="t1">118k</div>
[1] => 118k
)
Edit:
After running the example, it seems that Google returns a different number when using cURL; for example, for http://www.google.com it returns:
<div id="aggregateCount" class="t1">>9999</div>
So I've updated the regex to handle the >.
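Putting the pieces together, a sketch of a helper that returns just the count (the function name is made up; the regex is the one above):
function getPlusOnesCount($pageUrl) {
    $curl = curl_init('https://plusone.google.com/_/+1/fastbutton?bsv=pr&url=' . urlencode($pageUrl));
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($curl);
    curl_close($curl);

    if (preg_match('/\<div id=\"aggregateCount\" class=\"t1\"\>\>?([0-9]*k?)\<\/div\>/i', $result, $matches)) {
        return $matches[1];   // e.g. "118k", or "9999" when Google sends ">9999"
    }
    return null;              // request failed or the markup changed
}

echo getPlusOnesCount('http://www.google.com');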