Web Scraping with PHP Goutte

I want to get every item name and price from this website.
For example, I want to search for "apple":
https://redmart.com/search/apple
I use Goutte to scrape the website. This is my code so far to get every item name in the list:
$client = new Client();
$crawler = $client->request('GET', 'https://redmart.com/search/apple');
$crawler->filter('h4 > a')->each(function ($node) {
print $node->text()."\n";
});
But when I run the code, it prints nothing. How can I get every item's name and price from the list?

The redmart.com website uses React to generate its content in the browser. Goutte does not execute JavaScript, so the HTML it downloads does not contain the product list, and a plain HTML scraper like Goutte cannot see it. Instead, open the developer console in Firefox or Google Chrome and watch the network requests the page makes.
In this case, the page requests a URL (via AJAX) that returns JSON, which React then renders: https://api.redmart.com/v1.6.0/catalog/search?q=apple&pageSize=18&sort=1024&variation=BETA
With PHP, you just call json_decode on the response and you have everything you need.
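As a rough sketch of that json_decode step, the snippet below decodes a small hard-coded sample instead of calling the API. The keys "products", "title" and "price" are assumptions made up for illustration; dump the real response with var_dump() first and adjust the keys to whatever it actually contains.

```php
<?php
// Hypothetical sample payload: the real API's field names may differ.
$json = '{"products":[{"title":"Fuji Apple","price":1.5},{"title":"Gala Apple","price":2.0}]}';

// Decode into associative arrays (second argument true).
$data = json_decode($json, true);
if (!is_array($data)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Collect name and price for every product, tolerating missing keys.
$items = [];
foreach ($data['products'] ?? [] as $product) {
    $items[] = [
        'name'  => $product['title'] ?? '',
        'price' => $product['price'] ?? null,
    ];
}

foreach ($items as $item) {
    echo $item['name'], ': ', $item['price'], "\n";
}
```

To run it against the live endpoint, the sample string would be replaced by the response body fetched from the URL above.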

There is no need to scrape the site; you can request the website's REST API and use the JSON output. For example, this is the API URL for the apple listing:
https://api.redmart.com/v1.6.0/catalog/search?q=apple&pageSize=18&sort=1024&page=1&variation=BETA


What is correct Twilio AddOns syntax for php REST client?

Beginner to both Twilio & php here:
I have: Twilio php helper, Twilio account, Whitepages Pro AddOn enabled for Lookups and have successfully retrieved "basic" lookup data, ie, "Carrier->Type" (the "basic" lookup does not use the AddOn)
I need: to use Twilio Rest Client with Whitepages Pro AddOn to retrieve other data, ie, "standard_address_line1", for an individual phone number. I do not want the $0.07 per call AddOn enabled for all incoming calls, although I was able to receive this data from the AddOn that way.
Twilio API Documentation is scant. Shows output format, but not REST Client request syntax: WhitePagesPro AddOn Documentation
Here is what I tried:
<?php
require 'vendor/autoload.php';
use Twilio\Rest\Client;
$client = new Client("ACxxxxxxxxxx", "Tokenxxxxxxxx");
$number = $client->lookups
->phoneNumbers("+1xxxxxxxxxx")
->fetch(
array("AddOns" => "whitepages_pro_caller_id")
);
echo $number->
results->
whitepages_pro_caller_id->
result->
results[0]->
associated_locations[0]->
standard_address_line1;
//This syntax works for 'basic' lookup
//Returns: "landline"
//
//$number = $client->lookups
// ->phoneNumbers("+1xxxxxxxxxx")
// ->fetch(
// array("type" => "carrier")
// );
//
//echo $number->carrier['type'];
?>
Throws error: Uncaught exception 'Twilio\Exceptions\TwilioException' with message 'Unknown property: results'
I'm way over my head, I don't know how to go about debugging this. Any Twilio experts?
Ideally I'd also like to know if it's possible to specify this particular data in the request vs traversing many levels of the response in order to get the data I need...
Twilio developer evangelist here.
You're very close with what you have there. The add-on results are returned in the addOns property of the number you fetched. So, using your code, you can print the add-on's request SID like this:
<?php
require 'vendor/autoload.php';
use Twilio\Rest\Client;
$client = new Client("ACxxxxxxxxxx", "Tokenxxxxxxxx");
$number = $client->lookups
->phoneNumbers("+1xxxxxxxxxx")
->fetch(
array("AddOns" => "whitepages_pro_caller_id")
);
echo $number->addOns['results']['whitepages_pro_caller_id']['request_sid'];
If you want to inspect the entire result, you can use var_dump to see the whole structure:
var_dump($number->addOns['results']['whitepages_pro_caller_id']);
The structure will appear as it does in the documentation but it might be easier to see in the PHP output.
Let me know if that helps at all.
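To reach a deeply nested field such as standard_address_line1, something along these lines might work. This is only a sketch: the $addOns array is a made-up stand-in for $number->addOns, and its nesting simply mirrors the path used in the question, so check it against your own var_dump output.

```php
<?php
// Hypothetical stand-in for $number->addOns; values are invented and the
// nesting follows the property path from the question.
$addOns = [
    'results' => [
        'whitepages_pro_caller_id' => [
            'request_sid' => 'XRxxxxxxxx',
            'result' => [
                'results' => [
                    [
                        'associated_locations' => [
                            ['standard_address_line1' => '123 Example St'],
                        ],
                    ],
                ],
            ],
        ],
    ],
];

// The null coalescing operator yields null instead of a notice
// when any level of the path is missing.
$addressLine = $addOns['results']['whitepages_pro_caller_id']['result']
    ['results'][0]['associated_locations'][0]['standard_address_line1'] ?? null;

echo $addressLine ?? 'address not present', "\n";
```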

How to access YouTube JSON data from the YouTube Google API?

How can I access things like the URL of the banner image, the channel title, the subscriber count, and the default logo image URL from Google's API for YouTube?
An example of the JSON api can be found here.
How can I access this content using PHP?
Proceed this way:
$youtube = file_get_contents("https://www.googleapis.com/youtube/v3/channels?part=snippet,brandingSettings&id=UCyoUx3RguJRgbaMo07yc_KA&key=AIzaSyCZonTWlCv92Nd93j5CuFFcqGciLIe5rx4");
$data = json_decode($youtube,true);
echo "BANNER IMAGE URL: ".$data['items'][0]['brandingSettings']['image']['bannerImageUrl']."<br>";
echo "CHANNEL TITLE: ".$data['items'][0]['brandingSettings']['channel']['title']."<br>";
and so on.
This tool is very good for viewing the structure of a JSON document and extracting what you need.
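The subscriber count and default logo from the question can be read the same way. The sketch below assumes the request also asks for part=statistics (the URL above requests only snippet and brandingSettings), and it uses a trimmed, made-up sample payload rather than a live response.

```php
<?php
// Made-up sample in the shape of a channels response with
// part=snippet,statistics,brandingSettings; values are placeholders.
$json = <<<'JSON'
{"items":[{
  "snippet":{"title":"Sample Channel",
             "thumbnails":{"default":{"url":"https://example.com/logo.jpg"}}},
  "statistics":{"subscriberCount":"1234"},
  "brandingSettings":{"image":{"bannerImageUrl":"https://example.com/banner.jpg"}}
}]}
JSON;

$data = json_decode($json, true);
$channel = $data['items'][0] ?? [];

// Each lookup falls back to null if the part was not requested.
$logoUrl     = $channel['snippet']['thumbnails']['default']['url'] ?? null;
$subscribers = $channel['statistics']['subscriberCount'] ?? null;
$bannerUrl   = $channel['brandingSettings']['image']['bannerImageUrl'] ?? null;

echo "DEFAULT LOGO URL: ", $logoUrl, "\n";
echo "SUBSCRIBER COUNT: ", $subscribers, "\n";
```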
This simple snippet should do the trick.
$myData = json_decode(file_get_contents("https://www.googleapis.com/youtube/v3/channels?part=snippet,brandingSettings&id=UCyoUx3RguJRgbaMo07yc_KA&key=AIzaSyCZonTWlCv92Nd93j5CuFFcqGciLIe5rx4"));
var_dump($myData);
I would really recommend using cURL instead of file_get_contents() for performance reasons; however, that should get you started.

How to get a Youtube channel RSS feed after 2015 April 20 (without v3 API)?

Now that API v2 is gone, what would be a way to get a simple RSS feed of a channel without the v3 API? I'm open to Yahoo Pipes or any workaround that is simpler than creating an application for the v3 API if the target is a feed reader. I only need an RSS feed. It has been publicly available until now and could stop working at any minute (I think). So why not keep allowing access to it without an API key?
In the RSS Reader section of https://support.google.com/youtube/answer/6098135?hl=en there is an option to export your subscriptions to an OPML file. Looking at the contents of the OPML file, you can extract the feeds; each feed URL has this structure:
https://www.youtube.com/feeds/videos.xml?channel_id=XXXX
So you could generate new feeds from this structure if you know the channel id. Feeds of this kind do not get the "https://youtube.com/devicesupport" error, so I expect they are going to keep working.
You can get the feeds like this:
https://www.youtube.com/feeds/videos.xml?channel_id=CHANNELID
https://www.youtube.com/feeds/videos.xml?user=USERNAME
https://www.youtube.com/feeds/videos.xml?playlist_id=YOUR_YOUTUBE_PLAYLIST_NUMBER
But the JSON format, which used to be supported (with the additional parameter &alt=JSON), is no longer supported.
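If you build these feed URLs in code, a small helper along these lines may be convenient. The function name youtubeFeedUrl is made up for this sketch; it only assembles the three URL forms listed above and does not contact YouTube.

```php
<?php
// Hypothetical helper: builds one of the three feed URL forms shown above.
function youtubeFeedUrl(string $type, string $value): string
{
    $allowed = ['channel_id', 'user', 'playlist_id'];
    if (!in_array($type, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported feed type: $type");
    }
    // http_build_query() URL-encodes the value for us.
    return 'https://www.youtube.com/feeds/videos.xml?' . http_build_query([$type => $value]);
}

echo youtubeFeedUrl('channel_id', 'CHANNELID'), "\n";
echo youtubeFeedUrl('user', 'USERNAME'), "\n";
```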
Additionally, you can request an API key for public access to your YouTube videos from your developer console and get YouTube videos and playlists in JSON format like this:
- Get Channels:
https://www.googleapis.com/youtube/v3/channels?part=snippet%2CcontentDetails&forUsername={YOUR_USER_NAME}&key={YOUR_API_KEY}
- Get Playlists:
https://www.googleapis.com/youtube/v3/playlists?part=snippet%2CcontentDetails&channelId={YOUR_CHANNEL_ID}&key={YOUR_API_KEY}
- Get Playlist Videos:
https://www.googleapis.com/youtube/v3/playlistItems?part=snippet%2CcontentDetails%2Cstatus&playlistId={YOUR_PLAYLIST_ID}&key={YOUR_API_KEY}
More information from YouTube v3 docs.
In YouTube, click Subscriptions in the left-hand pane. This will open all your subscriptions in the center of the page. Scroll down and you'll find an "Export to RSS reader" button, which produces an XML file of all your subscriptions. I've done this and added it to my preferred RSS reader, Feedly.
If you inspect any Youtube channel page, inside the <head> you will find an rss meta node like this:
<link rel="alternate"
type="application/rss+xml" title="RSS"
href="https://www.youtube.com/feeds/videos.xml?channel_id=UCn8zNIfYAQNdrFRrr8oibKw">
This should provide you with the data you need.
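One way to pull that href out programmatically is PHP's DOM extension. The sketch below parses a hard-coded copy of the tag; in practice $html would hold the downloaded channel page, and whether YouTube still emits this tag is something to verify yourself.

```php
<?php
// Sample markup copied from the snippet above; a real page would be fetched first.
$html = <<<'HTML'
<html><head>
<link rel="alternate" type="application/rss+xml" title="RSS"
      href="https://www.youtube.com/feeds/videos.xml?channel_id=UCn8zNIfYAQNdrFRrr8oibKw">
</head><body></body></html>
HTML;

// Real-world pages are rarely valid HTML, so silence libxml warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Find the first RSS alternate link and read its href attribute.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//link[@rel="alternate"][@type="application/rss+xml"]')->item(0);
$feedUrl = $node instanceof DOMElement ? $node->getAttribute('href') : null;

echo $feedUrl, "\n";
```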
Get the channel id by searching for the attribute data-channel-external-id in the source code of the YouTube channel page. (thanks to helq).
This code will grab all video titles and ids from the feed and dump it into an array:
$channel_id = 'XXX'; // put the channel id here
$youtube = file_get_contents('https://www.youtube.com/feeds/videos.xml?channel_id='.$channel_id);
$xml = simplexml_load_string($youtube, "SimpleXMLElement", LIBXML_NOCDATA);
$json = json_encode($xml);
$youtube = json_decode($json, true);
$yt_vids = array();
$count = 0;
foreach ($youtube['entry'] as $k => $v) {
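$yt_vids[$count]['id'] = str_replace(array('http://www.youtube.com/watch?v=', 'https://www.youtube.com/watch?v='), '', $v['link']['@attributes']['href']); // json_encode() stores XML attributes under '@attributes'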
$yt_vids[$count]['title'] = $v['title'];
$count++;
}
print_r($yt_vids);
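An alternative sketch that skips the JSON round trip and reads the SimpleXML objects directly. It runs on a small hand-written feed, so the entries and ids are invented; the yt:videoId element and its namespace are modelled on what the YouTube feed contains, which you should confirm against a real feed.

```php
<?php
// Hand-written sample in the shape of a YouTube Atom feed (values invented).
$feedXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:yt="http://www.youtube.com/xml/schemas/2015" xmlns="http://www.w3.org/2005/Atom">
  <title>Sample channel</title>
  <entry>
    <id>yt:video:abc123</id>
    <yt:videoId>abc123</yt:videoId>
    <title>First video</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=abc123"/>
  </entry>
  <entry>
    <id>yt:video:def456</id>
    <yt:videoId>def456</yt:videoId>
    <title>Second video</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=def456"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($feedXml);

$videos = [];
// Iterating $feed->entry works the same for one entry or many.
foreach ($feed->entry as $entry) {
    $videos[] = [
        // children('yt', true) selects elements carrying the yt: prefix
        'id'    => (string) $entry->children('yt', true)->videoId,
        'title' => (string) $entry->title,
    ];
}

print_r($videos);
```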
I've created a small PHP script that scrapes a Youtube URL for video links, and then outputs them as an atom feed: https://gist.github.com/Skalman/801436d9693ff03bc4ce
URLs such as https://www.youtube.com/user/scishow/videos work.
Caveats:
The tool doesn't scrape dates
Playlists won't include more than 100 videos
Playlists include the "play all" link
Author is correctly set only for channels (e.g. not playlists)
Maybe Youtube will block you if you use this too much (but hopefully the limits are high enough)
Likely several more...
There is also RSS-Bridge, which can extract RSS feeds from many services such as Twitter, Google+, Flickr, YouTube, Identi.ca, etc.
source: https://github.com/sebsauvage/rss-bridge
demo server: https://bridge.suumitsu.eu/
try using this URL:
https://www.youtube.com/feeds/videos.xml?user=USERNAME
Works fine for me.
From My Blog Post: http://tcodesblog.blogspot.com/search/label/howtofindyouryoutubechannelfeed
HOW TO FIND YOUR YOUTUBE CHANNEL FEED
In the old days (2009) it was easy, but nowadays (2012-present) it is much harder to find. Here is a quick way to find the new feed for your YouTube channel. Remember to follow the steps in order!
First find your channel id: You can do this by going to your YouTube Channel in the Dashboard
Copy the channel id: Your channel id can be found when visiting your YouTube Channel from within the Dashboard
Copy your channel id: Copy your channel id and replace channelidgoeshere below with your channel id: https://www.youtube.com/feeds/videos.xml?channel_id=channelidgoeshere
Copy your entire YouTube Channel Feed and create a simplified feed: You can do this by creating a shorter feed link in FeedBurner at http://www.feedburner.com/ (Requires a Google account. Free to use.), which is also part of Google. Create a new feed (select I'm A Podcaster! to see your videos appear in the feed and to make your feed compatible with other feed readers such as: Digg Reader, Apple iPhone Apple News App, Apple iPhone Podcasts App, Feedly, etc.) -OR- edit an existing one by copying your entire YouTube Channel Feed and then click Save Feed Details as normal
Your YouTube Channel Feed now works and your videos can be seen in a feed file directly on your FeedBurner feed. Mine is at YouTube as a feed at https://www.youtube.com/feeds/videos.xml?channel_id=UCvFR6YxwnYfLt_QqRFk_r3g & at FeedBurner as http://feeds.feedburner.com/youtube/warrenwoodhouse with my videos that appear only as text format, as an example, since I need to update mine to show my videos. You can change different settings in FeedBurner and do other things so it's worth a try since it's free and easy to use. I highly recommend using FeedBurner or another feed creation service, however, FeedBurner is your best bet since it also includes cross-feed subscription service mechanism (USM - Universal Subscription Mechanism), which means your feed can be read from any compatible device such as a computer, mobile phone (with the correct app installed), via an older web browser (such as Internet Explorer which supports Web Slices & RSS/Atom/XML Feeds).
Your feed can also be opened up in Apple iPhone Apple News App & Apple iPhone Podcasts App on your Apple iPhone, Apple iPod Touch and Apple iPad if you've set the settings correctly to USM (Universal Subscription Mechanism). Once this is in effect, your feed can be viewed through different services and devices.
Your feed on FeedBurner allows you to create an Email Subscription, a Headline Animator (which shows a link to the latest post along with how many subscribers you have), Chiclets and other cool stuff.
I hope this answer proves useful and if you want to see some more cool awesome coding practices by me, please feel free to check out my T-Codes website at http://warrenwoodhouse.webs.com/codes for lots more stuff.
I have created an example Yahoo Pipes here.
http://pipes.yahoo.com/pipes/pipe.info?_id=6eeff0110a81f2ab94e8472620770b11
You can run this pipe by pressing "Run Pipe" without filling in an API key. But you must provide your own API key and channel id (which can be obtained via the channels API) when you clone it. I wanted to automate fetching the channelId from the YouTube username, but that is not easy to do in a pipe.
I've made a batch script that creates an RSS feed of your new subscription videos. You don't need an API key. The script uses 2 external tools: YouTube-DL and Xidel.
Anyway, read the following thread, and go to post 98 to download the script:
http://code.google.com/p/gdata-issues/issues/detail?id=3946#c98
I hope someone ports this to PHP, Python, JavaScript, PowerShell or Bash.
I think the YouTube response has changed, so I made some changes to fetch the videos from the channel's RSS feed using cURL.
$channel_id = 'XXXXXXXX'; // put the channel id here
//using curl
$url = 'https://www.youtube.com/feeds/videos.xml?channel_id='.$channel_id.'&orderby=published';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
//curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$response = curl_exec($ch);
curl_close($ch);
$response=simplexml_load_string($response);
$json = json_encode($response);
$youtube= json_decode($json, true);
$yt_vids = array();
$count = 0;
// json_encode() exposes XML attributes under the '@attributes' key
$prefixes = array('http://www.youtube.com/watch?v=', 'https://www.youtube.com/watch?v=');
if(isset($youtube['entry']['0']) && $youtube['entry']['0']!=array())
{
// several entries: 'entry' is a list
foreach ($youtube['entry'] as $k => $v) {
$yt_vids[$count]['id'] = str_replace($prefixes, '', $v['link']['@attributes']['href']);
$yt_vids[$count]['title'] = $v['title'];
$count++;
}
}
else
{
// a single entry: 'entry' is the entry itself
$yt_vids[$count]['id']=str_replace($prefixes, '', $youtube['entry']['link']['@attributes']['href']);
$yt_vids[$count]['title']=$youtube['entry']['title'];
}
echo "<pre>";
print_r($yt_vids);
I used the code below to integrate the YouTube feed with a WordPress custom field (ACF plugin) and FancyBox:
<?php
$channel_id = get_field('youtube_chanel_id'); //ACF text field
if ($channel_id){ // if channel_id not empty -- START
$youtube = file_get_contents('https://www.youtube.com/feeds/videos.xml?channel_id='.$channel_id);
$xml = simplexml_load_string($youtube, "SimpleXMLElement", LIBXML_NOCDATA);
$json = json_encode($xml);
$youtube = json_decode($json, true);
echo'<div class="col-md-12 youtube-videos-feed">';
foreach ($youtube['entry'] as $k => $v) {
$id = str_replace(array("yt:video:"), "", $v['id']); // Remove "yt:video:" from ID value
//$date = $v['updated']; // video updated date (disabled for now)
$title = $v['title']; // video title
echo '<a class="with-video" href="https://www.youtube.com/watch?v=',$id,'&autoplay=1&rel=0&controls=0&showinfo=0&modestbranding=0" data-fancybox="videos" data-caption="',$title,'" title="',$title,'" >
<div class="col-md-3 main-image post-image img-fancy">
<img src="https://img.youtube.com/vi/',$id,'/0.jpg" alt="',$title,'" >
</div>
</a>';
}
echo'</div>';
} // if channel_id not empty -- END
?>
I found a Chrome extension named Youtube RSS-ify that injects an RSS icon on video, channel and navigation pages. It was just what I was looking for.
I would suggest using a good RSS parser. Many are available, but you can try http://simplepie.org/, one of the best I have used in my personal projects.
It's pretty well documented, with some examples.
Usage example
Note: This uses the CollegeHumor YouTube channel; you can get the channel id from the channel page itself.
<?php
include_once('../autoloader.php');
// Parse it
$feed = new SimplePie();
$feed->set_feed_url('https://www.youtube.com/feeds/videos.xml?channel_id=UCPDXXXJj9nax0fr0Wfc048g');
$feed->enable_cache(false);
$feed->init();
$items = $feed->get_items();
foreach ($items as $item)
{
echo $item->get_title() . "\n";
}
var_dump($feed->get_item_quantity());
Easiest way to get the channel id:
Open Subscription Manager (left panel, down below subscriptions) and click on the desired user.
The url will be in the form:
https://www.youtube.com/channel/XXXXXXXXXXXXXXXXX
So the feed url should be:
https://www.youtube.com/feeds/videos.xml?channel_id=XXXXXXXXXXXXXXXXX
Note: It is better to use channel ids than user names, because user names may change.

Rendering SoundCloud widget for a private track using PHP API

I am trying to render a SoundCloud HTML5 widget using the PHP API, but every time I run the command I think should return the HTML for the widget, I simply get an Exception:
The requested URL responded with HTTP code 302
I realise this is a redirect. What I don't know is why that's all I ever get, or what to do about it to actually get the widget HTML.
The documentation on the API says that to embed the widget using PHP you should do this:
<?php
require_once 'Services/Soundcloud.php';
// create a client object with your app credentials
$client = new Services_Soundcloud('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET');
// get a tracks oembed data
$track_url = 'http://soundcloud.com/forss/flickermood';
$embed_info = $client->get('/oembed', array('url' => $track_url));
// render the html for the player widget
print $embed_info['html'];
I'm running this:
// NB: Fully authorised SoundCloud API instance all working prior to this line
// $this->api refers to an authorised instance of Services_Soundcloud
try {
$widget = array_pop(
json_decode( $this->api->get('oembed', array('url' => $track_url)) )
);
print_r($widget);
} catch (Exception $e)
{
print_r($e->getMessage());
}
where "track_url" is actually the URL I get back when asking SoundCloud for a track object earlier in the app using the same API.
I'm not actually sure this URL is correct in the first place, because the track object I get back gives the 'uri' in the form:
[uri] => https://api.soundcloud.com/tracks/62556508
The documentation examples all have a straight http://soundcloud.com/username/track-permalink URL - but even using a known path to a public track the attempt to run the API oembed method fails... I still get a 302 Exception.
Finally, there are mentions of setting "allow_redirects" to false in the 'get' command, but this has no effect when I add it to the parameters used to build the query to the API. I also tried adding additional cURL options, but that too had no effect.
I have definitely enabled API access to the track within SoundCloud.
Kind of banging my head off the wall on this. If anyone has any pointers I'd be very grateful to hear them. Just for clarity's sake, I am able to access all the user data, comments etc. via the API instance I have created, so it appears to be working fine.
Thanks for pointing this out. There was a bug in the documentation that led you astray. Sorry about that. I've updated the docs to fix the bug. Here's the updated code sample:
<?php
require_once 'Services/Soundcloud.php';
// create a client object with your app credentials
$client = new Services_Soundcloud('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET');
$client->setCurlOptions(array(CURLOPT_FOLLOWLOCATION => 1));
// get a tracks oembed data
$track_url = 'http://soundcloud.com/forss/flickermood';
$embed_info = json_decode($client->get('oembed', array('url' => $track_url)));
// render the html for the player widget
print $embed_info->html;
Note the differences:
You need to set CURLOPT_FOLLOWLOCATION to 1 as mentioned in the comments above.
You need to wrap the return from $client->get in json_decode
The result is a stdClass object, not an array, so the html property has to be accessed using the -> operator.
Hope that helps. Feel free to comment in case you're still having problems and I'll amend my answer.
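The object-versus-array point can be seen in isolation with a short sketch. The JSON string here is a made-up stand-in for the oEmbed response, reduced to the html key.

```php
<?php
// Hypothetical oEmbed-style payload; only the "html" key matters here.
$json = '{"html": "<iframe src=\"https://w.soundcloud.com/player/\"></iframe>"}';

// Default: objects in the JSON become stdClass instances.
$asObject = json_decode($json);
echo $asObject->html, "\n";

// With the second argument set to true they become associative arrays.
$asArray = json_decode($json, true);
echo $asArray['html'], "\n";
```

Either form works; the access syntax just has to match the form you decoded into.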

trying to fetch user tweet stream in php

I am using the following code, but it shows a 404 error:
$url = "http://api.twitter.com/version/statuses/user_timeline.json";
$call = file_get_contents($url);
There's no 'version' version. The Twitter API is currently version 1, so you need http://api.twitter.com/1/statuses/user_timeline.json.
Do note that Twitter can't read your mind, so you'll need to tell Twitter which user's timeline you want to fetch... i.e. http://api.twitter.com/1/statuses/user_timeline.json?screen_name=ceejayoz
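A minimal sketch of building that URL in PHP, so the screen name is URL-encoded properly. The helper name userTimelineUrl is made up for illustration; the endpoint and the screen_name parameter are the ones given above.

```php
<?php
// Hypothetical helper: builds the user_timeline URL for a given screen name.
function userTimelineUrl(string $screenName): string
{
    $base = 'http://api.twitter.com/1/statuses/user_timeline.json';
    // http_build_query() handles the URL encoding of the value.
    return $base . '?' . http_build_query(['screen_name' => $screenName]);
}

$url = userTimelineUrl('ceejayoz');
echo $url, "\n";
// The result can then be passed to file_get_contents($url) as in the question.
```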
