I'm new to Twig and got pulled into a project where I have to figure out a solution, which is why I'm asking here.
I want to display the videos in a YouTube playlist, which is easy enough in plain PHP using, for example, simplexml_load_file($feed):
$xml = simplexml_load_file($feed);
foreach ($xml->entry as $key => $x) {
    $title = $x->title;                                        // video title
    $id    = $x->children('yt', true)->videoId;                // yt:videoId (namespaced element)
    $uri   = $x->author->uri;                                  // author/channel URI
    $desc  = $x->children('media', true)->group->description;  // media:group/media:description
}
But since I know little to nothing about Twig, I can't work out how to do something similar there, or whether it is even possible.
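If it helps, here is a minimal sketch of the usual division of labour, assuming Twig is installed through Composer. The template name 'playlist.html' and the inline template are just placeholders for this example (and older Twig 1.x uses Twig_Environment/Twig_Loader_Array instead of the namespaced classes):

<?php
// Untested sketch: parse the feed in PHP, hand a plain array to Twig.
require 'vendor/autoload.php'; // assumes Twig was installed via Composer

$xml = simplexml_load_file($feed);
$videos = array();
foreach ($xml->entry as $x) {
    $videos[] = array(
        'title' => (string) $x->title,
        'id'    => (string) $x->children('yt', true)->videoId,
        'uri'   => (string) $x->author->uri,
        'desc'  => (string) $x->children('media', true)->group->description,
    );
}

// 'playlist.html' is a made-up template name; in a real project the template
// would live in its own file and be loaded with a FilesystemLoader.
$loader = new \Twig\Loader\ArrayLoader(array(
    'playlist.html' =>
        '{% for video in videos %}' .
        '<h2>{{ video.title }}</h2>' .
        '<p>{{ video.desc }}</p>' .
        '<a href="https://www.youtube.com/watch?v={{ video.id }}">Watch</a>' .
        '{% endfor %}',
));
$twig = new \Twig\Environment($loader);
echo $twig->render('playlist.html', array('videos' => $videos));

The point is that Twig only handles the looping and display ({% for video in videos %}); fetching and parsing the feed stays in PHP exactly as above.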
My goal is to scrape search results with PHP Simple HTML DOM Parser, which works fine for me. But every day or two, Google changes its HTML structure and my code stops working.
Here's my code that was working before:
include("simple_html_dom.php");
$data = file_get_contents('https://www.google.com/search?q=stackoverflow');
$html = str_get_html($data);
$i=0;
$linkObjs = $html->find('h3[class=r] a');
foreach ($linkObjs as $linkObj) {
$i++;
$url = trim($linkObj->href);
$trim = substr($url, 0, 7);
if ($trim=="/url?q=") {
$url = substr($url, 7);
}
$trim_2 = stripos($url, '&sa=U');
if ($trim_2 != false) {
$url = substr($url, 0, $trim_2);
}
echo "$i:".$url.'<br>';
}
They usually change the class names and tag names, along with the structure of the result links.
I had the same problem. Try
$linkObjs = $html->find('div[class=jfp3ef] a');
and it will work again.
I had a similar experience. When I search Google from the ordinary user interface, the URLs of the "hit" pages are still showing up in an A tag (of course) after a div class 'r'. But when I run my scraping program with the exact same search terms and parameters, the 'r' changes to 'kCrYT'. I changed that in my code and got the program working again. (Yay!)
But I suspect the class will change regularly when Google detects that someone is submitting the search programmatically. So this might not be a permanent solution.
Maybe I could add a little extra code that determines what class name is currently being used for this, so that my program could automatically adapt to these changes.
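As a rough, untested sketch of that idea with Simple HTML DOM: rather than relying on any class name, you could collect every anchor and keep only the ones whose href starts with the "/url?q=" redirect prefix that already appears in the code above, since that prefix is independent of whichever class Google happens to be using this week.

include("simple_html_dom.php");

$data = file_get_contents('https://www.google.com/search?q=stackoverflow');
$html = str_get_html($data);

$i = 0;
foreach ($html->find('a') as $a) {
    $url = trim($a->href);
    // Keep only redirect-style result links, whatever class they carry.
    if (substr($url, 0, 7) !== "/url?q=") {
        continue;
    }
    $url = substr($url, 7);
    // Drop the tracking parameters Google appends after "&sa=U".
    $pos = stripos($url, '&sa=U');
    if ($pos !== false) {
        $url = substr($url, 0, $pos);
    }
    echo ++$i . ": " . $url . '<br>';
}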
I've seen different versions of this question asked, but nothing that specifically answers mine.
I'm trying to parse an RSS feed that pulls the results of a search on a pet adoption site and turns them into an RSS/Atom feed.
<?php
//RSS solution
$feed = simplexml_load_file('http://www.serverstrategies.com/rss.php?sid=WV87&len=20&rand=0&drop_1=');
$children = $feed->children('http://www.w3.org/2005/Atom');
echo $children->entry[1]->item[0]->title;
?>
I've tried a lot of different variations of this but I've yet to get anything to print out.
Hope this helps you get all the titles and descriptions.
<?php
$feed = simplexml_load_file('http://www.serverstrategies.com/rss.php?sid=WV87&len=20&rand=0&drop_1=');

$result = array();
// This is a plain RSS 2.0 feed, so the items live under channel->item.
foreach ($feed->channel->item as $node) {
    $result[] = array(
        "title"       => (string) $node->title,
        "description" => strip_tags((string) $node->description),
    );
}
print_r($result);
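For what it's worth, the original snippet probably printed nothing because it addressed the document as Atom (entry elements in the http://www.w3.org/2005/Atom namespace), whereas this feed exposes plain RSS 2.0 channel->item nodes. If you ever do hit a genuine Atom feed, the loop would look roughly like this instead (an untested sketch; $someAtomFeedUrl is just a placeholder):

<?php
// Untested sketch for an Atom feed: entries live in the Atom namespace.
$atomNs = 'http://www.w3.org/2005/Atom';
$atom = simplexml_load_file($someAtomFeedUrl); // placeholder URL
foreach ($atom->children($atomNs)->entry as $entry) {
    echo (string) $entry->title, "\n";
}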
I have made a function that takes keywords like "Dell laptop x500" and tries to search for them. My hacky approach is to append the keywords to the search URL, but that gives different results than typing the same text into the search box and pressing submit. The function then grabs the link of the first result. Sometimes this works correctly and sometimes it does not.
function getAmazonLink($keywords) {
    // Replace anything that isn't a letter, digit, underscore, space or dash.
    $keywords = preg_replace("/[^a-z0-9_\s-]/", "%20", $keywords);
    $link = "http://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Daps&field-keywords=$keywords";
    //return $link;

    $content = getContents($link); // own helper that fetches the page body
    $doc = new DOMDocument();
    $doc->loadHTML($content);

    // First result: an anchor inside <h3 class="newaps"> whose parent has id="result_0".
    $as = $doc->getElementsByTagName('a');
    foreach ($as as $a) {
        if ($a->parentNode->nodeName == 'h3'
            && $a->parentNode->getAttribute('class') == 'newaps'
            && $a->parentNode->parentNode->getAttribute('id') == 'result_0') {
            return $a->getAttribute('href');
        }
    }
    return $link;
}
Amazon, like many other online stores, will tailor the search results based on your account's purchase/search history. Since your webapp isn't using a logged in Amazon account, it is getting results which aren't tailored to anyone's account history. In the comments you asked if there is a way to "work around this", but there's nothing to work around -- it's giving you valid results, just not ones which are tailored to a specific person's Amazon account. This is the expected result, not a bug.
URL : http://www.sayuri.co.jp/used-cars
Example : http://www.sayuri.co.jp/used-cars/B37753-Toyota-Wish-japanese-used-cars
Hey guys, I need some help with one of my personal projects. I've already written the code to fetch data from each individual car URL (see the example above) and post it on my site.
Now I need to go through the main URL, sayuri.co.jp/used-cars, and:
1) Build an array of the URLs of all the individual cars on the page, run my existing code on each one to fetch its data, then move on to the next one.
I already have code that saves each URL into a log file when it has been processed (probably not necessary if the script goes link by link without restarting from the top, but it ensures there is no repetition).
2) When all the links on a page are done, it should move on to the next page and do the same thing until the end (there are 5-6 pages at most).
I've been stuck on this part since last night and would really appreciate any help. Thanks.
My code to get data from the main URL:
$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
// echo $content;
and
$dom = new DOMDocument;
$dom->loadHTML($content);
//echo $dom;
I'm guessing you already know this since you say you've gotten data from the car entries themselves, but a good place to start is dissecting the page's DOM and seeing whether there are any elements you can use to jump around quickly. Most browsers have page-inspection tools to help with this.
In this case, <div id="content"> serves nicely. You'll note it contains a collection of tables with the required links and a <div> that contains the text telling us how many pages there are.
A disclaimer: it's been years since I've done PHP and I have not tested this, so it is probably neither correct nor optimal, but it should get you started. You'll need to tie the functions together (what's the fun in me doing it?) to achieve what you want, but these should grab the data required.
You'll be working with the DOM on each page, so a convenience to grab the DOMDocument:
function get_page_document($index) {
    $content = file_get_contents("http://www.sayuri.co.jp/used-cars/page:{$index}");
    $document = new DOMDocument;
    $document->loadHTML($content);
    return $document;
}
You need to know how many pages there are in total in order to iterate over them, so grab it:
function get_page_count($document) {
    $content = $document->getElementById('content');
    $count_div = $content->childNodes->item($content->childNodes->length - 4);
    $count_text = $count_div->firstChild->textContent;
    if (preg_match('/Page \d+ of (\d+)/', $count_text, $matches) === 1) {
        return $matches[1];
    }
    return -1;
}
It's a bit ugly, but the links are available inside each <table> in the contents container. Rip 'em out and push them into an array. If you use the link itself as the key, there is no concern about duplicates, as repeats simply overwrite the same key.
function get_page_links($document) {
    $content = $document->getElementById('content');
    $tables = $content->getElementsByTagName('table');
    $links = array();
    foreach ($tables as $table) {
        if ($table->getAttribute('class') === 'itemlist-table') {
            // table > tbody > tr > td > a
            $link = $table->firstChild->firstChild->firstChild->firstChild->getAttribute('href');
            // No duplicates because they just overwrite the same entry.
            $links[$link] = "http://www.sayuri.co.jp{$link}";
        }
    }
    return $links;
}
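To make the flow concrete, a bare-bones driver that wires these together might look something like this (same caveats: untested, it assumes page:1 resolves to the first page as get_page_document() implies, and fetch_car_data() is just a stand-in for the per-car code you already have):

<?php
// Untested sketch of the overall loop.
$first_page = get_page_document(1);
$page_count = get_page_count($first_page);

$all_links = array();
for ($page = 1; $page <= $page_count; $page++) {
    $document = ($page === 1) ? $first_page : get_page_document($page);
    // '+' keeps existing keys, so repeated links are not added twice.
    $all_links += get_page_links($document);
}

foreach ($all_links as $path => $url) {
    fetch_car_data($url); // placeholder for your existing single-car code
}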
Perhaps this is also obvious, but these will break if the site changes its formatting. You'd be better off asking whether they have a REST API or something similar available for long-term use, though I'm guessing you don't care as much since it's just a personal project for tinkering.
Hope it helps prod you in the right direction.
To add an array of $keywords to my ad group I am currently using the following code:
$adGroupCriterionService = $adwordsUser->GetService('AdGroupCriterionService', 'v201109');

$operations = array();
foreach ($keywords as $keyword) {
    $keywordobj = new Keyword();
    $keywordobj->text = $keyword;
    $keywordobj->matchType = 'BROAD';

    $keywordAdGroupCriterion = new BiddableAdGroupCriterion();
    $keywordAdGroupCriterion->adGroupId = $identifier;
    $keywordAdGroupCriterion->criterion = $keywordobj;

    $keywordAdGroupCriterionOperation = new AdGroupCriterionOperation();
    $keywordAdGroupCriterionOperation->operand = $keywordAdGroupCriterion;
    $keywordAdGroupCriterionOperation->operator = 'ADD';

    $operations[] = $keywordAdGroupCriterionOperation;
}
$result = $adGroupCriterionService->mutate($operations);
This works fine. However, I've started to realise that doing such operations uses up API Units rather more quickly than I had anticipated. Is there a more API Unit friendly approach to doing this operation? Or is this simply the 'catch' with the Google Adwords API pricing?
Depending on how many keywords you're uploading at a time, you can use the MutateJobService; the coding is a little more complicated but you should save 50% of the unit cost.
If someone needs a quick code example, http://code.google.com/p/google-api-adwords-php/source/browse/trunk/examples/v201109/CampaignManagement/AddKeywordsInBulk.php shows how to use MutateJobService; it is much simpler than the old BulkMutateJobService. The original video from the API workshop days is here: http://www.youtube.com/watch?v=CV_kOTW3ldQ, and the presentations are here: https://sites.google.com/site/awapiworkshops/slides-and-links. These are the same links JoeR posted, but pointing to the original site this time.
For any AdWords API related questions, the official forum (http://groups.google.com/group/adwords-api) is the best place to ask. The group is very active, and Googlers from the API team regularly answer questions there.
Cheers,
Anash