Extracting prices from html using jQuery

I'm parsing an HTML document and need to extract every price in it (the format is $99.00). My plan is to select all elements whose class or id attribute contains the substring "Price" (or "price"). I tried $("[class*='Price']") and $("[id*='Price']"), intending to concatenate the results into an array, but the selectors don't find anything. Am I doing something wrong, or is there a better approach?
Thank you.
UPDATE: I'm actually using phpQuery, a PHP port of jQuery.
UPDATE 2: I don't know the exact class or id of the elements, because this is a generic script that will run on different e-commerce sites. That's why I'm using the *= wildcard to match all elements (mostly a, div, span, etc.; I don't need input). I figured it out, and this is what I have so far:
function getPrice($doc){
    phpQuery::selectDocument($doc);
    $prices = array();
    // Match elements whose class or id contains "Price" or "price".
    foreach(pq("[class*='Price'], [class*='price'], [id*='Price'], [id*='price']") as $res){
        $each = pq($res);
        // Keep only elements whose text contains a price such as $99.00.
        if(preg_match('/\$\d+(?:\.\d+)?/', $each->text(), $matches)){
            echo $matches[0].'<br>';
            $prices[] = $each->html();
        }
    }
    return $prices;
}
This is printing the correct elements. Now I need to extract the font-size of those elements so I can sort the array by font-size.

Select by ID: $("#Price")
Select by Class: $(".Price")
Showing your HTML would help.

Give this a go:
$(function(){
    $('.price').each(function(){
        // Build your array here from $(this).text() or $(this).html()
        // (.val() only applies to form fields).
    });
});
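
Since the script has to match both "Price" and "price" in either attribute, one possible alternative is a single case-insensitive XPath test. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than phpQuery, and the markup, class names and id names are invented for illustration.

```php
<?php
// Made-up markup; the class and id names are illustrative only.
$html = '<div class="productPrice">$99.00</div>'
      . '<span id="sale-price">Now $79.50</span>'
      . '<p class="description">Costs less than $100</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings about sloppy HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// XPath 1.0 has no lower-case(), so translate() folds the letters P, R, I, C, E.
$query = '//*[contains(translate(@class, "PRICE", "price"), "price")'
       . ' or contains(translate(@id, "PRICE", "price"), "price")]';

$prices = [];
foreach ($xpath->query($query) as $node) {
    // Same pattern as in the question: a dollar sign, digits, optional decimals.
    if (preg_match('/\$\d+(?:\.\d+)?/', $node->textContent, $m)) {
        $prices[] = $m[0];
    }
}
print_r($prices);
```

The paragraph with class "description" is skipped even though its text contains a dollar amount, because neither its class nor its id contains "price".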

Related

PHP JSON Removing HTML Tags

I'm working with an API that returns JSON, and I'm using PHP to display it. The JSON contains HTML tags. I've read about many ways to remove them, but the data has so many different tags that I'm wondering what the easiest method is. Most other questions focus on removing specific tags. Is it possible to remove all HTML tags at once, or do I need to handle each one individually? If it is possible, how do I do it?
Thank you for your time and input.

Simply do this to escape the tags so that they display as literal text:
$data = json_decode($your_json_string, TRUE);
array_walk_recursive($data, function(&$v) { if (is_string($v)) { $v = htmlentities($v); } });
or, to remove the tags completely:
array_walk_recursive($data, function(&$v) { if (is_string($v)) { $v = strip_tags($v); } });
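
As a rough end-to-end illustration of the second variant, here is a small self-contained sketch. The JSON payload is invented and merely stands in for the API response; the is_string() check is there so that numbers and other non-string values keep their type.

```php
<?php
// Hypothetical payload standing in for the API response.
$json = '{"title":"<b>Hello</b>","count":3,"items":[{"body":"<p>One <a href=\"#\">link</a></p>"}]}';

// Decode to nested associative arrays.
$data = json_decode($json, true);

// Visit every leaf value and strip tags from the strings only.
array_walk_recursive($data, function (&$value) {
    if (is_string($value)) {
        $value = strip_tags($value);
    }
});

print_r($data);
```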

How to combine the text node of 2 pieces of extracted data using Goutte/Domcrawler

I've been trying to figure out how to combine two pieces of extracted text into a single result (array). In this case, the title and subtitle of a variety of books.
<td class="item_info">
<span class="item_title">Carrots Like Peas</span>
<em class="item_subtitle">- And Other Fun Facts</em>
</td>
The closest I've been able to get is:
$holds = $crawler->filter('span.item_title,em.item_subtitle');
Which I've managed to output with the following:
$holds->each(function ($node) {
    echo '<pre>';
    print $node->text();
    echo '</pre>';
});
And results in
<pre>Carrots Like Peas</pre>
<pre>- And Other Fun Facts</pre>
Another problem is that not all the books have subtitles, so I need to avoid combining two titles together.
How would I go about combining those two into a single result (or array)?

In my case, I took a roundabout way to get where I wanted to be. I stepped back one level in the DOM to the td tag and grabbed everything and dumped it into the array.
I realized that DomCrawler's documentation had the example code to place the text nodes into an array.
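// Requires: use Symfony\Component\DomCrawler\Crawler;
$items_out = $crawler->filter('td.item_info')->each(function (Crawler $node, $i) {
    return $node->text();
});
I'd tried to avoid capturing the td because authors were also included in those cells. After even more digging, I was able to strip the authors from the array with the following:
foreach ($items_out as &$item) {
    // Only cut when the marker is present; otherwise substr() would return an empty string.
    $pos = strpos($item, ' - by');
    if ($pos !== false) { $item = substr($item, 0, $pos); }
}
unset($item); // break the reference left over from the loop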
Just took me five days to get it all sorted out. Now onto the next problem!
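
Another way to look at it is to walk each td.item_info cell and read the title and the optional subtitle separately, which avoids both pairing up two titles and trimming author text afterwards. The sketch below uses PHP's built-in DOMDocument and DOMXPath instead of DomCrawler so that it runs without Composer; the second table row (a book without a subtitle) is an assumption added for illustration. The same per-cell idea should carry over to $crawler->filter('td.item_info')->each(...).

```php
<?php
// Sample markup modeled on the question; the second row is invented.
$html = <<<'HTML'
<table>
<tr><td class="item_info">
<span class="item_title">Carrots Like Peas</span>
<em class="item_subtitle">- And Other Fun Facts</em>
</td></tr>
<tr><td class="item_info">
<span class="item_title">Plain Title</span>
</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$books = [];
foreach ($xpath->query('//td[@class="item_info"]') as $cell) {
    // Look up title and subtitle relative to the current cell only.
    $title = $xpath->query('.//span[@class="item_title"]', $cell)->item(0);
    $subtitle = $xpath->query('.//em[@class="item_subtitle"]', $cell)->item(0);
    if ($title === null) {
        continue; // skip cells without a title
    }
    $books[] = trim($title->textContent . ' ' . ($subtitle ? $subtitle->textContent : ''));
}
print_r($books);
```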

As per the Goutte documentation, Goutte uses the Symfony DomCrawler component. Information on adding content to a DomCrawler object can be found at Symfony DomCrawler - Adding Content.

Extract number in Preg_match_all of jQuery

I'm trying to extract the second td of the Volume2 row, which is a number (235 in this example). The numbers change dynamically.
jQuery('.content').append('<tr><td>Volume1</td><td>56</td><td>123</td></tr>');
jQuery('.content').append('<tr><td>Volume2</td><td>235</td><td>789</td></tr>');
In php I have this:
if(preg_match_all("/jQuery\(\'.content\'\).append\((\'<tr><td>Volume2</td><td>(.*?)\')\);/", $result))
Of course it does not work, because I don't know how to write a regex that matches HTML tags. Please help.
Edit:
The solution must be in PHP, because the jQuery code is fetched from another server using PHP's cURL.

You can do:
$(".content tr:eq(1) td:eq(1)").text();
Demo: http://jsfiddle.net/zuhb9/
If you have multiple class content then you need to loop through all of them:
$(".content").each(function() {
    console.log($(this).find("tr:eq(1) td:eq(1)").text());
});
Demo: http://jsfiddle.net/zuhb9/1/
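
Because the edit asks for a PHP solution, here is a minimal sketch of how the value might be pulled out with preg_match. It assumes the fetched script text looks exactly like the two lines in the question; the $result string below is a stand-in for the cURL response. Using ~ as the pattern delimiter means the slashes in </td> need no escaping.

```php
<?php
// Stand-in for the script text fetched with cURL (copied from the question).
$result = <<<'JS'
jQuery('.content').append('<tr><td>Volume1</td><td>56</td><td>123</td></tr>');
jQuery('.content').append('<tr><td>Volume2</td><td>235</td><td>789</td></tr>');
JS;

// Capture the digits in the td that directly follows the Volume2 label.
$pattern = '~<td>Volume2</td><td>(\d+)</td>~';

$value = null;
if (preg_match($pattern, $result, $matches)) {
    $value = (int) $matches[1];
}
var_dump($value);
```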

PHP: How can I access this XML entity when its name contains a reserved word?

I'm trying to parse this feed: http://musicbrainz.org/ws/1/artist/c0b2500e-0cef-4130-869d-732b23ed9df5?type=xml&inc=url-rels
I want to grab the URLs inside the 'relation-list' tag.
I've tried fetching the URL with PHP using simplexml_load_file(), but I can't access it using $feed->artist->relation-list as PHP interprets "list" as the list() function.
I have a feeling I'm going about this wrong (not much XML experience), and even if I was able to get hold of the elements I want, I don't know how to extract their attributes (I just want the type and target fields).
Can anyone gently nudge me in the right direction?
Thanks.
Matt

Have a look at the examples on the php.net page; they show how to solve this. Wrap the hyphenated name in braces and quotes:
// Instead of $feed->artist->relation-list, write:
$feed->artist->{'relation-list'}
To get an attribute of a node, use the attribute name as an array index on the node:
foreach( $feed->artist->{'relation-list'}->relation as $relation ) {
    $target = (string)$relation['target'];
    $type = (string)$relation['type'];
    // Do something with it
}
(Untested)
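
For a version that can be run without fetching the feed, here is a small sketch against a made-up XML string. The structure is only loosely shaped like the feed in the question, and the type and target values are placeholders, not real data.

```php
<?php
// Invented XML; element names follow the question, values are placeholders.
$xml = <<<'XML'
<metadata>
  <artist>
    <relation-list target-type="Url">
      <relation type="Wikipedia" target="http://example.com/wiki"/>
      <relation type="Discogs" target="http://example.com/discogs"/>
    </relation-list>
  </artist>
</metadata>
XML;

$feed = simplexml_load_string($xml);

$links = [];
// The brace syntax lets PHP read a property whose name contains a hyphen.
foreach ($feed->artist->{'relation-list'}->relation as $relation) {
    $links[(string) $relation['type']] = (string) $relation['target'];
}
print_r($links);
```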

Get a single element with PHP and XPath

There are lots of tutorials around the net, but none of them explains this:
How do I select a single element (in a table, for example) when I have its absolute XPath?
Example:
I have this:
/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span
Which PHP function gets the text of that element?
I really could not find an answer. I found lots of guides and hints for getting all the elements of a table, all the buttons of a form, etc., but not what I need.
Thank you.

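// simplexml_load_string() only succeeds if the markup is well-formed XML.
$xml = simplexml_load_string($html_content_string);
// xpath() returns an array of matching elements.
$arr = $xml->xpath("//body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span");
var_dump($arr);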

Load your HTML document into a DOM object, then make a DOMXPath object from it and let it evaluate your query string.
It's all described in detail here: http://php.net/manual/en/book.dom.php
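
A minimal sketch of those steps, assuming a tiny stand-in document rather than the page from the question. One thing to watch for: paths copied from browser developer tools often contain tbody elements that the browser inserted itself. As far as I know, DOMDocument::loadHTML() does not add them, so they may need to be removed from the path.

```php
<?php
// Small stand-in document; the real page from the question is not reproduced.
$html = '<html><body><table>'
      . '<tr><td>a</td><td><span>first</span></td></tr>'
      . '<tr><td>b</td><td><span>second</span></td></tr>'
      . '</table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Absolute path to one element; query() returns a DOMNodeList.
$nodes = $xpath->query('/html/body/table/tr[2]/td[2]/span');

// Take the first match, if any, and read its text.
$text = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
var_dump($text);
```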
