This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 6 years ago.
<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
i need description just with this code
<p itemprop="description" class="shortdesc">[a-z]</p>
i need download link
<a itemprop="downloadUrl" href="[A-Z]"></a>
There are better tools for parsing HTML than regular expressions. That said, there are times when parsing HTML with regular expressions works safely and consistently, so don't be bullied out of trying it. These cases are usually for small, known sets of HTML markup.
For this particular case, it seems that using an HTML parser would be effective and leave you with more legible code. To illustrate this, I'll use a command line tool, pup, which lets you retrieve your content pretty simply. Let's pretend that the markup is stored at /tmp/input on your computer.
To grab the downloadUrl...
pup < /tmp/input 'a[itemprop="downloadUrl"] attr{href}'
To grab the description...
pup < /tmp/input 'p[itemprop="description"] text{}'
This I think illustrates the simplicity and benefits of using an HTML parser to grab what you're after.
And once again:
<?php
$data = <<<DATA
<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress.
</p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href=".zip">Download Version 2.6.1.1</a>
</p>
</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$containers = $xpath->query("//div[@id='plugin-description']");
foreach ($containers as $container) {
$description = trim($xpath->query(".//p[@itemprop='description']", $container)->item(0)->nodeValue);
$link = $xpath->query(".//a[@itemprop='downloadUrl']/@href", $container)->item(0)->nodeValue;
echo $description . $link;
}
?>
See a demo on ideone.com.
My team and I have made a database in phpMyAdmin for a restaurant, and I'm currently working on the customer dashboard. I'm using foreach loops to display complete orders in one of the dashboard tabs, and I have the code working, but right now it just outputs regular black text. I was wondering how to style it to output the rows as a grid, similar to Bootstrap grids.
I've tried to just add containers with rows and columns to the foreach echo itself, but it's just not working as I thought it would.
<div id="CurrentOrders" class="tabcontent" style="margin-left: 24px">
<!-- This information will be pulled from the Orders table in the DB -->
<h3>Current Orders</h3>
<p>
<div class="container">
<?php
foreach ($orderno as $order) {
$n = $order['OrderNo'];
$menunamequery = "SELECT * FROM OrderItem WHERE OrderNo = '{$n}'";
$currentorders = getRows($menunamequery);
foreach ($currentorders as $currentorder) {
echo "Order Number -"." ".$currentorder['OrderNo']." , "."Order -"." ".$currentorder['MenuName']." , "."Quantity -"." ".$currentorder['Quantity']."<br>";
}
}
?> </div>
</p>
</div>
The expected result is for the rows I'm outputting to have some sort of grid layout; the actual result is currently just plain text.
Sorry if this is a bad question, my team and I just learned php this semester and are hoping to continue getting better at it. Any help would be appreciated.
You can simply output HTML from PHP:
echo '<span style="color: red">'.$currentorder['MenuName'].'</span>';
However, it is advised that you sanitize your output, so nobody can "create HTML" by putting tags in the database;
echo '<span style="color: red">'.htmlspecialchars($currentorder['MenuName']).'</span>';
This does exactly what it says: it makes HTML entities from special characters. For example, > will be printed as &gt;, which the browser will safely render as >, instead of trying to interpret it as an HTML element closing bracket.
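To make that concrete, here is a minimal sketch. The menu name is a made-up value standing in for whatever is stored in the database.

```php
<?php
// Hypothetical value, as if someone had stored markup in the MenuName column.
$menuName = '<b>Pizza</b> & "Fries"';

// htmlspecialchars() turns &, <, > and double quotes into HTML entities.
$safe = htmlspecialchars($menuName);

echo '<span style="color: red">' . $safe . '</span>', "\n";
// Prints: <span style="color: red">&lt;b&gt;Pizza&lt;/b&gt; &amp; &quot;Fries&quot;</span>
```

The browser then shows the literal text `<b>Pizza</b> & "Fries"` in red instead of rendering a bold tag.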
Alternatively, you can simply write HTML directly if you wish, by closing and opening the PHP tags:
// PHP Code
?>
<span class="some-class"><?=htmlspecialchars($currentorder['MenuName'])?></span>
<?php
// More PHP Code
You may also want to look into templating engines to make it easier for you, although it depends on the project if it's worth it for you to look into that, since there is a little bit of a learning curve to it.
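For the grid layout the question asks about, the same idea can be sketched as below. This assumes Bootstrap 4 or later is loaded on the page (for the `container`, `row` and `col` classes), and the `$currentorders` array is hypothetical sample data shaped like the rows `getRows()` returns in the question.

```php
<?php
// Hypothetical rows, shaped like the result of getRows() in the question.
$currentorders = [
    ['OrderNo' => 12, 'MenuName' => 'Margherita', 'Quantity' => 2],
    ['OrderNo' => 12, 'MenuName' => 'Tiramisu <large>', 'Quantity' => 1],
];

$grid = '<div class="container">' . "\n";
foreach ($currentorders as $currentorder) {
    // One Bootstrap row per order item, one column per field.
    $grid .= '  <div class="row">' . "\n";
    foreach (['OrderNo', 'MenuName', 'Quantity'] as $field) {
        $grid .= '    <div class="col">'
               . htmlspecialchars((string) $currentorder[$field])
               . '</div>' . "\n";
    }
    $grid .= '  </div>' . "\n";
}
$grid .= '</div>' . "\n";

echo $grid;
```

Building the markup in a string first keeps the loop readable; echoing each piece directly inside the loop would work just as well.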
This question already has answers here:
image problems with regular expressions
(2 answers)
Closed 8 years ago.
I need a little bit of help. I got an assignment for school: I need to make a regular expression script which gets an image (and later uploads it to the database, but that's not the problem). The real problem is that I get an array with all images from the page, but it should be just one image, which is:
data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
this is the code from the whole image:
<li>
<a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">
<img
itemprop="image"
alt="Jesus Remember Me - Taize Songs (2CD)"
src="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
data-src-xs="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
data-src-s="/WebRoot/products/8020/80203122/bilder/80203122_s.jpg"
data-src-m="/WebRoot/products/8020/80203122/bilder/80203122_m.jpg"
data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
/>
</a>
</li>
</ul>
This is the code with PHP:
<?php
header('Content-Type: text/html; charset=utf-8');
$url = "http://www.asaphshop.nl/epages/asaphnl.sf/nl_NL/?ObjectPath=/Shops/asaphnl/Products/80203122";
$htmlcode = file_get_contents($url);
$pattern = "/<img\s[^>]*?src\s*=\s*['\"]([^'\"]*?)['\"][^>]*?>/";
preg_match_all($pattern, $htmlcode, $matches);
//print_r ($matches);
$image = ($matches[0]);
$image = str_replace('src="/', 'src="http://www.asaphshop.nl/', $image);
print_r ($image);
?>
UPDATE: in front of the imagelink must be the link to http://www.asaphshop.nl, so it will look into the site for the image. not inside my localhost. If you dont understand me, you can ask ;)
(<img\s[^>]*?data-src-l\s*=\s*['\"])([^'\"]*?['\"])([^>]*?>)
Try this. It will give the required img. Replace with $1http://www.asaphshop.nl$2$3. See the demo:
http://regex101.com/r/wQ1oW3/29
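In PHP this could look roughly like the sketch below. It is a variation on the pattern above, narrowed to capture only the data-src-l value, and the `$html` string is a shortened stand-in for the page source from the question.

```php
<?php
// Shortened stand-in for the markup fetched with file_get_contents().
$html = '<img itemprop="image" alt="Jesus Remember Me" '
      . 'src="/WebRoot/AsaphNL/80203122_xs.jpg" '
      . 'data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg" />';

// Capture group 1 holds the value of the data-src-l attribute.
$pattern = '~<img\s[^>]*?data-src-l\s*=\s*[\'"]([^\'"]*)[\'"]~';

$imageUrl = null;
if (preg_match($pattern, $html, $m)) {
    // Prefix the site so the path does not resolve against localhost.
    $imageUrl = 'http://www.asaphshop.nl' . $m[1];
}

echo $imageUrl, "\n";
// Prints: http://www.asaphshop.nl/WebRoot/products/8020/80203122/bilder/80203122.jpg
```

preg_match stops at the first match, which fits the requirement of getting one image rather than an array of all of them.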
I need a little bit of help. I got an assignment for school, I need to make a regular expression script which get an image (and later upload to the database, but that's not the problem).
Tell your school that regular expressions are not the best tool for the job.
Sure, there is this argument that regular expressions are not so regular and can be used for tasks such as palindrome matching. But that doesn't mean you should use them, since it will cause a lot of headache to you and other developers that might need to work with your code later.
What you should use instead is a proper HTML/XML parser.
Fortunately enough, PHP has what it needs, and it's called DOMDocument. Take a look at its getElementsByTagName method, for example. You could use it to retrieve images. Then you could iterate through all the attributes and parse them the way you want.
Not only is it safer, since you don't have to worry about edge cases, it's also more readable.
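A minimal sketch of that approach, using a shortened copy of the markup from the question instead of fetching the live page:

```php
<?php
// Shortened stand-in for the page source from the question.
$html = '<ul><li><a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">'
      . '<img itemprop="image" alt="Jesus Remember Me" '
      . 'src="/WebRoot/AsaphNL/80203122_xs.jpg" '
      . 'data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg" />'
      . '</a></li></ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Only keep images that carry the large-size attribute.
    if ($img->hasAttribute('data-src-l')) {
        $images[] = 'http://www.asaphshop.nl' . $img->getAttribute('data-src-l');
    }
}

print_r($images);
```

If the page contains several such images, an extra check on another attribute (for example `itemprop="image"`) may be needed to single out the right one.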
I held off on posting my problem here for a long time. I have googled a lot and found some solutions, but none that I could confirm.
First I explain My Problem.
I have a CKEditor in my site to let the users post comments. Suppose a user clicks two posts to Multi quote them, the data will be like this in CKEditor
<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text
</div>
<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text
</div>
<div class="original">
This is the Comment Text
</div>
I want to get all the elements separately in php as below
user_name = david_sa
post_id = 223423
quote_text = This is Quoted Text
user_name = richard12
post_id = 254555
quote_text = This is Quoted Text
original_comment = This is the Comment Text
I want to get the data in the above format in PHP. I have googled and found that the preg_match_all() PHP function, which uses regex to match string patterns, comes close to solving my problem. But I am not sure whether it is a legitimate and efficient solution or whether there is something better. If you have a better solution, please suggest it.
You can use DOMDocument and DOMXPath for this. It takes very few lines of code to parse HTML and extract just about anything from it.
$doc = new DOMDocument();
$doc->loadHTML(
'<html><body>' . '
<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text
</div>
<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text
</div>
<div class="original">
This is the Comment Text
</div>
' . '</body></html>');
$xpath = new DOMXPath($doc);
$quote = $xpath->query("//div[@class='quote']");
echo $quote->length; // 2
echo $quote->item(0)->getAttribute('user_name'); // david_sa
echo $quote->item(1)->getAttribute('post_id'); // 254555
// foreach($quote as $div) works as expected
$original = $xpath->query("//div[@class='original']");
echo $original->length; // 1
echo $original->item(0)->nodeValue; // This is the Comment Text
If you are not familiar with XPath syntax then here are a few examples to get you started.
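Putting the pieces together, a sketch that collects everything into the structure the question asks for might look like this (the array keys are chosen to match the names used in the question):

```php
<?php
$html = '
<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text
</div>
<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text
</div>
<div class="original">
This is the Comment Text
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML('<html><body>' . $html . '</body></html>');
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// One array entry per quoted post.
$quotes = [];
foreach ($xpath->query("//div[@class='quote']") as $div) {
    $quotes[] = [
        'user_name'  => $div->getAttribute('user_name'),
        'post_id'    => $div->getAttribute('post_id'),
        'quote_text' => trim($div->nodeValue),
    ];
}

// The user's own comment.
$original = $xpath->query("//div[@class='original']")->item(0);
$originalComment = trim($original->nodeValue);

print_r($quotes);
echo "original_comment = $originalComment\n";
```

trim() is needed because nodeValue includes the line breaks around the text inside each div.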
You should not be using regex's to process HTML/XML. This is what DOMDocument and SimpleXML are built for.
Your problem seems relatively simple, so you should be able to get away with using SimpleXML (aptly named, huh?)
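A rough sketch with SimpleXML, assuming the editor output is well-formed XML (it will fail on things like unclosed `<br>` tags or `&nbsp;`, in which case DOMDocument is the safer choice). The `<root>` wrapper name is arbitrary.

```php
<?php
$fragment = '
<div class="quote" user_name="david_sa" post_id="223423">This is Quoted Text</div>
<div class="quote" user_name="richard12" post_id="254555">This is Quoted Text</div>
<div class="original">This is the Comment Text</div>';

// SimpleXML needs a single root element, so wrap the fragment.
$xml = simplexml_load_string('<root>' . $fragment . '</root>');

$result = ['quotes' => [], 'original_comment' => null];
foreach ($xml->div as $div) {
    if ((string) $div['class'] === 'quote') {
        // Attributes are read with array syntax and cast to string.
        $result['quotes'][] = [
            'user_name'  => (string) $div['user_name'],
            'post_id'    => (string) $div['post_id'],
            'quote_text' => trim((string) $div),
        ];
    } else {
        $result['original_comment'] = trim((string) $div);
    }
}

print_r($result);
```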
Do not even try regex to parse html. I would recommend simple html dom. Get it here: php html parser
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
I am selecting a column from a mySQL table which has a value something like this:
<div class="article news group">
<p class="news_info">18/10/12</p>
<h2>Hello world!</h2>
<p>ertwrt aerteartert</p>
<p>ertaertert</p>
<p>waertwertwertwaertweart</p>
</div>
I now need to only get the 3 p tags from this - so returning:
<p>ertwrt aerteartert</p>
<p>ertaertert</p>
<p>waertwertwertwaertweart</p>
Is there an easy way to do this with php or will I need to use jQuery to remove the unwanted code.
Either way, any ideas how I would do it?
Using PHP
$html = '<div class="article news group">
<p class="news_info">18/10/12</p>
<h2>Hello world!</h2>
<p>ertwrt aerteartert</p>
<p>ertaertert</p>
<p>waertwertwertwaertweart</p>
</div>';
preg_match_all('~(<p>.+?</p>)~', $html, $match);
echo implode("\n", $match[1]);
This may help with what you need: http://api.jquery.com/siblings/
If the html you are processing is more complicated than the example you might want to consider QueryPath (http://querypath.org/)
It would be overkill for your example but can be quite useful for this type of manipulation.
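As a middle ground that needs no third-party library, the same three paragraphs can be picked out with PHP's built-in DOMDocument and DOMXPath. This is only a sketch, and it assumes the wanted paragraphs are exactly the ones without a class attribute, as in the example.

```php
<?php
$html = '<div class="article news group">
<p class="news_info">18/10/12</p>
<h2>Hello world!</h2>
<p>ertwrt aerteartert</p>
<p>ertaertert</p>
<p>waertwertwertwaertweart</p>
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$paragraphs = [];
// Select <p> children of the div that carry no class attribute.
foreach ($xpath->query('//div/p[not(@class)]') as $p) {
    // saveHTML() with a node argument serializes just that element.
    $paragraphs[] = $doc->saveHTML($p);
}

echo implode("\n", $paragraphs), "\n";
```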
I am trying to create a simple alert app for some friends.
Basically I want to be able to extract the data "price" and "stock availability" from a webpage like the following two:
http://www.sparkfun.com/commerce/product_info.php?products_id=5
http://www.sparkfun.com/commerce/product_info.php?products_id=9279
I have made the e-mail and SMS alert part, but now I want to be able to get the quantity and price out of the webpages (those two or any other ones) so that I can compare the price and quantity available and alert us to make an order if a product is between some thresholds.
I have tried some regex (found in some tutorials, but I am way too n00b for this) but haven't managed to get it working. Any good tips or examples?
$content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');
preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);
$price = $match[1];
preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match);
$in_stock = $match[1];
echo "Price: $price - Availability: $in_stock\n";
It's called screen scraping, in case you need to google for it.
I would suggest that you use a dom parser and xpath expressions instead. Feed the HTML through HtmlTidy first, to ensure that it's valid markup.
For example:
$html = file_get_contents("http://www.example.com");
$html = tidy_repair_string($html);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Now query the document:
foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
// A DOMNode cannot be echoed directly; print its text content instead.
echo $node->nodeValue, "\n";
}
What ever you do: Don't use regular expressions to parse HTML or bad things will happen. Use a parser instead.
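Applied to the price and stock question, a parser-based version could be sketched like this. The markup below is hypothetical and only loosely modelled on the patterns in the regex answer above (the "pricing" table and the quantity_on_hand hidden input); the real page may be structured differently.

```php
<?php
// Hypothetical markup standing in for the downloaded product page.
$html = '<html><body>
<table class="pricing">
<tr><th>$24.95</th><td><b>price</b></td></tr>
</table>
<form><input type="hidden" name="quantity_on_hand" value="37"></form>
</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// First header cell of the pricing table, and the hidden input's value attribute.
$priceNode = $xpath->query('//table[@class="pricing"]//th')->item(0);
$stockNode = $xpath->query('//input[@name="quantity_on_hand"]/@value')->item(0);

// item(0) returns null when nothing matched, so guard before reading.
$price   = $priceNode ? trim($priceNode->nodeValue) : null;
$inStock = $stockNode ? (int) $stockNode->nodeValue : null;

echo "Price: $price - Availability: $inStock\n";
```

The null checks matter for an alert script: if the site changes its layout, the values come back as null instead of the script failing on a missing node.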
First, this question asks for too much detail. Second, extracting data from a website might not be legitimate. However, I have some hints:
Use Firebug or Chrome/Safari Inspector to explore the HTML content and pattern of interesting information
Test your RegEx to see if it matches. You may need to do it many times (multi-pass parsing/extraction)
Write a client via cURL or even much simpler, use file_get_contents (NOTE that some hosting disable loading URLs with file_get_contents)
Personally, I'd rather use Tidy to convert to valid XHTML and then use XPath to extract data, instead of RegEx. Why? Because HTML is not a regular language and XPath is very flexible. You can also learn XSLT to transform the result.
Good luck!
You are probably best off loading the HTML code into a DOM parser like this one and searching for the "pricing" table. However, any kind of scraping you do can break whenever they change their page layout, and is probably illegal without their consent.
The best way, though, would be to talk to the people who run the site, and see whether they have alternative, more reliable forms of data delivery (Web services, RSS, or database exports come to mind).
This is the simplest method to extract data from a website. I found that all of my data sits inside <h3> tags, so I prepared this:
<?php
include('simple_html_dom.php');
// Create DOM from URL, paste your destined web url in $page
$page = 'http://facebook4free.com/category/facebookstatus/amazing-facebook-status/';
$html = new simple_html_dom();
//Within $html your webpage will be loaded for further operation
$html->load_file($page);
// Find all links
$links = array();
//Within find() function, I have written h3 so it will simply fetch the content from <h3> tag only. Change as per your requirement.
foreach($html->find('h3') as $element)
{
$links[] = $element;
}
reset($links);
//$out will be having each of HTML element content you searching for, within that web page
foreach ($links as $out)
{
echo $out;
}
?>