PHP preg_match_all for new line

I need the data "2.5 (0.5)" and "3.5".
My pattern is '/class="match_total_goal_div">.+</s'
but it is not working.
Please help.
<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_half_goal_div hide" ">
</div>
</td>
<td class="text-center corner_goal_range">
<div>
<span class="newlabel">N.A.</span>
</div>
.
.
.
<div class="match_total_goal_div">
3.5 </div>
.
.
.

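First, you need to put parentheses around your .+ so that it becomes a capture group and the desired data is captured. You also need a question mark after it (.+?) to make the quantifier lazy; otherwise .+ is greedy and, with the s modifier, matches everything up to the last < in the string.
Hope this helps: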
$str = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_total_goal_div">
3.5 </div>';
$pattern = '/class="match_total_goal_div">(.+?)</s';
preg_match_all($pattern, $str, $matches);
var_dump($matches);
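The captured values still carry the newline and trailing space from the markup. A minimal sketch of cleaning them up, assuming the same sample string as above (the $values variable is only an illustrative name):

```php
<?php
// Same sample HTML as in the answer above.
$str = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_total_goal_div">
3.5 </div>';

// Lazy capture group; the s modifier lets "." match newlines.
preg_match_all('/class="match_total_goal_div">(.+?)</s', $str, $matches);

// $matches[1] holds the first capture group of every match;
// trim() removes the surrounding whitespace and newlines.
$values = array_map('trim', $matches[1]);

print_r($values);
```

Here $matches[0] would contain the full matches, which is why the sketch reads index 1.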

Check this code to accomplish your goal
<?php
$html = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_half_goal_div hide">
</div>
<td class="text-center corner_goal_range"></td>
<div>
<span class="newlabel">N.A.</span>
</div>
<div class="match_total_goal_div">
3.5 </div>';
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'match_total_goal_div';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
echo $node->nodeValue."\n";
}
?>
Live demo : http://sandbox.onlinephpfunctions.com/code/b3e645ac56b9f7bf57d4519abd6b1be90ed87945
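Note that contains(@class, ...) is a substring test, so it would also match a class such as match_total_goal_div_extra. A hedged sketch of the common whole-token workaround in XPath 1.0 follows; the extra div and the $values variable are made up for illustration:

```php
<?php
// The middle div uses a hypothetical class that only shares a prefix.
$html = '<div class="match_total_goal_div"> 2.5 (0.5) </div>
<div class="match_total_goal_div_extra"> ignored </div>
<div class="match_total_goal_div"> 3.5 </div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$classname = 'match_total_goal_div';

// Pad the class attribute with spaces so only a whole class token can match.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

$values = [];
foreach ($xpath->query($query) as $node) {
    $values[] = trim($node->nodeValue);
}

print_r($values);
```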

Related

php parse html data from specific positions

I have a lot of these in my html document:
<div class="thumb-under">
<p class="title">
text
</p>
<p class="metadata">
<span class="bg">
<span class="first">3A
</span>
<a href="4A">
<span class="name">5A
</span></a>
<span>
<span class="spring"> -
</span> 6A
<span class="spring">something
</span>
</span>
<span class="spring"> -
</span>
</span>
</p>
</div>
So I need to extract the data from positions 1A, 2A, 3A, 4A, 5A, 6A.
I tried this, but I am doing something wrong:
$matches = array();
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('p') as $tr) {
if ( ! $tr->hasAttribute('class')) {
continue;
}
$class = explode(' ', $tr->getAttribute('class'));
if (in_array('title', $class)) {
$matches[] = $tr->getElementsByTagName('a');
}
}
print_r($matches);
I am totally lost...
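One thing that stands out in the attempt above is that it looks for <a> elements inside p.title, while the links in the sample sit inside p.metadata, and it stores a DOMNodeList rather than text. A possible sketch using DOMXPath instead, assuming a trimmed-down version of the sample markup (the array keys are illustrative names, not from the question):

```php
<?php
// Shortened version of the markup from the question.
$html = '<div class="thumb-under">
<p class="title">text</p>
<p class="metadata"><span class="bg"><span class="first">3A</span>
<a href="4A"><span class="name">5A</span></a></span></p>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$matches = [];
foreach ($xpath->query("//div[@class='thumb-under']") as $block) {
    // Each expression is evaluated relative to the current block.
    $matches[] = [
        'title' => trim($xpath->evaluate("string(./p[@class='title'])", $block)),
        'first' => trim($xpath->evaluate("string(.//span[@class='first'])", $block)),
        'href'  => $xpath->evaluate("string(.//a/@href)", $block),
        'name'  => trim($xpath->evaluate("string(.//span[@class='name'])", $block)),
    ];
}

print_r($matches);
```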

DOMXPATH Evaluate string with variable nested position

I've been using DOMXPATH and I love it, but I need it to be a little more intuitive.
Some clients add some extra HTML in their code, which screws up our project.
Example 1:
<div id="Fooen">
<span class="FooTitle">Overdracht</span>
<span class="Foo koopprijs">
<span class="FooName">Vraagprijs</span>
<span class="FooValue">€ 299.000,-</span>
</span>
<span class="Foo aanvaarding">
<span class="FooName">Aanvaarding</span>
<span class="FooValue">In overleg</span>
</span>
</div>
We can get the SPAN name and values fine with this:
$filtered = $domxpath->query("//div[@class='Fooen']/span");
foreach ($filtered as $myItem) {
$temp_name = $domxpath->evaluate("string(descendant::span[@class='FooName'])", $myItem);
$name = strtolower(preg_replace('/\s*/', '', $temp_name));
$value = $domxpath->evaluate("string(descendant::span[@class='FooValue'])", $myItem);
}
But sometimes the client adds extra markup, so the nodes end up nested deeper. I cannot seem to find an answer to this without mapping the path all the way down.
Example 2:
<div id="Fooen">
<div>
<div class="blok-sizer"></div>
<div id="" class="block">
<div class="top">
<div class="center column"></div>
</div>
<div class="middle">
<div class="center column">
<span class="FooTitle">Overdracht</span>
<span class="Foo first transactiestatus">
<span class="FooName">Status</span>
<span class="FooValue">Beschikbaar</span>
</span>
<span class="Foo koopprijs">
<span class="FooName">Vraagprijs</span>
<span class="FooValue">€ 975.000,-</span>
</span>
</div>
</div>
</div>
</div>
</div>
But now, this won't work:
$filtered = $domxpath->query("//div[@class='Fooen']/span");
foreach ($filtered as $myItem) {
$temp_name = $domxpath->evaluate("string(descendant::span[@class='FooName'])", $myItem);
$name = strtolower(preg_replace('/\s*/', '', $temp_name));
$value = $domxpath->evaluate("string(descendant::span[@class='FooValue'])", $myItem);
}
I have tried variations like these:
$domxpath->evaluate("string(descendant::*[@class='FooName'])", $myItem);
$domxpath->evaluate("string(//*[@class='FooName'])", $myItem);
$domxpath->evaluate("string(*[@class='FooName'])", $myItem);
$domxpath->evaluate("string(.//span[@class='FooName'])", $myItem);
Is there a way to get the outcome of a string, even if it is not at the same place each time, thus more flexible?
Edit, here is a ready to copy/paste sample I am currently working with. First is the working one, second is the one I'd like to get working from root to end and not fixed but flexible. If I knew how to fiddle, I would, sorry.
<?php
function getDom($url = "")
{
$str = $url;
$internalErrors = libxml_use_internal_errors(true);
$dom = new \DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($str);
libxml_use_internal_errors($internalErrors);
return $dom;
}
$domcode = '<div class="Fooen">
<span class="FooTitle">Overdracht</span>
<span class="Foo koopprijs">
<span class="FooName">Vraagprijs</span>
<span class="FooValue">€ 299.000,-</span>
</span>
<span class="Foo aanvaarding">
<span class="FooName">Aanvaarding</span>
<span class="FooValue">In overleg</span>
</span>
</div>';
$dom = getDom($domcode);
$html = '';
$domxpath = new \DOMXPath($dom);
$newDom = new \DOMDocument;
$newDom->formatOutput = true;
$filtered = $domxpath->query("//div[@class='Fooen']/span");
foreach ($filtered as $myItem) {
$temp_name = $domxpath->evaluate("string(descendant::span[@class='FooName'])", $myItem);
echo strtolower(preg_replace('/\s*/', '', $temp_name));
echo " = ";
echo $domxpath->evaluate("string(descendant::span[@class='FooValue'])", $myItem);
echo "<br>";
}
echo "<br>";
$domcode = '
<div class="Fooen">
<div>
<div class="blok-sizer"></div>
<div id="" class="block">
<div class="top">
<div class="center column"></div>
</div>
<div class="middle">
<div class="center column">
<span class="FooTitle">Overdracht</span>
<span class="Foo first transactiestatus">
<span class="FooName">Status</span>
<span class="FooValue">Beschikbaar</span>
</span>
<span class="Foo koopprijs">
<span class="FooName">Vraagprijs</span>
<span class="FooValue">€ 975.000,-</span>
</span>
</div>
</div>
</div>
</div>
</div>';
$dom = getDom($domcode);
$html = '';
$domxpath = new \DOMXPath($dom);
$newDom = new \DOMDocument;
$newDom->formatOutput = true;
$filtered = $domxpath->query("//div[@class='center column']/span");
foreach ($filtered as $myItem) {
$temp_name = $domxpath->evaluate("string(descendant::span[@class='FooName'])", $myItem);
echo "<br>";
echo strtolower(preg_replace('/\s*/', '', $temp_name));
echo " = ";
echo $domxpath->evaluate("string(descendant::span[@class='FooValue'])", $myItem);
}
Turns out I had been beating the wrong line of code all day. Apparently I needed to broaden the Filtered Search. If there's room for non-greedy code I'm all ears. Otherwise, I hope it helps somebody else.
$filtered = $domxpath->query("//div[@class='Fooen']/descendant::span");
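The broadened query returns every descendant span, including the FooTitle, FooName and FooValue spans themselves, so the loop also visits nodes that have no name/value pair inside them. A narrower sketch, assuming the wrapper spans always carry Foo as a separate class token (the sample is shortened and the € sign is left out to sidestep encoding handling):

```php
<?php
// Shortened version of the nested sample from the question.
$html = '<div class="Fooen"><div><div class="middle"><div class="center column">
<span class="FooTitle">Overdracht</span>
<span class="Foo first transactiestatus">
<span class="FooName">Status</span>
<span class="FooValue">Beschikbaar</span>
</span>
<span class="Foo koopprijs">
<span class="FooName">Vraagprijs</span>
<span class="FooValue">975.000</span>
</span>
</div></div></div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// "//" reaches any depth; the concat() trick matches Foo as a whole class token,
// so FooTitle, FooName and FooValue are not selected.
$items = $xpath->query(
    "//div[@class='Fooen']//span[contains(concat(' ', normalize-space(@class), ' '), ' Foo ')]"
);

$pairs = [];
foreach ($items as $item) {
    $name  = strtolower(preg_replace('/\s+/', '', $xpath->evaluate("string(.//span[@class='FooName'])", $item)));
    $value = trim($xpath->evaluate("string(.//span[@class='FooValue'])", $item));
    $pairs[$name] = $value;
}

print_r($pairs);
```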

How can I echo scraped div in Table format in PHP?

I am trying to scrape a table that is built from divs nested inside divs and spans. I am getting the data, but as one block of content rather than formatted as a table. I want all the data in table format, with rows and columns. How can I get this output in table format? Any help will be appreciated. Thanks in advance.
Edit: I have included the HTML table code, commented out because it is not working.
<html>
<div class="container">
<div class="table-responsive">
<table class="table table-striped table-condensed">
<thead>
<tr bgcolor='#ddbbff'>
<th>Stations</th>
<th>Day/Date</th>
<th>Arrive</th>
<th>Depart</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<?php
//error_reporting(E_ALL);
//ini_set('display_errors', '1');
$url = "https://www.railmitra.com/train-running-status/12129?day=yesterday";
$class_to_scrape="well well-sm";
$html = file_get_contents($url);
$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);
$anchors = $selector->query("/html/body//div[@class='". $class_to_scrape ."']");
echo "ok, no php syntax errors. <br>Lets see what we scraped.<br>";
foreach ($anchors as $node) {
$full_content = innerHTML($node);
echo "<br>".$full_content."<br>" ;
//echo "<tr>";
//echo "<td>" . $stations[] = $cells->item(0)->textContent . "</td>";
//echo "<td>" . $dayDate[] = $cells->item(1)->textContent . "</td>";
//echo "<td>" . $arr[] = $cells->item(2)->textContent . "</td>";
//echo "<td>" . $dep[] = $cells->item(3)->textContent . "</td>";
//echo "<td>" . $status[] = $cells->item(4)->textContent . "</td>";
}
/* This function preserves the inner content of the scraped element.
** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
** So be sure to go and give that post an uptick too :)
**/
function innerHTML(DOMNode $node)
{
$doc = new DOMDocument();
foreach ($node->childNodes as $child) {
$doc->appendChild($doc->importNode($child, true));
}
return $doc->saveHTML();
}
?>
</tbody>
</table>
</div>
</div>
</html>
edit: added html source
<div class="well well-sm">
<div class=row">
<div class="col-7 col-md-4">
<span class="ind-crossed">
<i class="fa fa-check-circle-o aria-hidden="true">
::before
</i>
</span>
" Pune Junction"
</div>
<div class="col-5 col-md-3">
<span> day1 </span>
<span> 09-sep </span>
</div>
<div class="col-2 col-md-1">
<span> 00:00 </span>
<br>
</div>
<div class="col-2 col-md-1">
<span> 18:25 </span>
<br>
</div>
<div class="col-8 col-md-3 text-right">
<span class="tl-msg text-success"> Right Time </span>
</div>
</div>
</div>
<div class="well well-sm">...</div>
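One way to get rows and columns could be to treat each "well well-sm" block as a table row and each column div inside it as a cell. The following is only a sketch under that assumption, run against a cleaned-up copy of the posted source (broken quotes fixed, one row only) instead of the live page; $rows and $out are illustrative names:

```php
<?php
// Cleaned-up copy of one station block from the posted HTML source.
$html = '<div class="well well-sm">
<div class="row">
<div class="col-7 col-md-4"><span class="ind-crossed"><i class="fa fa-check-circle-o"></i></span> Pune Junction</div>
<div class="col-5 col-md-3"><span> day1 </span><span> 09-sep </span></div>
<div class="col-2 col-md-1"><span> 00:00 </span><br></div>
<div class="col-2 col-md-1"><span> 18:25 </span><br></div>
<div class="col-8 col-md-3 text-right"><span class="tl-msg text-success"> Right Time </span></div>
</div>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query("//div[@class='well well-sm']") as $well) {
    $cells = [];
    // Every column div inside the row div becomes one table cell.
    foreach ($xpath->query("./div/div", $well) as $col) {
        // Collapse whitespace runs so "day1" and "09-sep" share one cell.
        $cells[] = trim(preg_replace('/\s+/', ' ', $col->textContent));
    }
    $rows[] = $cells;
}

// Build the <tr> markup that would go inside the existing <tbody>.
$out = '';
foreach ($rows as $cells) {
    $out .= '<tr><td>' . implode('</td><td>', array_map('htmlspecialchars', $cells)) . "</td></tr>\n";
}

echo $out;
```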

Using DomDocument to wrap an element with another element?

I have three divs inside a parent div called results_all, as you can see below:
<div id="results_all">
<div class="result_information">
</div>
<div class="result_information">
</div>
<div class="result_information">
</div>
</div>
All I want to do is wrap every div whose class is result_information in <tr><th>, so that the final result is:
<tr><th>
<div class="result_information">
</div>
</th></tr>
<tr><th>
<div class="result_information">
</div>
</th></tr>
<tr><th>
<div class="result_information">
</div>
</th></tr>
How I can do this kind of thing using DomDocument with PHP?
This will match the class using DOMXPath, it's then just a case of outputting the matched content with the appropriate tags when you loop over the results.
$html = '<div id="results_all">
<div class="result_information">
test3
</div>
<div class="result_information">
test2
</div>
<div class="result_information">
test
</div>
</div>
';
$classname = 'result_information';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[contains(@class, '$classname')]");
for($i = 0; $i < $results->length; $i++ ){
print '<tr><th><div class="result_information">' . $results->item($i)->nodeValue . '</div></th></tr>';
}
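The loop above prints new markup as a string. If the wrapping should happen inside the DOM tree itself, a sketch along these lines may work; it uses createElement and replaceChild, and the $nodes variable name is only illustrative:

```php
<?php
$html = '<div id="results_all">
<div class="result_information">test3</div>
<div class="result_information">test2</div>
<div class="result_information">test</div>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Copy the result list to an array before modifying the tree.
$nodes = iterator_to_array($xpath->query("//div[@class='result_information']"));

foreach ($nodes as $div) {
    $tr = $dom->createElement('tr');
    $th = $dom->createElement('th');
    $tr->appendChild($th);

    // Put the <tr> where the <div> was, then move the <div> into the <th>.
    $div->parentNode->replaceChild($tr, $div);
    $th->appendChild($div);
}

// Serialize only the results_all container.
echo $dom->saveHTML($xpath->query("//div[@id='results_all']")->item(0));
```

A <tr> directly inside a <div> is not valid HTML, so in practice the container would probably need to become a <table> as well.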

How to get data from HTML using regex

I have the following HTML:
<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>
I want to get value from <td class="stat stat-last"> => <div class="statnum"> = 22.
I have tried the following regex, but it does not find any match.
/<div\sclass="statnum">^(.)\?<\/div>/ig
Here's a way to accomplish this using a parser.
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { //loop through all Cells
if(strpos($cell->getAttribute('class'), 'stat-last') !== false){ // strict check: strpos() returns 0 (falsy) for a match at the start
$divs = $cell->getElementsByTagName('div');
foreach($divs as $div) { // loop through all divs of the cell
if($div->getAttribute('class') == 'statnum'){
echo $div->nodeValue;
}
}
}
}
Output:
22
...or using an xpath...
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
echo $statnum->nodeValue;
}
Output:
22
or if you realllly wanted to regex it...
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num);
echo $num[1];
Output:
22
Regex demo: https://regex101.com/r/kM6kI2/1
I think it would be better if you use an XML parser for that instead of regex. SimpleXML can do the job for you: http://php.net/manual/en/book.simplexml.php
/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si
Your match is in the first capture group. Notice the s option at the end: it makes '.' match newline characters.
You can edit your pattern like this (PHP has no g modifier; use preg_match_all to get every match):
/<div\sclass="statnum">(.*?)<\/div>/i
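As a rough illustration of that pattern in PHP, the sketch below runs it with preg_match_all on a shortened copy of the table and takes the last hit, assuming the stat-last cell is always the final one (the $nums and $last names are illustrative):

```php
<?php
// Shortened copy of the profile-stats cells from the question.
$html = '<td class="stat"><div class="statnum">8</div></td>
<td class="stat"><div class="statnum">13</div></td>
<td class="stat stat-last"><div class="statnum">22</div></td>';

// preg_match_all collects every match, which is what the g flag does elsewhere.
preg_match_all('/<div\sclass="statnum">(.*?)<\/div>/i', $html, $m);

$nums = $m[1];        // contents of the first capture group for each match
$last = end($nums);   // value from the final statnum div

print_r($nums);
echo $last;
```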
