parsing .srt files - php

1
00:00:00,074 --> 00:00:02,564
Previously on Breaking Bad...
2
00:00:02,663 --> 00:00:04,393
Words...
I need to parse SRT files with PHP and print all the subtitles in the file using variables.
I couldn't find the right regular expressions. I need to capture the id, the time and the subtitle text as variables. The printed output must not contain any array() dumps or similar; it must look just like the original file.
I mean it must print like this:
$number <br> (e.g. 1)
$time <br> (e.g. 00:00:00,074 --> 00:00:02,564)
$subtitle <br> (e.g. Previously on Breaking Bad...)
By the way, I have this code, but it doesn't match the lines. How should it be edited?
$srt_file = file('test.srt',FILE_IGNORE_NEW_LINES);
$regex = "/^(\d)+ ([\d]+:[\d]+:[\d]+,[\d]+) --> ([\d]+:[\d]+:[\d]+,[\d]+) (\w.+)/";
foreach($srt_file as $srt){
preg_match($regex,$srt,$srt_lines);
print_r($srt_lines);
echo '<br />';
}

Here is a short and simple state machine for parsing the SRT file line by line:
define('SRT_STATE_SUBNUMBER', 0);
define('SRT_STATE_TIME', 1);
define('SRT_STATE_TEXT', 2);
define('SRT_STATE_BLANK', 3);
$lines = file('test.srt');
$subs = array();
$state = SRT_STATE_SUBNUMBER;
$subNum = 0;
$subText = '';
$subTime = '';
foreach($lines as $line) {
switch($state) {
case SRT_STATE_SUBNUMBER:
$subNum = trim($line);
$state = SRT_STATE_TIME;
break;
case SRT_STATE_TIME:
$subTime = trim($line);
$state = SRT_STATE_TEXT;
break;
case SRT_STATE_TEXT:
if (trim($line) == '') {
$sub = new stdClass;
$sub->number = $subNum;
list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
$sub->text = $subText;
$subText = '';
$state = SRT_STATE_SUBNUMBER;
$subs[] = $sub;
} else {
$subText .= $line;
}
break;
}
}
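if ($state == SRT_STATE_TEXT) {
// if file was missing the trailing newlines, we'll be in this
// state here. Build the last sub from the pending values and add it
// ($sub still points at the previous sub, so a new object is needed).
$sub = new stdClass;
$sub->number = $subNum;
list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
$sub->text = $subText;
$subs[] = $sub;
}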
print_r($subs);
Result:
Array
(
[0] => stdClass Object
(
[number] => 1
[stopTime] => 00:00:24,400
[startTime] => 00:00:20,000
[text] => Altocumulus clouds occur between six thousand
)
[1] => stdClass Object
(
[number] => 2
[stopTime] => 00:00:27,800
[startTime] => 00:00:24,600
[text] => and twenty thousand feet above ground level.
)
)
You can then loop over the array of subs or access them by array offset:
echo $subs[0]->number . ' says ' . $subs[0]->text . "\n";
To show all subs by looping over each one and displaying it:
foreach($subs as $sub) {
echo $sub->number . ' begins at ' . $sub->startTime .
' and ends at ' . $sub->stopTime . '. The text is: <br /><pre>' .
$sub->text . "</pre><br />\n";
}
Further reading: SubRip Text File Format

Group the file() array into chunks of 4 using array_chunk(), then ignore the last entry of each chunk, since it's a blank line. Note that this assumes every subtitle has exactly one line of text:
foreach( array_chunk( file( 'test.srt'), 4) as $entry) {
list( $number, $time, $subtitle) = $entry;
echo $number . '<br />';
echo $time . '<br />';
echo $subtitle . '<br />';
}

That is not going to match because your $srt_file array might look like this:
Array
([0] => '1',
[1] => '00:00:00,074 --> 00:00:02,564',
[2] => 'Previously on Breaking Bad...',
[3] => '',
[4] => '2',
...
)
Your regex isn't going to match any of those elements, because it expects the id, the time range and the text to all be on the same line.
If your intent is to read the entire file into one long memory-hog-of-a-string, then use file_get_contents to get the entire file contents into one string. Then use preg_match_all to get all the regex matches.
Otherwise you might loop through the array and try to match various regex patterns to determine whether each line is an id, a time range, or text, and handle it appropriately. Obviously you might also want some logic to make sure you are getting values in the right order (id, then time range, then text).
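The file_get_contents plus preg_match_all approach could look roughly like the sketch below. The function name parseSrtWithRegex and the inline sample string are assumptions made for illustration; the pattern also assumes well-formed timestamps of the form HH:MM:SS,mmm.

```php
<?php
// Hypothetical helper: returns one ['id', 'time', 'text'] array per subtitle.
function parseSrtWithRegex(string $srt): array
{
    // Normalise line endings so the pattern only has to deal with "\n".
    $srt = str_replace(["\r\n", "\r"], "\n", $srt);

    $pattern = '/(\d+)\n'                                                 // id line
             . '(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n'  // time range line
             . '(.*?)(?=\n{2,}|\n?\z)/s';                                 // text up to a blank line or EOF

    preg_match_all($pattern, $srt, $matches, PREG_SET_ORDER);

    $subs = [];
    foreach ($matches as $m) {
        $subs[] = ['id' => $m[1], 'time' => $m[2], 'text' => $m[3]];
    }
    return $subs;
}

// Inline sample standing in for file_get_contents('test.srt').
$srt = "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
     . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n";

foreach (parseSrtWithRegex($srt) as $sub) {
    echo $sub['id'] . "<br />\n";
    echo $sub['time'] . "<br />\n";
    echo nl2br($sub['text']) . "<br />\n";
}
```

Because the text group is matched lazily up to the next blank line, subtitles that span several lines stay together in one 'text' value.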

I made a class to convert a .srt file to array.
Each entry of the array has the following properties:
id: a number representing the id of the subtitle (2)
start: float, the start time in seconds (24.443)
end: float, the end time in seconds (27.647)
startString: the start time in human readable format (00:00:24.443)
endString: the end time in human readable format (00:00:27.647)
duration: the duration of the subtitle, in ms (3204)
text: the text of the subtitle (the Peacocks ruled over Gongmen City.)
The code is php7:
<?php
namespace VideoSubtitles\Srt;
class SrtToArrayTool
{
public static function getArrayByFile(string $file): array
{
$ret = [];
$gen = function ($filename) {
$file = fopen($filename, 'r');
while (($line = fgets($file)) !== false) {
yield rtrim($line);
}
fclose($file);
};
$c = 0;
$item = [];
$text = '';
$n = 0;
foreach ($gen($file) as $line) {
if ('' !== $line) {
if (0 === $n) {
$item['id'] = $line;
$n++;
}
elseif (1 === $n) {
$p = explode('-->', $line);
$start = str_replace(',', '.', trim($p[0]));
$end = str_replace(',', '.', trim($p[1]));
$startTime = self::toMilliSeconds(str_replace('.', ':', $start));
$endTime = self::toMilliSeconds(str_replace('.', ':', $end));
$item['start'] = $startTime / 1000;
$item['end'] = $endTime / 1000;
$item['startString'] = $start;
$item['endString'] = $end;
$item['duration'] = $endTime - $startTime;
$n++;
}
else {
if ($n >= 2) {
if ('' !== $text) {
$text .= PHP_EOL;
}
$text .= $line;
}
}
}
else {
if (0 !== $n) {
$item['text'] = $text;
$ret[] = $item;
$text = '';
$n = 0;
}
}
$c++;
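}
// Flush the last subtitle when the file does not end with a blank line.
if (0 !== $n) {
$item['text'] = $text;
$ret[] = $item;
}
return $ret;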
}
private static function toMilliSeconds(string $duration): int
{
$p = explode(':', $duration);
return (int)$p[0] * 3600000 + (int)$p[1] * 60000 + (int)$p[2] * 1000 + (int)$p[3];
}
}
Or check it out here: https://github.com/lingtalfi/VideoSubtitles

You can use this project: https://github.com/captioning/captioning
Sample code:
<?php
require_once __DIR__.'/../vendor/autoload.php';
use Captioning\Format\SubripFile;
try {
$file = new SubripFile('your_file.srt');
foreach ($file->getCues() as $line) {
echo 'start: ' . $line->getStart() . "<br />\n";
echo 'stop: ' . $line->getStop() . "<br />\n";
echo 'startMS: ' . $line->getStartMS() . "<br />\n";
echo 'stopMS: ' . $line->getStopMS() . "<br />\n";
echo 'text: ' . $line->getText() . "<br />\n";
echo "=====================<br />\n";
}
} catch(Exception $e) {
echo "Error: ".$e->getMessage()."\n";
}
Sample output:
> php index.php
start: 00:01:48,387<br />
stop: 00:01:53,269<br />
startMS: 108387<br />
stopMS: 113269<br />
text: [Persian subtitle text, mis-encoded in the console output]
=====================<br />
start: 00:02:09,360<br />
stop: 00:02:12,021<br />
startMS: 129360<br />
stopMS: 132021<br />
text: [two lines of Persian subtitle text, mis-encoded in the console output]<br />
=====================<br />
start: 00:02:12,022<br />
stop: 00:02:14,725<br />
startMS: 132022<br />
stopMS: 134725<br />
text: [two lines of Persian subtitle text, mis-encoded in the console output]<br />
=====================<br />

It can be done by splitting on the blank lines between subtitles.
I did it successfully; here is my code:
$srt=preg_split("/\\r\\n\\r\\n/",trim($movie->SRT));
$result[$i]['IMDBID']=$movie->IMDBID;
$result[$i]['TMDBID']=$movie->TMDBID;
Here $movie->SRT holds the subtitle text in the format you posted in this question.
Each subtitle is separated from the next by two line breaks, so every element of $srt is one subtitle block.
Hope this answers your question.
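A minimal sketch of the same blank-line idea, extended so that each block is broken into number, time and subtitle. The function name splitSrtBlocks and the sample string are assumptions for illustration. Line endings are normalised first, so the split does not depend on the file using \r\n.

```php
<?php
// Hypothetical helper: split SRT text into blocks on blank lines, then
// split every block into its number, time range and subtitle text.
function splitSrtBlocks(string $srt): array
{
    // Convert Windows and old Mac line endings to "\n".
    $normalized = str_replace(["\r\n", "\r"], "\n", trim($srt));

    $cues = [];
    foreach (preg_split('/\n{2,}/', $normalized) as $block) {
        $lines = explode("\n", $block);
        if (count($lines) < 3) {
            continue; // not a complete subtitle block
        }
        $cues[] = [
            'number'   => $lines[0],
            'time'     => $lines[1],
            'subtitle' => implode("\n", array_slice($lines, 2)),
        ];
    }
    return $cues;
}

// Inline sample with Windows line endings, standing in for the real file.
$sample = "1\r\n00:00:00,074 --> 00:00:02,564\r\nPreviously on Breaking Bad...\r\n\r\n"
        . "2\r\n00:00:02,663 --> 00:00:04,393\r\nWords...\r\n";

foreach (splitSrtBlocks($sample) as $cue) {
    echo $cue['number'] . "<br />\n";
    echo $cue['time'] . "<br />\n";
    echo nl2br($cue['subtitle']) . "<br />\n";
}
```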

Simple, natural, trivial solution
srt subs look like this, and are separated by two newlines:
3
00:00:07,350 --> 00:00:09,780
The ability to destroy a planet is
nothing next to the power of the force
Obviously you want to parse the time. This example is in Java, where DateFormat.parse already exists, so that part takes no effort.
class Sub {
Float start; // boxed Float, so it can hold null when parsing fails
String text;
Sub(String block) {
this.start = null; this.text = null;
String[] lines = block.split("\n");
if (lines.length < 3) { return; }
String timey = lines[1].replaceAll(" .+$", "");
try {
DateFormat dateFormat = new SimpleDateFormat("HH:mm:ss,SSS");
Date zero = dateFormat.parse("00:00:00,000");
Date date = dateFormat.parse(timey);
this.start = (float)(date.getTime() - zero.getTime()) / 1000f;
} catch (ParseException e) {
e.printStackTrace();
}
this.text = TextUtils.join(" ", Arrays.copyOfRange(lines, 2, lines.length) );
}
}
Obviously, to get all the subs in the file
List<Sub> subs = new ArrayList<>();
String[] tt = fileText.split("\n\n");
for (String s:tt) { subs.add(new Sub(s)); }

Related

Error message for processing big data

I have a problem when I process files in PHP. With small files the output is displayed, but with large files I get an error like this:
PHP Fatal error: Out of memory (allocated 1068498944) (tried to allocate 133955584 bytes) in C:\xampp\htdocs\ujian_online\application\views\data_soal\view_proses.php on line 644
The code looks like this:
<?php
$no_urut = 0;
//print_r($array_query_simantic_reletedness);
foreach ($val_simantic_reletedness as $doc_simantic_word) {
foreach ($array_query_simantic_reletedness as $text_simantic_current_key => $text_simantic_current_val) {
//get fix val
$status = 0;
$spe = 0;
$sr = 0;
if ($text_simantic_current_key == $doc_simantic_word) {
$status++;
$spe = 1;
}
$where_get_kamus = array('kata' => $text_simantic_current_key);
$get_kamus = $this->master->find_data($where_get_kamus, 'tb_kamus')->row_array();
// print_r($get_kamus);
//split kamus by ';'
$kamus_synonim = explode(';', $get_kamus['synonim']);
//split kata by '.'
foreach ($kamus_synonim as $key_word_split => $val_word_split) {
$kata_current_kamus = explode(';', $val_word_split);
//print_r($kata_current_kamus);
foreach ($kata_current_kamus as $key_current_word_doc => $val_current_word_doc) {
//split by doc
$word_split_doc = explode('.', $val_current_word_doc);
if (empty($word_split_doc[0])) {
continue;
}
if (isset($word_split_doc[1])) {
$word_split_slash = explode('|', $word_split_doc[1]);
}
// print_r($word_split_slash);
// ---- foreach for checking ---
foreach ($word_split_slash as $val_current_word_check) {
//print_r($val_current_word_check);
$val_current_word_check = str_replace(' ', '', $val_current_word_check);
$val_current_word_check = str_replace("\r", '', $val_current_word_check);
$val_current_word_check = str_replace("\t", '', $val_current_word_check);
$val_current_word_check = str_replace("\n", '', $val_current_word_check);
// print_r($val_current_word_check);
if ($doc_simantic_word == strtolower($val_current_word_check)) {
$status++;
// penentuan dmax
$where_get_spe_check = array('kata' => $doc_simantic_word);
$get_word_spe = $this->master->find_data($where_get_spe_check, 'tb_kamus')->row_array();
if ($get_kamus['dmax'] > $get_word_spe['dmax']) {
$dmax_spe = $get_kamus['dmax'];
} else {
$dmax_spe = $get_word_spe['dmax'];
}
//count spe
$spe = $get_kamus['dmax'] / $dmax_spe;
}
}
//---- end foreach for checking ----------------
//echo '<br>';
}
}
$no_urut++;
if ($status > 0) {
$val_view = 1;
} else {
$val_view = 0;
$spe = 0;
}
$sr = $spe * $val_view;
echo "
<tr>
<td>$no_urut</td>
<td>$text_simantic_current_key " . '=' . " $doc_simantic_word</td>
<td>$val_view</td>
<td>$spe</td>
<td>$sr</td>
</tr>
";
// echo '||';echo $no_urut;echo '|'; echo $text_simantic_current_key ; echo '=' ;echo $doc_simantic_word;echo '|'; echo $val_view;echo '|'; echo $spe;echo '|'; echo $sr;
}
}//end simantic word doc
?>
The highlighted line 644 is:
echo "
<tr>
<td>$no_urut</td>
<td>$text_simantic_current_key " . '=' . " $doc_simantic_word</td>
<td>$val_view</td>
<td>$spe</td>
<td>$sr</td>
</tr>
";
But when I check my code at https://phpcodechecker.com/
I get a different notice:
Error: There is 1 more opening parenthesis '(' found
This count is unaware if parenthesis are inside of a string
and it points at this code:
$get_data_tema = $this->master->find_data(array('id_tema'=>$key_val_final),'tb_tema')->row_array();
echo 'nama tema " <b>'.$get_data_tema['nama'].'</b>"';
}
$array_post_input = array('text_soal'=> $data_post['text_soal'],
'a' =>$data_post['a'],
'b' =>$data_post['b'],
'c' =>$data_post['c'],
'd' =>$data_post['d'],
'e' =>$data_post['e'],
'answer' =>$data_post['answer'],
'guru_id'=> $this->session->userdata('id_current'),
'tema_id'=>0,
'mapel_id' =>$data_post['mapel_id'],
'gbrsoal' =>$data_post['gbrsoal'],'label_id'=>0,
);
$this->master->insert_data('tb_soal',$array_post_input);
I am using XAMPP v3.2.2 on Windows 8.1 with 8 GB of RAM (6.94 GB usable), PHP 5.6.15 and the CodeIgniter framework, and I have set the memory limit in php.ini to 51200000000000 M. What should I do to fix my problem? Which line must I repair? Thanks.
Solution for the first problem: With a large amount of data, the printing process can be very resource expensive. So, save the printable parts into an array and print them once, after all foreach iterations are finished. Note: look for $results in my code.
<?php
$results = [];
$no_urut = 0;
foreach ($val_simantic_reletedness as $doc_simantic_word) {
foreach ($array_query_simantic_reletedness as $text_simantic_current_key => $text_simantic_current_val) {
//...
$sr = $spe * $val_view;
$results[] = "<tr>
<td>$no_urut</td>
<td>$text_simantic_current_key " . '=' . " $doc_simantic_word</td>
<td>$val_view</td>
<td>$spe</td>
<td>$sr</td>
</tr>";
}
}
echo implode('', $results);
Solution for the second problem (if I understood it correctly): remove the "}" character which appears right after the line
echo 'nama tema " <b>' . $get_data_tema['nama'] . '</b>"';
Good luck.
Try inserting ini_set('memory_limit', '1024M'); in the PHP file.

PHP - Searching words in a .txt file

I have just learnt some basic HTML and PHP skills and I hope someone can help me.
I created an HTML file (a.html) with a form which allows students to input their name, student id, class, and class number.
Then I created a PHP file (a.php) to save the information from a.html into the info.txt file in the following format:
name1,id1,classA,1
name2,id2,classB,24
name3,id3,classA,15
and so on (the above part has been completed with no problem).
After that I created another HTML file (b.html), which requires users to enter their name and id in a form.
For example, if the user inputs name2 and id2 in the form, then the PHP file (b.php) should print the result:
Class: classB
Class Number: 24
I have no idea how to match both the name and the id at the same time in the txt file and return the result in b.php.
example data:
name1,id1,classA,1
name2,id2,classB,24
name3,id3,classA,15
<?php
$name2 = $_POST['name2'];
$id2 = $_POST['id2'];
$data = file_get_contents('info.txt');
if($name2!='')
$konum = strpos($data, $name2);
elseif($id2!='')
$konum = strpos($data, $id2);
if($konum!==false){
$end = strpos($data, "\n", $konum);
$start = strrpos($data, "\n", (0-$end));
$row_string = substr($data, $start, ($end - $start));
$row = explode(",",$row_string);
echo 'Class : '.$row[2].'<br />';
echo 'Number : '.$row[3].'<br />';
}
?>
Iterate through lines until you find your match. Example:
<?php
$csv=<<<CSV
John,1,A
Jane,2,B
Joe,3,C
CSV;
$data = array_map('str_getcsv', explode("\n", $csv));
$get_name = function($number, $letter) use ($data) {
foreach($data as $row)
if($row[1] == $number && $row[2] == $letter)
return $row[0];
};
echo $get_name('3', 'C');
Output:
Joe
You could use some simple regex. For example:
<?php
$search_name = (isset($_POST['name'])) ? $_POST['name'] : exit('Name input required.');
$search_id = (isset($_POST['id'])) ? $_POST['id'] : exit('ID input required.');
// First we load the data of info.txt
$data = file_get_contents('info.txt');
// Then we create a array of lines
$lines = preg_split('#\\n#', $data);
// Now we can loop the lines
foreach($lines as $line){
// Now we split the line into parts using the , separator
$line_parts = preg_split('#\,#', $line);
// $line_parts[0] contains the name, $line_parts[1] contains the id
if($line_parts[0] == $search_name && $line_parts[1] == $search_id){
echo 'Class: '.$line_parts[2].'<br>';
echo 'Class Number: '.$line_parts[3];
// No need to execute the script any further.
break;
}
}
You can run this; I think it is what you need. If your form uses POST, change $_GET to $_POST.
<?php
$name = $_GET['name'];
$id = $_GET['id'];
$students = fopen('info.txt', 'r');
echo "<pre>";
// read each line of the file one by one
while( $student = fgets($students) ) {
// split the file and create an array using the ',' delimiter
$student_attrs = explode(',',$student);
// first element of the array is the user name and second the id
if($student_attrs[0]==$name && $student_attrs[1]==$id){
$result = $student_attrs;
// stop the loop when it is found
break;
}
}
fclose($students);
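// $result is only set when a matching line was found
if (isset($result)) {
echo "Class: ".$result[2]."\n";
echo "Class Number: ".trim($result[3])."\n";
} else {
echo "Not found\n";
}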
echo "</pre>";
strpos can help you find a match in your file. This script assumes you used line feed characters to separate the lines in your text file, and that each name/id pairing is unique in the file.
if ($_POST) {
// the trailing comma stops "id2" from also matching a line that starts with "name2,id22"
$str = $_POST["name"] . "," . $_POST["id"] . ",";
$file = file_get_contents("info.txt");
$data = explode("\n", $file);
$result = array();
$length = count($data);
$i = 0;
do {
$match = strpos($data[$i], $str, 0);
if ($match === 0) {
$result = explode(",", $data[$i]);
}
} while (!$result && (++$i < $length));
if ($result) {
print "Class: " . $result[2] . "<br />" . "Class Number: " . $result[3];
} else {
print "Not found";
}
}
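The answers above all come down to the same step: split each line on the comma and compare the first two fields with the submitted name and id. A compact sketch of that step follows, with the hypothetical helper findStudent and an inline array standing in for the lines of info.txt:

```php
<?php
// Hypothetical helper: return the row whose first two fields equal
// $name and $id, or null when no line matches both.
function findStudent(array $lines, string $name, string $id): ?array
{
    foreach ($lines as $line) {
        // trim() removes stray spaces and line endings around each field
        $row = array_map('trim', explode(',', $line));
        if (count($row) >= 4 && $row[0] === $name && $row[1] === $id) {
            return $row;
        }
    }
    return null;
}

// In b.php these lines would come from file('info.txt', FILE_IGNORE_NEW_LINES).
$lines = ['name1,id1,classA,1', 'name2,id2,classB,24', 'name3,id3,classA,15'];

$row = findStudent($lines, 'name2', 'id2');
if ($row !== null) {
    echo 'Class: ' . $row[2] . "<br />\n";
    echo 'Class Number: ' . $row[3] . "<br />\n";
} else {
    echo 'Not found';
}
```

Comparing whole fields with === avoids the partial matches that a plain strpos search over the file contents can produce.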

How to concatenate string continuously in php?

<?php
$test = ' /clothing/men/tees';
$req_url = explode('/', $test);
$c = count($req_url);
$ex_url = 'http://www.test.com/';
for($i=1; $c > $i; $i++){
echo '/'.'<a href="'.$ex_url.'/'.$req_url[$i].'">
<span>'.ucfirst($req_url[$i]).'</span>
</a>';
//echo '<br/>'.$ex_url;....//last line
}
?>
OUTPUT 1 (with the last line commented out):
/ Clothing / Men / Tees
OUTPUT 2 (with the last line uncommented, so $ex_url is shown):
/ Clothing
http://www.test.com// Men
http://www.test.com// Tees
http://www.test.com/
1. Required output:
The spans should read / Clothing / Men / Tees, the last element should not be clickable,
and the links should be built like this:
http://www.test.com/clothing/Men/tees -- when clicking on Tees
http://www.test.com/clothing/Men -- when clicking on Men
...and so on.
2. Why does OUTPUT 2 come out like that?
Try this:
<?php
$test = '/clothing/men/tees';
$url = 'http://www.test.com';
foreach(preg_split('!/!', $test, -1, PREG_SPLIT_NO_EMPTY) as $e) {
$url .= '/'.$e;
echo '/<span>'.ucfirst($e).'</span>';
}
?>
Output:
/Clothing/Men/Tees
HTML output:
/<span>Clothing</span>/<span>Men</span>/<span>Tees</span>
Try using foreach() to iterate the array and you'll have to keep track of the path after the url. Try it like so (tested and working code):
<?php
$test = '/clothing/men/tees';
$ex_url = 'http://www.test.com';
$items = explode('/', $test);
array_shift($items);
$path = '';
foreach($items as $item) {
$path .= '/' . $item;
echo '/ <span>' . ucfirst($item) . '</span>';
}
Try this.
<?php
$test = '/clothing/men/tees';
$req_url = explode('/', ltrim($test, '/'));
$ex_url = 'http://www.test.com/';
$stack = array();
$result = array_map(function($part) use($ex_url, &$stack) {
$stack[] = $part;
// $ex_url already ends with a slash, so the path is appended directly
return sprintf('<a href="%s%s">%s</a>', $ex_url, implode('/', $stack), ucfirst($part));
}, $req_url);
print_r($result);
<?php
$sTest= '/clothing/men/tees';
$aUri= explode( '/', trim( $sTest, '/' ) ); // trim avoids an empty first segment
$sBase= 'http://www.test.com'; // No trailing slash
$sPath= $sBase; // Will grow per loop iteration
foreach( $aUri as $sDir ) {
$sPath.= '/'. $sDir;
echo ' / <a href="'. $sPath. '">'. ucfirst( $sDir ). '</a>'; // Unnecessary <span>
}
?>
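None of the snippets above leaves the last element unlinked, which the question asks for. One way to get there is sketched below. The function name buildBreadcrumb is an assumption, and the markup (a span inside each link) is only one possible reading of the required output.

```php
<?php
// Hypothetical helper: build " / Clothing / Men / Tees" where every segment
// except the last one links to its cumulative path.
function buildBreadcrumb(string $baseUrl, string $path): string
{
    // Strip surrounding spaces and slashes, then drop any empty segments.
    $segments = array_values(array_filter(explode('/', trim($path, " /")), 'strlen'));
    $last = count($segments) - 1;

    $url  = rtrim($baseUrl, '/'); // grows by one segment per iteration
    $html = '';
    foreach ($segments as $i => $segment) {
        $url  .= '/' . $segment;
        $label = '<span>' . htmlspecialchars(ucfirst($segment)) . '</span>';
        $html .= ' / ' . ($i === $last
            ? $label
            : '<a href="' . htmlspecialchars($url) . '">' . $label . '</a>');
    }
    return $html;
}

echo buildBreadcrumb('http://www.test.com/', ' /clothing/men/tees');
```

The key point is that the URL is concatenated onto a variable that persists across iterations, while the decision to link or not depends only on whether the current index is the last one.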

find a element in html and explode it for stock

I want to retrieve an HTML element in a page.
<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>
I have to get the total number of results for a test in my PHP code.
For now, I take everything between the h2 tags and explode it on spaces.
Then I explode the result again on the comma and concatenate the pieces, so the number loses its thousands separator. Once that's done, I test the number of results.
define("MAX_RESULT_ALL_PAGES", 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$htmlResultCountPage = file_get_html($queryUrl);
$htmlResultCount = $htmlResultCountPage->find("h2[id=resultCount]");
$resultCountArray = explode(" ", $htmlResultCount[0]);
$explodeCount = explode(',', $resultCountArray[5]);
$europeFormatCount = '';
foreach ($explodeCount as $val) {
$europeFormatCount .= $val;
}
if ($europeFormatCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
At the moment the total number of results is not retrieved correctly, and the condition is never met even when it should be.
Does anyone have a solution to this problem, or another way to do it?
I would simply fetch the page as a string (not html) and use a regular expression to get the total number of results. The code would look something like this:
define('MAX_RESULT_ALL_PAGES', 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);
if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
$totalResults = (int) str_replace(',', '', $matches[1]);
} else {
throw new \RuntimeException('Total number of results not found');
}
if ($totalResults > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
// ...
}
A regex would do it:
...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...
Please try this code.
define("MAX_RESULT_ALL_PAGES", 1200);
// new dom object
$dom = new DOMDocument();
// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);
//load the html
$html = $dom->loadHTML($html_string);
//discard white space
$dom->preserveWhiteSpace = FALSE;
//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');
// Store total result count
$totalCount = 0;
// loop over the all h2 tags and print result
foreach ($nodes as $node) {
if ($node->hasAttributes()) {
foreach ($node->attributes as $attribute) {
if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
$inner_html = str_replace(',', '', trim($node->nodeValue));
$inner_html_array = explode(' ', $inner_html);
// Add the count (the 6th word) to the running total
$totalCount += $inner_html_array[5];
}
}
}
}
// If result count greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
$queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
Give this a try:
$match =array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount[0], $match);
$europeFormatCount = str_replace(',','',$match[0]);
The regex reads the number between "of " and " Results"; it matches numbers that use ',' as the thousands separator.
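To try the regex approach without fetching a page, here is a small self-contained sketch that runs against the h2 snippet from the question. The function name extractTotalResults is an assumption for illustration.

```php
<?php
// Hypothetical helper: pull the total out of "... of 40,923 Results".
// Returns null when the text does not contain such a phrase.
function extractTotalResults(string $html): ?int
{
    if (preg_match('/of\s+(\d{1,3}(?:,\d{3})*)\s+Results/', $html, $m)) {
        // Drop the thousands separators before casting to int.
        return (int) str_replace(',', '', $m[1]);
    }
    return null;
}

$html = '<h2 id="resultCount" class="resultCount">
<span>
Showing 1 - 12 of 40,923 Results
</span>
</h2>';

$total = extractTotalResults($html);
if ($total !== null && $total > 1200) {
    echo "More than 1200 results: $total\n";
}
```

Casting to int matters here: comparing the raw string "40,923" with a number would not give the intended result.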

Need some help with XML parsing

The XML feed is located at: http://xml.betclick.com/odds_fr.xml
I need a php loop to echo the name of the match, the hour, and the bets options and the odds links.
The function will select and display ONLY the matchs of the day with streaming="1" and the bets type "Ftb_Mr3".
I'm new to xpath and simplexml.
Thanks in advance.
So far I have:
<?php
$xml_str = file_get_contents("http://xml.betclick.com/odds_fr.xml");
$xml = simplexml_load_string($xml_str);
// need xpath magic
$xml->xpath();
// display
?>
XPath is pretty simple once you get the hang of it.
You basically want to get every match tag with a certain attribute:
//match[@streaming=1]
This will work perfectly; it gets every match tag anywhere in the document whose streaming attribute equals 1.
I just realised you also want matches with a bet type of "Ftb_Mr3":
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]
This returns the bet node, though; we want the match, which we know is the grandparent:
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..
The two dots work like they do in file paths and get us back up to the match.
To work this into your sample, just change the final bit to
// need xpath magic
$nodes = $xml->xpath('//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..');
foreach($nodes as $node) {
echo $node['name'].'<br/>';
}
to print all the match names.
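Here is a self-contained sketch of the same query that also restricts the matches to one day. The inline XML is an assumption about the feed's shape (sport > event > match > bets > bet, with a start_date attribute, as the CSV answer further down suggests), not a copy of the real feed, and the date is hard-coded so the example is reproducible. A nested predicate is used in place of /../.. to select the match directly.

```php
<?php
// Assumed structure, for illustration only.
$xmlStr = <<<XML
<sports>
  <sport name="Football">
    <event name="Ligue 1">
      <match name="A - B" streaming="1" start_date="2024-05-01T20:00:00">
        <bets><bet code="Ftb_Mr3" name="Match Result"/></bets>
      </match>
      <match name="C - D" streaming="0" start_date="2024-05-01T21:00:00">
        <bets><bet code="Ftb_Mr3" name="Match Result"/></bets>
      </match>
      <match name="E - F" streaming="1" start_date="2024-05-02T18:00:00">
        <bets><bet code="Ftb_Csc" name="Correct Score"/></bets>
      </match>
    </event>
  </sport>
</sports>
XML;

$xml = simplexml_load_string($xmlStr);

// In real use this would be date('Y-m-d').
$today = '2024-05-01';

// Matches that are streamed, start today, and have an Ftb_Mr3 bet.
$query = sprintf(
    '//match[@streaming=1][starts-with(@start_date, "%s")][bets/bet[@code="Ftb_Mr3"]]',
    $today
);

$names = [];
foreach ($xml->xpath($query) as $match) {
    $names[] = (string) $match['name'];
    echo $match['name'] . ' at ' . $match['start_date'] . "<br/>\n";
}
```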
I don't know how to work xpath really, but if you want to 'loop it', this should get you started:
<?php
$xml = simplexml_load_file("odds_fr.xml");
foreach ($xml->children() as $child)
{
foreach ($child->children() as $child2)
{
foreach ($child2->children() as $child3)
{
foreach($child3->attributes() as $a => $b)
{
echo $a,'="',$b,"\"</br>";
}
}
}
}
?>
That gets you to the 'match' tag which has the 'streaming' attribute. I don't really know what 'matches of the day' are, either, but...
It's basically right out of the w3schools SimpleXML reference:
http://www.w3schools.com/PHP/php_ref_simplexml.asp
I am using this on a project, scraping Betclic odds with:
<?php
$match_csv = fopen('matches.csv', 'w');
$bet_csv = fopen('bets.csv', 'w');
$xml = simplexml_load_file('http://xml.cdn.betclic.com/odds_en.xml');
$bookmaker = 'Betclick';
foreach ($xml as $sport) {
$sport_name = $sport->attributes()->name;
foreach ($sport as $event) {
$event_name = $event->attributes()->name;
foreach ($event as $match) {
$match_name = $match->attributes()->name;
$match_id = $match->attributes()->id;
$match_start_date_str = str_replace('T', ' ', $match->attributes()->start_date);
$match_start_date = strtotime($match_start_date_str);
if (!empty($match->attributes()->live_id)) {
$match_is_live = 1;
} else {
$match_is_live = 0;
}
if ($match->attributes()->streaming == 1) {
$match_is_running = 1;
} else {
$match_is_running = 0;
}
// pass the fields as an array so that commas inside names do not split a column
$match_row = array((string)$match_id, $bookmaker, (string)$sport_name, (string)$event_name, (string)$match_name, $match_start_date, $match_is_live, $match_is_running);
fputcsv($match_csv, $match_row);
foreach ($match as $bets) {
foreach ($bets as $bet) {
$bet_name = $bet->attributes()->name;
foreach ($bet as $choice) {
// team numbers are surrounded by %, we strip them
$choice_name = str_replace('%', '', $choice->attributes()->name);
// get the float value of odds
$odd = (float)$choice->attributes()->odd;
// build the row as an array so that commas inside names do not split a column
$bet_row = array((string)$match_id, (string)$bet_name, $choice_name, $odd);
fputcsv($bet_csv, $bet_row);
}
}
}
}
}
}
fclose($match_csv);
fclose($bet_csv);
?>
Then loading the csv files into mysql. Running it once a minute, works great so far.
