The Simple HTML DOM library is used to extract a timestamp from a webpage. strtotime() is then used to convert the extracted timestamp to a MySQL timestamp.
Problem: When strtotime() is used on what looks like a valid timestamp, an empty value is returned (see 2: in the output below). However, when Simple HTML DOM is not used, as in the 2nd example, everything works properly.
What is happening, and how can this be fixed?
Output:
1:2013-03-03, 12:06PM
2:
3:1970-01-01 00:00:00
var_dump($time)
string(25) "2013-03-03, 12:06PM"
PHP
include_once(path('app') . 'libraries/simple_html_dom.php');
// Convert to HTML DOM object
$html = new simple_html_dom();
$html_raw = '<p class="postinginfo">Posted: <date>2013-03-03, 12:06PM EST</date></p>';
$html->load($html_raw);
// Extract timestamp
$time = $html->find('.postinginfo', 0);
$pattern = '/Posted: (.*?) (.).T/s';
$matches = '';
preg_match($pattern, $time, $matches);
$time = $matches[1];
echo '1:' . $time . '<br>';
echo '2:' . strtotime($time) . '<br>';
echo '3:' . date("Y-m-d H:i:s", strtotime($time));
2nd Example
PHP (Working, without Simple HTML DOM)
// Extract posting timestamp
$time = 'Posted: 2013-03-03, 12:06PM EST';
$pattern = '/Posted: (.*?) (.).T/s';
$matches = '';
preg_match($pattern, $time, $matches);
$time = $matches[1];
echo '1:' . $time . '<br>';
echo '2:' . strtotime($time) . '<br>';
echo '3:' . date("Y-m-d H:i:s", strtotime($time));
Output (Correct)
1:2013-03-03, 12:06PM
2:1362312360
3:2013-03-03 12:06:00
var_dump($time)
string(19) "2013-03-03, 12:06PM"
According to your var_dump(), the $time string you extracted from the HTML code is 25 characters long.
The string you see, "2013-03-03, 12:06PM", is only 19 characters long.
So, where are those 6 extra characters? Well, it's pretty obvious, really: the string you're trying to parse is really "<date>2013-03-03, 12:06PM". But when you print it into an HTML document, that <date> is parsed as an HTML tag by the browser.
To see it, use the "View Source" function in your browser. Or, much better yet, use htmlspecialchars() when printing any variables that are not supposed to contain HTML code.
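A minimal sketch of that diagnosis, assuming the extracted value really is "<date>2013-03-03, 12:06PM", as the 25-character var_dump() suggests. The strip_tags() call and the UTC time zone are choices made for this illustration, not part of the original code:

```php
<?php
// Assumption: UTC is set only so the result is reproducible.
date_default_timezone_set('UTC');

// What the regex captured from the Simple HTML DOM output (25 characters).
$time = '<date>2013-03-03, 12:06PM';

// Escaping the value makes the hidden tag visible in a browser.
echo '1:' . htmlspecialchars($time) . "<br>\n";

// Remove the leftover tag before handing the string to strtotime().
$clean = trim(strip_tags($time));
$timestamp = strtotime($clean);

echo '2:' . $timestamp . "<br>\n";
echo '3:' . date('Y-m-d H:i:s', $timestamp) . "\n";
```

Reading the element's text rather than its markup (Simple HTML DOM exposes this as the plaintext property) should also keep the tag out of the string in the first place.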
Related
My string is given below.
$str = 'Hi $name. This is a reminder of your appointment at $dateformat("h:i A") on $dateformat("M d,Y").';
Suppose the date (coming from the DB) is:
$appoinmentDate = '2019-02-02';
Now, I want to do the following with the string:
replace $name with John (the name comes from the DB)
replace $dateformat("h:i A") with 08:52 AM (the time comes from the DB)
replace $dateformat("M d,Y") with the result of date("M d,Y", $appoinmentDate), which should be Feb 02,2019
Important: $dateformat("h:i A") and $dateformat("M d,Y") are dynamic; they can contain any possible date format.
Can anyone please help me find a solution?
You could use a regular expression that parses $str looking for any word starting with a $, optionally followed by a ("...") sequence.
If there is a parenthesized sequence, take everything between the double quotes and use it as the date format in a DateTime->format() call.
Finally, do a single sprintf() with all the replacements and date/time insertions done at runtime.
EDIT
Try this test script:
<?php
// Create the test string (single quotes keep $name and $dateformat literal)
$teststr = 'Hi $name. This is a reminder of your appointment at $dateformat("h:i A") on $dateformat("M d,Y").';
// Match every $dateformat("...") and capture the format between the quotes
$regex = '/\$dateformat\("([^"]*)"\)/';
$DTmatches = array();
// Try to match all $dateformat() occurrences
if (preg_match_all($regex, $teststr, $DTmatches)) {
    echo sprintf("Found following matches: %s \r\n", print_r($DTmatches, true));
    // Replace this with your DB timestamp
    $theDT = new DateTime("now", new DateTimeZone("UTC"));
    // Count the occurrences
    $nofOccurrences = count($DTmatches[0]);
    // Replace each occurrence with the date/time in the captured format
    for ($i = 0; $i < $nofOccurrences; $i++) {
        $teststr = str_replace($DTmatches[0][$i], $theDT->format($DTmatches[1][$i]), $teststr);
    }
} else {
    echo "no dateformat() found";
}
echo sprintf("Replaced test string is: %s \r\n", $teststr);
?>
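The same idea can be sketched with preg_replace_callback(), which also leaves room for the $name placeholder. This is only an illustration: the $name and $appointment values below are hypothetical stand-ins for the database row, and the time zone is assumed to be UTC.

```php
<?php
// Hypothetical values standing in for the database row.
$name = 'John';
$appointment = new DateTime('2019-02-02 08:52:00', new DateTimeZone('UTC'));

// Single quotes keep $name and $dateformat literal in the template.
$template = 'Hi $name. This is a reminder of your appointment at $dateformat("h:i A") on $dateformat("M d,Y").';

// Replace every $dateformat("...") with the appointment in that format.
$message = preg_replace_callback(
    '/\$dateformat\("([^"]*)"\)/',
    function (array $m) use ($appointment) {
        return $appointment->format($m[1]);
    },
    $template
);

// Replace the plain $name placeholder afterwards.
$message = str_replace('$name', $name, $message);

echo $message, "\n";
```

Because the format string is taken from the capture group, any format accepted by DateTime::format() can appear inside the quotes.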
$copy_date = 'Hi test. This is a reminder of your appointment at $dateformat("h:i A") on $dateformat("M d,Y").';
$str = preg_replace('/\$dateformat\(\"h:i A\"\)/', date("h:i A"), $copy_date);
$str1 = preg_replace('/\$dateformat\(\"...../', "", $str);
$str2 = preg_replace('/\"\)/', "'Your Date'", $str1);
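In the code above, the first preg_replace() swaps $dateformat("h:i A") for the current time. The next two calls strip the remaining $dateformat("..." prefix and replace the closing ") with placeholder text, which you can change to your own date or text instead of 'Your Date'.
If you want to supply your own values for both placeholders, remove date("h:i A") from the first call and pass your own replacement there as well.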
I'm processing each commit by using:
$author = $commit['author']['raw'];
$commitMessage = $commit['message'];
$commitMessage = trim(str_replace("\n", "", $commitMessage));
$date = str_replace('T', ' ', $commit['date']) . "\n";
$date = explode("+", $date);
$date = $date[0];
$message = $author . "\n" . $commitMessage . "\n" . $date;
And I get """ prepended and appended to the array, so:
"""
some data
"""
Does anyone have any idea how to remove it?
I've tried str_replace with different needles, and also strpos to see whether the quotes are even detected.
Using explode doesn't help either...
I fixed it...
I was using dd() (since I was just testing the output of the array), and the quotes came from its dump output rather than from the data. They went away once I switched to returning the value as a normal array.
I am writing a PHP HTML page scraper and I need to find out the date the page was last updated.
I used $html = file_get_html(xyz.com) to get the HTML. One line of the HTML contains the date, like this: 10/24/2016.
I did this:
if (strpos($html, '&nbsp;') !== false) {
    if (strpos($html, ' </a>') !== false) {
        echo "How to print drawing date--here!";
    }
}
Now here is the dilemma: I cannot search for 10/24/2016, because I have no way of knowing what the new date will be when the site is updated. It could be 10/30/2016 or 11/12/2016...
Ideally, I would like the date to be in a string, like $date = "11/17/2016".
How do I search for the date itself?
This code will work for you:
preg_match('/\ ([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})/', $html, $matches);
This regex searches for a date in M/D/YYYY form preceded by a space. Any match found is stored in the $matches array, with the date itself in $matches[1].
@krasipenkov was close, but the OP asked for the date to be in a $date variable:
$html = 'lblah
balh asdf asd
<mickey mouse="disney">f3rt6wergsdfg 1/19/2016 <more stuff="here">etc
asdf';
preg_match('/\ ([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})/', $html, $matches);
$date = $matches[1];
echo "your date found is $date";
[see it run] http://sandbox.onlinephpfunctions.com/code/27419098cf4bc48a5ca2c683b046679b6c0af85c
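If the page may contain other digit groups that merely look like dates, the match can also be validated before it is used. The following is a rough sketch rather than a drop-in solution: the HTML fragment is hypothetical, and the \b word boundaries and the checkdate() call are additions that the answers above do not use.

```php
<?php
// Hypothetical page fragment standing in for the scraped HTML.
$html = '<a href="#">Drawing date:&nbsp;10/24/2016&nbsp;</a>';

$date = null;
$isoDate = null;

// Capture month, day, and year separately so they can be validated.
if (preg_match('~\b(\d{1,2})/(\d{1,2})/(\d{4})\b~', $html, $m)
    && checkdate((int) $m[1], (int) $m[2], (int) $m[3])) {
    $date = $m[0];
    // Optional: a sortable form for storage or comparison.
    $isoDate = sprintf('%04d-%02d-%02d', $m[3], $m[1], $m[2]);
}

echo $date === null ? "no date found\n" : "date: $date, ISO: $isoDate\n";
```

checkdate() rejects impossible values such as 13/45/2016, which the pattern alone would accept.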
I need a PHP script that loops through all .html files in a directory, finds the first instance of a long date in each one (e.g. August 25th, 2014), and then adds a tag with that date in short format (e.g. <p class="date">08/25/14</p>).
Has anyone done something like this before? I'm guessing you'd explode the string and use a complex case statement to convert the month names and days to regular numbers and then implode using /.
But I'm having trouble figuring out the regular expression to use for finding the first long date.
Any help or advice would be greatly appreciated!
Here's how I'd do it in semi-pseudo-code...
Loop through all the files using whatever floats your boat (glob() is an obvious choice)
Load the HTML file into a DOMDocument, eg
$doc = new DOMDocument();
$doc->loadHTMLFile($filePath);
Get the body text as a string
$body = $doc->getElementsByTagName('body');
$bodyText = $body->item(0)->textContent; // assuming there's at least one body tag
Find your date string via this regex
preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $bodyText, $matches);
Load this into a DateTime object and produce a short date string
$dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
$shortDate = $dt->format('m/d/y');
Create a <p> DOMElement with the $shortDate text content, insert it into the DOMDocument where you want and write back to the file using $doc->saveHTMLFile($filePath)
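The steps above can be sketched end to end as follows. The helper name firstLongDateAsShort() and the sample document are made up for illustration, and the sketch appends the new tag to the end of the body only because the question does not say where it should go. A real script would call loadHTMLFile() and saveHTMLFile() on each path returned by glob().

```php
<?php
// Hypothetical helper: returns the first long date in $text as m/d/y, or null.
function firstLongDateAsShort(string $text): ?string
{
    $months = 'January|February|March|April|May|June|July|August|'
            . 'September|October|November|December';
    $pattern = '/\b(' . $months . ') (\d{1,2})(?:st|nd|rd|th)?, (\d{4})\b/';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    // Rebuild the date without the ordinal suffix so one format fits both styles.
    $dt = DateTime::createFromFormat('!F j, Y', "{$m[1]} {$m[2]}, {$m[3]}");
    return $dt === false ? null : $dt->format('m/d/y');
}

// Hypothetical sample document used instead of a file on disk.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Posted on August 25th, 2014.</p></body></html>');
$body = $doc->getElementsByTagName('body')->item(0);

$shortDate = firstLongDateAsShort($body->textContent);
if ($shortDate !== null) {
    // Build <p class="date">...</p> and append it to the body.
    $p = $doc->createElement('p', $shortDate);
    $p->setAttribute('class', 'date');
    $body->appendChild($p);
}

$output = $doc->saveHTML();
echo $output;
```

Stripping the ordinal suffix before parsing means the same code path handles both "August 24th, 2005" and "August 24, 2005".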
I incorporated the helpful response above into what I already had, and it seems to work. I'm sure it's far from ideal, but it serves my purpose. It might be helpful to others:
<?php
$dir = "archive";
$a = scandir($dir);
$a = array_diff($a, array(".", ".."));
foreach ($a as $value) {
echo '<br>File name is: ' . $value . "<br><br>";
$contents = file_get_contents("archive/".$value);
if (preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $contents, $matches)) {
echo 'the date found is: ' . $matches[0] . "<br><br>";
$dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
$shortDate = $dt->format('m/d/y');
$dateTag = "\n" . '<p class="date">' . $shortDate . '</p>';
$filename ="archive/".$value;
$file = fopen($filename, "a+");
fwrite($file, $dateTag);
fclose($file);
echo 'Date tag added<br><br>';
} else {
echo "ERROR: No date found<br><br>";
}
}
?>
The code assumes the files to modify are in a directory called "archive" that resides in the same directory as the script.
I originally needed two different preg_match lines because I found out that some dates are listed with the ordinal suffix (e.g. August 24th, 2005) and some are not (e.g. August 24, 2005). I couldn't quite puzzle out how to get a single preg_match that handles both.
EDIT: replaced double preg_match with single one using \d{1,2}(st|nd|rd|th)? as suggested.
Using Zend_Gdata. For some reason, the $when string recently stopped being UTF-8, and I need to convert it to UTF-8. All the other fields are working fine.
foreach ($feed as $event) { //iterating through all events
$contentText = stripslashes($event->content->text); //stripping any escape characters
$contentText = preg_replace('/\<br \/\>[\n\t\s]{1,}\<br \/\>/','<br />',stripslashes($event->content->text)); //replacing multiple breaks with a single break
$contentText = explode('<br />',$contentText); //splitting data by break tag
$eventData = filterEventDetails($contentText);
$when = $eventData['when'];
$where = $eventData['where'];
$duration = $eventData['duration'];
$title = stripslashes($event->title);
echo '<li class="pastShows">' . $when . " - " . $title . ", " . $where . '</li>';
}
How do I make $when UTF-8?
Thanks!
Depending on which encoding that string uses, you should be able to convert it to UTF-8 with one of the following functions:
utf8_encode()
iconv()
For example:
$when = utf8_encode($eventData['when']);
Or:
$when = iconv('ISO-8859-1', 'UTF-8', $eventData['when']);
If the string is in Latin1, you can just do what Pascal suggests.
Otherwise, you need to find out which encoding it is.
To do that, check your php.ini settings, or try to detect it with mb_detect_encoding() (be aware that it is not foolproof).
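A small sketch of a more defensive conversion, assuming the mbstring extension is available and that any non-UTF-8 input is ISO-8859-1 (that source encoding is a guess and should be replaced with the real one once known). The helper name toUtf8() is made up for this example. Note that utf8_encode() only converts from ISO-8859-1 and is deprecated as of PHP 8.2, so mb_convert_encoding() is used here instead:

```php
<?php
// Hypothetical helper: leaves valid UTF-8 untouched, converts anything else
// from the assumed source encoding.
function toUtf8(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, avoid double encoding
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

// "café" encoded as ISO-8859-1 (the é is the single byte 0xE9).
$latin1 = "caf\xE9";
$utf8 = toUtf8($latin1);

echo bin2hex($utf8), "\n";
```

Checking for valid UTF-8 first avoids double-encoding strings that are already fine, which matters when only one field, such as $when, arrives in a different encoding.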