I have a strange problem...
I would like to search in a logfile.
$lines = file($file);
$sampleName = "T3173sGas";
foreach ($lines as &$line) {
if (strpos($line, $sampleName) !== false) {
echo "yes";
}
}
This code is not working, $sampleName is to 100% in the log file. The search works just for single characters; for example "T" or "3" but not for "T3".
Do you have an idea why it's not working? Is the encoding of the logfile wrong?
Thanks a lot for your help!
If you can only find single characters I would assume that your logfile is in some multi-byte character set like UTF-16. As you already assume similar, next step for you is to consult the documentation / specification of the logfile you're trying to operate with regarding the character encoding.
You then can use character-encoding specific string functions, the package is called http://php.net/mbstring.
$encoding = ... ; // encoding of logfile
if (mb_strpos($line, $sampleName, 0, $encoding) !== false) {
echo "yes";
}
This may work, it searches for the entire string
<?php
$filename = 'test.php';
$file = file_get_contents($filename);
$sampleName = "T3173sGas";
if(strlen(strstr($file,$sampleName))>0)
{
echo "yes";
}
?>
Related
First excuse me for the bad english.
I am trying to build a php script to search multiple webpages from a .txt file for specific word.
More specific:
I have a .txt file where i have stored many urls (every url is on one line, so if i have 10 urls the file have 10 lines) and i want the script to check the webpage content of each url for a specific word. So if the word is found on the webpage the script will return ONLINE othewise will return DOWN.
I build the script but the problem is that it always return ONLINE even if the url from file doesn't have the specific word in it's webpage content.
<?php
$allads = file("phpelist.txt");
print("Checking urls: <br><br><br><strong>");
for($index = 0; $index <count($allads); $index++)
{
$allads[$index] = ereg_replace("\n", "", $allads[$index]);
$data = file_get_contents('$allads[$index]');
$regex = '/save/';
if (preg_match($regex, $data)) {
echo "$allads[$index]</strong>...ONLINE<br><strong>";
} else {
echo "$allads[$index]</strong>...DOWN<br><strong>";
}
}
print("</strong><br><br><br>I verified all urls from file!");
?
To search the particular webpage for a given string, I'd use stripos() (case-insensitive) or strpos() (case-sensitive) instead of regular expressions:
if( stripos(haystack, needle) !== FALSE ) {
//the webpage contains the word
}
An example:
$str = 'sky is blue';
$wordToSearchFor = 'sky';
if (strpos($str, $wordToSearchFor) !== false) {
echo 'true';
}
else {
echo 'Uh oh.';
}
Demo!
Although, programmitcally skimming through webpages isn't considered a good practice and shouldn't be done unless it's absolutely necessary.
UPDATE:
In your file_get_contents call you're doing:
$data = file_get_contents('$allads[$index]');
You're using single quotes, and the variable values do not get replaced. You'll have to use double quotes to have file_get_contents fetch the actual URL. Replace it with:
$data = file_get_contents("$allads[$index]");
Another thing I noticed is that you're using the deprecated ereg_replace() function in your code. See the red box? Relying on depreacted functions are highly discouraged.
Your code, after all the above corrections, should look like:
$allads = file("phpelist.txt");
print("Checking urls: <br><br><br><strong>");
for($index = 0; $index <count($allads); $index++)
{
$allads[$index] = str_replace("\n", "", $allads[$index]);
$data = file_get_contents("$allads[$index]");
$searchTerm = 'the';
if (stripos($data, $searchTerm) !== false) {
echo "$allads[$index]</strong>...ONLINE<br><strong>";
}
else
{
echo "$allads[$index]</strong>...DOWN<br><strong>";
}
}
print("</strong><br><br><br>I verified all urls from file!");
?>
I'm running into an issue with finding specific text and replacing it with alternative text.
I'm testing my code below with .rtf and .txt files only. I'm also ensuring the files are writable, from within my server.
It's a hit and miss situation, and I'm curious if my code is wrong, or if this is just weirdness of opening and manipulating files.
<?php
$filelocation = '/tmp/demo.txt';
$firstname = 'John';
$lastname = 'Smith';
$output = file_get_contents($filelocation);
$output = str_replace('[[FIRSTNAME]]', $firstname, $output);
$output = str_replace('[[LASTNAME]]', $lastname, $output);
$output = str_replace('[[TODAY]]', date('F j, Y'), $output);
// rewrite file
file_put_contents($filelocation, $output);
?>
So, inside the demo.txt file I have about a full page of text with [[FIRSTNAME]], [[LASTNAME]], and [[TODAY]] scattered around.
It's hit and miss with the find/replace. So far, [[TODAY]] is always replaced correctly, while the names aren't.
Has anyone had this same issue?
(on a side note, I've checked error logs and so far no PHP warning/error is returned from opening the file, nor writing it)
Hard to say for sure without seeing the contents of demo.txt. My first guess is that it might be a problem with using brackets for your pointers. I would try changing to something not used by RTF like percent signs or an asterisk. ex: %%FIRSTNAME%%, **FIRSTNAME** (this is assuming of course that you have control of the contents of demo.txt.)
I have had this issue too. It seems like Microsoft Word inserts formatting codes in the tags. I have made a blog post about how to get around this on my technical blog.
http://tech.humlesite.eu/2017/01/13/using-regular-expression-to-merge-database-content-into-rich-text-format-template-documents/
The PHP example is shown here:
<?php
$file = file_get_contents('mergedoc.rtf');
// To temporary get rid of the escape characters...
$mergetext = str_replace("\\", "€€", $file);
// New seven part regex with default value detection
$regex2 = '/<<((?:€€[a-z0-9]*|\}|\{|\s)*)([a-z0-9.\-\+_æøåÆØÅA-Z]*)((?:€€[a-z0-9]*|\}|\{|\s)*)([a-z0-9.\-\+_æøåÆØÅA-Z]*)((?:€€[a-z0-9]*|\}|\{|\s)*)(?:\s*:(.*?)\s*)?((?:€€[a-z0-9]*|\}|\{|\s)*)>>/';
// Find all the matches in it....
preg_match_all($regex2,$mergetext, $out, PREG_SET_ORDER);
// Lets see the result
var_dump($out);
foreach ($out as $match) {
$whole_tag = $match[0]; // The part we actually replace.
$start = $match[1]; // The start formatting that has been injected in our tag, if any
$tag = $match[2]; // The tag word itself.
if (($match[4].$match[6]) != "") { //some sec-part tag or default value?
$end = $match[5]; // The end formatting that might be inserted.
if ($end == "") {
$end = $match[7]; // No end in 5, we try 7.
}
} else {
$end = $match[3]; // No second tag or default value, we find end in match-3
}
$secPartTag = $match[4]; // Do we have inserted some formatting inside the tag word too ?
if ($secPartTag != "") {
$tag .= $secPartTag; // Put it together with the tag word.
}
$default_value = $match[6];
// Simple selection of what we do with the tag.
switch ($tag) {
case 'COMPANY_NAME':
$txt = "MY MERGE COMPANY EXAMPLE LTD";
break;
case 'SOMEOTHERTAG':
$txt = "SOME OTHER TEXT XX";
break;
case 'THISHASDEFAULT':
$txt = "";
break;
default:
$txt = "NOTAG";
}
if ($txt == "") {
$txt = $default_value;
}
// Create RTF Line breaks in text, if any.
$txt = str_replace(chr(10), chr(10)."\\line", $txt);
// Do the replace in the file.
$mergetext = str_replace($whole_tag, $start.$txt.$end, $mergetext);
}
// Put back the escape characters.
$file = str_replace("€€", "\\", $mergetext);
// Save to file. Extention .doc makes it open in Word by default.
file_put_contents("ResultDoc.doc", $file);
?>
I was writing some commented PHP classes and I stumbled upon a problem. My name (for the #author tag) ends up with a ș (which is a UTF-8 character, ...and a strange name, I know).
Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here on SO.
I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments.
But I'm trying to understand the implications, should I use it without worrying? or are there cases when it might cause damage? When?
Indeed, the BOM is actual data sent to the browser. The browser will happily ignore it, but still you cannot send headers then.
I believe the problem really is your and your friend's editor settings. Without a BOM, your friend's editor may not automatically recognize the file as UTF-8. He can try to set up his editor such that the editor expects a file to be in UTF-8 (if you use a real IDE such as NetBeans, then this can even be made a project setting that you can transfer along with the code).
An alternative is to try some tricks: some editors try to determine the encoding using some heuristics based on the entered text. You could try to start each file with
<?php //Úτƒ-8 encoded
and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)
All in all, I recommend just fixing the editor settings.
Oh wait, I misread the last part: for spreading the code to anywhere, I guess you're safest just making all files only contain the lower 7-bit characters, i.e. plain ASCII, or to just accept that some people with ancient editors see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the headers already sent thing. On the other side, as long as you only put UTF-8 characters in comments and so, the only impact of some editor misunderstanding the encoding is weird characters. I'd go for correctly spelling your name and adding a comment targeted at heuristics so that most editors will get it, but there will always be people who'll see bogus chars instead.
BOM would cause Headers already sent error, so, you can't use BOM in PHP files
This is an old post and have already been answered, but i can leave you some others resources that i found when i faced with this BOM issue.
http://people.w3.org/rishida/utils/bomtester/index.php with this page you can check if a specific file contains BOM.
There is also a handy script that outputs all files with BOM on your current directory.
<?php
function fopen_utf8 ($filename) {
$file = #fopen($filename, "r");
$bom = fread($file, 3);
if ($bom != b"\xEF\xBB\xBF")
{
return false;
}
else
{
return true;
}
}
function file_array($path, $exclude = ".|..|design", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
}
}
}
}
return $result;
}
$files = file_array(".");
?>
I found that code at php.net
Dreamweaver also helps with this, it gives you the option to save the file and not include the BOM stuff
Its a late answer, but i still hope it helps.
Bye
Just so you know, there's an option in php, zend.multibyte, which allows php to read files with BOM without giving the Headers already sent error.
From the php.ini file:
; If enabled, scripts may be written in encodings that are incompatible with
; the scanner. CP936, Big5, CP949 and Shift_JIS are the examples of such
; encodings. To use this feature, mbstring extension must be enabled.
; Default: Off
;zend.multibyte = Off
In PHP, in addition to the "headers already sent" error, the presence of a BOM can also screw up the HTML in the browser in more subtle ways.
See Display problems caused by the UTF-8 BOM for an outline of the problem with some focus on PHP (W3C Internationalization).
When this occurs, not only is there usually a noticeable space at the top of the rendered page, but if you inspect the HTML in Firefox or Chrome, you may notice that the head section is empty and its elements appear to be in the body.
Of course viewing source will show everything where it was inserted, but the browser is interpreting it as body content (text) and inserting it there into the Document Object Model (DOM).
Or you could activate output buffering in php.ini which will solve the "headers already sent" problem. It is also very important to use output buffering for performance if your site has significant load.
BOM is actually the most efficient way of identifying an UTF-8 file, and both modern browsers and standards support and encourage the use of it in HTTP response bodies.
In case of PHP files its not the file but the generated output that gets sent as response so obviously it's not a good idea to save all PHP files with the BOM at the beginning, but it doesn't mean you shouldn't use the BOM in your response.
You can in fact safely inject the following code right before your doctype declaration (in case you are generating HTML as response):
<?="\u{FEFF}"?> (or before PHP 7.0.0: <?="\xEF\xBB\xBF"?>)
For further read: https://www.w3.org/International/questions/qa-byte-order-mark#transcoding
Adding to #omabena answer use this code to locate and remove bom from your files. Be sure to back up your files first just in case.
function fopen_utf8 ($filename) {
$file = #fopen($filename, "r");
$bom = fread($file, 3);
if ($bom != b"\xEF\xBB\xBF")
{
return false;
}
else
{
return true;
}
}
function file_array($path, $exclude = ".|..|design", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
$pathname = $path . $filename; // change the pathname to your target file(s) which you want to remove the BOM.
$file_handler = fopen($pathname, "r");
$contents = fread($file_handler, filesize($pathname));
fclose($file_handler);
for ($i = 0; $i < 3; $i++){
$bytes[$i] = ord(substr($contents, $i, 1));
}
if ($bytes[0] == 0xef && $bytes[1] == 0xbb && $bytes[2] == 0xbf){
$file_handler = fopen($pathname, "w");
fwrite($file_handler, substr($contents, 3));
fclose($file_handler);
printf("%s BOM removed.<br/>n", $pathname);
}
}
}
}
}
return $result;
}
$files = file_array(".");
I was writing some commented PHP classes and I stumbled upon a problem. My name (for the #author tag) ends up with a ș (which is a UTF-8 character, ...and a strange name, I know).
Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here on SO.
I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments.
But I'm trying to understand the implications, should I use it without worrying? or are there cases when it might cause damage? When?
Indeed, the BOM is actual data sent to the browser. The browser will happily ignore it, but still you cannot send headers then.
I believe the problem really is your and your friend's editor settings. Without a BOM, your friend's editor may not automatically recognize the file as UTF-8. He can try to set up his editor such that the editor expects a file to be in UTF-8 (if you use a real IDE such as NetBeans, then this can even be made a project setting that you can transfer along with the code).
An alternative is to try some tricks: some editors try to determine the encoding using some heuristics based on the entered text. You could try to start each file with
<?php //Úτƒ-8 encoded
and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)
All in all, I recommend just fixing the editor settings.
Oh wait, I misread the last part: for spreading the code to anywhere, I guess you're safest just making all files only contain the lower 7-bit characters, i.e. plain ASCII, or to just accept that some people with ancient editors see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the headers already sent thing. On the other side, as long as you only put UTF-8 characters in comments and so, the only impact of some editor misunderstanding the encoding is weird characters. I'd go for correctly spelling your name and adding a comment targeted at heuristics so that most editors will get it, but there will always be people who'll see bogus chars instead.
BOM would cause Headers already sent error, so, you can't use BOM in PHP files
This is an old post and have already been answered, but i can leave you some others resources that i found when i faced with this BOM issue.
http://people.w3.org/rishida/utils/bomtester/index.php with this page you can check if a specific file contains BOM.
There is also a handy script that outputs all files with BOM on your current directory.
<?php
function fopen_utf8 ($filename) {
$file = #fopen($filename, "r");
$bom = fread($file, 3);
if ($bom != b"\xEF\xBB\xBF")
{
return false;
}
else
{
return true;
}
}
function file_array($path, $exclude = ".|..|design", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
}
}
}
}
return $result;
}
$files = file_array(".");
?>
I found that code at php.net
Dreamweaver also helps with this, it gives you the option to save the file and not include the BOM stuff
Its a late answer, but i still hope it helps.
Bye
Just so you know, there's an option in php, zend.multibyte, which allows php to read files with BOM without giving the Headers already sent error.
From the php.ini file:
; If enabled, scripts may be written in encodings that are incompatible with
; the scanner. CP936, Big5, CP949 and Shift_JIS are the examples of such
; encodings. To use this feature, mbstring extension must be enabled.
; Default: Off
;zend.multibyte = Off
In PHP, in addition to the "headers already sent" error, the presence of a BOM can also screw up the HTML in the browser in more subtle ways.
See Display problems caused by the UTF-8 BOM for an outline of the problem with some focus on PHP (W3C Internationalization).
When this occurs, not only is there usually a noticeable space at the top of the rendered page, but if you inspect the HTML in Firefox or Chrome, you may notice that the head section is empty and its elements appear to be in the body.
Of course viewing source will show everything where it was inserted, but the browser is interpreting it as body content (text) and inserting it there into the Document Object Model (DOM).
Or you could activate output buffering in php.ini which will solve the "headers already sent" problem. It is also very important to use output buffering for performance if your site has significant load.
BOM is actually the most efficient way of identifying an UTF-8 file, and both modern browsers and standards support and encourage the use of it in HTTP response bodies.
In case of PHP files its not the file but the generated output that gets sent as response so obviously it's not a good idea to save all PHP files with the BOM at the beginning, but it doesn't mean you shouldn't use the BOM in your response.
You can in fact safely inject the following code right before your doctype declaration (in case you are generating HTML as response):
<?="\u{FEFF}"?> (or before PHP 7.0.0: <?="\xEF\xBB\xBF"?>)
For further read: https://www.w3.org/International/questions/qa-byte-order-mark#transcoding
Adding to #omabena answer use this code to locate and remove bom from your files. Be sure to back up your files first just in case.
function fopen_utf8 ($filename) {
$file = #fopen($filename, "r");
$bom = fread($file, 3);
if ($bom != b"\xEF\xBB\xBF")
{
return false;
}
else
{
return true;
}
}
function file_array($path, $exclude = ".|..|design", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
$pathname = $path . $filename; // change the pathname to your target file(s) which you want to remove the BOM.
$file_handler = fopen($pathname, "r");
$contents = fread($file_handler, filesize($pathname));
fclose($file_handler);
for ($i = 0; $i < 3; $i++){
$bytes[$i] = ord(substr($contents, $i, 1));
}
if ($bytes[0] == 0xef && $bytes[1] == 0xbb && $bytes[2] == 0xbf){
$file_handler = fopen($pathname, "w");
fwrite($file_handler, substr($contents, 3));
fclose($file_handler);
printf("%s BOM removed.<br/>n", $pathname);
}
}
}
}
}
return $result;
}
$files = file_array(".");
I am reading a CSV file with php. Many of the rows have a "check mark" which is really the square root symbol: √ and the php code is just skipping over this character every time it is encountered.
Here is my code (printing to the browser window in "CSV style" format so I can check that the lines break at the right place:
$file = fopen($uploadfile, 'r');
while (($line = fgetcsv($file)) !== FALSE) {
foreach ($line as $key => $value) {
if ($value) {
echo $value.",";
}
}
echo "<br />";
}
fclose($file);
As an interim solution, I am just finding and replacing the checkmarks with 1's manually, in Excel. Obviously I'd like a more efficient solution :) Thanks for the help!
fgetcsv() only works on standard ASCII characters; so it's probably "correct" in skipping your square root symbols. However, rather than replacing the checkmarks manually, you could read the file into a string, do a str_replace() on those characters, and then parse it using fgetcsv(). You can turn a string into a file pointer (for fgetcsv) thusly:
$fp = fopen('php://memory', 'rw');
fwrite($fp, (string)$string);
rewind($fp);
while (($line = fgetcsv($fp)) !== FALSE)
...
I had a similar problem with accented first characters of strings. I eventually gave up on fgetscv and did the following, using fgets() and explode() instead (I'm guessing your csv is comma separated):
$file = fopen($uploadfile, 'r');
while (($the_line = fgets($file)) !== FALSE) // <-- fgets
{
$line = explode(',', $the_line); // <-- explode
foreach ($line as $key => $value)
{
if ($value)
{
echo $value.",";
}
}
echo "<br />";
}
fclose($file);
You should setlocale ar written in documentation
Note:
Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
before fgetcsv add setlocale(LC_ALL, 'en_US.UTF-8'). In my case it was 'lt_LT.UTF-8'.
This behaviour is reported as a php bug