PHP substr doesn't echo anything

I'm having problems with this code, and the PHP function 'substr' is playing up. I just don't get it. Here's a quick introduction to what I'm trying to achieve. I have a massive XML document with email subscribers from Joomla. I'm trying to import it into Mailchimp, but Mailchimp has rules for the syntax used to import emails into a list. At the moment the syntax is like this:
<subscriber>
<subscriber_id>615</subscriber_id>
<name><![CDATA[NAME OF SUBSCRIBER]]></name>
<email>THE_EMAIL#SOMETHING.COM</email>
<confirmed>1</confirmed>
<subscribe_date>THE DATE</subscribe_date>
</subscriber>
I want to make a simple PHP-script that takes all those emails and outputs them like this:
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
[THE_EMAIL#SOMETHING.COM] [NAME OF SUBSCRIBER]
If I can do that, then I can just copy paste it into Mailchimp.
Now here's my PHP-script, so far:
$fileName = file_get_contents('emails.txt');
foreach(preg_split("/((\r?\n)|(\r\n?))/", $fileName) as $line){
if(strpos($line, '<name><![CDATA[')){
$name = strpos($line, '<name><![CDATA[');
$nameEnd = strpos($line, ']]></name>', $name);
$nameLength = $nameEnd-$name;
echo "<br />";
echo " " . strlen(substr($line, $name, $nameLength));
echo " " . gettype(substr($line, $name, $nameLength));
echo " " . substr($line, $name, $nameLength);
}
if(strpos($line, '<email>')){
$var1 = strpos($line, '<email>');
$var2 = strpos($line, '</email>', $var1);
$length = $var2-$var1;
echo substr($line, $var1, $length);
}
}
The first if-statement works as it should. It checks whether there's an '<email>' tag on the line, and if there is, it finds the end tag and outputs the email with substr.
The second if-statement is annoying me. It should do the same thing as the first if-statement, but it doesn't. The length is the correct length (I've checked). The type is the correct type (I've checked). But when I try to echo it, nothing happens. The script still runs, but it doesn't write anything.
I've played around with it quite a lot and seem to have tried everything - but I can't figure it out.

Warning
This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.
You should be using if(strpos($line,'...') !== false) {
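A minimal sketch of that strict check, applied to lines taken from the sample data. The helper name extractBetween is hypothetical. It also starts the substring after the opening tag rather than at it, on the assumption that output beginning with markup such as <name><![CDATA[ is likely to be hidden when viewed in a browser:

```php
<?php
// Hypothetical helper: returns the text between $open and $close on one line,
// or null if either marker is missing.
function extractBetween(string $line, string $open, string $close): ?string
{
    $start = strpos($line, $open);
    // strpos() returns 0 for a match at the very start of the line,
    // so compare against false explicitly.
    if ($start === false) {
        return null;
    }
    $start += strlen($open); // skip past the opening tag itself
    $end = strpos($line, $close, $start);
    if ($end === false) {
        return null;
    }
    return substr($line, $start, $end - $start);
}

$name = extractBetween('<name><![CDATA[NAME OF SUBSCRIBER]]></name>', '<name><![CDATA[', ']]></name>');
$email = extractBetween('<email>THE_EMAIL#SOMETHING.COM</email>', '<email>', '</email>');

// htmlspecialchars() keeps any leftover markup visible in a browser.
echo '[' . htmlspecialchars($email) . '] [' . htmlspecialchars($name) . "]\n";
```

Both sample lines start with the tag, so a plain if(strpos(...)) test would have skipped them.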
That aside, your file seems to be XML, so you should use an XML parser lest you fall under the pony he comes.
DOMDocument is a good one. You could do something like this:
$dom = new DOMDocument();
$dom->load("emails.txt");
$subs = $dom->getElementsByTagName('subscriber');
$count = $subs->length;
for( $i=0; $i<$count; $i++) {
$sub = $subs->item($i);
echo $sub->getElementsByTagName('email')->item(0)->nodeValue;
echo " ";
echo $sub->getElementsByTagName('name')->item(0)->nodeValue;
echo "\n";
}
This will output the names and emails in the format you described.

So there's a few things wrong with this, including the strpos command which will actually return 0 if it finds the tag at the beginning of the line, which doesn't appear to be what you intend.
Also, if the XML is not formatted exactly as you have, with each opening and closing tag on the one line, then your logic will fail as well.
It's not a good idea to re-invent XML processing for this reason...
Here, as others have proposed, is a better solution to the problem*.
$xml = simplexml_load_file('emails.txt');
foreach( $xml->subscriber as $sub )
{
// Note that SimpleXML is aware of CDATA, and only outputs the text
echo '[' . $sub->email . '] [' . $sub->name . ']' . "\n";
}
*This assumes that your XML is valid, i.e. the "subscriber" blocks are contained in a single parent element at the top level. You can of course use the SimpleXML documentation to adjust for your use case.
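For a quick test without the real file, the same idea can be sketched against an inline string. The <subscribers> root element here is an assumption, since the question does not show the parent element:

```php
<?php
// Sample data copied from the question; the <subscribers> root is assumed.
$xmlText = <<<XML
<subscribers>
  <subscriber>
    <subscriber_id>615</subscriber_id>
    <name><![CDATA[NAME OF SUBSCRIBER]]></name>
    <email>THE_EMAIL#SOMETHING.COM</email>
    <confirmed>1</confirmed>
    <subscribe_date>THE DATE</subscribe_date>
  </subscriber>
</subscribers>
XML;

$xml = simplexml_load_string($xmlText);

$lines = [];
foreach ($xml->subscriber as $sub) {
    // Casting an element to string yields its text; the CDATA wrapper is not included.
    $lines[] = '[' . (string) $sub->email . '] [' . (string) $sub->name . ']';
}

echo implode("\n", $lines), "\n";
```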

Related

PHP - Problem skipping 0D0A lines from text file

I get a text file (.sql) which contains MySQL inserts. I found that there are times when blank lines are included. These blank lines contain hex value 0D0A (Windows newline). MySQL reports an error when a blank line is sent for the query. So, as I read/send the lines to MySQL I want to skip sending any blank lines. I came up with the following code, but it's not working as I expected. Newlines are removed but blank lines are still sent to MySQL. I traced the problem to the PHP command empty(). According to the docs " " should be considered empty. So why does it not skip blank lines? I've spent a few days working on this but nothing I try works. I need another set of eyes, please. Here is the code:
<?php
$bom = pack("H*", "EFBBBF");
if(($reading = fopen("sample.sql", "r")) !== false)
{
$sql = preg_replace("/^$bom/", "", fgets($reading));
while(!feof($reading))
{
$sql = str_replace(array("\n", "\r", "\r\n"), " ", $sql);
if(!empty($sql))
{
echo("{$sql}<br>");
$sql = fgets($reading);
}
}
if(!feof($reading))
{
echo("Unexpected read error in file." . PHP_EOL);
}
fclose($reading);
}
?>
I replace the newlines with a space (if I try to remove the newlines using "" IIS will crash). I expect the empty command to skip the space but it doesn't. The sample data you need to run this script is here.
Thanks for any and all help,
Charles

After some much needed sleep I found my problem (sort of). I still think empty() should see " " as empty, I'll check the docs again.
To fix my code I had to change the str_replace to remove the newlines completely. Then I had to move fgets out of the if statement (if the line is blank you still need to get the next line).
In case anyone else comes across this problem here is the corrected code:
<?php
$bom = pack("H*", "EFBBBF");
if(($reading = fopen("sample.sql", "r")) !== false)
{
$sql = preg_replace("/^$bom/", "", fgets($reading));
while(!feof($reading))
{
$sql = str_replace(array("\r\n", "\n", "\r"), "", $sql);
if(!empty($sql))
{
echo("{$sql}<br>");
}
$sql = fgets($reading);
}
if(!feof($reading))
{
echo("Unexpected read error in file." . PHP_EOL);
}
fclose($reading);
}
?>
Thanks for looking.
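On the empty() point: empty() treats "" and "0" as empty, but a string containing a single space is not empty. A minimal sketch of a loop that tests the trimmed line instead (the helper name readNonBlankLines is hypothetical, and an in-memory stream stands in for sample.sql):

```php
<?php
// Hypothetical helper: returns the non-blank lines of an open stream.
function readNonBlankLines($handle): array
{
    $lines = [];
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // drop the line ending only
        // Compare the trimmed string with '' rather than relying on empty().
        if (trim($line) === '') {
            continue;
        }
        $lines[] = $line;
    }
    return $lines;
}

// In-memory stand-in for the .sql file, with Windows line endings and blank lines.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "INSERT INTO t VALUES (1);\r\n\r\n \r\nINSERT INTO t VALUES (2);\r\n");
rewind($handle);
$result = readNonBlankLines($handle);
fclose($handle);

print_r($result);
```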

PHP Convert Full Date To Short Date

I need a PHP script to loop through all .html files in a directory, find the first instance of a long date in each one (e.g. August 25th, 2014), and then add a tag with that date in short format (e.g. <p class="date">08/25/14</p>).
Has anyone done something like this before? I'm guessing you'd explode the string and use a complex case statement to convert the month names and days to regular numbers and then implode using /.
But I'm having trouble figuring out the regular expression to use for finding the first long date.
Any help or advice would be greatly appreciated!

Here's how I'd do it in semi-pseudo-code...
Loop through all the files using whatever floats your boat (glob() is an obvious choice)
Load the HTML file into a DOMDocument, eg
$doc = new DOMDocument();
$doc->loadHTMLFile($filePath);
Get the body text as a string
$body = $doc->getElementsByTagName('body');
$bodyText = $body->item(0)->textContent; // assuming there's at least one body tag
Find your date string via this regex
preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $bodyText, $matches);
Load this into a DateTime object and produce a short date string
$dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
$shortDate = $dt->format('m/d/y');
Create a <p> DOMElement with the $shortDate text content, insert it into the DOMDocument where you want and write back to the file using $doc->saveHTMLFile($filePath)
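The matching and conversion steps above could be sketched as one small function. The name longDateToShort is hypothetical, and stripping the ordinal suffix before parsing is a choice made here so that a single format string covers both "August 25th, 2014" and "August 24, 2005":

```php
<?php
// Hypothetical helper: finds the first long date in $text and returns it
// as mm/dd/yy, or null if no date is found.
function longDateToShort(string $text): ?string
{
    $months = 'January|February|March|April|May|June|July|August|September|October|November|December';
    if (!preg_match('/(?:' . $months . ') \d{1,2}(?:st|nd|rd|th)?, \d{4}/', $text, $matches)) {
        return null;
    }
    // Drop the ordinal suffix that follows the day number, if there is one.
    $plain = preg_replace('/(\d)(?:st|nd|rd|th),/', '$1,', $matches[0]);
    $dt = DateTime::createFromFormat('F j, Y', $plain);
    return $dt === false ? null : $dt->format('m/d/y');
}

echo longDateToShort('Posted on August 25th, 2014'), "\n";
echo longDateToShort('Posted on August 24, 2005'), "\n";
```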

I incorporated the helpful response above into what I already had and it seems to work. I'm sure it's far from ideal but it still serves my purpose. Maybe it might be helpful to others:
<?php
$dir = "archive";
$a = scandir($dir);
$a = array_diff($a, array(".", ".."));
foreach ($a as $value) {
echo '<br>File name is: ' . $value . "<br><br>";
$contents = file_get_contents("archive/".$value);
if (preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $contents, $matches)) {
echo 'the date found is: ' . $matches[0] . "<br><br>";
$dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
$shortDate = $dt->format('m/d/y');
$dateTag = "\n" . '<p class="date">' . $shortDate . '</p>';
$filename ="archive/".$value;
$file = fopen($filename, "a+");
fwrite($file, $dateTag);
fclose($file);
echo 'Date tag added<br><br>';
} else {
echo "ERROR: No date found<br><br>";
}
}
?>
The code assumes the files to modify are in a directory called "archive" that resides in the same directory as the script.
Needed the two different preg_match lines because I found out some dates are listed with the ordinal suffix (i.e. August 24th, 2005) and some are not (i.e. August 24, 2005). Couldn't quite puzzle out exactly how to get a single preg_match that handles both.
EDIT: replaced double preg_match with single one using \d{1,2}(st|nd|rd|th)? as suggested.

Why is it starting a new line before it should in PHP array?

I'm having a problem with arrays and file writing, what I want to do is take one file, and copy it onto another file, except with formatting added to it.
To be specific, this:
Line 1
Line 2
Line 3
Would become this:
<br /><hr />Line 1
<br /><hr />Line 2
<br /><hr />Line 3
And I've sorta done that, but something weird happens. Instead of formatting all on one line, it linebreaks and keeps going. Like this
<br />1
THE END<br />7
THE END<br />0
THE END<br />Red
THE END<br />Silent
THE END<br />No ChangesTHE END
My code for this is:
<?php
$filename1 = "directorx_OLDONE.txt";
$filename2 = "directorx_NEWONE.txt";
$file1 = fopen($filename1, "r") or exit ("No");
$file2 = fopen($filename2, "w") or exit ("No");
while (!feof($file1)){
$listArrayf1[] = fgets($file1);
}
fclose($file1);
echo var_dump($listArrayf1) . "<br /><br />";
$entries = count($listArrayf1) - 1;
echo $entries;
for($i=0;$i<=$entries;$i++){
$listArrayf2[] = "<br />".$listArrayf1[$i]."THE END";
fwrite($file2, $listArrayf2[$i]);
}
fclose($file2);
echo var_dump($listArrayf2);
/*
Open file1, r
Open file2, w
While it's not the end of the file, add each line of file1 to an array.
Count the number of lines in file1 and call it Entries. -1 because gotta start at 0.
Make a new array with the values of the old one, but with tags before and after it.
*/
?>
I'm sure there's a better way to accomplish the ultimate goal I'm trying, which is detecting certain words entered into a form, (There's probably a better way than making a formatted and non-formatted copy of what gets entered.) but my PHP vocab is limited and I'd like to figure out the long, prerequisite hard ways before learning how to do em easier.
At first I thought that it was because I was writing the OLDFILE manually, using the return key. So I made a script to write it using \n instead and it changed nothing.

Eh :)
First, please take a look at var_dump and check what the function returns (nothing: it prints its output directly, so the correct usage is var_dump( ...); echo "<br />";)
Second, fgets reads the string including the newline character, so I guess you see something like this in your output:
string( 10) "abcdefghi
"
So you have to remove the newline manually, for example with trim.
Next, I'd recommend (at least) taking a look at foreach.
So I'd write the whole loop as:
foreach( $listArrayf1 as $row){
$row = "<br /><hr />". trim( $row)."THE END";
fwrite($file2, $row);
$listArrayf2[] = $row;
}
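You may also use foreach( $listArrayf1 as &$row) and in the end $listArrayf1 will contain exactly the same as $listArrayf2. When you need to preserve all other whitespace, you should probably use $row = rtrim( $row, "\r\n"); instead of trim. A plain $row = substr( $row, 0, -1); removes only one character, so it leaves a stray \r on Windows line endings and cuts a real character from a last line that has no newline.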
btw: you can write normal code, mark it in textarea and by hitting ctrl+k it'll get indented by 4 spaces

fgets returns the newline character at the end of each line as part of its input. That's where your "extra" newline comes from.
Change this
$listArrayf1[] = fgets($file1);
to this:
$listArrayf1[] = rtrim(fgets($file1), "\r\n");
This will remove the newline characters from the end of the return value and make your strings format as intended.
However, as you said yourself you are really doing things in a roundabout way. You could read all of file1 into an array with just
$listArrayf1 = file($filename1, FILE_IGNORE_NEW_LINES);
That's it. No loops, no fopen, no problems with newlines. It pays to look for the most fitting way of doing things.
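Put together, the whole copy-with-formatting task might look like the sketch below. The helper name formatLines and the temporary files are assumptions made for the example. Note that file() keeps the line endings unless FILE_IGNORE_NEW_LINES is passed:

```php
<?php
// Hypothetical helper: prefixes each line of $source with the markup from the
// question and writes the result to $target, one entry per line.
function formatLines(string $source, string $target): array
{
    // FILE_IGNORE_NEW_LINES drops the line ending from each array element.
    $lines = file($source, FILE_IGNORE_NEW_LINES);
    $formatted = [];
    foreach ($lines as $line) {
        // rtrim() guards against a leftover carriage return from Windows files.
        $formatted[] = '<br /><hr />' . rtrim($line, "\r");
    }
    file_put_contents($target, implode("\n", $formatted) . "\n");
    return $formatted;
}

// Temporary files stand in for directorx_OLDONE.txt and directorx_NEWONE.txt.
$source = tempnam(sys_get_temp_dir(), 'old');
$target = tempnam(sys_get_temp_dir(), 'new');
file_put_contents($source, "Line 1\r\nLine 2\nLine 3\n");

$formatted = formatLines($source, $target);
$written = file_get_contents($target);
echo $written;

unlink($source);
unlink($target);
```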

Switch gettext translated language with original language

I started my PHP application with all text in German, then used gettext to extract all strings and translate them to English.
So, now I have a .po file with all msgids in German and msgstrs in English. I want to switch them, so that my source code contains the English as msgids for two main reasons:
More translators will know English, so it is only appropriate to serve them up a file with msgids in English. I could always switch the file before I give it out and after I receive it, but naaah.
It would help me to write English object & function names and comments if the content text was also English. I'd like to do that, so the project is more open to other Open Source collaborators (more likely to know English than German).
I could do this manually and this is the sort of task where I anticipate it will take me more time to write an automated routine for it (because I'm very bad with shell scripts) than do it by hand. But I also anticipate despising every minute of manual computer labour (feels like an oxymoron, right?) like I always do.
Has someone done this before? I figured this would be a common problem, but couldn't find anything. Many thanks ahead.
Sample Problem:
<title><?=_('Routinen')?></title>
#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"
I thought I'd narrow the problem down. The switch in the .po-file is no issue of course, it is as simple as
preg_replace('/msgid "(.+)"\nmsgstr "(.+)"/', "msgid \"\$2\"\nmsgstr \"\$1\"", $str);
The problem for me is the routine that searches my project folder files for _('$msgid') and substitutes _('msgstr') while parsing the .po-file (which is probably not even the most elegant way, after all the .po-file contains comments which contain all file paths where the msgid occurs).
After fooling around with akirk's answer a little, I ran into some more problems.
Because I have a mixture of _('xxx') and _("xxx") calls, I have to be careful about (un)escaping.
Double quotes " in msgids and msgstrs have to be unescaped, but the slashes can't be stripped, because it may be that the double quote was also escaped in PHP
Single quotes have to be escaped when they're replaced into PHP, but then they also have to be changed in the .po-file. Luckily for me, single quotes only appear in English text.
msgids and msgstrs can have multiple lines, then they look like this
msgid ""
"line 1\n"
"line 2\n"
msgstr ""
"line 1\n"
"line 2\n"
plural forms are of course skipped at the moment, but in my case that's not an issue
poedit wants to remove strings as obsolete that seem successfully switched and I have no idea why this happens in (many) cases.
I'll have to stop working on this for tonight. Still it seems using the parser instead of RegExps wouldn't be overkill.
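As a middle ground between a one-line regex and a full parser, the swap inside the .po file could be sketched so that it also covers the multi-line form. The function name swapPoEntries is hypothetical. The sketch assumes LF line endings, skips the header entry and untranslated entries, and does not handle plural forms:

```php
<?php
// Hypothetical helper: swaps msgid and msgstr in the text of a .po file.
// Handles single-line and multi-line entries; plural forms are not handled.
function swapPoEntries(string $po): string
{
    // One quoted string, followed by any number of quoted continuation lines.
    // Inside the quotes, a backslash escapes the next character.
    $value = '("(?:[^"\\\\]|\\\\.)*"(?:\n"(?:[^"\\\\]|\\\\.)*")*)';
    $pattern = '/^msgid ' . $value . '\nmsgstr ' . $value . '$/m';

    return preg_replace_callback($pattern, function (array $m): string {
        // Leave the header (empty msgid) and untranslated entries untouched.
        if ($m[1] === '""' || $m[2] === '""') {
            return $m[0];
        }
        return 'msgid ' . $m[2] . "\n" . 'msgstr ' . $m[1];
    }, $po);
}

$sample = "#: /users/ruben/sites/v/routinen.php:43\n"
    . "msgid \"Routinen\"\n"
    . "msgstr \"Routines\"\n";

echo swapPoEntries($sample);
```

The comment lines with file paths are left as they are, so the result still has to be matched against the source files separately.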

I built on akirk's answer and wanted to preserve what I came up with as an answer here, in case somebody has the same problem.
This is not recursive, but that could easily change of course. Feel free to comment with improvements, I will be watching and editing this post.
$po = file_get_contents("locale/en_GB/LC_MESSAGES/messages.po");
$translations = array(); // german => english
$rawmsgids = array(); // find later
$msgidhits = array(); // record success
$msgstrs = array(); // find later
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$german = str_replace('\"','"',$match[1]); // unescape double quotes (could misfire if you escaped double quotes in PHP _("bla") but in my case that was one case versus many)
$english = str_replace('\"','"',$match[2]);
$en_sq_e = str_replace("'","\'",$english); // escape single quotes
$translations['_(\''. $german . '\''] = '_(\'' . $en_sq_e . '\'';
$rawmsgids['_(\''. $german . '\''] = $match[1]; // find raw msgid with searchstr as key
$translations['_("'. $match[1] . '"'] = '_("' . $match[2] . '"';
$rawmsgids['_("'. $match[1] . '"'] = $match[1];
$translations['__(\''. $german . '\''] = '__(\'' . $en_sq_e . '\'';
$rawmsgids['__(\''. $german . '\''] = $match[1];
$translations['__("'. $match[1] . '"'] = '__("' . $match[2] . '"';
$rawmsgids['__("'. $match[1] . '"'] = $match[1];
$msgstrs[$match[1]] = $match[2]; // msgid => msgstr
}
foreach (glob("*.php") as $file) {
$code = file_get_contents($file);
$filehits = 0; // how many replacements per file
foreach($translations AS $msgid => $msgstr) {
$hits = 0;
$code = str_replace($msgid,$msgstr,$code,$hits);
$filehits += $hits;
if($hits!=0) $msgidhits[$rawmsgids[$msgid]] = 1; // this serves to record if the msgid was found in at least one incarnation
elseif(!isset($msgidhits[$rawmsgids[$msgid]])) $msgidhits[$rawmsgids[$msgid]] = 0;
}
// file_put_contents($file, $code); // be careful to test this first before doing the actual replace (and do use a version control system!)
echo "$file : $filehits <br>";
echo $code;
}
/* debug */
$found = array_keys($msgidhits, 1, true);
foreach($found AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
echo "Not Found: <br>";
$notfound = array_keys($msgidhits, 0, true);
foreach($notfound AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";
/*
following steps are still needed:
* convert plurals (ngettext)
* convert multi-line msgids and msgstrs (format mentioned in question)
* resolve uniqueness conflict (msgids are unique, msgstrs are not), so you may have duplicate msgids (poedit finds these)
*/

See http://code.activestate.com/recipes/475109-regular-expression-for-python-string-literals/ for a good Python-based regular expression for finding string literals, taking escapes into account. Although it's Python, this might be quite good for multiline strings and other corner cases.
See http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/poswap.html for a ready, out-of-the-box base language swapper for .po files.
For instance, the following command line will convert a German-based Spanish translation to an English-based Spanish translation. You just have to ensure that your new base language (English) is 100% translated before starting the conversion:
poswap -i de-en.po -t de-es.po -o en-es.po
And finally, to swap an English .po file to a German .po file, use swappo:
http://manpages.ubuntu.com/manpages/hardy/man1/swappo.1.html
After swapping files, some manual polishing of resultant files might be required. For instance headers might be broken and some duplicate texts might occur.

So if I understand you correctly you'd like to replace all German gettext calls with English ones. To replace the contents in the directory, something like this could work.
$po = file_get_contents("translation.pot");
$translations = array(); // german => english
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$translations['_("'. $match[1] . '")'] = '_("' . $match[2] . '")';
$translations['_(\''. $match[1] . '\')'] = '_(\'' . $match[2] . '\')';
}
foreach (glob("*.php") as $file) {
$code = file_get_contents($file);
$code = str_replace(array_keys($translations), array_values($translations), $code);
//file_put_contents($file, $code);
echo $code; // be careful to test this first before doing the actual replace (and do use a version control system!)
}

php return elements from file content

Pretty simple i'm sure, but..
I've got a file that is guaranteed to have only an <h1>some text</h1> and a <p>some more text</p> in it.
How would I go about returning these two elements as separate variables?

If your file is an HTML one, the general solution would be to:
Load it into a DOMDocument, with
DOMDocument::loadHTML if you have your HTML content as a string
or DOMDocument::loadHTMLFile
Use DOM methods to access your nodes
Here, DOMDocument::getElementsByTagName should be perfect
And then, once you have your node, the work's done ;-)
Note: if your HTML elements contain sub-elements, and you want the whole content, including sub-tags, as a string, take a look at, for example, this user note
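A minimal sketch of those steps, assuming the content is already in a string. The helper name extractHeadingAndParagraph is hypothetical, and the libxml error handling is an addition to keep parser warnings for HTML fragments out of the output:

```php
<?php
// Hypothetical helper: returns the text of the first <h1> and the first <p>
// found in an HTML string, or null for an element that is missing.
function extractHeadingAndParagraph(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    $p = $doc->getElementsByTagName('p')->item(0);

    return [
        $h1 !== null ? $h1->textContent : null,
        $p !== null ? $p->textContent : null,
    ];
}

[$h1, $p] = extractHeadingAndParagraph('<h1>some text</h1><p>some more text</p>');
echo $h1, "\n", $p, "\n";
```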

Your file is just text, so you're going to have to parse it. Generally HTML isn't all that suitable for parsing with normal operations, but if you know the exact contents you shouldn't have a problem.
Depending on what your separator is between the two tag blocks (let's pretend it's a \n), you could do something like this:
$contents = file_get_contents("yourfile.html");
list($h1,$p) = explode("\n",$contents);
That would give you the two text blocks in $h1 and $p. You could parse the rest from there if you needed to do more work.

You can use something like this:
function strBetween($au, $au2, $text) { // gets the substring between $au and $au2 in $text
    $pau = strpos($text, $au);
    if ($pau === false) {
        return ''; // start marker not found
    }
    $start = $pau + strlen($au);
    if ($au2 === '') {
        return substr($text, $start); // no end marker given: return the rest of the text
    }
    $pau2 = strpos($text, $au2, $start);
    if ($pau2 === false) {
        return ''; // end marker not found
    }
    return substr($text, $start, $pau2 - $start);
}
$contents = file_get_contents("yourfile.html");
$h1 = strBetween('<h1>', '</h1>', $contents);
$p = strBetween('<p>', '</p>', $contents);
