I've never really done much with parsing text in PHP (or any language). I've got this text:
1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
You can see the first record has a line break in it. I need it to be:
1 (2) ,Yes,5823,"Some Name 801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
I was thinking of using a regular expression to find \n within quotes (or after a quote, since that wouldn't create false matches) and then replacing it with nothing using PHP's preg_replace(). I'm currently researching regex since I don't know any of it, so I may figure this out on my own (that's always best), but no doubt a solution to a current problem of mine will help me get a handle on it even sooner.
Thanks so much. If I could, I'd drop a bounty on this immediately.
Thanks!
If the text has that fixed format, maybe you won't need regex at all: just scan each line for two double quotes, and if there is only one, keep joining lines until you find the closing one...
Problems may arise if there can be escaped quotes, single quotes delimiting the strings, etc., but as long as there is nothing of that kind, you should be fine.
I don't know PHP, so here is some pseudocode:
open = False
for line in lines do
    nquotes = line.count("\"")
    if not open then
        if nquotes == 1 then
            open = True
            write(line)
        else #we assume nquotes == 2
            writeln(line)
        end
    else
        if nquotes == 0 then
            write(line)
        else #we assume nquotes == 1
            open = False
            writeln(line)
        end
    end
end
Here's essentially fortran's answer in PHP
<pre>
<?php
$data = <<<DATA
1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
DATA;
echo $data, '<hr>';
$lines = preg_split( "/\r\n?|\n/", $data );
$filtered = "";
$open = false;
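foreach ( $lines as $line )
{
  // An odd number of quotes on a line opens or closes a quoted field.
  if ( substr_count( $line, '"' ) & 1 )
  {
    $open = !$open;
  }
  // While a quoted field is still open, join with a space instead of a newline,
  // so a field that spans three or more lines is handled as well.
  $filtered .= $line . ( $open ? ' ' : "\n" );
}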
echo $filtered;
?>
</pre>
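Since the data looks like CSV, another option is to let PHP's CSV reader deal with the quoted line break and only clean up the field afterwards. The following is a minimal sketch, not a drop-in solution: `joinQuotedLineBreaks()` is a made-up helper name, it assumes PHP 7.4 or later (arrow function, empty escape argument), and it assumes standard CSV quoting with `"` and `,`.

```php
<?php
// Hypothetical helper: parse CSV text and collapse line breaks inside quoted fields.
function joinQuotedLineBreaks(string $csv): array
{
    // Put the text into an in-memory stream so fgetcsv() can read it.
    $fh = fopen('php://memory', 'r+');
    fwrite($fh, $csv);
    rewind($fh);

    $rows = [];
    // fgetcsv() keeps reading past a newline while a quoted field is still open.
    while (($fields = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if ($fields === [null]) {
            continue; // blank line
        }
        // Replace any line break left inside a field with a single space.
        $rows[] = array_map(
            fn($field) => preg_replace('/\s*\R\s*/', ' ', $field),
            $fields
        );
    }
    fclose($fh);

    return $rows;
}

$data = "1 (2) ,Yes,5823,\"Some Name\n801-555-5555\",EXEC,,\"Mar 16, 2009\",0.00,\n"
      . "1 (3) ,,4821,Somebody Else,MBR,,\"Mar 11, 2009\",,0.00\n";

$rows = joinQuotedLineBreaks($data);
echo $rows[0][3], "\n";
```

Each row comes back as an array of fields, so it can be written out again with fputcsv() if a cleaned file is what you need.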
I have this text file:
https://drive.google.com/file/d/0B_1cAszh75fYSjNPZFRPb0trOFE/view?usp=sharing
I can print it using the following code:
$file = fopen("gl20160630.txt","r");
while(! feof($file))
{
echo fgets($file). "<br />";
}
fclose($file);
But it looks like this:
I want the contents of this text file to be separated into four columns: Line, Description, Legacy GL Code and Closing Balance. If any one of these columns is empty, it should remain empty. I only want to print the lines that start with ====>
Could you please help me find a way to print the text file like the way I want?
It's actually pretty simple, since your file has a fixed number of characters for each column.
All you need to do is run substr on each line starting with '====>{line}'; then you can read each column by its position in the line.
Here is an example using your file:
$file = fopen("gl20160630.txt", "r");
// fgets() returns false at end of file, so test its result directly
while (($fullLine = fgets($file)) !== false)
{
    // the 'Line' column: 4 characters right after '====>'
    $line = substr($fullLine, 5, 4);
    if (is_numeric($line)) {
        $liability = trim(substr($fullLine, 10, 30));
        $legacy = trim(substr($fullLine, 40, 39));
        $balance = trim(substr($fullLine, 79, 15));
        // only print rows where every column is filled
        if ($liability != null && $legacy != null && $balance != null) {
            echo $line." ".$liability." ".$legacy." ".$balance."\n";
        }
    }
}
fclose($file);
You can see that all I do is:
check whether the characters in the 'Line' column are numeric
then get all the other elements
'clean' them by stripping unwanted characters (spaces, ...) with trim
after that, check that all elements are filled
and finally display them
I hope that this will help you, have a nice day ;)
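The question also asks that an empty column stay empty rather than the row being skipped. Here is a rough sketch of that variant. `parseGlLine()` is a made-up name, and the offsets are simply the ones from the answer above, assumed to match the real file.

```php
<?php
// Hypothetical helper: split one '====>' line into its four fixed-width columns.
// Offsets are copied from the answer above and are an assumption about the file layout.
function parseGlLine(string $fullLine): ?array
{
    // Only lines that start with ====> carry data.
    if (strncmp($fullLine, '====>', 5) !== 0) {
        return null;
    }

    // Empty columns simply come back as empty strings.
    return [
        'line'        => trim(substr($fullLine, 5, 4)),
        'description' => trim(substr($fullLine, 10, 30)),
        'legacy'      => trim(substr($fullLine, 40, 39)),
        'balance'     => trim(substr($fullLine, 79, 15)),
    ];
}

// Made-up sample row with an empty 'Legacy GL Code' column.
$sample = '====>' . str_pad('12', 5) . str_pad('Cash in hand', 30)
        . str_pad('', 39) . str_pad('1500.00', 15, ' ', STR_PAD_LEFT);

$row = parseGlLine($sample);
echo implode(' | ', $row), "\n";
```

Returning an array instead of echoing makes it easy to print the four values as table cells later on.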
I was given the task of validating a telephone number (stored in the variable $number) entered by a user on my website
$number = $_POST["telephone"];
The thing is, this validation is quite complex, as I must check whether the number is from Portugal. I thought about validating it against all the indicators used in Portugal, of which there are 52 (50 indicators are 3 digits long and 2 indicators are 2 digits long). Example of a number:
254872272 (254 is the indicator)
I also thought about making an array with all the indicators and then looping over it to check whether the first 2 or 3 digits match one of the entries.
What do you guys think? How should I solve this problem?
One way is to use regular expressions with named subpatterns:
$number = 254872272;
$ind = array( 251, 252, 254 );
preg_match( '/^(?<ind>\d{3})(?<rest>\d{6})$/', $number, $match );
if ( isset($match['ind']) && in_array( (int) $match['ind'], $ind, true ) ) {
print_r( $match );
/*
Array
(
[0] => 254872272
[ind] => 254
[1] => 254
[rest] => 872272
[2] => 872272
)
*/
}
Or you can insert indicators directly into regular expression:
preg_match( '/^(?<ind>251|252|254)(?<rest>\d{6})$/', $number, $match );
There are potential regex ways of "solving" this, but really all you need is in_array() with your indicators in an array. For example:
$indicators = array('254', '072', '345');
$numbers = array(
'254872272',
'225872272',
'054872272',
'072872272',
'294872272',
'974872272',
'345872272'
);
while ($number = array_shift($numbers)) {
$indicator = substr($number, 0, 3);
if (in_array($indicator, $indicators)) {
echo "$number is indicated ($indicator).\n";
} else {
echo "$number is NOT indicated ($indicator).\n";
}
}
http://codepad.org/zesUaxF7
This gives:
254872272 is indicated (254).
225872272 is NOT indicated (225).
054872272 is NOT indicated (054).
072872272 is indicated (072).
294872272 is NOT indicated (294).
974872272 is NOT indicated (974).
345872272 is indicated (345).
Also, I use strings instead of integers on purpose: PHP does not keep a leading zero on an integer (and an integer literal written with one, like 0724445555, is read as octal), so you need strings to make the comparison work correctly.
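The question mentions that two of the indicators are only 2 digits long, so a fixed substr($number, 0, 3) would miss those. A minimal sketch that tries both lengths follows; `hasKnownIndicator()` is a made-up name, the indicator list is purely illustrative, and the 9-digit total length is an assumption based on the example number.

```php
<?php
// Illustrative list only; fill in the real 52 indicators.
$indicators = ['21', '22', '254', '289'];

// Hypothetical helper: true if $number is 9 digits and starts with a known indicator.
function hasKnownIndicator(string $number, array $indicators): bool
{
    // Assumption: a valid number is exactly 9 digits, as in the example 254872272.
    if (!preg_match('/^\d{9}$/', $number)) {
        return false;
    }

    // Try the 3-digit prefix first, then the 2-digit one.
    foreach ([3, 2] as $length) {
        if (in_array(substr($number, 0, $length), $indicators, true)) {
            return true;
        }
    }

    return false;
}

var_dump(hasKnownIndicator('254872272', $indicators));
var_dump(hasKnownIndicator('218872272', $indicators));
```

Keeping both the number and the indicators as strings means the strict in_array() comparison also works for prefixes with a leading zero.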
Perhaps with a regular expression?
I have not tested the following, but it should check for one of the matching indicators, followed by any 6 digits, something like:
$indicators = array('123' ,'456', '78'); // etc...
$regex = '/^(' . implode('|', $indicators) . ')[0-9]{6}$/';
if(preg_match($regex, 'your test number')) {
// Run further code...
}
There are a couple of libraries around that aim to validate as many telephone number formats as possible against the numbering rules defined by the relevant authorities.
They are usually based on Google's libphonenumber library, and there are ports for PHP.
I want to check if password contains:
minimum 2 lower cases
minimum 1 upper case
minimum 2 selected special characters
The problem is that when I verify this, it accepts two lowercase letters only if they are consecutive, like this: paSWORD.
If I enter pASWORd, it returns an error.
This is the code:
preg_match("/^(?=.*[a-z]{2})(?=.*[A-Z])(?=.*[_|!|#|#|$|%|^|&|*]{2}).+$/")
I don't see where the problem is and how to fix it.
You're looking for [a-z]{2} in your regex. That is two consecutive lowercases!
I will go out on a limb and suggest that it is probably better to individually check each of your three conditions in separate regexes rather than trying to be clever and do it in one.
I've put some extra braces in which may get your original idea to work for non-consecutive lowercase/special chars, but I think the expression is overcomplex.
preg_match("/^(?=(.*[a-z]){2})(?=.*[A-Z])(?=(.*[_!##$%^&*]){2}).+$/", $password)
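The "separate checks" suggestion above could look roughly like this. It is only a sketch: `isStrongPassword()` is a made-up name, and the special-character set is an assumption copied from the patterns in this thread.

```php
<?php
// Hypothetical helper: one simple count per rule instead of one combined pattern.
function isStrongPassword(string $password): bool
{
    // preg_match_all() returns the number of matches, consecutive or not.
    $lower   = preg_match_all('/[a-z]/', $password);
    $upper   = preg_match_all('/[A-Z]/', $password);
    // Assumed set of allowed special characters.
    $special = preg_match_all('/[_!#$%^&*]/', $password);

    return $lower >= 2 && $upper >= 1 && $special >= 2;
}

var_dump(isStrongPassword('pASWORd!#'));
var_dump(isStrongPassword('paSWORD'));
```

Because each rule is counted on its own, changing a minimum later only means changing one number.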
You can use this pattern to check the three rules:
preg_match("/(?=.*[a-z].*[a-z])(?=.*[A-Z])(?=.*[_!##$%^&*].*[_!##$%^&*])/", $password);
but if you want to allow only letters and these special characters, you must add:
preg_match("/^(?=.*[a-z].*[a-z])(?=.*[A-Z])(?=.*[_!##$%^&*].*[_!##$%^&*])[a-zA-Z_!##$%^&*]+$/", $password);
A way without regex:
$str = '*MauriceAimeLeJambon*';
$chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_!##$%^&*';
// how many characters of each class are still required
$state = array('lower' => 2, 'upper' => 1, 'special' => 2);
$res = false;
$strlength = strlen($str);
for ($i=0; $i<$strlength; $i++) {
    $pos = strpos($chars, $str[$i]);
    if ($pos !== false) {
        // classify by position in $chars first, then decrement while still needed
        if ($pos<26) { if ($state['lower']) $state['lower']--; }
        elseif ($pos<52) { if ($state['upper']) $state['upper']--; }
        elseif ($state['special']) $state['special']--;
    } else { $res = false; break; }
    $res = !$state['lower'] && !$state['upper'] && !$state['special'];
}
var_dump($res);
(This version gives the same result as the second pattern. If you want the same result as the first pattern, just remove the else branch and move the $res assignment out of the for loop.)
Thank you for taking the time to read this; I will appreciate every single response, no matter the quality of content. :)
Using PHP, I'm trying to create a script which will delete several lines within a text file (.txt) if required, based upon whether the line starts with a 0 or a negative number. Each line within the file will always start with a number, and I need to erase all the lines with neutral and/or negative numbers.
The main part I'm struggling with is that the content within the text file isn't static (e.g. it doesn't contain a fixed number of lines/words etc.). In fact, it is automatically updated every 5 minutes with several lines. Therefore, I'd like all the lines containing a neutral or negative number to be removed.
The text file follows the structure:
-29 aullah1
0 name
4 username
4 user
6 player
If possible, I'd like lines 1 and 2 removed, since they begin with a neutral/negative number. At times there may be more than two neutral/negative numbers.
All assistance is appreciated and I look forward to your replies; thank you. :) If I didn't explain anything clearly and/or you'd like me to explain in more detail, please reply. :)
Thank you.
Example:
$file = file("mytextfile.txt");
$newLines = array();
foreach ($file as $line)
    if (preg_match("/^(-\d+|0)/", $line) === 0)
        $newLines[] = chop($line);
$newFile = implode("\n", $newLines);
file_put_contents("mytextfile.txt", $newFile);
It is important that you chop() the newline character off of the end of the line so you don't end up with empty space. Tested successfully.
Something along these lines, I guess; it is untested.
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
    $fChar = substr($line , 0 , 1);
    if($fChar == "0" || $fChar == "-") continue;
    else $newContent .= $line."\n";
}
If the file is big, it's better to read it line by line:
$fh_r = fopen("input.txt", "r"); // open file to read.
$fh_w = fopen("output.txt", "w"); // open file to write.
while (!feof($fh_r)) { // loop till lines are left in the input file.
    $buffer = fgets($fh_r); // read input file line by line.
    // if line begins with num other than 0 or -ve num write it.
    if(!preg_match('/^(0|-\d+)\b/',$buffer)) {
        fwrite($fh_w,$buffer);
    }
}
fclose($fh_r);
fclose($fh_w);
Note: Err checking not included.
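For completeness, here is one way the line-by-line approach might look with basic error checking added. This is a sketch under assumptions: `keepPositiveLines()` is a made-up name, and throwing a RuntimeException is just one possible way to report a failure.

```php
<?php
// Hypothetical helper: copy only the lines that start with a positive number.
// Returns the number of lines kept.
function keepPositiveLines(string $src, string $dst): int
{
    $in = fopen($src, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $src for reading");
    }

    $out = fopen($dst, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $dst for writing");
    }

    $kept = 0;
    // fgets() returns false at end of file, so no separate feof() check is needed.
    while (($line = fgets($in)) !== false) {
        // Keep the line only if it starts with a number greater than zero.
        if (preg_match('/^[1-9]\d*\b/', $line)) {
            fwrite($out, $line);
            $kept++;
        }
    }

    fclose($in);
    fclose($out);

    return $kept;
}
```

Writing to a second file and renaming it over the original afterwards avoids truncating the source while it is still being read.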
file_put_contents($newfile,
implode(
preg_grep('~^[1-9]~',
file($oldfile))));
php is not particularly elegant, but still...
Load each line into a variable and then check whether its first character is - or 0.
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
    if($line === '') continue; // avoid reading offset 0 of an empty line
    $fChar = $line[0];
    if(!($fChar == '0' || $fChar == '-'))
        $newContent .= $line."\n";
}
I changed malik's code for better performance and quality.
Here's another way:
class FileCleaner extends FilterIterator
{
    public function __construct($srcFile)
    {
        parent::__construct(new ArrayIterator(file($srcFile)));
    }

    // return type added so PHP 8.1+ does not emit a deprecation notice
    public function accept(): bool
    {
        // the leading number is everything before the first space
        list($num) = explode(' ', parent::current(), 2);
        return ($num > 0);
    }

    public function write($file)
    {
        file_put_contents($file, implode('', iterator_to_array($this)));
    }
}
Usage:
$filtered = new FileCleaner($src_file);
$filtered->write($new_file);
Logic and methods can be added to the class for other stuff, such as sorting, finding the highest number, converting to a sane storage method such as csv, etc. And, of course, error checking.
I have a string which contains the text of an article. This is sprinkled with BBCodes (between square brackets). I need to be able to grab the first say, 200 characters of an article without cutting it off in the middle of a bbcode. So I need an index where it is safe to cut it off. This will give me the article summary.
The summary must be minimum 200 characters but can be longer to 'escape' out of a bbcode. (this length value will actually be a parameter to a function).
It must not give me a point inside a stand alone bbcode (see the pipe) like so: [lis|t].
It must not give me a point between a start and end bbcode like so: [url="http://www.google.com"]Go To Goo|gle[/url].
It must not give me a point inside either the start or end bbcode or in-between them, in the above example.
It should give me the "safe" index which is after 200 and is not cutting off any BBCodes.
Hope this makes sense. I have been struggling with this for a while. My regex skills are only moderate. Thanks for any help!
First off, I would suggest considering what you will do with a post that is entirely wrapped in BBcodes, as is often true in the case of a font tag. In other words, a solution to the problem as stated will easily lead to 'summaries' containing the entire article. It may be more valuable to identify which tags are still open and append the necessary BBcodes to close them. Of course in cases of a link, it will require additional work to ensure you don't break it.
Well, the obvious easy answer is to present your "summary" without any bbcode-driven markup at all (regex below taken from here)
$summary = substr( preg_replace( '|\[[\/\!]*?[^\[\]]*?\]|si', '', $article ), 0, 200 );
However, to do the job you explicitly describe is going to require more than just a regex. A lexer/parser would do the trick, but that's a moderately complicated topic. I'll see if I can come up with something.
EDIT
Here's a fairly crude version of a lexer, but for this example it works. This converts an input string into bbcode tokens.
<?php
class SimpleBBCodeLexer
{
    protected
        $tokens = array()
      , $patterns = array(
            self::TOKEN_OPEN_TAG => "/\\[[a-z].*?\\]/"
          , self::TOKEN_CLOSE_TAG => "/\\[\\/[a-z].*?\\]/"
        );

    const TOKEN_TEXT = 'TEXT';
    const TOKEN_OPEN_TAG = 'OPEN_TAG';
    const TOKEN_CLOSE_TAG = 'CLOSE_TAG';

    public function __construct( $input )
    {
        for ( $i = 0, $l = strlen( $input ); $i < $l; $i++ )
        {
            // square brackets: the $input{$i} offset syntax was removed in PHP 8
            $this->processChar( $input[$i] );
        }
        // a final call without a character flushes the remaining text
        $this->processChar();
    }

    protected function processChar( $char=null )
    {
        static $tokenFragment = '';
        $tokenFragment = $this->processTokenFragment( $tokenFragment );
        if ( is_null( $char ) )
        {
            $this->addToken( $tokenFragment );
        } else {
            $tokenFragment .= $char;
        }
    }

    protected function processTokenFragment( $tokenFragment )
    {
        foreach ( $this->patterns as $type => $pattern )
        {
            if ( preg_match( $pattern, $tokenFragment, $matches ) )
            {
                // any text in front of the tag becomes its own TEXT token
                if ( $matches[0] != $tokenFragment )
                {
                    $this->addToken( substr( $tokenFragment, 0, -( strlen( $matches[0] ) ) ) );
                }
                $this->addToken( $matches[0], $type );
                return '';
            }
        }
        return $tokenFragment;
    }

    protected function addToken( $token, $type=self::TOKEN_TEXT )
    {
        $this->tokens[] = array( $type => $token );
    }

    public function getTokens()
    {
        return $this->tokens;
    }
}
$l = new SimpleBBCodeLexer( 'some [b]sample[/b] bbcode that [i] should [url="http://www.google.com"]support[/url] what [/i] you need.' );
echo '<pre>';
print_r( $l->getTokens() );
echo '</pre>';
The next step would be to create a parser that loops over these tokens and takes action as it encounters each type. Maybe I'll have time to make it later...
This does not sound like a job for (only) regex.
"Plain programming" logic is a better option:
grab a character other than a '[', increase a counter;
if you encounter an opening tag, keep advancing until you reach the closing tag (don't increase the counter!);
stop grabbing text when your counter has reached 200.
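The steps above could be turned into an index finder roughly like this. It is a sketch under several assumptions: `safeCutIndex()` is a made-up name, tags are assumed to be well nested, a tag counts as "paired" only if a matching closing tag appears later in the text, and lengths are counted in bytes (strlen), not multibyte characters. Unlike the steps above, it counts tag characters towards the minimum, since the question asks for an index into the original string.

```php
<?php
// Hypothetical helper: first index >= $minLen that is neither inside a tag
// nor between an opening tag and its closing tag.
function safeCutIndex(string $text, int $minLen = 200): int
{
    $len   = strlen($text);
    $stack = [];   // names of paired tags that are currently open
    $i     = 0;

    while ($i < $len) {
        // Safe to cut here: past the minimum and no pair is open.
        if ($i >= $minLen && !$stack) {
            return $i;
        }

        // A complete tag such as [b], [/b] or [url="..."] starts at $i.
        if ($text[$i] === '[' && preg_match('~\G\[(/?)([a-z*]+)[^\]]*\]~i', $text, $m, 0, $i)) {
            $name = strtolower($m[2]);
            $i   += strlen($m[0]);   // jump over the whole tag

            if ($m[1] === '/') {
                // Closing tag: close the innermost open tag if it matches.
                if (end($stack) === $name) {
                    array_pop($stack);
                }
            } elseif (stripos($text, '[/' . $name . ']', $i) !== false) {
                // Opening tag with a closer later on: remember it as open.
                $stack[] = $name;
            }
            continue;
        }

        $i++;
    }

    // Short input, or the text ended before a safe point was found.
    return $len;
}

$text = 'See [url="http://www.google.com"]Go To Google[/url] now [list] end';
echo substr($text, 0, safeCutIndex($text, 40)), "\n";
```

A stand-alone tag like [list] is stepped over as a unit but never pushed on the stack, so it cannot keep the summary open forever.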
Here is a start. I don't have access to PHP at the moment, so you might need some tweaking to get it to run. Also, this will not ensure that tags are closed (i.e. the string could have [url] without [/url]). Also, if a string is invalid (i.e. not all square brackets are matched) it might not return what you want.
function getIndex($str, $minLen = 200)
{
    //on short input, return the whole string
    if(strlen($str) <= $minLen)
        return strlen($str);

    //get first minLen characters
    $substr = substr($str, 0, $minLen);

    //does it have a '[' that is not closed?
    if(preg_match('/\[[^\]]*$/', $substr))
    {
        //find the next ']', if there is one
        $pos = strpos($str, ']', $minLen);

        //now, make the substr go all the way to that ']'
        if($pos !== false)
            $substr = substr($str, 0, $pos+1);
    }

    //now, it may be better to return $subStr, but you specifically
    //asked for the index, which is the length of this substring.
    return strlen($substr);
}
I wrote this function, which should do just what you want. It counts n characters (except those in tags) and then closes any tags that need to be closed. Example use is included in the code. The code is in Python, but it should be really easy to port to other languages, such as PHP.
def limit(input, length):
    """Splits a text after (length) characters, preserving bbcode"""

    stack = []
    counter = 0
    output = ""
    tag = ""
    insideTag = 0  # 0 = Outside tag, 1 = Opening tag, 2 = Closing tag, 3 = Opening tag, parameters section

    for i in input:
        if counter >= length:  # If we have reached the max length (add " and i == ' '") to not make it split in a word
            break
        elif i == '[':  # If we have reached a tag
            insideTag = 1
        elif i == '/':  # If we reach a slash...
            if insideTag == 1:  # And we are in an opening tag
                insideTag = 2
        elif i == '=':  # If we have reached the parameters
            if insideTag >= 1:  # If we actually are in a tag
                insideTag = 3
        elif i == ']':  # If we have reached the closing of a tag
            if insideTag == 2:  # If we are in a closing tag
                stack.pop()  # Pop the last tag, we closed it
            elif insideTag >= 1:  # If we are in a tag, parameters or not
                stack.append(tag)  # Add current tag to the tag-stack
            if insideTag >= 0:  # If are in some type of tag
                insideTag = 0
                tag = ""
        elif insideTag == 0:  # If we are not in a tag
            counter += 1
        elif insideTag <= 2:  # If we are in a tag and not among the parameters
            tag += i
        output += i

    while len(stack) > 0:
        output += '[/'+stack.pop()+']'  # Add the remaining tags

    return output

cutText = limit('[font]This should be easy:[img]yippee.png[/img][i][u][url="http://www.stackoverflow.com"]Check out this site[/url][/u]Should be cut here somewhere [/i][/font]', 60)
print(cutText)