pattern to root out email addresses - php

I have a text document with a slew of email addresses, which I converted from a PDF.
Here is an example of what it looks like:
name1;someone#awebite1.com;;;
name2;someone#awebite2.com;;;
name3;someone#awebite3.com;;;
name4;someone#awebite4.com;;;
name5;someone#awebite5.com;;;
etc... 600+ contacts
Does anyone know how to write a simple PHP pattern/expression/regex I can use to separate the name and email one by one so I can put them in a database?
The database would of course be a simple: id | contact | email
Any help would be greatly appreciated!
I forgot to mention, I would like to do it in PHP. I will incorporate the code into a form for future use.

In PHP, you can split a string using the explode function.
$parts = explode(';', $inputString);
The returned array contains each part that was separated by ;.
For this, each line in your text document has to be passed as $inputString. So loop through the array returned by
preg_split('/\\n/',$docContent)
and call explode on each element. The above preg_split returns an array with each line of the input as an element.
Combining both,
$lines = preg_split('/\\n/', $docContent);
foreach ($lines as $line) {
    $parts = explode(';', $line);
    // $parts[0] is the name and $parts[1] is the email; ignore the remaining elements
}
Note: I have only a little knowledge of PHP. There may be better code.
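As a slightly fuller sketch of the same explode approach, the lines can be collected into rows that are ready to insert into the id | contact | email table. The helper name parseContacts is hypothetical, and the handling of Windows line endings and blank lines is an assumption about the converted file, not something stated in the question.

```php
<?php
// Hypothetical helper: turn "name;email;;;" lines into contact/email rows.
function parseContacts(string $docContent): array
{
    $rows = [];
    // Split on \n or \r\n so files saved on Windows also work (assumption).
    foreach (preg_split('/\r?\n/', $docContent) as $line) {
        $parts = explode(';', trim($line));
        // Skip blank or malformed lines that lack a second field.
        if (count($parts) < 2 || $parts[0] === '') {
            continue;
        }
        $rows[] = ['contact' => trim($parts[0]), 'email' => trim($parts[1])];
    }
    return $rows;
}

$sample = "name1;someone#awebite1.com;;;\nname2;someone#awebite2.com;;;\n";
$rows = parseContacts($sample);
```

Each element of $rows can then be passed to a prepared INSERT statement; lines without a ; (such as a trailing "etc..." line) are skipped.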

How about something like:
LOAD DATA INFILE 'yourFile'
INTO TABLE yourTable
FIELDS TERMINATED BY ';'
LINES TERMINATED BY ';;;\n'

Assuming that by "contact" you mean the very first field of each line (which says 'contact' for all shown values), something like this will work:
cat contacts.txt | awk {'split($2,A,";"); print A[1]"|"$1"|"A[2]}'

Related

Reading a text file with specific code tag information in php

I would like to read a file, generally a text file, in which each record starts with a specific code (field name) on a line and ends with another specific code. On each line, the code is separated from its value by the ^ character. I want to read this in PHP and dump it into an SQL database.
Text file, e.g.:
001^UK2000009
008^S54/01/R/M/X,
009^Male
110^text1
200^text2
001^UK2000008
008^S54/012/R/M/X
009^Female
110^text1a
200^text2a
and so on...
This is similar to what the PHP package File_MARC does.
Thanks in advance.
First you have to read the file with PHP's file functions, and then you can get a specific column name and its value in the following way.
First read a single line from the file, and then use explode to break that line into elements on the space delimiter.
$columns = explode(' ', $line_variable);
After generating the columns, I can see that each key and value are delimited by the ^ (caret) symbol, so we can use explode for that as well.
$newColumn = [];
foreach ($columns as $column) {
    $splited = explode('^', $column);
    $newColumn[][$splited[0]] = $splited[1];
}
print_r($newColumn);
This is just to give you an idea of how you can achieve your task; the rest is up to you.
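For the sample file in the question, where each code^value pair sits on its own line, a sketch along these lines groups the pairs into records. It assumes a new record begins whenever the code 001 appears, which is a guess from the sample rather than something the question confirms; parseTaggedRecords is a hypothetical helper name.

```php
<?php
// Hypothetical helper: group "code^value" lines into records.
// Assumption: every record starts with the code given in $startCode.
function parseTaggedRecords(string $text, string $startCode = '001'): array
{
    $records = [];
    $current = null;
    foreach (preg_split('/\r?\n/', $text) as $line) {
        $line = trim($line);
        // Ignore blank lines and lines without a caret.
        if ($line === '' || strpos($line, '^') === false) {
            continue;
        }
        // Limit of 2 keeps any further "^" characters inside the value.
        [$code, $value] = explode('^', $line, 2);
        if ($code === $startCode) {
            // A start code closes the previous record and opens a new one.
            if ($current !== null) {
                $records[] = $current;
            }
            $current = [];
        }
        if ($current !== null) {
            $current[$code] = $value;
        }
    }
    if ($current !== null) {
        $records[] = $current;
    }
    return $records;
}

$sample = "001^UK2000009\n008^S54/01/R/M/X,\n009^Male\n110^text1\n200^text2\n"
        . "001^UK2000008\n008^S54/012/R/M/X\n009^Female\n110^text1a\n200^text2a\n";
$records = parseTaggedRecords($sample);
```

Each element of $records is an associative array keyed by code, so it maps directly onto one database row. Note that PHP stores keys such as "110" as integers, while "001" stays a string.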

Handling text file with unknown newline positions

My problem is simple: I have a text file that I process line by line, inserting the data into a database and doing some work for each line. The text file is a log of SMS messages received by my gateway, and normally each SMS corresponds to one line. If an SMS has no line breaks in its body, everything is fine. If, on the other hand, an SMS is sent like this:
"Test
TestOnANewLine"
I get a log entry that is broken across a new line every time. A sample follows:
2012-01-01 10:10:10,4C64DCD6.req,192.168.999.999,+12223334444,OK -- SMPP - 999.999.999.999:9999,SubmitUser=user;Sender=sender;SMSCMsgId=999999999;Text="Test1
NewLineTest
AnotherNEwLineTEst"
The log format is:
date time, SMS id, IP that processed it, number it is being sent to, status -- connection type - IP it is sent from, user that submitted; sender name that is displayed; SMS connection id; body of the SMS
As for the language, I am using PHP, and the code is a simple
foreach($lines as $line)
{ explode and do stuff }
How do I handle this situation? At this point any help is appreciated.
Thanks in advance!!
fgetcsv could handle the line breaks enclosed in '"', but with an additional '"' character in the body it would fail...
So what about some irresponsible regexp usage?
preg_match_all('#^(\d{4}-\d{2}-\d{2}[^,]+),([^,]+),([^,]+),([^,]+),([^,]+),SubmitUser=([^;]+);Sender=([^;]+);SMSCMsgId=([^;]+);Text="([\w\d\s\.\-,:;\'"]+)"$#im', $file, $matches);
should do the job for not-too-crazy texts. You may want to adapt the \w\d\s.-,:;'" character class to your needs.
Couldn't you loop through the lines until you can parse a date from one?
Maybe also take into account that the previous line ended with a double quote?
I know it's not foolproof, but without some recognisable "end of message" character(s), this is the best I could think of :P
First of all, thank you for all the feedback; it was really valuable and helped me solve this issue. For anyone else who reads this post and wants a solution, here is mine:
I changed the way I interpret the end of a line from the regular \r\n to \r\n2, which means I treat a line break as the start of a new entry if and only if there is a regular \r\n and the new physical line begins with a 2 (the first digit of the year).
The actual solved part is:
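$data = file_get_contents($backup_file);
// Split only on a \r\n that is followed by "2"; the lookahead keeps that "2" in each entry
$lines = preg_split('/\r\n(?=2)/', $data);
foreach ($lines as $line) {
    // explode and do stuff
}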
Try this to get all the log entries normalized into a single array item per log entry (i.e. combine entries that span multiple line breaks into a single item):
$line_array = file('/path/to/file');
$log_array = array();
$i = -1;
$date_pattern = '/^[0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}/';
foreach ($line_array as $line) {
    if (1 === preg_match($date_pattern, $line)) {
        // this is a new log entry
        // let's trim the whitespace from the end of the last log array entry since we are done with it
        if (isset($log_array[$i])) {
            $log_array[$i] = rtrim($log_array[$i]);
        }
        // start a new log array entry
        $i++;
        $log_array[$i] = $line;
    } elseif ($i >= 0) {
        // this is not a new log entry, so append it to the current one
        // (lines before the first dated entry are skipped)
        $log_array[$i] .= $line;
    }
}
After that you should be able to work with $log_array to extract the data you need. When you loop through $log_array, it would probably be helpful to extract the message text first. If you do a greedy preg_match on the double quotes, you shouldn't have any problems with messages that contain quotes, as the greedy match will find the longest possible matching string, which in your case is everything between the quotes bounding the message content.
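The greedy match described above could look roughly like this. It is a sketch that assumes the message body is always the last field and is introduced by Text=" as in the sample log; extractSmsText is a hypothetical helper name, and the sample entry is made up for illustration.

```php
<?php
// Hypothetical helper: pull the SMS body out of one normalized log entry.
function extractSmsText(string $entry): ?string
{
    // (.*) is greedy and the "s" flag lets "." match newlines,
    // so the capture runs up to the LAST double quote in the entry.
    if (preg_match('/Text="(.*)"/s', $entry, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Made-up entry with quotes and a line break inside the body.
$entry = 'SubmitUser=user;Sender=sender;SMSCMsgId=1;Text="He said "hi"' . "\n" . 'bye"';
$text = extractSmsText($entry);
```

Once the body is removed, the remainder of the entry can be split safely on commas and semicolons.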

How to use a textarea to get the csv file /delimited data?

I would like to use a textarea in an HTML form to receive delimited data,
for example:
The sample data looks like the following
testA#testa.com peter USA
testB#testB.com Tony USA
testC#testC.com tom USA
testA#testa.com peter USA
testA#.com peter USA
The problems are:
How do I check where each line ends? (\n)?
How do I check for duplicates (only for email)? (If there are 3 values per row, take the 1st, 4th, 7th, 10th... values and use array_unique?)
Should I restrict the delimiter symbol, or should I detect it automatically?
What if space is the delimiter, but some of my other data also contains spaces, e.g. Tony Hanks?
Thank you for any kind of help.
First I would split string by line ends:
$r = explode(PHP_EOL, $data); //data is your raw data from textarea
To check the delimiter, explode the first line by each possible delimiter and check the array count.
foreach (array(' ', ';', '/') as $delimiter) {
    $x = explode($delimiter, $r[0]);
    if (count($x) == 3) {
        break;
    }
}
After that, use the detected delimiter with str_getcsv on the raw data: http://www.php.net/manual/en/function.str-getcsv.php
What if space is the delimiter, but some of my other data also contains spaces, e.g. Tony Hanks?
In that case you need to use quotes. Excel also cannot handle this without quotes.
How to do duplication checking (only for email)?
Create an array where the keys are emails. Iterate through your parsed CSV and check whether the key is already set.
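A minimal sketch of that duplicate check, assuming each parsed row is an array whose first element is the email. Treating emails as case-insensitive is an extra assumption, and uniqueByEmail is a hypothetical helper name.

```php
<?php
// Hypothetical helper: keep only the first row seen for each email.
function uniqueByEmail(array $rows): array
{
    $seen = [];
    foreach ($rows as $row) {
        // Assumption: the email is the first column; compare case-insensitively.
        $key = strtolower(trim($row[0]));
        if (!isset($seen[$key])) {
            $seen[$key] = $row;
        }
    }
    // Drop the email keys and return a plain list of rows.
    return array_values($seen);
}

$rows = [
    ['testA#testa.com', 'peter', 'USA'],
    ['testB#testB.com', 'Tony', 'USA'],
    ['testA#testa.com', 'peter', 'USA'],
];
$unique = uniqueByEmail($rows);
```

Because isset on an array key is a hash lookup, this stays fast even for long lists.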

How to use textarea as input to check duplication and invalid?

My idea is to use a textarea to let the user paste email, name, address, etc. into it.
The pasted data must have a delimiter between each value.
However, my questions are:
How can I detect the delimiter used in the pasted data?
How should I process it? Store each value in a different array according to its position? But what if a row is inconsistent, e.g. one row has an extra entry (email name address address2) while the others have email name address?
Actually, I am processing data from an Outlook Express export txt file or data copied from an Excel sheet.
In the Outlook Express export file, there is extra spacing for each email that has no name, so problems occur, e.g.
aa#aa.com name1 bb#bb.com cc#cc.com name2
Thanks for your kind help.
You would use explode for this (http://php.net/manual/en/function.explode.php).
If you can ask the users to enter each piece of data on a new line, you can then split the textarea contents by the \n character:
e.g.
$myarray = explode("\n", $textarea_str);
Then each element of the array can be split by the delimiting comma:
foreach ($myarray as $row) {
    $eachline[] = explode(",", $row);
}
Then validate the individual items that you've extracted from the delimited data as if they have come individually.
You have to explode(',', $_POST['emails']), then run through that array and check, trim and validate each element for proper email format.
You can tell the user in advance that email addresses must be separated by your chosen delimiter (a comma, for example). On the PHP side, just call explode($delimiter, $_REQUEST['text_area_name']);
Nice question. You are trying to find which delimiter is used in the copied text. I cannot offer a perfect solution, but you could create an algorithm like this:
Consider the possible delimiters, e.g. , ' " | etc.
Explode the text with each delimiter under consideration and count the number of elements returned in each array.
The delimiter that gives the highest number of elements is probably the one the user used.
You can fine-tune this as needed; it is just a simple example.
Thanks
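The counting idea above can be sketched as follows. The candidate list and the helper name guessDelimiter are assumptions for illustration, and a space is left out of the candidates on purpose because names and addresses often contain spaces.

```php
<?php
// Hypothetical helper: guess the delimiter by counting how often each candidate occurs.
function guessDelimiter(string $text, array $candidates = [',', ';', '|', "\t"]): ?string
{
    $best = null;
    $bestCount = 0;
    foreach ($candidates as $delimiter) {
        // explode() returns one more element than there are delimiters.
        $count = count(explode($delimiter, $text)) - 1;
        if ($count > $bestCount) {
            $best = $delimiter;
            $bestCount = $count;
        }
    }
    // null means none of the candidates occurs in the text.
    return $best;
}

$guess = guessDelimiter("aa#aa.com,name1,addr1\nbb#bb.com,name2,addr2");
```

A stricter variant would also check that every line yields the same number of fields, which helps catch rows with a missing or extra value.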

php search and replace

I am trying to create a database field merge into a document (rtf) using php
i.e if I have a document that starts
Dear Sir,
Customer Name: [customer_name], Date of order: [order_date]
After retrieving the appropriate database record I can use a simple search and replace to insert the database field into the right place.
So far so good.
I would however like to have a little more control over the data before it is replaced. For example I may wish to Title Case it, or convert a delimited string into a list with carriage returns.
I would therefore like to be able to add extra formatting commands to the field to be replaced. e.g.
Dear Sir,
Customer Name: [customer_name, TC], Date of order: [order_date, Y/M/D]
There may be more than one formatting command per field.
Is there a way that I can now search for these strings? The format of the strings is not set in stone, so if I have to change the format then I can.
Any suggestions appreciated.
You could use a templating system like Smarty, that might make your life easier, as you can do {$customer_name|ucwords} or actually put PHP code in your email template.
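Try a regex and preg_replace_callback:
function replace_param($matches)
{
    // $matches[1] is the text between the brackets, e.g. "customer_name, TC"
    $parts = array_map('trim', explode(',', $matches[1]));
    // $parts now contains an array like: customer_name, TC, SE, YMD
    // do some substitutions and build the replacement text:
    $text = $parts[0]; // placeholder: look up the field and apply the commands here
    return $text;
}
$rtf = preg_replace_callback('/\[([^\]]+)\]/', 'replace_param', $rtf);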
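To make the callback idea concrete, here is a sketch that looks up each field in a database record and applies the formatting commands in order. The function name mergeFields and the command codes TC (title case) and UC (upper case) are assumptions for illustration; the question leaves the command format open.

```php
<?php
// Hypothetical helper: replace [field, CMD, ...] placeholders using $record.
function mergeFields(string $template, array $record): string
{
    return preg_replace_callback('/\[([^\]]+)\]/', function (array $matches) use ($record) {
        // First part is the field name, the rest are formatting commands.
        $parts = array_map('trim', explode(',', $matches[1]));
        $field = array_shift($parts);
        if (!array_key_exists($field, $record)) {
            return $matches[0]; // leave unknown placeholders untouched
        }
        $value = (string) $record[$field];
        foreach ($parts as $command) {
            switch ($command) {
                case 'TC': // assumed code for Title Case
                    $value = ucwords(strtolower($value));
                    break;
                case 'UC': // assumed code for UPPER CASE
                    $value = strtoupper($value);
                    break;
                // unrecognised commands are ignored in this sketch
            }
        }
        return $value;
    }, $template);
}

$out = mergeFields(
    'Customer Name: [customer_name, TC], Ref: [missing]',
    ['customer_name' => 'jOHN smith']
);
```

Further commands, such as a date format, would be added as extra cases in the switch.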
You can use explode on it to separate them into array values.
For Example:
$customer_name = 'customer_name, TC';
$get_fields = explode(',', $customer_name);
foreach ($get_fields as $value) {
    $new_val = trim($value);
    // Now do whatever you want to these in here.
}
Sorry if I'm not understanding you.
