php search text file for any wav file names - php

I have a series of files containing raw text or JSON data, and these files contain WAV file names. All of the WAV files have the suffix .wav.
Is there any way, using PHP, to search an individual text or JSON file and return an array of any .wav files found?
This example of random text contains 6 .wav files. How would I search it and extract the filenames?
Spoke as as other again ye. Hard on to roof he drew. So sell side newfile.wav ye in mr evil. Longer waited mr of nature seemed. Improving knowledge incommode objection me ye is prevailed playme.wav principle in. Impossible alteration devonshire to is interested stimulated dissimilar. To matter esteem polite do if.
Spot of come to ever test.wav hand as lady meet on. Delicate contempt received two yet advanced. Gentleman as belonging he commanded believing dejection in by. On no am winding chicken so behaved. Its preserved sex enjoyment new way behaviour. Him yet devonshire celebrated welcome.wav especially. Unfeeling one provision are smallness resembled repulsive.
Raising say express had chiefly detract demands she. Quiet led own cause three him. Front no party young abode state up. Saved he do fruit woody of to. Met defective are allowance two perceived listening consulted contained. It chicken oh colonel pressed excited suppose to shortly. He improve started no we manners another.wav however effects. Prospect humoured mistress to by proposal marianne attended. Simplicity the far admiration preference everything. Up help home head spot an he room in.
Talent she for lively eat led sister. Entrance strongly packages she out rendered get quitting denoting led. Dwelling confined improved it he no doubtful raptures. Several carried through an of up attempt gravity. Situation to be at offending elsewhere distrusts if. Particular use for considered projection cultivated. Worth of do doubt shall it their. Extensive existence up me last.wav contained he pronounce do. Excellence inquietude assistance precaution any impression man sufficient.
I've tried this, but I get no results.
$lines = file('test.txt');
foreach ($lines as $line_num => $line) {
$line = trim($line);
if (strpos($line, '*.wav') !== false) {
echo ($line);
}
}
The above text should return :
newfile.wav
playme.wav
test.wav
welcome.wav
another.wav
last.wav
Thanks
UPDATE:
Using the following:
$text = file_get_contents('test.txt');
preg_match_all('/\w+\.wav/', $text, $matches);
var_dump($matches);
results in an array of :
array(1) {
[0]=>
array(6) {
[0]=>
string(11) "newfile.wav"
[1]=>
string(10) "playme.wav"
[2]=>
string(8) "test.wav"
[3]=>
string(11) "welcome.wav"
[4]=>
string(11) "another.wav"
[5]=>
string(8) "last.wav"
}
}
So I get an array of the wav files within an array. How do I get just the array of wav files? Thanks
This doesn't work correctly for wav files with spaces in their names.
Any ideas?

This tool might help you to design an expression as you wish and test it, maybe something similar to:
([a-z]+\.wav)
You can also add more boundaries to it, if you might want to.
PHP Code
You could also use preg_match_all to do so, maybe something similar to:
$re = '/([a-z]+\.wav)/m';
$str = 'Spoke as as other again ye. Hard on to roof he drew. So sell side newfile.wav ye in mr evil. Longer waited mr of nature seemed. Improving knowledge incommode objection me ye is prevailed playme.wav principle in. Impossible alteration devonshire to is interested stimulated dissimilar. To matter esteem polite do if.
Spot of come to ever test.wav hand as lady meet on. Delicate contempt received two yet advanced. Gentleman as belonging he commanded believing dejection in by. On no am winding chicken so behaved. Its preserved sex enjoyment new way behaviour. Him yet devonshire celebrated welcome.wav especially. Unfeeling one provision are smallness resembled repulsive.
Raising say express had chiefly detract demands she. Quiet led own cause three him. Front no party young abode state up. Saved he do fruit woody of to. Met defective are allowance two perceived listening consulted contained. It chicken oh colonel pressed excited suppose to shortly. He improve started no we manners another.wav however effects. Prospect humoured mistress to by proposal marianne attended. Simplicity the far admiration preference everything. Up help home head spot an he room in.
Talent she for lively eat led sister. Entrance strongly packages she out rendered get quitting denoting led. Dwelling confined improved it he no doubtful raptures. Several carried through an of up attempt gravity. Situation to be at offending elsewhere distrusts if. Particular use for considered projection cultivated. Worth of do doubt shall it their. Extensive existence up me last.wav contained he pronounce do. Excellence inquietude assistance precaution any impression man sufficient. ';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Test Script for RegEx
const regex = /([a-z]+\.wav)/gm;
const str = `Spoke as as other again ye. Hard on to roof he drew. So sell side newfile.wav ye in mr evil. Longer waited mr of nature seemed. Improving knowledge incommode objection me ye is prevailed playme.wav principle in. Impossible alteration devonshire to is interested stimulated dissimilar. To matter esteem polite do if.
Spot of come to ever test.wav hand as lady meet on. Delicate contempt received two yet advanced. Gentleman as belonging he commanded believing dejection in by. On no am winding chicken so behaved. Its preserved sex enjoyment new way behaviour. Him yet devonshire celebrated welcome.wav especially. Unfeeling one provision are smallness resembled repulsive.
Raising say express had chiefly detract demands she. Quiet led own cause three him. Front no party young abode state up. Saved he do fruit woody of to. Met defective are allowance two perceived listening consulted contained. It chicken oh colonel pressed excited suppose to shortly. He improve started no we manners another.wav however effects. Prospect humoured mistress to by proposal marianne attended. Simplicity the far admiration preference everything. Up help home head spot an he room in.
Talent she for lively eat led sister. Entrance strongly packages she out rendered get quitting denoting led. Dwelling confined improved it he no doubtful raptures. Several carried through an of up attempt gravity. Situation to be at offending elsewhere distrusts if. Particular use for considered projection cultivated. Worth of do doubt shall it their. Extensive existence up me last.wav contained he pronounce do. Excellence inquietude assistance precaution any impression man sufficient. `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}

This is why regular expressions were invented.
$text = file_get_contents('test.txt');
preg_match_all('/(\w+\.wav)/', $text, $matches);
var_dump($matches[0]);
Some good resources:
preg_match
preg_replace
regex101.com allows you to test expressions realtime
output:
array(6) {
[0] => string(11) "newfile.wav"
[1] => string(10) "playme.wav"
[2] => string(8) "test.wav"
[3] => string(11) "welcome.wav"
[4] => string(11) "another.wav"
[5] => string(8) "last.wav"
}
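The same idea can be wrapped in a small helper that returns the flat list directly. This is only a minimal sketch: the function name find_wav_names is hypothetical, and the pattern also allows hyphens and matches case-insensitively, which is an assumption about how the files are named. Names containing spaces cannot be reliably delimited in free-running prose, so they are not handled here.

```php
<?php
// Hypothetical helper: returns a flat array of the .wav names found in $text.
function find_wav_names(string $text): array
{
    // [\w-]+ allows letters, digits, underscores and hyphens;
    // \b stops a word such as "x.wave" from matching.
    preg_match_all('/[\w-]+\.wav\b/i', $text, $matches);

    // $matches[0] holds the full matches; array_unique drops repeats
    // and array_values re-indexes the result from 0.
    return array_values(array_unique($matches[0]));
}

$sample = 'So sell side newfile.wav ye in mr evil. prevailed playme.wav principle in. Again newfile.wav here.';
print_r(find_wav_names($sample));
```

For a file, the text would come from file_get_contents('test.txt') as in the answer above.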

You are almost there. You can explode each $line on spaces, then go through each word and check whether it ends with the .wav extension. If it does, print the word.
<?php
$lines = file('test.txt'); // read the file into an array of lines, as in the question
foreach ($lines as $line_num => $line) {
    $line = trim($line);
    $words = explode(" ", $line);
    foreach ($words as $each_word) {
        $wav_index = strpos($each_word, '.wav');
        // strict check to make sure the word ends with .wav rather than containing it elsewhere
        if ($wav_index !== false && $wav_index === strlen($each_word) - 4) {
            echo $each_word, PHP_EOL;
        }
    }
}

Related

JSON output in array using PHP explode

So, I have an output from an API using JSON:
{"listings":[{"adoption_fee":"$200","adoption_process":"For cats, please fill out our \u003ca href=\"http://www.haart.org.au/pre-adoption-form-cats/\"\u003ePre-Adoption Questionnaire - Cats\u003c/a\u003e.\r\n\r\nFor dogs, please fill out our \u003ca href=\"http://www.haart.org.au/pre-adoption-form-dogs/\"\u003ePre-Adoption Questionnaire - Dogs\u003c/a\u003e.\r\n\r\nFor more information on our Adoption Process, please visit this \u003ca href=\"http://www.haart.org.au/our-adoption-process/\"\u003elink\u003c/a\u003e.\r\n\r\nPlease make sure that you are familiar with our \u003ca href=\"http://www.haart.org.au/adoption-agreement/\"\u003eAdoption Agreement\u003c/a\u003e as it has recently changed.\r\n\r\nFor more information on any of our animals, please \u003ca href=\"http://www.haart.org.au/contact-us/\"\u003eContact Us\u003c/a\u003e.","age":"5 years 8 months","breeds":["Domestic Short Hair"],"breeds_display":"Domestic Short Hair","coat":"Short","contact_name":null,"contact_number":"08 6336 9410","contact_preferred_method":"Email","created_at":"2/9/2014 15:23","date_of_birth":"12/7/2012","desexed":true,"foster_needed":false,"gender":"Female","group":"Homeless and Abused Animal Rescue Team","heart_worm_treated":null,"id":316602,"interstate":false,"last_updated":"22/3/2018 9:40","medical_notes":"","microchip_number":"","mix":false,"multiple_animals":false,"name":"Whinney HC13-154","personality":"Whinney is an independent girl who likes lazing around the house, she's not bothered by other cats or dogs as long as they don't want to cuddle too much, then she will find her own alone space. \r\n\r\nShe will come up for the occasional cuddle but generally a cosy spot at the end of the bed or couch is all this beautiful girl craves :) \r\n\r\n** PLEASE NOTE: all HAART cats are to be adopted as indoor only cats for their safety and to comply with the legal requirements of the Cat Act. 
\r\nHAART recommends the use of Oscillot cat fencing or feline safe Catio's/portable caboodle to ensure they have access to the outdoors. \r\nPlease ask us for information on other suitable products **","photos":[{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg"},{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg"},{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-pro
duction-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg"}],"senior":false,"size":null,"species":"Cat","state":"WA","postcode":"6000","vaccinated":"Yes","wormed":"Yes"}
I want to grab the "adoption_process" and "personality" parts and using PHP break the lines when "\n" is in the output.
I have the following:
$adoption_process = {
foreach ($line in explode("\n", $json['adoption_process']))
{$line = trim($line);
}
}
Then I'm using print_r to echo the output. But it's not working.
Open to suggestions on other (better) ways to do this.
The JSON you posted is invalid: the array and object are not properly closed at the end. I have copied the JSON and fixed it. Have a look below; I hope it helps.
<?php
$json = '{"listings":[{"adoption_fee":"$200","adoption_process":"For cats, please fill out our \u003ca href=\"http://www.haart.org.au/pre-adoption-form-cats/\"\u003ePre-Adoption Questionnaire - Cats\u003c/a\u003e.\r\n\r\nFor dogs, please fill out our \u003ca href=\"http://www.haart.org.au/pre-adoption-form-dogs/\"\u003ePre-Adoption Questionnaire - Dogs\u003c/a\u003e.\r\n\r\nFor more information on our Adoption Process, please visit this \u003ca href=\"http://www.haart.org.au/our-adoption-process/\"\u003elink\u003c/a\u003e.\r\n\r\nPlease make sure that you are familiar with our \u003ca href=\"http://www.haart.org.au/adoption-agreement/\"\u003eAdoption Agreement\u003c/a\u003e as it has recently changed.\r\n\r\nFor more information on any of our animals, please \u003ca href=\"http://www.haart.org.au/contact-us/\"\u003eContact Us\u003c/a\u003e.","age":"5 years 8 months","breeds":["Domestic Short Hair"],"breeds_display":"Domestic Short Hair","coat":"Short","contact_name":null,"contact_number":"08 6336 9410","contact_preferred_method":"Email","created_at":"2/9/2014 15:23","date_of_birth":"12/7/2012","desexed":true,"foster_needed":false,"gender":"Female","group":"Homeless and Abused Animal Rescue Team","heart_worm_treated":null,"id":316602,"interstate":false,"last_updated":"22/3/2018 9:40","medical_notes":"","microchip_number":"","mix":false,"multiple_animals":false,"name":"Whinney HC13-154","personality":"Whinney is an independent girl who likes lazing around the house, she\'s not bothered by other cats or dogs as long as they don\'t want to cuddle too much, then she will find her own alone space. \r\n\r\nShe will come up for the occasional cuddle but generally a cosy spot at the end of the bed or couch is all this beautiful girl craves :) \r\n\r\n** PLEASE NOTE: all HAART cats are to be adopted as indoor only cats for their safety and to comply with the legal requirements of the Cat Act. 
\r\nHAART recommends the use of Oscillot cat fencing or feline safe Catio\'s/portable caboodle to ensure they have access to the outdoors. \r\nPlease ask us for information on other suitable products **","photos":[{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_db5ee_orig.jpg"},{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_fb958_orig.jpg"},{"small_80":"https://res.cloudinary.com/petrescue/image/upload/h_80,w_80,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","medium_130":"https://res.cloudinary.com/petrescue/image/upload/h_130,w_130,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","large_340":"https://res.cloudinary.com/petrescue/image/upload/h_340,w_340,c_pad,q_auto:best/petrescue-pr
oduction-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg","xlarge_900":"https://res.cloudinary.com/petrescue/image/upload/h_900,w_900,c_pad,q_auto:best/petrescue-production-s3/uploads/pet_photos/2014/9/2/316602_9030a_orig.jpg"}],"senior":false,"size":null,"species":"Cat","state":"WA","postcode":"6000","vaccinated":"Yes","wormed":"Yes"}]}';
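$data = json_decode($json, true);
$line = array(); // collects the non-empty lines
foreach ($data['listings'] as $key => $value) {
    $process = explode("\n", $value['adoption_process']);
    foreach ($process as $k => $v) {
        $v = trim($v); // also strips the "\r" left over from "\r\n"
        if ($v !== '') {
            $line[] = $v;
        }
    }
}
print_r($line);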
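Since the question asks for both "adoption_process" and "personality", here is a minimal sketch that handles the two fields with one helper. The shortened JSON sample and the split_lines name are assumptions made for illustration, not part of the API output; preg_split with \R splits on "\r\n", "\n" or "\r" alike.

```php
<?php
// Hypothetical, shortened sample in the same shape as the API output.
$json = '{"listings":[{"adoption_process":"For cats, fill out form A.\r\n\r\nFor dogs, fill out form B.","personality":"Independent girl.\r\nLikes a cosy spot."}]}';

$data = json_decode($json, true);
if ($data === null) {
    // json_decode returns null when the input is not valid JSON.
    exit('Invalid JSON: ' . json_last_error_msg());
}

// Split a text field on any newline sequence and drop blank lines.
function split_lines(string $text): array
{
    $lines = preg_split('/\R/', $text);
    $lines = array_map('trim', $lines);
    return array_values(array_filter($lines, 'strlen'));
}

$listing = $data['listings'][0];
$process = split_lines($listing['adoption_process']);
$personality = split_lines($listing['personality']);

print_r($process);
print_r($personality);
```

If the lines are meant for an HTML page rather than an array, nl2br() on the decoded string is another option.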

Filtering RSS links with regular expressions

I'm a bit of a noob, but have been getting my feet wet building a site in php on localhost. The problem that I'm having is that I can't figure out how to filter RSS content that contains a regular expression in their links.
My code to display a RSS feed with PHP:
<?php
///// RSS FEED CODE
function getFeed1($feed_url) {
$content = file_get_contents($feed_url);
$x = new SimpleXmlElement($content);
echo "<ul>";
foreach($x->channel->item as $entry) {
echo "<li><a href='$entry->link' title='$entry->title'>" . $entry->title . "</a></li>";
}
echo "</ul>";
}
getFeed1("http://www.drf.com/feeds/all-articles-of-track/SA");
?>
The results are displayed as such in a browser as links to a page,
Espinoza wins George Woolf Memorial Jockey Award
Dortmund will get month to clear up foot problem
Abrams hopes McHeat stays hot for Sensational Star
Santa Anita attendance up, handle down
Hot Market returns from long absence on hillside turf course
Moon Over Paris, Divina Comedia key to pick six
Millionaire Alert Bay looks to pad bankroll in Sensational Star
Santa Anita to replace turf course this summer
Free: Santa Anita horses to watch for week of Feb. 22
Iron Rob vanned off after winning Baffle Stakes
I am trying to figure out how to use an if-statement that will filter out the links(href) that start with “http://www.drf.com/news/preview/”.
So the results will look like:
Espinoza wins George Woolf Memorial Jockey Award
Santa Anita attendance up, handle down
Millionaire Alert Bay looks to pad bankroll in Sensational Star
Santa Anita to replace turf course this summer
Iron Rob vanned off after winning Baffle Stakes
I've spent the last two days trying different variations of:
if (strpos($x, 'http://www.drf.com/news/preview/') !== false)
and
if (preg_match('http://www.drf.com/news/preview/', $x))
Yet I can't get the syntax right or I'm screwing up somewhere.
I have found posts that suggest using third-party filters, or the dead Yahoo Pipes, yet I have a feeling that what I seek can be accomplished with an if-statement. I have yet to find anything that can parse out an RSS href using a regular expression.
For the people who know PHP, what am I missing? I have spent the last two days googling and trying different things mentioned on the internet, but to no avail. I know the chase is always better than the catch, yet I lost the tracks of my prey. Please help by pointing me, and others who find this post, to the trail.
Thank you
This is the regular expression you are looking for:
/^(http\:\/\/www\.drf\.com\/news\/preview\/)/i
You should accept HTTPS too with a small modification:
/^(https?\:\/\/www\.drf\.com\/news\/preview\/)/i
And do not rely on the www subdomain being present:
/^(https?\:\/\/(www\.)?drf\.com\/news\/preview\/)/i
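The pattern has to be tested against each item's link string inside the loop, not against the feed object $x. A minimal sketch, assuming the SimpleXML extension is available; the two-item inline feed and the $kept array are made up for illustration and stand in for the real feed and the echo statements in the question.

```php
<?php
// Hypothetical two-item feed standing in for the real RSS response.
$xml = <<<XML
<rss version="2.0"><channel>
<item><title>Award story</title><link>http://www.drf.com/news/award-story</link></item>
<item><title>Race preview</title><link>http://www.drf.com/news/preview/race-preview</link></item>
</channel></rss>
XML;

$pattern = '/^(https?\:\/\/(www\.)?drf\.com\/news\/preview\/)/i';

$feed = new SimpleXMLElement($xml);
$kept = [];
foreach ($feed->channel->item as $entry) {
    // Cast the element to a string before matching.
    if (preg_match($pattern, (string) $entry->link)) {
        continue; // skip links that start with the preview prefix
    }
    $kept[] = (string) $entry->title;
}

print_r($kept);
```

Because the prefix is a fixed string, strpos((string) $entry->link, 'http://www.drf.com/news/preview/') === 0 would work as well; the regex only adds the https and optional www variants.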

United Kingdom (GB) postal code validation without regex

I have tried several regexes and still some valid postal codes sometimes get rejected.
Searching the internet, Wikipedia and SO, I could only find regex validation solutions.
Is there a validation method which does not use regex? In any language, I guess it would be easy to port.
I suppose the easiest would be to compare against a postal code database, yet that would need to be maintained and updated periodically from a reliable source.
Edit: To help future visitors and keep you from posting any more regexes, here's a regex which I have tested (as of 2013-04-24) to work for all postal codes in Code Point (see @Mikkel Løkke's answer):
//PHP PCRE (it was on Wikipedia, it isn't there anymore; I might have modified it, don't remember).
$strPostalCode=preg_replace("/[\s]/", "", $strPostalCode);
$bValid=preg_match("/^(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9])[0-9][ABD-HJLNP-UW-Z]{2})$/i", $strPostalCode);
I'm writing this answer based on the wiki page.
Looking at the validation section, there seem to be 6 types of formats (A = letter and 9 = digit):
AA9A 9AA                         AA9A9AA                     AA9A9AA
A9A 9AA       Removing space     A9A9AA        order it      AA999AA
A9 9AA      ------------------>  A99AA      ------------->   AA99AA
A99 9AA                          A999AA                      A9A9AA
AA9 9AA                          AA99AA                      A999AA
AA99 9AA                         AA999AA                     A99AA
As we can see, the length may vary from 5 to 7 and we have to take in account some special cases if we want to.
So the function we are coding has to do the following:
Remove spaces and convert to uppercase (or lower case).
Check if the input is an exception; if it is, it should return valid.
Check if the input's length is 4 < length < 8.
Check if it's a valid postcode.
The last part is tricky, but we will split it in 3 sections by length for some overview:
Length = 7: AA9A9AA and AA999AA
Length = 6: AA99AA, A9A9AA and A999AA
Length = 5: A99AA
For this we will be using a switch(). From now on it's just a matter of checking character by character if it's a letter or a number on the right place.
So let's take a look at our PHP implementation:
function check_uk_postcode($string){
// Start config
$valid_return_value = 'valid';
$invalid_return_value = 'invalid';
$exceptions = array('BS981TL', 'BX11LT', 'BX21LB', 'BX32BB', 'BX55AT', 'CF101BH', 'CF991NA', 'DE993GG', 'DH981BT', 'DH991NS', 'E161XL', 'E202AQ', 'E202BB', 'E202ST', 'E203BS', 'E203EL', 'E203ET', 'E203HB', 'E203HY', 'E981SN', 'E981ST', 'E981TT', 'EC2N2DB', 'EC4Y0HQ', 'EH991SP', 'G581SB', 'GIR0AA', 'IV212LR', 'L304GB', 'LS981FD', 'N19GU', 'N811ER', 'NG801EH', 'NG801LH', 'NG801RH', 'NG801TH', 'SE18UJ', 'SN381NW', 'SW1A0AA', 'SW1A0PW', 'SW1A1AA', 'SW1A2AA', 'SW1P3EU', 'SW1W0DT', 'TW89GS', 'W1A1AA', 'W1D4FA', 'W1N4DJ');
// Add Overseas territories ?
array_push($exceptions, 'AI-2640', 'ASCN1ZZ', 'STHL1ZZ', 'TDCU1ZZ', 'BBND1ZZ', 'BIQQ1ZZ', 'FIQQ1ZZ', 'GX111AA', 'PCRN1ZZ', 'SIQQ1ZZ', 'TKCA1ZZ');
// End config
$string = strtoupper(preg_replace('/\s/', '', $string)); // Remove the spaces and convert to uppercase.
$exceptions = array_flip($exceptions);
if(isset($exceptions[$string])){return $valid_return_value;} // Check for valid exception
$length = strlen($string);
if($length < 5 || $length > 7){return $invalid_return_value;} // Check for invalid length
$letters = array_flip(range('A', 'Z')); // An array of letters as keys
$numbers = array_flip(range(0, 9)); // An array of numbers as keys
switch($length){
case 7:
if(!isset($letters[$string[0]], $letters[$string[1]], $numbers[$string[2]], $numbers[$string[4]], $letters[$string[5]], $letters[$string[6]])){break;}
if(isset($letters[$string[3]]) || isset($numbers[$string[3]])){
return $valid_return_value;
}
break;
case 6:
if(!isset($letters[$string[0]], $numbers[$string[3]], $letters[$string[4]], $letters[$string[5]])){break;}
if(isset($letters[$string[1]], $numbers[$string[2]]) || isset($numbers[$string[1]], $letters[$string[2]]) || isset($numbers[$string[1]], $numbers[$string[2]])){
return $valid_return_value;
}
break;
case 5:
if(isset($letters[$string[0]], $numbers[$string[1]], $numbers[$string[2]], $letters[$string[3]], $letters[$string[4]])){
return $valid_return_value;
}
break;
}
return $invalid_return_value;
}
Note that I've not added British Forces Post Office and non-geographic codes.
Usage:
echo check_uk_postcode('AE3A 6AR').'<br>'; // valid
echo check_uk_postcode('Z9 9BA').'<br>'; // valid
echo check_uk_postcode('AE3A6AR').'<br>'; // valid
echo check_uk_postcode('EE34 6FR').'<br>'; // valid
echo check_uk_postcode('A23A 7AR').'<br>'; // invalid
echo check_uk_postcode('A23A 7AR').'<br>'; // invalid
echo check_uk_postcode('WA3334E').'<br>'; // invalid
echo check_uk_postcode('A2 AAR').'<br>'; // invalid
As supplied by the UK government.
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
I've built London only postcode based apps using the postcodes I got from HERE. But to be honest, even with London postcodes only, you need a lot more storage than necessary. Sure, the idea is trivial.
Store the postcodes, take the user input or whatever, and see if you get a match. But you are complicating the solution far more than you think. I HAD to use actual postcodes to achieve what I wanted, but for simple validation purposes, as hard as "maintaining" a regex is, storing tens of thousands or hundreds of thousands(if not more) and validating more or less in real-time is a far more difficult task.
If a mini distributed service sounds like a more efficient solution than a regex, go for it, but I'm sure it isn't. Unless you need geo-spatial querying of your own data against UK postcodes or things like that, I doubt DB storage is a feasible solution. Just my 2 cents.
Update
According to this index, there are 1,758,417 postcodes in the UK. I can tell you I am using a few Mongo clusters (Amazon EC2 High Memory Instances) to provide reliable London only services(indexing only London postcodes), and it's quite a pricy thing, even with basic storage.
Admittedly, the app is performing medium complexity geo-spatial queries, but the storage requirements alone are very expensive and demanding.
Bottom line, just stick to regex and be done with it in two minutes.
I'm looking at the Postcodes in the United Kingdom article on Wikipedia right now.
http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom
The Validation section lists six formats with a combination of letters and numbers. Then there's more information in the notes below that. The first thing that I would try is a BNF type grammar with a tool like GoldParserBuilder. You could describe the basic formats in a more readable format, with efficient parser and lexer automatically generated. In the past, I've successfully used such tools to avoid writing huge, ugly regexes.
From that point, the program has a properly formatted zip code of a known type. At this point, the specific numbers or letters might violate something. Each type of zip code can have a function programmed to look for violations of that specific type. The final product will consist of an automatically generated parser that passes unvalidated, but structured/identified, zip codes to a dedicated validation function. You can then refactor or optimize from there.
(You can also use the grammar itself to enforce or disallow certain literals and combinations. Whatever is more readable or comprehensible for you. Different people gravitate toward different ends of these things.)
Here's a page highlighting the advantages of the GOLD Parsing System. You can use any tool you like: I just promote this one because it's good at its job and has steadily improved over many years.
http://www.goldparser.org/about/why-use-gold.htm
I would think the RegEX, while long-winded would probably be the best solution if all you want to do is validate if something could be a valid UK post code.
If you need absolute data, consider using Ordnance Survey OpenData initiative "Code-Point® Open" dataset, which is a CSV of lots of data points in Great Britain (so not Northern Ireland I'm guessing) one of which is postcode. Be aware that the file is 20MB, so you may have to convert it to a more manageable format.
Regexes are hard to debug, hard to port from one regex flavor to another (silent "errors"), and hard to update.
That is true for most regexes, but why don't you just split it up into multiple parts? You can easily split it into six parts for the six different general rules and maybe even more if you take all of the special cases into account.
Creating a well-commented method of 20 lines with simple regexes is easy to debug (one simple regex per line) and also easy to update. The porting problem is the same, but on the other hand you do not need to use some fancy grammar lib.
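A minimal sketch of that approach, with one simple pattern per general format (A = letter, 9 = digit). The function name matches_uk_format is hypothetical, and the check only covers the overall shape: special cases such as GIR 0AA and the per-position letter restrictions are deliberately left out.

```php
<?php
// Hypothetical helper: true if the input has one of the six general shapes.
function matches_uk_format(string $postcode): bool
{
    // Remove all whitespace and normalise case before matching.
    $code = strtoupper(preg_replace('/\s+/', '', $postcode));

    $formats = [
        '/^[A-Z]{2}[0-9][A-Z][0-9][A-Z]{2}$/', // AA9A 9AA
        '/^[A-Z][0-9][A-Z][0-9][A-Z]{2}$/',    // A9A 9AA
        '/^[A-Z][0-9][0-9][A-Z]{2}$/',         // A9 9AA
        '/^[A-Z][0-9]{2}[0-9][A-Z]{2}$/',      // A99 9AA
        '/^[A-Z]{2}[0-9][0-9][A-Z]{2}$/',      // AA9 9AA
        '/^[A-Z]{2}[0-9]{2}[0-9][A-Z]{2}$/',   // AA99 9AA
    ];

    foreach ($formats as $re) {
        if (preg_match($re, $code) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(matches_uk_format('EC1A 1BB'));
var_dump(matches_uk_format('A2 AAR'));
```

Each line can be tested and updated on its own, and a special case becomes one more entry in the array.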
Are third party services an option?
http://www.postcodeanywhere.co.uk/address-validation/
GeoNames Database:
http://www.geonames.org/postal-codes/
+1 for the "why care" comments. I have had to use the 'official' regex in various projects and while I have never attempted to break it down, it works and it does the job. I've used it with Java and PHP code without any need to convert it between regex formats.
Is there a reason why you would have to debug it or break it down?
Incidentally, the regex rule used to be found on wikipedia, but it appears to have gone.
Edit: As for the space/no-space debate, the postcode should be valid with or without the space. As the last part of the postcode (after the space) is ALWAYS three characters (one digit followed by two letters), it is possible to insert the space manually, which will then allow you to run it through the regex rule.
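Inserting the space before the final three characters could be sketched as follows. The helper name normalize_postcode_spacing is hypothetical, and the function only fixes the spacing; it does not validate anything.

```php
<?php
// Hypothetical helper: strips whitespace, upper-cases, and puts a single
// space before the last three characters.
function normalize_postcode_spacing(string $postcode): string
{
    $code = strtoupper(preg_replace('/\s+/', '', $postcode));
    if (strlen($code) < 5) {
        return $code; // too short to split; leave rejection to the validator
    }
    return substr($code, 0, -3) . ' ' . substr($code, -3);
}

echo normalize_postcode_spacing('sw1a1aa'), PHP_EOL;
echo normalize_postcode_spacing('M1  1AE'), PHP_EOL;
```

The result can then be passed to a regex that expects the space.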
Take the list of valid postcodes and check if the one entered is in it.

User Friendly, Easy to Remember Coupon Codes

I want to create coupon codes that users can remember easily. My idea is something like:
squirrel45
nantucket23
That is, a real word chosen randomly from a long dictionary list (preferably compiled for this purpose) combined with 2 random digits. My questions are:
Where can I find such a dictionary list?
Do you see any problems with the system? (security is not ultra important here, just something reasonable is fine)
Can you suggest any good improvements or alternatives?
Fwiw I am not crazy about the Markov word generators because I think their idiosyncrasies would be too hard to remember. I'd like a client to be able to keep the code in his head, and tell it to the merchant when he arrives to redeem it.
Thanks,
Jonah
Word lists are easy to find. Make sure you sanity filter them for foul words ;)
Here's a huge word list that can be easily scrubbed:
http://www.scrabble-assoc.com/boards/dictionary/10-15-20030401.txt
From there you can easily load in words into your database and create your coupon code like so:
$coupon_code = $rand_word . rand(20,99);
After you do this, simply store your coupon code in the database and whenever you make a new code, check it against existing codes before you apply it. Even slim odds are possible odds.
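The generate-then-check loop described above might look like this. It is a sketch only: the $words and $existing arrays stand in for the word table and the stored codes in the database, generate_coupon_code is a hypothetical name, and the attempt cap is an added assumption so the loop cannot spin forever once the pool is exhausted.

```php
<?php
// Hypothetical helper: picks a random word plus two digits and retries
// until the code is not already in use.
function generate_coupon_code(array $words, array $existing, int $maxAttempts = 1000): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $word = $words[array_rand($words)];
        $code = strtolower($word) . random_int(20, 99);
        if (!in_array($code, $existing, true)) {
            return $code;
        }
    }
    throw new RuntimeException('No unused coupon code found');
}

$words = ['POACH', 'PILOT', 'PIZZA']; // would come from the scrubbed word list
$existing = ['poach72'];              // would come from the database
$code = generate_coupon_code($words, $existing);
echo $code, PHP_EOL;
```

In a real database a unique index on the code column is a sturdier guard against duplicates than the in-memory check.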
More word lists in various formats:
http://scrabble.wonderhowto.com/blog/ultimate-scrabble-word-list-resource-0115617/
5-letter words:
http://homepage.ntlworld.com/adam.bozon/Scrabble5.htm
6-letter words:
http://homepage.ntlworld.com/adam.bozon/Scrabble6.htm
7-letter words:
http://homepage.ntlworld.com/adam.bozon/Scrabble7.htm
8-letter words:
http://homepage.ntlworld.com/adam.bozon/Scrabble8.htm
Sample:
PIKES PIKIS PILAF PILAR PILAU PILAW PILEA PILED PILEI PILES PILIS
PILLS PILOT PILUS PIMAS PIMPS PINAS PINCH PINED PINES PINEY PINGO
PINGS PINKO PINKS PINKY PINNA PINNY PINON PINOT PINTA PINTO PINTS
PINUP PIONS PIOUS PIPAL PIPED PIPER PIPES PIPET PIPIT PIQUE PIRNS
PIROG PISCO PISOS PISTE PITAS PITCH PITHS PITHY PITON PIVOT PIXEL
PIXES PIXIE PIZZA PLACE PLACK PLAGE PLAID PLAIN PLAIT PLANE PLANK
PLANS PLANT PLASH PLASM PLATE PLATS PLATY PLAYA PLAYS PLAZA PLEAD
PLEAS PLEAT PLEBE PLEBS PLENA PLEWS PLICA PLIED PLIER PLIES PLINK
PLODS PLONK PLOPS PLOTS PLOTZ PLOWS PLOYS PLUCK PLUGS PLUMB PLUME
PLUMP PLUMS PLUMY PLUNK PLUSH PLYER POACH POCKS POCKY PODGY PODIA
POEMS POESY POETS POGEY POILU POIND POINT POISE POKED POKER POKES
With that you could generate a coupon code such as POACH72.
Concatenating 2 words will increase the security posture of your system.
e.g. squirrel.nantucket.123
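To put rough numbers on that, here is a small sketch comparing how many distinct codes each scheme can produce. The function names and the 1,000-word list size are assumptions for illustration only; the one-word scheme is the word plus rand(20,99) from above, which gives 80 possible suffixes:

```php
<?php
// Hypothetical helper: builds word.word.NNN from a pre-filtered word list.
function make_two_word_code(array $words): string
{
    $first  = $words[array_rand($words)];
    $second = $words[array_rand($words)];
    return sprintf('%s.%s.%03d', $first, $second, random_int(0, 999));
}

// Number of distinct codes for a list of $n words.
function one_word_space(int $n): int
{
    return $n * 80;          // word + two digits in the range 20-99
}

function two_word_space(int $n): int
{
    return $n * $n * 1000;   // word.word + three digits 000-999
}

echo make_two_word_code(['squirrel', 'nantucket', 'pilot']), PHP_EOL;
echo one_word_space(1000), PHP_EOL; // 80000
echo two_word_space(1000), PHP_EOL; // 1000000000
```

The trade-off is that a longer code is harder to guess but also harder for the customer to keep in his head.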
The Diceware page has a couple of long word lists, American and International. It also has a useful description of how to meet various levels of security.

Using regex to extract variables from a plain-text form letter?

I'm looking for a good example of using Regular Expressions in PHP to "reverse engineer" a form letter (with a known format, of course) that has been pasted into a multiline textbox and sent to a script for processing.
So, for example, let's assume this is the original plain-text input (taken from a USDA press release):
WASHINGTON, April 5, 2010 - North
American Bison Co-Op, a New Rockford,
N.D., establishment is recalling
approximately 25,000 pounds of whole
beef heads containing tongues that may
not have had the tonsils completely
removed, which is not compliant with
regulations that require the removal
of tonsils from cattle of all ages,
the U.S. Department of Agriculture's
Food Safety and Inspection Service
(FSIS) announced today.
For clarity, the fields that are variables are highlighted below:
[pr_city=]WASHINGTON, [pr_date=]April 5, 2010 - [corp_name=]North
American Bison Co-Op, a [corp_city=]New Rockford,
[corp_state=]N.D., establishment is recalling
approximately [amount=]25,000 pounds of [product=]whole
beef heads containing tongues that may
not have had the tonsils completely
removed, which is not compliant with
regulations that require [reason=]the removal
of tonsils from cattle of all ages,
the U.S. Department of Agriculture's
Food Safety and Inspection Service
(FSIS) announced today.
How could I efficiently extract the contents of the
pr_city
pr_date
corp_name
corp_city
corp_state
amount
product
reason
fields from my example?
Any help would be appreciated, thanks.
Well, a regex that works on your example could look like this (line breaks introduced to keep this beast legible, need to be removed prior to use):
/^(?P<pr_city>[^,]+), (?P<pr_date>[^-]+) - (?P<corp_name>.*?), a
(?P<corp_city>[^,]+), (?P<corp_state>[^,]+), establishment is
recalling approximately (?P<amount>.*?) of (?P<product>.*?),
which is not compliant with regulations that require (?P<reason>.*?),
the U\.S\. Department of Agriculture\'s Food Safety and Inspection
Service \(FSIS\) announced today\.$/
So, in PHP you could do
if (preg_match('/^(?P<pr_city>[^,]+), (?P<pr_date>[^-]+) - (?P<corp_name>.*?), a (?P<corp_city>[^,]+), (?P<corp_state>[^,]+), establishment is recalling approximately (?P<amount>.*?) of (?P<product>.*?), which is not compliant with regulations that require (?P<reason>.*?), the U\.S\. Department of Agriculture\'s Food Safety and Inspection Service \(FSIS\) announced today\.$/', $subject, $regs)) {
    $prcity = $regs['pr_city'];
    $prdate = $regs['pr_date'];
    // ... etc.
} else {
    // no match: the input did not follow the expected format
    $result = "";
}
This assumes a couple of things, for instance that there are no line breaks, and that the input is the entire string (and not a larger string from which this part has to be extracted). I've tried to make assumptions about legal values that make some sense, but there is the very real chance that other inputs could break this. So some more test cases are probably needed.
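Since text pasted into a textarea will usually contain line breaks, one way to satisfy the no-line-breaks assumption is to collapse all whitespace before matching. The following is a sketch under that assumption, using the same pattern as above split across concatenated strings; the variable names $raw, $pattern and $fields are mine:

```php
<?php
// Pasted text as it might arrive from the form, hard line breaks included.
$raw = <<<TEXT
WASHINGTON, April 5, 2010 - North
American Bison Co-Op, a New Rockford,
N.D., establishment is recalling
approximately 25,000 pounds of whole
beef heads containing tongues that may
not have had the tonsils completely
removed, which is not compliant with
regulations that require the removal
of tonsils from cattle of all ages,
the U.S. Department of Agriculture's
Food Safety and Inspection Service
(FSIS) announced today.
TEXT;

// Collapse every run of whitespace (including line breaks) to one space.
$subject = trim(preg_replace('/\s+/', ' ', $raw));

$pattern = '/^(?P<pr_city>[^,]+), (?P<pr_date>[^-]+) - (?P<corp_name>.*?), a '
    . '(?P<corp_city>[^,]+), (?P<corp_state>[^,]+), establishment is recalling '
    . 'approximately (?P<amount>.*?) of (?P<product>.*?), which is not compliant '
    . 'with regulations that require (?P<reason>.*?), the U\.S\. Department of '
    . 'Agriculture\'s Food Safety and Inspection Service \(FSIS\) announced today\.$/';

$fields = [];
if (preg_match($pattern, $subject, $regs)) {
    // preg_match returns numeric keys as well; keep only the named groups.
    $fields = array_filter($regs, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r($fields);
```

If the match fails, $fields stays empty, which gives the calling script a simple way to reject input that does not follow the template.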
If the surrounding text is constant, then something like this partial regex could do the trick:
preg_match('/^(.*?), (.*?) - (.*?), a (.*?), (.*?), establishment is recalling approximately (.*?), which is not compliant with regulations that require (.*?), the U\.S\. Department of Agriculture\'s Food Safety and Inspection Service \(FSIS\) announced today\./', $text, $matches);
// $matches[1] is 'WASHINGTON'
// $matches[2] is 'April 5, 2010'
// $matches[3] is ... etc...
If the surrounding text changes, then you're going to end up with a ton of false matches, no matches, etc... Essentially you'd need an AI to parse/understand press releases.
Edit: Please disregard this crazy answer, as the other two are better. I should probably delete it, but I'm keeping it up for reference.
I have a crazy idea that just might work: build an XML string from the input by adding markups, then parse it. It might look something like this (completely untested) code:
$xml = preg_replace('/([^,]*), ([^-]*)- ...etc.../', '<pr_city>\1</pr_city><pr_date>\2</pr_date> ...etc...', $text);
Parsing the XML afterwards is a needlessly complicated process that is best left to the PHP documentation: http://www.php.net/manual/en/function.xml-parse.php .
You could also consider converting it to JSON with this method, then using json_decode() to parse it. In any case, you have to think about what happens when " marks and > symbols appear in the input.
It might be easier to just match and remove one piece of the text at a time.
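A rough sketch of that piece-at-a-time idea, covering only the first five fields of the example. The helper name take() and the short patterns are assumptions made for illustration, and the input is assumed to have had its line breaks removed already:

```php
<?php
// Hypothetical helper: match $pattern at the start of $text, return capture
// group 1, and remove the whole match from the front of $text.
function take(string &$text, string $pattern): ?string
{
    if (!preg_match($pattern, $text, $m)) {
        return null; // $text is left untouched so the caller can see where parsing stopped
    }
    $text = substr($text, strlen($m[0]));
    return $m[1];
}

$text = 'WASHINGTON, April 5, 2010 - North American Bison Co-Op, '
      . 'a New Rockford, N.D., establishment is recalling';

$pr_city    = take($text, '/^([^,]+), /');
$pr_date    = take($text, '/^(.+?) - /');
$corp_name  = take($text, '/^(.+?), a /');
$corp_city  = take($text, '/^([^,]+), /');
$corp_state = take($text, '/^([^,]+), /');

echo $pr_city, ' | ', $pr_date, ' | ', $corp_name, ' | ', $corp_city, ' | ', $corp_state, PHP_EOL;
echo 'Left over: ', $text, PHP_EOL;
```

Each pattern stays small enough to debug on its own, and a null return tells you exactly which field the input failed on.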
