I have a text file with something like
Country1
city1
city2

Country2
city3
city4
I want to separate the countries from the cities. Is there a quick way of doing it? I am thinking of some file handling and then extracting to different files. Is that the best way, or can it be done quickly with a regex or similar?
countries = []
cities = []
with open("countries.txt") as f:
    gap = True
    for line in f:
        line = line.strip()
        if gap:
            countries.append(line)
            gap = False
        elif line == "":
            gap = True
        else:
            cities.append(line)
print(countries)
print(cities)
output:
['Country1', 'Country2']
['city1', 'city2', 'city3', 'city4']
If you want to write these to files:
with open("countries.txt", "w") as country_file, open("cities.txt", "w") as city_file:
    country_file.write("\n".join(countries))
    city_file.write("\n".join(cities))
status = True
country = []
city = []
with open('b.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            if status:
                country.append(line)
                status = False
            else:
                city.append(line)
        else:
            status = True
print(country)
print(city)
output :
>>['Country1', 'Country2']
>>['city1', 'city2', 'city3', 'city4']
$countries = array();
$cities = array();
$gap = true; // the first line of the file is a country
// FILE_IGNORE_NEW_LINES strips the line endings, so blank lines compare equal to ''
$file = file('path/to/file', FILE_IGNORE_NEW_LINES);
foreach($file as $line)
{
    $line = trim($line);
    if($line == '') $gap = true;
    elseif ($gap)
    {
        $countries[] = $line;
        $gap = false;
    }
    else $cities[] = $line;
}
Depending on how regular your file is, it may be this simple in Python:
with open('inputfile.txt') as fh:
    # To iterate over the entire file.
    for country in fh:
        cityLines = [next(fh) for _i in range(2)]
        # read a blank line to advance countries
        # (the default avoids StopIteration when the file has no trailing blank line).
        next(fh, None)
That's not likely to be exactly right, because I imagine many countries have variable numbers of cities. You could modify it like so to address that:
with open('inputfile.txt') as fh:
    # To iterate over the entire file.
    for country in fh:
        # we assume here that each country has at least 1 city.
        cities = [next(fh).strip()]
        # continue until we encounter a blank line or the end of the file.
        while cities[-1]:
            cities.append(next(fh, '').strip())
        cities.pop()  # drop the empty string that ended the loop
That doesn't do anything to put the data into an output file, or store it much past the file handle itself, but it's a start. You really should choose a language for your questions though. A lot of the time until
Another PHP example that doesn't read the entire file into an array.
<?php
$fh = fopen('countries.txt', 'r');
$countries = array();
$cities = array();
while ( ($data = fgets($fh)) !== false )
{
    // If $country is not set, then this line is a country.
    if ( ! isset($country) )
    {
        $country = trim($data);
        $countries[] = $country;
    }
    // If an empty line is found, unset $country.
    elseif ( ! trim($data) )
        unset($country);
    // City
    else
        $cities[$country][] = trim($data);
}
fclose($fh);
The $countries array will contain a list of countries while the $cities array will contain a list of cities by countries.
Is there some pattern that distinguishes countries from cities? Or is it that the first line after a blank line is a country and all subsequent lines are city names until the next blank line? Alternatively are you finding countries based on a look-up table (a "dictionary" in Python; an associative array in PHP; a hash in Perl --- one that includes all the officially recognized countries)?
Is it safe to assume that there are no cities whose names collide with any country? Is there a France, Iowa, USA, or the old Usa, Japan?
What do you want to do with these after you separate them? You mention "some file handling and then extracting to different files" --- are you thinking of something like one file per country containing a list of all the cities therein? Or one directory per country and one file per city?
The obvious approach would be to iterate over the file, line by line, and maintain a little state machine. You start in the "empty" state (beginning of file, or a blank line between countries), from which you enter the "country" state whenever a line matches whatever criterion means you've encountered the name of a country. Once you've found a country name, you're in the city-loading state. I would create a dictionary using country names as keys and sets of cities as values (though perhaps you really need (county/province, city name) tuples in cases where a country has multiple cities by the same name: Portland, Maine vs. Portland, Oregon, for example). You can also have an "error" state if the contents of your file lead to some sort of ambiguity (city names before you've determined a country, two country names in a row, whatever).
It's hard to suggest a good fragment of code given how vague your spec is.
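For what it's worth, under the simplest reading of the question (the first non-blank line after a blank line is a country, everything else up to the next blank line is a city), the state machine might look roughly like this in Python. The function name `group_cities_by_country` is made up for illustration, and the error states mentioned above are left out.

```python
def group_cities_by_country(lines):
    """Map each country to its list of cities; blank lines separate the blocks."""
    result = {}
    country = None  # None means "waiting for the next country name"
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: the next non-blank line starts a new country.
            country = None
        elif country is None:
            country = line
            result.setdefault(country, [])
        else:
            result[country].append(line)
    return result


sample = ["Country1\n", "city1\n", "city2\n", "\n", "Country2\n", "city3\n", "city4\n"]
grouped = group_cities_by_country(sample)
print(grouped)
```

Any iterable of lines works here, so an open file handle can be passed in place of the sample list.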
Not sure that this would help, but you can try the following code to build a dictionary and then work with it (write to files, compare, etc.):
res = {}
with open('c:\\tst.txt') as f:
    lines = f.readlines()
    for i, line in enumerate(lines):
        line = line.strip()
        if (i == 0 and line):
            key = line
            res[key] = []
        elif not line and i+1 < len(lines):
            key = lines[i+1].strip()
            res[key] = []
        elif line and line != key:
            res[key].append(line)
print(res)
This regex would work for your example:
/(?:^|\r\r)(.+?)\r(.+?)(?=\r\r|$)/s
Catches countries in group 1 and cities in group 2.
You may have to adjust your newline characters, depending on your system. They can be \n, \r or \r\n. edit: added a $ sign, so you don't need two linebreaks at the end. You will need the flag for dotall for the regex to work as expected.
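As a rough illustration of the same idea in Python: the sketch below first normalises the line endings to \n, so a single pattern covers all three newline conventions, and then applies an adapted version of the regex with the dotall flag. The sample text is an assumption (blocks separated by one blank line, each country followed by at least one city).

```python
import re

text = "Country1\r\ncity1\r\ncity2\r\n\r\nCountry2\r\ncity3\r\ncity4\r\n"

# Turn \r\n and bare \r into \n so the pattern only has to deal with \n.
normalised = re.sub(r"\r\n?", "\n", text)

# Group 1: first line of a block (the country).
# Group 2: the remaining lines of the block (the cities).
pattern = re.compile(r"(?:^|\n\n)([^\n]+)\n(.+?)(?=\n\n|$)", re.DOTALL)

result = [
    (country, cities.split("\n"))
    for country, cities in pattern.findall(normalised)
]
print(result)
```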
Print field 1 (the countries) and the remaining fields (the cities) with awk:
awk 'BEGIN {RS="";FS="\n"} {print $1 > "countries"} {for (i=2;i<=NF;i++) print $i > "cities"}' source.txt
Related
I want to separate the area code from a phone number string by using an area code MySQL database.
For example, the string is 0349152023.
The end result should be 03491 52023.
To get the end result, I want to split the string and search for every digit in the database.
For example 0 and then 3 and then 4, and then take the last found result.
The code I have at the moment only prepares the phone number string for further actions:
$phone1 = preg_replace('/[oO]/', '0', $phone_string);
$phone2 = preg_replace("/[^0-9]/", "", $phone1);
Then I use str_split to cut the string into pieces:
$searchArray = str_split($phone2);
Thanks for your help.
You may build an array containing all the area codes.
Then you may write something like this:
foreach ($area_codes as $code) {
    if (substr($phone, 0, strlen($code)) === $code) {
        $phone_string = substr($phone, 0, strlen($code))." ".substr($phone, strlen($code));
    }
}
You can obviously add a check to verify whether the area code was found or not.
Step 1: select all area codes from the DB and put them into an array $areaCodes.
Step 2: iterate over $areaCodes as $code and check whether the phone number starts with $code. If it does, create a string that has a space between the code and the rest of the number.
$phonenumber = '0349152023';
$preparedPhonenumber = '';
foreach($areaCodes as $code){
    if(strpos($phonenumber, $code) === 0){
        // phonenumber starts with areacode
        $phoneWithoutCode = substr($phonenumber, strlen($code));
        $preparedPhonenumber = $code.' '.$phoneWithoutCode;
        break;
    }
}
// if one of the areaCodes was 0349,
// the variable $preparedPhonenumber is now '0349 152023'
Edit: you can reduce the number of area codes returned from the DB by selecting only those that start with a certain string.
Let's assume the shortest area code in Germany is 3 digits long (which I think is correct).
$threeDigits = substr($phonenumber,0,3);
$query = "SELECT * from areacodes
    WHERE code like '".$threeDigits."%'
    ORDER BY CHAR_LENGTH(code) DESC";
This will drastically shrink the array of candidate area codes, making the script faster.
Edit 2: added an ORDER BY clause to the query so the above code checks longer area codes first. (The break; in the foreach loop is now obligatory!)
Hi Leonardo Gugliotti and Cashbee
I sorted the area codes to get a better match. The PHP script works fine, but it takes too long to handle 5000 MySQL entries. Is it possible to do the foreach search directly in MySQL?
<?php
$sample_area_codes = array( '0350', '034', '034915', '03491', '0348', '0349', '03491', '034916', '034917',);
sort($sample_area_codes);
$phone_string = '0349152023';
foreach ($sample_area_codes as $code) {
    $subString = substr($phone_string, 0, strlen($code));
    if ($subString == $code) {
        $phone = $subString." ".substr($phone_string, strlen($code));
    }
}
if (!empty($phone)) {
    echo $phone;
}
else {
    echo "No AreaCode found.";
}
?>
Output: 034915 2023, which is correct
A single probe (assuming INDEX(area_code)):
SELECT ...
FROM AreaCodes
WHERE area_code < ?
ORDER BY area_code DESC
LIMIT 1;
(Where you bind the $phone_number as a string into the ?)
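Another way to avoid scanning all 5000 codes, sketched here in Python rather than SQL: load the codes into a set once, then test the prefixes of the phone number from longest to shortest. That is at most a handful of lookups per number. The function name `split_area_code` is hypothetical, and the codes are the sample values from the script above.

```python
def split_area_code(phone, area_codes):
    """Return (area_code, rest) for the longest matching prefix, or None."""
    if not area_codes:
        return None
    longest = max(len(code) for code in area_codes)
    # Try the longest possible prefix first so the most specific code wins.
    for length in range(min(longest, len(phone)), 0, -1):
        prefix = phone[:length]
        if prefix in area_codes:
            return prefix, phone[length:]
    return None


codes = {"0350", "034", "034915", "03491", "0348", "0349", "034916", "034917"}
print(split_area_code("0349152023", codes))  # ('034915', '2023')
```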
I think you'd better split your database into a tree, making a table for each digit.
So the third digit could refer to the second, the fourth to the third, and so on until you reach the maximum length of the prefix. The last table should include the name of the area.
Following your example, supposing that the maximum length of the area code is five digits, the fifth_digit_table should have at least four fields like these:
ID
IDref
Number
Name
Up to 10 records may share the same IDref, corresponding to the digit "2" in the fourth position, which is linked to the preceding "021" through the fourth_digit_table, the third_digit_table and so on. Only one of these records, the one whose Number field is "9", should have the Name "Haan"; the others, if any, should have the Name "Solingen".
I hope you will manage to speed up your script.
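To make the tree idea a bit more concrete, here is a small in-memory sketch in Python that uses nested dictionaries instead of one table per digit. The two prefixes and names are the ones from the example above; `build_trie` and `lookup` are hypothetical helper names, and a real setup would load the codes from the database.

```python
def build_trie(codes):
    """Build a digit tree from a {area_code: area_name} mapping."""
    root = {}
    for code, name in codes.items():
        node = root
        for digit in code:
            node = node.setdefault(digit, {})
        node["name"] = name  # marks the end of a complete area code
    return root


def lookup(trie, phone):
    """Walk the tree digit by digit and keep the last (longest) code seen."""
    node = trie
    best = None
    for i, digit in enumerate(phone):
        node = node.get(digit)
        if node is None:
            break
        if "name" in node:
            best = (phone[: i + 1], node["name"])
    return best


trie = build_trie({"0212": "Solingen", "02129": "Haan"})
print(lookup(trie, "02129123456"))  # ('02129', 'Haan')
print(lookup(trie, "0212555"))      # ('0212', 'Solingen')
```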
As you can see below, I'm attempting to extract the complete substring of an exploded array by using just a few characters to match the substring.
$keyword = array('Four Wheel', 'Power', 'Trailer');
function customSearch($keyword, $featurelistarray){
    $key = ''; //possibly reset output
    foreach($featurelistarray as $key => $arrayItem){
        if( stristr( $arrayItem, $keyword ) ){
            $termname = $key;
        }
    }
}
The array ($featurelistarray) comprises vehicle options, four wheel drive, four wheel disc brakes, power windows, power door locks, floor mats, trailer tow package, and many many more.
The point is to list all the options for a given category, and using the $keyword array to define the category.
I would also like to alphabetize the results. Thank you for the help!
To further explain, the $featurelistarray is exploded from a CSV field. The CSV field has a long length of options listed.
$featurelist=$csvdata['Options'];
$featurelistarray=explode(',',$featurelist);
$termname = $featurelistarray[0];
As you can see, $termname is assigned the first position of the exploded array. This was the original code for these features, but I need more control for $termname.
It seems to me you are trying to do database operations without a database. I'd suggest loading the input into some kind of database.
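If a database is more than the task needs, a plain in-memory filter may be enough. The sketch below is in Python, with a made-up option string standing in for the CSV field: it keeps every option that contains one of the keywords (case-insensitively) and returns the matches in alphabetical order. `options_for_category` is a hypothetical name.

```python
def options_for_category(keywords, options):
    """Return the options containing any keyword, sorted alphabetically."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = [
        option.strip()
        for option in options
        if any(keyword in option.lower() for keyword in lowered)
    ]
    return sorted(matches, key=str.lower)


# Stand-in for the exploded 'Options' field of one CSV row.
feature_list = ("Four Wheel Drive,Power Windows,Floor Mats,"
                "Trailer Tow Package,Four Wheel Disc Brakes,Power Door Locks")
options = feature_list.split(",")

print(options_for_category(["Four Wheel"], options))
print(options_for_category(["Power"], options))
```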
This is driving me nuts.
I'm attempting to read a CSV file (done) and then work through the permutations of each row.
Each row contains several bits of data (name, price etc.).
Some of them contain slash separated lists (a/b/c/c3/c65).
What I need to do is generate all the possible variations of each row.
Example:
Row 12 =
Name = name,
Price = price,
Models = x12/x14/x56,
Codes = LP1/LP12/LP899/XP90/XP92,
From that I should be able to generate 15 variations, each with the same Name and Price, but with different Codes and varied Models;
Name Price X12 LP1
Name Price X12 LP12
Name Price X12 LP899
~
Name Price X56 XP90
Name Price X56 XP92
Yet I'm either overwriting pre-existing versions, or generating individual versions, but only getting 1 set of values changing (so I may get the 15 versions, but only Model changes, everything else stays the same).
Any help/thoughts or pointers would be appreciated!
So you have one row containing that many items, say
$row = array('name'=>'name', 'price'=>'price', 'models'=>'x12/x14/x56', 'codes'=>'LP1/LP12/LP899/XP90/XP92');
and you want to split models and codes on "/" and then emit one new row per model/code combination, with every row carrying the same name and price. Here is how you can do this:
$line = 0;
$result_array = array();
// split the models and the codes using explode
$tmpModels = explode("/", $row['models']);
$tmpCodes = explode("/", $row['codes']);
// pair every model with every code
foreach ($tmpModels as $mod) {
    foreach ($tmpCodes as $cod) {
        $result_array[$line]['name'] = $row['name'];
        $result_array[$line]['price'] = $row['price'];
        $result_array[$line]['model'] = $mod;
        $result_array[$line]['code'] = $cod;
        $line++;
    }
}
$result_array will have what you want.
This code is not tested, so there may be some errors, but I hope it gives you an idea of how to achieve that.
Let's say you have an array that looks like this:
<?php
$variant=array();
$list[0]=array('Name'=>'Item name', 'Price'=>'$400','Models'=>'x12/x14/x56','Codes'=>'LP1/LP12/LP899/XP90/XP92');
$list[1]=array('Name'=>'Item name', 'Price'=>'$400','Models'=>'x12/x14/x56','Codes'=>'LP1/LP12/LP899/XP90/XP92'); // and more rows...
for($i=0;$i<count($list);$i++){
    $Names=$list[$i]["Name"];
    $Prices=$list[$i]["Price"];
    $Models=explode("/",$list[$i]["Models"]);
    $Codes=explode("/",$list[$i]["Codes"]);
    for($i2=0;$i2<count($Codes);$i2++){
        $variant[]=array("name"=>$Names,"price"=>$Prices,"model"=>$Models[0],"code"=>$Codes[$i2]);
        $variant[]=array("name"=>$Names,"price"=>$Prices,"model"=>$Models[1],"code"=>$Codes[$i2]);
        $variant[]=array("name"=>$Names,"price"=>$Prices,"model"=>$Models[2],"code"=>$Codes[$i2]);
        // You can add more models by copying a line and replacing $Models[2] with the next available $Models index
    }
}
var_dump($variant);
?>
This produces 30 arrays, because we have 2 rows of 15 variations each, so that is the expected count.
Reason for looping over the codes
There are more codes than models, so looping over the codes and pairing each one with every model catches all the values.
Good luck. By the way, I have tested it and it worked.
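For comparison, what the question describes is a Cartesian product of the two slash-separated lists, so the model list does not have to be hard-coded at all. A minimal sketch in Python using itertools.product, with the row hard-coded from the question's example (`expand_row` is a hypothetical name):

```python
from itertools import product


def expand_row(row):
    """Return one dict per model/code combination; name and price repeat."""
    models = row["Models"].split("/")
    codes = row["Codes"].split("/")
    return [
        {"Name": row["Name"], "Price": row["Price"], "Model": model, "Code": code}
        for model, code in product(models, codes)
    ]


row = {
    "Name": "name",
    "Price": "price",
    "Models": "x12/x14/x56",
    "Codes": "LP1/LP12/LP899/XP90/XP92",
}
variants = expand_row(row)
print(len(variants))  # 15
```

This works for any number of models and codes, since nothing depends on the lengths of the two lists.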
I have the following street names and house numbers in a text file:
Albert Dr: 4116-4230, 4510, 4513-4516
Bergundy Pl: 1300, 1340-1450
David Ln: 3400, 4918, 4928, 4825
Garfield Av: 5000, 5002, 5004, 5006, 8619-8627, 9104-9113
....
This data represents the boundary data for a local neighborhood (i.e., what houses are inside the community).
I want to make a PHP script that will take a user's input (in the form of something like "4918 David Lane" or "3000 Bergundy") search this list, and return a yes/no response whether that house exists within the boundaries.
What would be an efficient way to parse the input (regex?) and compare it to the text list?
Thanks for the help!
It's better to store this info in a database so that you don't have to parse out the data from a text file. Regexes are also not generally applicable to find a number in a range so a general purpose language is advised as well.
But... if you want to do it with regexes (and see why it's not a good idea)
To lookup the numbers for a street use
David Ln:(.*)
To then get the numbers use
[^,]*
You could simply read the file into a string. Once that is done, break each line of the file into an array, so Array(Line 1 => array(), Line 2 => array(), etc.). Then you can explode each line using ":". After that, you simply need to search in the array. Not the fastest way, but it may be faster than regex.
You should seriously consider using a database or rethinking how your file is structured.
Try something like this: put your street names inside test.txt. Once you are able to get the details from the text file, just compare them with the values that you submit in your form.
$filename = 'test.txt';
if(file_exists($filename)) {
    if($handle = fopen($filename, 'r')) {
        $name = array();
        while(($file = fgets($handle)) !== FALSE) {
            // skip lines that do not look like "street: numbers"
            if(!preg_match('#(.*):(.*)#', $file, $match)) continue;
            $array = explode(',', $match[2]);
            foreach($array as $val) {
                // trim the space after each comma and the trailing newline
                $name[$match[1]][] = trim($val);
            }
        }
        fclose($handle);
    }
}
As mentioned, using a database to store street numbers related to your street names would be ideal. I think one way you could implement this with your text file, though, is to create a 2D array, storing the street names in the first array and the valid street numbers in their respective arrays.
Parse the file line by line in a loop. Parse the street name and store in array, then use a nested loop to parse all of the numbers (for ones in a range like 1414-1420, you can use an additional loop to get each number in the range) and build the next array in the initial street name array element. When you have your 2D array, you can do a simple nested loop to check it for a match.
I will try to make a little pseudo-code for you..
pseudocode:
$addresses = array();
$counter = 0;
$line = file->readline
while(!file->eof)
{
    $addresses[$counter] = parse_street_name($line);
    $numbers_array = parse_street_numbers($line);
    foreach($numbers_array as $num)
        $addresses[$counter][] = $num;
    $line = file->readline
    $counter++;
}
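A runnable take on the pseudocode above, sketched in Python. Instead of expanding every range into individual numbers, it stores each entry as a (low, high) pair, which keeps the lookup a simple comparison. The function names are hypothetical, and the lookup matches the street name exactly as written in the file (ignoring case); mapping user input such as "David Lane" onto "David Ln" is a separate normalisation step that is not shown.

```python
def parse_boundaries(lines):
    """Map each lower-cased street name to a list of (low, high) ranges."""
    streets = {}
    for line in lines:
        name, sep, numbers = line.partition(":")
        if not sep:
            continue  # not a "street: numbers" line
        ranges = []
        for part in numbers.split(","):
            part = part.strip()
            if not part:
                continue
            low, dash, high = part.partition("-")
            # A single number is stored as a range of length one.
            ranges.append((int(low), int(high) if dash else int(low)))
        streets[name.strip().lower()] = ranges
    return streets


def is_inside(streets, street, number):
    """True if the house number falls in any range listed for the street."""
    return any(low <= number <= high
               for low, high in streets.get(street.lower(), []))


data = [
    "Albert Dr: 4116-4230, 4510, 4513-4516",
    "Bergundy Pl: 1300, 1340-1450",
    "David Ln: 3400, 4918, 4928, 4825",
    "Garfield Av: 5000, 5002, 5004, 5006, 8619-8627, 9104-9113",
]
streets = parse_boundaries(data)
print(is_inside(streets, "David Ln", 4918))     # True
print(is_inside(streets, "Bergundy Pl", 3000))  # False
```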
It's better if you store your streets in a separate table with IDs, and store numbers in separate table one row for each range or number and street id.
For example:
streets:
ID, street
-----------
1, Albert Dr
2, Bergundy Pl
3, David Ln
4, Garfield Av
...
houses:
street_id, house_min, house_max
-----------------
1, 4116, 4230
1, 4510, 4510
1, 4513, 4516
2, 1300, 1300
2, 1340, 1450
...
In rows that hold a single house number rather than a range, set both min and max to the same value.
You can write a script that will parse your txt file and save all the data to the DB. That should be as easy as several loops and explode() with different parameters, plus some INSERT queries.
Then, with the first query, you get the street id:
SELECT id FROM streets WHERE street LIKE '%[street name]%'
After that you run a second query to find out whether that house number exists on that street:
SELECT COUNT(*)
FROM houses
WHERE street_id = [street_id]
AND [house_num] BETWEEN house_min AND house_max
Inside [...] you put the real values; don't forget to escape them to prevent SQL injection.
Or you can even run just one query using JOIN.
Also, you should make sure that the given house number is an integer, not a float.
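Here is a rough sketch of the single-query JOIN variant, written in Python with the built-in sqlite3 module as a stand-in for MySQL (the SQL itself is nearly identical). The values are passed as bound parameters instead of being pasted into the query string, which takes care of the escaping, and int() enforces the integer house number. The function name `house_exists` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streets (id INTEGER PRIMARY KEY, street TEXT)")
conn.execute(
    "CREATE TABLE houses (street_id INTEGER, house_min INTEGER, house_max INTEGER)"
)
conn.executemany(
    "INSERT INTO streets VALUES (?, ?)",
    [(1, "Albert Dr"), (2, "Bergundy Pl")],
)
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?)",
    [(1, 4116, 4230), (1, 4510, 4510), (1, 4513, 4516),
     (2, 1300, 1300), (2, 1340, 1450)],
)


def house_exists(conn, street, number):
    """True if the house number lies in a stored range for a matching street."""
    row = conn.execute(
        "SELECT COUNT(*) FROM houses h "
        "JOIN streets s ON s.id = h.street_id "
        "WHERE s.street LIKE ? AND ? BETWEEN h.house_min AND h.house_max",
        ("%" + street + "%", int(number)),
    ).fetchone()
    return row[0] > 0


print(house_exists(conn, "Albert", 4200))    # True
print(house_exists(conn, "Bergundy", 3000))  # False
```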
I have an input file (excerpt from file shown below) with multiple lines that I need to select specific text from and put each selection into an array element:
excerpt from input file:
"BLOCK","PARTNO"
"ELEMENT","HEADER-"
"NAME","1AB000072186"
"REVISION","0000"
"PARTSHAPE","RECT_074_044_030"
"PACKAGE","120830E"
"PMABAR",""
"PARTCOMMENT","CAP-TANT*150uF*20%*10V7343*4.3mm"
"ELEMENT","PRTIDDT-"
"PMAPP",1
"PMADC",2
"ComponentQty",2
"BLOCK","PARTNO"
"ELEMENT","HEADER-"
"NAME","1AB030430005"
"REVISION","0000"
"PARTSHAPE","RECT_072_042_030"
"PACKAGE","120830E"
"PMABAR",""
"PARTCOMMENT","1.0000 Amp SUBMINIATURE FUSE"
"ELEMENT","PRTIDDT-"
"PMAPP",2
"PMADC",0
"ComponentQty",1
"BLOCK","PARTNO"
"ELEMENT","HEADER-"
"NAME","1AB030430001"
"REVISION","0000"
"PARTSHAPE","RECT_072_042_030"
"PACKAGE","120830E"
"PMABAR",""
"PARTCOMMENT","2.0000 Amp SUBMINIATURE FUSE"
"ELEMENT","PRTIDDT-"
"PMAPP",2
"PMADC",0
"ComponentQty",1
Notice that after each occurrence of the line with the phrase "ComponentQty" the content begins repeating...
I need the part number next to each occurrence of "NAME" in one dimension of the array element, and the content next to the occurrence of "PARTSHAPE" in the second dimension, for each element. I am very confused about how to do this, though. Please help!
$fh = fopen('yourfile.txt', 'rb');
$found_stuff = array();
$last_component = null;
while($line = fgets($fh)) { // read a line
    $parts = explode(',', $line); // split into components
    switch($parts[0]) { // based on which key we're on
        case '"NAME"':
            $last_component = $parts[1]; // save the key's value
            break;
        case '"PARTSHAPE"':
            $found_stuff[$last_component] = $parts[1]; // store the partshape name
            break;
    }
}
fclose($fh);
This should do the basic work. Read a line, explode it into pieces where commas occur. The first part will be the "key", the second part will be the value. Then simply keep reading until we either hit a NAME or a PARTSHAPE key, then store the values as appropriate.
Note that I've not stripped the double-quotes off the values. That's left as an exercise to the reader. This code also assumes that the file's format is regular, that a "NAME" will show up before any PARTSHAPE lines, and that there'll be a perfect 1:1 alternation between NAME/PARTSHAPE lines. If you get two PARTSHAPEs in a row, you'll lose the first one. And if a PARTSHAPE shows up before the first NAME is encountered, you'll sorta lose that one too.
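For the quote stripping, a CSV parser does the work for free. Below is a sketch of the same NAME/PARTSHAPE pairing in Python with the standard csv module, run against a shortened copy of the sample from the question. `collect_part_shapes` is a hypothetical name, and a PARTSHAPE that appears before any NAME is simply skipped.

```python
import csv
import io

sample = '''"BLOCK","PARTNO"
"ELEMENT","HEADER-"
"NAME","1AB000072186"
"REVISION","0000"
"PARTSHAPE","RECT_074_044_030"
"PMAPP",1
"ComponentQty",2
"BLOCK","PARTNO"
"NAME","1AB030430005"
"PARTSHAPE","RECT_072_042_030"
"ComponentQty",1
'''


def collect_part_shapes(stream):
    """Return a list of (part number, part shape) pairs, quotes removed."""
    parts = []
    name = None
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0], row[1]
        if key == "NAME":
            name = value
        elif key == "PARTSHAPE" and name is not None:
            parts.append((name, value))
            name = None  # wait for the next NAME before pairing again
    return parts


pairs = collect_part_shapes(io.StringIO(sample))
print(pairs)
```

An open file handle can be passed in place of the io.StringIO wrapper.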
The following steps worked for me:
The section pasted in my OP (repeating many times more) is defined as $PartNoContents
and $BlockData[] is the array that I need to paste selections from $PartNoContents into.
$PartNoContents = str_replace('"', '', $PartNoContents);
$PartLines = explode("\n", $PartNoContents);
$PartData = array();
foreach ($PartLines as $PartLine){
    $PartData[] = explode(',', $PartLine);
}
for($p=0;$p<count($PartLines);$p++){
    if ( isset( $PartData[$p][1] ) && !empty( $PartData[$p][1] ) ){
        $p1 = str_replace(chr(13), '', $PartData[$p][1]);
        if ( isset($BlockData[$b][0]) && !empty($BlockData[$b][0]) && $BlockData[$b][7]==$p1 ){
            $BlockData[$b][13] = str_replace(chr(13), '', $PartData[$p+$PartDataIncNum][1]);
            $p = count($PartLines) ;
        }
    }
}