So, I have two files: the first is a plain-text file, and the second is an encrypted version of the first:
textfile:
cryptool (starting example for the cryptool version family 1.x)
cryptool is a comprehensive free educational program about
cryptography and cryptanalysis offering extensive online help and many
visualizations.
this is a text file, created in order to help you to make your first
steps with cryptool.
1) as a first step it is recommended you read the included online
help, this will provide a useful oversight of all available functions
within this application. the starting page of the online help can be
accessed via the menu "help -> starting page" at the top right of the
screen or using the search keyword "starting page" within the index of
the online help. press f1 to start the online help everywhere in
cryptool.
2) a possible next step would be to encrypt a file with the caesar
algorithm. this can be done via the menu "crypt/decrypt -> symmetric
(classic)".
3) there are several examples (tutorials) provided within the online
help which provide an easy way to gain an understanding of cryptology.
these examples can be found via the menu "help -> scenarios
(tutorials)".
4) you can also develop your knowledge by:
- navigating through the menus. you can press f1 at any selected menu item to get further information.
- reading the included readme file (see the menu "help -> readme").
- viewing the included colorful presentation (this presentation can be found on several ways: e.g. in the "help" menu of this application, or
via the "documentation" section found at the "starting" page of the
online help).
- viewing the webpage www.cryptool.org.
july 2010 the cryptool team
encrypted file:
ncjaezzw (delcetyr pilxawp qzc esp ncjaezzw gpcdtzy qlxtwj 1.i)
ncjaezzw td l nzxacpspydtgp qcpp pofnletzylw aczrclx lmzfe
ncjaezrclasj lyo ncjaelylwjdtd zqqpctyr piepydtgp zywtyp spwa lyo xlyj
gtdflwtkletzyd.
estd td l epie qtwp, ncplepo ty zcopc ez spwa jzf ez xlvp jzfc qtcde
depad htes ncjaezzw.
1) ld l qtcde depa te td cpnzxxpyopo jzf cplo esp tynwfopo zywtyp
spwa, estd htww aczgtop l fdpqfw zgpcdtrse zq lww lgltwlmwp qfynetzyd
htesty estd laawtnletzy. esp delcetyr alrp zq esp zywtyp spwa nly mp
lnnpddpo gtl esp xpyf "spwa -> delcetyr alrp" le esp eza ctrse zq esp
dncppy zc fdtyr esp dplcns vpjhzco "delcetyr alrp" htesty esp tyopi zq
esp zywtyp spwa. acpdd q1 ez delce esp zywtyp spwa pgpcjhspcp ty
ncjaezzw.
2) l azddtmwp ypie depa hzfwo mp ez pyncjae l qtwp htes esp nlpdlc
lwrzctesx. estd nly mp ozyp gtl esp xpyf "ncjae/opncjae -> djxxpectn
(nwlddtn)".
3) espcp lcp dpgpclw pilxawpd (efezctlwd) aczgtopo htesty esp zywtyp
spwa hstns aczgtop ly pldj hlj ez rlty ly fyopcdelyotyr zq ncjaezwzrj.
espdp pilxawpd nly mp qzfyo gtl esp xpyf "spwa -> dnpylctzd
(efezctlwd)".
4) jzf nly lwdz opgpwza jzfc vyzhwporp mj:
- ylgtrletyr esczfrs esp xpyfd. jzf nly acpdd q1 le lyj dpwpnepo xpyf tepx ez rpe qfcespc tyqzcxletzy.
- cplotyr esp tynwfopo cploxp qtwp (dpp esp xpyf "spwa -> cploxp").
- gtphtyr esp tynwfopo nzwzcqfw acpdpyeletzy (estd acpdpyeletzy nly mp qzfyo zy dpgpclw hljd: p.r. ty esp "spwa" xpyf zq estd laawtnletzy, zc
gtl esp "oznfxpyeletzy" dpnetzy qzfyo le esp "delcetyr" alrp zq esp
zywtyp spwa).
- gtphtyr esp hpmalrp hhh.ncjaezzw.zcr.
ufwj 2010 esp ncjaezzw eplx
I'm counting letter occurrences in both files and building a dictionary, so I can go back into the encrypted file and change most of the letters to the right ones. Some won't be changed, but I'll fix those manually later.
The problem, I think, is that some letters have the same number of occurrences, so the same letter gets changed more than once.
Here's my code so far. The problem is surely in the foreach loops, but I can't manage to fix it. Maybe I could use flags, but I have no idea how to do that in a foreach loop.
// Get the contents of both text files
$reference = file_get_contents('reference_file.txt', true);
$encrypted = file_get_contents('encrypted_file.txt', true);
// Use a regex to strip every character that is not a lowercase letter
$azreference = preg_replace("/[^a-z]+/", "", $reference);
$azencrypted = preg_replace("/[^a-z]+/", "", $encrypted);
// Count letter occurrences, giving an array of "char => occurrences"
$refarray1 = array_count_values(str_split($azreference, 1));
$refarray2 = array_count_values(str_split($azencrypted, 1));
foreach ($refarray1 as $key => $val) {
    foreach ($refarray2 as $key2 => $val2) {
        if ($val == $val2) {
            $encrypted = str_replace($key2, $key, $encrypted); // (replaces $key2 with $key)
        }
    }
}
print_r($encrypted);
The output string is below, which is kind of wrong xD:
jjdebdda (wbdjbbdj ebdbeae zdj bwe jjdebdda jejwbdd zdbbad 1.b)
jjdebdda bw d jdbejewedwbje zjee edzjdbbddda ejdjjdb dbdzb
jjdebdjjdewd ddd jjdebdddadwbw dzzejbdj ebbedwbje ddabde weae ddd bddd
jbwzdabzdbbddw. bwbw bw d bebb zbae, jjedbed bd djdej bd weae ddz bd
bdje ddzj zbjwb wbeew wbbw jjdebdda. 1) dw d zbjwb wbee bb bw
jejdbbedded ddz jedd bwe bdjazded ddabde weae, bwbw wbaa ejdjbde d
zwezza djejwbjwb dz daa djdbadbae zzdjbbddw wbbwbd bwbw deeabjdbbdd.
bwe wbdjbbdj edje dz bwe ddabde weae jdd be djjewwed jbd bwe bedz
"weae -> wbdjbbdj edje" db bwe bde jbjwb dz bwe wjjeed dj zwbdj bwe
wedjjw jedwdjd "wbdjbbdj edje" wbbwbd bwe bddeb dz bwe ddabde weae.
ejeww z1 bd wbdjb bwe ddabde weae ejejdwweje bd jjdebdda. 2) d
edwwbbae debb wbee wdzad be bd edjjdeb d zbae wbbw bwe jdewdj
dajdjbbwb. bwbw jdd be ddde jbd bwe bedz "jjdeb/dejjdeb -> wdbbebjbj
(jadwwbj)". 3) bweje dje wejejda ebdbeaew (bzbdjbdaw) ejdjbded wbbwbd
bwe ddabde weae wwbjw ejdjbde dd edwd wdd bd jdbd dd zddejwbdddbdj dz
jjdebdadjd. bwewe ebdbeaew jdd be zdzdd jbd bwe bedz "weae ->
wjeddjbdw (bzbdjbdaw)". 4) ddz jdd dawd dejeade ddzj jddwaedje bd: -
ddjbjdbbdj bwjdzjw bwe bedzw. ddz jdd ejeww z1 db ddd weaejbed bedz
bbeb bd jeb zzjbwej bdzdjbdbbdd. - jeddbdj bwe bdjazded jeddbe zbae
(wee bwe bedz "weae -> jeddbe"). - jbewbdj bwe bdjazded jdadjzza
ejewedbdbbdd (bwbw ejewedbdbbdd jdd be zdzdd dd wejejda wddw: e.j. bd
bwe "weae" bedz dz bwbw deeabjdbbdd, dj jbd bwe "ddjzbedbdbbdd"
wejbbdd zdzdd db bwe "wbdjbbdj" edje dz bwe ddabde weae). - jbewbdj
bwe webedje www.jjdebdda.djj. zzad 2010 bwe jjdebdda bedb
You wrote: "some wont be changed but i'll do it manually later."
So, if you are prepared to fix a few letters manually later, you can avoid the re-replacement problem by replacing the whole vocabulary in one pass with the PHP function strtr (http://php.net/manual/en/function.strtr.php). Your code only needs a small change, like the following:
// Get the contents of both text files
$reference = file_get_contents('reference_file.txt', true);
$encrypted = file_get_contents('encrypted_file.txt', true);
// Use a regex to strip every character that is not a lowercase letter
$azreference = preg_replace("/[^a-z]+/", "", $reference);
$azencrypted = preg_replace("/[^a-z]+/", "", $encrypted);
// Count letter occurrences, giving an array of "char => occurrences"
$refarray1 = array_count_values(str_split($azreference, 1));
$refarray2 = array_count_values(str_split($azencrypted, 1));
// Collect every "encrypted letter => plain letter" pair first...
$replacement = array();
foreach ($refarray1 as $key => $val) {
    foreach ($refarray2 as $key2 => $val2) {
        if ($val == $val2) {
            $replacement[$key2] = $key;
        }
    }
}
// ...then apply them all in a single pass, so no letter is replaced twice
$encrypted = strtr($encrypted, $replacement);
print_r($encrypted);
The output will be:
cryptnnl (stnrting exnmple fnr the cryptnnl versinn fnmily 1.x)
cryptnnl is n cnmprehensive free educntinnnl prngrnm nbnut cryptngrnphy nnd cryptnnnlysis nffering extensive nnline help nnd mnny visunlijntinns.
this is n text file, crented in nrder tn help ynu tn mnke ynur first steps with cryptnnl.
1) ns n first step it is recnmmended ynu rend the included nnline help, this will prnvide n useful nversight nf nll nvnilnble functinns within this npplicntinn. the stnrting pnge nf the nnline help cnn be nccessed vin the menu "help -> stnrting pnge" nt the tnp right nf the screen nr using the senrch keywnrd "stnrting pnge" within the index nf the nnline help. press f1 tn stnrt the nnline help everywhere in cryptnnl.
2) n pnssible next step wnuld be tn encrypt n file with the cnesnr nlgnrithm. this cnn be dnne vin the menu "crypt/decrypt -> symmetric (clnssic)".
3) there nre severnl exnmples (tutnrinls) prnvided within the nnline help which prnvide nn ensy wny tn gnin nn understnnding nf cryptnlngy. these exnmples cnn be fnund vin the menu "help -> scennrins (tutnrinls)".
4) ynu cnn nlsn develnp ynur knnwledge by: - nnvignting thrnugh the menus. ynu cnn press f1 nt nny selected menu item tn get further infnrmntinn. - rending the included rendme file (see the menu "help -> rendme"). - viewing the included cnlnrful presentntinn (this presentntinn cnn be fnund nn severnl wnys: e.g. in the "help" menu nf this npplicntinn, nr vin the "dncumentntinn" sectinn fnund nt the "stnrting" pnge nf the nnline help). - viewing the webpnge www.cryptnnl.nrg.
july 2010 the cryptnnl tenmi
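which is a bit better than "jjdebdda" :), but, as you expected, it still has some collisions: when several letters share the same count, the last matching pair overwrites the earlier ones in $replacement.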
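The one-pass idea is not specific to strtr. As a minimal sketch in Python (the function names rank_letters and decrypt_by_rank are made up for illustration), letters can be paired by frequency rank instead of by equal counts, and the whole table applied in a single pass with str.translate. Ties are broken by order of first appearance, which only lines up here because both files contain the same text; with an unrelated reference text, tied letters could still be mismatched.

```python
from collections import Counter


def rank_letters(text):
    """Return the letters a-z found in text, most frequent first.

    Letters with equal counts keep the order in which they first appear.
    """
    counts = Counter(ch for ch in text if "a" <= ch <= "z")
    return [ch for ch, _ in counts.most_common()]


def decrypt_by_rank(reference, encrypted):
    """Map the n-th most frequent encrypted letter to the n-th most
    frequent reference letter, then substitute everything in one pass."""
    pairs = zip(rank_letters(encrypted), rank_letters(reference))
    table = str.maketrans(dict(pairs))
    # translate() looks at each input character once, so a letter that
    # has already been substituted is never substituted again.
    return encrypted.translate(table)


if __name__ == "__main__":
    print(decrypt_by_rank("this is a text file", "estd td l epie qtwp"))
```

Characters outside a-z (digits, punctuation, spaces) are not in the table and pass through unchanged.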
I'm working on some PHP code that would grab a music playlist from a remote radio page - which means it is continuously updated.
I would like to store the tracks history in my database.
My problem is that I need to detect when new entries have been added to the remote tracklist, knowing that:
I don't know how often the remote page will be updated.
I don't know how many tracks are displayed on the remote page. Sometimes it will be a single track, sometimes a few dozen.
The same track could show up several times.
For example, I get this data when grabbing the page for the first time:
Dead Combo — Esse Olhar Que Era Só Teu
Myron & E — If I Gave You My Love
Hooverphonic — Badaboum
Alain Chamfort — Bambou - Pilooski / Jayvich Reprise
William Onyeabor — Atomic Bomb
Curtis Mayfield — Move on up - Extended version
Mos Def — Ms. Fat Booty
Nicki Minaj — Feeling Myself
Disclosure — You & Me (Flume remix)
Otis Redding — My Girl - Remastered Mono
Then the second time I get:
Charles Aznavour — Emmenez moi
Mos Def — Ms. Fat Booty
Rag'n'Bone Man — Human
Bernard Lavilliers — Idées noires
Julien Clerc — Ma préférence
The Rolling Stones — Just Your Fool
Dead Combo — Esse Olhar Que Era Só Teu
Myron & E — If I Gave You My Love
Hooverphonic — Badaboum
Alain Chamfort — Bambou - Pilooski / Jayvich Reprise
As you can see, the second time, entries 7->10 seem to be the same as entries 1->4 of the first list (so entries 1->6 are the new ones); track #2 already appeared in the first list but seems to have been replayed since.
The new entries here would be:
Charles Aznavour — Emmenez moi
Mos Def — Ms. Fat Booty
Rag'n'Bone Man — Human
Bernard Lavilliers — Idées noires
Julien Clerc — Ma préférence
The Rolling Stones — Just Your Fool
I store track entries in one table and the track history in another.
Structure of the tracks table
| ID | artist | title | album |
--------------------------------------------------
| 12 | Mos Def | Ms. Fat Booty | |
Structure of the tracks history table
| ID | track ID | time |
------------------------------------------
| 24 | 12 | 2016-07-03 13:40:26 |
Do you have any ideas on how I could handle this?
Thanks !
I think you're trying to find the items at the end of the second list that match those at the beginning of the first?
If you can store both lists in an array (the old list in $previous and the new list in $current), this function should help:
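function find_old_tracks($previous, $current)
{
    for ($i = 0; $i < count($current); $i++)
    {
        // Keep scanning while both lists agree at this position
        // (isset avoids a notice when $previous is shorter than $current)
        if (isset($previous[$i]) && $previous[$i] == $current[$i]) continue;
        // Mismatch: start over with the rest of $current
        return find_old_tracks($previous, array_slice($current, $i + 1));
    }
    return array_slice($previous, 0, $i);
}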
It scans through $current for contiguous matches to $previous, recursing on the remainder every time it finds a mismatch. When I run this:
$previous = array(
    'Dead Combo — Esse Olhar Que Era Só Teu',
    'Myron & E — If I Gave You My Love',
    'Hooverphonic — Badaboum',
    'Alain Chamfort — Bambou - Pilooski / Jayvich Reprise',
    'William Onyeabor — Atomic Bomb',
    'Curtis Mayfield — Move on up - Extended version',
    'Mos Def — Ms. Fat Booty',
    'Nicki Minaj — Feeling Myself',
    'Disclosure — You & Me (Flume remix)',
    'Otis Redding — My Girl - Remastered Mono'
);
$current = array(
    'Charles Aznavour — Emmenez moi',
    'Mos Def — Ms. Fat Booty',
    'Rag Bone Man — Human',
    'Bernard Lavilliers — Idées noires',
    'Julien Clerc — Ma préférence',
    'The Rolling Stones — Just Your Fool',
    'Dead Combo — Esse Olhar Que Era Só Teu',
    'Myron & E — If I Gave You My Love',
    'Hooverphonic — Badaboum',
    'Alain Chamfort — Bambou - Pilooski / Jayvich Reprise'
);
$old_tracks = find_old_tracks($previous, $current);
$new_tracks = array_slice($current, 0, count($current) - count($old_tracks));
// implode() takes the separator first; the reversed order no longer works in PHP 8
print "NEW TRACKS: " . implode('; ', $new_tracks);
print "<br /><br />OLD TRACKS: " . implode('; ', $old_tracks);
my output is:
NEW TRACKS: Charles Aznavour — Emmenez moi; Mos Def — Ms. Fat Booty;
Rag Bone Man — Human; Bernard Lavilliers — Idées noires; Julien Clerc
— Ma préférence; The Rolling Stones — Just Your Fool
OLD TRACKS: Dead Combo — Esse Olhar Que Era Só Teu; Myron & E — If I
Gave You My Love; Hooverphonic — Badaboum; Alain Chamfort — Bambou -
Pilooski / Jayvich Reprise
You can do what you like with that info on the database end.
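For comparison, here is a minimal sketch of the same idea in Python, written as a direct search for the longest overlap rather than as a recursion. The function name split_new_tracks is made up for illustration, and it assumes, as in the example, that the page lists the newest tracks first. Because it tries every overlap length, longest first, it also copes with a track that is repeated right where the two lists meet.

```python
def split_new_tracks(previous, current):
    """Split current into (new_tracks, old_tracks).

    old_tracks is the longest run at the end of current that equals
    the start of previous; everything before it counts as new.
    """
    longest = min(len(previous), len(current))
    for size in range(longest, 0, -1):
        if current[-size:] == previous[:size]:
            return current[:-size], current[-size:]
    # No overlap at all: every track in current is new.
    return list(current), []


if __name__ == "__main__":
    previous = ["Dead Combo", "Myron & E", "Hooverphonic", "Mos Def"]
    current = ["Charles Aznavour", "Mos Def", "Dead Combo", "Myron & E"]
    new_tracks, old_tracks = split_new_tracks(previous, current)
    print("NEW TRACKS:", "; ".join(new_tracks))
    print("OLD TRACKS:", "; ".join(old_tracks))
```

If the station plays the same short sequence twice in a row, the longest overlap can overestimate how many tracks are old; no list-only comparison can tell those cases apart, so timestamps from the page would be needed there.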
I need to extract data from an HTML file. In Microsoft Word I have some data that could easily be converted into HTML; I need to extract that data and insert it into an SQL table.
Record n.1354 - acidi_nucleici
Gli RNA sono diversi dal DNA perché
V - contengono uracile e ribosio
F - contengono uracile e timina
F - contengono uracile e desossiribosio
F - contengono ribosio e timidina
F - contengono ribosio e desossiribosio
Record n.1417 - acidi_nucleici
Il DNA circolare si trova
V - nei mitocondri
F - nei nucleosomi
V - nei batteri
F - nel nucleolo
F - nel Golgi
Record n.1418 - acidi_nucleici
Il DNA nelle cellule si trova
V - nel nucleo
F - nei centri organizzatori microtubulari
V - nei mitocondri
F - nei poliribosomi
F - nel citoplasma
I need to create a function that:
recognizes whether the line is an option or a question (i.e. if before the line there is "V -" or "F -" it's an option; if there is "Record n.*" it's the question);
if the line is an option, recognizes whether it is false ("F -") or true ("V -").
I thought of building the SQL table this way:
Column 1: id
Column 2: text
Column 3: question (0 = it's an answer; 1 = it's a question)
Column 4: relate_to (if it is an answer, relate the answer to the question ID)
Column 5: true_false (if it is an answer, is it true or false?)
The main problem is: I don't even know where to start! (except from using file_get_contents function, maybe).
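One possible starting point, sketched in Python: classify each line with two regular expressions and build rows shaped like the table described above. The function name parse_quiz is hypothetical, and the sketch assumes the input is already plain text with one entry per line, and that the question text is the line following each "Record n." header, as in the sample. Inserting the rows into SQL is left out.

```python
import re

RECORD_RE = re.compile(r"^Record n\.\d+ - ")   # header line of a record
OPTION_RE = re.compile(r"^([VF]) - (.+)$")     # "V - ..." or "F - ..."


def parse_quiz(text):
    """Return a list of row dicts with the columns id, text, question,
    relate_to and true_false."""
    rows = []
    question_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or RECORD_RE.match(line):
            continue  # blank lines and record headers carry no row
        row_id = len(rows) + 1
        option = OPTION_RE.match(line)
        if option:
            rows.append({
                "id": row_id,
                "text": option.group(2),
                "question": 0,
                "relate_to": question_id,
                "true_false": 1 if option.group(1) == "V" else 0,
            })
        else:
            # Anything else is treated as the question text.
            question_id = row_id
            rows.append({
                "id": row_id,
                "text": line,
                "question": 1,
                "relate_to": None,
                "true_false": None,
            })
    return rows
```

Each returned dict maps directly onto one INSERT statement; the ids are assigned in the script here, so they would need to be reconciled with an auto-increment column if the table uses one.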
How can I calculate the closest colourblind-friendly colour from a HEX colour code like #0a87af, or from the three RGB values (0-255)?
I'm searching for an efficient way to calculate or do this so I can implement it in PHP or Python and the algorithm can be used for better website accessibility for colourblind people.
As the others mentioned in their comments and answers, the contrast between two colours is what matters.
The W3C has already defined a method for measuring the minimum contrast between colours needed to pass different levels of accessibility.
The description and the formula to calculate it are in the WCAG documentation:
contrast ratio = (L1 + 0.05) / (L2 + 0.05)
For this apparently simple formula, you need the relative luminance of both colours, L1 for the lighter one and L2 for the darker one, which is calculated with another formula:
L = 0.2126 * R + 0.7152 * G + 0.0722 * B where R, G and B are defined as:
if RsRGB <= 0.03928 then R = RsRGB/12.92 else R = ((RsRGB+0.055)/1.055) ^ 2.4
if GsRGB <= 0.03928 then G = GsRGB/12.92 else G = ((GsRGB+0.055)/1.055) ^ 2.4
if BsRGB <= 0.03928 then B = BsRGB/12.92 else B = ((BsRGB+0.055)/1.055) ^ 2.4
and RsRGB, GsRGB, and BsRGB are defined as:
RsRGB = R8bit/255
GsRGB = G8bit/255
BsRGB = B8bit/255
The minimum contrast ratio between text and background should be 4.5:1 for level AA and 7:1 for level AAA. This still leaves room for creating nice designs.
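The formulas above translate almost line by line into code. This is a minimal sketch in Python, assuming six-digit hex input; the function names relative_luminance and contrast_ratio are chosen here for illustration and are not part of any library.

```python
def relative_luminance(hex_colour):
    """Relative luminance L of a colour given as '#rrggbb' or 'rrggbb'."""
    hex_colour = hex_colour.lstrip("#")
    # RsRGB, GsRGB, BsRGB: the 8-bit channels scaled to 0..1
    srgb = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearise each channel as in the definition above
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in srgb
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter colour."""
    lum_a = relative_luminance(colour_a)
    lum_b = relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    ratio = contrast_ratio("#0a87af", "#ffffff")
    print("contrast against white: %.2f:1" % ratio)
    print("passes AA for normal text:", ratio >= 4.5)
```

The ratio is symmetric in its two arguments, so it does not matter which colour is passed as text and which as background.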
There is an example of implementation in JS by Lea Verou.
This won't give you the closest colour as you asked, because for a given background there will be more than one foreground colour giving the same contrast result, but it's a standard way of calculating contrast.
A single color is not a problem for color-blind users (unless you want to transport a very specific meaning of that color tone); the difference between colors is.
Given two or more colors, you can convert them to HLS using colorsys and check whether the difference in Lightness is sufficient. If the difference is too small, increase it, like this:
import colorsys
import re


def rgb2hex(r, g, b):
    # Round and clamp so that float channel values format cleanly.
    return '#%02x%02x%02x' % tuple(
        max(0, min(255, int(round(c)))) for c in (r, g, b))


def hex2rgb(hex_str):
    m = re.match(
        r'^\#?([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$', hex_str)
    assert m
    return (int(m.group(1), 16), int(m.group(2), 16), int(m.group(3), 16))


def distinguish_hex(hex1, hex2, mindiff=50):
    """
    Make sure two colors (specified as hex codes) are sufficiently different.
    Returns the two colors (possibly changed). mindiff is the minimal
    difference in lightness, on a 0-255 scale.
    """
    # colorsys works on floats in the range 0..1, so scale the channels down.
    hls1 = colorsys.rgb_to_hls(*(c / 255.0 for c in hex2rgb(hex1)))
    hls2 = colorsys.rgb_to_hls(*(c / 255.0 for c in hex2rgb(hex2)))
    l1 = hls1[1] * 255
    l2 = hls2[1] * 255
    if abs(l1 - l2) >= mindiff:  # ok already
        return (hex1, hex2)

    # Lightness difference that is still missing.
    restdiff = mindiff - abs(l1 - l2)
    if l1 >= l2:
        l1 = min(255, l1 + restdiff / 2)
        l2 = max(0, l1 - mindiff)
        l1 = min(255, l2 + mindiff)
    else:
        l2 = min(255, l2 + restdiff / 2)
        l1 = max(0, l2 - mindiff)
        l2 = min(255, l1 + mindiff)

    # Keep hue and saturation, swap in the adjusted lightness.
    rgb1 = colorsys.hls_to_rgb(hls1[0], l1 / 255.0, hls1[2])
    rgb2 = colorsys.hls_to_rgb(hls2[0], l2 / 255.0, hls2[2])
    return (rgb2hex(*(c * 255 for c in rgb1)),
            rgb2hex(*(c * 255 for c in rgb2)))


print(distinguish_hex('#ff0000', '#0000ff'))
Contrast-Finder is an open-source online tool (written by Open-S and M. Faure) that calculates the contrast ratio for a given pair of foreground and background colors. If the ratio is insufficient according to the WCAG formula, it suggests a number of alternative background OR foreground colors with a sufficient contrast ratio, using different algorithms. You must tell it whether to keep the foreground or the background color, and which contrast ratio you want: higher than 4.5:1 or 3:1 (level AA), or 7:1 or 4.5:1 (level AAA).
It's pretty spot on for many pairs of colors.
Sources - in Java - are on GitHub.
Note: as already written in other answers, colourblind people ("people with colour deficiencies") are only some of the people affected by the choice of colors: partially sighted people are too. And when a web designer chooses #AAA on #FFF, it's a problem for many people without any loss of sight or colour perception; they just have a shiny Retina® screen in non-optimal light conditions... :p
I have a PHP script that has to search for information in the output of a shell script. The shell script opens an SSH connection, gets the routing table and saves it in a .txt file. If I read the file, or take the output directly from the script, and run my search with preg_match_all, the result is empty. But if I paste the same output directly into my PHP file, the code works fine, so I'm lost. My PHP code is:
$resultsCK = array();
// ([0]) [1] [2] [3] [4] [5] [6] [7] ([8])
$searchTextG = "/(S|R|B|O|A|K|H|P|U|i) +(IA|E|N|) +([0-9.]+)\/([0-9]+) +via +([0-9.]+), +([a-zA-Z0-9.]+|), +cost +(?:[0-9]+:|)([0-9]+), +age +[0-9]+ +\n((?: +via +[0-9.]+, +(?:[a-zA-Z0-9.]+|) +\n)*)/";
$searchTextC = "/(C) +([0-9.]+)\/([0-9]+) +is directly connected, +([a-zA-Z0-9.]+) +\n/";
foreach ($ciscoCk as $ipCk) {
    shell_exec('./tmp/routeCk.sh ' . $ipCk . ' 22 commandeCk > /tmp/resultRouteCk.txt');
    $txt= file_get_contents('/tmp/resultRouteCk.txt');
    $matches = [];
    preg_match_all($searchTextG, $txt, $matches, PREG_SET_ORDER);
    foreach ($matches as $id => $match) {
        unset($matches[$id][0]);
        if (isset($match[8])) {
            preg_match_all($searchSubTextG, $match[8], $subpatternMatches, PREG_SET_ORDER);
            unset($matches[$id][8]);
            foreach ($subpatternMatches as $spmid => $spm) {
                unset($subpatternMatches[$spmid][0]);
                $matches[$id][8][] = $subpatternMatches[$spmid];
            }
        }
    }
    //g of general
    $resultsCK[$ipCk]["g"] = $matches;
    $matches = [];
    preg_match_all($searchTextC, $txt, $matches, PREG_SET_ORDER);
    foreach ($matches as $id => $match) {
        unset($matches[$id][0]);
    }
    $resultsCK[$ipCk]["c"] = $matches;
}
var_dump($resultsCK);
So I already tried changing this:
shell_exec('./tmp/routeCk.sh ' . $ipCk . ' 22 commandeCk > /tmp/resultRouteCk.txt');
$txt= file_get_contents('/tmp/resultRouteCk.txt');
to this:
$txt=('./tmp/routeCk.sh ' . $ipCk . ' 22 commandeCk');
That doesn't work either, but if I put the text directly into the variable:
$txt="
Codes: C - Connected, S - Static, R - RIP, B - BGP,
O - OSPF IntraArea (IA - InterArea, E - External, N - NSSA)
A - Aggregate, K - Kernel Remnant, H - Hidden, P - Suppressed,
U - Unreachable, i - Inactive
O E 0.0.0.0/0 via 10.140, bond1.30, cost 1:10, age 5
via 10.141, bond1.31
via 10.142, bond1.32
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O E 10.112/23 via 10.140, bond1.30, cost 46:1, age 2511
O IA 10.138/29 via 10.140, bond1.30, cost 46, age 1029440
C 10.141/29 is directly connected, bond2.35
C 10.141/29 is directly connected, bond2.35
";
the script works, and this is the same information as in the file. How can I fix this? Could it be a charset problem?
While writing this, I tested changing one line of resultRouteCk.txt and ran PHP with $txt=('./tmp/routeCk.sh ' . $ipCk . ' 22 commandeCk'); and it works, so the problem seems to lie between the file or output produced by Linux and the string in PHP. But how can I fix it?
I fixed the problem; it was caused by a special character. In the end I have this code:
$txt= shell_exec('./routeCk.sh ' . $ipCk . ' 22 commandeCk | dos2unix ');
This converts all the output to Unix format. It seems illogical, though, to get DOS characters in the output when working on Linux.
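Since dos2unix converts CRLF line endings to LF, the special character was most likely a carriage return: the patterns above expect one or more spaces directly followed by "\n", so a "\r" in between prevents any match. The effect is easy to reproduce; below is a minimal sketch in Python using the "directly connected" pattern from the question. The helper name normalize_newlines is made up, and the sample line is taken from the routing table shown above.

```python
import re

# Same pattern as $searchTextC: the line must end in spaces followed by "\n".
CONNECTED_RE = re.compile(
    r"(C) +([0-9.]+)/([0-9]+) +is directly connected, +([a-zA-Z0-9.]+) +\n"
)


def normalize_newlines(text):
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    raw = "C 10.141/29 is directly connected, bond2.35 \r\n"
    print(CONNECTED_RE.findall(raw))                      # no match
    print(CONNECTED_RE.findall(normalize_newlines(raw)))  # one match
```

The same normalization can be done inside the PHP script instead of piping through dos2unix, for example by replacing "\r\n" with "\n" in $txt before calling preg_match_all.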
Good morning -
I'm interested in seeing an efficient way of parsing the values of a hierarchical text file (i.e., one that has a Title => Multiple Headings => Multiple Subheadings => Multiple Keys => Multiple Values) into a simple XML document. For the sake of simplicity, the answer would be written using:
Regex (preferably in PHP)
or PHP code (e.g., if looping were more efficient)
Here's an example of an Inventory file I'm working with. Note that Header = FOODS, Sub-Header = Type (A, B...), Keys = PRODUCT (or CODE, etc.), and Values may span one or more lines.
**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
**MANUFACTURER**
Quesos Mi Pueblito, LLC, Passaic, NJ.
**VOLUME OF UNITS**
11,000 boxes
**DISTRIBUTION**
NJ, NY, DE, MD, CT, VA
___________________________________
**PRODUCT**
1) Peanut Brittle No Sugar Added;
2) Peanut Brittle Small Grind;
3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating
**CODE**
1) Lots 7109 - 8350 inclusive;
2) Lots 8198 - 8330 inclusive;
3) Lots 7075 - 9012 inclusive;
4) Lots 7100 - 8057 inclusive;
5) Lots 7152 - 8364 inclusive
**MANUFACTURER**
Star Kay White, Inc., Congers, NY.
**VOLUME OF UNITS**
5,749 units
**DISTRIBUTION**
NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN
**FOODS - TYPE B**
___________________________________
**PRODUCT**
Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;
**CODE**
990-10/2 10/5
**MANUFACTURER**
San Mar Manufacturing Corp., Catano, PR.
**VOLUME OF UNITS**
384
**DISTRIBUTION**
PR
And here's the desired output (please excuse any XML syntactical errors):
<foods>
<food type = "A" >
<product>Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese</product>
<product>La Fe String Cheese</product>
<code>Sell by date going back to February 1, 2009</code>
<manufacturer>Quesos Mi Pueblito, LLC, Passaic, NJ.</manufacturer>
<volume>11,000 boxes</volume>
<distribution>NJ, NY, DE, MD, CT, VA</distribution>
</food>
<food type = "A" >
<product>Peanut Brittle No Sugar Added</product>
<product>Peanut Brittle Small Grind</product>
<product>Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</product>
<code>Lots 7109 - 8350 inclusive</code>
<code>Lots 8198 - 8330 inclusive</code>
<code>Lots 7075 - 9012 inclusive</code>
<code>Lots 7100 - 8057 inclusive</code>
<code>Lots 7152 - 8364 inclusive</code>
<manufacturer>Star Kay White, Inc., Congers, NY.</manufacturer>
<volume>5,749 units</volume>
<distribution>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</distribution>
</food>
<food type = "B" >
<product>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice</product>
<code>990-10/2 10/5</code>
<manufacturer>San Mar Manufacturing Corp., Catano, PR</manufacturer>
<volume>384</volume>
<distribution>PR</distribution>
</food>
</foods>
<!-- and so forth -->
So far, my approach (which might be quite inefficient with a huge text file) would be one of the following:
Loops and multiple Select/Case statements, where the file is loaded into a string buffer, and while looping through each line, see if it matches one of the header/subheader/key lines, append the appropriate xml tag to a xml string variable, and then add the child nodes to the xml based on IF statements regarding which key name is most recent (which seems time-consuming and error-prone, esp. if the text changes even slightly) -- OR
Use REGEX (Regular Expressions) to find and replace key fields with appropriate xml tags, clean it up with an xml library, and export the xml file. Problem is, I barely use regular expressions, so I'd need some example-based help.
Any help or advice would be appreciated.
Thanks.
An example you can use as a starting point. At least I hope it gives you an idea...
<?php
define('TYPE_HEADER', 1);
define('TYPE_KEY', 2);
define('TYPE_DELIMETER', 3);
define('TYPE_VALUE', 4);

$datafile = 'data.txt';
$fp = fopen($datafile, 'rb') or die('!fopen');

// stores (the first) {header} in 'name' and the root simplexmlelement in 'element'
$container = array('name'=>null, 'element'=>null);
// stores the name for each item element, the value for the type attribute for subsequent item elements and the simplexmlelement of the current item element
$item = array('name'=>null, 'type'=>null, 'current_element'=>null);
// the last **key** encountered, used to create new child elements in the current item element when a value is encountered
$key = null;

while ( false!==($t=getstruct($fp)) ) {
    switch( $t[0] ) {
        case TYPE_HEADER:
            if ( is_null($container['element']) ) {
                // this is the first time we hit **header - subheader**
                $container['name'] = $t[1][0];
                // ugly hack, < . name . />
                $container['element'] = new SimpleXMLElement('<'.$container['name'].'/>');
                // each subsequent new item gets the new subheader as type attribute
                $item['type'] = $t[1][1];
                // dummy implementation: "deducing" the item names from header/container[name]
                $item['name'] = substr($t[1][0], 0, -1);
            }
            else {
                // hitting **header - subheader** the (second, third, nth) time
                /*
                header must be the same as the first time (stored in container['name']).
                Otherwise you need another container element since
                xml documents can only have one root element
                */
                if ( $container['name'] !== $t[1][0] ) {
                    echo $container['name'], "!==", $t[1][0], "\n";
                    die('format error');
                }
                else {
                    // subheader may have changed, store it for future item elements
                    $item['type'] = $t[1][1];
                }
            }
            break;
        case TYPE_DELIMETER:
            assert( !is_null($container['element']) );
            assert( !is_null($item['name']) );
            assert( !is_null($item['type']) );
            /* that's maybe not a wise choice.
            You might want to check the complete item before appending it to the document.
            But the example is a hack anyway ...so create a new item element and append it to the container right away
            */
            $item['current_element'] = $container['element']->addChild($item['name']);
            // set the type-attribute according to the last **header - subheader** encountered
            $item['current_element']['type'] = $item['type'];
            break;
        case TYPE_KEY:
            $key = $t[1][0];
            break;
        case TYPE_VALUE:
            assert( !is_null($item['current_element']) );
            assert( !is_null($key) );
            // this is a value belonging to the "last" key encountered
            // create a new "key" element with the value as content
            // and add it to the current item element
            $tmp = $item['current_element']->addChild($key, $t[1][0]);
            break;
        default:
            die('unknown token');
    }
}

if ( !is_null($container['element']) ) {
    $doc = dom_import_simplexml($container['element']);
    $doc = $doc->ownerDocument;
    $doc->formatOutput = true;
    echo $doc->saveXML();
}
die;
/*
Take a look at gettoken() at http://www.tuxradar.com/practicalphp/21/5/6
It breaks the stream into much simpler pieces.
In the next step the parser would "combine" or structure the simple tokens into more complex things.
This function does both....
#return array(id, array(parameter)
*/
function getstruct($fp) {
    if ( feof($fp) ) {
        return false;
    }
    // shortcut: all we care about "happens" on one line
    // so let php read one line in a single step and then do the pattern matching
    $line = trim(fgets($fp));

    // this matches **key** and **header - subheader**
    if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) {
        // only for **header - subheader** $m[2] is set.
        if ( isset($m[2]) ) {
            return array(TYPE_HEADER, array(trim($m[1]), trim($m[2])));
        }
        else {
            return array(TYPE_KEY, array($m[1]));
        }
    }
    // this matches _____________ and means "new item"
    else if ( preg_match('#^_+$#', $line, $m) ) {
        return array(TYPE_DELIMETER, array());
    }
    // any other non-empty line is a single value
    else if ( preg_match('#\S#', $line) ) {
        // you might want to filter the 1),2),3) part out here
        // could also be two different token types
        return array(TYPE_VALUE, array($line));
    }
    else {
        // skip empty lines, would be nicer with tail-recursion...
        return getstruct($fp);
    }
}
prints
<?xml version="1.0"?>
<FOODS>
<FOOD type="TYPE A">
<PRODUCT>1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;</PRODUCT>
<PRODUCT>2) La Fe String Cheese</PRODUCT>
<CODE>Sell by date going back to February 1, 2009</CODE>
<MANUFACTURER>Quesos Mi Pueblito, LLC, Passaic, NJ.</MANUFACTURER>
<VOLUME OF UNITS>11,000 boxes</VOLUME OF UNITS>
<DISTRIBUTION>NJ, NY, DE, MD, CT, VA</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE A">
<PRODUCT>1) Peanut Brittle No Sugar Added;</PRODUCT>
<PRODUCT>2) Peanut Brittle Small Grind;</PRODUCT>
<PRODUCT>3) Homestyle Peanut Brittle Nuggets/Coconut Oil Coating</PRODUCT>
<CODE>1) Lots 7109 - 8350 inclusive;</CODE>
<CODE>2) Lots 8198 - 8330 inclusive;</CODE>
<CODE>3) Lots 7075 - 9012 inclusive;</CODE>
<CODE>4) Lots 7100 - 8057 inclusive;</CODE>
<CODE>5) Lots 7152 - 8364 inclusive</CODE>
<MANUFACTURER>Star Kay White, Inc., Congers, NY.</MANUFACTURER>
<VOLUME OF UNITS>5,749 units</VOLUME OF UNITS>
<DISTRIBUTION>NY, NJ, MA, PA, OH, FL, TX, UT, CA, IA, NV, MO and IN</DISTRIBUTION>
</FOOD>
<FOOD type="TYPE B">
<PRODUCT>Cool River Bebidas Naturales - West Indian Cherry Fruit Acerola 16% Juice;</PRODUCT>
<CODE>990-10/2 10/5</CODE>
<MANUFACTURER>San Mar Manufacturing Corp., Catano, PR.</MANUFACTURER>
<VOLUME OF UNITS>384</VOLUME OF UNITS>
<DISTRIBUTION>PR</DISTRIBUTION>
</FOOD>
</FOODS>
Unfortunately the status of the php module for ANTLR currently is "Runtime is in alpha status." but it might be worth a try anyway...
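Two loose ends in the output above are the "1)" counters left inside the values and element names such as VOLUME OF UNITS, which contain spaces and are therefore not valid XML names. As a minimal sketch of one way to handle both, here in Python with the standard xml.etree.ElementTree module (the helper names xml_name and clean_value are made up for illustration):

```python
import re
import xml.etree.ElementTree as ET


def xml_name(key):
    """Turn a key such as 'VOLUME OF UNITS' into 'volume_of_units'.

    Assumes the key starts with a letter; a key starting with a digit
    would still need a prefix to become a valid element name.
    """
    return re.sub(r"\W+", "_", key.strip()).lower()


def clean_value(line):
    """Drop a leading '1) ' style counter and a trailing semicolon."""
    without_counter = re.sub(r"^\d+\)\s*", "", line.strip())
    return without_counter.rstrip(";").strip()


if __name__ == "__main__":
    food = ET.Element("food", {"type": "A"})
    ET.SubElement(food, xml_name("PRODUCT")).text = clean_value(
        "2) La Fe String Cheese")
    ET.SubElement(food, xml_name("VOLUME OF UNITS")).text = clean_value(
        "11,000 boxes")
    print(ET.tostring(food, encoding="unicode"))
```

The same two substitutions could be applied in the PHP code at the TYPE_KEY and TYPE_VALUE cases with preg_replace.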
See: http://www.tuxradar.com/practicalphp/21/5/6
This tells you how to parse a text file into tokens using PHP. Once parsed you can place it into anything you want.
You need to search for specific tokens in the file based on your criteria. For example:
PRODUCT
This gives you the XML tag.
Then 1) can have a special meaning:
1) Peanut Brittle...
This tells you what to put inside the XML tag.
I do not know whether this is the most efficient way to accomplish your task, but it is how a compiler would parse a file, and it has the potential to be very accurate.
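As a rough illustration of that token-based approach, here is a minimal sketch of a line tokenizer for the inventory format from the question, written in Python. The token names and the function name tokenize are assumptions made for this example; a second step would then walk the tokens and build the XML.

```python
import re

HEADER_RE = re.compile(r"\*\*([^-*]+?)\s*-\s*(.+?)\*\*")  # **FOODS - TYPE A**
KEY_RE = re.compile(r"\*\*(.+?)\*\*")                     # **PRODUCT**
DELIMITER_RE = re.compile(r"_+")                          # ________


def tokenize(lines):
    """Yield one token tuple per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = HEADER_RE.fullmatch(line)
        key = KEY_RE.fullmatch(line)
        if header:
            # header name and sub-header, e.g. ("FOODS", "TYPE A")
            yield ("header", header.group(1), header.group(2))
        elif key:
            yield ("key", key.group(1))
        elif DELIMITER_RE.fullmatch(line):
            yield ("delimiter",)
        else:
            # Any other text is a value for the most recent key.
            yield ("value", line)


if __name__ == "__main__":
    sample = ["**FOODS - TYPE B**", "______", "**CODE**", "990-10/2 10/5"]
    for token in tokenize(sample):
        print(token)
```

Keeping the tokenizer separate from the tree-building step means a small change in the text format usually only touches the regular expressions.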
Instead of Regex or PHP use the XSLT 2.0 unparsed-text() function to read the file (see http://www.biglist.com/lists/xsl-list/archives/200508/msg00085.html)
Another Hint for an XSLT 1.0 Solution is here: http://bytes.com/topic/net/answers/808619-read-plain-file-xslt-1-0-a