Extract data from HTML file converted from Word

I need to extract data from an HTML file. In Microsoft Word I have some data that can easily be converted into HTML; I need to extract that data and insert it into an SQL table.
Record n.1354 - acidi_nucleici
Gli RNA sono diversi dal DNA perché
V - contengono uracile e ribosio
F - contengono uracile e timina
F - contengono uracile e desossiribosio
F - contengono ribosio e timidina
F - contengono ribosio e desossiribosio
Record n.1417 - acidi_nucleici
Il DNA circolare si trova
V - nei mitocondri
F - nei nucleosomi
V - nei batteri
F - nel nucleolo
F - nel Golgi
Record n.1418 - acidi_nucleici
Il DNA nelle cellule si trova
V - nel nucleo
F - nei centri organizzatori microtubulari
V - nei mitocondri
F - nei poliribosomi
F - nel citoplasma
I need to create a function that:
recognizes whether a line is an option or a question (i.e. if the line starts with "V -" or "F -" it's an option; a "Record n.*" line marks the question);
if the line is an option, recognizes whether it is false ("F -") or true ("V -").
I thought of building the SQL table this way:
Column 1: id
Column 2: text
Column 3: question (0 = it's an answer; 1 = it's a question)
Column 4: relate_to (if it is an answer, relate the answer to the question ID)
Column 5: true_false (if it is an answer, is it true or false?)
The main problem is: I don't even know where to start (apart from using the file_get_contents function, maybe).
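One possible starting point, as a minimal sketch. It assumes the Word-generated HTML has already been reduced to plain text with one item per line (for example by running strip_tags on the result of file_get_contents), and that the question text is the line right after each "Record n.*" header, as in the sample above. The function name parseQuiz and the array keys are illustrative; they simply mirror the proposed table columns.

```php
<?php
// Hypothetical helper: turns the plain-text lines into rows shaped like the proposed table.
function parseQuiz(string $text): array
{
    $rows = [];
    $nextId = 1;
    $questionId = null;     // id of the question the following options belong to
    $expectQuestion = false;

    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // A "Record n.<number>" header announces that the next line is the question text.
        if (preg_match('/^Record n\.\d+/', $line)) {
            $expectQuestion = true;
            continue;
        }
        if ($expectQuestion) {
            $questionId = $nextId;
            $rows[] = ['id' => $nextId++, 'text' => $line, 'question' => 1,
                       'relate_to' => null, 'true_false' => null];
            $expectQuestion = false;
            continue;
        }
        // Options start with "V - " (true) or "F - " (false).
        if ($questionId !== null && preg_match('/^([VF]) - (.*)$/', $line, $m)) {
            $rows[] = ['id' => $nextId++, 'text' => $m[2], 'question' => 0,
                       'relate_to' => $questionId, 'true_false' => $m[1] === 'V' ? 1 : 0];
        }
    }
    return $rows;
}
```

Each returned row could then be inserted with a prepared statement (PDO, for instance); the database layer is left out here because the question does not say which one is in use.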

Related

Script to replace comma "," by "->" as multiple value separator on category field of csv (only on this field of csv)

I need to replace the comma "," with "->" as the multiple-value separator in the category field of a CSV file, in a PHP script.
In the attached CSV sample, the field value in the first row is
;ALIMENTACIÓN,GRANEL,Cereales legumbres y frutos secos,Desayuno y entre horas,Varios;
I need it to be replaced with:
;ALIMENTACIÓN->GRANEL->Cereales legumbres y frutos secos->Desayuno y entre horas->Varios;
I tried this code on my php script:
file_put_contents("result.csv",str_replace(",","->",file_get_contents("origin.csv")));
It works, but it replaces commas in all fields. I need to change only the category field; that is, I must not replace commas in the description field or in any other field.
Thank you in advance.
A piece of my CSV file as an example (header and 3 rows; I truncated the description field):
id;SKU;DEFINICION;AMPLIACION;DISPONIBLE;IVA;REC_EQ;PVD;PVD_IVA;PVD_IVA_REC;PVP;PESO;EAN;HAY_FOTO;IMAGEN;FECHA_IMAGEN;CAT;MARCA;FRIO;CONGELADO;BIO;APTO_DIABETICO;GLUTEN;HUEVO;LACTOSA;APTO_VEGANO;UNIDAD_MEDIDA;CANTIDAD_MEDIDA;
1003;"01003";"COPOS DE AVENA 1000GR";"Los copos son granos de cereales que han sido aplastados para facilitar su digestion, manteniendo integras las propiedades del grano.<br>
La avena contiene proteínas en abundancia, así como hidratos de carbono, grasas saludables...";59;2;1.40;2.20;2.42;2.45;3.14;1;"8423266500305";1;"https://distribudiet.net/webstore/images/01003.jpg";"04/03/2020 0:00:00";ALIMENTACIÓN,GRANEL,Cereales legumbres y frutos secos,Desayuno y entre horas,Varios;GRANOVITA;0;0;0;0;1;0;0;1;kilo;1
1018;"01018";"MUESLI 10 FRUTAS 1000GR";"Receta de muesli de cereales, diez tipos diferentes de deliciosas frutas desecadas, frutos secos, semillas de girasol, lino y sesamo.<br>
A finales del ...";63;2;1.40;4.66;5.13;5.19;6.65;1;"8423266500060";1;"https://distribudiet.net/webstore/images/01018.jpg";"04/03/2020 0:00:00";ALIMENTACIÓN,GRANEL,Desayuno y entre horas;GRANOVITA;0;0;0;0;;0;0;1;kilo;1
1037;"01037";"AZUCAR CAÑA INTEGRAL 1000GR";"Azúcar moreno de caña integral sin gluten para endulzar todo tipo de postres, batidos o tus recetas favoritas de repostería. 100% natural, obtenido sin procesamiento quimico por ...";17;2;1.40;3.43;3.77;3.82;4.90;1;"8423266500121";1;"https://distribudiet.net/webstore/images/01037.jpg";"04/03/2020 0:00:00";ALIMENTACIÓN,GRANEL,Endulzantes;GRANOVITA;0;0;0;0;0;0;0;1;kilo;1

<?php
$input = 'PRESTA.csv';
$output = 'OUTPUT.csv';
// Join description lines that were split after "<br>" (assumes a bare \n there)
$file = str_replace("<br>\n", "<br>", file_get_contents($input));
$lines = explode("\r\n", $file); // Split the file into lines (assumes \r\n record endings)
$fp = fopen($output, 'w'); // Open output file for writing
for ($i = 0; $i < count($lines); ++$i) {
    $extract = str_getcsv($lines[$i], ';'); // Split using the ; delimiter
    if ($i > 0) { // Leave the header row untouched
        if (isset($extract[16])) {
            // Index 16 is the 17th field, "CAT"
            $extract[16] = str_replace(',', '->', $extract[16]);
        } else {
            var_dump($extract); // Some lines do not have a CAT field
        }
    }
    fputcsv($fp, $extract, ';'); // Write the line using the ; delimiter
}
fclose($fp);
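An alternative sketch, in case the line endings are not as regular as the code above assumes: fgetcsv reads quoted fields that span several lines, so the description does not have to be rejoined by hand. The function name convertCategoryField and the default index 16 are illustrative; the streams are passed in so the same code works on files and on in-memory data.

```php
<?php
// Hypothetical helper: copies a ;-separated CSV stream, rewriting only one column.
function convertCategoryField($in, $out, int $catIndex = 16): void
{
    $isHeader = true;
    // Delimiter, enclosure and escape are passed explicitly.
    while (($row = fgetcsv($in, 0, ';', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv reports a blank line as an array holding a single null
        }
        if (!$isHeader && isset($row[$catIndex])) {
            $row[$catIndex] = str_replace(',', '->', $row[$catIndex]);
        }
        $isHeader = false;
        fputcsv($out, $row, ';', '"', '\\');
    }
}

// Usage with the file names from the answer above:
// $in = fopen('PRESTA.csv', 'r');
// $out = fopen('OUTPUT.csv', 'w');
// convertCategoryField($in, $out);
// fclose($in);
// fclose($out);
```

Note that fputcsv decides on its own which fields to quote, so the quoting in the output may differ from the input even though the values are the same.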

Get part of a string with PHP

I'm reading an RSS feed and outputting it on a page, and I need to take a substring of the <description> tag and store it as a variable (and then convert to a different time format, but I can figure that out myself). Here's a sample of the data I'm working with:
<description><b>When:</b> Tuesday, November 03, 2015 - 6:00 PM - 8:00 PM<br><b>Where:</b> Adult Literacy Classroom (Lower Level) dedicated in honor of Eleanor Moore<br><br>Clases de preparación para el GED  grupos de estudio para ayudar con sus habilidades y preparación para obtener su diploma de equivalencia de escuela. Las clases se llevaran a cabo en español, según la materia (escritura, literatura, estudios sociales, ciencias, matemáticas y la constitución) <br /><br />GED preparation classes  Study groups to help build your skills that will prepare you to get your high school equivalency diploma. Classes are taught in Spanish by subject area (writing, literature, social studies, science, math and the constitution)<br /></description>
I've already got everything within the description tag as a variable; I just need to grab the string Tuesday, November 03, 2015 - 6:00 PM - 8:00 PM, but I can't figure out how to do that. I have a feeling PHP's explode might work, but I'm terrible with regex. I'll keep working on it and post back my progress, but any help would be greatly appreciated.
By the way, I'm using this method to get the data: http://bavotasan.com/2010/display-rss-feed-with-php/
Thanks to @Bomberis123, I was able to do exactly what I needed. My code may be a little messy, but I figured I'd share it for anyone who needs to do something similar:
<?php
$next_up_at_rss_feed = new DOMDocument();
$next_up_at_rss_feed->load("http://host7.evanced.info/waukegan/evanced/eventsxml.asp?ag=&et=&lib=0&nd=30&feedtitle=Waukegan+Public+Library%3CBR%3ECalendar+of+Programs+%26+Events&dm=rss2&LangType=0");
$next_up_at_posts = array();
foreach ($next_up_at_rss_feed->getElementsByTagName("item") as $node) {
$date = preg_match("/((\s)([^\<])+)/", $node->getElementsByTagName("description")->item(0)->nodeValue, $matches, PREG_OFFSET_CAPTURE, 3);
$date = $matches[0][0];
$next_up_at_post = array (
"title" => $node->getElementsByTagName("title")->item(0)->nodeValue,
"date" => $date,
"link" => $node->getElementsByTagName("guid")->item(0)->nodeValue,
);
array_push($next_up_at_posts, $next_up_at_post);
}
$next_up_at_limit = 4;
for ($next_up_at_counter = 0; $next_up_at_counter < $next_up_at_limit; $next_up_at_counter++) {
// get each value from the array;
$title = str_replace(" & ", " &amp; ", $next_up_at_posts[$next_up_at_counter]["title"]);
$link = $next_up_at_posts[$next_up_at_counter]["link"];
$date_raw = $next_up_at_posts[$next_up_at_counter]["date"];
// separate out the date so it can be formatted
$date_array = explode(" - ", $date_raw);
// set up various formats for date
$date = $date_array[0];
$date_time = strtotime($date);
$date_iso = date("Y-m-d", $date_time);
$date_pretty = date("F j", $date_time);
// set up various formats for start time
$start = $date_array[1];
$start_time = strtotime($start);
$start_iso = date("H:i", $start_time);
$start_pretty = date("g:ia", $start_time);
// set up various formats for end time
$end = $date_array[2];
$end_time = strtotime($end);
$end_iso = date("H:i", $end_time);
$end_pretty = date("g:ia", $end_time);
// display the data
echo "<article class='mini-article'><header class='mini-article-header'>";
echo "<h6 class='mini-article-heading'><a href='{$link}' target='_blank'>{$title}</a></h6>";
echo "<p class='mini-article-sub-heading'><a href='{$link}' target='_blank'><time datetime='{$date_iso}T{$start_iso}-06:00'>{$date_pretty}, {$start_pretty} - {$end_pretty}</time></a></p>";
echo "</header></article>";
}
?>

Try this regex. You can use it with PHP's preg_match and take the first match: https://regex101.com/r/fI8nU9/1
$subject = "<description><b>When:</b> Tuesday, November 03, 2015 - 6:00 PM - 8:00 PM<br><b>Where:</b> Adult Literacy Classroom (Lower Level) dedicated in honor of Eleanor Moore<br><br>Clases de preparación para el GED  grupos de estudio para ayudar con sus habilidades y preparación para obtener su diploma de equivalencia de escuela. Las clases se llevaran a cabo en español, según la materia (escritura, literatura, estudios sociales, ciencias, matemáticas y la constitución) <br /><br />GED preparation classes  Study groups to help build your skills that will prepare you to get your high school equivalency diploma. Classes are taught in Spanish by subject area (writing, literature, social studies, science, math and the constitution)<br /></description>";
$pattern = '/((\s)([^<])+)/'; // first whitespace, then everything up to the next "<"
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
echo $matches[0][0];

Hurray, something I can help with and my first StackOverflow answer! Try something like this. It does use regex but just a couple simple pieces of syntax you can pick up.
$data = "<description><b>When:</b> Tuesday, November 03, 2015 - 6:00 PM - 8:00 PM<br><b>Where:</b> Adult Literacy Classroom (Lower Level) dedicated in honor of Eleanor Moore<br><br>Clases de preparación para el GED  grupos de estudio para ayudar con sus habilidades y preparación para obtener su diploma de equivalencia de escuela. Las clases se llevaran a cabo en español, según la materia (escritura, literatura, estudios sociales, ciencias, matemáticas y la constitución) <br /><br />GED preparation classes  Study groups to help build your skills that will prepare you to get your high school equivalency diploma. Classes are taught in Spanish by subject area (writing, literature, social studies, science, math and the constitution)<br /></description>";
$regex = "~<description><b>When:</b> (.+?)<br><b>Where:</b>~";
preg_match($regex,$data,$match);
echo $match[1];
I tested this and it works.
In this instance, you just set up $regex with what you expect the raw string to look like, with ~ on either end and (.+?) where the part you want to extract is.

I am far from an expert on regexp, but this might be something for the more paranoid programmer:
$s = '<description><b>When:</b> Tuesday, November 03, 2015 - 6:00 PM - 8:00 PM<br><b>Where:</b> Adult Literacy Classroom (Lower Level) dedicated in honor of Eleanor Moore<br><br>Clases de preparación para el GED  grupos de estudio para ayudar con sus habilidades y preparación para obtener su diploma de equivalencia de escuela. Las clases se llevaran a cabo en español, según la materia (escritura, literatura, estudios sociales, ciencias, matemáticas y la constitución) <br /><br />GED preparation classes  Study groups to help build your skills that will prepare you to get your high school equivalency diploma. Classes are taught in Spanish by subject area (writing, literature, social studies, science, math and the constitution)<br /></description>';
$a = array();
$p = '/(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s'
.'(January|February|March|April|May|June|July|August|September|October|November|December)\s'
.'[0-3][0-9],\s[1-2][0-9]{3}\s-\s' // Day and year
.'[0-2]?[0-9]:[0-5][0-9]\s[AP]M\s-\s' // Time
.'[0-2]?[0-9]:[0-5][0-9]\s[AP]M/'; // Time
preg_match( $p, $s, $a, PREG_OFFSET_CAPTURE );
echo $a[0][0];
Tested and working...
This will catch a date formatted as described, somewhere in the text.
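For the follow-up step the question mentions (converting to another time format), here is a minimal sketch. It assumes the description always follows the "When:</b> Weekday, Month DD, YYYY - H:MM AM - H:MM AM" layout of the sample; the function name parseEventTimes is illustrative. The weekday is matched but not parsed, so only the month, day, year and time are handed to DateTime::createFromFormat.

```php
<?php
// Hypothetical helper: returns start and end as DateTime objects, or null if the text does not match.
function parseEventTimes(string $description): ?array
{
    $pattern = '~When:</b>\s*[A-Za-z]+, ([A-Za-z]+ \d{1,2}, \d{4}) - '
             . '(\d{1,2}:\d{2} [AP]M) - (\d{1,2}:\d{2} [AP]M)~';
    if (!preg_match($pattern, $description, $m)) {
        return null;
    }
    // F = full month name, d = day of month, Y = year, g:i A = 12-hour time with AM/PM
    $format = 'F d, Y g:i A';
    $start = DateTime::createFromFormat($format, $m[1] . ' ' . $m[2]);
    $end = DateTime::createFromFormat($format, $m[1] . ' ' . $m[3]);
    if ($start === false || $end === false) {
        return null;
    }
    return ['start' => $start, 'end' => $end];
}

// Usage:
// $times = parseEventTimes($description);
// if ($times !== null) {
//     echo $times['start']->format('Y-m-d\TH:i'), ' to ', $times['end']->format('H:i');
// }
```

Working with DateTime objects avoids the separate explode and strtotime calls, at the cost of being stricter about the input format.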

Running array through dictionary with repeated values

So, I have two files: the first is a text file, and the second is an encryption of the first:
textfile:
cryptool (starting example for the cryptool version family 1.x)
cryptool is a comprehensive free educational program about
cryptography and cryptanalysis offering extensive online help and many
visualizations.
this is a text file, created in order to help you to make your first
steps with cryptool.
1) as a first step it is recommended you read the included online
help, this will provide a useful oversight of all available functions
within this application. the starting page of the online help can be
accessed via the menu "help -> starting page" at the top right of the
screen or using the search keyword "starting page" within the index of
the online help. press f1 to start the online help everywhere in
cryptool.
2) a possible next step would be to encrypt a file with the caesar
algorithm. this can be done via the menu "crypt/decrypt -> symmetric
(classic)".
3) there are several examples (tutorials) provided within the online
help which provide an easy way to gain an understanding of cryptology.
these examples can be found via the menu "help -> scenarios
(tutorials)".
4) you can also develop your knowledge by:
- navigating through the menus. you can press f1 at any selected menu item to get further information.
- reading the included readme file (see the menu "help -> readme").
- viewing the included colorful presentation (this presentation can be found on several ways: e.g. in the "help" menu of this application, or
via the "documentation" section found at the "starting" page of the
online help).
- viewing the webpage www.cryptool.org.
july 2010 the cryptool team
encrypted file:
ncjaezzw (delcetyr pilxawp qzc esp ncjaezzw gpcdtzy qlxtwj 1.i)
ncjaezzw td l nzxacpspydtgp qcpp pofnletzylw aczrclx lmzfe
ncjaezrclasj lyo ncjaelylwjdtd zqqpctyr piepydtgp zywtyp spwa lyo xlyj
gtdflwtkletzyd.
estd td l epie qtwp, ncplepo ty zcopc ez spwa jzf ez xlvp jzfc qtcde
depad htes ncjaezzw.
1) ld l qtcde depa te td cpnzxxpyopo jzf cplo esp tynwfopo zywtyp
spwa, estd htww aczgtop l fdpqfw zgpcdtrse zq lww lgltwlmwp qfynetzyd
htesty estd laawtnletzy. esp delcetyr alrp zq esp zywtyp spwa nly mp
lnnpddpo gtl esp xpyf "spwa -> delcetyr alrp" le esp eza ctrse zq esp
dncppy zc fdtyr esp dplcns vpjhzco "delcetyr alrp" htesty esp tyopi zq
esp zywtyp spwa. acpdd q1 ez delce esp zywtyp spwa pgpcjhspcp ty
ncjaezzw.
2) l azddtmwp ypie depa hzfwo mp ez pyncjae l qtwp htes esp nlpdlc
lwrzctesx. estd nly mp ozyp gtl esp xpyf "ncjae/opncjae -> djxxpectn
(nwlddtn)".
3) espcp lcp dpgpclw pilxawpd (efezctlwd) aczgtopo htesty esp zywtyp
spwa hstns aczgtop ly pldj hlj ez rlty ly fyopcdelyotyr zq ncjaezwzrj.
espdp pilxawpd nly mp qzfyo gtl esp xpyf "spwa -> dnpylctzd
(efezctlwd)".
4) jzf nly lwdz opgpwza jzfc vyzhwporp mj:
- ylgtrletyr esczfrs esp xpyfd. jzf nly acpdd q1 le lyj dpwpnepo xpyf tepx ez rpe qfcespc tyqzcxletzy.
- cplotyr esp tynwfopo cploxp qtwp (dpp esp xpyf "spwa -> cploxp").
- gtphtyr esp tynwfopo nzwzcqfw acpdpyeletzy (estd acpdpyeletzy nly mp qzfyo zy dpgpclw hljd: p.r. ty esp "spwa" xpyf zq estd laawtnletzy, zc
gtl esp "oznfxpyeletzy" dpnetzy qzfyo le esp "delcetyr" alrp zq esp
zywtyp spwa).
- gtphtyr esp hpmalrp hhh.ncjaezzw.zcr.
ufwj 2010 esp ncjaezzw eplx
I'm counting letter occurrences in both files and creating a dictionary, so I can go back into the encrypted file and change most of the letters to the right ones; some won't be changed, but I'll do it manually later.
The problem is that, I think, because some letters have the same number of occurrences, the same letter gets changed more than once.
Here's my code so far. The problem is surely in the foreach loops, but I can't manage to fix it. Maybe I can use flags, but I have no idea how to do that in a foreach loop.
//gets string from both text files
$reference = file_get_contents('reference_file.txt', true);
$encrypted = file_get_contents('encrypted_file.txt', true);
//Uses regex to take away all the characters wich are not letters
$azreference = preg_replace("/[^a-z]+/", "", $reference);
$azencrypted = preg_replace("/[^a-z]+/", "", $encrypted);
//Counts number of letter ocurrences and makes a string: "Char => Ocurrences"
$refarray1 = array_count_values(str_split($azreference, '1'));
$refarray2 = array_count_values(str_split($azencrypted, '1'));
foreach ($refarray1 as $key => $val) {
foreach ($refarray2 as $key2 => $val2) {
if ($val == $val2){
$encrypted = str_replace($key2, $key, $encrypted); // (replaces $key2 for $key)
}
}
}
print_r($encrypted);
The output string is this, which is kinda wrong xD:
jjdebdda (wbdjbbdj ebdbeae zdj bwe jjdebdda jejwbdd zdbbad 1.b)
jjdebdda bw d jdbejewedwbje zjee edzjdbbddda ejdjjdb dbdzb
jjdebdjjdewd ddd jjdebdddadwbw dzzejbdj ebbedwbje ddabde weae ddd bddd
jbwzdabzdbbddw. bwbw bw d bebb zbae, jjedbed bd djdej bd weae ddz bd
bdje ddzj zbjwb wbeew wbbw jjdebdda. 1) dw d zbjwb wbee bb bw
jejdbbedded ddz jedd bwe bdjazded ddabde weae, bwbw wbaa ejdjbde d
zwezza djejwbjwb dz daa djdbadbae zzdjbbddw wbbwbd bwbw deeabjdbbdd.
bwe wbdjbbdj edje dz bwe ddabde weae jdd be djjewwed jbd bwe bedz
"weae -> wbdjbbdj edje" db bwe bde jbjwb dz bwe wjjeed dj zwbdj bwe
wedjjw jedwdjd "wbdjbbdj edje" wbbwbd bwe bddeb dz bwe ddabde weae.
ejeww z1 bd wbdjb bwe ddabde weae ejejdwweje bd jjdebdda. 2) d
edwwbbae debb wbee wdzad be bd edjjdeb d zbae wbbw bwe jdewdj
dajdjbbwb. bwbw jdd be ddde jbd bwe bedz "jjdeb/dejjdeb -> wdbbebjbj
(jadwwbj)". 3) bweje dje wejejda ebdbeaew (bzbdjbdaw) ejdjbded wbbwbd
bwe ddabde weae wwbjw ejdjbde dd edwd wdd bd jdbd dd zddejwbdddbdj dz
jjdebdadjd. bwewe ebdbeaew jdd be zdzdd jbd bwe bedz "weae ->
wjeddjbdw (bzbdjbdaw)". 4) ddz jdd dawd dejeade ddzj jddwaedje bd: -
ddjbjdbbdj bwjdzjw bwe bedzw. ddz jdd ejeww z1 db ddd weaejbed bedz
bbeb bd jeb zzjbwej bdzdjbdbbdd. - jeddbdj bwe bdjazded jeddbe zbae
(wee bwe bedz "weae -> jeddbe"). - jbewbdj bwe bdjazded jdadjzza
ejewedbdbbdd (bwbw ejewedbdbbdd jdd be zdzdd dd wejejda wddw: e.j. bd
bwe "weae" bedz dz bwbw deeabjdbbdd, dj jbd bwe "ddjzbedbdbbdd"
wejbbdd zdzdd db bwe "wbdjbbdj" edje dz bwe ddabde weae). - jbewbdj
bwe webedje www.jjdebdda.djj. zzad 2010 bwe jjdebdda bedb

"some won't be changed, but I'll do it manually later."
So, if you are ready to fix something manually later, you can avoid the re-replacement problem by applying the whole vocabulary in one pass with the PHP function strtr (http://php.net/manual/en/function.strtr.php). Change your code just a bit, like the following:
// Get the contents of both text files
$reference = file_get_contents('reference_file.txt', true);
$encrypted = file_get_contents('encrypted_file.txt', true);
// Use a regex to strip every character that is not a lowercase letter
$azreference = preg_replace("/[^a-z]+/", "", $reference);
$azencrypted = preg_replace("/[^a-z]+/", "", $encrypted);
// Count letter occurrences, giving arrays of the form char => occurrences
$refarray1 = array_count_values(str_split($azreference, 1));
$refarray2 = array_count_values(str_split($azencrypted, 1));
// Build the whole substitution table first: encrypted letter => reference letter
$replacement = array();
foreach ($refarray1 as $key => $val) {
    foreach ($refarray2 as $key2 => $val2) {
        if ($val == $val2) {
            $replacement[$key2] = $key;
        }
    }
}
// strtr applies all replacements in a single pass, so no letter is replaced twice
$encrypted = strtr($encrypted, $replacement);
print_r($encrypted);
The output will be:
cryptnnl (stnrting exnmple fnr the cryptnnl versinn fnmily 1.x)
cryptnnl is n cnmprehensive free educntinnnl prngrnm nbnut cryptngrnphy nnd cryptnnnlysis nffering extensive nnline help nnd mnny visunlijntinns.
this is n text file, crented in nrder tn help ynu tn mnke ynur first steps with cryptnnl.
1) ns n first step it is recnmmended ynu rend the included nnline help, this will prnvide n useful nversight nf nll nvnilnble functinns within this npplicntinn. the stnrting pnge nf the nnline help cnn be nccessed vin the menu "help -> stnrting pnge" nt the tnp right nf the screen nr using the senrch keywnrd "stnrting pnge" within the index nf the nnline help. press f1 tn stnrt the nnline help everywhere in cryptnnl.
2) n pnssible next step wnuld be tn encrypt n file with the cnesnr nlgnrithm. this cnn be dnne vin the menu "crypt/decrypt -> symmetric (clnssic)".
3) there nre severnl exnmples (tutnrinls) prnvided within the nnline help which prnvide nn ensy wny tn gnin nn understnnding nf cryptnlngy. these exnmples cnn be fnund vin the menu "help -> scennrins (tutnrinls)".
4) ynu cnn nlsn develnp ynur knnwledge by: - nnvignting thrnugh the menus. ynu cnn press f1 nt nny selected menu item tn get further infnrmntinn. - rending the included rendme file (see the menu "help -> rendme"). - viewing the included cnlnrful presentntinn (this presentntinn cnn be fnund nn severnl wnys: e.g. in the "help" menu nf this npplicntinn, nr vin the "dncumentntinn" sectinn fnund nt the "stnrting" pnge nf the nnline help). - viewing the webpnge www.cryptnnl.nrg.
july 2010 the cryptnnl tenmi
which is a bit better than "jjdebdda" :) , but, as you expected, still has some collisions.
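The remaining collisions come from matching on equal counts: when several letters share a count, the last match wins. A possible variation, sketched under the assumption that both texts contain the same set of letters, is to pair letters by frequency rank instead, so every encrypted letter gets exactly one replacement. The function names letterRanking and buildSubstitution are illustrative. Letters with tied counts can still be paired wrongly, so some manual fixing may remain.

```php
<?php
// Hypothetical helper: letters of $text ordered from most to least frequent.
function letterRanking(string $text): array
{
    $letters = preg_replace('/[^a-z]+/', '', strtolower($text));
    if ($letters === '') {
        return [];
    }
    $counts = array_count_values(str_split($letters));
    arsort($counts); // highest count first
    return array_keys($counts);
}

// Hypothetical helper: maps the n-th most frequent encrypted letter
// to the n-th most frequent reference letter.
function buildSubstitution(string $reference, string $encrypted): array
{
    $plain = letterRanking($reference);
    $cipher = letterRanking($encrypted);
    $n = min(count($plain), count($cipher));
    if ($n === 0) {
        return [];
    }
    return array_combine(array_slice($cipher, 0, $n), array_slice($plain, 0, $n));
}

// Usage:
// $decoded = strtr($encrypted, buildSubstitution($reference, $encrypted));
```

Since the sample text itself mentions the Caesar algorithm, another option would be to derive a single shift from the most frequent letter pair and apply it to the whole alphabet.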

Regex does not work in large string with html content [PHP]

I am trying to extract values such as R$XX,XX (where each X is a digit) using a regular expression, but I cannot get it to work.
Below is my code:
$str = 'Indicada para 21 velocidades, corente indexadaCAPACETE MTB MANTUA MUSIC R$140,00PEDIVELA SHIMANO DEORE R$380,00PEDIVELA SHIMANO TX-71 R$99,00CORRENTE SHIMANO HG 40 R$55,00ROLO PARA TREINAMENTO TRANZ-X R$545,00CAPACETE MTB HIGH ONE (PROMOÇÃO) R$85,00BOMBA DE PÉ HIGH ONE COM MANÔMETRO (NYLON) R$89,90CAPA SELIM GEL (PRÓ-SPIN) R$45,00SUPORTE DE PAREDE VERTICAL R$20,00SUPORTE DE PAREDE HORIZONTAL R$35,00SUPORTE DE PAREDE VERTICAL PRETO R$28,00ESPUMA PARA GUIDÃO R$11,00BOMBA DE PÉ BETO NYLON R$55,00
Bomba pé nylon, acompanha adaptadores: valvula,bola e infláveisALAVANCA SHIMANO XT DUAL CONTROL EFM 761 R$500,00
Alavanca (par) 27 velocidades com manetes para freios mecânicos, com tecnologia "Dual Control" que chega muito próximo do sistema "STI" das bikes de corrida.
SAPATILHA SHIMANO MTB M 064 R$285,00
Pele sintética e malha flexível, resistentes ao esticar.
Entressola de poliamida reforçada com fibra de de vidro.
Pamilha estruturalmente flexível de acordo com uma ampla variedade de formatos de pé.
Volume + forma para melhor acomodação dos dedos dos pés.
Proteção em borracha oferece excelante tração e conforto para o caminhar.
Indicada para o pedal PD-M530, PD-M520.
Acompanha a base interna da sapatilha.ALAVANCA SHIMANO EF 51 R$130,00
Alavanca shimano 21 vel, ez-fire c/ maneteCAMPAINHA "I LOVE MY BIKE" R$14,00
Em alumínio, nas cores: polido, preto, azul e vermelho.
Fácil fixação no guidão.CAPACETE INFANTIL R$57,00CESTA ALUMÍNIO E NYLON
';
$regex = "/R\$[0-9]{1,},[0-9]{1,}/";
$result = preg_match_all($regex, $content, $rs);
var_dump($rs);
What's going on?

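Try this code:
$content = 'R$13,57 more text R$123,456';
$regex = '/R\$[0-9]+,[0-9]+/';
$result = preg_match_all($regex, $content, $rs);
var_dump($rs);
Write the pattern in single quotes. In a double-quoted PHP string, "\$" is just an escaped dollar sign, so the regex engine receives a bare $ (the end-of-subject anchor) instead of a literal "R$", and nothing can match. Note also that your code stores the text in $str but passes $content to preg_match_all. No capturing group is required: the full matches are returned in $rs[0].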
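If the goal is to work with the prices as numbers, a small sketch along these lines may help. It assumes every price has the form R$ followed by digits, a comma and exactly two decimals, with no thousands separator; the function name extractPricesInCents is illustrative. The pattern is single-quoted so that \$ reaches the regex engine as an escaped literal dollar sign.

```php
<?php
// Hypothetical helper: returns every "R$<reais>,<centavos>" amount as an integer number of centavos.
function extractPricesInCents(string $text): array
{
    preg_match_all('/R\$(\d+),(\d{2})/', $text, $matches, PREG_SET_ORDER);
    $prices = [];
    foreach ($matches as $m) {
        // $m[1] holds the whole reais, $m[2] the two-digit centavos
        $prices[] = (int) $m[1] * 100 + (int) $m[2];
    }
    return $prices;
}

// Usage:
// print_r(extractPricesInCents($str));
```

Integers are used instead of floats so that amounts such as 89,90 are stored exactly.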

Can't do anything with text file contents (file_get_contents)

FIXED!
File encoding is UTF-16LE, changed to UTF-8 in PhpStorm and it behaves.
===========================================================
I'm reading a text file in PHP and want to read and manipulate the contents, but as soon as I touch the read contents of the file in any way it 'breaks'.
If I read the file and then echo it, the text is displayed, but any other operation will not work.
$contents = file_get_contents($file);
echo $contents; // works
$contents .= 'a longer test' . $contents;
echo $contents;
My ultimate goal is to run some regexes on the contents before dumping it into a database, but I need to be able to work with it first.
If it makes any difference I am using Laravel. I tried File::get($file) but have the same outcome.
EDIT to show output - Unicode issue?
//// first echo
POUR L ’É T U DE DE L ’H IST O IR E ET DE LA LANGUE DU PAYS, LA CONSERVATION DES A N TIQ U ITÉS DE L ’IL E , ET LA PUBLICATION DE DOCUMENTS HISTORIQUES, ETC., ETC. FONDÉE LE 28 JANVIER, 1873. DIXIÈME BULLETIN ANNUEL. : C. LE F E U VRE, IM PR IM E U R -É D IT EU R D E LA SOCIÉTÉ, BERESFORD LIBRARY , ST. -H ÉLIE R . 1885. = Page 1 =
// Second echo
POUR L ’É T U DE DE L ’H IST O IR E ET DE LA LANGUE DU PAYS, LA CONSERVATION DES A N TIQ U ITÉS DE L ’IL E , ET LA PUBLICATION DE DOCUMENTS HISTORIQUES, ETC., ETC. FONDÉE LE 28 JANVIER, 1873. DIXIÈME BULLETIN ANNUEL. : C. LE F E U VRE, IM PR IM E U R -É D IT EU R D E LA SOCIÉTÉ, BERESFORD LIBRARY , ST. -H ÉLIE R . 1885. = Page 1 =⁡潬杮牥琠獥ｴ෾਀ഀ਀匀伀䌀䤀䔀吀䔀  䨀䔀刀匀䤀䄀䤀匀䔀ഀ਀倀伀唀刀  䰀 ᤀ줠 吀 唀 䐀䔀  䐀䔀  䰀 ᤀ䠠 䤀匀吀 伀 䤀刀 䔀  䔀吀  䐀䔀  䰀䄀  䰀䄀一䜀唀䔀  䐀唀  倀䄀夀匀Ⰰഀ਀䰀䄀  䌀伀一匀䔀刀嘀䄀吀䤀伀一  䐀䔀匀  䄀 一 吀䤀儀 唀 䤀吀준匀  䐀䔀  䰀 ᤀ䤠䰀 䔀 Ⰰ  䔀吀  䰀䄀  倀唀䈀䰀䤀䌀䄀吀䤀伀一 ഀ਀䐀䔀  䐀伀䌀唀䴀䔀一吀匀  䠀䤀匀吀伀刀䤀儀唀䔀匀Ⰰ  䔀吀䌀⸀Ⰰ  䔀吀䌀⸀ഀ਀䘀伀一䐀준䔀  䰀䔀  ㈀㠀  䨀䄀一嘀䤀䔀刀Ⰰ  ㄀㠀㜀㌀⸀ഀ਀䐀䤀堀䤀저䴀䔀  䈀唀䰀䰀䔀吀䤀一  䄀一一唀䔀䰀⸀ഀ਀㨀ഀ਀䌀⸀  䰀䔀   䘀 䔀 唀 嘀刀䔀Ⰰ  䤀䴀 倀刀 䤀䴀 䔀 唀 刀 ⴀ준 䐀 䤀吀 䔀唀 刀   䐀 䔀  䰀䄀  匀伀䌀䤀준吀준Ⰰഀ਀䈀䔀刀䔀匀䘀伀刀䐀  䰀䤀䈀刀䄀刀夀 Ⰰ  匀吀⸀ ⴀ䠀 준䰀䤀䔀 刀 ⸀ഀ਀㄀㠀㠀㔀⸀਀ഀ 㴀 倀愀最攀 ㄀ 㴀
If I put the first string into a HEREDOC all works fine, so might it be something with the txt file? It's text extracted by OCR from an old PDF.
Full code
public function import()
{
// get all the files
$files = File::files('../import');
foreach ($files as $file) {
// load text file contents
$contents = file_get_contents($file);
echo $contents; // as expected
$contents .= 'a longer test' . $contents;
echo $contents; // weird stuff
// test txt file contents inline
$contents2 = <<<EOD
SOCIETE JERSIAISE
POUR L ’É T U DE DE L ’H IST O IR E ET DE LA LANGUE DU PAYS,
LA CONSERVATION DES A N TIQ U ITÉS DE L ’IL E , ET LA PUBLICATION
DE DOCUMENTS HISTORIQUES, ETC., ETC.
FONDÉE LE 28 JANVIER, 1873.
DIXIÈME BULLETIN ANNUEL.
:
C. LE F E U VRE, IM PR IM E U R -É D IT EU R D E LA SOCIÉTÉ,
BERESFORD LIBRARY , ST. -H ÉLIE R .
1885.
= Page 1 =
EOD;
echo $contents2; // works
$contents2 .= 'a longer test' . $contents2;
echo $contents2; // prints as expected
}

FIXED!
File encoding is UTF-16LE, changed to UTF-8 in PhpStorm and it behaves.
Or in code:
foreach ($files as $file) {
// load text file contents
$contents = file_get_contents($file);
// fix encoding
$contents = mb_convert_encoding($contents, 'UTF-8', 'UTF-16');
echo $contents;
.....
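If the import folder might contain a mix of encodings, one option is to look at the byte order mark before converting. This is a minimal sketch that assumes the files either start with a UTF-8 or UTF-16 BOM or are already UTF-8; the function name toUtf8 is illustrative, and the mbstring extension must be available. It also shows why the concatenation test failed: appending single-byte ASCII text to UTF-16LE data leaves everything after it misaligned by one byte.

```php
<?php
// Hypothetical helper: returns the file contents as UTF-8 without a BOM.
function toUtf8(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3); // UTF-8 BOM: only strip it
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    return $bytes; // no BOM: assumed to be UTF-8 already
}

// Usage:
// $contents = toUtf8(file_get_contents($file));
// $contents .= 'a longer test'; // now safe to manipulate
```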

$data_to_write = 'test';
$file_handle = fopen($file, 'a');
fwrite($file_handle, $data_to_write);
fclose($file_handle);
