Separate/Group plain text with Regex and PHP - php

I can separate data from the plain text below with Regex.
Plain text:
190.A 42-year-old male patient has been delivered to a hospital in a grave condition with dyspnea, cough with expectoration of purulent
sputum, fever up to 39,5 oC.The ?rst symptoms appeared 3 weeks ago.
Two weeks ago, a local therapist diagnosed him wi- th acute
right-sided pneumonia. Over the last 3 days, the patient’s condition
deteriorated: there was a progress of dyspnea, weakness, lack of
appetite. Chest radiography con?rms a rounded shadow in the lower lobe
of the right lung with a horizontal?uid level, the right si- nus is
not clearly visualized. What is the most likely diagnosis? A.Abscess
of the right lung B.Acute pleuropneumonia C.Right pulmonary empyema
D.Atelectasis of the right lung E.Pleural effusion 191.An 11-year-old
boy complains of general weakness, fever up to 38,2 oC, pain and
swelli- ng of the knee joints, feeling of irregular heartbeat. 3 weeks
ago, the child had quinsy. Knee joints are swollen, the overlying skin
and skin of the knee region is reddened, local temperature is
increased, movements are li- mited. Heart sounds are muf?ed,
extrasystole is present, auscultation reveals apical systolic murmur
that is not conducted to the left ingui- nal region. ESR is 38 mm/h.
CRP is 2+, anti- streptolysin O titre - 40 0. What is the most likely
diagnosis? A.Acute rheumatic fever B.Vegetative dysfunction
C.Non-rheumatic carditis D.Juvenile rheumatoid arthritis E.Reactive
arthritis 192.A 28-year-old male patient complains of sour
regurgitation, cough and heartburn that occurs every day after having
meals, when bending forward or lying down. These problems have been
observed for 4 years. Objective status and laboratory values are
normal. FEGDS revealed endoesophagitis. What is the leading factor in
the development of this disease? A.Failure of the lower esophageal
sphincter B.Hypersecretion of hydrochloric acid C.Duodeno-gastric
re?ux D.Hypergastrinemia E.Helicobacter pylori infection 193.On
admission a 35-year-old female reports acute abdominal pain, fever up
to 38,8 oC, mucopurulent discharges. The pati- ent is nulliparous, has
a history of 2 arti?cial abortions. The patient is unmarried, has
sexual Krok 2 Medicine 20 14 24 contacts. Gynecological examination
reveals no uterus changes. Appendages are enlarged, bilaterally
painful. There is profuse purulent vaginal discharge. What study is
required to con?rm the diagnosis? A.Bacteriologic and bacteriascopic
studies B.Hysteroscopy C.Curettage of uterine cavity D.Vaginoscopy
E.Laparoscopy
What did I do for this?
For the question section:
/(\d+)\.\s*([A-Z].*?)\s+([A-Z]\..*?)(?=\d+\.\s*[A-Z]|$)/s
For the options of question section:
/\s+(?=[A-Z0-9][,.:])/
PHP:
$soruAlimPattern = [
'q&a' => '/(\d+)\.\s*([A-Z].*?)\s+([A-Z]\..*?)(?=\d+\.\s*[A-Z]|$)/s',
'answers' => '/\s+(?=[A-Z0-9][,.:])/'
];
$res = [];
if (preg_match_all($soruAlimPattern['q&a'], $temizSoruCikisi, $out, PREG_SET_ORDER) > 0) {
foreach ($out AS $k => $v) {
// remove the full match ($0)
$res[$k] = array_slice($v, 1, 3);
// split the answers
$res[$k][2] = preg_split($soruAlimPattern['answers'], $res[$k][2]);
}
}
$sorularJsonKodlaniyor = json_encode($res);
[...]
I can separate each question from its options, but is it possible to do this with a single regex instead of 2 different ones?
I don't know how good the PHP code is, but it works.
My problems:
1. Sometimes letters in the question are not recognized, and these
unrecognized characters show up as a question mark. For
example: `fever up to 39,5 oC.The ?rst symptoms` or `..39,5 oC.The ?rst symptoms..`
2. Numerical values inside a question make the regex split the question in two. For example: `... anti- streptolysin O titre - 40 0. What is the most likely diagnosis? ` Here the split happens because of the number "0" followed by a period.
Expected JSON Format:
[
{
"question": "190.A 42-year-old male patient has been delivered to a hospital in a grave condition with dyspnea, cough with expectoration of purulent sputum, fever up to 39,5 oC.The ?rst symptoms appeared 3 weeks ago. Two weeks ago, a local therapist diagnosed him wi- th acute right-sided pneumonia. Over the last 3 days, the patient’s condition deteriorated: there was a progress of dyspnea, weakness, lack of appetite. Chest radiography con?rms a rounded shadow in the lower lobe of the right lung with a horizontal?uid level, the right si- nus is not clearly visualized. What is the most likely diagnosis? ",
"answers": [
"A.Abscess of the right lung ",
"B.Acute pleuropneumonia ",
"C.Right pulmonary empyema ",
"D.Atelectasis of the right lung ",
"E.Pleural effusion 1"
]
},
{
"question": "191.An 11-year-old boy complains of general weakness, fever up to 38,2 oC, pain and swelli- ng of the knee joints, feeling of irregular heartbeat. 3 weeks ago, the child had quinsy. Knee joints are swollen, the overlying skin and skin of the knee region is reddened, local temperature is increased, movements are li- mited. Heart sounds are muf?ed, extrasystole is present, auscultation reveals apical systolic murmur that is not conducted to the left ingui- nal region. ESR is 38 mm/h. CRP is 2+, anti- streptolysin O titre - 40 0. What is the most likely diagnosis? ",
"answers": [
"A.Acute rheumatic fever ",
"B.Vegetative dysfunction ",
"C.Non-rheumatic carditis ",
"D.Juvenile rheumatoid arthritis ",
"E.Reactive arthritis 1"
]
},
{
"question": "192.A 28-year-old male patient complains of sour regurgitation, cough and heartburn that occurs every day after having meals, when bending forward or lying down. These problems have been observed for 4 years. Objective status and laboratory values are normal. FEGDS revealed endoesophagitis. What is the leading factor in the development of this disease? ",
"answers": [
"A.Failure of the lower esophageal sphincter ",
"B.Hypersecretion of hydrochloric acid ",
"C.Duodeno-gastric re?ux ",
"D.Hypergastrinemia ",
"E.Helicobacter pylori infection 1"
]
},
{
"question": "193.On admission a 35-year-old female reports acute abdominal pain, fever up to 38,8 oC, mucopurulent discharges. The pati- ent is nulliparous, has a history of 2 arti?cial abortions. The patient is unmarried, has sexual Krok 2 Medicine 20 14 24 contacts. Gynecological examination reveals no uterus changes. Appendages are enlarged, bilaterally painful. There is profuse purulent vaginal discharge. What study is required to con?rm the diagnosis? ",
"answers": [
"A.Bacteriologic and bacteriascopic studies ",
"B.Hysteroscopy ",
"C.Curettage of uterine cavity ",
"D.Vaginoscopy ",
"E.Laparoscopy 1"
]
}
]
How can I overcome these problems?

What you might do is use preg_split to get all the strings with the right characters at the start like 190.A or A.
\b(?=(?:\d+|[A-Z])\.[A-Z])
\b Word boundary
(?= Positive lookahead, assert what is on the right is
(?:\d+|[A-Z]) Match either 1+ digits or a single char A-Z
\.[A-Z] Match . and a single char A-Z
) Close positive lookahead
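As a quick check of why this avoids the split at `40 0. What`, here is a minimal sketch on a shortened sample. The `$sample` text is abbreviated from the question and the variable names are only illustrative. The question's lookahead allows whitespace after the dot, while this one requires a capital letter directly after it:

```php
<?php
// Shortened sample based on the question's text (abbreviated for illustration).
$sample = "191.An 11-year-old boy. Titre - 40 0. What is the most likely diagnosis? "
        . "A.Acute rheumatic fever B.Reactive arthritis "
        . "192.A 28-year-old male patient. What is the leading factor? "
        . "A.Hypergastrinemia B.Helicobacter pylori infection";

// The original lookahead accepts optional whitespace after the dot,
// so "0. What" is mistaken for the start of a new question.
$looseHit = preg_match('/\d+\.\s*[A-Z]/', '40 0. What');

// Requiring a capital letter right after the dot rejects "0. What".
$strictHit = preg_match('/\b\d+\.[A-Z]/', '40 0. What');

// Split before every "191.An"-style or "A.Text"-style marker.
$parts = preg_split('/\b(?=(?:\d+|[A-Z])\.[A-Z])/', $sample, -1, PREG_SPLIT_NO_EMPTY);

var_dump($looseHit, $strictHit);
print_r($parts);
```

On this sample the split yields six strings: two questions, each followed by its two options.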
If you have all those entries in an array, you could for example use array_reduce to get the array structure that you need for the JSON output.
$pattern = "/\b(?=(?:\d+|[A-Z])\.[A-Z])/";
$result = preg_split($pattern, $data, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$result = array_reduce($result, function($carry, $item){
// If the string starts with a digit
if (ctype_digit(substr($item, 0, 1))) {
// Create the questions key
$carry[] = ["question" => $item];
return $carry;
}
// Get reference to the last added array in $carry
end($carry);
$last = &$carry[key($carry)];
// Append the answer; PHP creates the "answers" key on first use
$last["answers"][] = $item;
return $carry;
}, []);
print_r(json_encode($result));
Output
[
{
"question": "190.A 42-year-old male patient has been delivered to a hospital in a grave condition with dyspnea, cough with expectoration of purulent sputum, fever up to 39,5 oC.The ?rst symptoms appeared 3 weeks ago. Two weeks ago, a local therapist diagnosed him wi- th acute right-sided pneumonia. Over the last 3 days, the patient\u2019s condition deteriorated: there was a progress of dyspnea, weakness, lack of appetite. Chest radiography con?rms a rounded shadow in the lower lobe of the right lung with a horizontal?uid level, the right si- nus is not clearly visualized. What is the most likely diagnosis? ",
"answers": [
"A.Abscess of the right lung ",
"B.Acute pleuropneumonia ",
"C.Right pulmonary empyema ",
"D.Atelectasis of the right lung ",
"E.Pleural effusion "
]
},
{
"question": "191.An 11-year-old boy complains of general weakness, fever up to 38,2 oC, pain and swelli- ng of the knee joints, feeling of irregular heartbeat. 3 weeks ago, the child had quinsy. Knee joints are swollen, the overlying skin and skin of the knee region is reddened, local temperature is increased, movements are li- mited. Heart sounds are muf?ed, extrasystole is present, auscultation reveals apical systolic murmur that is not conducted to the left ingui- nal region. ESR is 38 mm\/h. CRP is 2+, anti- streptolysin O titre - 40 0. What is the most likely diagnosis? ",
"answers": [
"A.Acute rheumatic fever ",
"B.Vegetative dysfunction ",
"C.Non-rheumatic carditis ",
"D.Juvenile rheumatoid arthritis ",
"E.Reactive arthritis "
]
},
{
"question": "192.A 28-year-old male patient complains of sour regurgitation, cough and heartburn that occurs every day after having meals, when bending forward or lying down. These problems have been observed for 4 years. Objective status and laboratory values are normal. FEGDS revealed endoesophagitis. What is the leading factor in the development of this disease? ",
"answers": [
"A.Failure of the lower esophageal sphincter ",
"B.Hypersecretion of hydrochloric acid ",
"C.Duodeno-gastric re?ux ",
"D.Hypergastrinemia ",
"E.Helicobacter pylori infection "
]
},
{
"question": "193.On admission a 35-year-old female reports acute abdominal pain, fever up to 38,8 oC, mucopurulent discharges. The pati- ent is nulliparous, has a history of 2 arti?cial abortions. The patient is unmarried, has sexual Krok 2 Medicine 20 14 24 contacts. Gynecological examination reveals no uterus changes. Appendages are enlarged, bilaterally painful. There is profuse purulent vaginal discharge. What study is required to con?rm the diagnosis? ",
"answers": [
"A.Bacteriologic and bacteriascopic studies ",
"B.Hysteroscopy ",
"C.Curettage of uterine cavity ",
"D.Vaginoscopy ",
"E.Laparoscopy"
]
}
]

Related

Extracting all string values matching a pattern for CAS Numbers in PHP

Ok, so I'm working on extracting CAS Numbers from uploaded SDS files (working on docx before I move to pdf). I have successfully converted the docx to a string in the page, but I need to extract several strings if they exist. Here is the code I'm using, and I don't think I'm using preg_match_all correctly at all.
$docObj = new DocxConversion($_FILES["sdsFile"]["tmp_name"]);
$docText = $docObj->convertToText();
preg_match_all("/[0-9]{2,7}-[0-9]{2}-[0-9]{1}/", $docText, $matches);
print_r($matches);
This gives me Array ( [0] => Array ( ) ). Not very helpful when I'm looking for:
64742‐47‐8
64742‐65‐0
9003‐29‐6
The output of $docText is:
IDENTIFICATION PRODUCT IDENTIFIER USED ON LABEL: Finished Product Item Number Customer Item Number LABEL DESCRIPTION ACTUAL BRAND SM5802EE ECHO POWERBLEND X EXTENDED LIFE OIL ECHO SMGR33EC 6450005 ECHO POWER BLEND X ECHO SMGR01EC 6450025 ECHO POWER BLEND X ECHO SMGR07EC 6450002 ECHO POWER BLEND X ECHO SM5101EC X6972270101/99988800086 ECHO POWER BLEND X ECHO SM5905EC 6450250 ECHO BAR & CHAIN OIL ECHO SM5818ER 6450114 ECHO POWER BLEND X HIGH PERFORMANCE 2 STROKE ENGINE ECHO SM5818EG 6450103 ECHO POWER BLEND X ECHO SM5238EC 99988800088 ECHO POWER BLEND X ECHO SM5218EC X6972270201/99988800085 ECHO POWER BLEND X ECHO SMGR25EC X6974100202 ECHO POWER BLEND X ECHO SMGR02EC 6450001 ECHO POWER BLEND X ECHO SMGR29EC 6450000 ECHO POWERBLEND X ECHO SM5818EE 6450102 ECHO POWER BLEND X LOW SMOKE ECHO SM5818EC 6450100/6450099 ECHO POWER BLEND X ECHO SM5818EM 6450060 ECHO POWER BLEND X ECHO SMGR34EE ECHO POWERBLEND X ECHO SM5906EC 6450050 ECHO POWER BLEND X ECHO SM5906EM 6450062 ECHO POWER BLEND X ECHO SM5943EE 6450116 ECHO POWER BLEND X ECHO SMGR33EK 6450118 ECHO POWERBLEND X ECHO SMGR34ER 6450109 ECHO POWER BLEND X ECHO SM5926EC 6450006 ECHO POWERBLEND X XTENDED LIFE OIL ECHO SMGR34EE ECHO POWER BLEND X ECHO SMGR34EC 6450108 ECHO POWER BLEND X ECHO SMGR12EC 99988800089 ECHO POWER BLEND X ECHO SMGR34EK 6450119 ECHO POWERBLEND X ECHO SM5834EM 6450061 ECHO POWER BLEND X ECHO Finished Product Item Number Customer Item Number LABEL DESCRIPTION ACTUAL BRAND SMGR34EG 6450115 ECHO POWER BLEND X ECHO SM5955EC 6452750 ECHO POWER BLEND X ECHO RECOMMENDED USE OF THE CHEMICAL AND RESTRICTIONS ON USE; PETROLEUM LUBRICATING OIL NO OTHER USES RECOMMENDED NAME, ADDRESS, AND TELEPHONE NUMBER OF THE CHEMICAL MANUFACTURER, IMPORTER, OR OTHER RESPONSIBLE PARTY: 1.3.1. 
Spectrum Lubricants Corporation 500 Industrial Park Drive Selmer, TN 38375‐3276 United States of America Product Information MSDS Requests: (800) 264‐6457 or +17316454972 Technical Information: (800) 264‐6457 or +17316454972 General Information: vswedley#spectrumcorporation.comEMERGENCY PHONE NUMBER: 1.4.1. Emergency Response North America: CHEMTREC (800) 424‐9300 after 5:00pm CST Or +17035273887 Health Emergency USA: (800) 264‐6457 or +17316454972 HAZARD(S) IDENTIFICATION CLASSIFICATION OF THE CHEMICAL IN ACCORDANCE WITH PARAGRAPH (d) of §1910.1200: Acute Inhalation Category 4 Eye Irritant Category 2 Skin Corrosion/Irritation Category 2 Flammable Liquid Category 4 Signal Word: Warning Symbol: Hazard Statements: Harmful if Inhaled Causes serious eye irritation Causes skin irritation Combustible Liquid Precautionary Statements: Prevention: Avoid breathing mist or spray. Use only outdoors or in a well‐ventilated area. Wear eye/face protection Wear protective gloves Keep away from heat, hot surfaces, sparks, open flames and other ignition sources. No smoking. Response: If inhaled: Remove person to fresh air and keep comfortable for breathing. If in eyes: Rinse cautiously with water for several minutes. Remove contact lenses, if present and easy to do. Continue rinsing. If eye irritation persists get medical advice/attention. If on skin: wash with plenty of water, if irritation or rash occurs get medical advice/attention. Take off contaminated clothing and wash it before reuse. Call a poison center/doctor if you feel unwell. In case of fire: Use water fog, foam, dry chemical or carbon dioxide (CO2) to extinguish flames. Storage: Store in well‐ventilated place. Disposal: Dispose of contents/container in accordance with local/regional/national/international regulations. 
Composition/ information on ingredients The chemical name and concentration (exact percentage) or concentration ranges of all ingredients which are classified as health hazards in accordance with paragraph (d) of §1910.1200 3.1.1. COMPONENTS CAS Number EU Number Concentration (%) Hazard Statements (see Section 16) Distillates (petroleum), hydrotreated light 64742‐47‐8 265‐149‐8 10‐30 H226, H304, H315, Solvent‐dewaxed heavy paraffinic distillates 64742‐65‐0 265‐169‐7 40‐50 H315, H332 Polyiosbutylene 9003‐29‐6 Not available 40‐70 H315, H319, H332 FIRST AID MEASURES
There's more, but I'll spare you...
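The hyphens in your text are not the plain ASCII hyphen that your pattern looks for, so you need to match other kinds of hyphens as well: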
~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u
See a demo on regex101.com.
Broken down:
~ # pattern delimiter
\d{2,7} # digits, 2-7 times
\p{Pd} # matches any kind of hyphen or dash (including unicode characters)
\d{2} # 2 digits
\p{Pd} # same as above
\d # one digit
~ # pattern delimiter
u # unicode flag (pattern modifier)
In PHP:
preg_match_all('~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u', $docText, $matches);
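A minimal sketch comparing the two patterns, assuming the dashes in the document are the Unicode HYPHEN character (U+2010). The sample string is shortened from the question, and the normalisation step at the end is an optional extra rather than part of the answer above:

```php
<?php
// Sample built from the question's text; \u{2010} is the Unicode HYPHEN character.
$docText = "Distillates (petroleum), hydrotreated light 64742\u{2010}47\u{2010}8 "
         . "265\u{2010}149\u{2010}8 10\u{2010}30 H226 "
         . "Polyiosbutylene 9003\u{2010}29\u{2010}6 Not available";

// ASCII-only pattern from the question: finds nothing in this text.
preg_match_all('/[0-9]{2,7}-[0-9]{2}-[0-9]/', $docText, $asciiMatches);

// \p{Pd} matches any dash punctuation, including U+2010.
preg_match_all('~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u', $docText, $matches);

// Optional: normalise every dash to "-" so the values compare cleanly later.
$casNumbers = array_map(function ($cas) {
    return preg_replace('~\p{Pd}~u', '-', $cas);
}, $matches[0]);

print_r($casNumbers);
```

The EU number 265‐149‐8 is not picked up because its middle group has three digits, while the pattern requires exactly two.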

json string is showing blank why is not getting decoded

I have a JSON string, but when I pass it to json_decode() the result is blank.
$str = '[{"actcode":"Auck4","actname":"Sky Tower","date":"","time":"","timeduration":"","adult":"0","adultprice":"28","child":"0","childprice":"0","description":"Discover the best of Auckland in half a day. Soak up spectacular sights on this scenic tour, from heritage-listed buildings on Queen Street to the stunning Viaduct Harbour and panoramic vistas from the Sky Tower observation deck.
Start your tour with a hotel pick-up and travel through Auckland?s dynamic Central Business District. Travel across the iconic Auckland Harbour Bridge and admire stunning city views. Then, return to the city centre and visit the vibrant precinct of Wynyard Quarter. Here, wander among the sculptures and enjoy the happenings on the water of Viaduct Harbour.
Continue to Queen Street, also known as the ?Golden Mile? of Aucklands business and shopping district. Marvel at historic buildings like the Ferry Terminal building before visiting the Auckland Museum. Here, explore fascinating exhibits paying tribute to New Zealands natural, Maori and European histories. Afterwards, travel along Aucklands most expensive residential streets with fantastic views of the Waitemata Harbour and its islands.
Your tour ends at Sky Tower, the tallest free-standing structure in the Southern Hemisphere. Take in breathtaking 360-degree views of the city and its surroundings. In the afternoon, continue your own exploration of Auckland."}]';
I tried the code below
$array = json_decode($str,true);
echo print_r($array);
this one too
$str1 = trim($str);
$array = json_decode($str1,true);
echo print_r($array);
but the result is still blank
Try this one. Raw line breaks inside a JSON string value are not valid JSON, so remove them before decoding:
// strip any tags, then drop the raw CR/LF characters that make the JSON invalid
$findsym = array("\r", "\n");
$removesym = array("", "");
$strdone = str_replace($findsym, $removesym, strip_tags($str));
$jsonarray = json_decode($strdone, true);
echo "<pre>"; print_r($jsonarray);
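To confirm what goes wrong, json_last_error_msg() can be checked right after the failed decode. The sketch below uses a shortened stand-in for the question's string. It also shows a variant that keeps the line breaks by turning them into the `\n` escape, under the assumption that raw line breaks occur only inside string values:

```php
<?php
// Shortened stand-in for the question's string: a raw line break inside a JSON value.
$str = "[{\"actname\":\"Sky Tower\",\"description\":\"Discover the best of Auckland.\nStart your tour with a hotel pick-up.\"}]";

$failed = json_decode($str, true);      // null, because the JSON is invalid
$errorCode = json_last_error();         // JSON_ERROR_CTRL_CHAR
$errorText = json_last_error_msg();     // human-readable reason

// Replace raw CR/LF with the two-character escape \n.
// Assumption: line breaks occur only inside string values, as in the question.
$fixed = str_replace(array("\r\n", "\r", "\n"), '\n', $str);
$data = json_decode($fixed, true);

echo $errorText, PHP_EOL;
print_r($data);
```

If the JSON also had line breaks between its tokens, this replacement would break it, so a pretty-printed source would need a different approach.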

How to create csv file using raw text data

I am quite confused about creating a CSV file and inserting its data into the database.
Suppose I have the text data below. The full set has 45000 records; I am posting a few of them.
Winged Wheels in France, by Michael Myers Shoemaker 45790
A Battle Fought on Snow Shoes, by Mary Cochrane Rogers 45789
The German Classics of the Nineteenth and Twentieth Centuries, 45788
Volume 11, by Friedrich Spielhagen, Theodor Storm,
Wilhelm Raabe, Marion D. Learned and Ewald Eiserhardt
[Subtitle: Masterpieces of German Literature
Translated Into English]
Zofloya ou le Maure, Tomes 1-4, by Charlotte Dacre 45787
[Subtitle: Histoire du XVe si?cle]
[Language: French]
Their Majesties as I Knew Them, by Xavier Paoli 45786
[Subtitle: Personal Reminiscences of the
Kings and Queens of Europe]
[Translator: Alexander Teixeira de Mattos]
New York Times Current History: The European War, Vol. 8, 45785
Pt. 2, No. 1, July 1918, by Various
Gallery of Comicalities, by Robert Cruikshank, 45784
George Cruikshank and Robert Seymour
[Subtitle: Embracing Humorous Sketches]
Katri, by Emil Nervander 45783
[Subtitle: Kertomus 17 vuosi-sadasta]
[Language: Finnish]
The Little Brown Jug at Kildare, by Meredith Nicholson 45782
[Illustrator: James Montgomery Flagg]
Beaumont & Fletcher's Works (6 through 10), by Francis Beaumont 45781
and John Fletcher
[Subtitle: The Queen of Corinth; Bonduca; The Knight of the
Burning Pestle; Loves Pilgrimage; The Double Marriage]
Beaumont & Fletcher's Works (1 through 5), by Francis Beaumont 45780
and John Fletcher
[Subtitle: A Wife for a Month; The Lovers Progress;
The Pilgrim; The Captain; The Prophetess]
The Washington Historical Quarterly, Volume V, 1914, by Various 45779
[Editor: Edmond S. Meany]
Minstrelsy of the Scottish Border Volume III of 3, by Walter Scott 45778
[Subtitle: Consisting of Historical and Romantic Ballads,
Collected In the Southern Counties of Scotland; With
a Few Of Modern Date, Founded Upon Local Tradition.
In Three Volumes. Vol. III]
What I want is simply to put Winged Wheels in France, by Michael Myers Shoemaker in one column and 45790 in another column of the CSV. Then I will be able to add them to my database.
Moreover, e.g.,
The German Classics of the Nineteenth and Twentieth Centuries,
Volume 11, by Friedrich Spielhagen, Theodor Storm,
Wilhelm Raabe, Marion D. Learned and Ewald Eiserhardt
[Subtitle: Masterpieces of German Literature
Translated Into English]
I want to insert the above text in this way:
The German Classics of the Nineteenth and Twentieth Centuries,
Volume 11, by Friedrich Spielhagen, Theodor Storm,
Wilhelm Raabe, Marion D. Learned and Ewald Eiserhardt
that is, without this portion:
[Subtitle: Masterpieces of German Literature
Translated Into English]
The ", by" should also be omitted, so my new data would look like this. So I actually need three columns in the CSV.
1 | Winged Wheels in France | Michael Myers Shoemaker | 45790
2 | The German Classics of the Nineteenth and Twentieth Centuries,
Volume 11 | Friedrich Spielhagen, Theodor Storm,
Wilhelm Raabe, Marion D. Learned and Ewald Eiserhardt | 45788
Please help me get this into an Excel file and create a CSV from it.
Thank you all.
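Do it like this (not tested):
$f = file_get_contents('yourtextfile.txt');
// remove the [Subtitle: ...] style blocks, which may span several lines
$f = preg_replace('/\[(.*?)\]/s','',$f);
$f = str_replace(array("\n", "\r"), '', $f);
file_put_contents('temp.txt',$f);
$file = file('temp.txt');
// a prepared statement keeps quotes in titles from breaking the query
$stmt = mysqli_prepare($connection, "insert into table1(column1) values(?)");
foreach($file as $key => $line){
if(trim($line) != '')
{
mysqli_stmt_bind_param($stmt, 's', $line);
mysqli_stmt_execute($stmt);
}
}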
edit
$f = file_get_contents('tt.txt');
$f = preg_replace('/\[(.*?)\]/s','',$f);
$keywords = preg_split("/[ ]{15}/", $f);
print_r(array_filter($keywords));
I'm still not clear whether those numbers are actually in your file or you just mentioned them in the question.
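If the numbers are in the file, a line-based approach may be easier than splitting on spaces. The following is only a sketch under stated assumptions: each record starts on a line that ends with its number, later lines without a trailing number continue that record, and the first ", by " separates title from author. The function name `parseListing` and the sample text are made up for illustration:

```php
<?php
function parseListing(string $text): array
{
    // Drop bracketed notes such as [Subtitle: ...], which may span lines.
    $text = preg_replace('/\[[^\]]*\]/', '', $text);

    $records = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^(.*\S)\s+(\d+)$/', $line, $m)) {
            // A line ending in a number starts a new record.
            $records[] = ['text' => $m[1], 'id' => $m[2]];
        } elseif ($records) {
            // Any other line continues the previous record.
            $records[count($records) - 1]['text'] .= ' ' . $line;
        }
    }

    $rows = [];
    foreach ($records as $record) {
        // Split on the first ", by " into title and author.
        $parts = explode(', by ', $record['text'], 2);
        $rows[] = [trim($parts[0]), trim($parts[1] ?? ''), $record['id']];
    }
    return $rows;
}

$sample = <<<'TXT'
Winged Wheels in France, by Michael Myers Shoemaker 45790
The German Classics of the Nineteenth and Twentieth Centuries, 45788
Volume 11, by Friedrich Spielhagen, Theodor Storm,
Wilhelm Raabe, Marion D. Learned and Ewald Eiserhardt
[Subtitle: Masterpieces of German Literature
Translated Into English]
TXT;

$rows = parseListing($sample);

// Write the rows as CSV; php://output could be swapped for a file path.
$out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

A continuation line that happens to end in a number would be mistaken for a new record, so the result is worth spot-checking on the real file.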

php: count instances of words in a given string then return top 5 which match in another array

php: sort and count instances of words in a given string
From this article, I learned how to count instances of words in a given string and sort them by frequency. Now I want to go a step further: match the resulting words against another array ($keywords), then keep only the top 5 words. I do not know how to do that, so I am opening a question. Thanks.
$txt = <<<EOT
The 2013 Monaco Grand Prix (formally known as the Grand Prix de Monaco 2013) was a Formula One motor race that took place on 26 May 2013 at the Circuit de Monaco, a street circuit that runs through the principality of Monaco. The race was won by Nico Rosberg for Mercedes AMG Petronas, repeating the feat of his father Keke Rosberg in the 1983 race. The race was the sixth round of the 2013 season, and marked the seventy-second time the Monaco Grand Prix has been held. Rosberg had started the race from pole.
Background
Mercedes protest
Just before the race, Red Bull and Ferrari filed an official protest against Mercedes, having learned on the night before the race of a three-day tyre test undertaken by Pirelli at the venue of the last grand prix using Mercedes' car driven by both Hamilton and Rosberg. They claimed this violated the rule against in-season testing and gave Mercedes a competitive advantage in both the Monaco race and the next race, which would both be using the tyre that was tested (with Pirelli having been criticised following some tyre failures earlier in the season, the tests had been conducted on an improved design planned to be introduced two races after Monaco). Mercedes stated the FIA had approved the test. Pirelli cited their contract with the FIA which allows limited testing, but Red Bull and Ferrari argued this must only be with a car at least two years old. It was the second test conducted by Pirelli in the season, the first having been between race 4 and 5, but using a 2011 Ferrari car.[4]
Tyres
Tyre supplier Pirelli brought its yellow-banded soft compound tyre as the harder "prime" tyre and the red-banded super-soft compound tyre as the softer "option" tyre, just as they did the previous two years. It was the second time in the season that the super-soft compound was used at a race weekend, as was the case with the soft tyre compound.
EOT;
$words = array_count_values(str_word_count($txt, 1));
arsort($words);
var_dump($words);
$keywords = array("Monaco","Prix","2013","season","Formula","race","motor","street","Ferrari","Mercedes","Hamilton","Rosberg","Tyre");
//var_dump($words) which should match in $keywords array, then get top 5 words.
You already have $words as an associative array, indexed by the word and with the count as the value, so we use array_flip() to make your $keywords array an associative array indexed by word as well. Then we can use array_intersect_key() to return only those entries from $words that have a matching index entry in our flipped $keywords array.
This gives a resulting $matchWords array, still keyed by the word, but containing only those entries from the original $words array that match $keywords; and still sorted by frequency.
We then simply use array_slice() to extract the first 5 entries from that array.
$matchWords = array_intersect_key(
$words,
array_flip($keywords)
);
$matchWords = array_slice($matchWords, 0, 5);
var_dump($matchWords);
gives
array(5) {
'race' =>
int(11)
'Monaco' =>
int(7)
'Mercedes' =>
int(5)
'Rosberg' =>
int(4)
'season' =>
int(4)
}
Caveat: You could have problems with case-sensitivity. "Race" !== "race", so the $words = array_count_values(str_word_count($txt, 1)); line will treat these as two different words.
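One way around the case-sensitivity caveat is to lower-case both the text and the keywords before counting. This is a sketch on a short made-up sample, not the full text from the question:

```php
<?php
// Short made-up sample; the keyword list is a subset of the question's.
$txt = "The race was won by Rosberg. Race day in Monaco: the Monaco race. "
     . "Tyre, tyre, TYRE, tyre.";
$keywords = array("Monaco", "race", "Rosberg", "Tyre");

// Count words case-insensitively by lower-casing the text first.
$words = array_count_values(str_word_count(strtolower($txt), 1));
arsort($words);

// Lower-case the keywords as well so the keys line up.
$matchWords = array_intersect_key(
    $words,
    array_flip(array_map('strtolower', $keywords))
);
$top = array_slice($matchWords, 0, 5);

var_dump($top);
```

Note that str_word_count() does not treat digits as part of a word by default, so a keyword such as "2013" will not appear in $words unless digits are passed in its optional character list argument.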

PHP regex and compare variables in a string

I have a tricky problem and it seems like I'm stuck. I have an idea of how to proceed but no idea how to do it in practice.
What I want to do is convert a string inside a .txt file to another format (using regex and variables?). The main problem is converting the lines marked with //comments.
NOTE: "...villainx calls $x" is calculated differently in the original and in the format it should be converted to. That is the problem I need some serious help with.
Example:
This needs to be converted...
HERO posts small blind $0.50.
villain4 posts big blind $1.00.
** Dealing down cards **
Dealt to HERO [ 7s 8c 5d 8d ]
villain1 calls $1.00
villain2 raises to $3.00 // total sum a player raises to
villain3 calls $3.00
HERO calls $3.00
villain4 calls $3.00
villain1 calls $3.00 // total sum a player calls whether he has put money in to the pot before (as he has -- $1 call, first to act)
** Dealing Flop ** [ 9c, Ah, Jh ]
...to this:
HERO posts small blind [$0.50 USD].
villain4 posts big blind [$1.00 USD].
** Dealing down cards **
Dealt to HERO [ 7s 8c 5d 8d ]
villain1 calls [$1.00 USD]
villain2 raises [$3.00 USD] // total sum a player raises to
villain3 calls [$3.00 USD]
HERO calls [$2.50 USD] // a sum player calls = last raise ($3) - money put in (=$0.50 small blind)
villain4 calls [$2.00 USD] // $3 - $1 (big blind)
villain1 calls [$2.00 USD] // $3 - $1 (the call first to act)
** Dealing Flop ** [ 9c, Ah, Jh ]
Another example:
HERO posts small blind $0.50.
villain4 posts big blind $1.00.
** Dealing down cards **
Dealt to HERO [ 7s 8c 5d 8d ]
villain1 bets $5.50
villain2 raises to $20.00
villain3 raises to $40.00
villain1 calls $40.00 //THIS NEEDS TO BE "calls $34.50"
villain2 calls $40.00 //THIS NEEDS TO BE "calls $20.00"
** Dealing Flop ** [ 9c, Ah, Jh ]
And here's a full example of how the whole hand should look. The txt file could contain hundreds of hands. I've managed to preg_replace basically all other issues except the one above. I'm lost. Please help me! :D
***** Hand History for Game 335502358 ***** (Full Tilt)
$100.00 USD PL Omaha - Thursday, October 15, 01:32:21 ET 2009
Table Foxtrot (Real Money)
Seat 3 is the button
Seat 1: villain1 ( $38.50 USD )
Seat 2: villain2 ( $99.65 USD )
Seat 3: villain3 ( $415.55 USD )
Seat 4: HERO ( $99.00 USD )
Seat 6: villain4 ( $171.20 USD )
HERO posts small blind [$0.50 USD].
villain4 posts big blind [$1.00 USD].
** Dealing down cards **
Dealt to HERO [ 7s 8c 5d 8d ]
villain1 calls [$1.00 USD]
villain2 raises [$3.00 USD]
villain3 calls [$3.00 USD]
HERO calls [$2.50 USD]
villain4 calls [$2.00 USD]
villain1 calls [$2.00 USD]
** Dealing Flop ** [ 9c, Ah, Jh ]
HERO checks
villain4 checks
villain1 checks
villain2 bets [$8.00 USD]
villain3 folds
HERO folds
villain4 calls [$8.00 USD]
villain1 folds
** Dealing Turn ** [ Th ]
villain4 checks
villain2 bets [$13.00 USD]
villain4 calls [$13.00 USD]
** Dealing River ** [ 3c ]
villain4 checks
villain2 checks
villain2 shows [Qc, Js 8s Qd ]
villain4 shows [Kh, Tc 7h Kd ]
villain4 wins $54.15 USD from main pot
edit 1: added NOTE to clarify my real question
edit 2: added another example
Could you capture the dollar value and re-arrange the string with a preg_replace?
// $1 is the captured amount; the optional decimal part keeps a sentence-ending period out of the match
$regex = '/(\$\d+(?:\.\d+)?)/';
$outputString = preg_replace($regex, '[$1 USD]', $stringToMatch);
The only thing this won't do is ignore the lines at the beginning where you declare each 'seat' so you might need to filter those out first [simple strpos($stringToMatch, 'Seat') might be enough there, not wonderfully elegant though].
OK, I'll have another go. This will be written in a sort of PHP/pseudocode thing.
while($line = get the next line)
{
if($line contains 'seat')
{
$player = get player from $line
$pool = get player pool from $line
$bettingMatrix[$player]['pool'] = $pool;
}
else if($line contains 'blind')
{
$player = get player from $line
$betValue = get blind value from $line
$bettingMatrix[$player]['betTotal'] = $betValue
$bettingMatrix['pot'] += $betValue //keep a sum of the pot
}
else if($line contains 'raises')
{
$player = get player from $line
$betValue = get value from $line
$betMade = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount raised by
$bettingMatrix[$player]['betTotal'] = $betValue //$line contains total bet this hand (shortcut)
$bettingMatrix['raiseValue'] = $betMade
$bettingMatrix['pot'] += $betMade //keep a sum of the pot
}
else if($line contains 'calls')
{
$player = get player from $line
//if player has called, can work out bet from raiseValue
$betMade = $bettingMatrix['raiseValue']
$bettingMatrix[$player]['betTotal'] += $betMade
$bettingMatrix['pot'] += $betMade //keep a sum of the pot
}
else if($line contains 'wins') //probably do something about players named Wins :)
{
//assume all bets resolved
foreach($bettingMatrix[$player])
{
update pool.
zero betTotal
}
zero pot, zero raiseValue
}
}
erm, it's pretty rough and ready, and I probably wouldn't class it as a Parser as such, but it does just about work out all the values you need, I think anyway. The 2 $betMade variables should end up with the values you want.
Edit: I've just noticed that it doesn't quite work if no-one raises and everyone just calls or folds (and probably dies horribly if everyone just folds or whatever). It does need a little bit more work, but it is the general gist - call it a half-answer. Sorry.
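For the "calls" conversion specifically, a smaller runnable sketch is possible if each player's contribution in the current betting round is tracked. It rests on assumptions the question does not confirm: player names contain no spaces, the amounts in the original format are totals per betting round, and the round resets at the flop, turn and river. The function name `convertHand` is made up for illustration:

```php
<?php
/**
 * Rewrite amounts as "[$x USD]" and turn "calls $total" into the amount
 * actually added. Assumes space-free player names and per-round totals.
 */
function convertHand(array $lines): array
{
    $putIn = [];   // money each player has put in during the current round
    $out = [];
    $pattern = '/^(\S+) (posts (?:small|big) blind|bets|raises to|calls) \$(\d+(?:\.\d+)?)(.*)$/';

    foreach ($lines as $line) {
        if (preg_match('/^\*\* Dealing (Flop|Turn|River)/', $line)) {
            $putIn = [];   // a new street starts a new betting round
        }
        if (preg_match($pattern, $line, $m)) {
            list(, $player, $action, $amount, $rest) = $m;
            $total = (float) $amount;
            // A call is shown as the difference to what the player already put in.
            $shown = $action === 'calls' ? $total - ($putIn[$player] ?? 0.0) : $total;
            $putIn[$player] = $total;
            $verb = $action === 'raises to' ? 'raises' : $action;
            $line = sprintf('%s %s [$%.2f USD]%s', $player, $verb, $shown, $rest);
        }
        $out[] = $line;
    }
    return $out;
}

// The second example from the question, one line per array element.
$hand = [
    'HERO posts small blind $0.50.',
    'villain4 posts big blind $1.00.',
    '** Dealing down cards **',
    'Dealt to HERO [ 7s 8c 5d 8d ]',
    'villain1 bets $5.50',
    'villain2 raises to $20.00',
    'villain3 raises to $40.00',
    'villain1 calls $40.00',
    'villain2 calls $40.00',
    '** Dealing Flop ** [ 9c, Ah, Jh ]',
];

$converted = convertHand($hand);
echo implode(PHP_EOL, $converted), PHP_EOL;
```

When reading from a file, file() with FILE_IGNORE_NEW_LINES gives lines in this shape. Floats keep the sketch short; working in integer cents would avoid rounding surprises with other amounts.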
