zaggs — 2012-12-12T07:30:17-05:00 — #1
I need some advice on how to code something. I am using cURL to retrieve the HTML of a page, and on that page is a <select> field. I would like PHP to extract the highest value from the select box. Please take the markup below as an example:
<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered" class="inputTextBox provideVrmWidth" size="1" name="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered">
<option selected="selected" value="0">Select a vehicle</option>
How can I extract the highest option value? I.e. in this instance the value I want returned is 2 (value="2").
Please help! J
guido2004 — 2012-12-12T07:47:09-05:00 — #2
You can extract all values in an array with preg_match_all and a regular expression, something like
preg_match_all('%<option[^>]* value="([^"]+)"%', $yourdata, $matches, PREG_PATTERN_ORDER);
where $yourdata contains the html code you got with curl.
Do a var_dump of $matches to see the result.
Then get the highest value from the array (take a look at rsort)
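Put together, the steps above might look like the sketch below. The sample markup and its option labels are assumptions made up for illustration, and max() is swapped in for rsort since only the single largest value is needed.

```php
<?php
// Hypothetical sample markup; the option labels are made up for illustration.
$yourdata = '<select name="demo">
<option selected="selected" value="0">Select a vehicle</option>
<option value="1">First vehicle</option>
<option value="2">Second vehicle</option>
</select>';

// Collect every option value on the page into $matches[1].
preg_match_all('%<option[^>]*\svalue="([^"]+)"%', $yourdata, $matches, PREG_PATTERN_ORDER);

// Cast to integers so the comparison is numeric rather than string-based.
$values = array_map('intval', $matches[1]);

// max() needs a non-empty array, so guard against a page with no options.
$highest = $values ? max($values) : null;

var_dump($highest);
```

Note that this looks at every <option> in $yourdata, so it only makes sense when the page has a single select box.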
starlion — 2012-12-12T08:41:42-05:00 — #3
er... be careful doing that, Guido - if there's more than one select box on the page (Like... a language dropdown?), that could end up giving some very bad responses.
Let's make sure we get the -specific- box we're after.
Something a bit more like...
preg_match_all('%<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered".*?(<option.*? value="([^"]+)">.+?</option>)+</select>%', $yourdata, $matches, PREG_PATTERN_ORDER);
(Note: This will change the location of your desired values in the $matches array, because we added another subpattern)
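One thing to watch for: in PCRE a repeated capturing group only keeps its last iteration, and "." does not cross line breaks without the s modifier. A two-step variant sidesteps both. This is only a sketch, and the two-select sample page (including the language dropdown) is an assumption for illustration.

```php
<?php
// Hypothetical page with two select boxes; only the second is the one we want.
$yourdata = '<select id="language"><option value="9">English</option></select>
<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered" size="1">
<option selected="selected" value="0">Select a vehicle</option>
<option value="1">First vehicle</option>
<option value="2">Second vehicle</option>
</select>';

$id = 'provide_vrm:prVRMfrag:prVRMCon:vrmRegistered';
$highest = null;

// Step 1: isolate the one select element. The "s" modifier lets "." span newlines.
$selectPattern = '%<select[^>]*\sid="' . preg_quote($id, '%') . '"[^>]*>(.*?)</select>%s';

if (preg_match($selectPattern, $yourdata, $select)) {
    // Step 2: collect option values from inside that element only.
    preg_match_all('%<option[^>]*\svalue="([^"]+)"%', $select[1], $options);
    $values = array_map('intval', $options[1]);
    $highest = $values ? max($values) : null;
}

var_dump($highest);
```

The value 9 from the language dropdown never enters the comparison, because the second pattern only runs against the contents of the matched select.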
guido2004 — 2012-12-12T09:47:18-05:00 — #4
I know, I based my answer on the info in the OP
on that page is a <select> field
cpradio — 2012-12-12T11:22:28-05:00 — #5
I believe there are DOM methods you could use in PHP to walk through the HTML hierarchy to get to the exact select box too.
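For instance, PHP's bundled DOM extension (DOMDocument and DOMXPath) can do this. A rough sketch, assuming the fetched page looks something like the made-up sample below:

```php
<?php
// Hypothetical sample markup; the option labels are made up for illustration.
$yourdata = '<html><body><form>
<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered" size="1">
<option selected="selected" value="0">Select a vehicle</option>
<option value="1">First vehicle</option>
<option value="2">Second vehicle</option>
</select>
</form></body></html>';

// Real-world pages are rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($yourdata);
libxml_clear_errors();

// Select the option elements that sit directly inside the select with the wanted id.
$xpath = new DOMXPath($doc);
$options = $xpath->query('//select[@id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered"]/option');

$values = [];
foreach ($options as $option) {
    $values[] = (int) $option->getAttribute('value');
}

// max() needs a non-empty array, so fall back to null when nothing matched.
$highest = $values ? max($values) : null;
var_dump($highest);
```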
oddz — 2012-12-12T12:29:00-05:00 — #6
I would recommend using QueryPath, which makes this and a whole lot more super simple when it comes to crawling strings of mark-up.
jgetner — 2012-12-12T13:36:58-05:00 — #7
I would advise against using regex for matching HTML attributes, as that leaves you prone to many errors. As suggested, use a DOM parser, which PHP has plenty of bolted on.
stomme_poes — 2012-12-13T17:20:58-05:00 — #8
Using regex to parse HTML? Oh my. This calls for some Zalgo.
See this as a ++ to jgetner's suggestion of using a parser to parse. Lives will be saved. Hair will remain on head. Orphan children will simply grow old without fulfilling prophecies of wizardry, and instead will marry overweight suburbanites and work in insurance until they retire.
Though QueryPath, which reminds me of Python's libxml, also sounds good.
joebert — 2012-12-15T13:50:17-05:00 — #9
The first thing I'd do, since the element has a proper ID attribute, is use simple string methods to extract that <select> element from the source. strpos to find the start position of that particular <select>, strpos to find the position of the <select> element's closing tag, and substr to extract it.
Then I'd pass the extracted string to one of the DOM libraries mentioned.
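Roughly like this. It is only a sketch: the sample page is made up, and the search string assumes the id is the first attribute of the tag, written exactly as in the original post.

```php
<?php
// Hypothetical sample page; the option labels are made up for illustration.
$yourdata = '<p>Other markup</p>
<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered" size="1">
<option selected="selected" value="0">Select a vehicle</option>
<option value="1">First vehicle</option>
<option value="2">Second vehicle</option>
</select>
<p>More markup</p>';

$needle = '<select id="provide_vrm:prVRMfrag:prVRMCon:vrmRegistered"';
$closing = '</select>';
$fragment = null;
$highest = null;

// Find where the select starts, then the first closing tag after that point.
$start = strpos($yourdata, $needle);
if ($start !== false) {
    $end = strpos($yourdata, $closing, $start);
    if ($end !== false) {
        // Include the closing tag itself in the extracted fragment.
        $fragment = substr($yourdata, $start, $end - $start + strlen($closing));
    }
}

if ($fragment !== null) {
    // Hand the small fragment to the DOM extension instead of the whole page.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($fragment);
    libxml_clear_errors();

    $values = [];
    foreach ($doc->getElementsByTagName('option') as $option) {
        $values[] = (int) $option->getAttribute('value');
    }
    $highest = $values ? max($values) : null;
}

var_dump($highest);
```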
guido2004 — 2012-12-15T15:19:32-05:00 — #10
Yeah yeah, I got it...
serverstorm — 2012-12-16T09:08:02-05:00 — #11
Wow that is funny stuff. Talk about beating a dead horse :lol:
lemon_juice — 2012-12-16T12:53:50-05:00 — #12
Can DOM be used to parse HTML that is not XHTML?
cpradio — 2012-12-16T13:01:20-05:00 — #13
Based on the comments, I would say yes.
lemon_juice — 2012-12-16T14:45:23-05:00 — #14
stomme_poes — 2012-12-17T15:52:32-05:00 — #15
If a browser can do it, you can too. With all the mistakes browsers also make when the HTML is bad.
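A quick way to check is to feed DOMDocument::loadHTML() some deliberately sloppy markup. The snippet below is a made-up example (the id "cars" and the labels are assumptions), with uppercase tags, unquoted attributes and no closing </option> tags:

```php
<?php
// Deliberately sloppy HTML 4 style markup, nothing like well-formed XHTML.
$html = '<SELECT id=cars>
<OPTION value=0>Select a vehicle
<OPTION value=1>First vehicle
<OPTION value=2>Second vehicle
</SELECT>';

// Keep parser complaints in an internal buffer rather than emitting warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Any problems the parser reported can be inspected here before clearing them.
$errors = libxml_get_errors();
libxml_clear_errors();

// Tag names are normalised to lowercase by the HTML parser.
$values = [];
foreach ($doc->getElementsByTagName('option') as $option) {
    $values[] = (int) $option->getAttribute('value');
}

var_dump($loaded, $values);
```

How gracefully truly broken pages are recovered depends on the underlying libxml2 parser, so it is worth inspecting $errors when results look odd.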
zaggs — 2012-12-19T05:20:29-05:00 — #16
Thanks for your answers, guys, but one final question:
How can I extract an iframe from HTML? I.e. I just want to return the src of the iframe. Let's take the following example:
<iframe title="paymentServicesiframe" id="paymentServicesiframe" src ="https://ips.ihost.com/hpp/checkout.hpp?sessionId=ADWSGET716SJWY2" frameborder="0" align="middle" scrolling="no" height="460px" width="709px"> Your browser does not support in-line frames or is currently configured not to display in-line frames. </iframe>
stomme_poes — 2012-12-19T09:13:36-05:00 — #17
src is but an attribute of the iframe tag. You would grab it the same way you would grab any other element's attributes.
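With the DOM extension that could look something like the sketch below, which uses the iframe from the question as input. The XPath lookup by id is just one way to find the element.

```php
<?php
// The iframe markup from the question, standing in for the fetched page.
$yourdata = '<iframe title="paymentServicesiframe" id="paymentServicesiframe" src ="https://ips.ihost.com/hpp/checkout.hpp?sessionId=ADWSGET716SJWY2" frameborder="0" align="middle" scrolling="no" height="460px" width="709px"> Your browser does not support in-line frames or is currently configured not to display in-line frames. </iframe>';

// Keep parser complaints in an internal buffer rather than emitting warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($yourdata);
libxml_clear_errors();

// Look the iframe up by its id, then read src like any other attribute.
$xpath = new DOMXPath($doc);
$iframes = $xpath->query('//iframe[@id="paymentServicesiframe"]');

$src = null;
if ($iframes->length > 0) {
    $src = $iframes->item(0)->getAttribute('src');
}

var_dump($src);
```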