Parsing - Time

Parsing Time

One of the most difficult aspects of computer coding is parsing, especially of human-formatted or human-generated text. Regular expressions (regexp) go some way towards dealing with this problem.

As a science teacher I have been continually challenged by students "not getting it". The "it" in this case is how to convert an SI value written in the [<sign>]<magnitude> [<prefix multiplier>]<base-unit> syntax into a meaningful value for use in mathematical computation. Many teachers can mechanise the process themselves, but do not clearly understand what a student actually has to learn in order to carry it out.

If you thought it was all very straightforward and easy… think again. The AppleScript below shows what you have to learn to perform this trick just for extracting time values and transforming them from [<sign>]<magnitude> [<prefix multiplier>]<base-unit> to [<sign>]<magnitude> <base-unit> for the more common range of values of time. You will observe that scaling from SI seconds to minutes, hours and beyond has been omitted. It's messy but straightforward. Other SI units pose similar difficulties, but their scaling functions are obviously regular.

So the next time you ask a student to extract an SI value from some text and then transform it to remove the scaling prefix, try to remember this code snippet. Alternatively, teach your students something about regular expression analysis… you'll be glad you did, and you'll realise why it's so difficult!

-- Ian W. Parker
-- first worked on:  2010-07-15, last worked on: 2010-07-15
-- extract time in seconds description from a string using regexp
-- requires Satimage OSAX
set test_string to "this is a  test  0.52 ms  to see if it works"

parse_time(test_string)
-- result is {5.2E-4, "s"}

on parse_time(test_string)
	-- ensure low overhead
	local current_temp, temp_dim, temp_mag, the_mag, mult
	try
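		-- the decimal point is escaped (\\.) so that it matches only a literal "." and at least one digit is required
		set current_temp to matchResult of (find text "([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?) ([Mkcmµ]?s)" in test_string with regexp) as string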
	on error
		return {"error-2763", "syntax:<space><signed number><space><unit>"}
	end try
	-- extract fields
	set current_temp to every text of current_temp
	--recompose
	set current_temp to splittext current_temp using space
	-- fix up due to using space as split delim
	if length of item 1 of current_temp is 0 then set current_temp to rest of current_temp
	set the_mag to (item 1 of current_temp as number)
	if length of item 2 of current_temp is 2 then
		set mult to character 1 of item 2 of current_temp
		set item 2 of current_temp to character 2 of item 2 of current_temp
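		considering case -- to allow for "m" and "M" detection
			-- every prefix accepted by the pattern is scaled here
			if mult is "M" then set the_mag to the_mag * 1.0E+6
			if mult is "k" then set the_mag to the_mag * 1.0E+3
			if mult is "c" then set the_mag to the_mag * 0.01
			if mult is "m" then set the_mag to the_mag * 1.0E-3
			if mult is "µ" then set the_mag to the_mag * 1.0E-6
		end considering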
	end if
	-- add processing for HH:mm:ss (to be written!)
	return {the_mag, item 2 of current_temp}
end parse_time
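
For comparison, the same extract-then-scale idea can be sketched in Python. This is not the author's method, only a minimal illustration under stated assumptions: the names PREFIX_FACTORS, TIME_PATTERN and the Python parse_time are hypothetical, the prefix table covers only the prefixes the AppleScript pattern accepts, and any run of whitespace is allowed between the magnitude and the unit.

```python
import re

# Hypothetical lookup table: scale factor for each prefix the pattern accepts.
# The empty string stands for "no prefix", i.e. plain seconds.
PREFIX_FACTORS = {"M": 1e6, "k": 1e3, "": 1.0, "c": 1e-2, "m": 1e-3, "µ": 1e-6}

# [<sign>]<magnitude> [<prefix multiplier>]<base-unit>, with the base unit fixed to "s".
TIME_PATTERN = re.compile(
    r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"  # signed magnitude, optional exponent
    r"\s+"                                          # whitespace between number and unit
    r"([Mkcmµ]?)s\b"                                # optional prefix, then "s" as a whole word
)

def parse_time(text):
    """Return (magnitude_in_seconds, "s") for the first time value found, else None."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    magnitude = float(match.group(1))
    factor = PREFIX_FACTORS[match.group(2)]  # lookup is case-sensitive, so "m" and "M" differ
    return (magnitude * factor, "s")

print(parse_time("this is a  test  0.52 ms  to see if it works"))
```

The dictionary lookup plays the role of the chain of "if mult is …" tests above; adding a prefix means adding one entry to the table and one character to the pattern.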

Parsing Temperature - Not too hot not too cold, just right

This very small program parses a temperature specification from text, provided the specification has the format given in the regular expression. It is fairly forgiving in terms of syntax (any number of spaces and so on) but requires all components to be present. The best way to see what it will or will not accept is either to analyse the regular expression or just to add and remove bits of the test string. First try adding extra spaces (it will succeed), then remove the ± error specification (it won't succeed). This script also demonstrates how second level evaluation (converting a string into a record of variables) can be achieved with some trickery: the string is run as a script. This cannot be done using a list.

-- Parse temperature and return a record triple: {value:val, delt:val, dim:val}
-- Ian W. Parker first worked on 2011-08-02, last worked on 2012-05-16
-- requires Satimage OSAX to be installed
-- result is not scaled to SI...

set current_temp to "pre text -23.5 °C ± 0.1 °C post text"


-- "parsed" is used rather than "result", which AppleScript itself updates after every statement
set parsed to parse_temperature(current_temp)
set temperature to (value of parsed & " " & dim of parsed & " ± " & delt of parsed & " " & dim of parsed) as string

on parse_temperature(current_temp)
	-- decimal points are escaped (\\.) and the stray "]" after the final °C has been removed
	set current_temp to matchResult of (find text "([-]?[0-9]+\\.?[0-9]*) *°C *±? *([0-9]+\\.?[0-9]*) *°C" in current_temp using "{value:\\1, delt:\\2, dim:\"°C\"}" with regexp)
	return (run script current_temp) -- second level evaluation gives us a record
end parse_temperature
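
As a point of comparison only, here is a minimal Python sketch of the same temperature parse. It is an illustration under assumptions, not a translation of the script above: TEMPERATURE_PATTERN and the Python parse_temperature are hypothetical names, and the field names value, delt and dim are simply copied from the AppleScript record. Named groups hand back the fields directly, so no second level evaluation is needed.

```python
import re

# value °C ± delt °C, with any amount of whitespace between the parts.
# As in the AppleScript pattern, the "±" sign itself is optional but the
# uncertainty and its unit must be present.
TEMPERATURE_PATTERN = re.compile(
    r"(?P<value>-?\d+\.?\d*)\s*°C\s*±?\s*(?P<delt>\d+\.?\d*)\s*°C"
)

def parse_temperature(text):
    """Return {"value": ..., "delt": ..., "dim": "°C"} or None if nothing matches."""
    match = TEMPERATURE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "value": float(match.group("value")),
        "delt": float(match.group("delt")),
        "dim": "°C",  # not scaled to SI, as in the script above
    }

print(parse_temperature("pre text -23.5 °C ± 0.1 °C post text"))
```

Trying the same experiments as suggested above (extra spaces, then removing the ± error specification) shows the first still parsing and the second returning None.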