Database Techniques
Contents |
Driving Google Earth using AppleScript
Where in the world is ...
We teach a course in Year 8 Science called “Stuck on a Deserted island”. As part of their work, students build a physical model of the island “somewhere in the Pacific Ocean”. Model building is a desirable outcome of the course.
I designed the model of the island that students build and later found a real island that closely matches my model. The model design resembled an enlarged topographical map.
Wouldn’t it be sound pedagogy to give our students some context of where they are in the world and where an island like their model island is located? To “demonstrate the technology” here is a short AppleScript that drives the Google Earth application.
This short application will immediately fix a Google Earth view over the Southern Highlands of NSW, wait 10 seconds then traverse to Erromango Island in the Pacific Ocean. The traversal is done automatically by Google Earth. Of course, for it to work, you need to be connected on-line to Google. The traversal behaves just like it does when you manually drive Google Earth. I was astounded by how simple it was. I thought I would have to generate the intermediate positions, and speed of the traversal. Not so... This AppleScript does it all for you. Imagine what could be done... clues for a crossword puzzle anyone.
-- on-line traversal from one location on the Earth to another location on the Earth
-- Ian W. Parker, 2008
-- the distance variables optimise views for the two locations
tell application "Google Earth"
activate
-- distance = height above ground level
-- latitude = decimal latitude (see GE preferences)
-- longitude = decimal longitude
-- titl, azimuth can be left as is
-- Start Location - Southern Highlands of NSW
SetViewInfo {latitude:-34.47, longitude:150.42, distance:1.0E+5, tilt:3.77E-10, azimuth:0.03}
delay 10 -- this gives you 10 seconds to orient yourself at the initial view
-- speed determines the behaviour. Slower better, smoother.
-- traversal path determined by Google Earth application
-- End Location - Erromango Island in the Pacific Ocean
-- so all we have to do is ...
SetViewInfo {latitude:-18.825, longitude:169.175, distance:4.8E+4, tilt:3.77E-10, azimuth:0.03} speed 0.02
end tell
Parsing text fields using AppleScript
Data fields in a database may have very unusual features. For example: Suppose you had a database that used the following format: <SurName>'@'<firstName>'#'<group1>'%'<group2> <eol>. There are no embedded tabs or comma separators and each field string may have leading spaces, trailing spaces and/or embedded spaces. AppleScript can parse this very easily.
The aim is to extract a 4-tuple list {<SurName>,<FirstName>,<group1>,<group2>} without leading or trailing spaces and single spaces where there are multiple embedded spaces. Loading a string buffer from a file is not considered here: the string buffer is assumed to contain the data shown and further processing such as data integrity checking and copying the 4-tuples (where each would end up in a column) in an Excel Spreadsheet is trivial but not considered here.
Consider a typical extract from a text report file from a database with the assumed format:
test Record:: = "La1 @ Step7 # 7 %G10"
We could attack this problem in many ways even only using just basic AppleScript. I will offer two options, one based on AppleScript's text item delimiter, and the second using Regular Expression Analysis part of Satimage's incredibly powerful OSAX.
Assumptions about the text.
The text file is assumed to be plain ASCII or UTF-8 text: nothing fancy. The text is divided into lines, ending with a <cr>-<lf> sequence (= Windows <eol>.) Usually we expect either ',' (CSV) or '<tab>' as the delimiters. In this case however, the line is divided into fields of varying length, delimited by '@', '#','%' ASCII characters. <,> and <tab> could be treated as acceptable characters in a field. One reason for using the special delimiter character set approach is that it is easy to identify missing field entries.
-- Ian W Parker 2008
-- first worked on 2008-02-08, last worked on 2008-02-08
-- set up the test harnass string with appropriate structure
set theRecord to " La1 @ Step7 # 7 %G10"
-- this executes
doTokenize(theRecord)
--result : {" La1 ", " Step7 ", " 7 ", "G10"}
-- -------------------------------------------------------------------------------
-- defines the basic splitting routine
-- ------------------------------------------------------------------------------
on doTokenize(currentRecord)
-- initilise
set {surname, firstname, Group1, Group2} to {"", "", "", ""}
local currentRecord2, currentRecord3, surname, firstname, yearGroup, rollGroup
--we save the original TID definition
set TID to AppleScript's text item delimiters
-- split on "@"
set AppleScript's text item delimiters to "@"
set surname to text item 1 of currentRecord
set currentRecord2 to text item 2 of currentRecord
-- split remainder on "#"
set AppleScript's text item delimiters to "#"
set firstname to text item 1 of currentRecord2
set currentRecord3 to text item 2 of currentRecord2
-- split remainder on %
set AppleScript's text item delimiters to "%"
set Group1 to text item 1 of currentRecord3
set Group2 to text item 2 of currentRecord3
-- restor the original TID
set AppleScript's text item delimiters to TID
-- these may have leading or trailing spaces
return {surname, firstname, Group1, Group2}
end doTokenize
In the second example we use regular expression techniques to extract the sub-field items and squeeze multiple blanks.
-- Ian W Parker 2008
-- first worked on 2008-02-08, last worked on 2008-02-09
-- requires satimage OSAX to be installed
-- set up the test harnass string with appropriate structure
set crlf to (ASCII character 10) & (ASCII character 13)
-- assume for the example the string has a tralining crlf as eol as per Windows
-- in a file processing example, the eol would have already been stripped
set theRecord to " La1 @ Step7 # 7 %G10 " & crlf
-- this executes
getTokens(theRecord)
-- result is {"La1", "Step7", "7", "G10"}
-- -------------------------------------------------------------------------------
-- defines the enhanced REGEX based routine
-- returns a list of 4 strings. Internally the strings
-- have no leading, no trailing spaces AND any internal
-- spaces are respected, but contracted to a single space
-- ------------------------------------------------------------------------------
to getTokens(currentRecord)
set surname to ""
set yearGroup to ""
set firstname to ""
set rollGroup to ""
-- change trailing crlf to cr so Mac friendly
set currentRecord to convert to Mac (currentRecord)
-- use regular expressions to select components, could concatenate to one regular expression
set MetaString1 to re_compile "[ ]*(.*)[ ]*@.*"
set MetaString2 to re_compile ".*@(.*)#.*"
set MetaString3 to re_compile ".*#[ ]*(.*)%.*"
set MetaString4 to re_compile ".*%[ ]*(.*)"
-- save TID and ensure text item delimiter set to ' '
-- the TID enables us to extract a list of words separated by the TID
-- if the TID is ' ' then blanks are squeezed
set {TID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, space}
-- extract surname
set surname to find text MetaString1 in currentRecord using "\\1"
-- chop leading trailing blanks, squeeze internal banks
set surname to (every word of matchResult of surname) as string
-- extract firstname
set firstname to find text MetaString2 in currentRecord using "\\1"
-- chop leading trailing blanks, squeeze internal banks
set firstname to (every word of matchResult of firstname) as string
-- extract group 1 information
set Group1 to find text MetaString3 in currentRecord using "\\1"
-- chop leading trailing blanks, squeeze internal banks
set Group1 to (every word of matchResult of Group1) as string
-- extract group 2 information
set Group2 to find text MetaString4 in currentRecord using "\\1"
-- chop leading trailing blanks, squeeze internal banks
set Group2 to (every word of matchResult of Group2) as string
-- revert TID as good programming practice
set AppleScript's text item delimiters to TID
-- collect items as strings into a list for convenience
return {surname, firstname, Group1, Group2}
end getTokens
Language Interpretation
As a science teacher, a long time interest of mine has been the realisation of a language parser/ evaluator/ display string generator capable of correctly interpreting scientific expressions like: v = u + a * t, including perform dimensional consistency checking and evaluation to a result for the left hand side. This is not a trivial task. Significant time and effort is expended in physics courses at high school and university in this endeavour.
Although this idea may seem obvious to a physics teacher, there are few computer languages or systems incorporating this level of sophistication. And yet it is germane to building useful interactive online learning-science systems.
The main issues are:
Building a SI database to associate the names of the types of variables with a formal syntax of unevaluated internal string representations so that regular expression pattern matching and substitution may be used to resolve the left hand side with each of the terms on the right hand side of the expression. This is similar to the first pass parsing, symbol identification in a 2 -pass computer language compiler.
The interleaving of three distinct domains of discourse (a) numerical expression evaluation (b) SI prefix interpretation and mapping to the numerical domain (c) dimensional analysis and dimensional consistency checking of variables in an expression.
Building a translator and display system ultimately based on a stack evaluator for {length, time, mass, (electrical) current}.
The script below works towards resolving the first two issues.
-- Ian Parker 1998, 2008 first model 13/2/91 (in BBC BASIC)
-- last worked on 2008-03-06
-- requires satimage OSAX for regex analysis
-- evaluates an equation of the form : SI_Dim1 = SI_Dim2 [* | / ] SI_Dim3
-- eg. <acceleration> = <velocity> / <time>
-- and returns an updated database as a recordset
-- ---------------------------------------------------------------------------------
-- construct a basic SI database for augmentation
property SI_symtab : ¬
{{title:"distance", syn:{"length"}, type:"base", formula:"m"}, ¬
{title:"time", syn:{""}, type:"base", formula:"s"}, ¬
{title:"mass", syn:{""}, type:"base", formula:"k"}, ¬
{title:"current", type:"base", formula:"A"}, ¬
{title:"velocity", syn:{"speed"}, type:"distance / time", formula:"(m)/(s)"}, ¬
{title:"acceleration", type:"velocity / time", formula:"((m)/(s))/(s)"}, ¬
-- this type is correct BUT the formula is actually wrong!
-- the system will ask the user to resolve the ambiguity
{title:"force", syn:{"thrust", "push", "pull", "weight"}, type:"mass * acceleration", formula:"((k)*(((m)/(s))/(s)))"}}
-- -----------------------------------------------------------
-- extract item quad: (<lvar token>, <rhs1 token>, <operator>, <rhs2 token>)
set workList to SI_items_of("Force = distance * Acceleration")
SI_process_rhs(workList)
set x to SI_symtab -- return the whole database as a recordset
-- result is {{title:"distance", syn:{"length"}, type:"base", formula:"m"},
-- {title:"time", syn:{""}, type:"base", formula:"s"},
--{title:"mass", syn:{""}, type:"base", formula:"k"},
--{title:"current", type:"base", formula:"A"},
--{title:"velocity", syn:{"speed"}, type:"distance / time", formula:"(m)/(s)"},
--{title:"acceleration", type:"velocity / time", formula:"((m)/(s))/(s)"},
--{title:"force", syn:{"thrust", "push", "pull", "weight"}, type:"distance * acceleration", formula:"(m)*(((m)/(s))/(s))"}}
-- ---------------------------------------------------------------------
--support routines
-- ---------------------------------------------------------------------
on SI_items_of(a_line)
local div_replace, mul_replace, theItems, x
-- lowercase to prevent issues with letter case
set a_line to lowercase a_line
-- ** uses regex based token parsing, not character by character parsing **
set expr1 to re_compile "(.*)[ ]*=[ ]*(.*)[ ]*([/*])[ ]*(.*)" without case sensitive
-- name some synonyms for '*' and '/'
set div_replace to re_compile "divided[ ]*by|over" without case sensitive
set mul_replace to re_compile "by|times|multiplied[ ]*by" without case sensitive
-- standardise the form
set a_line to change div_replace in a_line into " / " with regex
set a_line to change mul_replace in a_line into " * " with regex
-- recompose as a standard list
set theItems to matchResult of (find text expr1 in a_line using "\\1, \\2, \\3, \\4")
-- this part is tricky because of the '/'
if theItems contains "/" then
set x to theItems & " d"
set x to {word 1 of x} & {word 2 of x} & {word 4 of x} & {word 3 of x}
else if theItems contains "*" then
set x to theItems & " m"
set x to {word 1 of x} & {word 2 of x} & {word 5 of x} & {word 4 of x}
end if
return x
end SI_items_of
-- ----------------------------------------------------------
on SI_check_sym_table(entry)
global SI_symtab
local x
set x to 0
-- because small a straight linear search of symtab, symtab is a record
repeat with thisItem in SI_symtab
set x to x + 1 -- we need to know the index value
if entry is (title of thisItem) then return x
end repeat
set SI_symtab to SI_symtab & {{title:entry, syn:" ", type:" ", formula:" "}}
return number of items in SI_symtab
end SI_check_sym_table
-- -----------------------------------------------------------
on SI_process_rhs(workList)
-- create variable lists
local my_result, lhs, rhs1, op, rhs2, checked
set checkedOK to false
set {lhs, rhs1, op, rhs2} to every item of workList --explode the workList string
set templhs to SI_check_sym_table(lhs)
set {temprhs1, temprhs2} to {SI_check_sym_table(rhs1), SI_check_sym_table(rhs2)}
-- cannot proceed, symbol not found on the rhs
if (temprhs1 is number of items in SI_symtab or temprhs2 is number of items in SI_symtab) then
display alert "Error: Unknown symbol on right hand side "
return
end if
set rhs1result to (formula of item temprhs1 of SI_symtab) --
set rhs2result to formula of (item temprhs2 of SI_symtab)
if op is "m" then set op to "*"
if op is "d" then set op to "/"
set lhsresult to formula of last item of SI_symtab
-- look at new definition, consider the reverse unless we reduce before comparison
set tryformula to "(" & rhs1result & ")" & op & "(" & rhs2result & ")"
-- build entry, check formulas, then add to SI_symtab if not in agreement
set checkrhsformula to "((" & rhs1result & ")" & op & "(" & rhs2result & "))"
--do the formulas agree?
if lhsresult is checkrhsformula then
set checkedOK to true
else
if lhsresult = " " then set lhsresult to "'undefined'"
display alert "New definition for " & lhs & ":
" & checkrhsformula & " was " & lhsresult buttons {"OK", "Cancel"} default button 2
if (button returned of result) is "OK" then
set formula of item templhs of SI_symtab to tryformula
set trytype to rhs1 & " " & op & " " & rhs2
set type of item templhs of SI_symtab to trytype
end if
end if
-- lhsresult is old, checkedformula is new
return {checkedOK, lhsresult, checkrhsformula}
end SI_process_rhs
Slash it, Mash it, Mashups...
The Internet has invented the ‘mashup’. It is not new; the only thing new is the name. Many internet news providers have been at it for some time. Teachers are really familiar with printed mashups. A mashup takes content from a number of sources and brings it all together, in a new container. The look and feel might change, but the (selected) meta-content remains the same. I have to say that because usually the actual data changes. We sometimes call these collected teaching resources for a unit.
How would you begin the process of building a mashup?
The easy part is to determine where the mashup will fit into your teaching and learning programme. The second most difficult part is to design a web page. These two are familiar to many.
Then comes the most difficult part, finding reliable sources for each container on the page. You will invariably have to extract data from the source, and slice and dice it to fit your requirements. However, some service providers are now supplying web page feeds specifically for embedding in mashup containers. Google even supplies application programming interfaces (APIs) for a number of their free offerings.
As an example, you’re putting together a unit on weather, and want to have at least one page that automatically collates relevant up-to-date data and possibly graphs it for you in ways that make sense to you.
An example of a source for a mashup (that has connotations of dynamic Web data, ie. that changes periodically) is data from the Australian Bureau of Meteorology( Aust BOM).
A student could manually go to Aust BOM pages (via searching for useful information using their favourite internet search engine) and extract (or copy and paste) raw data using their favourite word processor, they then rearrange the data for personal use and hand it to their teacher, who, in the circular process, has probably done most of the work.
For a teacher, another approach is to build at least one page, using mashup technology. I’m sure that in the near future we will see applications for building a mashup, but for now...
This is a simple script that does the extracting, slicing AND the dicing of a portion of an Aust BOM page providing weather forecasts for the Illawarra and Southern Highlands of NSW. The end result of running this script is three web page files. Two of them could be embedded in another web page... a mashup. I had no idea that this would be so difficult using REGEXP! The main issue is that the weather data has several variants...
Now, if we could do the same for UV forecasts, thunderstorm forecasts, and put these all together on the front page of my T&L website... that would be useful. Remember, for a mashup, the actual data may change but the source and look and feel will remain the same.
-- Vers 1.1 -get current weather forecast from the Aust Bureau of Meteorology(BOM) -- for Illawarra and Southern Highlands, NSW -- Ian Parker, first worked on: 2008-03-23 last worked on: 2008-03-31 -- requires SATIMAGE OSAX to be installed, for REGEXP -- requires user currently connected on-line to the Internet -- web pages on current user Desktop, new forecasts at 5:15 am, 4:05 pm -- store in a fixed location on a local server as source for LAN users, for a simple mashup set myWeather1 to "ISHweather.htm" set myLoc1 to (path to desktop as string) & myWeather1 -- sliced data file from BOM site, not useful as is... set myWeather2 to "ISHweather1.htm" set myLoc2 to (path to desktop as string) & myWeather2 -- the diced and condensed version ready for mashup use set myWeather3 to "ISHweather2.htm" set myLoc3 to (path to desktop as string) & myWeather3 -- get weather data from the internet, ie. real CURRENT forecast try -- to start a session... if not possible then skip the lot! tell application "URL Access Scripting" -- weather forecast, auto replacing old files set useFile to download ¬ "http://www.bom.gov.au/products/IDN10061.shtml" to myLoc1 replacing yes end tell -- copy the (HTML) text file into string for slicing set theFile to open for access myLoc1 set the_Text to read theFile close access theFile try -- to extract useful section of text from dowloaded file using SATIMAGE OSAX set thePattern1 to re_compile "ILLAWARRA" -- start of useful section set usefulData1 to find text thePattern1 in the_Text with regexp set thePattern2 to re_compile "IDN1006102" -- end of useful section set usefulData2 to find text thePattern2 in the_Text with regexp -- slice off some begin noise, and end noise -- stop pattern being greedy set usefulString to extract string the_Text ¬ from ((matchPos of usefulData1) + 77) to ((matchPos of usefulData2) - 500) -- maintain original HTML data layout, replace the < pre>..< /pre> tags for browser -- this artificial chop is to placate the wiki set usefulString to "<P" & "RE><B>ILLAWARRA</B><BR>" & usefulString & "</" & "PRE> " as text -- save to a new HTML file (on desktop) set theFile to open for access myLoc2 with write permission write usefulString to theFile close access theFile on error display alert "Download file: No useful data" as critical close access theFile end try try -- Safari shows how it would be displayed in a large mashup container? tell application "Safari" to open myLoc2 -- --------------[ optional to extract essential forecast data] ---------------------------------- -- dice for only the data giving a string: Bowral & Wollongong try -- to dice for first Wollongong entry with Min: and Max: data set thePattern4 to re_compile "Wollongong :(.*)Min:.*([ |-][0-9]+).*Max:.*([ |-][0-9]+).*" set usefulData2 to find text thePattern4 in usefulString ¬ using "Wol:\\1 \\2 ºC to \\3 ºC" with regexp on error -- to dice for first Wollongong entry with Min: data missing set thePattern4 to re_compile "Wollongong :(.*)Max:.*([ |-][0-9]+).*" set usefulData2 to find text thePattern4 in usefulString ¬ using "Wol:\\1 to \\2 ºC" with regexp end try try -- to dice for first Bowral entry with Min: and Max: present set thePattern3 to re_compile "Bowral :(.*)Min:.*([ |-][0-9]+).*Max:.*([ |-][0-9]+).*</" set usefulData1 to find text thePattern3 in usefulString ¬ using "Bow:\\1 \\2 ºC to \\3 ºC" with regexp on error -- to dice for first Bowral entry with Min: data missing set thePattern3 to re_compile "Bowral :(.*)Max:.*([ |-][0-9]+).*</" set usefulData1 to find text thePattern3 in usefulString ¬ using "Bow:\\1 to \\3 ºC" with regexp end try -- usefulData1-> Bowral forecast string --usefulData2 ->first Wollongong forecast string set weatherForecast to "Tomorrow" -- check for 5:15 am forecast for today or assume for tomorrow if the_Text contains "515" then set weatherForecast to "Today" -- note line continuation char (opt-return)at end of line set weatherForecast to weatherForecast & "<BR>" & ¬ matchResult of usefulData1 & "<BR>" & matchResult of usefulData2 -- write the one liner to a text file with HTML formatting to 3 liner HTML try -- to delete the exiting file first to ensure file create, modify correct tell application "Finder" to delete alias myLoc3 end try set theFile1 to open for access myLoc3 with write permission write weatherForecast to theFile1 close access theFile1 tell application "Safari" to open myLoc3 -- to visually check result on error -- alert user if weather data could not be extracted display alert "Extracted data file: cannot get useful data" as critical close access theFile1 end try on error -- alert user if session could not be started display alert "Aust. BofM not responding: check connection to Internet" as critical end try
UV to the Max - updated
Each day, the Australian Bureau of Meteorology(Aust. BoM) supplies a large amount of aggregated data. Here is a script to get next day forecast of UV maxima for a selected town from the Aust BoM.
We can embed try...on error.. end try blocks, allowing the script to gracefully degrade its performance. Besides using the http:// protocol the download command can also access ftp sites, especially public sites. To compress the data and recompose it we use some date and time slicing and dicing. The “test harnass” lines show how to construct simple record structures in AppleScript. The original line for Bowral was
0019 068102 94747 Bowral 26 10 2010 UV Alert from 8.40 to 16.40 Max: 13
The result is an internally held list
{"Bowral", "8:40:00 AM", "4:40:00 PM", "UV Max:13"}
And the script to do this ...
-- Ian Parker Vers 1.1 first worked on 2008-10-10, last worked on 2010-10-25
-- extract a list of useful data from a live ftp site: BOM, Australia
global theFile
set {myLoc3, UVfileName} to {path to desktop as string, "IDYGP007.txt"}
-- the Aust. BoM page for next day UV forecasts
set UVDataFile to "ftp://ftp2.bom.gov.au/anon/gen/fwo/IDYGP007.txt"
try -- to ...
tell application "URL Access Scripting" -- then
set UVfile to download "ftp://ftp2.bom.gov.au/anon/gen/fwo/IDYGP007.txt" to ¬
myLoc3 & UVfileName replacing yes
end tell
try -- to get lines for bowral
set theFile to open for access UVfile
set UVBowral to UVstringFor("bowral") -- replace with any valid town name (case unimportant)
close theFile
on error
display alert "UV data file: " & myLoc3 & UVfileName & " missing" as critical
end try
on error
display alert "Cannot connect to Bureau of Meteorology(BOM)- Australia, check Internet connection" as critical
end try
UVBowral
-- result is {"Bowral", "8:40:00 AM", "4:40:00 PM", "UV Max:13"} -- for 2010-10-26
-- ----- [ conditional read of text file ]-------------------------------
on UVstringFor(the_loc)
set target to ""
-- search for, and slice line from the text file
repeat while (the_loc is not in target)
-- this is terminated by a newline
set target to read theFile before "
" -- slurp in a line as per readline
end repeat
set target to every word of target -- split the line on spaces
tell target to set loc to item 4
set earlyUV to time string of date (item 11 of target & "am") -- dice the time
set lateUV to time string of date (item 13 of target & "am") -- dice the time (wierd!)
tell target to set UVMax to item 15
tell target to set thedate to item 7 & "-" & item 6 & "-" & item 5
-- recompose, and return list of strings with ISO date formats
return {thedate, loc, earlyUV, lateUV, "UV Max:" & UVMax}
end UVstringFor