Error-freeness per kilowatt-hour: A Proposed Metric for Machine Learning

Abstract

Accuracy on well-known problems is widely used as a measure of the state of the art in machine learning. Accuracy is a good metric for algorithms in a world where energy has negligible cost. We do not live in such a world.

I propose an alternative metric, error-freeness per kilowatt-hour, which improves on accuracy by trading off accuracy against energy efficiency in a useful way. It has the desirable properties of approximate linearity in the relevant ranges (1 error per 1,000 is 10 times better than 1 error per 100) and of weighting energy use in a way that accounts for the cost of training as a realistic fraction of a delivered service. Error-freeness per kWh is calculated as e = 1/(1 + g - Accuracy)/(h + training time in hours * (GPU+CPU wattage)/1000). The granularity g is the point of diminishing returns for improving accuracy. The overhead h is the energy cost of delivering a software service with no ML training. The parameters may be tuned to circumstance, but as a general-purpose metric, I propose that g=1/100,000 and h=100kWh are good human-scale, commercially relevant parameter values.

Detail

Where training is very expensive, it is not helpful to score machine learning algorithms on accuracy alone, with no account taken of the resources consumed to train to the level reported. In a competitive setting, it biases to the richest player; in the global, or society-wide, or customer-focussed setting, it ignores a real cost. This leads at best to sub-optimal choices and at worst to a growing harm.

I suggest that a useful, general-purpose metric has the following characteristics:

  1. For gross errors, it is linear in the error rate. Halving the error doubles
    the score.
  2. For very small errors, improving the error rate even to perfection adds only incremental value. Perfection is only notionally better than an error rate so small that one error during the application's lifetime is unlikely.
  3. For extremely large energy consumption, the financial cost of the delivered service becomes proportional to the energy cost. We are concerned that the total human cost of energy use is very much disproportionate, in that emissions from increased energy consumption are an existential threat to the human race. For much less extreme energy consumption we might nonetheless accept the linear financial cost of energy as a proxy for the real cost.
  4. For small energy consumption, the energy cost of training becomes an insignificant fraction of the whole cost of delivering a software service. The parameter h represents the energy cost of a service that uses no training.

Error-freeness per kWh can now be formulated as:

e = 1 / (1 + g - Accuracy)
      / (h + training time in hours * (GPU + CPU training wattage) / 1000)
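
As a sketch of how the score might be computed (the function and parameter names below are my own illustration, not an established library), with the proposed default parameters built in:

    def error_freeness_per_kwh(accuracy, training_hours, total_kw, g=1e-5, h=100.0):
        """Error-freeness per kWh for one training run.

        accuracy       -- fraction in [0, 1], e.g. 0.99
        training_hours -- wall-clock hours for a single training run
        total_kw       -- combined GPU + CPU draw in kilowatts
        g              -- granularity, default 1/100,000
        h              -- service overhead energy in kWh, default 100
        """
        training_kwh = training_hours * total_kw
        return 1.0 / (1.0 + g - accuracy) / (h + training_kwh)

    # e.g. 100 hours on ten 0.535 kW servers at 99% accuracy:
    # error_freeness_per_kwh(0.99, 100, 10 * 0.535)  ->  roughly 0.16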

Setting the Parameters g and h

The granularity g

For general-purpose, human-scale and commercial purposes I suggest that a granularity of g=1/100,000 is the level at which halving the error rate grants only incremental extra value. It is about the level at which human perception of error takes real effort. Consider a 1m x 10m jigsaw of 100x1,000 pieces in which 1 piece is missing. An observer standing back to see the entire 1m x 10m work will not see the error. They would have to spend effort searching the 10-meter length of the jigsaw.

Changing g by an order of magnitude either way makes little difference to scores until accuracy approaches 99.999%. So an alternative way to think about g is:

“If you can tell the difference between accuracy of 99% versus 99.9%, but cannot tell the difference between accuracy of 99.99% and 99.999%, then your granularity g is smaller than 1/1,000 but not smaller than 1/100,000.”
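
To put numbers on the claim that g matters little until accuracy approaches 99.999% (a quick sketch; the energy term is held fixed so only the error term varies):

    # Score is proportional to 1/(1 + g - accuracy) when the energy term is fixed.
    for accuracy in (0.99, 0.99999):
        print(accuracy, [round(1 / (1 + g - accuracy)) for g in (1e-4, 1e-5, 1e-6)])
    # 0.99    -> [99, 100, 100]         g makes little difference
    # 0.99999 -> [9091, 50000, 90909]   g dominates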

The software overhead energy cost, h

We estimate the energy cost of an algorithm-centric software service as follows:

  • A typical single core of cloud compute requires 135W [1], for an energy cost of about 1.0 MWh per year per server.
  • The software parts of a service in a fast-moving sector (and “being a candidate for using ML” currently all but defines fast-moving sectors) have a typical lifespan of about 1 year. (The whole service may last longer, but as with the ship of Theseus, the parts do not).
  • A typical size of service that uses a single algorithm is 4 cores plus 4 more for development and test. (Larger services will use more algorithms. We want the cost of a service of a size that uses only one algorithm).
  • 8 such cores running 24/7 for 1 year is 8MWh.

We should set h to some fraction of 8MWh. There is little gain in attempting a more accurate baseline for general-purpose use. See the supplementary discussion below. We set that fraction based on 2 considerations.

Those 8 cores are often shared with other services, both in cloud-compute and self-hosted deployments. The large majority of the world's systems—anything outside the global top 10,000 websites—have minimal overnight traffic; office-hours usage is more realistic. Anything from 1% to 99% of a CPU-year might be a realistic percentage, the lower figure for virtual cloud computing and the higher for dedicated hardware.

It is pragmatic to measure training time as the time for a single training run, rather than imagining developers keep a careful record of every full or partial training run during development. We can more properly account for the total energy cost of all training time by dividing 8MWh by a typical number of training runs. If the final net takes 100 hours to train, it may have taken 10 or 10,000 training runs to settle on that net, through development, hyper-parameter tuning, multiple runs for statistical analysis, comparison with alternatives and so on. For a researcher, 1,000 training runs may be too few, whereas for a commercial team doing only hyper-parameter tuning and testing, 50 runs might be more than enough. It is the widespread commercial usage that concerns our metric.

Combining the shared CPU usage with a typical number of training runs, one might argue for any fraction of 8MWh as typical, from 1/20th to 1/1000th. I propose a broad-brush rule of thumb setting h = 1/80th of 8MWh, or 100kWh. For large projects, it is simple to set g and h to values based on an actual business case and costs.
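
The arithmetic behind that rule of thumb, as a quick sketch (the fractions are the ones discussed above):

    # Fractions of the 8 MWh baseline discussed above.
    baseline_kwh = 8000
    for fraction in (1/20, 1/80, 1/1000):
        print(round(baseline_kwh * fraction), "kWh")
    # prints 400 kWh, 100 kWh (the proposed h) and 8 kWh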

Proposed general purpose parameter values

This gives us standard parameters for error-freeness per kWh of

g=1/100,000
h=100kWh

Examples

A net is trained for 100 hours on a grid of ten 135W servers, each with a 400W GPU (i.e. 0.535 kW per server), and achieves an accuracy of 99%:

  • e = 1/(1+ g - 0.99)/(h + 100*10*.535) = 0.16.
  • It reaches accuracy=99.5% by quadrupling the number of servers: e=0.09.
  • A different algorithm for the same task achieves 99.4% on the original 10 servers: e=0.26.

On a different task, a net required only 10 hours on just a single server and GPU to reach 99% accuracy:

  • e = 1/(1+ g - 0.99)/(h + 10 * .535) = 0.95.
  • It reaches accuracy=99.5% by quadrupling training time to 40 hours. e=1.64.
  • A different algorithm achieves 99.4% in the original 10 hours. e=1.58.
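
These figures can be reproduced with a few lines of Python (the same formula as above; the wattages and durations are those stated in the examples):

    def e(accuracy, training_kwh, g=1e-5, h=100.0):
        return 1 / (1 + g - accuracy) / (h + training_kwh)

    # First task: 100 hours on ten 0.535 kW servers = 535 kWh per run
    print(round(e(0.99,  100 * 10 * 0.535), 2))   # 0.16
    print(round(e(0.995, 100 * 40 * 0.535), 2))   # 0.09  quadruple the servers
    print(round(e(0.994, 100 * 10 * 0.535), 2))   # 0.26  different algorithm

    # Second task: 10 hours on one 0.535 kW server = 5.35 kWh per run
    print(round(e(0.99,  10 * 0.535), 2))         # 0.95
    print(round(e(0.995, 40 * 0.535), 2))         # 1.64  quadruple the training time
    print(round(e(0.994, 10 * 0.535), 2))         # 1.58  different algorithm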

In the first case, training costs ½ a megawatt-hour per run (around £4,000 for 40 training runs at UK 2020 energy prices) and the energy cost is well reflected in the score: we may consider the halving of the error rate not worth the quadrupling of energy cost. In the second case, where the training cost is a small fraction (around £40 for 40 training runs) of the cost of the delivered service, even a small gain in accuracy outweighs a quadrupling of energy cost.

Conclusion

When you measure people's performance, “what you measure is what you get”. People who are striving for excellence will measure their success by the measure you use. By promoting a metric that takes explicit account of energy usage, we create a culture of caring about energy usage.

The question this metric aims to answer is, “Given algorithms and training times that can achieve differing accuracy levels for different energy usage, which ought we to choose?” The point is to focus our attention on this question, in preference to letting us linger on the increasingly counter-productive question, “What accuracy score can I reach if I ignore resource costs?”

Because the parameters are calibrated for real-world, general-purpose use, this metric gives a useful insight into the value versus energy cost of deploying one algorithm rather than another.

Supplementary Discussion [Work In Progress] – the energy cost of a software service

To ask for the energy cost of a deployed software service is like asking for the length of a piece of string. In the absence of a survey of systems using ML, the calculation given is anecdotal on 3 points: How big a service does a typical single ML algorithm serve; what is the lifespan of such a service; for what fraction of that lifespan is the service consuming power?

In 1968, typical software application lifespan was estimated at 6-7 years [2], but a single service is a fraction of such an application, and the churn of software services has increased with the ease of development and replacement. I propose 1 year or less is a realistic lifespan for an algorithmic service in a competitive commercial environment.

The figure of 4 cores for a service arises from considering that although the deployed algorithm may only use a single core (or one low-power GPU), a service is typically deployed as part of an application with a user interface and some persistence mechanism. A whole service might then use 2 cores (for a monolithic deployment with redundancy) or 6 or more (for a multi-tier service with redundancy). Anything bigger is likely already part of a larger application, unconnected to the work of machine learning. We can reasonably set the boundary for “that part of the system which we are only shipping because we have an algorithm to power it” at no bigger than that.

The figure of 135W for a single-socket server might, in the context of efficiency-driven cloud computing, be discounted by 99% or more for low-usage services sharing hardware and consuming zero energy when not in use. Setting h = 1/80 rather than, say, h = 1/500 probably represents very heavy usage.

Other links

On data centre power usage: https://davidmytton.blog/how-much-energy-do-data-centers-use/

1. https://eta.lbl.gov/publications/united-states-data-center-energy#page-9

2. https://mitosystems.com/software-evolution/

no magic

“If science shows we are composed of trillions of cells and no ‘magic ingredients’ then…”

The thought is blind to the fact that if there are such ingredients then they are ipso facto invisible to scientific tools. Your first-person subjective experience, for instance, is invisible to science. It can only be accessed by asking you to report it.

Physicalism, as an attempt to explain all of reality, has a selective vision problem: it rules out anything it can't see.

fish shell quickstart for converting bash scripts

After some years of bash and PowerShell, and some hours of using fish, I've realised that expansion & predictive typeahead are good features in a shell, whereas “be a great programming language” is less important than I thought: there is no need to write scripts in the language of your shell.

Fish has slicker typeahead and expansions than bash or even PowerShell. But to switch to a fish shell, you do still have to convert your profile & start-up scripts. So here's my quick-start guide for converting bash to fish.

  • Do this first: at the fish prompt type help. Behold! the fish documentation in your browser is much easier to search than man pages are.
  • Calmly accept that fish uses set var value instead of var=value. Roll your eyes if it helps.
  • Use end everywhere that bash has fi, done, esac, braces {} etc. e.g. function definition is done with function ... end. The keywords do and then are redundant everywhere; just remove them. On a single line, else is followed by a semicolon. case requires a leading switch (expr).
  • There is no [[ condition ]] but [ ... ] or test ... work. Type help test to see all the file and numeric tests you expect, such as if [ -f filename ] etc. string and regex conditionals are done with the string match command (see below). You can replace [[ -f this && -z that || -z other ]] with [ -f this -a -z that -o -z other ] but see below for how fish can also replace || and && constructions with or and and statements.
  • But first! type help string to see the marvels of proper built-in string commands.
  • Replace function parameters $*, $1, $2 etc with $argv, $argv[1], $argv[2] etc. If that makes you scowl, then type help argparse. See! That's much better than kludging about in bash.
  • Remove the $ from $(subcommand) leaving just (subcommand). Inside quotes, take the subcommand outside the quote: "Today is $(date)" becomes "Today is "(date). (Recall that quotes in bash & fish don't work at all like quotes in most programming languages. Quote marks are not token delimiters and a"bc"d is a valid single token and is parsed identically to each of abcd , "abcd", abc'd').
  • Replace heredocs with multi-line literal strings and standard piping syntax. However, note that if you pipe or read to a variable, the default multiline behaviour is to split on newline and generate an array. Defeat this by piping through string split0 – see https://fishshell.com/docs/current/index.html#command-substitution

Search-and-replace Script Snippets

Here is my hit-list of things to search and replace to convert a bash shell to fish. These resolved almost all of my issues in converting a few hundred lines of bash script to fish.

Each entry shows the bash form, then the fish form after the arrow, with notes:

  • var=value → set var value
  • export var=value → set -x var value
  • export -f functionname → redundant; just remove it
  • alias abbr='commandstring' → no change; alias syntax is accepted as an abbreviation for a function definition since fish 3
  • command $(subshell command) or command `subshell command` → command (subshell command), or command (subshell command | string split0). Just remove the $ but keep the (). See below for when you want to add string split0.
  • command "$(subshell command)" → command (subshell command). Remove both the $ and the quotes "" to make this work.
  • if [[ condition ]] ; then this ; else that ; fi → if [ condition ] ; this ; else ; that ; end. See below for more on fish's multiline and and or syntax.
  • if [[ number != number ]] ; then this ; else that ; fi → if [ number -ne number ] ; this ; else ; that ; end
  • while condition ; do something ; done → while condition ; something ; end
  • $* → $argv
  • $1, $2 → $argv[1], $argv[2]. But see help argparse.
  • if [[ testthis =~ substring ]] → if string match -q '*substring*' testthis. string match without -r does glob-style testing.
  • if [[ testthis =~ regexpattern ]] → if string match -rq regexpattern testthis. string match with -r does regex testing.
  • [ guardcondition ] && command and [ guardcondition ] || command → work as-is. But see or and and below for when it's more complex.
  • var=${this:-$that} → if set -q this ; set var $this ; else ; set var $that ; end
  • cat > outfile <<< "multiline … heredoc" → echo "multiline … heredoc" | cat > outfile. No heredocs, but multiline strings are fine. NB printf is better than echo for anything complicated, in any shell.
  • if [[ -z $this && $that =~ $pattern ]] → if [ -z $this ] ; and string match -rq $pattern $that
  • content=$(curl $url) → set content (curl $url | string split0). Without the pipe to string split0, content will be split on newlines into an array of lines.

Fish's multiline and and or syntax

Fish has a multiline and and or syntax that may be clearer than && and || in both conditionals and guarded commands. It is less terse.

[ condition ]
and do this
or do that

That said, && and || are still valid in commands:

[ condition ] && do this || do that

Other gotchas

  • You may have to read up on how fish does parameter expansion, and especially handling spaces, differently to bash.
  • Pipe & subcommand output to multiline strings or arrays: set x (cat myfile.txt) will set x to an array of the lines of myfile.txt. To keep x as a single multiline string, use string split0: set x (cat myfile.txt | string split0)

Official tips for new fishers:

See the FAQ at https://fishshell.com/docs/3.0/faq.html

Tower of Babel

Now—all the earth one language and one speech—and as they set off eastward they found a plain in the land of Shinar and became citizens in that place.

They said each to his neighbour 
Come! Let us make bricks and burn them with fire.
And bricks were for them stone,

and asphalt was for them mortar
And they said Come! Let us build for ourselves

a City-and-Tower
And its head in the heavens,
And let us make a name for ourselves
Lest we be scattered on the face of the earth.
…
     And the LORD came down to see 
      the city and the tower 
      which the sons of man built …
And the Lord said, “See, one people one language, all of this, and this their start of work
And nothing will be impossible for them,

all they plan they will do
Come, let us go down to that place

and mix up their language
That they will not hear,

each the speech of his neighbour
So the Lord scattered them from there

over all the earth,
and they stopped building the city.

That is why it was called Babel, because there the Lord confused the language of the whole world and from there the Lord scattered them over the face of the whole earth.” – Genesis 11.

Main structure

The so-called “chiastic” or crossover structure, in which the second half of a story or section (or even a single sentence) mirrors & reverses the first half, is a common structure in the Bible's literature.

The structure exposes the themes. The opening and closing sentences tell us the theme of the story and when you contrast the opening with the close you see how the turning point — often the exact middle sentence — has changed things.

The rest provides the detail. Comparing the detail of the first half with the detail of the last half shows what has changed in the light of the central turning point.

Reversals abound. The united language is disunited. The settling together is reversed by scattering. They want to build up to heaven, but instead God comes down from heaven. They want to make a name for themselves but instead are confused.

Less obvious is the importance of the place name. Babel is not named at the beginning because it serves as a pun for “Balel”—to confuse—at the end, which is appropriate after the turning point, not before it. Before then it is referred to as 'that place' in Shinar.

Second Structure

As well as this main structure, there is a second parallel structure between the two halves. The parallels rest as much on the words and sounds as the meaning, and again v5 is the mid-point:

v1
One language One Speech
In That Place
They speak, each to his neighbour
Build a City and a Name
Lest We are Scattered over the face of the whole earth
v5
And the LORD came down to see the city and the tower which the sons of man built.
v6
One People One Language
In That Place
They cannot hear, each the speech of his neighbour
Stop Building the City, 'great' Babel
Scattered Over the face of the whole earth

In this structure we can see that the second half of the story repeats, in the same order, the vocabulary of the first half.

Third Structure

The parallel structure can be folded one more time into a third, ‘anti-parallel’ structure:

v1
One language One Speech
  In That Place
    They speak, each to his neighbour
  Build a City
Lest We are Scattered over the face of the whole earth

v5
And the LORD came down to see the city and the tower which the sons of man built.

v6
One People One Language
  In That Place
   They cannot hear! each his neighbour
  Stop! Building the City
Scattered Over the face of the whole earth

The point here is that the first half of each parallel half is parallel-by-similarity (One language; one speech; in that place); but the second half is parallel-by-contrast. Each half is a mini-chiasm on the theme of unity vs scattering. The turning point inside each half is speech: successful in the first half but unsuccessful in the second half. In this structure, the theme is speech (successful vs unsuccessful) whilst the place and the city serve as examples of what might have been when people are united in speech.

Words

We mentioned that Babel is punned in Hebrew as Balel, to mix or confuse. Invisible in English is the more extended alliteration, in the short speech of v3-4, of the consonants n, b and l in the words for “come”, “let us build”, “brick” and “stone”. This same alliteration is picked up again in the pun in verses 7 & 9 on “let us confuse” (nbl), Babel (bbl) and “he confused” (bbl).
[Hebrew was first written with just consonants not vowels; the letters h & m in this section are mostly parts of grammar not vocabulary]

Archeology

You can see pictures of Babylonian & nearby towers on Wikipedia:

https://en.wikipedia.org/wiki/Ziggurat

The oldest surviving one is dated to about 3000 BC, but similar earlier structures have been suggested as early as 6000 BC.

Meaning

The Genesis text seems easy enough to interpret: The towers were intended to reach, figuratively at least, to the heavens. The “name” in “Make a name for ourselves” should be understood as fame or reputation.

Polemics?

The Genesis text suggests the idea of men reaching heaven and makes no mention of the polytheist religion of Babylon. On the other hand:
• Babylonian texts probably consider the name Babel to derive from “Gate of god”.
• Herodotus says the top of the ziggurat was a shrine for the dwelling of gods.
• The Enmerkar epic has the confusion of languages being due to Enki (a senior god) making mischief and suggests that in the future (possibly the past; interpretation is uncertain) the languages will be united again.
• The main 1st millennium temple to Marduk in Babylon was Esagil – “house with the uplifted head” – and was next to the (probably 2nd millennium) Etemenanki – “Temple of the Foundation of Heaven and Earth”.
• The Enuma Elish considers the Babylonian temple to be “a likeness on earth of what he has wrought in heaven”. Indeed, it says it was built by the minor gods, the Annunaki:

The Annunaki wielded the hoe
For one whole year they moulded its bricks.
When the second year arrived
they raised the head of Esagil, a replica of the Apsû.
They built the lofty ziggurat of the Apsû
and established its … as a dwelling for Anu, Enlil and Ea [3 of the main gods].

Politically, the Babylonian empire was a major power for much of the period from the time of Genesis 12 in the 2nd millennium BC through to Babylon's defeat by Persia in 539 BC.

All of which raises the question whether early readers understood the story as a polemic against the power and religion of Babylon. Like all empires, Babylon thought itself the centre of the world and the divinely blessed pinnacle of humanity.

But the Genesis story mocks. The tower to the heavens is so small that God has to come down to see it. All-conquering, empire-building Babylon was once defeated by a little trick of speech; the resurgent Babylonian empires of the readers' times should be taken no more seriously.

Readers with their eyes open will be well aware that the impressive structures of Babylon—look again at those pictures on Wikipedia—were, like the monumental architecture of every other empire in history, built on the back of slaves, paid for by conquest, murder and theft. What is alleged to be the impressive demonstration of united humanity is in reality a testament to oppression & forced labour.

This is not the point made in the text, however. Rather, the point made by God's interference is, perhaps, the foolishness of human boasting. They who think themselves great achievers do not notice how contingent their achievements are. They who aspire to fame and monumental achievement should realise how futile those things are.

References

• Enmerkar https://www.britannica.com/biography/Enmerkar
• Esagila https://en.wikipedia.org/wiki/Esagila
• Etemenanki https://en.wikipedia.org/wiki/Etemenanki
• Enuma Elish 6:59-64 : Full text at https://www.ancient.eu/article/225/enuma-elish---the-babylonian-epic-of-creation---fu/
• Babel as Gate of God: Wenham's Genesis commentary quoting Gelb, I. J. “The Name of Babylon.” Journal of the Institute of Asian Studies 1 (1955) 1-4