Tag Archives: probability

How to estimate uncertain data

Data Estimator is a tool that helps answer questions about uncertain quantities, eg. “What will our company’s sales be next year?”

It is designed to be used as part of an interview process, where expert judgements are drawn out and quantified.

It’s a reminder that in this world of big data, some things remain hard to measure, especially when it comes to the future.

You will be asked a handful of questions, using “probability wheels” like this to visualise the uncertainties:

probability wheel

When you’re done, you will be able to see and export the resulting probability distribution. For example, the result could be “there is a 60% chance this market will more than double in five years, a 20% chance it will more than treble, but a 10% chance it will shrink.”

probability density

It will also show some alternatives for you how you could place the uncertainty in a decision tree, eg.

decision tree node

Try it out here: http://racingtadpole.com/more/estimator/

  

Curve fitting with javascript & d3

If javascript is up to amazing animations and visualizations with d3, maybe it’s up to non-linear curve fitting too, right?

Something like this, perhaps:



Here’s how I did it:

  • The hard work is done using Cobyla (“Constrained Optimization BY Linear Approximation”), which Anders Gustafsson ported to Java and Reinhard Oldenburg ported to Javascript, as per this stackoverflow post.
    Cobyla minimises a non-linear objective function subject to constraints.
  • The demo and its components (including cobyla) use the module pattern, which has the advantage of keeping the global namespace uncluttered.
  • To adapt Cobyla to this curve fitting problem, I wrote a short wrapper which is added onto the cobyla module as cobyla.nlFit(data, fitFn, start, min, max, constraints, solverParams). This function minimises the sum of squared differences (y1^2-y2^2) between the data points, (x,y1), and the fitted points, (x,y2).
  • The Weibull cumulative distribution function (CDF), inverse CDF and mean are defined in the “distribution” module. Thus distribution.weibull([2,1,5]) .inverseCdf(0.5) gives the median (50th percentile) of a Weibull distribution with shape parameter 2, scale parameter 1 and location parameter 5.
  • The chart is built with d3. I am building an open-source library of a few useful chart types, d3elements, which are added onto the d3 module as d3.elts. This one is called d3.elts.xyChart.
  • So the user interface doesn’t get jammed, I use a javascript web worker to calculate the curve fit. I was surprised how easy it was to set this up.
  • I apologise in advance that this sample code is quite complicated. If you see ways to make it simpler, please let me know.
  • Finally, this may be obvious, but I like the rigour that a tool like jshint imposes. Hence the odd comment at the top of fitdemo.js, /* global fitdemo: true, jQuery, d3, _, distribution, cobyla */

Check out the source code on bitbucket here. You can see it being used for uncertain data estimation here.

Please let me know what you think!