Preliminary Results from Weight Agnostic Training

Home » News » Preliminary Results from Weight Agnostic Training

Following on from my last post, below is a selection of the typical resultant output from the Bayesopt Library minimisation

    3    3    2    2    2    8   99   22   30    1
    3    3    2    3    2   39    9   25   25    1
    2    2    3    2    2   60   43   83   54    3
    2    1    2    2    2    2    0   90   96   43
    3    2    3    2    2    2    2   43   33    1
    2    3    2    3    2    2    0   62   98   21
    2    2    2    2    2   18   43   49    2    2
    2    3    2    4    1    2    0   23    0    0
    2    2    1    2    3    2    0   24   63   65
    3    2    2    2    3    5   92   49    1    0
    2    3    2    1    1    7   84   22   17    1
    3    2    4    1    1   46    1    0   99    7
    2    2    3    2    2    2    0   74   82   50
    3    3    2    2    2   45   14   81   23    2
    2    3    3    2    2    2    0   99   79    4
    2    2    2    2    2    2    0    0   68    0
    3    3    3    2    2   67   17   37   84    1
    3    2    3    2    2   24   39   56   55    1
    3    3    4    3    2    2   30   62   67    1
    2    2    2    2    2    2    1    0    4    0
    2    2    2    2    2    2    9    8   45    1
    2    3    3    2    2   48    1   18   28    1
    2    3    3    2    2    2    0   34   42   18
    2    2    2    3    2    2    0   70   81   10
    2    2    3    3    2    2    0   85   23   11

where the rows are separate run's results, the first five columns show the type of activation function per layer and the last five columns show the number of neurons per layer ( see function code in my last post for details. )

Some quick take aways from this are:

  1. the sigmoid activation function ( bounded 0 to 1 ) is not favoured, with either the Tanh or “Lecun” sigmoid ( see section 4.4 of this paper ) being preferred
  2. 40% of the network architectures are single hidden layer with just 2 neurons in the hidden layer 
  3. 8% have only two hidden layers with the second hidden layer having just one neuron

I would say these preliminary results suggest that a deep architecture is not necessary for the features/targets being tested and obviously the standard sigmoid/logistic function should be avoided.

As is my wont whilst waiting for lengthy computer tests to complete I have also been browsing online, motivated by the early results of the above, and discovered Random Vector Functional Link Networks, which seem to be a precursor to Extreme Learning Machines. However, there appears to be some controversy about whether or not Extreme Learning Machines are just plagiarism of earlier ideas, such as RVFL networks. 

Readers may remember that I have used ELMs before with the Bayesopt library ( see my post here ) and now that the above results point towards the use of shallow networks I intend to replicate the above, but using RVFL networks. This will be the subject of my next post.   

Leave a Reply

Your email address will not be published. Required fields are marked *

New Providers
Binolla

The Broker
More then 2 million businesses
See Top 10 Broker

gamehag

Online game
More then 2 million businesses
See Top 10 Free Online Games

New Games
Lies of P

$59.99 Standard Edition
28% Save Discounts
See Top 10 Provider Games

COCOON

$24.99 Standard Edition
28% Save Discounts
See Top 10 Provider Games

New Offers
Commission up to $1850 for active user of affiliate program By Exness

Top Points © Copyright 2023 | By Topoin.com Media LLC.
Topoin.info is a site for reviewing the best and most trusted products, bonus, offers, business service providers and companies of all time.

Discover more from Top Points

Subscribe now to keep reading and get access to the full archive.

Continue reading