Using LibLinear to get logistic regression for a single variable


Hello, I have a dataset of 10,000 x values between 60 and 250, each with a corresponding y value of either 0 or 1.

I'm trying to work out the probability that a new x value maps to y = 0 or y = 1. It seems like a pretty classic simple logistic regression problem.

Plotted in Excel, the values look like this: [image: Excel plot of the values]

I've tested the data with the calculator at https://stats.blue/Stats_Suite/logistic_regression_calculator.html and got the following output:

[image: output of the logistic regression calculator]

That looks about right, and I'd like to recreate this functionality in Java.

I've found the library https://liblinear.bwaldvogel.de/, which appears to do what I want, and have written the following code to use it:

import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.FeatureNode;
import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;
import de.bwaldvogel.liblinear.Parameter;
import de.bwaldvogel.liblinear.Problem;
import de.bwaldvogel.liblinear.SolverType;
import lombok.extern.slf4j.Slf4j;

import java.util.List;

@Slf4j
public class LogisticRegression {

    private final Model model;

    public LogisticRegression(List<int[]> samples) {
        this.model = Linear.train(
                buildProblem(samples),
                buildParameter()
        );
    }

    public void predict(int score) {
        final Feature[] instance = new Feature[]{new FeatureNode(1, score)};

        double[] prediction = new double[2];
        Linear.predictProbability(this.model, instance, prediction);

        // show probability for each possible outcome (0 or 1)
        log.info("Prediction for {}: ", score);
        for (int i = 0; i < 2; ++i) {
            log.info("outcome: {}, probability: {}", this.model.getLabels()[i], prediction[i]);
        }
    }

    private Problem buildProblem(List<int[]> samples) {
        final Problem problem = new Problem();
        problem.l = samples.size();
        problem.n = 1;
        problem.x = new Feature[samples.size()][1];
        problem.y = new double[samples.size()];

        for (int i = 0; i < samples.size(); i++) {
            problem.x[i] = new Feature[]{new FeatureNode(1, samples.get(i)[0])};
            problem.y[i] = samples.get(i)[1];
        }

        return problem;
    }

    private Parameter buildParameter() {
        final SolverType solver = SolverType.L2R_LR;
        final double c = 1;
        final double eps = 0.01;
        return new Parameter(solver, c, eps);
    }
}

My input samples are a list of int[]. Each int[] has the format {x, y}, where x is a value between 60 and 250 and y is either 0 or 1.

I then run a test that builds the model and calls predict(score) for every x from 60 to 250. I get the following output:

init f 6.841e+03 |g| 3.531e+04
iter  1 f 6.831e+03 |g| 2.693e+01 CG   2 step_size 1.00e+00
Prediction for 60: 
outcome: 0, probability: 0.4911513035729393
outcome: 1, probability: 0.5088486964270607
Prediction for 61: 
outcome: 0, probability: 0.4910038568683463
outcome: 1, probability: 0.5089961431316536
Prediction for 62: 
outcome: 0, probability: 0.4908564117288908
outcome: 1, probability: 0.5091435882711093
Prediction for 63: 
outcome: 0, probability: 0.4907089681802078
outcome: 1, probability: 0.5092910318197922
Prediction for 64: 
outcome: 0, probability: 0.49056152624793203
outcome: 1, probability: 0.509438473752068
Prediction for 65: 
outcome: 0, probability: 0.49041408595769614
outcome: 1, probability: 0.5095859140423038
Prediction for 66: 
outcome: 0, probability: 0.4902666473351322
outcome: 1, probability: 0.5097333526648677
Prediction for 67: 
outcome: 0, probability: 0.4901192104058709
outcome: 1, probability: 0.5098807895941291
Prediction for 68: 
outcome: 0, probability: 0.48997177519554197
outcome: 1, probability: 0.5100282248044581
Prediction for 69: 
outcome: 0, probability: 0.48982434172977346
outcome: 1, probability: 0.5101756582702266
Prediction for 70: 
outcome: 0, probability: 0.489676910034193
outcome: 1, probability: 0.510323089965807
Prediction for 71: 
outcome: 0, probability: 0.48952948013442615
outcome: 1, probability: 0.5104705198655739
Prediction for 72: 
outcome: 0, probability: 0.48938205205609786
outcome: 1, probability: 0.5106179479439021
Prediction for 73: 
outcome: 0, probability: 0.4892346258248314
outcome: 1, probability: 0.5107653741751685
Prediction for 74: 
outcome: 0, probability: 0.48908720146624896
outcome: 1, probability: 0.510912798533751
Prediction for 75: 
outcome: 0, probability: 0.48893977900597124
outcome: 1, probability: 0.5110602209940287
Prediction for 76: 
outcome: 0, probability: 0.488792358469618
outcome: 1, probability: 0.511207641530382
Prediction for 77: 
outcome: 0, probability: 0.4886449398828073
outcome: 1, probability: 0.5113550601171927
Prediction for 78: 
outcome: 0, probability: 0.4884975232711559
outcome: 1, probability: 0.5115024767288441
Prediction for 79: 
outcome: 0, probability: 0.4883501086602793
outcome: 1, probability: 0.5116498913397207
Prediction for 80: 
outcome: 0, probability: 0.48820269607579153
outcome: 1, probability: 0.5117973039242085
Prediction for 81: 
outcome: 0, probability: 0.48805528554330524
outcome: 1, probability: 0.5119447144566948
Prediction for 82: 
outcome: 0, probability: 0.48790787708843153
outcome: 1, probability: 0.5120921229115685
Prediction for 83: 
outcome: 0, probability: 0.4877604707367806
outcome: 1, probability: 0.5122395292632194
Prediction for 84: 
outcome: 0, probability: 0.48761306651396025
outcome: 1, probability: 0.5123869334860398
Prediction for 85: 
outcome: 0, probability: 0.48746566444557754
outcome: 1, probability: 0.5125343355544225
Prediction for 86: 
outcome: 0, probability: 0.4873182645572377
outcome: 1, probability: 0.5126817354427623
Prediction for 87: 
outcome: 0, probability: 0.48717086687454475
outcome: 1, probability: 0.5128291331254553
Prediction for 88: 
outcome: 0, probability: 0.4870234714231009
outcome: 1, probability: 0.5129765285768991
Prediction for 89: 
outcome: 0, probability: 0.4868760782285067
outcome: 1, probability: 0.5131239217714934
Prediction for 90: 
outcome: 0, probability: 0.4867286873163614
outcome: 1, probability: 0.5132713126836386
Prediction for 91: 
outcome: 0, probability: 0.48658129871226274
outcome: 1, probability: 0.5134187012877373
Prediction for 92: 
outcome: 0, probability: 0.4864339124418064
outcome: 1, probability: 0.5135660875581936
Prediction for 93: 
outcome: 0, probability: 0.48628652853058696
outcome: 1, probability: 0.513713471469413
Prediction for 94: 
outcome: 0, probability: 0.4861391470041969
outcome: 1, probability: 0.5138608529958031
Prediction for 95: 
outcome: 0, probability: 0.4859917678882276
outcome: 1, probability: 0.5140082321117724
Prediction for 96: 
outcome: 0, probability: 0.4858443912082681
outcome: 1, probability: 0.5141556087917318
Prediction for 97: 
outcome: 0, probability: 0.48569701698990625
outcome: 1, probability: 0.5143029830100938
Prediction for 98: 
outcome: 0, probability: 0.48554964525872785
outcome: 1, probability: 0.5144503547412722
Prediction for 99: 
outcome: 0, probability: 0.48540227604031727
outcome: 1, probability: 0.5145977239596827
Prediction for 100: 
outcome: 0, probability: 0.48525490936025717
outcome: 1, probability: 0.5147450906397428
Prediction for 101: 
outcome: 0, probability: 0.4851075452441279
outcome: 1, probability: 0.5148924547558721
Prediction for 102: 
outcome: 0, probability: 0.48496018371750865
outcome: 1, probability: 0.5150398162824914
Prediction for 103: 
outcome: 0, probability: 0.4848128248059764
outcome: 1, probability: 0.5151871751940236
Prediction for 104: 
outcome: 0, probability: 0.4846654685351066
outcome: 1, probability: 0.5153345314648934
Prediction for 105: 
outcome: 0, probability: 0.48451811493047264
outcome: 1, probability: 0.5154818850695273
Prediction for 106: 
outcome: 0, probability: 0.4843707640176463
outcome: 1, probability: 0.5156292359823538
Prediction for 107: 
outcome: 0, probability: 0.4842234158221972
outcome: 1, probability: 0.5157765841778028
Prediction for 108: 
outcome: 0, probability: 0.4840760703696933
outcome: 1, probability: 0.5159239296303066
Prediction for 109: 
outcome: 0, probability: 0.48392872768570044
outcome: 1, probability: 0.5160712723142995
Prediction for 110: 
outcome: 0, probability: 0.4837813877957828
outcome: 1, probability: 0.5162186122042172
Prediction for 111: 
outcome: 0, probability: 0.48363405072550236
outcome: 1, probability: 0.5163659492744976
Prediction for 112: 
outcome: 0, probability: 0.4834867165004194
outcome: 1, probability: 0.5165132834995806
Prediction for 113: 
outcome: 0, probability: 0.4833393851460919
outcome: 1, probability: 0.5166606148539081
Prediction for 114: 
outcome: 0, probability: 0.48319205668807635
outcome: 1, probability: 0.5168079433119237
Prediction for 115: 
outcome: 0, probability: 0.48304473115192653
outcome: 1, probability: 0.5169552688480734
Prediction for 116: 
outcome: 0, probability: 0.4828974085631947
outcome: 1, probability: 0.5171025914368053
Prediction for 117: 
outcome: 0, probability: 0.48275008894743104
outcome: 1, probability: 0.517249911052569
Prediction for 118: 
outcome: 0, probability: 0.48260277233018334
outcome: 1, probability: 0.5173972276698167
Prediction for 119: 
outcome: 0, probability: 0.4824554587369977
outcome: 1, probability: 0.5175445412630023
Prediction for 120: 
outcome: 0, probability: 0.482308148193418
outcome: 1, probability: 0.5176918518065821
Prediction for 121: 
outcome: 0, probability: 0.48216084072498566
outcome: 1, probability: 0.5178391592750143
Prediction for 122: 
outcome: 0, probability: 0.48201353635724065
outcome: 1, probability: 0.5179864636427594
Prediction for 123: 
outcome: 0, probability: 0.4818662351157201
outcome: 1, probability: 0.5181337648842799
Prediction for 124: 
outcome: 0, probability: 0.48171893702595925
outcome: 1, probability: 0.5182810629740408
Prediction for 125: 
outcome: 0, probability: 0.48157164211349124
outcome: 1, probability: 0.5184283578865088
Prediction for 126: 
outcome: 0, probability: 0.48142435040384685
outcome: 1, probability: 0.5185756495961531
Prediction for 127: 
outcome: 0, probability: 0.48127706192255487
outcome: 1, probability: 0.5187229380774452
Prediction for 128: 
outcome: 0, probability: 0.4811297766951415
outcome: 1, probability: 0.5188702233048585
Prediction for 129: 
outcome: 0, probability: 0.480982494747131
outcome: 1, probability: 0.5190175052528689
Prediction for 130: 
outcome: 0, probability: 0.48083521610404495
outcome: 1, probability: 0.519164783895955
Prediction for 131: 
outcome: 0, probability: 0.4806879407914031
outcome: 1, probability: 0.5193120592085969
Prediction for 132: 
outcome: 0, probability: 0.4805406688347227
outcome: 1, probability: 0.5194593311652773
Prediction for 133: 
outcome: 0, probability: 0.4803934002595186
outcome: 1, probability: 0.5196065997404814
Prediction for 134: 
outcome: 0, probability: 0.48024613509130326
outcome: 1, probability: 0.5197538649086968
Prediction for 135: 
outcome: 0, probability: 0.480098873355587
outcome: 1, probability: 0.519901126644413
Prediction for 136: 
outcome: 0, probability: 0.47995161507787754
outcome: 1, probability: 0.5200483849221225
Prediction for 137: 
outcome: 0, probability: 0.47980436028368034
outcome: 1, probability: 0.5201956397163197
Prediction for 138: 
outcome: 0, probability: 0.4796571089984983
outcome: 1, probability: 0.5203428910015018
Prediction for 139: 
outcome: 0, probability: 0.4795098612478319
outcome: 1, probability: 0.5204901387521681
Prediction for 140: 
outcome: 0, probability: 0.4793626170571795
outcome: 1, probability: 0.5206373829428206
Prediction for 141: 
outcome: 0, probability: 0.4792153764520364
outcome: 1, probability: 0.5207846235479636
Prediction for 142: 
outcome: 0, probability: 0.47906813945789595
outcome: 1, probability: 0.520931860542104
Prediction for 143: 
outcome: 0, probability: 0.4789209061002487
outcome: 1, probability: 0.5210790938997514
Prediction for 144: 
outcome: 0, probability: 0.4787736764045827
outcome: 1, probability: 0.5212263235954173
Prediction for 145: 
outcome: 0, probability: 0.47862645039638346
outcome: 1, probability: 0.5213735496036165
Prediction for 146: 
outcome: 0, probability: 0.47847922810113425
outcome: 1, probability: 0.5215207718988657
Prediction for 147: 
outcome: 0, probability: 0.47833200954431504
outcome: 1, probability: 0.5216679904556849
Prediction for 148: 
outcome: 0, probability: 0.478184794751404
outcome: 1, probability: 0.521815205248596
Prediction for 149: 
outcome: 0, probability: 0.47803758374787625
outcome: 1, probability: 0.5219624162521237
Prediction for 150: 
outcome: 0, probability: 0.47789037655920424
outcome: 1, probability: 0.5221096234407958
Prediction for 151: 
outcome: 0, probability: 0.4777431732108581
outcome: 1, probability: 0.5222568267891419
Prediction for 152: 
outcome: 0, probability: 0.47759597372830487
outcome: 1, probability: 0.5224040262716951
Prediction for 153: 
outcome: 0, probability: 0.47744877813700937
outcome: 1, probability: 0.5225512218629906
Prediction for 154: 
outcome: 0, probability: 0.47730158646243326
outcome: 1, probability: 0.5226984135375667
Prediction for 155: 
outcome: 0, probability: 0.4771543987300358
outcome: 1, probability: 0.5228456012699643
Prediction for 156: 
outcome: 0, probability: 0.4770072149652735
outcome: 1, probability: 0.5229927850347265
Prediction for 157: 
outcome: 0, probability: 0.4768600351935998
outcome: 1, probability: 0.5231399648064001
Prediction for 158: 
outcome: 0, probability: 0.4767128594404659
outcome: 1, probability: 0.5232871405595341
Prediction for 159: 
outcome: 0, probability: 0.4765656877313196
outcome: 1, probability: 0.5234343122686804
Prediction for 160: 
outcome: 0, probability: 0.4764185200916064
outcome: 1, probability: 0.5235814799083935
Prediction for 161: 
outcome: 0, probability: 0.4762713565467687
outcome: 1, probability: 0.5237286434532313
Prediction for 162: 
outcome: 0, probability: 0.4761241971222462
outcome: 1, probability: 0.5238758028777538
Prediction for 163: 
outcome: 0, probability: 0.47597704184347567
outcome: 1, probability: 0.5240229581565243
Prediction for 164: 
outcome: 0, probability: 0.475829890735891
outcome: 1, probability: 0.524170109264109
Prediction for 165: 
outcome: 0, probability: 0.47568274382492326
outcome: 1, probability: 0.5243172561750767
Prediction for 166: 
outcome: 0, probability: 0.4755356011360004
outcome: 1, probability: 0.5244643988639996
Prediction for 167: 
outcome: 0, probability: 0.47538846269454765
outcome: 1, probability: 0.5246115373054523
Prediction for 168: 
outcome: 0, probability: 0.4752413285259873
outcome: 1, probability: 0.5247586714740127
Prediction for 169: 
outcome: 0, probability: 0.4750941986557387
outcome: 1, probability: 0.5249058013442613
Prediction for 170: 
outcome: 0, probability: 0.47494707310921785
outcome: 1, probability: 0.5250529268907822
Prediction for 171: 
outcome: 0, probability: 0.4747999519118384
outcome: 1, probability: 0.5252000480881616
Prediction for 172: 
outcome: 0, probability: 0.47465283508901013
outcome: 1, probability: 0.5253471649109899
Prediction for 173: 
outcome: 0, probability: 0.47450572266614066
outcome: 1, probability: 0.5254942773338593
Prediction for 174: 
outcome: 0, probability: 0.47435861466863416
outcome: 1, probability: 0.5256413853313658
Prediction for 175: 
outcome: 0, probability: 0.4742115111218916
outcome: 1, probability: 0.5257884888781084
Prediction for 176: 
outcome: 0, probability: 0.4740644120513111
outcome: 1, probability: 0.5259355879486889
Prediction for 177: 
outcome: 0, probability: 0.4739173174822874
outcome: 1, probability: 0.5260826825177126
Prediction for 178: 
outcome: 0, probability: 0.4737702274402125
outcome: 1, probability: 0.5262297725597875
Prediction for 179: 
outcome: 0, probability: 0.473623141950475
outcome: 1, probability: 0.5263768580495249
Prediction for 180: 
outcome: 0, probability: 0.4734760610384605
outcome: 1, probability: 0.5265239389615395
Prediction for 181: 
outcome: 0, probability: 0.4733289847295511
outcome: 1, probability: 0.5266710152704489
Prediction for 182: 
outcome: 0, probability: 0.4731819130491261
outcome: 1, probability: 0.5268180869508738
Prediction for 183: 
outcome: 0, probability: 0.47303484602256146
outcome: 1, probability: 0.5269651539774385
Prediction for 184: 
outcome: 0, probability: 0.4728877836752299
outcome: 1, probability: 0.5271122163247701
Prediction for 185: 
outcome: 0, probability: 0.4727407260325007
outcome: 1, probability: 0.5272592739674993
Prediction for 186: 
outcome: 0, probability: 0.4725936731197404
outcome: 1, probability: 0.5274063268802596
Prediction for 187: 
outcome: 0, probability: 0.47244662496231143
outcome: 1, probability: 0.5275533750376886
Prediction for 188: 
outcome: 0, probability: 0.47229958158557395
outcome: 1, probability: 0.527700418414426
Prediction for 189: 
outcome: 0, probability: 0.4721525430148839
outcome: 1, probability: 0.5278474569851161
Prediction for 190: 
outcome: 0, probability: 0.4720055092755945
outcome: 1, probability: 0.5279944907244055
Prediction for 191: 
outcome: 0, probability: 0.4718584803930553
outcome: 1, probability: 0.5281415196069448
Prediction for 192: 
outcome: 0, probability: 0.4717114563926125
outcome: 1, probability: 0.5282885436073874
Prediction for 193: 
outcome: 0, probability: 0.4715644372996092
outcome: 1, probability: 0.5284355627003908
Prediction for 194: 
outcome: 0, probability: 0.4714174231393847
outcome: 1, probability: 0.5285825768606154
Prediction for 195: 
outcome: 0, probability: 0.471270413937275
outcome: 1, probability: 0.528729586062725
Prediction for 196: 
outcome: 0, probability: 0.471123409718613
outcome: 1, probability: 0.528876590281387
Prediction for 197: 
outcome: 0, probability: 0.4709764105087278
outcome: 1, probability: 0.5290235894912723
Prediction for 198: 
outcome: 0, probability: 0.470829416332945
outcome: 1, probability: 0.529170583667055
Prediction for 199: 
outcome: 0, probability: 0.470682427216587
outcome: 1, probability: 0.529317572783413
Prediction for 200: 
outcome: 0, probability: 0.4705354431849725
outcome: 1, probability: 0.5294645568150276
Prediction for 201: 
outcome: 0, probability: 0.4703884642634165
outcome: 1, probability: 0.5296115357365835
Prediction for 202: 
outcome: 0, probability: 0.47024149047723085
outcome: 1, probability: 0.5297585095227691
Prediction for 203: 
outcome: 0, probability: 0.4700945218517238
outcome: 1, probability: 0.5299054781482762
Prediction for 204: 
outcome: 0, probability: 0.46994755841219954
outcome: 1, probability: 0.5300524415878005
Prediction for 205: 
outcome: 0, probability: 0.46980060018395925
outcome: 1, probability: 0.5301993998160408
Prediction for 206: 
outcome: 0, probability: 0.4696536471923003
outcome: 1, probability: 0.5303463528076997
Prediction for 207: 
outcome: 0, probability: 0.4695066994625163
outcome: 1, probability: 0.5304933005374837
Prediction for 208: 
outcome: 0, probability: 0.46935975701989735
outcome: 1, probability: 0.5306402429801027
Prediction for 209: 
outcome: 0, probability: 0.4692128198897299
outcome: 1, probability: 0.5307871801102702
Prediction for 210: 
outcome: 0, probability: 0.46906588809729655
outcome: 1, probability: 0.5309341119027035
Prediction for 211: 
outcome: 0, probability: 0.46891896166787644
outcome: 1, probability: 0.5310810383321236
Prediction for 212: 
outcome: 0, probability: 0.46877204062674505
outcome: 1, probability: 0.531227959373255
Prediction for 213: 
outcome: 0, probability: 0.4686251249991737
outcome: 1, probability: 0.5313748750008263
Prediction for 214: 
outcome: 0, probability: 0.4684782148104304
outcome: 1, probability: 0.5315217851895696
Prediction for 215: 
outcome: 0, probability: 0.4683313100857791
outcome: 1, probability: 0.5316686899142209
Prediction for 216: 
outcome: 0, probability: 0.4681844108504804
outcome: 1, probability: 0.5318155891495195
Prediction for 217: 
outcome: 0, probability: 0.46803751712979047
outcome: 1, probability: 0.5319624828702095
Prediction for 218: 
outcome: 0, probability: 0.4678906289489623
outcome: 1, probability: 0.5321093710510377
Prediction for 219: 
outcome: 0, probability: 0.4677437463332445
outcome: 1, probability: 0.5322562536667554
Prediction for 220: 
outcome: 0, probability: 0.46759686930788213
outcome: 1, probability: 0.5324031306921179
Prediction for 221: 
outcome: 0, probability: 0.4674499978981166
outcome: 1, probability: 0.5325500021018834
Prediction for 222: 
outcome: 0, probability: 0.46730313212918484
outcome: 1, probability: 0.5326968678708152
Prediction for 223: 
outcome: 0, probability: 0.46715627202632026
outcome: 1, probability: 0.5328437279736797
Prediction for 224: 
outcome: 0, probability: 0.4670094176147524
outcome: 1, probability: 0.5329905823852477
Prediction for 225: 
outcome: 0, probability: 0.4668625689197067
outcome: 1, probability: 0.5331374310802933
Prediction for 226: 
outcome: 0, probability: 0.46671572596640487
outcome: 1, probability: 0.5332842740335951
Prediction for 227: 
outcome: 0, probability: 0.4665688887800642
outcome: 1, probability: 0.5334311112199358
Prediction for 228: 
outcome: 0, probability: 0.4664220573858984
outcome: 1, probability: 0.5335779426141016
Prediction for 229: 
outcome: 0, probability: 0.4662752318091172
outcome: 1, probability: 0.5337247681908828
Prediction for 230: 
outcome: 0, probability: 0.46612841207492595
outcome: 1, probability: 0.533871587925074
Prediction for 231: 
outcome: 0, probability: 0.4659815982085263
outcome: 1, probability: 0.5340184017914738
Prediction for 232: 
outcome: 0, probability: 0.46583479023511565
outcome: 1, probability: 0.5341652097648844
Prediction for 233: 
outcome: 0, probability: 0.46568798817988744
outcome: 1, probability: 0.5343120118201126
Prediction for 234: 
outcome: 0, probability: 0.46554119206803096
outcome: 1, probability: 0.534458807931969
Prediction for 235: 
outcome: 0, probability: 0.46539440192473136
outcome: 1, probability: 0.5346055980752686
Prediction for 236: 
outcome: 0, probability: 0.4652476177751697
outcome: 1, probability: 0.5347523822248303
Prediction for 237: 
outcome: 0, probability: 0.46510083964452303
outcome: 1, probability: 0.5348991603554769
Prediction for 238: 
outcome: 0, probability: 0.46495406755796403
outcome: 1, probability: 0.535045932442036
Prediction for 239: 
outcome: 0, probability: 0.46480730154066124
outcome: 1, probability: 0.5351926984593387
Prediction for 240: 
outcome: 0, probability: 0.46466054161777903
outcome: 1, probability: 0.535339458382221
Prediction for 241: 
outcome: 0, probability: 0.4645137878144778
outcome: 1, probability: 0.5354862121855222
Prediction for 242: 
outcome: 0, probability: 0.46436704015591324
outcome: 1, probability: 0.5356329598440868
Prediction for 243: 
outcome: 0, probability: 0.4642202986672372
outcome: 1, probability: 0.5357797013327628
Prediction for 244: 
outcome: 0, probability: 0.4640735633735971
outcome: 1, probability: 0.5359264366264029
Prediction for 245: 
outcome: 0, probability: 0.46392683430013604
outcome: 1, probability: 0.536073165699864
Prediction for 246: 
outcome: 0, probability: 0.46378011147199316
outcome: 1, probability: 0.5362198885280068
Prediction for 247: 
outcome: 0, probability: 0.46363339491430267
outcome: 1, probability: 0.5363666050856973
Prediction for 248: 
outcome: 0, probability: 0.46348668465219495
outcome: 1, probability: 0.536513315347805
Prediction for 249: 
outcome: 0, probability: 0.4633399807107959
outcome: 1, probability: 0.5366600192892041

This is not what I would expect. I was hoping for predict(60) to be close to 0 and predict(250) to be close to 1, with all the other values somewhere in between and a probability of 0.5 at roughly x = 155.

So my question is: where have I gone wrong? Am I misunderstanding the inputs or outputs? I'm not especially clear on what the parameters c and eps are doing, so I've used the defaults from the example, but even changing these seems to have little effect.

I'm wondering if my dataset having identical x values with different y values (e.g. {60, 0}, {60, 0}, {60, 1} all being valid samples) is to blame?

Thanks

There is 1 answer

Duane Allman

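OK, I solved it! I had to add a second feature which is always 1. My guess is that this gives the model a second coefficient to fit, acting as an intercept, so we're maximising the likelihood over a(1) + b(x) instead of just b(x). Without it, the fitted curve is forced through a probability of 0.5 at x = 0, which would explain why every prediction for x between 60 and 250 sat just above 0.5.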
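To see why the constant feature matters, here is a minimal sketch in plain Java (no LibLinear involved). It assumes the probability of y = 1 has the usual logistic form. The class name InterceptDemo and the coefficients a and b are hypothetical values picked for illustration, and the slope 0.00059 is only a rough estimate read off the logged probabilities in the question. Without an intercept the curve is pinned to 0.5 at x = 0, so for positive x every prediction falls on the same side of 0.5. With an intercept, the 0.5 crossover sits at x = -a / b.

```java
public class InterceptDemo {

    // Standard logistic (sigmoid) function.
    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Model without a constant feature: P(y = 1 | x) = sigmoid(b * x).
    public static double withoutIntercept(double b, double x) {
        return sigmoid(b * x);
    }

    // Model with a constant feature of 1: P(y = 1 | x) = sigmoid(a + b * x).
    public static double withIntercept(double a, double b, double x) {
        return sigmoid(a + b * x);
    }

    public static void main(String[] args) {
        // Rough slope implied by the logged output in the question (an estimate).
        double bObserved = 0.00059;

        // Hypothetical coefficients, chosen only for illustration.
        double b = 0.05;
        double a = -b * 155; // places the 0.5 crossover at x = -a / b = 155

        for (int x : new int[]{60, 155, 250}) {
            System.out.printf("x=%d  no intercept: %.4f  with intercept: %.4f%n",
                    x, withoutIntercept(bObserved, x), withIntercept(a, b, x));
        }
    }
}
```

With these assumed numbers the no-intercept model gives about 0.509 at x = 60, in line with the output in the question, while the intercept model crosses 0.5 at x = 155.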

import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.FeatureNode;
import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;
import de.bwaldvogel.liblinear.Parameter;
import de.bwaldvogel.liblinear.Problem;
import de.bwaldvogel.liblinear.SolverType;
import lombok.extern.slf4j.Slf4j;

import java.util.List;

@Slf4j
public class LogisticRegression {

    private final Model model;

    public LogisticRegression(List<int[]> samples) {
        this.model = Linear.train(
                buildProblem(samples),
                buildParameter()
        );
    }

    public void predict(int score) {
        // feature 1 is the constant 1 (intercept term), feature 2 is the actual x value
        final Feature[] instance = new Feature[]{new FeatureNode(1, 1), new FeatureNode(2, score)};

        double[] prediction = new double[2];
        Linear.predictProbability(this.model, instance, prediction);

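        // predictProbability fills the array in the order of model.getLabels(),
        // so find the slot that holds label 1 and log its probability
        final int[] labels = this.model.getLabels();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == 1) {
                log.info("Prediction for {}: {}", score, prediction[i]);
            }
        }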
    }

    private Problem buildProblem(List<int[]> samples) {
        final Problem problem = new Problem();
        problem.l = samples.size();
        problem.n = 2;
        problem.x = new Feature[samples.size()][2];
        problem.y = new double[samples.size()];

        for (int i = 0; i < samples.size(); i++) {
            // every sample gets a constant feature of 1 alongside its x value
            problem.x[i] = new Feature[]{new FeatureNode(1, 1), new FeatureNode(2, samples.get(i)[0])};
            problem.y[i] = samples.get(i)[1];
        }

        return problem;
    }

    private Parameter buildParameter() {
        final SolverType solver = SolverType.L2R_LR;
        final double c = 1;
        final double eps = 0.01;
        return new Parameter(solver, c, eps);
    }
}
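
As a rough end-to-end check of the same idea, the sketch below fits a logistic model with and without the constant feature. It is an illustration under stated assumptions, not LibLinear's method: it uses plain batch gradient descent with no regularisation, a synthetic dataset (x from 60 to 250, labelled 1 when x > 155) standing in for the real samples, and hypothetical names (ConstantFeatureFit, fit, probabilities). x is divided by 250 purely to keep gradient descent stable, and it is deliberately not centred.

```java
public class ConstantFeatureFit {

    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static double dot(double[] w, double[] x) {
        double s = 0;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    // Plain batch gradient descent on the mean logistic loss (swapped in for LibLinear's solver).
    public static double[] fit(double[][] rows, int[] y, int iterations, double learningRate) {
        int d = rows[0].length;
        double[] w = new double[d];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[d];
            for (int i = 0; i < rows.length; i++) {
                double err = sigmoid(dot(w, rows[i])) - y[i];
                for (int j = 0; j < d; j++) {
                    grad[j] += err * rows[i][j];
                }
            }
            for (int j = 0; j < d; j++) {
                w[j] -= learningRate * grad[j] / rows.length;
            }
        }
        return w;
    }

    // Builds the feature vector; x is scaled by 1/250 but not centred.
    public static double[] features(int x, boolean constantFeature) {
        double scaled = x / 250.0;
        return constantFeature ? new double[]{1.0, scaled} : new double[]{scaled};
    }

    // Fits on the synthetic data and returns P(y = 1) for each queried x.
    public static double[] probabilities(boolean constantFeature, int... queries) {
        int n = 191; // x = 60..250 inclusive
        double[][] rows = new double[n][];
        int[] y = new int[n];
        for (int i = 0; i < n; i++) {
            int x = 60 + i;
            rows[i] = features(x, constantFeature);
            y[i] = x > 155 ? 1 : 0; // synthetic labels, an assumption for this sketch
        }
        double[] w = fit(rows, y, 5000, 1.0);
        double[] out = new double[queries.length];
        for (int q = 0; q < queries.length; q++) {
            out[q] = sigmoid(dot(w, features(queries[q], constantFeature)));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] without = probabilities(false, 60, 250);
        double[] withConst = probabilities(true, 60, 250);
        System.out.printf("no constant feature:   p(60)=%.4f  p(250)=%.4f%n", without[0], without[1]);
        System.out.printf("with constant feature: p(60)=%.4f  p(250)=%.4f%n", withConst[0], withConst[1]);
    }
}
```

On this synthetic data the fit without the constant feature keeps both probabilities above 0.5, while the fit with it puts x = 60 below 0.5 and x = 250 above, which is the shape the question was after.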