
STLSQ method results in different models from pySINDy? #475

Open
xk-y opened this issue May 11, 2023 · 4 comments

@xk-y

xk-y commented May 11, 2023

Hi! I was starting to use the STLSQ method to create a SINDy-based model. I tried the example here, but removed the noise from the reference data. When I compared the model parameters, I found that the STLSQ method in Julia and pySINDy give different results, even though I tried to keep all hyperparameters the same. Do you have any idea why this happens? I would expect pySINDy and DataDrivenDiffEq.jl to give the same result, since the STLSQ algorithm is the same.

Julia code:

using DataDrivenDiffEq
using ModelingToolkit   # provides @parameters / @variables
using LinearAlgebra
using OrdinaryDiffEq
using DataDrivenSparse


# Forced, damped pendulum from the DataDrivenDiffEq docs example
function pendulum(u, p, t)
    x = u[2]
    y = -9.81sin(u[1]) - 0.3u[2]^3 - 3.0 * cos(u[1]) - 10.0 * exp(-((t - 5.0) / 5.0)^2)
    return [x; y]
end

u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);

X = sol[:, :] # noise removed; the docs example adds 0.2 .* randn(rng, size(sol))
ts = sol.t;

using JLD2
JLD2.jldsave("data.jld"; X, ts) ### save the data for Python usage


# Wrap the single trajectory in a ContinuousDataset
# (i is unused here; kept for generality over multiple trajectories)
function create_prob(X, ts, i)
    (X = X, t = ts)
end

probnames = Tuple(Symbol.(["prob$i" for i in 1:1]));
probdata = Tuple([create_prob(X, ts, i) for i in 1:1]);
probtuple = NamedTuple{probnames}(probdata);
prob = DataDrivenDiffEq.ContinuousDataset(probtuple);


# Equivalently: prob = ContinuousDataDrivenProblem(X, ts)


@parameters t
@variables u(t)[1:2]

# Polynomial basis up to degree 5 in the two states
h = Num[polynomial_basis(u, 5);]

basis = Basis(h, u);

λs = 1e-5
opt = STLSQ(λs)
res = solve(prob, basis, opt)

system = get_basis(res)
params = get_parameter_map(system)
println(system)
println(params)

The model I got from Julia is:

Model ##Basis#465 with 2 equations
States : (u(t))[1] (u(t))[2]
Parameters : 42
Independent variable: t
Equations
Differential(t)((u(t))[1]) = p₁ + p₁₂*((u(t))[2]^2) + p₁₆*((u(t))[2]^3) + p₁₉*((u(t))[2]^4) + p₂*(u(t))[1] + p₂₁*((u(t))[2]^5) + p₃*((u(t))[1]^2) + p₄*((u(t))[1]^3) + p₅*((u(t))[1]^4) + p₆*((u(t))[1]^5) + p₇*(u(t))[2] + p₁₃*((u(t))[2]^2)*(u(t))[1] + p₁₀*((u(t))[1]^3)*(u(t))[2] + p₁₄*((u(t))[1]^2)*((u(t))[2]^2) + p₁₅*((u(t))[1]^3)*((u(t))[2]^2) + p₁₈*((u(t))[1]^2)*((u(t))[2]^3) + p₁₇*((u(t))[2]^3)*(u(t))[1] + p₉*((u(t))[1]^2)*(u(t))[2] + p₁₁*((u(t))[1]^4)*(u(t))[2] + p₂₀*((u(t))[2]^4)*(u(t))[1] + p₈*(u(t))[1]*(u(t))[2]
Differential(t)((u(t))[2]) = p₂₂ + p₂₃*(u(t))[1] + p₂₄*((u(t))[1]^2) + p₂₅*((u(t))[1]^3) + p₂₆*((u(t))[1]^4) + p₂₇*((u(t))[1]^5) + p₂₈*(u(t))[2] + p₃₃*((u(t))[2]^2) + p₃₇*((u(t))[2]^3) + p₄₀*((u(t))[2]^4) + p₄₂*((u(t))[2]^5) + p₂₉*(u(t))[1]*(u(t))[2] + p₃₅*((u(t))[1]^2)*((u(t))[2]^2) + p₃₉*((u(t))[1]^2)*((u(t))[2]^3) + p₃₀*((u(t))[1]^2)*(u(t))[2] + p₃₁*((u(t))[1]^3)*(u(t))[2] + p₃₂*((u(t))[1]^4)*(u(t))[2] + p₃₆*((u(t))[1]^3)*((u(t))[2]^2) + p₃₄*((u(t))[2]^2)*(u(t))[1] + p₃₈*((u(t))[2]^3)*(u(t))[1] + p₄₁*((u(t))[2]^4)*(u(t))[1]

Pair{SymbolicUtils.BasicSymbolic{Real}, Float64}[p₁ => 0.0730131366, p₂ => 0.0834717146, p₃ => 0.0149782394, p₄ => -0.0059705709, p₅ => -0.0017038288, p₆ => -0.0001075822, p₇ => 1.0212420262, p₈ => 0.0228230317, p₉ => 0.0040984396, p₁₀ => 4.16304e-5, p₁₁ => -2.02122e-5, p₁₂ => 0.0037135607, p₁₃ => 0.0034812525, p₁₄ => 0.0006379738, p₁₅ => 1.52776e-5, p₁₆ => 0.0026753519, p₁₇ => 0.0005161771, p₁₈ => 5.01583e-5, p₁₉ => 0.0001611863, p₂₀ => 6.49419e-5, p₂₁ => -4.48301e-5, p₂₂ => -14.5630872419, p₂₃ => -16.616734041, p₂₄ => -2.9673310169, p₂₅ => 1.1849735762, p₂₆ => 0.3376741619, p₂₇ => 0.0213132928, p₂₈ => -3.9165878933, p₂₉ => -4.3162794295, p₃₀ => -0.819936348, p₃₁ => -0.0193343773, p₃₂ => 0.0031036009, p₃₃ => -0.551816404, p₃₄ => -0.5307240406, p₃₅ => -0.1191590027, p₃₆ => -0.0044727192, p₃₇ => -0.5195042192, p₃₈ => -0.0723909027, p₃₉ => -0.0066535181, p₄₀ => -0.0399888287, p₄₁ => -0.0133952208, p₄₂ => 0.0077426642]

Python code:

import numpy as np
import pysindy as ps
import h5py

# Open the file written by the Julia script (JLD2 uses the HDF5 format,
# so h5py can read it). Julia's column-major 2×N matrix should appear
# here as N×2, i.e. samples × states, which is the layout pySINDy expects.
d = h5py.File("data.jld", "r")
X = d["X"][:]
times = d["ts"][:]
dt = 0.01

optimizer = ps.STLSQ(threshold=1e-5)
feature_library = ps.PolynomialLibrary(degree=5)
model = ps.SINDy(feature_names=["x", "y"], optimizer=optimizer, feature_library=feature_library)
model.fit([X], t=dt, multiple_trajectories=True)
model.print(precision=10)

The model I got from Python is:

(x)' = -0.0007809696 1 + -0.0008913440 x + 0.9991483762 y + -0.0001892916 x^2 + -0.0008836917 x y + -0.0002849399 y^2 + -0.0000113477 x^3 + -0.0002544056 x^2 y + -0.0001793808 x y^2 + -0.0000349313 y^3 + -0.0000207300 x^3 y + -0.0000688229 x^2 y^2 + 0.0000031815 x y^3 + -0.0000064892 x^3 y^2
(y)' = -14.5342725185 1 + -16.5809246400 x + -3.9699650420 y + -2.9592877785 x^2 + -4.3170609722 x y + -0.5520409493 y^2 + 1.1846939143 x^3 + -0.8008512434 x^2 y + -0.5160813109 x y^2 + -0.5070188768 y^3 + 0.3374568507 x^4 + -0.0137858781 x^3 y + -0.1128667395 x^2 y^2 + -0.0672355163 x y^3 + -0.0364527735 y^4 + 0.0212981524 x^5 + 0.0035045963 x^4 y + -0.0038914762 x^3 y^2 + -0.0060928204 x^2 y^3 + -0.0128738435 x y^4 + 0.0081038320 y^5

As you can see, the model parameters are not the same.

@ChrisRackauckas
Member

I'm not sure there's an expectation for it to be exactly the same, since IIRC STLSQ is quite numerically non-robust to floating-point differences, so small differences in the factorization algorithm or convex optimization would lead to different rejections. I'm curious where the expectation is derived from.
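
For intuition, here is a minimal sketch of the STLSQ iteration (a toy version of the Brunton et al. algorithm, not the DataDrivenSparse or pySINDy implementation). The hard-thresholding step is where tiny floating-point differences in the least-squares solve can push a coefficient across λ and change which terms survive, so the two libraries can diverge even on noise-free data:

# Toy STLSQ: Θ is the feature matrix (samples × features),
# dX the derivative targets (samples × states), λ the threshold.
function stlsq_sketch(Θ, dX, λ; iters = 10)
    Ξ = Θ \ dX                        # initial least-squares coefficients
    for _ in 1:iters
        small = abs.(Ξ) .< λ          # hard threshold: the fragile step
        Ξ[small] .= 0.0
        for k in axes(dX, 2)          # refit each state on the surviving terms
            big = .!small[:, k]
            Ξ[big, k] = Θ[:, big] \ dX[:, k]
        end
    end
    return Ξ
end

A coefficient sitting at, say, 1.00000001e-5 in one implementation and 0.99999999e-5 in the other is kept by the first and zeroed by the second, after which all subsequent refits operate on different active sets.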

@ctessum

ctessum commented Jun 8, 2023

We're just trying to understand how it works. Would a reasonable summary be that if there's a single system of equations that exactly describes the data (like if we're trying to recover the Lotka-Volterra system or something like that), then both implementations should return the same result, but if there are multiple plausible explanations then we'd expect the two implementations to return different results?

@ctessum

ctessum commented Jun 8, 2023

The next question would be: if we want a single implementation to give us all of the different plausible explanations, how would we do that? Just run it multiple times with different noise added every time?
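
For what it's worth, a crude sketch of that "refit with added noise" idea (in the spirit of ensemble/bagging SINDy), using only calls already present in the Julia script above; the noise scale 0.01 and the 50 repetitions are arbitrary illustration choices, not recommendations:

using Random

# Assumes X, ts, basis, and opt from the Julia script above are in scope.
rng = Xoshiro(1)
nterms = map(1:50) do _
    Xn = X .+ 0.01 .* randn(rng, size(X))          # small arbitrary perturbation
    resn = solve(ContinuousDataDrivenProblem(Xn, ts), basis, opt)
    length(get_parameter_map(get_basis(resn)))     # number of surviving terms
end
println("surviving-term counts across refits: ", sort(unique(nterms)))

Comparing which terms survive across refits (rather than just counting them) would give the set of plausible sparse models for the dataset.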

@ChrisRackauckas
Member

This should be re-written to use a convex optimizer. That would then be more robust. I talked with @Vaibhavdixit02 about doing this.
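
For reference, a hedged sketch of what a convex formulation could look like: replace the hard threshold with an ℓ₁ penalty (LASSO) and solve each state's regression as a convex program, here with Convex.jl and SCS (assumed available; this is an illustration, not the planned rewrite):

using Convex, SCS

# ℓ₁-penalized least squares for one state: minimize ‖Θξ - dx‖² + λ‖ξ‖₁.
# Θ: feature matrix (samples × features), dx: derivatives of one state.
function sparse_fit_convex(Θ, dx, λ)
    ξ = Variable(size(Θ, 2))
    problem = minimize(sumsquares(Θ * ξ - dx) + λ * norm(ξ, 1))
    solve!(problem, SCS.Optimizer)
    return evaluate(ξ)
end

Because the ℓ₁ objective is convex, the solution varies continuously with the data, so two correct implementations would differ only slightly in coefficient values rather than abruptly in which terms get selected.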
