[Add Randomized SVD in PCA] #298

Closed

tarantula-leo
Contributor

What problem does this PR solve?

Optimize the PCA algorithm using SPU
Issue Number: Fixed #259

Possible side effects?

  • Performance:
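  1. Faster convergence (reflected in support for larger feature dimensions)
  2. No need to explicitly compute the covariance matrix of the original dataset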
  • Backward compatibility:
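
The second performance point follows from a standard identity: the principal axes are the right singular vectors of the centered data, so a randomized SVD of the data itself is enough. The derivation below is a sketch in notation introduced here (not taken from the PR): $X_c$ is the centered data with $n$ rows, $\Omega$ is the random test matrix, $q$ is the number of power iterations, and the unbiased $1/(n-1)$ normalization is assumed.

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\mu^\top = U \Sigma V^\top, \\
C &= \frac{1}{n-1} X_c^\top X_c = V \, \frac{\Sigma^2}{n-1} \, V^\top, \\
Y &= (X_c X_c^\top)^{q} X_c \Omega, \qquad Y = Q R, \qquad X_c \approx Q Q^\top X_c, \\
Q^\top X_c &= \tilde{U} \tilde{\Sigma} \tilde{V}^\top
\;\Longrightarrow\;
\text{components} \approx \tilde{V}^\top, \quad
\text{variances} \approx \frac{\tilde{\Sigma}^2}{n-1}.
\end{aligned}
```

Only products of $X_c$ with tall, thin matrices appear, so the $d \times d$ matrix $C$ is never formed.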

@@ -59,17 +61,17 @@ def proc_reconstruct(X):
     )
     emulator.up()
     # Create a simple dataset
-    X = random.normal(random.PRNGKey(0), (15, 100))
+    X = random.normal(random.PRNGKey(0), (1000, 10))
Member

Please seal the data to be processed before calling emulator run.


class SimplePCA:
    def __init__(
        self,
        method: str,
        n_components: int,
        max_iter: int = 100,
        n_iter: int = 4,
Contributor

There are two iteration-count hyperparameters here. I think their meaning should be made explicit in the names, for example:
max_iter -> power_iter
n_iter -> projection_iter
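
To make the role of the first counter concrete, here is a minimal NumPy sketch of the power method with deflation, where `max_iter` (the `power_iter` of the suggested renaming) bounds the inner loop. It is an illustration written under assumptions, not the PR's code: the function name `power_method_pca`, the all-ones start vector, and the `1/(n-1)` normalization are hypothetical choices.

```python
import numpy as np


def power_method_pca(X, n_components, max_iter=100):
    """Power-method PCA sketch (hypothetical helper, not the PR's implementation)."""
    assert len(X.shape) == 2, f"Expected X to be 2 dimensional array, got {X.shape}"
    mean = X.mean(axis=0)
    Xc = X - mean
    # This method forms the covariance matrix explicitly.
    cov_matrix = Xc.T @ Xc / (X.shape[0] - 1)

    components, variances = [], []
    for _ in range(n_components):
        # Assumed start vector: all ones, normalized.
        vec = np.ones(cov_matrix.shape[0])
        vec /= np.linalg.norm(vec)
        # max_iter counts these multiply-and-normalize steps.
        for _ in range(max_iter):
            vec = cov_matrix @ vec
            vec /= np.linalg.norm(vec)
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
        eigval = vec @ cov_matrix @ vec
        components.append(vec)
        variances.append(eigval)
        # Remove the component from the covariance matrix (deflation).
        cov_matrix -= eigval * np.outer(vec, vec)
    return mean, np.stack(components), np.array(variances)
```

The other counter, `n_iter`, plays a different role: in the randomized SVD path it sets how many times the random projection is refined, which is why distinct names help.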


class SimplePCA:
    def __init__(
        self,
        method: str,
        n_components: int,
        max_iter: int = 100,
        n_iter: int = 4,
        random_state: int = 0,
Contributor

random_state does not need to be a parameter.

"""
assert len(X.shape) == 2, f"Expected X to be 2 dimensional array, got {X.shape}"
Contributor

This shape check should be needed for every method, right? There is no need to move it into the power method branch.

    # Remove the component from the covariance matrix
    cov_matrix -= eigval * jnp.outer(vec, vec)
elif self._method == Method.PCA_rsvd:
    random_state = np.random.RandomState(self._random_state)
Contributor

The random matrix cannot be generated inside the function. Consider adding it as an input parameter of fit.

@@ -56,71 +72,105 @@ def __init__(
        self._mean = None
        self._components = None
        self._variances = None
        self._n_iter = n_iter  # used in rsvd
        self._random_state = random_state  # used in rsvd
        self._scale = scale  # used in rsvd
        self._method = Method(method)

    def fit(self, X):
Contributor

Consider adding the random matrix to the input parameters.
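
One way the suggested signature could look is sketched below in plain NumPy: the caller draws the random matrix outside `fit` and passes it in, so `fit` itself contains no random number generation. This is an assumption-laden illustration, not the PR's code: `SimplePCASketch`, `random_matrix`, and `n_oversamples` are hypothetical names, and the `1/(n-1)` variance normalization is assumed.

```python
import numpy as np


class SimplePCASketch:
    """Hypothetical stand-in for SimplePCA; shows only an rsvd-style fit."""

    def __init__(self, n_components, n_oversamples=5, n_iter=4):
        self._n_components = n_components
        self._n_oversamples = n_oversamples  # assumed name, not from the PR
        self._n_iter = n_iter
        self._mean = None
        self._components = None
        self._variances = None

    def fit(self, X, random_matrix):
        # Shape checks run once, before any method-specific work.
        assert len(X.shape) == 2, f"Expected X to be 2 dimensional array, got {X.shape}"
        k = self._n_components + self._n_oversamples
        assert random_matrix.shape == (X.shape[1], k), (
            f"Expected random_matrix of shape {(X.shape[1], k)}, got {random_matrix.shape}"
        )
        self._mean = X.mean(axis=0)
        Xc = X - self._mean
        # Range finder: orthonormal basis for the projected data.
        Q, _ = np.linalg.qr(Xc @ random_matrix)
        for _ in range(self._n_iter):
            # Re-orthonormalize between multiplications for stability.
            Q, _ = np.linalg.qr(Xc.T @ Q)
            Q, _ = np.linalg.qr(Xc @ Q)
        # Exact SVD of the small projected matrix.
        _, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
        self._components = Vt[: self._n_components]
        self._variances = s[: self._n_components] ** 2 / (X.shape[0] - 1)
        return self


# Caller side: the random matrix is drawn here, outside fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
X = X + 0.01 * rng.normal(size=(1000, 10))
random_matrix = rng.normal(size=(10, 2 + 5))
model = SimplePCASketch(n_components=2).fit(X, random_matrix)
```

With this shape the same `random_matrix` always yields the same fit, and the `random_state` constructor argument is no longer needed.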

@deadlywing
Contributor

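Also, there are a few things I would ask you to take care of:

  1. We recently restructured the sml directory layout, so please move the files to their new locations.
  2. The PR that added extmath.py forgot to update the bazel file under the utils folder. Please update it as part of this change as well.
  3. Please update the emul and test files. I suggest keeping the previous developer's test functions; you can define a new test function to cover your feature.

Thanks~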

@github-actions github-actions bot locked and limited conversation to collaborators Aug 11, 2023
Development

Successfully merging this pull request may close these issues.

Optimize the PCA algorithm using SPU
3 participants