None of the ways to read the input file works #12

Closed
Roger-GOAT opened this issue Nov 11, 2022 · 2 comments

Comments

@Roger-GOAT

Hi, thanks for the software. However, I can't load the input file. BTW, I don't have a GPU.
First, I tried RData. I saved a Seurat object as RData like this:

sce <- readRDS("/Documents/sce_from_pyharmony.rds")
save.image("/Documents/sce.RData")

python scGNN_v2.py --load_rdata ~/Documents/sce.RData     --output_dir ~/outputs/     --total_epoch 31 --feature_AE_epoch 500 300     --output_intermediate
2022-11-11 12:56:24,757 -
> Loading Packages
2022-11-11 12:56:27,381 - Using device: cpu
2022-11-11 12:56:27,381 - Namespace(alpha=0.5, ari_threshold=0.95, cluster_AE_batch_size=12800, cluster_AE_dropout_prob=0, cluster_AE_epoch=200, cluster_AE_learning_rate=0.001, cluster_AE_regu_strength=0.9, clustering_embed='graph', clustering_louvain_only=False, clustering_method='KMeans', clustering_use_flexible_k=False, deconv_opt1_epoch=5000, deconv_opt1_epsilon=0.0001, deconv_opt1_learning_rate=0.001, deconv_opt1_regu_strength=0.01, deconv_opt2_epoch=500, deconv_opt2_epsilon=0.0001, deconv_opt2_learning_rate=0.1, deconv_opt2_regu_strength=0.01, deconv_opt3_epoch=150, deconv_opt3_epsilon=0.0001, deconv_opt3_learning_rate=0.1, deconv_opt3_regu_strength_1=0.8, deconv_opt3_regu_strength_2=0.01, deconv_opt3_regu_strength_3=1, deconv_tune_epoch=20, deconv_tune_epsilon=0.0001, deconv_tune_learning_rate=0.01, dropout_prob=0.1, feature_AE_batch_size=12800, feature_AE_concat_prev_embed=None, feature_AE_dropout_prob=0, feature_AE_epoch=[500, 300], feature_AE_learning_rate=0.001, feature_AE_regu_strength=0.9, gat_hid_embed=64, gat_multi_heads=2, given_cell_type_labels=False, graph_AE_GAT_dropout=0, graph_AE_concat_prev_embed=False, graph_AE_embedding_size=16, graph_AE_epoch=200, graph_AE_graph_construction='v2', graph_AE_learning_rate=0.01, graph_AE_neighborhood_factor=0.05, graph_AE_normalize_embed=None, graph_AE_retain_weights=False, graph_AE_use_GAT=False, graph_change_threshold=0.01, load_LTMG=None, load_bulk_dataset='', load_cell_type_labels='', load_dataset_dir='/fs/ess/PCON0022/Edison/datasets', load_dataset_name='12.Klein', load_from_10X=None, load_rdata='/home/dengzhen/Documents/sce.RData', load_sc_dataset='', load_seurat_object=None, load_use_benchmark=False, output_dir='/home/dengzhen/outputs/', output_intermediate=True, output_preprocessed=False, output_rdata=False, output_run_ID=None, preprocess_cell_cutoff=0.9, preprocess_gene_cutoff=0.9, preprocess_top_gene_select=2000, run_LTMG=False, seed=1, total_epoch=31, use_CCC=False, use_bulk=False)
2022-11-11 12:56:27,381 -
> Loading data ...
2022-11-11 12:56:27,381 - --------> Loading from rdata ...
Traceback (most recent call last):
  File "scGNN_v2.py", line 212, in <module>
    X_sc_raw = load.sc_handler(args)
  File "/home/dengzhen/scGNN2.0/load.py", line 31, in sc_handler
    return load_rdata(
  File "/home/dengzhen/scGNN2.0/load.py", line 50, in load_rdata
    rdata = pyreadr.read_r(rdata_path)
  File "/home/dengzhen/miniconda3/envs/scgnnEnv/lib/python3.8/site-packages/pyreadr/pyreadr.py", line 66, in read_r
    parser.parse(filename_bytes)
  File "pyreadr/librdata.pyx", line 148, in pyreadr.librdata.Parser.parse
  File "pyreadr/librdata.pyx", line 177, in pyreadr.librdata.Parser.parse
pyreadr.custom_errors.LibrdataError: The file contains an unrecognized object

Second, I tried counts.csv:
write.table(as.matrix(GetAssayData(object = sce, slot = "counts")),
'~/counts.csv',
sep = ',', row.names = T, col.names = T, quote = F)
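
For reference, `write.table` with `row.names = T, col.names = T` writes a header line that has one field fewer than the data rows, because the row-name column gets no label. Seurat stores counts as genes x cells, while the scGNN log reports cells x genes, so a transpose is implied. The following is a minimal standard-library sketch of reading such a file under those assumptions; `read_r_counts_table` is a hypothetical helper for illustration, not part of scGNN.

```python
import csv
import io


def read_r_counts_table(handle):
    """Parse a comma-separated table written by R's write.table with row names.

    The header holds only the cell barcodes; every data row starts with the
    gene name followed by one count per cell.
    Returns (genes, cells, matrix) where matrix[cell_index][gene_index] is a float.
    """
    reader = csv.reader(handle)
    cells = next(reader)  # header: one label per cell, no label for row names
    genes, rows = [], []
    for record in reader:
        if not record:
            continue  # tolerate blank lines
        values = [float(v) for v in record[1:]]
        if len(values) != len(cells):
            raise ValueError(f"row {record[0]!r} has {len(values)} values, "
                             f"expected {len(cells)}")
        genes.append(record[0])
        rows.append(values)
    # Transpose genes x cells into cells x genes.
    matrix = [list(column) for column in zip(*rows)]
    return genes, cells, matrix


sample = io.StringIO("cell1,cell2\ngeneA,1,0\ngeneB,0,3\n")
genes, cells, matrix = read_r_counts_table(sample)
```

For a matrix of the size shown in the log, a dataframe or sparse-matrix library would be the practical choice; the sketch only illustrates the file layout.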

python scGNN_v2.py --load_seurat_object ~/counts.csv     --output_dir ~/outputs/     --total_epoch 31 --feature_AE_epoch 500 300     --output_intermediate
2022-11-11 13:02:02,936 -
> Loading Packages
2022-11-11 13:02:05,547 - Using device: cpu
2022-11-11 13:02:05,547 - Namespace(alpha=0.5, ari_threshold=0.95, cluster_AE_batch_size=12800, cluster_AE_dropout_prob=0, cluster_AE_epoch=200, cluster_AE_learning_rate=0.001, cluster_AE_regu_strength=0.9, clustering_embed='graph', clustering_louvain_only=False, clustering_method='KMeans', clustering_use_flexible_k=False, deconv_opt1_epoch=5000, deconv_opt1_epsilon=0.0001, deconv_opt1_learning_rate=0.001, deconv_opt1_regu_strength=0.01, deconv_opt2_epoch=500, deconv_opt2_epsilon=0.0001, deconv_opt2_learning_rate=0.1, deconv_opt2_regu_strength=0.01, deconv_opt3_epoch=150, deconv_opt3_epsilon=0.0001, deconv_opt3_learning_rate=0.1, deconv_opt3_regu_strength_1=0.8, deconv_opt3_regu_strength_2=0.01, deconv_opt3_regu_strength_3=1, deconv_tune_epoch=20, deconv_tune_epsilon=0.0001, deconv_tune_learning_rate=0.01, dropout_prob=0.1, feature_AE_batch_size=12800, feature_AE_concat_prev_embed=None, feature_AE_dropout_prob=0, feature_AE_epoch=[500, 300], feature_AE_learning_rate=0.001, feature_AE_regu_strength=0.9, gat_hid_embed=64, gat_multi_heads=2, given_cell_type_labels=False, graph_AE_GAT_dropout=0, graph_AE_concat_prev_embed=False, graph_AE_embedding_size=16, graph_AE_epoch=200, graph_AE_graph_construction='v2', graph_AE_learning_rate=0.01, graph_AE_neighborhood_factor=0.05, graph_AE_normalize_embed=None, graph_AE_retain_weights=False, graph_AE_use_GAT=False, graph_change_threshold=0.01, load_LTMG=None, load_bulk_dataset='', load_cell_type_labels='', load_dataset_dir='/fs/ess/PCON0022/Edison/datasets', load_dataset_name='12.Klein', load_from_10X=None, load_rdata=None, load_sc_dataset='', load_seurat_object='/home/dengzhen/counts.csv', load_use_benchmark=False, output_dir='/home/dengzhen/outputs/', output_intermediate=True, output_preprocessed=False, output_rdata=False, output_run_ID=None, preprocess_cell_cutoff=0.9, preprocess_gene_cutoff=0.9, preprocess_top_gene_select=2000, run_LTMG=False, seed=1, total_epoch=31, use_CCC=False, use_bulk=False)
2022-11-11 13:02:05,547 -
> Loading data ...
2022-11-11 13:02:05,547 - --------> Loading from seurat object ...
2022-11-11 13:02:05,547 - ----------------> Reading matrix (dense) ...
/home/dengzhen/scGNN2.0/load.py:198: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  X = pd.read_csv(file_path, **kwargs)
2022-11-11 13:05:25,985 - ----------------> Matrix has 14594 cells and 27775 genes
2022-11-11 13:05:25,989 -
> Preprocessing data ...
2022-11-11 13:05:25,990 - --------> Preprocessing sc data ...
2022-11-11 13:05:25,990 - ----------------> Truncating genes and cells ...
2022-11-11 13:05:28,592 - ----------------> Sorting and selecting top genes ...
2022-11-11 13:05:28,964 - ----------------> Log-transforming data ...
2022-11-11 13:05:29,070 - --------> Preprocessed sc data has 6572 cells and 2000 genes, Removing 8022 cells and 25775 genes
2022-11-11 13:05:29,071 -
> Setting up data for testing ...
2022-11-11 13:05:29,123 - --------> Applying dropout for imputation testing ...
2022-11-11 13:05:29,972 -
> Preparing other matrices ...
2022-11-11 13:05:29,972 - --------> Loading LTMG matrix ...
Traceback (most recent call last):
  File "scGNN_v2.py", line 228, in <module>
    TRS = load.LTMG_handler(args) # cell * gene
  File "/home/dengzhen/scGNN2.0/load.py", line 140, in LTMG_handler
    os.path.join(dir_path, args.load_LTMG),
  File "/home/dengzhen/miniconda3/envs/scgnnEnv/lib/python3.8/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/home/dengzhen/miniconda3/envs/scgnnEnv/lib/python3.8/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
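
The printed Namespace shows `load_LTMG=None`, and `os.path.join` rejects `None`, which is what the TypeError above reports. The sketch below shows the kind of guard that avoids it, assuming a missing LTMG file name simply means "no LTMG matrix"; `resolve_optional_path` is a hypothetical helper, not a function from scGNN.

```python
import os


def resolve_optional_path(dir_path, file_name):
    """Join dir_path and file_name, or return None when no file name was given."""
    if file_name is None:
        # Nothing to load: let the caller decide on a fallback
        # instead of passing None into os.path.join.
        return None
    return os.path.join(dir_path, file_name)


missing = resolve_optional_path("/data", None)
present = resolve_optional_path("/data", "ltmg.csv")
```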

Third, I tried 10X input:

python scGNN_v2.py --load_from_10X ~/Documents/filtered_mtx/sham     --output_dir ./outputs     --total_epoch 31 --feature_AE_epoch 500 300     --output_intermediate
2022-11-11 13:16:57,031 -
> Loading Packages
2022-11-11 13:16:59,759 - Using device: cpu
2022-11-11 13:16:59,759 - Namespace(alpha=0.5, ari_threshold=0.95, cluster_AE_batch_size=12800, cluster_AE_dropout_prob=0, cluster_AE_epoch=200, cluster_AE_learning_rate=0.001, cluster_AE_regu_strength=0.9, clustering_embed='graph', clustering_louvain_only=False, clustering_method='KMeans', clustering_use_flexible_k=False, deconv_opt1_epoch=5000, deconv_opt1_epsilon=0.0001, deconv_opt1_learning_rate=0.001, deconv_opt1_regu_strength=0.01, deconv_opt2_epoch=500, deconv_opt2_epsilon=0.0001, deconv_opt2_learning_rate=0.1, deconv_opt2_regu_strength=0.01, deconv_opt3_epoch=150, deconv_opt3_epsilon=0.0001, deconv_opt3_learning_rate=0.1, deconv_opt3_regu_strength_1=0.8, deconv_opt3_regu_strength_2=0.01, deconv_opt3_regu_strength_3=1, deconv_tune_epoch=20, deconv_tune_epsilon=0.0001, deconv_tune_learning_rate=0.01, dropout_prob=0.1, feature_AE_batch_size=12800, feature_AE_concat_prev_embed=None, feature_AE_dropout_prob=0, feature_AE_epoch=[500, 300], feature_AE_learning_rate=0.001, feature_AE_regu_strength=0.9, gat_hid_embed=64, gat_multi_heads=2, given_cell_type_labels=False, graph_AE_GAT_dropout=0, graph_AE_concat_prev_embed=False, graph_AE_embedding_size=16, graph_AE_epoch=200, graph_AE_graph_construction='v2', graph_AE_learning_rate=0.01, graph_AE_neighborhood_factor=0.05, graph_AE_normalize_embed=None, graph_AE_retain_weights=False, graph_AE_use_GAT=False, graph_change_threshold=0.01, load_LTMG=None, load_bulk_dataset='', load_cell_type_labels='', load_dataset_dir='/fs/ess/PCON0022/Edison/datasets', load_dataset_name='12.Klein', load_from_10X='/home/dengzhen/Documents/filtered_mtx/sham', load_rdata=None, load_sc_dataset='', load_seurat_object=None, load_use_benchmark=False, output_dir='./outputs', output_intermediate=True, output_preprocessed=False, output_rdata=False, output_run_ID=None, preprocess_cell_cutoff=0.9, preprocess_gene_cutoff=0.9, preprocess_top_gene_select=2000, run_LTMG=False, seed=1, total_epoch=31, use_CCC=False, use_bulk=False)
2022-11-11 13:16:59,759 -
> Loading data ...
2022-11-11 13:16:59,759 - --------> Loading from 10X data ...
2022-11-11 13:16:59,759 - ----------------> Reading matrix (dense) ...
Traceback (most recent call last):
  File "scGNN_v2.py", line 212, in <module>
    X_sc_raw = load.sc_handler(args)
  File "/home/dengzhen/scGNN2.0/load.py", line 37, in sc_handler
    return load_from_10X(
  File "/home/dengzhen/scGNN2.0/load.py", line 86, in load_from_10X
    rows = np.zeros(X.shape[0])
NameError: name 'X' is not defined
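
For context on what a 10X loader has to read: the `matrix.mtx` file in a 10X directory is a MatrixMarket coordinate file with genes as rows, cells as columns, and 1-based indices. The sketch below parses that layout into a dense cells x genes list using only the standard library. It is an illustration under those assumptions, not the scGNN loader; `read_mtx_as_cells_by_genes` is a hypothetical name.

```python
import io


def read_mtx_as_cells_by_genes(handle):
    """Read a MatrixMarket coordinate file (genes x cells) into dense[cell][gene]."""
    # Skip the '%' banner and comment lines; the first other line holds the sizes.
    for line in handle:
        if not line.startswith("%"):
            n_genes, n_cells, _n_entries = (int(tok) for tok in line.split())
            break
    else:
        raise ValueError("no size line found")
    dense = [[0.0] * n_genes for _ in range(n_cells)]
    # Remaining lines are 'gene_index cell_index value' with 1-based indices.
    for line in handle:
        if not line.strip():
            continue
        gene, cell, value = line.split()
        dense[int(cell) - 1][int(gene) - 1] = float(value)
    return dense


text = ("%%MatrixMarket matrix coordinate integer general\n"
        "% example comment\n"
        "3 2 3\n"
        "1 1 5\n"
        "3 1 2\n"
        "2 2 7\n")
dense = read_mtx_as_cells_by_genes(io.StringIO(text))
```

On real data, `scipy.io.mmread` returns the same matrix in sparse form and is the more practical route; the pure-Python version only makes the file layout explicit.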
@chthub
Collaborator

chthub commented Nov 11, 2022

@Roger-GOAT
Thank you for using our software.

  1. We use pyreadr to load RData files in Python. This is the same issue that has been reported for pyreadr: S4 objects, and probably other kinds of objects, including those that depend on non-base R packages (Bioconductor, for example), cannot be read.
  2. We have updated our program to fix the errors in loading CSV and 10X files; please update your local copy as well.

Please let me know if there are any questions.

@chthub chthub closed this as completed Nov 13, 2022
@Roger-GOAT
Author

Roger-GOAT commented Nov 13, 2022

@chthub Thanks for fixing it so quickly. However, other errors have come up.

python scGNN_v2.py --load_from_10X ~/Documents/filtered_mtx/sham     --output_dir ./outputs     --total_epoch 31 --feature_AE_epoch 500 300     --output_intermediate
2022-11-13 22:59:15,989 -
> Loading Packages
2022-11-13 22:59:18,562 - Using device: cpu
2022-11-13 22:59:18,562 - Namespace(alpha=0.5, ari_threshold=0.95, cluster_AE_batch_size=12800, cluster_AE_dropout_prob=0, cluster_AE_epoch=200, cluster_AE_learning_rate=0.001, cluster_AE_regu_strength=0.9, clustering_embed='graph', clustering_louvain_only=False, clustering_method='KMeans', clustering_use_flexible_k=False, deconv_opt1_epoch=5000, deconv_opt1_epsilon=0.0001, deconv_opt1_learning_rate=0.001, deconv_opt1_regu_strength=0.01, deconv_opt2_epoch=500, deconv_opt2_epsilon=0.0001, deconv_opt2_learning_rate=0.1, deconv_opt2_regu_strength=0.01, deconv_opt3_epoch=150, deconv_opt3_epsilon=0.0001, deconv_opt3_learning_rate=0.1, deconv_opt3_regu_strength_1=0.8, deconv_opt3_regu_strength_2=0.01, deconv_opt3_regu_strength_3=1, deconv_tune_epoch=20, deconv_tune_epsilon=0.0001, deconv_tune_learning_rate=0.01, dropout_prob=0.1, feature_AE_batch_size=12800, feature_AE_concat_prev_embed=None, feature_AE_dropout_prob=0, feature_AE_epoch=[500, 300], feature_AE_learning_rate=0.001, feature_AE_regu_strength=0.9, gat_hid_embed=64, gat_multi_heads=2, given_cell_type_labels=False, graph_AE_GAT_dropout=0, graph_AE_concat_prev_embed=False, graph_AE_embedding_size=16, graph_AE_epoch=200, graph_AE_graph_construction='v2', graph_AE_learning_rate=0.01, graph_AE_neighborhood_factor=0.05, graph_AE_normalize_embed=None, graph_AE_retain_weights=False, graph_AE_use_GAT=False, graph_change_threshold=0.01, load_LTMG=None, load_bulk_dataset='', load_cell_type_labels='', load_dataset_dir='/fs/ess/PCON0022/Edison/datasets', load_dataset_name='12.Klein', load_from_10X='/home/dengzhen/Documents/filtered_mtx/sham', load_rdata=None, load_sc_dataset='', load_seurat_object=None, load_use_benchmark=False, output_dir='./outputs', output_intermediate=True, output_preprocessed=False, output_rdata=False, output_run_ID=None, preprocess_cell_cutoff=0.9, preprocess_gene_cutoff=0.9, preprocess_top_gene_select=2000, run_LTMG=False, seed=1, total_epoch=31, use_CCC=False, use_bulk=False)
2022-11-13 22:59:18,562 -
> Loading data ...
2022-11-13 22:59:18,562 - --------> Loading from 10X data ...
2022-11-13 22:59:18,562 - ----------------> Reading matrix (dense) ...
2022-11-13 22:59:19,719 - ----------------> Matrix has 7694 cells and 31053 genes
2022-11-13 22:59:19,719 -
> Preprocessing data ...
2022-11-13 22:59:19,719 - --------> Preprocessing sc data ...
2022-11-13 22:59:19,719 - ----------------> Truncating genes and cells ...
Traceback (most recent call last):
  File "scGNN_v2.py", line 216, in <module>
    X_sc = preprocess.sc_handler(X_sc_raw, args)
  File "/home/dengzhen/scGNN2.0/preprocess.py", line 25, in sc_handler
    X_trunc = filter(X_raw, cell_cutoff=cell_cutoff, gene_cutoff=gene_cutoff)
  File "/home/dengzhen/scGNN2.0/preprocess.py", line 57, in filter
    cell_mask = (X_raw['expr'] > 0)[:,gene_mask].mean(axis=1) >= (1-cell_cutoff) # retain the cells where more than 1% of thegenes are expressed
  File "/home/dengzhen/miniconda3/envs/scgnnEnv/lib/python3.8/site-packages/numpy/matrixlib/defmatrix.py", line 193, in __getitem__
    out = N.ndarray.__getitem__(self, index)
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
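
The traceback passes through `numpy/matrixlib/defmatrix.py`, which suggests the expression data is an `np.matrix` (for example, the result of calling `.todense()` on a sparse matrix). Reductions on an `np.matrix` stay 2-dimensional, so a mask built from `.mean(axis=0)` has shape (1, n_genes); using it as `[:, gene_mask]` then counts as three indices on a 2-D array. The sketch below shows one way around this, assuming `gene_mask` is computed analogously to `cell_mask`; `filter_cells_and_genes` is a hypothetical stand-in, not the actual `preprocess.filter`.

```python
import numpy as np


def filter_cells_and_genes(expr, cell_cutoff=0.9, gene_cutoff=0.9):
    """Filter a cells x genes matrix by expression frequency.

    Keeps genes expressed in at least (1 - gene_cutoff) of the cells, then
    cells expressing at least (1 - cell_cutoff) of the kept genes.
    """
    # Convert np.matrix (or any array-like) to a plain ndarray so that
    # reductions return 1-D arrays usable as boolean masks.
    expr = np.asarray(expr)
    expressed = expr > 0
    gene_mask = expressed.mean(axis=0) >= (1 - gene_cutoff)
    cell_mask = expressed[:, gene_mask].mean(axis=1) >= (1 - cell_cutoff)
    return expr[cell_mask][:, gene_mask]


# An np.matrix input, as produced by a sparse matrix's .todense().
raw = np.matrix([[1, 0, 0],
                 [2, 3, 0],
                 [0, 0, 0]])
kept = filter_cells_and_genes(raw, cell_cutoff=0.5, gene_cutoff=0.5)
```

Calling `.toarray()` instead of `.todense()` on a SciPy sparse matrix yields an `ndarray` directly and avoids the problem at the source.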

And with the CSV input:

 python scGNN_v2.py --load_seurat_object ~/counts.csv     --output_dir ~/outputs/     --total_epoch 31 --feature_AE_epoch 500 300     --output_intermediate
2022-11-13 23:01:31,647 -
> Loading Packages
2022-11-13 23:01:34,219 - Using device: cpu
2022-11-13 23:01:34,219 - Namespace(alpha=0.5, ari_threshold=0.95, cluster_AE_batch_size=12800, cluster_AE_dropout_prob=0, cluster_AE_epoch=200, cluster_AE_learning_rate=0.001, cluster_AE_regu_strength=0.9, clustering_embed='graph', clustering_louvain_only=False, clustering_method='KMeans', clustering_use_flexible_k=False, deconv_opt1_epoch=5000, deconv_opt1_epsilon=0.0001, deconv_opt1_learning_rate=0.001, deconv_opt1_regu_strength=0.01, deconv_opt2_epoch=500, deconv_opt2_epsilon=0.0001, deconv_opt2_learning_rate=0.1, deconv_opt2_regu_strength=0.01, deconv_opt3_epoch=150, deconv_opt3_epsilon=0.0001, deconv_opt3_learning_rate=0.1, deconv_opt3_regu_strength_1=0.8, deconv_opt3_regu_strength_2=0.01, deconv_opt3_regu_strength_3=1, deconv_tune_epoch=20, deconv_tune_epsilon=0.0001, deconv_tune_learning_rate=0.01, dropout_prob=0.1, feature_AE_batch_size=12800, feature_AE_concat_prev_embed=None, feature_AE_dropout_prob=0, feature_AE_epoch=[500, 300], feature_AE_learning_rate=0.001, feature_AE_regu_strength=0.9, gat_hid_embed=64, gat_multi_heads=2, given_cell_type_labels=False, graph_AE_GAT_dropout=0, graph_AE_concat_prev_embed=False, graph_AE_embedding_size=16, graph_AE_epoch=200, graph_AE_graph_construction='v2', graph_AE_learning_rate=0.01, graph_AE_neighborhood_factor=0.05, graph_AE_normalize_embed=None, graph_AE_retain_weights=False, graph_AE_use_GAT=False, graph_change_threshold=0.01, load_LTMG=None, load_bulk_dataset='', load_cell_type_labels='', load_dataset_dir='/fs/ess/PCON0022/Edison/datasets', load_dataset_name='12.Klein', load_from_10X=None, load_rdata=None, load_sc_dataset='', load_seurat_object='/home/dengzhen/counts.csv', load_use_benchmark=False, output_dir='/home/dengzhen/outputs/', output_intermediate=True, output_preprocessed=False, output_rdata=False, output_run_ID=None, preprocess_cell_cutoff=0.9, preprocess_gene_cutoff=0.9, preprocess_top_gene_select=2000, run_LTMG=False, seed=1, total_epoch=31, use_CCC=False, use_bulk=False)
2022-11-13 23:01:34,219 -
> Loading data ...
2022-11-13 23:01:34,219 - --------> Loading from seurat object ...
2022-11-13 23:01:34,219 - ----------------> Reading matrix (dense) ...
/home/dengzhen/scGNN2.0/load.py:200: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  X = pd.read_csv(file_path, **kwargs)
2022-11-13 23:04:44,059 - ----------------> Matrix has 14594 cells and 27775 genes
2022-11-13 23:04:44,064 -
> Preprocessing data ...
2022-11-13 23:04:44,064 - --------> Preprocessing sc data ...
2022-11-13 23:04:44,064 - ----------------> Truncating genes and cells ...
2022-11-13 23:04:47,795 - ----------------> Sorting and selecting top genes ...
2022-11-13 23:04:48,165 - ----------------> Log-transforming data ...
2022-11-13 23:04:48,268 - --------> Preprocessed sc data has 6572 cells and 2000 genes, Removing 8022 cells and 25775 genes
2022-11-13 23:04:48,270 -
> Setting up data for testing ...
2022-11-13 23:04:48,321 - --------> Applying dropout for imputation testing ...
2022-11-13 23:04:49,170 -
> Preparing other matrices ...
2022-11-13 23:04:49,170 - --------> Loading LTMG matrix ...
Traceback (most recent call last):
  File "scGNN_v2.py", line 228, in <module>
    TRS = load.LTMG_handler(X_sc, args) # cell * gene
  File "/home/dengzhen/scGNN2.0/load.py", line 139, in LTMG_handler
    return np.zero_like(X_sc['expr'])
  File "/home/dengzhen/miniconda3/envs/scgnnEnv/lib/python3.8/site-packages/numpy/__init__.py", line 311, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'zero_like'

I followed an answer I found on Google and uninstalled and reinstalled numpy, but that did not work.
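
The AttributeError in the traceback above points to a typo in the call rather than a broken NumPy installation: NumPy provides `np.zeros_like`, and there is no `np.zero_like`, so reinstalling cannot help. The sketch below shows the presumably intended call, assuming the handler should return an all-zero matrix shaped like the expression matrix when no LTMG file is given; `zero_ltmg_like` is a hypothetical name.

```python
import numpy as np


def zero_ltmg_like(expr):
    """Return an all-zero array with the same shape and dtype as expr."""
    return np.zeros_like(expr)  # note the plural: zeros_like


expr = np.array([[1.5, 0.0],
                 [2.0, 3.0]])
placeholder = zero_ltmg_like(expr)
```

Until the repository is updated, changing `np.zero_like` to `np.zeros_like` at the line named in the traceback (`load.py`, line 139) should be enough to get past this error.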
