💡[Feature]: Enhance and merge email spam detection notebook with EDA and NLP improvements #1455

Open
4 tasks done
Niraj1608 opened this issue Oct 15, 2024 · 11 comments

Comments

@Niraj1608
Member

Niraj1608 commented Oct 15, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Feature Description

Hi, I am Niraj. I've been reviewing the Email Spam Detection with Machine Learning project and noticed several areas where improvements can be made. Specifically, I propose:

  • Enhanced EDA: Adding more detailed charts and visualizations with Python libraries such as Seaborn or Matplotlib will make the data distribution and the correlations between features easier to understand. This could include heatmaps, pair plots, and distribution plots.
  • Advanced NLP Techniques: Incorporating more Natural Language Processing (NLP) techniques, such as advanced tokenization, lemmatization, and more sophisticated vectorization like TF-IDF, will improve how the text is represented for the model.
  • Data Cleaning: Introducing robust data cleaning methods to remove noisy data, handle missing values, and preprocess text more efficiently will improve the model's accuracy.
  • Dataset Issue: The project is missing the dataset required to run the notebook. I propose including a well-structured dataset to ensure reproducibility and ease of use for others.

Together, these changes would improve the spam detection model's overall performance and make it more interpretable through better visualizations and data processing.
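To make the data cleaning proposal more concrete, here is a minimal sketch of what that step might look like. It is an illustration only, not code from the notebook: the helper names `clean_text` and `clean_rows` and the sample rows are hypothetical, and the actual dataset may use a different layout.

```python
import re

def clean_text(text):
    """Lowercase, strip URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters and spaces only
    return re.sub(r"\s+", " ", text).strip()

def clean_rows(rows):
    """Drop rows with missing fields and messages that are duplicates after cleaning."""
    seen = set()
    cleaned = []
    for label, message in rows:
        # Skip rows where the label or the message is missing or blank
        if not label or not message or not message.strip():
            continue
        text = clean_text(message)
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append((label, text))
    return cleaned

# Hypothetical (label, message) rows, not taken from the project dataset
rows = [
    ("spam", "WIN a FREE prize!!! Visit http://example.com now"),
    ("ham", "Are we still meeting at 5?"),
    ("ham", None),
    ("spam", "win a free prize visit now"),
]
cleaned = clean_rows(rows)
for label, text in cleaned:
    print(label, "|", text)
```

Stripping digits and punctuation is a deliberate simplification here; for spam detection those characters can carry signal, so a real pipeline might keep or encode them instead.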

If you like this idea, please assign this task to me, and I will add the corresponding improvements and charts to it.

Thank you for your time and consideration!

Use Case

Incorporating the enhanced EDA and advanced NLP techniques from the Spam Mail Predictor notebook will provide better insights into the dataset, leading to more accurate model training and predictions. This is crucial for users looking for deeper analysis and improved model performance.

Benefits

Improved EDA: The Spam Mail Predictor notebook features more detailed EDA, including additional visualizations and insights.
Enhanced NLP: It also includes more advanced NLP techniques, such as TF-IDF and more extensive text preprocessing steps.
Dataset Integration: Adding a clear, usable dataset to the notebook will ensure reproducibility and ease of use for others.
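As a rough idea of the numbers the proposed EDA charts would be built on, the sketch below computes the class balance and the mean message length per class using only the standard library. The function names and sample rows are hypothetical; in the notebook these values would more likely be plotted with Seaborn or Matplotlib, as suggested above.

```python
from collections import Counter, defaultdict
from statistics import mean

def class_balance(rows):
    """Count how many messages fall under each label."""
    return Counter(label for label, _ in rows)

def mean_length_by_class(rows):
    """Average number of whitespace-separated words per message, grouped by label."""
    lengths = defaultdict(list)
    for label, message in rows:
        lengths[label].append(len(message.split()))
    return {label: mean(values) for label, values in lengths.items()}

# Hypothetical (label, message) rows, not taken from the project dataset
rows = [
    ("spam", "win a free prize now"),
    ("spam", "claim your free cash reward today"),
    ("ham", "are we still meeting at five"),
    ("ham", "see you soon"),
]
balance = class_balance(rows)
avg_len = mean_length_by_class(rows)

# Tiny text "bar chart" as a stand-in for a real distribution plot
for label, count in balance.items():
    print(f"{label:<5} {'#' * count} ({count}), mean words: {avg_len[label]:.1f}")
```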

Add ScreenShots

@sanjay-kv Once you assign this issue to me, I will create them.

Priority

High

Record

  • I have read the Contributing Guidelines
  • I'm a GSSOC'24 contributor
  • I want to work on this issue
Niraj1608 added the "enhancement" (New feature or request) label Oct 15, 2024

Thank you for creating this issue! 🎉 We'll look into it as soon as possible. In the meantime, please make sure to provide all the necessary details and context. If you have any questions, reach out on LinkedIn. Your contributions are highly appreciated! 😊

Note: I review the repo's issues twice a day, or at least once a day. If your issue goes stale for more than one day, you can tag me and comment on this same issue.

You can also check our CONTRIBUTING.md for guidelines on contributing to this project.
We are here to help you on this open-source journey. If you need any help, feel free to tag me or book an appointment.

@sanjay-kv
Member

It's already there.


Hello @Niraj1608! Your issue #1455 has been closed. Thank you for your contribution!

@Niraj1608
Member Author

@sanjay-kv
I understand your point, but I believe the current implementation lacks the advanced NLP techniques that could significantly improve the model's accuracy and text processing. By incorporating features like TF-IDF and advanced tokenization, we can make the spam detection more robust and efficient. I've also attached my code file as a PDF for reference:
spam_mail_pridector.ipynb - Colab.pdf
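For readers unfamiliar with TF-IDF, a minimal sketch of the idea follows. It is not the code from the attached notebook: the `tokenize` and `tf_idf` helpers and the sample documents are hypothetical, and the formula is the plain textbook variant (term frequency times `log(N / df)`). Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical sample messages, not taken from the project dataset
docs = ["free prize now", "meeting now", "free free cash"]
vectors = tf_idf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, "->", {term: round(weight, 3) for term, weight in vector.items()})
```

Terms that occur in fewer documents (here "prize") receive a higher weight than terms spread across several documents (here "free" or "now"), which is what makes TF-IDF more informative than raw counts.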

@Niraj1608
Member Author

@sanjay-kv Would it be alright to either replace an existing file with this one or add it to the current one? Let me know what you prefer!

@sanjay-kv
Member

Add it to the current one.

sanjay-kv added a commit that referenced this issue Oct 16, 2024
[Feature]: Enhance and merge email spam detection notebook with EDA and NLP improvements #1455
@Niraj1608
Member Author

@sanjay-kv This is a level 2 issue and the PR was merged, but I only got 10 points. Can you check, please? :)

@sanjay-kv
Member

Share a screenshot.

@Niraj1608
Member Author

Niraj1608 commented Oct 16, 2024

@sanjay-kv

Screenshot 2024-10-16 205631
#1459

@sanjay-kv
Member

image

@sanjay-kv
Member

Updated.


2 participants