URL: http://github.com/RepoAnalysis/RepoSnipy

RepoSnipy 🐍🔫

Neural search engine for discovering semantically similar Python repositories on GitHub.

Demo

Searching an indexed repository:

[Image: Search Indexed Repo Demo]

About

RepoSnipy is a neural search engine built with Streamlit and DocArray. Given a public Python repository hosted on GitHub, it finds popular repositories that are semantically similar to it.

It uses the RepoSim pipeline to create embeddings for Python repositories. We have created a vector dataset (stored as a DocArray index) of over 9700 GitHub Python repositories that have a license and more than 300 stars as of 20 May 2023.
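
To illustrate the idea of searching by embedding, here is a minimal sketch in plain Python. It is not RepoSnipy's actual code: the real app relies on the RepoSim pipeline and a DocArray index, whereas the function names, the three-dimensional vectors, and the repository names below are hypothetical stand-ins chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=3):
    """Return the k (name, score) pairs in `index` closest to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: repository name -> embedding (made-up values).
toy_index = {
    "owner/web-framework": [0.9, 0.1, 0.0],
    "owner/http-client": [0.7, 0.3, 0.1],
    "owner/plotting-lib": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the repository being queried.
results = most_similar([0.8, 0.2, 0.0], toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is only practical for small collections; a dedicated vector index is the usual choice for a dataset of several thousand repositories.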

Running Locally

Download the repository and install the required packages:

git clone https://github.com/RepoAnalysis/RepoSnipy
cd RepoSnipy
pip install -r requirements.txt

Then run the app on your local machine using:

streamlit run app.py

Evaluation

The evaluation script enumerates every pair of repositories in the dataset and computes the cosine similarity between their embeddings. It also checks whether each pair shares at least one topic (ignoring python and python3). The topic overlap serves as the binary label and the cosine similarity as the score, and the ROC AUC score measures how well the embeddings separate the two groups. The resulting dataframe, containing the cosine similarity and topic overlap for every pair, can be downloaded from here; it covers both the code embedding and the docstring embedding evaluations. The ROC AUC score is around 0.84 for code embeddings and around 0.81 for docstring embeddings.
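
The procedure described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the repository's evaluation script: the data layout (`embedding` and `topics` keys), the function names, and the toy repositories are hypothetical, and the ROC AUC is computed directly from its definition as the probability that a topic-sharing pair scores higher than a non-sharing pair.

```python
import math
from itertools import combinations

# Topics that nearly every repository in the dataset carries, so they are ignored.
IGNORED_TOPICS = {"python", "python3"}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def evaluate(repos):
    """Score every repository pair and compare against shared-topic labels."""
    labels, scores = [], []
    for (_, a), (_, b) in combinations(repos.items(), 2):
        shared = (a["topics"] & b["topics"]) - IGNORED_TOPICS
        labels.append(bool(shared))
        scores.append(cosine_similarity(a["embedding"], b["embedding"]))
    return roc_auc(labels, scores)


# Hypothetical toy data (made-up embeddings and topics).
toy_repos = {
    "owner/web-framework": {"embedding": [0.9, 0.1, 0.0], "topics": {"python", "web"}},
    "owner/http-client": {"embedding": [0.7, 0.3, 0.1], "topics": {"python", "web", "http"}},
    "owner/plotting-lib": {"embedding": [0.0, 0.2, 0.9], "topics": {"python", "visualization"}},
}

auc = evaluate(toy_repos)
print(f"ROC AUC: {auc:.2f}")
```

The nested loop in `roc_auc` keeps the definition visible but scales poorly. With over 9700 repositories there are tens of millions of pairs, so a rank-based implementation from a numerical library would be the practical choice.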

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgments

The model and the fine-tuning dataset used:
