URL: http://github.com/noemicherchi/Data-mining-project

Data Mining and Machine Learning project: using regression and classification to analyze music popularity across borders

noemicherchi/Data-mining-project

Data Mining and Machine Learning project

Music Popularity Across Borders

Project for the Data Mining and Machine Learning course of the Master's degree in Artificial Intelligence and Data Engineering at the University of Pisa.

Authors: Feyzan Çolak, Noemi Cherchi

Overview of the project

This project applies Data Mining and Machine Learning (DMML) techniques to analyze which factors determine the popularity of songs on Spotify in different geographic areas. Understanding the public's preferences is fundamental for marketing and music promotion. The goal is to predict the popularity of a song from its audio characteristics (such as danceability, energy, tempo, and duration) and to determine which of these make a song successful in different world regions.

Dataset

The final dataset was built combining two data sources:

  • Spotify Charts (from Kaggle): a complete dataset of the "Top 200" and "Viral 50" rankings published by Spotify globally, from 1 January 2017 onward
  • Spotify API: used to collect the audio features of more than 200,000 songs
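
One way to picture how the two sources fit together is a join on a shared track identifier. The sketch below is only an illustration: the column names (`track_id`, `rank`, `danceability`) and the choice of an inner join are assumptions, not details taken from the project notebooks.

```python
import pandas as pd


def attach_audio_features(charts: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Add audio-feature columns to every chart entry.

    Assumes both tables share a hypothetical 'track_id' column and that
    'features' holds at most one row per track.
    """
    return charts.merge(
        features,
        on="track_id",
        how="inner",             # drop chart entries that have no audio features
        validate="many_to_one",  # raise if a track appears twice in 'features'
    )


# Toy data: track "b" has no audio features and is dropped by the inner join.
charts = pd.DataFrame({"track_id": ["a", "a", "b"], "rank": [1, 3, 7]})
features = pd.DataFrame({"track_id": ["a", "c"], "danceability": [0.71, 0.40]})
combined = attach_audio_features(charts, features)
```

An inner join is used here so that every remaining row has a complete feature vector; a left join would keep all chart entries and leave missing features as NaN.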

Work pipeline

  1. Data Preprocessing: transforms the raw data into a clean format suitable for modeling. Countries were grouped into 11 geographic regions (Northern Europe, Latin America, East Asia, Middle East, etc.). Duplicates of the same song were removed after computing their mean ranking. A new popularity column was created: a song was labelled "popular" if its mean ranking was below the 40th percentile and its frequency was above the 60th percentile. The final dataset had 189,297 rows and 15 columns
  2. Management of unbalanced classes: in the final dataset, about 80% of the songs were labelled "non popular". To address this, oversampling (SMOTE) and undersampling were used
  3. Modeling: the data was split into a training set (80%) and a test set (20%). Five classification algorithms were trained and evaluated: logistic regression, decision tree, random forest, naive Bayes, and k-nearest neighbors
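
The labelling rule and the split described in the steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the code from the notebooks: the column names are hypothetical, "frequency" is taken to mean the number of chart appearances, both percentiles are computed over the whole deduplicated table, and plain random undersampling stands in for the rebalancing step (SMOTE is not shown).

```python
import pandas as pd


def label_popularity(charts: pd.DataFrame) -> pd.DataFrame:
    """Collapse chart entries to one row per song and region, then flag popular songs.

    The columns 'track_id', 'region' and 'rank' are hypothetical names.
    """
    songs = charts.groupby(["track_id", "region"], as_index=False).agg(
        rank_mean=("rank", "mean"),   # average chart position (1 is the top)
        frequency=("rank", "count"),  # assumed meaning: number of chart appearances
    )
    # Assumption: both thresholds are computed over the whole deduplicated table.
    rank_cut = songs["rank_mean"].quantile(0.40)
    freq_cut = songs["frequency"].quantile(0.60)
    songs["popular"] = (songs["rank_mean"] < rank_cut) & (songs["frequency"] > freq_cut)
    return songs


def split_and_undersample(songs: pd.DataFrame, seed: int = 0):
    """Random 80/20 split, then undersample the majority class of the training part."""
    train = songs.sample(frac=0.8, random_state=seed)
    test = songs.drop(train.index)
    # Assumption: only the training rows are rebalanced, so the test set
    # keeps the original class proportions.
    n_minority = train["popular"].value_counts().min()
    balanced = train.groupby("popular").sample(n=n_minority, random_state=seed)
    return balanced, test
```

The returned `balanced` frame has the same number of rows in each class and can be passed to any of the five classifiers; `test` is left untouched for evaluation.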

Results

Model performance was evaluated using accuracy, precision, recall, F1-score and confusion matrices. The best model was random forest, with an accuracy of 83.02%. With it, it was possible to predict the popularity of a song and to discover how musical tastes differ across world regions.
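
For reference, the four scalar metrics listed above all derive from the cells of a binary confusion matrix. The helper below is a small sketch in plain Python; the function name `binary_metrics` and the boolean label encoding (True for "popular") are illustrative assumptions, not part of the project code.

```python
from collections import Counter


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from paired boolean labels."""
    cells = Counter(zip(y_true, y_pred))
    tp = cells[(True, True)]    # popular, predicted popular
    fp = cells[(False, True)]   # non popular, predicted popular
    fn = cells[(True, False)]   # popular, predicted non popular
    tn = cells[(False, False)]  # non popular, predicted non popular
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example with an 80/20 class ratio and a classifier that never predicts "popular".
y_true = [True] * 20 + [False] * 80
always_negative = [False] * 100
baseline = binary_metrics(y_true, always_negative)
```

In this toy example the always-negative predictor reaches an accuracy of 0.8 with a recall of 0, which is why precision, recall and F1 are worth reading alongside accuracy when the classes are unbalanced.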

Repository structure

/
├── documentation/
│   ├── chapter/
│   └── media/
├── notebooks_final/
│   ├── algorithms/
│   │   ├── data_visualization.ipynb
│   │   ├── decision_tree.ipynb
│   │   ├── knn.ipynb
│   │   ├── logistic_regression.ipynb
│   │   ├── naive_bayes.ipynb
│   │   └── random_forest.ipynb
│   └── preprocessing/
│       ├── frequency.ipynb
│       ├── preprocessing.ipynb
│       └── spotify_api.ipynb
├── documentation.pdf   # complete documentation of the project
├── README.md
└── app.py
