dsg17

Commits

List of commits on branch master.
afa26deff0b7a5b1842f133a1fa754e163fd2bb1
Update DSG_qualifs.ipynb
MarcSzafraniec committed 8 years ago

9f5d8f94fb73aac0c555af37cca143a7188f0a00
generalized correct_median to any choice of names
VincentBt committed 8 years ago

4578d726b81044ed958acc90b29ef48c778e479f
better than replacing with median
VincentBt committed 8 years ago

749b7521cb26c7717e784ddcca8fdd1f43f41610
Added PCA
committed 8 years ago

e511f7495fad045c7e6c8d47fefcb92dec3e2689
Addedd clustering
committed 8 years ago

6223b59512ff678a0c67c45666a11bdd4e5b098a
Merge branch 'master' of https://github.com/MarcSzafraniec/dsg17
committed 8 years ago

README

DataScienceGame 2017

data: https://drive.google.com/drive/folders/0B5_VOL6s8O6KWUd0dlI5OVRjYlk
progress: https://docs.google.com/document/d/19pQ7JyAXz3eZqGScN1xRQMPy_O5toEOogACMAC68Hkk/edit
description: https://docs.google.com/document/d/17dUl1nUFY0xhoZRMrhk3FI0uY5JJRTaIU-bs9tWob0c/edit
Kaggle: https://inclass.kaggle.com/c/dsg17-online-phase/

How to process the data?

  • Genre_id, media_id, album_id, user_id, artist_id -> aggregate (e.g. count)
  • Ts_listen, release_date: dates in 2 different formats -> convert to the same format
  • Context_type -> one-hot encode (74 values, from 0 to 73)
  • Platform_name, platform_family -> one-hot encode? Aggregate? (only 3 values each)
  • Media_duration -> this one seems simple, keep as is
  • Listen_type -> probably keep as is, but not sure
  • User_gender -> keep as is (sexism!)
  • User_age -> keep as is
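The first three steps listed above could be sketched with pandas roughly as follows. This is only a minimal illustration, not the code from the notebook. It assumes lowercase column names, that ts_listen is a Unix timestamp in seconds, and that release_date is an integer such as 20040704 (the notes above only say that the two formats differ). The function name `preprocess` is hypothetical.

```python
import pandas as pd

ID_COLS = ["genre_id", "media_id", "album_id", "user_id", "artist_id"]
ONE_HOT_COLS = ["context_type", "platform_name", "platform_family"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count-encode ids, unify dates, one-hot encode."""
    out = df.copy()

    # Count-encode the id columns: for each row, store how many rows
    # share the same id value.
    for col in ID_COLS:
        out[col + "_count"] = out.groupby(col)[col].transform("count")

    # Bring both date columns to pandas datetimes.
    # ASSUMPTION: ts_listen is in Unix seconds, release_date looks like 20040704.
    out["ts_listen"] = pd.to_datetime(out["ts_listen"], unit="s")
    out["release_date"] = pd.to_datetime(
        out["release_date"].astype(str), format="%Y%m%d"
    )

    # One-hot encode the low-cardinality categorical columns.
    out = pd.get_dummies(out, columns=ONE_HOT_COLS)
    return out
```

The raw id columns are left in place next to their counts, so they can still be used as grouping keys for other aggregates.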

Other ideas:

  • Compute the mean length for each album, artist, and genre, and the mean of is_listened for each user, each artist, etc., using the date
  • Using the date, we can also compute the number of songs a user listened to in a row
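One possible reading of these two ideas, as a hedged sketch: for each listen, compute the user's mean is_listened over their earlier listens only (so the current row's target is not leaked), and the position of the song within a run of consecutive listens. The 30-minute gap that ends a run, the assumption that ts_listen is in Unix seconds, and the names `add_user_features`, `user_past_listen_rate`, and `songs_in_a_row` are all illustrative choices, not taken from the repository.

```python
import pandas as pd


def add_user_features(df: pd.DataFrame, max_gap_s: int = 1800) -> pd.DataFrame:
    """Hypothetical helper; expects user_id, ts_listen (seconds), is_listened."""
    out = df.sort_values(["user_id", "ts_listen"]).copy()

    # Mean of is_listened over the user's previous listens only.
    # The first listen of each user has no history, so it gets NaN.
    out["user_past_listen_rate"] = out.groupby("user_id")["is_listened"].transform(
        lambda s: s.shift().expanding().mean()
    )

    # Seconds since the same user's previous listen (NaN for the first one).
    gap = out.groupby("user_id")["ts_listen"].diff()

    # A new run starts at a user's first listen or after a long pause.
    new_run = gap.isna() | (gap > max_gap_s)
    run_id = new_run.cumsum()

    # Position of the song within its run: 1 for the first song, 2 for the next...
    out["songs_in_a_row"] = out.groupby(run_id).cumcount() + 1
    return out
```

The same shift-then-expand pattern would work for per-artist or per-genre rates by changing the grouping column.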

I think the key here is making correct use of the information about the artist, etc.

Solutions:

  • XGBoost
  • Neural Networks
  • Reduce dimensionality by selecting only the important features?
  • Question: how to use the .json file?
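For the feature-selection idea, one very simple option is a correlation filter: rank the numeric features by their absolute correlation with is_listened and keep the top few. This is only a stand-in sketch. It is not the method used in the repository, it ignores interactions between features, and the name `top_k_features` is hypothetical. Importances from a trained model such as XGBoost would be another way to rank features.

```python
import pandas as pd


def top_k_features(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Hypothetical helper: names of the k numeric columns of X that are
    most correlated (in absolute value) with the target y."""
    # Pearson correlation of every column of X with y.
    corr = X.corrwith(y).abs()
    # Columns with undefined correlation (e.g. constant columns) are dropped.
    ranked = corr.dropna().sort_values(ascending=False)
    return ranked.head(k).index.tolist()
```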