Tutorial/Opinion: Finding a dataset for your project
If you’re an up-and-coming data scientist or a student, you may be building a portfolio. If so, here are some quick suggestions for choosing the datasets you work with.
- Don’t pick a common dataset. If I were a hiring manager and saw an analysis of the iris dataset, I would not be impressed. Aim for something that isn’t already on the UCI Machine Learning Repository.
- Try to create your own dataset; this is far more impressive. Here are some ways to do that:
  - Run an experiment and collect the data yourself.
  - Scrape the web to build one. Check out this link to get started.
  - Call an API and build a dataset from its responses. For past personal projects I’ve used the ProPublica, Twitter, and Spotify APIs.
- If you do download a dataset online, try loading it into a database, both to learn how that works and to show others you can work with databases. In college I worked mostly with CSVs, so it’s worth diversifying your data pipeline skills.
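To make the scraping option more concrete, here is a minimal sketch that uses only Python's standard library. It uses `urllib.request` to download a page and `html.parser.HTMLParser` to pull the cell text out of HTML table rows. The assumption that your data sits in `<table>` rows is hypothetical, and the helper names are made up for illustration. Real sites often call for a dedicated scraping library. Check a site's terms of service and robots.txt before scraping it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def parse_table_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_table(url):
    """Download a page and parse its table rows (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset)
    return parse_table_rows(html)


# Offline demo on a small, made-up HTML snippet; call scrape_table(url)
# with a real page URL to scrape live data instead.
sample = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30,720</td></tr>
</table>
"""
print(parse_table_rows(sample))
```

Parsing is kept separate from downloading so you can test the parser on saved HTML without hitting the site repeatedly.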
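Building a dataset from an API usually means sending an HTTP request and turning the JSON response into rows. The sketch below assumes a hypothetical endpoint that returns `{"results": [...]}`. The `results` key, the field names, and the `X-API-Key` header in the comment are illustrative assumptions. ProPublica, Twitter, and Spotify each have their own URLs, authentication schemes, and response shapes, so check each API's documentation.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, headers=None):
    """GET a URL and decode the JSON body (needs network access)."""
    request = Request(url, headers=headers or {})
    with urlopen(request) as response:
        return json.load(response)


def records_to_rows(payload, fields, key="results"):
    """Flatten a list of JSON records into tuples with a fixed column order.

    Missing fields become None so every row has the same length.
    The default "results" key is an assumption about the response shape.
    """
    return [tuple(record.get(field) for field in fields) for record in payload.get(key, [])]


# Offline demo with a hard-coded, made-up response body. With a real API you
# would call something like fetch_json(url, headers={"X-API-Key": "YOUR_KEY"}),
# where the header name is hypothetical and depends on the API.
sample_body = '{"results": [{"name": "Track A", "plays": 120}, {"name": "Track B"}]}'
rows = records_to_rows(json.loads(sample_body), fields=["name", "plays"])
print(rows)
```

Fixing the column order up front makes it easy to write the rows to a CSV file or a database table later.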
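For practicing the database suggestion, Python's built-in `sqlite3` module is an easy starting point because it needs no separate server. This sketch loads CSV text into a SQLite table and then answers a question with SQL. The table name, columns, and data are made up, and every column is stored as TEXT for simplicity. For a real download you would open the CSV file instead of using `io.StringIO`, and pass a filename to `sqlite3.connect` to keep the database on disk.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, csv_file, table):
    """Create `table` from the CSV header row and insert every data row.

    All columns are declared TEXT for simplicity. The table and column
    names must come from trusted input, because SQL identifiers cannot be
    bound as query parameters.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # Values are passed as parameters, so executemany handles quoting safely.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Stand-in for a downloaded CSV file (made-up data).
csv_text = "artist,genre,plays\nArtist A,jazz,120\nArtist B,rock,300\nArtist C,jazz,45\n"

conn = sqlite3.connect(":memory:")  # pass a filename instead to persist the data
load_csv_into_sqlite(conn, io.StringIO(csv_text), "tracks")

# Once the data is in a table, questions become SQL queries.
query = 'SELECT genre, SUM(CAST(plays AS INTEGER)) FROM "tracks" GROUP BY genre ORDER BY genre'
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

The same loading pattern carries over to server databases such as PostgreSQL, where you would use that database's own driver instead of `sqlite3`.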