# Choosing a dataset

Projects can be configured to use a DataWars Playgrounds data source automatically. Here's a step-by-step guide to setting that up.

In this document

  • How to use Playground Datasets in your projects
  • Examples of projects using datasources

# How to use Playground Datasets in your projects

# 1. Find the data source to use

Data sources can be either file-based datasets (for example, a CSV file or a directory containing image files) or a "device", such as a MySQL database.

Let's try to add both.

Go to the Datasets section of Playgrounds and find the datasets you want to add. In this case I'll add:

  • Pagila: a PostgreSQL database containing movie rentals
  • Hollywood Movies Database: a CSV file containing Hollywood movie data

What you'll need to do is:

  1. Go to datasets
  2. Find the desired dataset (searching or using the filters on the right)
  3. Hover over the logo and you'll see the ID
  4. Copy the ID of the data source using the button

In this case, here are the IDs:

  • Pagila: 7742f868-902a-4262-bfbe-00497ac27468
  • Hollywood Movies Database: de5011a1-61fb-45f2-b067-bd363ba012d2
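
Both IDs above have the shape of a UUID. Assuming that holds for every data source, a quick check can catch a truncated or mangled copy before the ID goes into the compose file. This is a minimal sketch using only the Python standard library; the helper name is_valid_datasource_id is hypothetical and not part of any DataWars tooling.

```python
import uuid


def is_valid_datasource_id(value: str) -> bool:
    """Return True if value is a canonical (lowercase, hyphenated) UUID string.

    Assumption: data source IDs are always UUIDs, as the two examples suggest.
    """
    try:
        # uuid.UUID also accepts braces and missing hyphens, so compare the
        # canonical form with the input to reject those variants too.
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False


for candidate in (
    "7742f868-902a-4262-bfbe-00497ac27468",  # Pagila
    "de5011a1-61fb-45f2-b067-bd363ba012d2",  # Hollywood Movies Database
    "7742f868-902a-4262-bfbe",               # truncated copy
):
    print(candidate, is_valid_datasource_id(candidate))
```

The first two lines print True and the truncated one prints False.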

# 2. Configure your Project's docker-compose.yml

Now declare the data sources in your project's docker-compose.yml. They go in a new top-level section at the end of the YAML file called x-datasources:

# your docker-compose.yml
version: "3.9"
services:
    # list of devices of your project
x-datasources:
    name-of-the-datasource-1:
        id: $ID
    name-of-the-datasource-2:
        id: $ID

You can list as many data sources as you want. If the data source is a "device" type (like the Pagila PostgreSQL database), it is added to your project automatically. If it is a "file" type (a CSV, a directory), you must also list it under x-datasources in the service that needs access to those files.

Here's a working example with comments:

version: "3.9"
services:
  jupyter:
    build:
      args:
        NB_BASE_IMAGE: datawars/data-analysis-nb7-3.11:v1
      context: notebooks
    image: datawars/lab-9059fbffa-example-datasource-project:v1
    x-config: python-jupyter
    x-datasources:
      - hollywood_data # this service will have access to this data source
x-datasources:
  pagila:
    id: 7742f868-902a-4262-bfbe-00497ac27468
  hollywood_data:
    id: de5011a1-61fb-45f2-b067-bd363ba012d2
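
A name listed under a service's x-datasources has to match a key in the top-level x-datasources section, as hollywood_data does in the example above. The sketch below checks that on an already-parsed compose file. It takes a plain dict, which is what a YAML parser would return, so it needs nothing beyond the standard library. The function name undeclared_datasources is hypothetical, and treating a mismatch as an error is an assumption rather than documented platform behavior.

```python
def undeclared_datasources(compose: dict) -> dict:
    """Map each service name to the data source names it references
    that are missing from the top-level x-datasources section."""
    declared = set(compose.get("x-datasources") or {})
    problems = {}
    for name, service in (compose.get("services") or {}).items():
        wanted = (service or {}).get("x-datasources") or []
        missing = [ds for ds in wanted if ds not in declared]
        if missing:
            problems[name] = missing
    return problems


# The working example above, written as the dict a YAML parser would produce.
compose = {
    "services": {
        "jupyter": {
            "x-config": "python-jupyter",
            "x-datasources": ["hollywood_data"],
        }
    },
    "x-datasources": {
        "pagila": {"id": "7742f868-902a-4262-bfbe-00497ac27468"},
        "hollywood_data": {"id": "de5011a1-61fb-45f2-b067-bd363ba012d2"},
    },
}

print(undeclared_datasources(compose))  # prints {}
```

An empty result means every service-level reference is declared. A misspelled name would show up as, for example, {'jupyter': ['hollywood-data']}.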

# 3. Finding the data in the lab

If the data source is a "device" type, there is nothing else to do: it will be part of the final lab once you import the project.

If the data source is of file type, it'll be mounted in the /data directory.
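
To see what was actually mounted, you can list the contents of /data from inside the lab. This is a minimal sketch using only the standard library; the function name list_data_files is hypothetical, and the file names you get back depend on the dataset, so none are assumed here.

```python
from pathlib import Path


def list_data_files(root: str = "/data") -> list[str]:
    """Return every regular file under root, as sorted paths relative to root.

    Returns an empty list if root does not exist, for example when no
    file-type data source is configured for this service.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())


for name in list_data_files():
    print(name)
```

Once you know the file name, the CSV can be opened from that path like any other local file.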

# Examples of working projects

# Python + Pagila (postgres)

This project (Practice accessing Postgres from Python using Pagila) has two devices: the jupyter service and the Pagila database, which is added through x-datasources.

Here's its docker-compose.yml:

version: "3.9"
services:
  jupyter:
    build:
      args:
        NB_BASE_IMAGE: datawars/data-analysis-nb7-3.11:v1
      context: notebooks
    image: datawars/lab-0d9f2dd-basic-selects-pagila-python:v1
    x-config: python-jupyter
x-datasources:
  pagila:
    id: 7742f868-902a-4262-bfbe-00497ac27468 # ID of the Pagila data source
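
From the notebook, the Pagila device is then reached like any other PostgreSQL server. This guide does not state the hostname, credentials, or database name, so the sketch below only builds a connection URI from environment variables. Every variable name and default in it is a placeholder assumption to replace with the values shown in your lab; only port 5432, PostgreSQL's standard port, is a known value.

```python
import os
from urllib.parse import quote


def pagila_url(env=os.environ) -> str:
    """Build a PostgreSQL connection URI from environment variables.

    All variable names and defaults are hypothetical placeholders,
    except 5432, which is PostgreSQL's standard port.
    """
    host = env.get("PAGILA_HOST", "pagila")      # hypothetical hostname
    port = env.get("PAGILA_PORT", "5432")
    user = env.get("PAGILA_USER", "postgres")    # hypothetical user
    password = env.get("PAGILA_PASSWORD", "")    # hypothetical, empty by default
    dbname = env.get("PAGILA_DB", "pagila")      # hypothetical database name

    # Percent-encode user and password so characters like "@" or "/"
    # cannot break the URI structure.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{quote(dbname, safe='')}"


print(pagila_url({}))  # prints postgresql://postgres@pagila:5432/pagila
```

A URI in this form can be passed to a PostgreSQL driver, for example as the argument of psycopg2.connect, provided that driver is installed in the notebook image.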