cxr-data


Commits

List of commits on branch main.
43e09bdb32c1084e4733d0efc4bff2b08c21a48e: shorten kaggle url (mminho42 committed 14 days ago)
bd1003ef6bcdc9c37b7f6ed8648dd5c5a63ed391: put back the box csv (mminho42 committed 14 days ago)
9abb1990bfe89dfee1ccbda656c9922a1975b6f4: links (mminho42 committed 16 days ago)
07e5aed21e67bf48f72eee1b967073002a3e45e0: distinct disease labels (mminho42 committed 17 days ago)
2d716d0d6606e45ba2387a88eed206d761851d42: don't ignore db.sqlite3 (mminho42 committed 17 days ago)
35508a94d76e0accd7864ab1171b01a6e629c554: add sqlite making (mminho42 committed 17 days ago)

🩻cxr-ios

steps

  1. Download data from: nihcc or kaggle

  2. Copy all images from the images sub-directories into a single all_images/ folder with copy_images.py.

  3. rename the csv file from Data_Entry_2017_v2020.csv to data.csv

  4. make a simpler data2.csv with simplify_csv.py: remove columns, rename columns, ignore "No Finding" in "Finding Labels"

    <!-- original columns -->
    Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImage[Width,Height],OriginalImagePixelSpacing[x,y],
    
    <!-- columns kept after removal -->
    Image Index,Finding Labels,Patient Age,Patient Gender,View Position
    
    <!-- renamed -->
    name,label,age,gender,position
    
    <!-- image names changed from png to jpg -->
    
  5. make db.sqlite3 from data2.csv with csv_to_sqlite.py

  6. Resize the 1024x1024 png images in all_images/ that are listed in data2.csv (names are matched ignoring the file extension) to 512x512 jpg images in resized_images/ with resize_images.py.

    ⚠︎ this takes a very long time, e.g. 783s
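
Steps 4 and 5 could look roughly like the sketch below. This is not the repository's actual simplify_csv.py or csv_to_sqlite.py, only an illustration using the Python standard library. Assumptions: "ignore" is read as dropping every row whose label is exactly "No Finding", and the table name `images` and the column types are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path

# Original column names mapped to the renamed ones listed in step 4.
RENAMED = {
    "Image Index": "name",
    "Finding Labels": "label",
    "Patient Age": "age",
    "Patient Gender": "gender",
    "View Position": "position",
}


def simplify_csv(src, dst):
    """Write a reduced copy of src to dst and return the number of rows kept."""
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=list(RENAMED.values()))
        writer.writeheader()
        for row in reader:
            # Assumption: rows labelled exactly "No Finding" are skipped entirely.
            if row["Finding Labels"] == "No Finding":
                continue
            # Keep only the five columns of interest, under their new names.
            out = {new: row[old] for old, new in RENAMED.items()}
            # Image names change from .png to .jpg.
            out["name"] = str(Path(out["name"]).with_suffix(".jpg"))
            writer.writerow(out)
            kept += 1
    return kept


def csv_to_sqlite(csv_path, db_path, table="images"):
    """Load the simplified CSV into a SQLite table and return the row count.

    The table name and column types are hypothetical choices for this sketch.
    """
    with open(csv_path, newline="") as f:
        rows = [tuple(r[c] for c in RENAMED.values()) for r in csv.DictReader(f)]
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(f"DROP TABLE IF EXISTS {table}")
            con.execute(
                f"CREATE TABLE {table} ("
                "name TEXT PRIMARY KEY, label TEXT, age INTEGER, "
                "gender TEXT, position TEXT)"
            )
            con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)
    finally:
        con.close()
    return len(rows)
```

With data.csv in the working directory, calling `simplify_csv("data.csv", "data2.csv")` followed by `csv_to_sqlite("data2.csv", "db.sqlite3")` would produce both output files. The table is dropped and recreated on each run, so the script can be rerun safely.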