In computer vision, the three most common tasks we perform are image classification, object detection, and image segmentation. People are often confused by these three terms, so let's start by understanding what image classification, object detection, and image segmentation actually are.
Image Classification : Image classification, a topic of pattern recognition in computer vision, is an approach that classifies an image based on its contextual information. "Contextual" means the approach focuses on the relationships between nearby pixels, which is also called the neighbourhood.

Object Detection : Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection.

Image segmentation : We can divide or partition an image into various parts called segments. It is not a great idea to process the entire image at once, as there will be regions that do not contain any useful information. By dividing the image into segments, we can focus processing on the important segments. That, in a nutshell, is how image segmentation works. An image, as you know, is a collection of pixels; image segmentation groups together the pixels that have similar attributes.

I hope you now have a clear understanding of image classification, object detection, and image segmentation. Now let's move on to the TFOD API.
What is an API? Why do we need an API?
API stands for Application Programming Interface. An API provides developers a set of common operations so that they don’t have to write code from scratch.
TensorFlow Object Detection API :
The TensorFlow object detection API is the framework for creating a deep learning network that solves object detection problems.
There are already pretrained models in their framework, which they refer to as the Model Zoo. This includes a collection of pretrained models trained on the COCO dataset, the KITTI dataset, and the Open Images Dataset. These models can be used directly for inference if we are only interested in the categories present in those datasets.
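To make "used for inference" concrete, here is a minimal sketch in TensorFlow 1.x that loads a Model Zoo frozen graph and runs it on a single image. The frozen-graph path and the test image name are placeholders, not part of the official steps that follow.
import numpy as np
import tensorflow as tf
from PIL import Image

# Hypothetical path to a frozen graph extracted from a Model Zoo download.
PATH_TO_FROZEN_GRAPH = "faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb"

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with detection_graph.as_default(), tf.Session() as sess:
    # Exported detection graphs expose standard tensor names for input and outputs.
    image_tensor = detection_graph.get_tensor_by_name("image_tensor:0")
    boxes = detection_graph.get_tensor_by_name("detection_boxes:0")
    scores = detection_graph.get_tensor_by_name("detection_scores:0")
    classes = detection_graph.get_tensor_by_name("detection_classes:0")
    image = np.expand_dims(np.array(Image.open("test.jpg")), axis=0)  # placeholder image
    out_boxes, out_scores, out_classes = sess.run(
        [boxes, scores, classes], feed_dict={image_tensor: image})
    print(out_scores[0][:5])  # confidence of the top detections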
How to setup the TFOD framework?
Below is the step-by-step process to follow on your local system to visualize object detection easily with the help of TFOD.
STEP-1 Download the following content
Before extraction, you should have the following compressed files in a single folder.

STEP-2 Extract all the above zip files into a tfod folder and remove the compressed files-
After extracting all the zip files, you should have the following folders -

STEP-3 Creating virtual env using conda-
Commands
for specific python version : conda create -n your_env_name python=3.6
for latest python version : conda create -n your_env_name
then activate the environment : conda activate your_env_name

STEP-4 Install the following packages in your new environment-
for GPU
pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow-gpu==1.14.0
for CPU only
pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow==1.14.0
STEP-5 Install protobuf using conda package manager-
conda install -c anaconda protobuf
STEP-6 Convert the protobuf files to .py files-
We convert the protobuf files into Python files because the Python interpreter does not understand protobuf files. In the object detection API, much of the configuration is written in protobuf (.proto) files, so we convert them into Python files.
Open a command prompt and cd to the research folder.
Now, from the research folder, run the following command
For Linux or Mac
protoc object_detection/protos/*.proto --python_out=.
For Windows
protoc object_detection/protos/*.proto --python_out=.
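If you want to confirm that the conversion worked, the object_detection/protos folder should now contain generated *_pb2.py files. An optional quick check from the research folder (string_int_label_map_pb2 is just one of the generated modules):
python -c "from object_detection.protos import string_int_label_map_pb2; print('protos compiled')"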
STEP-7 Run setup.py to install the object detection package-
Run the setup.py file available in your research folder. To do this, open your Anaconda prompt, change your directory to research, and run the command below:
python setup.py install
STEP-8 Verify your object detection model-
To verify your object detection model, run the notebook that resides in your models/research/object_detection folder, i.e. object_detection_tutorial.ipynb
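If you prefer a quick script over the notebook, a minimal sanity check (a sketch, assuming the steps above succeeded) is to confirm that TensorFlow and the installed object_detection package import cleanly:
import tensorflow as tf
from object_detection.utils import label_map_util  # fails if the API is not installed correctly

print("TensorFlow version:", tf.__version__)  # should print 1.14.0
print("object_detection imported successfully")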

STEP-9 Paste all the content of the utils folder into the research folder-
The following files and folders are present in the utils folder-

STEP-10 Paste the faster_rcnn_inception_v2_coco_2018_01_28 model (or any other model downloaded from the model zoo) into the research folder-
Now cd to the research folder and run the following Python file (a sketch of what this script does is shown after the command)-
python xml_to_csv.py
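For reference, the script does roughly the following (a simplified sketch assuming labelImg-style Pascal VOC XML annotations under images/train and images/test; the actual xml_to_csv.py in the utils folder may differ in details):
import glob
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(path):
    """Collect Pascal VOC XML annotations from one folder into a DataFrame."""
    rows = []
    for xml_file in glob.glob(path + "/*.xml"):
        root = ET.parse(xml_file).getroot()
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            rows.append((root.find("filename").text,
                         int(root.find("size/width").text),
                         int(root.find("size/height").text),
                         obj.find("name").text,
                         int(box.find("xmin").text),
                         int(box.find("ymin").text),
                         int(box.find("xmax").text),
                         int(box.find("ymax").text)))
    columns = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]
    return pd.DataFrame(rows, columns=columns)

for split in ["train", "test"]:
    xml_to_csv("images/" + split).to_csv("images/{}_labels.csv".format(split), index=False)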
STEP-11 Run the following to generate train and test records-
From the research folder-
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
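Note that generate_tfrecord.py contains a label-to-id mapping that you normally have to edit so that it matches your own classes and your labelmap.pbtxt. A sketch of that mapping (the class names below are placeholders, not part of the original script):
def class_text_to_int(row_label):
    # Map each class name in the CSV to the integer id used in labelmap.pbtxt.
    if row_label == 'cat':      # hypothetical class name
        return 1
    elif row_label == 'dog':    # hypothetical class name
        return 2
    else:
        return None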
STEP-12 Copy the YOUR_MODEL.config file from research/object_detection/samples/configs/ into research/training-
The config file to copy corresponds to the model you downloaded, for example faster_rcnn_inception_v2_coco_2018_01_28. If you have downloaded any other model, you will see a config file named after YOUR_MODEL_NAME instead, as shown below-

Hence always verify YOUR_MODEL_NAME before using the config file.
STEP-13 Update num_classes, fine_tune_checkpoint, and num_steps, plus update input_path and label_map_path for both train_input_reader and eval_input_reader-
Whichever config you copied, you must update the values of these keys: num_classes (the number of classes in your dataset), fine_tune_checkpoint (the path to the model.ckpt of the model you pasted into the research folder), num_steps, and the input_path and label_map_path entries under train_input_reader and eval_input_reader. The example below is an SSDLite with MobileNet v1 config with these fields already filled in; an example labelmap.pbtxt is shown after the config.
# SSDLite with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
  ssd {
    num_classes: 6
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        use_depthwise: true
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      use_depthwise: true
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 20K steps, which we
  # empirically found to be sufficient to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 20000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "train.record"
  }
  label_map_path: "training/labelmap.pbtxt"
}
eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "test.record"
  }
  label_map_path: "training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
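The config above points to training/labelmap.pbtxt. If you have not created it yet, it is a small text file that lists every class with an id starting from 1; the class names below are placeholders and must match the names used while labelling and in generate_tfrecord.py:
item {
  id: 1
  name: 'cat'
}
item {
  id: 2
  name: 'dog'
}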
STEP-14 Copy train.py from research/object_detection/legacy/ to the research folder-
The legacy folder contains train.py as shown below -

STEP-15 Copy the deployment and nets folders from research/slim into the research folder-
The slim folder contains the following folders -

STEP-16 Now run the following command from the research folder. This will start the training on your local system-
Copy the command, replace YOUR_MODEL.config with your own model's config file name (for example, the one for faster_rcnn_inception_v2_coco_2018_01_28), and then run it in the command prompt or terminal. Make sure you are in the research folder.
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/YOUR_MODEL.config
Note : Always run all the commands in the research folder.
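While training runs, checkpoints and event files are written to the training/ folder. Optionally, you can watch the loss curves by pointing TensorBoard at that folder from another terminal:
tensorboard --logdir=training/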