Object & Label Detection API

Tag objects, scenes, activities, and concepts in any image with one REST call. The detect-labels service returns a ranked, scored list of labels as JSON — ready for image search, auto-tagging, and content discovery.

Published 2026-05-21

A photorealistic urban street scene with cyan bounding boxes drawn around a yellow taxi, a cyclist, a small dog, a tree, and a cafe table — illustrating an object and label detection API.
One image in, a ranked list of labels with confidence scores and optional bounding boxes out — the same JSON contract for one object or one hundred.

TL;DR. To detect objects in an image with an API, POST the image as multipart/form-data to Pixicular's image analysis API with services=detect-labels. The response is one JSON document with a ranked array of labels — each carrying a name, confidence score, parent taxonomy, and an optional bounding box for localised objects. One call, no SDK, structured JSON back.

How does object and label detection work?

Label detection is the inverse of image search. Image search takes a label and finds matching images; label detection takes an image and returns the labels that describe it. Under the hood, the Pixicular detect-labels service runs a short pipeline on every upload: image decoding, a multi-label classifier that scores the frame against a broad taxonomy, an object detector that locates concrete things in the scene, and a final assembly step that bundles everything together into a ranked JSON response.

The output mixes three categories of label on purpose. Objects are the concrete things present in the image (taxi, bicycle, dog, coffee mug). Scenes describe where the photo was taken (city street, beach, kitchen). Activities and concepts describe what is happening or what the image conveys (cycling, cooking, sunset). A single photo can — and usually does — pick up labels from all three categories in the same response, which is what makes the service useful for search, recommendations, and discovery features rather than just literal object tagging.

Pipeline diagram: an input image flows into a detection and localization stage that draws bounding boxes around detected items, and the result is returned as a structured JSON response.
Image upload, detection and localisation, JSON assembly — one round-trip from raw pixels to structured labels.

What does the label detection API return?

A request returns one JSON document with a top-level key per requested service. For detect-labels the value contains a labels array sorted from highest to lowest confidence. Each entry carries the label name, a numeric confidence between 0 and 1, an optional parents array describing where the label sits in the taxonomy, and — when the label corresponds to a localised object — a normalised boundingBox expressed as fractions of the original image dimensions.

{
  "detect-labels": {
    "labels": [
      {
        "name": "Taxi",
        "confidence": 0.97,
        "parents": ["Vehicle", "Transportation"],
        "boundingBox": { "x": 0.12, "y": 0.42, "width": 0.28, "height": 0.32 }
      },
      {
        "name": "Bicycle",
        "confidence": 0.94,
        "parents": ["Vehicle", "Transportation"],
        "boundingBox": { "x": 0.41, "y": 0.38, "width": 0.14, "height": 0.40 }
      },
      {
        "name": "Dog",
        "confidence": 0.92,
        "parents": ["Animal", "Mammal"],
        "boundingBox": { "x": 0.66, "y": 0.66, "width": 0.07, "height": 0.12 }
      },
      {
        "name": "Tree",
        "confidence": 0.90,
        "parents": ["Plant", "Nature"]
      },
      {
        "name": "Outdoor",
        "confidence": 0.88,
        "parents": ["Scene"]
      },
      {
        "name": "City Street",
        "confidence": 0.86,
        "parents": ["Scene", "Urban"]
      },
      {
        "name": "Cycling",
        "confidence": 0.81,
        "parents": ["Activity", "Transportation"]
      }
    ]
  },
  "meta": {
    "processingTimeMs": 312,
    "requestId": "req_8a2c5f01"
  }
}

Bounding boxes are fractional (0–1) so they survive any client-side resize. The parents field is the hook for hierarchical search and faceted navigation — you can roll up "Taxi" into "Vehicle" into "Transportation" without storing the relationship yourself. The full field catalogue, error schema, and combined multi-service response shape are documented in the API documentation.

What kinds of labels can the API detect?

The taxonomy covers the categories most product teams reach for when building image search, auto-tagging, and content discovery. The table below lists 20 representative labels grouped by theme. It is a sample, not the complete vocabulary — the live taxonomy is broader and grows over time as new concepts are added.

ThemeExample labelWhere developers use it
ObjectsCoffee MugCafe and kitchen photography, lifestyle blogs, e-commerce stock
ObjectsLaptopRemote-work shots, product reviews, office stock libraries
PeoplePersonGroup photos, portraits, crowd estimation in event coverage
PeopleChildFamily content, education imagery, age-aware moderation routing
VehiclesCarClassifieds, insurance claims, traffic and parking analytics
VehiclesBicycleMobility apps, sport content, urban-scene tagging
AnimalsDogPet marketplaces, social feeds, animal welfare datasets
AnimalsCatPet content, adoption sites, stock imagery libraries
FoodPizzaDelivery apps, restaurant menus, recipe sites and UGC food feeds
FoodCoffeeHospitality content, cafe directories, breakfast photography
ScenesBeachTravel sites, holiday-rental listings, lifestyle stock photography
ScenesCity StreetUrban analytics, location-aware content, street photography
ActivitiesCyclingSport and fitness apps, route content, event coverage
ActivitiesCookingRecipe platforms, home content, instructional video thumbnails
NatureMountainTravel and adventure content, weather imagery, outdoor gear sites
NatureForestTravel listings, environmental content, hiking and trail apps
IndoorLiving RoomReal-estate listings, furniture e-commerce, interior design feeds
IndoorKitchenReal-estate photos, recipe content, appliance product imagery
OutdoorGardenReal-estate listings, gardening content, landscaping portfolios
ConceptsSunsetStock libraries, travel content, mood-based image search and filters
Label taxonomy tree branching from image content into six top-level themes — people, vehicles, animals, food, indoor, outdoor — beside a horizontal bar chart suggesting ranked confidence scores.
Labels are hierarchical. A specific label rolls up into a parent theme so you can build faceted navigation without storing the taxonomy yourself.

What do the confidence scores actually mean?

Each label carries a confidence score between 0 and 1. It is the classifier's posterior probability that the label applies to the image — not a statistical guarantee, not a ground-truth score, and not a calibrated p-value across every possible image. Treat it as a ranking signal first and a decision input second. A label at 0.97 is the model telling you it is very sure; a label at 0.55 is the model telling you it is plausible but contested.

The pragmatic pattern is to define three bands rather than a single global threshold. Keep labels above 0.7 for display and search indexing. Treat labels between 0.5 and 0.7 as soft signals — useful for personalisation, related-content recommendations, or low-stakes filters, but not as primary tags. Drop anything below 0.5 for end-user display unless you specifically need a long tail of fallback hints. Persist the request ID so you can re-evaluate the original response when your thresholds change.

How do you call the label detection API?

Authentication is a bearer token in the Authorization header. Images upload as multipart/form-data; JPEG, PNG, and WebP are accepted up to 10 MB per request. Pass the service name in the services field, or a comma-separated list if you want labels alongside moderation, OCR, or other services in the same call.

curl

curl -X POST https://api.pixicular.com/detect \
  -H "Authorization: Bearer $PIXICULAR_API_KEY" \
  -F "image=@./street-scene.jpg" \
  -F "services=detect-labels"

TypeScript — typed auto-tagging with confidence thresholding

// Auto-tag an uploaded image with the Pixicular label detection API.
interface DetectedLabel {
  name: string;
  confidence: number;
  parents?: string[];
  boundingBox?: { x: number; y: number; width: number; height: number };
}

interface LabelResponse {
  "detect-labels": { labels: DetectedLabel[] };
  meta: { processingTimeMs: number; requestId: string };
}

async function tagImage(file: Blob): Promise<string[]> {
  const body = new FormData();
  body.append("image", file);
  body.append("services", "detect-labels");

  const res = await fetch("https://api.pixicular.com/detect", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.PIXICULAR_API_KEY}` },
    body,
  });

  if (!res.ok) throw new Error(`Label detection failed: ${res.status}`);
  const data = (await res.json()) as LabelResponse;

  // Keep high-confidence labels for display; surface borderline ones for review.
  return data["detect-labels"].labels
    .filter((l) => l.confidence >= 0.7)
    .map((l) => l.name);
}

Combine detect-labels with other services for richer workflows — for example labels plus moderation on seller-uploaded photos, as covered in the e-commerce product moderation guide, or labels plus OCR on insurance claim photos in the insurance claim photo analysis guide. The image is decoded once and routed to each service in parallel.

Where the API fits in a developer stack

Three audiences reach for label detection most often. Teams building image search use labels as the index — every uploaded photo becomes a bag of weighted tags that a search query can match against directly, without any human curation step. Teams building auto-tagging on user-generated content use labels as a starter set of tags that the uploader can confirm or refine, cutting the cold-start cost of empty metadata. Teams building content discovery — for example a feed that surfaces recipe-like images to users who engaged with cooking content — use labels as the feature vector for downstream similarity and recommendation.

The service composes naturally with the other capabilities of Pixicular's image analysis API. Pair labels with content moderation flags for safety-aware tagging, with OCR text blocks for caption-and-keyword indexing, or with face emotion scores for mood-tagged discovery — all in the same request, with one billing line. See the pricing page for per-call and volume tiers.

Limits, edge cases, and what is not supported

The label model is general-purpose, which means it is wide but not deep. It will tell you that an image contains a dog with high confidence; it will not reliably tell you the exact breed, the dog's name, or that this is your dog specifically. It will recognise a car; it will not return the make, the model year, or the licence plate. It will detect a shop sign; it will not extract the text on the sign — that is the detect-text service's job, and the two compose in a single call.

A few categories are deliberately out of scope. Named individuals are not returned — the label model does not do celebrity recognition. Specific brand logos and exact product SKUs are not returned — for product catalogues, use labels alongside your own internal product database keyed by uploader context. Legally regulated categories such as medical diagnoses, named landmarks at street level, or fine-grained species identification on edge cases are also out of scope. For NSFW and explicit-content signals, request the detect-moderation service in the same call rather than trying to infer them from labels.

Accuracy degrades gracefully on harder inputs rather than failing silently. Extreme close-ups, heavy occlusion, motion blur, very low resolution, abstract or artistic compositions, and rare or visually ambiguous concepts all lower confidence scores. That is what the confidence field is for: design around the noise by thresholding rather than expecting the top label to always be correct.

Frequently asked questions

How do you detect objects in an image with an API?

POST the image as multipart/form-data to the Pixicular API and request the detect-labels service. The API decodes the image once, runs a multi-label classifier and an object detector across it, and returns a JSON document containing a ranked array of labels. Each label carries a name, a confidence score between 0 and 1, an optional parent category for taxonomy navigation, and — when the label corresponds to a localised object — a normalised bounding box. One request, one JSON response, no SDK required.

What does the label detection API return?

The response contains a labels array sorted from highest to lowest confidence. Every label has a name (for example coffee mug, beach, cycling, indoor), a confidence score, and an optional parents array describing where the label sits in the taxonomy. Localised objects also include a normalised boundingBox so you can highlight the detected region in your UI. A top-level meta object carries processingTimeMs and a requestId for log correlation.

How accurate is image label detection?

Common objects, mainstream scenes, and everyday activities routinely score above 0.85 confidence on clear, well-lit photos at adequate resolution. Confidence drops on heavily occluded objects, motion blur, extreme close-ups, abstract or artistic compositions, and visually rare concepts. The recommended pattern is to threshold on confidence — keep labels above 0.7 for display, between 0.5 and 0.7 for soft signals, and below 0.5 only as fallback hints — rather than relying on a flat keep-or-discard rule.

What is not supported by the object detection API?

The label model is general-purpose, so it does not identify named individuals, specific brand logos, exact product SKUs, or legally regulated categories like medical diagnoses. It also does not perform fine-grained species or breed identification on edge cases, and it will not return location-specific labels such as named landmarks at street level. For document text use detect-text instead; for explicit-content flags use detect-moderation; for per-face age estimates use detect-age. Combine multiple services in a single call when you need more than labels.

Can I request label detection together with other image services?

Yes. Pass a comma-separated list to the services field — for example services=detect-labels,detect-moderation,detect-text — and the image is decoded once and routed to each requested service in parallel. The response contains one top-level key per requested service, so a multi-service request is meaningfully cheaper and lower latency than several round-trips. This is the recommended pattern for auto-tagging, content discovery, and moderation pipelines.

Add object and label detection to your pipeline

The fastest way to evaluate the detect-labels service is to POST a handful of representative images with a single curl command and look at the JSON. Pick a plan on the pricing page, follow the API documentation for authentication and rate limits, and combine this label guide with the e-commerce product moderation and insurance claim photo analysis pages for use-case-specific schema detail.