Whitelisting Nosto's Crawler Bot for Product Image Crawling

How can I whitelist the Nosto crawler for my closed dev environment?

Written by Dan Macarie

Overview

Nosto's image crawler is a feature that fetches product images directly from your store as an alternative to the product API. If you're experiencing sync issues with the product API, you can enable the crawler by contacting Nosto support.

For the crawler to work, it must be able to access your product images and pages. If your store or CDN blocks unknown bots, you may need to whitelist Nosto's crawler. This article explains how to unblock Nosto's IP addresses and User Agent so the crawler can function properly.

Whitelisting

Depending on what is being blocked, you may need to whitelist one or both of the following:

IP Addresses — for product image access

If Nosto's crawler is blocked from fetching **product images**, whitelist Nosto's crawler IPs at whatever layer sits in front of your image URLs (CDN, load balancer, or web server; see below).

Always fetch the authoritative list from:

https://api.nosto.com/meta → "crawler" property
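Scripting the lookup helps keep an allowlist in sync with the published list. The following is a minimal Python sketch, assuming the endpoint returns JSON and that the "crawler" property holds a list of IP address or CIDR strings. The exact response shape is not described in this article, so treat the parsing as an assumption. The function names `extract_crawler_ips` and `fetch_crawler_ips` are illustrative.

```python
import ipaddress
import json
import urllib.request

META_URL = "https://api.nosto.com/meta"


def extract_crawler_ips(meta: dict) -> list:
    """Return the entries of the "crawler" property, validated as IPs.

    Assumption: the property is a list of IP or CIDR strings
    (a single string is also tolerated).
    """
    entries = meta.get("crawler", [])
    if isinstance(entries, str):
        entries = [entries]
    validated = []
    for entry in entries:
        text = str(entry).strip()
        # Raises ValueError if the entry is not an IP address or CIDR block.
        ipaddress.ip_network(text, strict=False)
        validated.append(text)
    return validated


def fetch_crawler_ips(url: str = META_URL, timeout: float = 10.0) -> list:
    """Download the metadata document and extract the crawler IPs."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_crawler_ips(json.load(response))
```

Calling `fetch_crawler_ips()` from a scheduled job and comparing the result with the currently deployed allowlist is one way to notice when the list changes.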

User Agent — for product page access

If you need to allow Nosto's crawler to access product pages directly (similar to how you'd whitelist a search engine spider like Googlebot), use the following User Agent string to identify it:

Mozilla/5.0 (compatible; NostoCrawlerBot/1.0; +http://my.nosto.com/tagging)
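If the access check lives in application code rather than in a CDN or web server, the same identification can be sketched in Python. This is an illustrative helper, not part of any Nosto library: `is_nosto_crawler` is a hypothetical name, and matching on the `NostoCrawlerBot` token (rather than the full string) is an assumption made so that a version bump does not break the check. Note that a User Agent header can be set by any client, so this identifies the crawler but does not authenticate it.

```python
from typing import Optional

# Token taken from the User Agent string shown above.
NOSTO_UA_TOKEN = "NostoCrawlerBot"


def is_nosto_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User Agent header contains the Nosto crawler token."""
    if not user_agent:
        return False
    # Case-insensitive match, since proxies sometimes normalise header values.
    return NOSTO_UA_TOKEN.lower() in user_agent.lower()
```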

Where to whitelist (IP addresses)

The request from Nosto's crawler travels through your network stack from the outside in. You need to whitelist at whichever layer is actually enforcing the block — typically the **first** one that applies access controls:

Internet → CDN / DDoS protection (Cloudflare, Fastly, Akamai)

→ Load Balancer (AWS ALB, nginx upstream)

→ Web Server (nginx, Apache)

→ Application

If you're unsure where the block is happening, start at the outermost layer (usually your CDN or DDoS protection) and work inward.
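When a blocked request comes back, the response headers often hint at which layer produced it. The sketch below is a rough heuristic in Python, assuming commonly observed default headers (for example a `CF-RAY` header from Cloudflare or a `Server: awselb` value from an AWS load balancer). These signatures are assumptions and can be changed or stripped in your setup, and a CDN may add its headers to an error that actually originated further in. The function name `guess_responding_layer` is hypothetical.

```python
def guess_responding_layer(status: int, headers: dict) -> str:
    """Guess which layer produced an error response, based on its headers."""
    if status < 400:
        return "not blocked"
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    server = lowered.get("server", "")
    if "cf-ray" in lowered or "cloudflare" in server:
        return "Cloudflare"
    if lowered.get("x-served-by", "").startswith("cache-"):
        return "Fastly"
    if "akamaighost" in server:
        return "Akamai"
    if server.startswith("awselb"):
        return "AWS load balancer"
    if "nginx" in server:
        return "nginx"
    if "apache" in server:
        return "Apache"
    return "unknown"
```

Keep in mind that a test request sent from your own machine does not come from Nosto's IP addresses, so it can reveal User Agent blocks and authentication prompts but not IP-based rules.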

Whitelisting by platform

Cloudflare

1. Go to Security → WAF → Custom Rules and create a new rule.

2. Match on the Nosto crawler IPs:

ip.src in {18.209.181.40 34.233.200.247 34.238.228.86}

3. Set the action to Allow / Skip all remaining custom rules.

4. Place this rule **above** any bot-blocking or rate-limiting rules.

> If you have **Bot Fight Mode** or **Super Bot Fight Mode** enabled, the WAF custom rule above is still required — NostoCrawlerBot is not in Cloudflare's verified bot list and will be challenged without it.


If you need to allow Nosto to access product pages instead, create a separate rule matching the User Agent:

http.user_agent contains "NostoCrawlerBot"
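Because the crawler IPs can change, the IP expression from step 2 is easier to maintain if it is generated from a list rather than typed by hand. A minimal Python sketch, where `cloudflare_ip_expression` is a hypothetical helper and the input list is whatever you obtained from Nosto's published list:

```python
import ipaddress


def cloudflare_ip_expression(ips) -> str:
    """Build an `ip.src in {...}` expression from IP or CIDR strings."""
    values = []
    for ip in ips:
        # Validates the entry; raises ValueError on anything that is not an IP.
        network = ipaddress.ip_network(ip.strip(), strict=False)
        if network.prefixlen == network.max_prefixlen:
            # Single host: emit the bare address, as in the rule above.
            values.append(str(network.network_address))
        else:
            values.append(str(network))
    return "ip.src in {" + " ".join(values) + "}"


print(cloudflare_ip_expression(["18.209.181.40", "34.233.200.247", "34.238.228.86"]))
```

The printed expression can be pasted into the rule's expression editor.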

nginx

Use `satisfy any` to let the request through if *either* the IP matches *or* valid credentials are provided:


```nginx
location ~* \.(jpg|jpeg|png|gif|webp|svg)$ {
    satisfy any;

    allow 18.209.181.40;
    allow 34.233.200.247;
    allow 34.238.228.86;
    deny all;

    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Reload after changes: `sudo nginx -t && sudo systemctl reload nginx`
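To make the effect of `satisfy any` concrete, here is a small Python model of the access decision. It is only an illustration of the logic, not how nginx is implemented. The function names are hypothetical and the IPs are the ones used in the configuration above.

```python
import ipaddress

ALLOWED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("18.209.181.40/32", "34.233.200.247/32", "34.238.228.86/32")
]


def ip_allowed(client_ip: str) -> bool:
    """Model of the allow/deny list: True if the IP is in an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def request_permitted(client_ip: str, has_valid_credentials: bool,
                      satisfy: str = "any") -> bool:
    """With 'any', one passing check is enough; with 'all', both must pass."""
    ip_ok = ip_allowed(client_ip)
    if satisfy == "any":
        return ip_ok or has_valid_credentials
    return ip_ok and has_valid_credentials
```

With `satisfy any`, a request from `34.233.200.247` is let through without credentials, and a request from any other address still works when it carries valid Basic Authentication credentials.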

Apache

```apache
<FilesMatch "\.(jpg|jpeg|png|gif|webp|svg)$">
    <RequireAny>
        Require ip 18.209.181.40
        Require ip 34.233.200.247
        Require ip 34.238.228.86
        Require valid-user
    </RequireAny>
</FilesMatch>
```

AWS WAF (ALB or CloudFront)

1. Go to AWS WAF → IP sets and create a set named NostoCrawlerIPs (IPv4):

```
18.209.181.40/32
34.233.200.247/32
34.238.228.86/32
```

2. In your Web ACL, add an IP set match rule with action Allow and set its priority above any blocking rules.

Fastly

sub vcl_recv {
    if (req.http.X-Forwarded-For ~ "18\.209\.181\.40|34\.233\.200\.247|34\.238\.228\.86") {
        return(pass);
    }
}
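One thing to be aware of when matching `X-Forwarded-For` with a regular expression: the pattern is unanchored, so it also matches longer addresses that merely contain one of the crawler IPs, and the header value can be set by the client. The Python sketch below illustrates the difference between a substring-style regex match and an exact comparison of each address in the header. It is a model for reasoning about the rule, not Fastly code, and the function names are hypothetical.

```python
import ipaddress
import re

NOSTO_IPS = {
    ipaddress.ip_address(ip)
    for ip in ("18.209.181.40", "34.233.200.247", "34.238.228.86")
}

# Same pattern as in the VCL snippet above.
UNANCHORED = re.compile(r"18\.209\.181\.40|34\.233\.200\.247|34\.238\.228\.86")


def regex_match(x_forwarded_for: str) -> bool:
    """Substring-style match, as an unanchored regex behaves."""
    return UNANCHORED.search(x_forwarded_for) is not None


def exact_match(x_forwarded_for: str) -> bool:
    """Compare each comma-separated address in the header exactly."""
    for part in x_forwarded_for.split(","):
        try:
            if ipaddress.ip_address(part.strip()) in NOSTO_IPS:
                return True
        except ValueError:
            continue  # Skip entries that are not valid IP addresses.
    return False
```

For example, `118.209.181.40` passes the regex match but fails the exact comparison. Where your edge exposes the connecting client's IP directly, matching on that value is generally safer than matching on a header.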

HTTP Basic Authentication

If your store or staging environment is protected with HTTP Basic Authentication, Nosto's crawler supports this natively. Contact Nosto Support and provide your credentials — they will configure them directly on the crawler.
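Before sending credentials to Nosto Support, it can help to confirm that they actually unlock an image URL. A minimal Python sketch, assuming standard HTTP Basic Authentication; the function names and the example URL are placeholders, not part of any Nosto tooling.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def check_image_access(url: str, username: str, password: str) -> int:
    """Request the URL with credentials and return the HTTP status code."""
    request = urllib.request.Request(url, headers=basic_auth_header(username, password))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 401 means the credentials were rejected; 403 suggests another block.
        return error.code
```

A call such as `check_image_access("https://staging.example.com/media/product.jpg", "user", "pass")` returning 200 indicates the credentials work for that URL from your own network.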
