# Sitemap and Robots.txt Setup

## Overview

A `sitemap.xml` and a `robots.txt` have been generated for the Shopify AI App Builder. They help Google (via Google Search Console) and other search engines discover and index the app's public pages.
## Files Created

### 1. sitemap.xml

Located at: `chat/public/sitemap.xml`

The sitemap includes only publicly accessible pages that search engines should index:

- **/ (Home page)** - Priority 1.0, daily updates
- **/features** - Priority 0.9, weekly updates
- **/pricing** - Priority 0.9, weekly updates
- **/affiliate** - Priority 0.8, monthly updates
- **/affiliate-signup** - Priority 0.7, monthly updates
- **/docs** - Priority 0.8, weekly updates
- **/terms** - Priority 0.3, yearly updates
- **/privacy** - Priority 0.3, yearly updates
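If you would rather regenerate the file than edit it by hand, the page list above can be turned into a small shell helper. This is a minimal sketch, not the project's actual tooling. `generate_sitemap` is a hypothetical function, and it stamps every URL with the single `lastmod` date passed as its second argument.

```bash
# Hypothetical helper: prints a sitemap.xml for the public pages listed above.
# Usage: generate_sitemap https://your-domain.com 2025-01-08 > chat/public/sitemap.xml
generate_sitemap() {
  local base_url="${1%/}"   # strip a trailing slash so paths join cleanly
  local lastmod="$2"
  # path|priority|changefreq, mirroring the list above
  local pages=(
    "/|1.0|daily"
    "/features|0.9|weekly"
    "/pricing|0.9|weekly"
    "/affiliate|0.8|monthly"
    "/affiliate-signup|0.7|monthly"
    "/docs|0.8|weekly"
    "/terms|0.3|yearly"
    "/privacy|0.3|yearly"
  )
  local entry path priority freq
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
  for entry in "${pages[@]}"; do
    # Split "path|priority|changefreq" into its three fields
    IFS='|' read -r path priority freq <<< "$entry"
    printf '  <url>\n'
    printf '    <loc>%s%s</loc>\n' "$base_url" "$path"
    printf '    <lastmod>%s</lastmod>\n' "$lastmod"
    printf '    <changefreq>%s</changefreq>\n' "$freq"
    printf '    <priority>%s</priority>\n' "$priority"
    printf '  </url>\n'
  done
  printf '</urlset>\n'
}
```

Keeping the pages in one array means a new public page only needs one added line.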
### 2. robots.txt

Located at: `chat/public/robots.txt`

The robots.txt file:

- Allows all search engines to crawl public content
- Disallows crawling of authenticated and admin areas:
  - `/admin` - Admin dashboard
  - `/apps` - User dashboard (requires auth)
  - `/builder` - App builder (requires auth)
  - `/settings` - User settings (requires auth)
  - `/affiliate-dashboard` - Affiliate dashboard (requires auth)
  - `/api/` - API endpoints
- Includes a reference to the sitemap location
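The exact contents of `chat/public/robots.txt` are not reproduced here. A file matching the rules above could look like the output of the hypothetical `generate_robots` helper below, which is a sketch only.

Robots.txt rules are path-prefix matches. As a result, `Disallow: /admin` also covers `/admin/accounts` and `/admin/login`.

```bash
# Hypothetical helper: prints a robots.txt matching the rules described above.
# Usage: generate_robots https://your-domain.com > chat/public/robots.txt
generate_robots() {
  local base_url="${1%/}"   # strip a trailing slash before appending /sitemap.xml
  cat <<EOF
User-agent: *
Allow: /

# Authenticated and admin areas
Disallow: /admin
Disallow: /apps
Disallow: /builder
Disallow: /settings
Disallow: /affiliate-dashboard
Disallow: /api/

Sitemap: ${base_url}/sitemap.xml
EOF
}
```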
### 3. Server Routes Updated

Modified `chat/server.js` to serve both files with appropriate response headers:

- **Content-Type**: Correct MIME types (`application/xml` for the sitemap, `text/plain` for robots.txt)
- **Cache-Control**: 24-hour cache (86400 seconds) to reduce server load
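To confirm these headers on a deployed server, a hypothetical `check_headers` helper can inspect the output of `curl -sI`. This sketch assumes the `Cache-Control` value contains `max-age=86400`, the 24-hour figure above. The exact header strings emitted by `chat/server.js` are not shown in this document, so adjust the patterns if they differ.

```bash
# Hypothetical helper: reads an HTTP header dump on stdin (e.g. from `curl -sI`)
# and checks the Content-Type and the 24-hour Cache-Control described above.
# Usage: curl -sI https://your-domain.com/sitemap.xml | check_headers application/xml
check_headers() {
  local expected_type headers status=0
  expected_type="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Drop carriage returns and lower-case the headers so matching is case-insensitive
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  if ! grep -q "^content-type: *${expected_type}" <<< "$headers"; then
    echo "missing or wrong Content-Type (expected ${expected_type})" >&2
    status=1
  fi
  if ! grep -q '^cache-control:.*max-age=86400' <<< "$headers"; then
    echo "Cache-Control does not include max-age=86400" >&2
    status=1
  fi
  return "$status"
}
```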
## Configuration Required

Before deploying, you need to update the domain name in both files:

### Step 1: Update sitemap.xml

Replace `https://your-domain.com` with your actual domain name in all URL entries.

Example:

```xml
<!-- Before -->
<loc>https://your-domain.com/</loc>

<!-- After -->
<loc>https://shopify-app-builder.example.com/</loc>
```
### Step 2: Update robots.txt

Replace the sitemap URL reference with your actual domain.

Example:

```txt
# Before
Sitemap: https://your-domain.com/sitemap.xml

# After
Sitemap: https://shopify-app-builder.example.com/sitemap.xml
```
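Both replacements can be scripted with `sed`. This is a minimal sketch under a few assumptions:

- The placeholder is exactly `https://your-domain.com`.
- Your domain contains no `|` or `&` characters, which `sed` would treat specially.
- `set_domain` is a hypothetical helper name.

```bash
# Hypothetical helper: replaces the placeholder domain with a real one, in place.
# Usage: set_domain https://shopify-app-builder.example.com chat/public/sitemap.xml chat/public/robots.txt
set_domain() {
  local new_url="${1%/}"   # drop a trailing slash so "/sitemap.xml" is not doubled
  shift
  local file
  for file in "$@"; do
    # -i.bak works with both GNU and BSD sed; the backup is deleted on success
    sed -i.bak "s|https://your-domain\.com|${new_url}|g" "$file" && rm -f "$file.bak"
  done
}
```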
### Step 3: Environment Variable (Optional)

The server uses the `PUBLIC_BASE_URL` environment variable to determine the base URL. If it is set, the domain in the sitemap should match this value.

```bash
export PUBLIC_BASE_URL=https://shopify-app-builder.example.com
```
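A mismatch between `PUBLIC_BASE_URL` and the sitemap is easy to miss, so a quick check can compare them. The helper below is a hypothetical sketch. It flags any `<loc>` value that does not start with `PUBLIC_BASE_URL` followed by `/`.

```bash
# Hypothetical check: every <loc> in the sitemap must live under PUBLIC_BASE_URL.
# Usage: PUBLIC_BASE_URL=https://shopify-app-builder.example.com check_sitemap_base chat/public/sitemap.xml
check_sitemap_base() {
  local file="$1"
  if [ -z "${PUBLIC_BASE_URL:-}" ]; then
    echo "PUBLIC_BASE_URL is not set" >&2
    return 2
  fi
  local base="${PUBLIC_BASE_URL%/}/"
  local bad
  # Extract each <loc> value, then keep only those that do not start with the base URL
  bad="$(grep -o '<loc>[^<]*</loc>' "$file" \
    | sed -e 's|<loc>||' -e 's|</loc>||' \
    | awk -v base="$base" 'index($0, base) != 1')"
  if [ -n "$bad" ]; then
    printf 'URLs not under %s:\n%s\n' "$base" "$bad" >&2
    return 1
  fi
}
```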
## Submitting to Google Search Console

1. Go to [Google Search Console](https://search.google.com/search-console)
2. Add your property (your domain)
3. Verify ownership
4. Navigate to "Sitemaps" in the left sidebar
5. Enter `sitemap.xml` in the "Add a new sitemap" field
6. Click "Submit"
## Why These Pages Are Excluded

The following pages are intentionally excluded from the sitemap:

### Authenticated Pages (require login)

- `/apps` - User's personal app dashboard
- `/builder` - Individual app building interface
- `/settings` - User account settings
- `/affiliate-dashboard` - Affiliate dashboard

### Admin Pages

- `/admin` - Admin dashboard and management pages
- `/admin/accounts` - User account management
- `/admin/login` - Admin login page

### Functional/Technical Pages

- `/login`, `/signup` - Authentication entry points
- `/verify-email` - Email verification flow
- `/reset-password` - Password reset flow
- `/api/*` - API endpoints (not meant for indexing)
- `/uploads/*` - User-uploaded files

These pages are either:

1. Behind authentication (search engines can't access them)
2. Functional pages that don't provide value to search users
3. Administrative interfaces
4. Temporary/stateful pages (like verification flows)
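To keep the sitemap in sync with the exclusion lists above, a hypothetical `check_exclusions` helper can scan it for any excluded path. This sketch hard-codes the paths from this section and treats each one as a prefix, so `/admin` also catches `/admin/accounts`. Update the array if the lists change.

```bash
# Hypothetical check: fails if any excluded path from the lists above appears
# as a <loc> in the sitemap. Each path matches itself and anything beneath it.
# Usage: check_exclusions chat/public/sitemap.xml
check_exclusions() {
  local file="$1"
  local excluded=(
    /apps /builder /settings /affiliate-dashboard
    /admin /login /signup /verify-email /reset-password /api /uploads
  )
  local path found=0
  for path in "${excluded[@]}"; do
    # Host, then the path, then "/", "?" or the "<" of </loc>, so that
    # /affiliate-dashboard does not match /affiliate or /affiliate-signup
    if grep -Eq "<loc>https?://[^/<]+${path}[/?<]" "$file"; then
      echo "excluded path listed in sitemap: ${path}" >&2
      found=1
    fi
  done
  return "$found"
}
```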
## Maintaining the Sitemap

### When to Update

- When you add new public pages (e.g., blog posts, landing pages)
- When you change the URL structure
- When you make significant content updates
### Updating Lastmod Dates

Every URL currently has a `lastmod` date of 2025-01-08. When you make a significant change to a page, update that page's `<lastmod>` entry to the date of the change.
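Updating a single page's date can be scripted. The sketch below uses a hypothetical `touch_lastmod` helper and assumes that within each `<url>` entry the `<loc>` line comes before its `<lastmod>`, as in the generated file. Writing back through `cat` keeps the original file's permissions.

```bash
# Hypothetical helper: sets <lastmod> for one exact URL (defaults to today's date).
# Usage: touch_lastmod chat/public/sitemap.xml https://your-domain.com/pricing 2025-03-01
touch_lastmod() {
  local file="$1" url="$2" new_date="${3:-$(date +%F)}"
  local tmp
  tmp="$(mktemp)"
  awk -v url="$url" -v d="$new_date" '
    # Remember whether the most recent <loc> is exactly the target URL
    /<loc>/ { inside = (index($0, "<loc>" url "</loc>") > 0) }
    # Rewrite the <lastmod> that belongs to the target URL only
    inside && /<lastmod>/ { sub(/<lastmod>[^<]*<\/lastmod>/, "<lastmod>" d "</lastmod>") }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```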
### Priority Guidelines

- **1.0**: Homepage and most important pages
- **0.9**: Key marketing pages (features, pricing)
- **0.8**: Secondary marketing pages (affiliate, docs)
- **0.7**: Tertiary pages (affiliate-signup)
- **0.3**: Legal pages (terms, privacy)

### Change Frequency Guidelines

- **daily**: Homepage (content changes frequently)
- **weekly**: Features, pricing, docs (regular updates)
- **monthly**: Affiliate pages (occasional updates)
- **yearly**: Legal pages (rare changes)

Search engines treat priority and change frequency as hints at most. Google, for example, documents that it ignores `<priority>` and `<changefreq>`, so an accurate `<lastmod>` matters more than fine-tuning these values.
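The sitemaps.org protocol restricts `<changefreq>` to `always`, `hourly`, `daily`, `weekly`, `monthly`, `yearly` or `never`, and `<priority>` to the range 0.0 to 1.0. A hypothetical `check_sitemap_values` helper can catch typos in either. This sketch assumes at most one of each tag per line.

```bash
# Hypothetical check: validates <changefreq> and <priority> values against the
# sitemaps.org protocol. Problems are printed to stderr; exit status 1 on failure.
# Usage: check_sitemap_values chat/public/sitemap.xml
check_sitemap_values() {
  local file="$1"
  awk '
    BEGIN { bad = 0 }
    match($0, /<changefreq>[^<]*<\/changefreq>/) {
      v = substr($0, RSTART + 12, RLENGTH - 25)   # strip the 12/13-char tags
      if (v !~ /^(always|hourly|daily|weekly|monthly|yearly|never)$/) {
        print "bad changefreq: " v; bad = 1
      }
    }
    match($0, /<priority>[^<]*<\/priority>/) {
      v = substr($0, RSTART + 10, RLENGTH - 21)   # strip the 10/11-char tags
      if (v !~ /^(0(\.[0-9]+)?|1(\.0+)?)$/) {
        print "bad priority: " v; bad = 1
      }
    }
    END { exit bad }
  ' "$file" >&2
}
```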
## Testing

To verify that both files are served correctly:

```bash
# Fetch the sitemap (-f fails on HTTP errors; -sS hides the progress bar but still shows errors)
curl -fsS https://your-domain.com/sitemap.xml

# Fetch robots.txt
curl -fsS https://your-domain.com/robots.txt

# Check that the sitemap is well-formed XML (--noout prints nothing on success)
curl -fsS https://your-domain.com/sitemap.xml | xmllint --noout -
```
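## Security Notes

- The sitemap and robots.txt are publicly accessible by design
- Neither file contains secrets. However, anyone can read robots.txt, so the paths it disallows (such as `/admin`) are effectively public knowledge
- Caching headers help reduce server load
- The robots.txt asks compliant crawlers not to crawl sensitive areas. It is not access control and does not by itself guarantee that those URLs stay out of search results; the authentication these areas already require is what protects them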
## Additional SEO Recommendations

1. **Add meta tags** to each page for SEO
2. **Create structured data** (JSON-LD) for rich snippets
3. **Optimize page titles and meta descriptions**
4. **Create a blog** and add blog posts to the sitemap
5. **Generate sitemap indexes** if you have multiple sitemaps (e.g., separate sitemaps for different content types)
6. **Use canonical URLs** to prevent duplicate content issues
7. **Implement Open Graph tags** for better social media sharing
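For recommendation 5, a sitemap index is a small XML file that lists child sitemaps. The sketch below assumes hypothetical file names such as `sitemap-pages.xml` and `sitemap-blog.xml` and a hypothetical `generate_sitemap_index` helper. The root sitemap URL submitted to Search Console would then point at the index.

```bash
# Hypothetical helper: prints a sitemap index that points at several child sitemaps.
# Usage: generate_sitemap_index https://your-domain.com sitemap-pages.xml sitemap-blog.xml > chat/public/sitemap-index.xml
generate_sitemap_index() {
  local base_url="${1%/}"   # strip a trailing slash before joining file names
  shift
  local name
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
  for name in "$@"; do
    # One <sitemap> entry per child sitemap file
    printf '  <sitemap>\n    <loc>%s/%s</loc>\n  </sitemap>\n' "$base_url" "$name"
  done
  printf '</sitemapindex>\n'
}
```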