Automating Your IP Tracking: Biter GeoIp to MySQL Pipeline

Network administrators, security analysts, and developers frequently need to convert raw IP addresses into actionable geographic data. Manually looking up locations slows down threat response and analytics. By automating this process, you can enrich your logs in real time. This guide shows you how to build an automated data pipeline that fetches IP intelligence from the Biter GeoIp API and stores it directly into a MySQL database.

Why Automate IP Intelligence?
Storing geolocation data locally unlocks several operational advantages:
Faster Queries: Local MySQL joins eliminate external API latency during dashboard rendering.
Cost Efficiency: Caching results minimizes repetitive API calls for identical IP addresses.
Enhanced Security: Automated tracking helps detect anomalous login locations instantly.
Compliance Data: Visualizing user distribution simplifies regional compliance auditing.

Pipeline Architecture Overview
The pipeline operates on a lightweight, three-stage scheduling model:
    [ Target IP List ] ──> [ Python Extraction Script ] ──> [ Biter GeoIp API ]
                                                                     │
    [ MySQL Database ] <── [ Structured Data Load ] <────────────────┘
Extraction: A script reads new or un-enriched IP addresses from your system logs.
Transformation: The script queries the Biter GeoIp API and parses the JSON response.
Loading: The structured geolocation data is written into a dedicated MySQL table.

Step 1: Preparing the MySQL Target Table
First, set up a database table optimized to hold the geographic attributes returned by the API. Open your MySQL terminal and execute the following schema:
    CREATE DATABASE IF NOT EXISTS ip_intelligence;
    USE ip_intelligence;

    CREATE TABLE IF NOT EXISTS geo_tracks (
        ip_address   VARCHAR(45) PRIMARY KEY,
        country_code CHAR(2),
        country_name VARCHAR(100),
        region_name  VARCHAR(100),
        city         VARCHAR(100),
        zip_code     VARCHAR(20),
        latitude     DECIMAL(10, 8),
        longitude    DECIMAL(11, 8),
        isp          VARCHAR(150),
        last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );

Step 2: Writing the Python Pipeline Script
Python serves as the glue for this pipeline. You will need the requests library for API communication and mysql-connector-python for database interaction. Install the prerequisites:

    pip install requests mysql-connector-python
Create a script named geoip_pipeline.py with the following code:
    import requests
    import mysql.connector
    from mysql.connector import Error

    # Configuration
    API_KEY = "YOUR_BITER_GEOIP_API_KEY"
    API_URL = "https://biter.io"  # Update to match the exact Biter API endpoint
    DB_CONFIG = {
        'host': 'localhost',
        'database': 'ip_intelligence',
        'user': 'your_db_user',
        'password': 'your_db_password'
    }


    def fetch_geo_data(ip):
        """Queries the Biter GeoIp API for a given IP address."""
        try:
            # Join the base URL and the IP with exactly one slash between them
            url = f"{API_URL.rstrip('/')}/{ip}"
            response = requests.get(url, params={"key": API_KEY}, timeout=5)
            if response.status_code == 200:
                return response.json()
            print(f"API Error for IP {ip}: Status Code {response.status_code}")
            return None
        except requests.RequestException as e:
            print(f"Network error fetching IP {ip}: {e}")
            return None


    def save_to_mysql(data):
        """Inserts or updates the geolocation data in the MySQL table."""
        if not data:
            return

        query = """
            INSERT INTO geo_tracks
                (ip_address, country_code, country_name, region_name, city,
                 zip_code, latitude, longitude, isp)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
            ON DUPLICATE KEY UPDATE
                country_code=VALUES(country_code),
                country_name=VALUES(country_name),
                region_name=VALUES(region_name),
                city=VALUES(city),
                zip_code=VALUES(zip_code),
                latitude=VALUES(latitude),
                longitude=VALUES(longitude),
                isp=VALUES(isp);
        """

        # Safely extract fields; missing keys become None and are stored as NULL.
        # Adjust the key names to match the actual API response.
        record = (
            data.get('ip'),
            data.get('country_code'),
            data.get('country_name'),
            data.get('region'),
            data.get('city'),
            data.get('zip'),
            data.get('latitude'),
            data.get('longitude'),
            data.get('isp')
        )

        connection = None
        cursor = None
        try:
            # DB_CONFIG must be unpacked into keyword arguments
            connection = mysql.connector.connect(**DB_CONFIG)
            if connection.is_connected():
                cursor = connection.cursor()
                cursor.execute(query, record)
                connection.commit()
                print(f"Successfully tracked IP: {data.get('ip')}")
        except Error as e:
            print(f"MySQL Error: {e}")
        finally:
            # Close only what was actually opened
            if cursor is not None:
                cursor.close()
            if connection is not None and connection.is_connected():
                connection.close()


    if __name__ == "__main__":
        # Example test batch of IPs to process
        sample_ips = ["8.8.8.8", "1.1.1.1"]
        for target_ip in sample_ips:
            geo_payload = fetch_geo_data(target_ip)
            if geo_payload:
                save_to_mysql(geo_payload)

Step 3: Automating the Execution
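Before scheduling, the hard-coded sample_ips list in the script needs a real source of addresses. The sketch below is one possible take on the extraction stage from the architecture overview, not part of the original script. It assumes each log line starts with the client IP (as in the default Nginx access log format), and the function name extract_pending_ips and the already_tracked argument are hypothetical. In practice already_tracked could be filled from a SELECT ip_address FROM geo_tracks query so that cached addresses do not trigger another API call.

```python
import ipaddress


def extract_pending_ips(log_lines, already_tracked=()):
    """Return unique, valid IPs from log_lines that are not yet tracked.

    Assumption: the client IP is the first whitespace-separated field
    of each line. Order of first appearance is preserved.
    """
    seen = set(already_tracked)
    pending = []
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # blank line
        try:
            # Validates IPv4 and IPv6 and normalizes the text form
            ip = str(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue  # first field is not an IP address
        if ip not in seen:
            seen.add(ip)
            pending.append(ip)
    return pending
```

The returned list can take the place of sample_ips in the main block. Normalizing through the ipaddress module also means differently written forms of the same IPv6 address end up as a single primary key value.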
To turn this script into a true automated pipeline, schedule it to process your raw log sources continuously.

On Linux (Cron Job)

Open the crontab configuration:

    crontab -e

Add a line to run the pipeline every hour:
    0 * * * * /usr/bin/python3 /path/to/geoip_pipeline.py >> /var/log/geoip_pipeline.log 2>&1

On Windows (Task Scheduler)

Open Task Scheduler and select Create Basic Task.
Set the trigger to Daily. The basic wizard has no hourly option, so for hourly runs open the task's properties afterward and edit the trigger to repeat the task every 1 hour.
Choose the action Start a Program.
Type python in the Program/script box, and add the full path to geoip_pipeline.py in the Add arguments field.

Optimizing for Production
If you plan to scale this pipeline to handle millions of records daily, implement these performance optimizations:
Batch Processing: Modify the script to read pending IPs from a staging table, fetch data using batch API endpoints if available, and use the connector cursor's executemany() method to insert multiple rows in one call.
Error Queueing: If the API limits your rate or goes down temporarily, log failed IPs to a retry queue rather than dropping the data.
Index Optimization: Keep ip_address as the primary key. It guarantees one row per address, lets the upsert overwrite stale data instead of duplicating it, and makes lookups by IP a direct index read.
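The batch-processing and error-queueing ideas above can be sketched together. This is a minimal illustration under stated assumptions, not a production implementation: the helper names chunked, save_batch, and fetch_with_retry_queue, the batch size of 500, and the three retry rounds are all hypothetical choices. The cursor argument is any object with an executemany(sql, rows) method, such as a mysql-connector-python cursor, and fetch is any callable that returns None on failure, such as fetch_geo_data from the script.

```python
import time
from itertools import islice

# Same upsert statement as in geoip_pipeline.py, reused for batches.
UPSERT_SQL = """
    INSERT INTO geo_tracks
        (ip_address, country_code, country_name, region_name, city,
         zip_code, latitude, longitude, isp)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
    ON DUPLICATE KEY UPDATE
        country_code=VALUES(country_code),
        country_name=VALUES(country_name),
        region_name=VALUES(region_name),
        city=VALUES(city),
        zip_code=VALUES(zip_code),
        latitude=VALUES(latitude),
        longitude=VALUES(longitude),
        isp=VALUES(isp)
"""


def chunked(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def save_batch(cursor, records, batch_size=500):
    """Send records in chunks via cursor.executemany(); return rows sent.

    The caller is responsible for committing the transaction.
    """
    sent = 0
    for chunk in chunked(records, batch_size):
        cursor.executemany(UPSERT_SQL, chunk)
        sent += len(chunk)
    return sent


def fetch_with_retry_queue(ips, fetch, max_rounds=3, delay_seconds=0.0):
    """Call fetch(ip) for each IP and re-queue failures for later rounds.

    Returns (results, failed): results maps ip -> payload, and failed
    lists the IPs that still had no payload after max_rounds.
    """
    queue = list(ips)
    results = {}
    for round_number in range(max_rounds):
        if not queue:
            break
        if round_number > 0 and delay_seconds:
            # Wait longer before each successive retry round
            time.sleep(delay_seconds * round_number)
        retry = []
        for ip in queue:
            payload = fetch(ip)
            if payload:
                results[ip] = payload
            else:
                retry.append(ip)  # keep the IP instead of dropping it
        queue = retry
    return results, queue
```

The failed list returned by fetch_with_retry_queue could be written to a staging table or a file so that the next scheduled run picks those addresses up again.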