Data Engineer

📍 Indonesia

Software Development · Aetosky

Job Description

Company Description

Aetosky develops secure software platforms designed for defense and dual-use institutions to harness geospatial data for critical decision-making. By providing interoperable tools tailored to mission-critical environments, Aetosky supports operations such as battlefield intelligence, infrastructure protection, disaster response, and climate security. Focused on real-time operations and strategic foresight, our technologies empower partners to act with precision, speed, and confidence in sensitive, air-gapped environments. We collaborate with government and enterprise customers to advance geospatial intelligence capabilities in modern defense and multi-domain operations.

About the role

The Data & NLP/AI Engineer owns the full data journey within Aetosky's Multi-INT Fusion Platform. That journey runs from scraping raw open-source content off the internet, through statistical filtering and semantic analysis, to orchestrating LLM-powered deep intelligence processing. This is a combined Data Engineering and NLP/AI Engineering role with end-to-end ownership: you build the ingestion infrastructure, deploy the vector database, implement anomaly detection and clustering algorithms, and design the prompt orchestration layer for agentic AI analysis. AI-assisted development (GitHub Copilot, Cursor, Claude Code, or equivalent) is the standard workflow, not an optional extra, and it will be assessed directly during the hiring process.

Responsibilities

Data Infrastructure Responsibilities

• Design and build automated data collection pipelines (web scrapers, API integrations) for target platforms including X, Facebook, local forums, Instagram, TikTok, and Reddit.
• Deploy and manage the vector database (PostgreSQL with the pgvector extension), with indexing optimized for semantic similarity search at scale.
• Implement pipeline monitoring and alerting: heartbeat checks, record-count validation, dead-letter queues, and golden-record unit tests to prevent silent data loss.
• Manage infrastructure scaling during surge events (sudden spikes in data volume during geopolitical crises).
• Complete the secure enclave provider assessment based on target client security requirements.
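Below is a minimal sketch of three of the monitoring safeguards listed above: record-count validation, a dead-letter queue, and a golden-record unit test. It uses only the standard library. Every name in it is hypothetical and invented for illustration: `run_batch`, `validate_counts`, `normalize_post`, `BatchResult`, and the golden record itself. It does not describe Aetosky's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]  # hypothetical record shape: a plain dict per scraped post


@dataclass
class BatchResult:
    """Outcome of one pipeline batch (hypothetical structure)."""
    loaded: list[Record] = field(default_factory=list)
    # Dead-letter queue: failed records are kept with their error, never silently dropped.
    dead_letters: list[tuple[Record, str]] = field(default_factory=list)


def run_batch(records: list[Record], transform: Callable[[Record], Record]) -> BatchResult:
    """Apply `transform` to each record, routing failures to the dead-letter queue."""
    result = BatchResult()
    for record in records:
        try:
            result.loaded.append(transform(record))
        except Exception as exc:
            result.dead_letters.append((record, repr(exc)))
    return result


def validate_counts(scraped: int, result: BatchResult) -> None:
    """Record-count validation: every scraped record must be loaded or dead-lettered."""
    accounted = len(result.loaded) + len(result.dead_letters)
    if accounted != scraped:
        raise RuntimeError(f"record-count mismatch: scraped {scraped}, accounted for {accounted}")


def normalize_post(raw: Record) -> Record:
    """Hypothetical transform; raises KeyError if 'id' or 'text' is missing."""
    return {"id": str(raw["id"]), "text": raw["text"].strip()}


# Golden-record test: a fixed input must always produce this exact output.
GOLDEN_INPUT = {"id": 42, "text": "  flood reported near the port  "}
GOLDEN_OUTPUT = {"id": "42", "text": "flood reported near the port"}


def test_golden_record() -> None:
    assert normalize_post(GOLDEN_INPUT) == GOLDEN_OUTPUT


if __name__ == "__main__":
    batch = [{"id": 1, "text": " a "}, {"text": "missing id"}, {"id": 3, "text": "c"}]
    res = run_batch(batch, normalize_post)
    validate_counts(len(batch), res)  # passes: 2 loaded + 1 dead-lettered == 3 scraped
    test_golden_record()
    print(len(res.loaded), len(res.dead_letters))  # 2 1
```

The record missing `id` ends up in the dead-letter queue rather than disappearing. Because of that, the count check still balances, and a real drop would trip the alert.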

NLP / AI Engineering Responsibilities

• Implement the first-stage statistical filter using TF-IDF with configurable anomaly thresholds against 30-day rolling baselines.
• Build semantic clustering using lightweight vector embedding models, grouping near-duplicate content into representative cluster centroids for efficient analyst review.
• Implement bot-detection tripwires: velocity anomaly detection (timing-based coordinated inauthentic behavior) and lexical duplication detection (copy-paste spam arrays).
• Design and manage the prompt orchestration layer for the second-stage LLM processor: intent extraction, relationship mapping, and structured output generation within a secure cloud enclave.
• Implement cost-cap logic with graceful degradation: dynamic threshold escalation at budget warning levels, automated pause at cap, and manual triage fallback.
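One possible shape for the first-stage TF-IDF filter is sketched below. It assumes each day's posts are pooled into a single "document" and scored against a rolling window of earlier days. A term then scores highly when it is frequent today but rare across the baseline. The class name, the 0.05 default threshold, and the regex tokenizer are all illustrative assumptions, not the platform's actual design.

```python
import math
import re
from collections import Counter, deque

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a multilingual pipeline would need a better tokenizer."""
    return TOKEN_RE.findall(text.lower())


class RollingTfIdfFilter:
    """Day-level TF-IDF against a rolling baseline (hypothetical sketch).

    Each baseline day is one 'document'. TF is a term's share of today's tokens;
    IDF penalizes terms that already appeared on many baseline days.
    """

    def __init__(self, window_days: int = 30, threshold: float = 0.05) -> None:
        self.baseline: deque = deque(maxlen=window_days)  # oldest day drops off automatically
        self.threshold = threshold  # configurable anomaly threshold

    def score(self, posts: list[str]) -> dict[str, float]:
        today = Counter(tok for post in posts for tok in tokenize(post))
        total = sum(today.values()) or 1
        n_docs = len(self.baseline) + 1  # baseline days plus today
        scores = {}
        for term, count in today.items():
            # Document frequency counts today too, so df >= 1 and idf >= 0.
            df = 1 + sum(1 for day in self.baseline if term in day)
            scores[term] = (count / total) * math.log(n_docs / df)
        return scores

    def process_day(self, posts: list[str]) -> list[str]:
        """Return today's anomalous terms (highest score first), then roll the window."""
        scores = self.score(posts)
        flagged = sorted((t for t, s in scores.items() if s >= self.threshold),
                         key=lambda t: scores[t], reverse=True)
        self.baseline.append(set(tokenize(" ".join(posts))))  # store term presence only
        return flagged


if __name__ == "__main__":
    f = RollingTfIdfFilter(window_days=30, threshold=0.05)
    for _ in range(3):
        f.process_day(["weather report", "traffic update"])
    print(f.process_day(["weather report", "explosion explosion port"]))  # ['explosion', 'port']
```

Terms seen on every baseline day ("weather", "report") score zero, while new terms surface. During the first few days the window is nearly empty, so little gets flagged. In production, a warm-up period or a stopword list would likely be needed as well.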
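The two bot-detection tripwires could be sketched as follows. The lexical tripwire groups posts whose normalized text is identical across several distinct accounts. The velocity tripwire flags a group when too many of its posts fall inside one short time window. The `Post` structure, the normalization rules, and the defaults (3 accounts, 5 posts per 60 seconds) are hypothetical choices. Real detectors would probably add fuzzy matching such as shingling.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    account: str
    ts: float  # posting time, seconds since epoch
    text: str


_PUNCT = re.compile(r"[^\w\s]")
_SPACES = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace so trivial edits still match."""
    return _SPACES.sub(" ", _PUNCT.sub("", text.lower())).strip()


def lexical_duplicates(posts: list[Post], min_accounts: int = 3) -> dict[str, list[Post]]:
    """Lexical tripwire: identical normalized text from >= min_accounts distinct accounts."""
    groups: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        groups[normalize(post.text)].append(post)
    return {text: group for text, group in groups.items()
            if len({p.account for p in group}) >= min_accounts}


def max_posts_in_window(timestamps: list[float], window_s: float) -> int:
    """Largest number of posts inside any window of length window_s (two pointers)."""
    ts = sorted(timestamps)
    best, start = 0, 0
    for end in range(len(ts)):
        while ts[end] - ts[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def velocity_tripwire(group: list[Post], window_s: float = 60.0, max_posts: int = 5) -> bool:
    """Velocity tripwire: more than max_posts posts of one group inside a single window."""
    return max_posts_in_window([p.ts for p in group], window_s) > max_posts


if __name__ == "__main__":
    posts = [Post(f"acct{i}", 1000.0 + 5 * i, "Vote NO!! on the port deal") for i in range(6)]
    posts.append(Post("local_user", 5000.0, "Traffic is bad near the port today"))
    for text, group in lexical_duplicates(posts).items():
        print(text, len(group), velocity_tripwire(group))  # vote no on the port deal 6 True
```

Chaining the two checks keeps false positives down. Organic virality tends to produce varied wording, whereas a copy-paste array posted within seconds trips both checks at once.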
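The cost-cap behavior described in the last bullet can be modeled as a small state machine with three modes: normal, escalated once spend passes a warning fraction of the budget, and paused at the cap. The sketch below is hypothetical throughout. It assumes a per-item anomaly score and a cost estimate known before the LLM call, a single warning level at 80%, and a step change in threshold rather than a continuous ramp. Items that clear the threshold but cannot be afforded go to a manual triage queue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    ESCALATED = "escalated"  # past the budget warning level
    PAUSED = "paused"        # budget cap reached


@dataclass
class CostGovernor:
    budget_usd: float
    warn_fraction: float = 0.8         # hypothetical warning level
    base_threshold: float = 0.5        # minimum anomaly score for LLM processing
    escalated_threshold: float = 0.8   # stricter score required once spend passes warning
    spent_usd: float = 0.0
    manual_queue: list[str] = field(default_factory=list)

    @property
    def mode(self) -> Mode:
        if self.spent_usd >= self.budget_usd:
            return Mode.PAUSED
        if self.spent_usd >= self.warn_fraction * self.budget_usd:
            return Mode.ESCALATED
        return Mode.NORMAL

    def threshold(self) -> float:
        return self.base_threshold if self.mode is Mode.NORMAL else self.escalated_threshold

    def route(self, item_id: str, score: float, est_cost_usd: float) -> str:
        """Return 'skipped', 'manual', or 'llm' for one candidate item."""
        if score < self.threshold():
            return "skipped"
        # Automated pause: anything that would exceed the cap falls back to humans.
        if self.mode is Mode.PAUSED or self.spent_usd + est_cost_usd > self.budget_usd:
            self.manual_queue.append(item_id)
            return "manual"
        self.spent_usd += est_cost_usd
        return "llm"


if __name__ == "__main__":
    gov = CostGovernor(budget_usd=1.0)
    calls = [("a", 0.6, 0.5), ("b", 0.6, 0.35), ("c", 0.6, 0.1), ("d", 0.9, 0.1), ("e", 0.9, 0.1)]
    print([gov.route(*c) for c in calls])  # ['llm', 'llm', 'skipped', 'llm', 'manual']
    print(gov.manual_queue)                # ['e']
```

Item "c" would have been processed early in the day. Once spend passes the warning level, its score no longer clears the stricter threshold, so the budget is held back for higher-signal items.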

Collaboration & Tuning Responsibilities

• Collaborate with the Full-Stack Software Developer on data contracts, API schemas, and query optimization for frontend consumption.
• Lead the daily filter tuning cycle during the post-launch stabilization period (first 30–60 days): analyze false positive rates, processing costs, and output quality metrics.
• Document pipeline architecture, filter logic, and prompt templates to enable future team onboarding and the sovereign AI transition.

Qualifications

Required

• 3+ years of combined experience spanning data engineering and applied NLP/machine learning.
• Demonstrated daily proficiency with AI-assisted development tools (GitHub Copilot, Cursor, Claude Code, or equivalent); this will be assessed in the technical evaluation.
• Strong Python and SQL skills, with hands-on experience in PostgreSQL (pgvector a plus), Elasticsearch, or similar.
• Experience building web scrapers that handle anti-bot protections, rate limiting, proxy rotation, and DOM structure changes.
• Hands-on experience with text embedding models (sentence-transformers, OpenAI embeddings, or equivalent), vector similarity search, and clustering algorithms.
• Demonstrated LLM prompt engineering: designing prompts, managing context windows, evaluating output quality, and controlling inference costs.
• Familiarity with monitoring and observability tools (Prometheus, Grafana, Datadog, or equivalent).

Preferred

• Experience with multilingual NLP.
• Experience with real-time data streaming technologies (Kafka, Redis Streams, or similar).
• Background in influence operation detection, disinformation analysis, or social media intelligence.
• Demonstrated LLM cost optimization techniques (batching, caching, token management).
• Familiarity with government cloud environments (FedRAMP, ISO 27001, or equivalent regional certifications).

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.

Job Details

Posted Date: March 13, 2026
Job Type: Software Development
Location: Indonesia
Company: Aetosky