IntXeger

IntXeger (pronounced “integer”) is a Python library for generating strings from regular expressions. Some of its core features include:

Support for most common regular expression operations.
Array-like indexing for mapping integers to matching strings.
Generator interface for sequentially sampling matching strings.
Sampling-without-replacement for generating a set of unique strings.

Compared to popular alternatives such as xeger and exrex, IntXeger is roughly 3 to 14 times faster at generating strings in the benchmark below, and it offers unique functionality such as array-like indexing and sampling without replacement.

Installation

You can install the latest stable release of IntXeger by running:

pip install intxeger

Quick Start

Let’s start with a simple example where our regex specifies a two-character string that only contains lowercase letters.

import intxeger
x = intxeger.build("[a-z]{2}")

You can check the number of strings that can be generated from this regex using the length attribute and generate the ith matching string using the get(i) method.

assert x.length == 26**2 # there are 676 unique strings which match this regex
assert x.get(15) == 'ap' # index 15 (counting from zero) maps to 'ap'
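
One way to picture array-like indexing for this particular regex is as base-26 positional notation, where each of the two character positions contributes one digit. The sketch below illustrates that idea with the standard library only. It is not IntXeger's actual implementation: `index_to_string` is a hypothetical helper, and its ordering is simply chosen to agree with the `get(15) == 'ap'` example above.

```python
import string

ALPHABET = string.ascii_lowercase  # the 26 characters matched by [a-z]


def index_to_string(i: int, width: int = 2) -> str:
    """Map an integer in [0, 26**width) to a string of `width` lowercase letters.

    Hypothetical helper: treats i as a base-26 number, most significant digit first.
    """
    if not 0 <= i < len(ALPHABET) ** width:
        raise IndexError(f"index {i} out of range")
    chars = []
    for _ in range(width):
        i, digit = divmod(i, len(ALPHABET))
        chars.append(ALPHABET[digit])
    # Digits were extracted least significant first, so reverse them.
    return "".join(reversed(chars))


print(index_to_string(15))   # ap
print(index_to_string(26))   # ba
print(index_to_string(675))  # zz
```

Because every index in the valid range maps to exactly one string, distinct indices always produce distinct strings, which is what makes the sampling feature described next possible.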

Furthermore, you can generate N unique strings which match this regex using the sample(N) method. Note that N must be less than or equal to the length.

print(x.sample(N=10))
# ['xt', 'rd', 'jm', 'pj', 'jy', 'sp', 'cm', 'ag', 'cb', 'yt']
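
Sampling without replacement follows naturally from integer indexing: draw N distinct indices, then map each one to its string. The sketch below shows the idea using only the standard library. `sample_unique` is a hypothetical name, and whether IntXeger works this way internally is an assumption.

```python
import random
import re
import string

ALPHABET = string.ascii_lowercase
LENGTH = len(ALPHABET) ** 2  # 676 strings match [a-z]{2}


def sample_unique(n: int, seed=None) -> list:
    """Return n distinct two-letter lowercase strings (hypothetical helper)."""
    if n > LENGTH:
        raise ValueError("n must be less than or equal to the number of matching strings")
    rng = random.Random(seed)
    # random.sample draws distinct integers from the range without building a list of it.
    indices = rng.sample(range(LENGTH), n)
    # Decode each index as two base-26 digits.
    return [ALPHABET[i // 26] + ALPHABET[i % 26] for i in indices]


strings = sample_unique(10, seed=0)
assert len(set(strings)) == 10                             # no duplicates
assert all(re.fullmatch(r"[a-z]{2}", s) for s in strings)  # every string matches
```

The `ValueError` mirrors the constraint stated above: asking for more unique strings than the regex can produce cannot succeed.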

Here’s a more complicated regex which specifies a timestamp.

x = intxeger.build(r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M")
print(x.sample(N=2))
# ['11:57:12 AM', '01:16:01 AM']
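
To get a feel for how large this regex's output space is, the count can be worked out by multiplying the number of choices in each part. The sketch below does that arithmetic by hand and checks the two sample strings above with Python's built-in `re` module. It assumes `\d` is limited to the ASCII digits 0-9 when counting, and it does not call IntXeger, so the total is what one would expect `x.length` to report rather than a verified value.

```python
import re

PATTERN = r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M"

hours = 3 + 9                     # 1[0-2] gives 10-12, 0[1-9] gives 01-09
minutes_seconds = (6 * 10) ** 2   # (:[0-5]\d) appears twice, 60 choices each
meridiem = 2                      # A or P
total = hours * minutes_seconds * meridiem
print(total)  # 86400

# The sample output shown above should match the pattern in full.
for s in ["11:57:12 AM", "01:16:01 AM"]:
    assert re.fullmatch(PATTERN, s)

# An out-of-range hour is rejected.
assert re.fullmatch(PATTERN, "13:00:00 AM") is None
```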

You can also print matches on the command line.

$ intxeger --order=desc "[a-c]"
c
b
a
$ python3 -m intxeger -0 'base/[ab]/[12]' | xargs -0 mkdir -p
$ tree base/
base
├── a
│   ├── 1
│   └── 2
└── b
    ├── 1
    └── 2
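
For readers who prefer to stay in Python, the shell pipeline above can be approximated with the standard library. This sketch enumerates the four strings matched by `base/[ab]/[12]` by hand with `itertools.product` rather than calling IntXeger, and it creates the directories under a temporary root so the working directory is left untouched. `make_tree` is a hypothetical helper.

```python
import itertools
import os
import tempfile


def make_tree(root: str) -> list:
    """Create base/<a|b>/<1|2> under root and return the relative paths created."""
    paths = ["/".join(("base", letter, digit))
             for letter, digit in itertools.product("ab", "12")]
    for rel in paths:
        os.makedirs(os.path.join(root, *rel.split("/")), exist_ok=True)
    return paths


with tempfile.TemporaryDirectory() as root:
    created = make_tree(root)
    print(created)  # ['base/a/1', 'base/a/2', 'base/b/1', 'base/b/2']
```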

To learn more about the functionality provided by IntXeger, check out our documentation!

Benchmark

This table, generated by benchmark.py, shows the amount of time in milliseconds required to generate N examples of each regular expression using xeger, exrex, and intxeger.

regex	N	xeger	exrex	intxeger
[a-zA-Z]+	100	7.36	3.17	1.09
[0-9]{3}-[0-9]{3}-[0-9]{4}	100	11.59	6.25	0.8
[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}	1000	208.62	91.3	18.28
/json/([0-9]{4})/([a-z]{4})	1000	133.36	107.01	12.18

Have a regular expression that isn’t represented here? Check out our Contributing Guide and submit a pull request!
