OpenVPN auth-user-pass-verify example

One of my recent tasks was to enable authentication over OpenVPN. auth-user-pass-verify is one of the ways (is it the only way?) to enable authentication in OpenVPN. When a user connects to the VPN, the server writes the username and password to a temporary file, then executes the script with the file path as an argument; the exit code determines whether authentication succeeded or not. It is a weird protocol, but that is how it works …

To enable authentication, you will need to change your OpenVPN server config as follows:
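Roughly, the relevant directives look like this (the script path is just a placeholder; script-security 2 is needed so OpenVPN is allowed to run external scripts):

```
# allow OpenVPN to call an external script
script-security 2

# run this script with the path of the temp credentials file as $1
auth-user-pass-verify /etc/openvpn/auth.sh via-file
```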

Make sure your script is executable. Below is an example bash script:
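A minimal sketch (in via-file mode, OpenVPN puts the username on the first line of the temp file and the password on the second):

```bash
#!/bin/bash
# $1 is the path of the temporary file written by OpenVPN:
#   line 1 = username, line 2 = password
username=$(sed -n '1p' "$1")
password=$(sed -n '2p' "$1")

# Placeholder check - replace with your real authentication backend.
if [ "$password" = "bao" ]; then
    exit 0    # exit code 0 => authentication succeeded
fi

exit 1        # non-zero exit code => authentication failed
```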

In the example, I simply check if the password is “bao” 😀. You should replace this with your own authentication logic. Also note that for security reasons OpenVPN has some constraints on the username and password; check here for more details.

Tensorflow: exporting model for serving

A few days ago, I wrote about how to retrieve the signature of an exported model from Tensorflow; today I want to continue with how to export a model for serving, particularly exporting a model and serving it with TFServing. TFServing is a high-performance Tensorflow serving service written in C++. I am working on building a serving infrastructure, so I have to spend a lot of time exporting Tensorflow models and making them servable via TFServing.

The requirement for an exported model to be servable by TFServing is quite simple: you need to define named inputs and outputs signatures. The inputs signature defines the shape of the input tensor of the graph, and the outputs signature defines the output tensor of the prediction.

Exporting from a tensorflow graph
This is straightforward. If you build the graph yourself, you already have the input and output tensors. You just need to create a Saver and an Exporter, then call them with the right arguments.
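Roughly, it looks like the sketch below. This uses the session_bundle Exporter from the Tensorflow 0.11 era; the tiny softmax model, tensor names, export path and version are all just illustrative:

```python
import tensorflow as tf
from tensorflow.contrib.session_bundle import exporter

# A tiny stand-in model: x is the input tensor, y the output tensor.
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, w) + b, name='y')

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # ... train or restore the model here ...

    model_exporter = exporter.Exporter(saver)
    model_exporter.init(
        sess.graph.as_graph_def(),
        named_graph_signatures={
            'inputs': exporter.generic_signature({'x': x}),
            'outputs': exporter.generic_signature({'y': y}),
        })
    # export path and model version number
    model_exporter.export('/tmp/my_model', tf.constant(1), sess)
```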

Please see here for a complete example.

Exporting from a tf.contrib.learn Estimator
This is actually trickier. The estimator does provide an export() API, but the documentation is not helpful, and by default it won't export a named signature, so you cannot use it directly. Instead, you will need to:

  • Define an input_fn that returns the shape of the input. You can reuse the input_fn you already wrote for data feeding during training.
  • Define a signature_fn, as sketched below.
  • Make sure you pass input_feature_key and use_deprecated_input_fn=False when you call the export function.
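A signature_fn can be as simple as the sketch below. The (examples, features, predictions) argument order follows the Tensorflow 0.11 contract, and I assume the input_fn keys its single feature tensor by the empty string (as in the example further down); the generic signatures are what TFServing looks up as inputs/outputs:

```python
from tensorflow.contrib.session_bundle import exporter

def signature_fn(examples, features, predictions):
    # features is the dict returned by input_fn; predictions is the
    # output tensor (or dict of tensors) produced by the estimator.
    named_signatures = {
        'inputs': exporter.generic_signature({'features': features['']}),
        'outputs': exporter.generic_signature({'prediction': predictions}),
    }
    default_signature = exporter.generic_signature({'features': features['']})
    # export() expects (default_signature, named_graph_signatures)
    return default_signature, named_signatures
```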

Below is an example of exporting the classifier from this tutorial. Note: this is only for Tensorflow 0.11; for 0.12 and 1.0 the API may be different.
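Putting it together, the export call looks roughly like this (the DNNClassifier setup mirrors the iris quickstart; the paths, layer sizes, and the signature_fn from above are illustrative):

```python
import tensorflow as tf
from tensorflow.contrib import layers, learn

# Classifier from the tf.contrib.learn quickstart (iris dataset).
feature_columns = [layers.real_valued_column('', dimension=4)]
classifier = learn.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[10, 20, 10],
                                 n_classes=3,
                                 model_dir='/tmp/iris_model')

def serving_input_fn():
    # The single feature tensor is keyed by the empty string, which is
    # why input_feature_key='' is passed to export() below.
    features = {'': tf.placeholder(tf.float32, shape=[None, 4])}
    return features, None  # no labels are needed for serving

classifier.export(export_dir='/tmp/iris_export',
                  input_fn=serving_input_fn,
                  input_feature_key='',
                  use_deprecated_input_fn=False,
                  signature_fn=signature_fn)  # the signature_fn defined above
```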

Some explanation: the input_fn defines the features of your estimator; it returns a dict of tensors that represents your data. Usually it returns a tuple of a features tensor and a labels tensor, but for exporting you can skip the labels tensor. You can refer to here for detailed documentation. The above input_fn returns a feature tensor whose feature name is the empty string (""). That's why we also need to pass input_feature_key="" to the export function.

Once the model is exported, you can just ship it to TF Serving and start serving it. I will continue this series in the next few days with how to run the serving service and send requests to it.

Tensorflow: Retrieving serving signatures from an exported model

Simple question, but it took me many hours of digging into the code to figure it out :-(. Someone should have added a document or something.

Basically, just import the meta graph, then unpack the protobuf object from the serving_signatures collection. I really don't understand why it is not added to the signature def. Anyway, later you can just call read_serving_signature(path/to/export.meta) to retrieve the exported signatures. It will be very helpful if you want to implement a generic serving interface for Tensorflow.
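In code, the helper is roughly the sketch below. The collection key and manifest_pb2 come from the session_bundle module; treat the exact unpacking as an assumption based on how session_bundle itself loads signatures:

```python
from tensorflow.contrib.session_bundle import constants, manifest_pb2
from tensorflow.core.protobuf import meta_graph_pb2

def read_serving_signature(meta_graph_path):
    # Parse the MetaGraphDef protobuf straight from the .meta file.
    meta_graph_def = meta_graph_pb2.MetaGraphDef()
    with open(meta_graph_path, 'rb') as f:
        meta_graph_def.ParseFromString(f.read())

    # The exporter stores the signatures as an Any proto inside the
    # "serving_signatures" collection; unpack it into a Signatures proto.
    signatures_any = meta_graph_def.collection_def[
        constants.SIGNATURES_KEY].any_list.value
    signatures = manifest_pb2.Signatures()
    signatures_any[0].Unpack(signatures)
    return signatures
```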

I also made a gist here for reference.


CouchDB – my first try

I recently had an interesting task involving data ingestion. The use case: we have hundreds of GB of CSV files, each containing 10k – 10m records with around 20 – 40 fields and some common schema. The goal is to ingest all the files into a single data store so we can query the data in these files efficiently.

My initial thought was to use MongoDB, because there is no well-defined schema and because of my experience with MongoDB. But I wanted to give CouchDB a try this time. Why did I choose CouchDB? Mainly just because I have heard and read about it for a while, but I have never tried it on a real use case. To be clear, I want to try CouchDB to get some hands-on experience with it so that I can use it in the future.

The goal is quite simple: import all the CSV files into CouchDB and then make some simple filter queries. I first used Docker to start a single CouchDB instance from a Docker image on Docker Hub. My first impression is that the API is quite simple; all you need is to make a POST call. This is a good document for starters. There is also a nice CouchDB Python client with a very straightforward API, and CouchDB has a very nice built-in UI. However, I quickly stumbled into issues.
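For reference, my row-by-row ingestion looked roughly like this (I'm assuming the couchdb package from PyPI, a default local instance on port 5984, and placeholder database/file names):

```python
import csv
import couchdb  # the `couchdb` package from PyPI

# Connect to the CouchDB instance started via Docker (default port 5984).
server = couchdb.Server('http://localhost:5984/')
db = server.create('csv_ingest')      # placeholder database name

with open('sample.csv') as f:         # placeholder file name
    for row in csv.DictReader(f):
        db.save(row)                  # one HTTP request per record
```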

The first issue is that insertion speed is quite slow. I was doing inserts row by row. I searched for bulk insert; there is a REST API, but I couldn't find the equivalent Python implementation (hmmm). I did some googling and found the mpcouch Python package, which uses multiple threads for insertion, and it did improve the insertion speed a lot. However, after one hour only 9m records had been inserted.
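If you don't mind dropping down to plain HTTP, the bulk REST API is the _bulk_docs endpoint, which accepts many documents in one request. A sketch using requests (the batch size and names are arbitrary):

```python
import csv
import requests

BULK_URL = 'http://localhost:5984/csv_ingest/_bulk_docs'  # bulk-insert endpoint

def bulk_insert(rows):
    # POST a whole batch of documents in a single request.
    resp = requests.post(BULK_URL, json={'docs': rows})
    resp.raise_for_status()

with open('sample.csv') as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(row)
        if len(batch) >= 1000:        # arbitrary batch size
            bulk_insert(batch)
            batch = []
    if batch:
        bulk_insert(batch)
```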

The second issue is disk usage. I was quite surprised that CouchDB consumed a lot of disk space. Just after the first 10 thousand records, it had already used 7MB, while the actual CSV was only 440KB.

After 3 hours, I managed to ingest 25m records, but they consumed 15.2GB of disk. The actual sample is 18GB of CSV files with 280m records. Proportionally, dumping all the CSV files into CouchDB could take more than 100GB of storage. I also did some more research, and disk usage really is an issue with CouchDB. So if you plan to use CouchDB in production, make sure you also plan for plenty of storage.

CouchDB views are also not straightforward for querying and aggregating the data: you have to write a JavaScript function to create a view, and when you build a view it consumes a lot of disk as well. Another thing is that when I searched for CouchDB, there were not a lot of recent articles; the ones I found were quite old. This gives me the feeling that the CouchDB community is not so active.
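To give an idea of what a view involves, here is a sketch of defining and querying one over HTTP (the database name, view name, and the country field are placeholders; the map function is JavaScript passed as a string, and _count is a built-in reduce):

```python
import json
import requests

DB = 'http://localhost:5984/csv_ingest'   # placeholder database

# Views live in a design document; map/reduce functions are JavaScript strings.
design_doc = {
    'views': {
        'by_country': {
            'map': "function (doc) { if (doc.country) { emit(doc.country, 1); } }",
            'reduce': '_count',
        }
    }
}
requests.put(DB + '/_design/queries', json=design_doc).raise_for_status()

# Aggregate: number of documents per country.
result = requests.get(DB + '/_design/queries/_view/by_country',
                      params={'group': 'true'}).json()
print(json.dumps(result, indent=2))
```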

So in summary, for this particular case, CouchDB is not the answer. Slow ingestion and disk consumption is my main concern. There are surely many other use cases where you can use CouchDB. But going through this exercise helps me to understand more about CouchDB and how it works. Hopefully i will have a chance to come back with it in future.

From NPM to Yarn

Recently Facebook introduced a new package manager for Node.js called Yarn. I was hesitant to try it out at the beginning, but now I have completely switched to Yarn for Node.js package management. You can learn everything about Yarn on its website, but here are my biggest benefits from switching to Yarn:

  • Faster: it really is faster. Yarn's package resolution is fast, and it caches packages locally, so the next time you run yarn it will use the cache on your local disk. Packages are also downloaded in parallel. In my case, I saw a 2x speedup!
  • Frozen dependencies by default: with npm, you need to run shrinkwrap every time you add a new package, but with Yarn it's the default. This means no more dependency mismatches between me and my teammates. Plus, the yarn.lock file is easier to diff when there is a conflict.
  • Nicer output: Yarn's console output is much cleaner 😀

There are other benefits too, but these three alone are enough for me to drop npm!